INTERVIEW QUESTIONS

85 NumPy interview questions to hire top developers

Siddhartha Gunti

September 09, 2024

When evaluating candidates for data science or engineering roles, assessing their NumPy skills is key. As with hiring any Python developer, it's important to ensure they have the right skills.

This blog post provides a list of NumPy interview questions tailored for various experience levels, ranging from freshers to experienced professionals, including multiple-choice questions (MCQs). It is structured to guide recruiters and hiring managers in their evaluation process.

By using these questions, you can ensure you're hiring candidates with the NumPy expertise your team needs, and consider using an Adaface NumPy test to make the process more effective.

Table of contents

NumPy interview questions for freshers

NumPy interview questions for juniors

NumPy intermediate interview questions

NumPy interview questions for experienced

NumPy MCQ

Which NumPy skills should you evaluate during the interview phase?

Hire Skilled NumPy Developers with Adaface

Download NumPy interview questions template in multiple formats

NumPy interview questions for freshers

1. What is NumPy and why do we use it?

NumPy is a fundamental Python library for numerical computing. It provides support for large, multi-dimensional arrays and matrices, along with a collection of mathematical functions to operate on these arrays efficiently.

We use NumPy for several reasons: its efficient array operations are much faster than Python lists, especially for large datasets; it provides a wide range of mathematical functions optimized for array operations; and it forms the basis for many other scientific computing libraries in Python, such as SciPy, pandas, and scikit-learn. For example:

import numpy as np

arr = np.array([1, 2, 3])
print(arr * 2)  # Output: [2 4 6]

2. Explain the difference between a list and a NumPy array. Give examples.

A Python list is a built-in data structure that can hold elements of different data types. It's dynamic in size and offers methods for adding, removing, and manipulating elements. However, lists are not designed for numerical operations and can be inefficient for large datasets.

NumPy arrays, on the other hand, are specifically designed for numerical computations. They are homogeneous, meaning they contain elements of the same data type. NumPy arrays are stored contiguously in memory, enabling efficient vectorized operations. For example:

import numpy as np

my_list = [1, 2, 'a', 4.5] # list with mixed data types
my_array = np.array([1, 2, 3, 4]) # NumPy array with integers

my_array + 2 performs element-wise addition and returns [3 4 5 6]. The same expression on my_list raises a TypeError; with a list you would need an explicit loop or a list comprehension.
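As a minimal sketch of that contrast (the variable names here are illustrative only), the following applies the same "+ 2" to a list and to an array built from it:

```python
import numpy as np

my_list = [1, 2, 3, 4]
my_array = np.array(my_list)

# A list needs an explicit loop (here a comprehension) for element-wise math.
list_plus_two = [x + 2 for x in my_list]

# The array applies the operation to every element at once.
array_plus_two = my_array + 2

# Adding an int to a list is not defined and raises TypeError.
try:
    my_list + 2
    list_add_failed = False
except TypeError:
    list_add_failed = True

# Between two lists, + concatenates instead of adding element-wise.
concatenated = my_list + my_list

print(list_plus_two)      # [3, 4, 5, 6]
print(array_plus_two)     # [3 4 5 6]
print(list_add_failed)    # True
print(len(concatenated))  # 8
```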

3. How do you create a NumPy array? Show different ways.

NumPy arrays can be created in several ways. numpy.array() is the most general, taking a list or tuple as input. For example:

import numpy as np
arr = np.array([1, 2, 3])
arr2d = np.array([[1, 2], [3, 4]])

Other useful functions include:

numpy.zeros(): Creates an array filled with zeros.
numpy.ones(): Creates an array filled with ones.
numpy.empty(): Creates an array without initializing entries.
numpy.arange(): Creates an array with evenly spaced values within a given interval.
numpy.linspace(): Creates an array with evenly spaced numbers over a specified interval.
numpy.random.rand(): Creates an array of given shape with random samples from a uniform distribution over [0, 1).
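The functions listed above can be sketched in one place as follows. The shapes and values are arbitrary choices for illustration.

```python
import numpy as np

zeros = np.zeros((2, 3))            # 2x3 array of 0.0
ones = np.ones(4)                   # 1D array of four 1.0 values
empty = np.empty((2, 2))            # uninitialized: contents are arbitrary
stepped = np.arange(0, 10, 2)       # start 0, stop 10 (exclusive), step 2
spaced = np.linspace(0.0, 1.0, 5)   # 5 values from 0 to 1, both included
uniform = np.random.rand(2, 2)      # random samples from [0, 1)

print(zeros.shape)   # (2, 3)
print(stepped)       # [0 2 4 6 8]
print(spaced)        # [0.   0.25 0.5  0.75 1.  ]
```

Because numpy.empty() skips initialization, its values should always be overwritten before they are read.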

4. What is the shape of a NumPy array, and how do you find it?

The shape of a NumPy array is a tuple of integers indicating the size of each dimension. For example, a 2D array (matrix) with 3 rows and 4 columns would have a shape of (3, 4). A 1D array (vector) with 5 elements would have a shape of (5,).

You can find the shape of a NumPy array using the .shape attribute. For instance:

import numpy as np

arr = np.array([[1, 2, 3], [4, 5, 6]])
print(arr.shape) # Output: (2, 3)

5. How can you access specific elements in a NumPy array? Explain with examples.

You can access specific elements in a NumPy array using indexing and slicing. Indexing allows you to retrieve a single element, while slicing lets you extract a portion of the array.

Examples:

Indexing: arr[row, column] (for 2D arrays), arr[index] (for 1D arrays)

import numpy as np
arr = np.array([[1, 2, 3], [4, 5, 6]])
print(arr[0, 1])  # Output: 2

Slicing: arr[start:end:step]

import numpy as np
arr = np.array([1, 2, 3, 4, 5])
print(arr[1:4])  # Output: [2 3 4]
print(arr[:]) # Output: [1 2 3 4 5]
print(arr[::2]) # Output: [1 3 5]

Boolean Indexing:

import numpy as np
arr = np.array([1, 2, 3, 4, 5])
bool_arr = arr > 2
print(arr[bool_arr]) # Output: [3 4 5]

6. Can you add two NumPy arrays together? What are the conditions to do so?

Yes, you can add two NumPy arrays together using the + operator or the np.add() function. For element-wise addition to work correctly, the arrays must satisfy certain conditions:

Shape Compatibility: The arrays must have the same shape, or shapes that can be broadcast to a common shape. NumPy compares the shapes dimension by dimension, starting from the trailing (rightmost) one; each pair of dimensions must either be equal or one of them must be 1. A missing leading dimension is treated as 1.
Data Type Compatibility: The arrays do not need to share a data type. NumPy promotes the result to a type that can represent both inputs (for example, int plus float gives float), but this can sometimes lead to unexpected results or loss of precision. Both arrays should be numeric, such as int, float, or complex.
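A short sketch of both conditions, using made-up arrays: one addition that succeeds, one that fails because the shapes cannot be broadcast, and one that mixes data types.

```python
import numpy as np

a = np.array([[1, 2, 3], [4, 5, 6]])   # shape (2, 3)
b = np.array([10, 20, 30])             # shape (3,)

# (2, 3) + (3,): trailing dimensions match, so b is added to each row.
broadcast_sum = a + b

# (2, 3) + (2,): trailing dimensions 3 and 2 differ and neither is 1.
try:
    a + np.array([1, 2])
    raised = False
except ValueError:
    raised = True

# int + float: the result is promoted to a floating-point dtype.
mixed = np.array([1, 2, 3]) + np.array([0.5, 0.5, 0.5])

print(broadcast_sum.tolist())  # [[11, 22, 33], [14, 25, 36]]
print(raised)                  # True
print(mixed.dtype)             # float64
```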

7. How do you change the shape of a NumPy array?

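You can change the shape of a NumPy array using the reshape() method or by assigning to the ndarray.shape attribute. reshape() returns a new array object with the specified dimensions (a view of the same data whenever possible, otherwise a copy) and leaves the original array's shape unchanged. Assigning to shape modifies the array in place and raises an error if that cannot be done without copying; the NumPy documentation discourages this form and recommends reshape() instead. In both cases the new shape must be compatible with the original array's size (i.e., the number of elements remains the same).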

For example:

import numpy as np

arr = np.array([1, 2, 3, 4, 5, 6])

# Using reshape()
new_arr = arr.reshape((2, 3))
print(new_arr)

# Using ndarray.shape
arr.shape = (3, 2)
print(arr)

8. What are some common mathematical operations you can perform on NumPy arrays (e.g., sum, mean, max)?

NumPy provides a wide range of mathematical operations that can be efficiently performed on arrays. Some of the most common include:

Summation: np.sum(arr) calculates the sum of all elements in the array arr. np.sum(arr, axis=0) or np.sum(arr, axis=1) computes the sum along a specified axis.
Mean: np.mean(arr) calculates the arithmetic mean (average) of the elements in the array. Similar to np.sum, you can use axis argument to calculate mean along a specific axis.
Maximum/Minimum: np.max(arr) and np.min(arr) return the maximum and minimum values in the array, respectively. Also supports the axis argument.
Standard Deviation/Variance: np.std(arr) and np.var(arr) compute the standard deviation and variance of the array elements. Again, axis can be specified.
Element-wise operations: Many element-wise operations are supported, such as addition (+), subtraction (-), multiplication (*), division (/), exponentiation (**), and trigonometric functions (np.sin, np.cos, np.tan). These operations are applied to each element of the array.
Dot product: np.dot(a, b) calculates the dot product of two arrays a and b.
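The operations above can be tried on a small 2D array. This is an illustrative sketch; the input values are arbitrary.

```python
import numpy as np

arr = np.array([[1, 2, 3], [4, 5, 6]])

total = np.sum(arr)                # sum of all elements
col_sums = np.sum(arr, axis=0)     # collapse rows: one sum per column
row_means = np.mean(arr, axis=1)   # collapse columns: one mean per row
largest = np.max(arr)
smallest = np.min(arr)
spread = np.std(arr)               # population standard deviation
dot = np.dot(np.array([1, 2, 3]), np.array([4, 5, 6]))

print(total)       # 21
print(col_sums)    # [5 7 9]
print(row_means)   # [2. 5.]
print(largest, smallest)  # 6 1
print(dot)         # 32
```

A useful way to remember the axis argument: the axis you pass is the one that disappears from the result's shape.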

9. How do you create an array of zeros or ones using NumPy?

NumPy provides convenient functions to create arrays filled with zeros or ones. numpy.zeros() creates an array filled with zeros, while numpy.ones() creates an array filled with ones. Both functions take the shape of the desired array as an argument.

For example:

import numpy as np

zeros_array = np.zeros((3, 4)) # creates a 3x4 array filled with zeros
ones_array = np.ones(5) # creates a 1D array of length 5 filled with ones

10. Explain what broadcasting is in NumPy. Why is it useful?

Broadcasting in NumPy refers to the ability to perform element-wise operations on arrays with different shapes. NumPy conceptually stretches the smaller array to match the shape of the larger one, without actually copying its data. This stretching occurs along dimensions of size 1, or where one of the arrays has no corresponding dimension.

Broadcasting is useful because it allows us to perform element-wise operations on arrays that don't have the same shape, simplifying code and making it more efficient. Without broadcasting, we would have to manually reshape or replicate arrays, which would be less convenient and consume more memory. For example:

import numpy as np

a = np.array([1, 2, 3]) # (3,)
b = 5 # ()

result = a + b # Broadcasting b to [5, 5, 5]
print(result) # Output: [6 7 8]

In this case, the scalar b is broadcast to an array of shape (3,) to match a.
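Broadcasting is not limited to scalars. As a hedged sketch with made-up values, the example below stretches a column of shape (2, 1) across columns and also combines a column with a row so that both are stretched:

```python
import numpy as np

matrix = np.array([[1, 2, 3], [4, 5, 6]])   # shape (2, 3)
row = np.array([10, 20, 30])                # shape (3,)
col = np.array([[100], [200]])              # shape (2, 1)

# (2, 3) + (2, 1): the size-1 axis of col is stretched across 3 columns.
plus_col = matrix + col

# (2, 1) + (3,): both operands are stretched, giving shape (2, 3).
grid = col + row

print(plus_col.tolist())  # [[101, 102, 103], [204, 205, 206]]
print(grid.tolist())      # [[110, 120, 130], [210, 220, 230]]
```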

11. How do you find the data type of elements in a NumPy array?

You can find the data type of elements in a NumPy array using the dtype attribute. This attribute returns a numpy.dtype object representing the data type of the array's elements.

For example:

import numpy as np

arr = np.array([1, 2, 3])
print(arr.dtype) # Output: int64 (or int32 depending on the system)

arr = np.array([1.0, 2.0, 3.0])
print(arr.dtype) # Output: float64

12. How can you create a sequence of numbers as a NumPy array?

You can create a sequence of numbers as a NumPy array using several functions:

np.arange(): This is similar to Python's built-in range() function, but it returns a NumPy array. You specify the start, stop, and step values. For example:
```
import numpy as np
arr = np.arange(0, 10, 2) # start=0, stop=10, step=2
print(arr) # Output: [0 2 4 6 8]
```
np.linspace(): This function creates an array with a specified number of evenly spaced values over a given interval. You specify the start, stop, and the number of values you want. For example:
```
import numpy as np
arr = np.linspace(0, 1, 5) # start=0, stop=1, 5 values
print(arr) # Output: [0.   0.25 0.5  0.75 1.  ]
```

13. What is indexing and slicing in NumPy arrays?

Indexing and slicing are fundamental ways to access and manipulate portions of NumPy arrays.

Indexing allows you to select individual elements using their position (index). NumPy arrays are zero-indexed, meaning the first element is at index 0. Negative indices can be used to access elements from the end of the array (e.g., -1 is the last element).

Slicing extracts a sequence of elements. You specify a slice using the colon (:) operator within square brackets. The syntax is [start:stop:step]. start is the index of the first element to include (inclusive), stop is the index of the element before which to stop (exclusive), and step is the increment between elements. If start or stop are omitted, they default to the beginning or end of the array, respectively. If step is omitted, it defaults to 1. Example: arr[1:5:2] selects elements at indices 1 and 3.

import numpy as np

arr = np.array([0, 1, 2, 3, 4, 5, 6, 7, 8, 9])

print(arr[0])  # Indexing: Access the first element (0)
print(arr[-1]) # Indexing: Access the last element (9)
print(arr[2:5]) # Slicing: Elements from index 2 up to (but not including) 5: [2 3 4]
print(arr[:3])  # Slicing: Elements from the beginning up to (but not including) 3: [0 1 2]
print(arr[5:])  # Slicing: Elements from index 5 to the end: [5 6 7 8 9]
print(arr[::2]) # Slicing: Every other element from the beginning to the end: [0 2 4 6 8]

14. How do you perform element-wise multiplication on two NumPy arrays?

Element-wise multiplication of two NumPy arrays can be achieved using the * operator or the numpy.multiply() function. Both methods perform the same operation, where corresponding elements from the two arrays are multiplied together.

For example:

import numpy as np

arr1 = np.array([1, 2, 3])
arr2 = np.array([4, 5, 6])

# Using the * operator
result1 = arr1 * arr2  # Output: [ 4 10 18]

# Using numpy.multiply()
result2 = np.multiply(arr1, arr2)  # Output: [ 4 10 18]

15. Explain the difference between copy and view in NumPy. Give examples.

In NumPy, both copy and view are ways to create new array objects from existing ones, but they differ in how they handle the underlying data.

A copy creates a completely new array object and allocates new memory to store its data. Modifying the copy will not affect the original array, and vice versa.

A view, on the other hand, creates a new array object that references the same memory location as the original array (or a portion of it). Changes made to the view will be reflected in the original array, and vice versa. This is a shallow copy.

import numpy as np

arr = np.array([1, 2, 3])

# Copy
arr_copy = arr.copy()
arr_copy[0] = 10
print(arr)       # Output: [1 2 3]
print(arr_copy)  # Output: [10  2  3]

# View
arr_view = arr.view()
arr_view[0] = 20
print(arr)       # Output: [20  2  3]
print(arr_view)  # Output: [20  2  3]
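Views also appear implicitly: basic slicing returns a view, while indexing with a list or array of integers returns a copy. A small sketch, using np.shares_memory() to check which is which:

```python
import numpy as np

arr = np.arange(6)         # [0 1 2 3 4 5]

sliced = arr[1:4]          # basic slicing: a view into arr
picked = arr[[1, 2, 3]]    # integer-array indexing: an independent copy

print(np.shares_memory(arr, sliced))  # True
print(np.shares_memory(arr, picked))  # False
print(sliced.base is arr)             # True

# Writing through the view changes the original; the copy is unaffected.
sliced[0] = 99
print(arr[1])      # 99
print(picked[0])   # 1
```

If a slice has to be modified without touching the original, calling .copy() on it first avoids the shared memory.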

16. How to reshape a NumPy array to have a different number of dimensions?

To reshape a NumPy array to have a different number of dimensions, you can use the reshape() method. The key requirement is that the total number of elements in the original array must be the same as the total number of elements in the reshaped array. For instance, an array with shape (2, 3) has 6 elements, so it can be reshaped to (3, 2), (6,), or (1, 6), but not to (2, 2).

For example:

import numpy as np

arr = np.array([1, 2, 3, 4, 5, 6])

# Reshape to a 2x3 array
new_arr = arr.reshape(2, 3)
print(new_arr)

# Reshape to a 3x2 array
new_arr2 = arr.reshape(3, 2)
print(new_arr2)

A single dimension can be given as -1, in which case NumPy infers it from the array's size and the other dimensions. For example, when arr has 6 elements, arr.reshape((2, -1)) is equivalent to arr.reshape((2, 3)).
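A brief sketch of changing the number of dimensions with an inferred size, assuming a 12-element array chosen for illustration:

```python
import numpy as np

arr = np.arange(12)             # 1D, shape (12,)

two_rows = arr.reshape(2, -1)   # -1 is inferred as 6
cube = arr.reshape(2, 3, -1)    # -1 is inferred as 2
flat = cube.reshape(-1)         # back to 1D

print(two_rows.shape)  # (2, 6)
print(cube.shape)      # (2, 3, 2)
print(flat.shape)      # (12,)

# A shape that does not divide the element count is rejected.
try:
    arr.reshape(5, -1)
    rejected = False
except ValueError:
    rejected = True
print(rejected)        # True
```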

17. How can you sort a NumPy array? Show with and without modifying the original array.

You can sort a NumPy array using np.sort() or ndarray.sort(). np.sort() returns a sorted copy of the array, leaving the original array unchanged. ndarray.sort(), on the other hand, sorts the array in-place, modifying the original array.

Here are examples:

Without modifying the original array:

import numpy as np

arr = np.array([3, 1, 4, 1, 5, 9, 2, 6])
sorted_arr = np.sort(arr)
print("Original array:", arr)      
print("Sorted array:", sorted_arr)

Modifying the original array:

import numpy as np

arr = np.array([3, 1, 4, 1, 5, 9, 2, 6])
arr.sort()
print("Sorted array:", arr)
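Two related variations are sketched below as an aside: descending order via a reversed slice, and np.argsort(), which returns the indices that would sort the array instead of the sorted values. The kind="stable" argument is chosen here only so that the tie between the two 1s has a predictable order.

```python
import numpy as np

arr = np.array([3, 1, 4, 1, 5, 9, 2, 6])

# Descending order: sort ascending, then reverse.
descending = np.sort(arr)[::-1]

# Indices that sort the array; the original stays untouched.
order = np.argsort(arr, kind="stable")

print(descending)        # [9 6 5 4 3 2 1 1]
print(order)             # [1 3 6 0 2 4 7 5]
print(arr[order])        # [1 1 2 3 4 5 6 9]
```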

18. How to filter elements from a NumPy array based on a condition?

To filter elements from a NumPy array based on a condition, you can use boolean indexing. Create a boolean array of the same shape as the original array where each element is True if the corresponding element in the original array satisfies the condition, and False otherwise. Then, use this boolean array to index the original array. This will return a new array containing only the elements where the boolean array is True.

For example:

import numpy as np

arr = np.array([1, 2, 3, 4, 5])

# Filter elements greater than 2
filtered_arr = arr[arr > 2]

print(filtered_arr) # Output: [3 4 5]
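Conditions can also be combined. The sketch below uses & (and) and | (or) with parentheses around each comparison, plus np.where() to replace values instead of dropping them. The thresholds are arbitrary.

```python
import numpy as np

arr = np.array([1, 2, 3, 4, 5, 6])

# Parentheses are required because & and | bind tighter than comparisons.
between = arr[(arr > 2) & (arr < 6)]
ends = arr[(arr < 2) | (arr > 5)]

# Keep the array's shape: even values become 0, others are unchanged.
evens_zeroed = np.where(arr % 2 == 0, 0, arr)

print(between)        # [3 4 5]
print(ends)           # [1 6]
print(evens_zeroed)   # [1 0 3 0 5 0]
```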

19. Describe how to save a NumPy array to a file and load it back.

NumPy provides convenient functions for saving arrays to files and loading them back.

To save a NumPy array, you can use the numpy.save() or numpy.savetxt() functions.

numpy.save('filename.npy', array): Saves the array to a binary file with the .npy extension. This format preserves the data type and shape.
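numpy.savetxt('filename.txt', array): Saves the array to a text file. This is useful for human-readable data, but it only handles 1-D and 2-D arrays and does not store the data type, so numpy.loadtxt() returns floats by default. Precision can also be lost if you pass a format (fmt) with fewer digits.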

To load a NumPy array, use numpy.load() or numpy.loadtxt():

loaded_array = numpy.load('filename.npy'): Loads the array from a .npy file.
loaded_array = numpy.loadtxt('filename.txt'): Loads the array from a text file. You may need to specify the delimiter if it's not whitespace. E.g., numpy.loadtxt('filename.txt', delimiter=',')

import numpy as np

arr = np.array([[1, 2], [3, 4]])

# Save to a .npy file
np.save('my_array.npy', arr)

# Load from the .npy file
loaded_arr = np.load('my_array.npy')

# Save to a text file
np.savetxt('my_array.txt', arr)

# Load from the text file
loaded_arr_txt = np.loadtxt('my_array.txt')

print(loaded_arr)
print(loaded_arr_txt)
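To see the difference between the two formats, the following sketch round-trips an integer array through both and compares the resulting dtypes. It writes into a temporary directory (an assumption made here so the example cleans up after itself).

```python
import os
import tempfile

import numpy as np

arr = np.array([[1, 2], [3, 4]])

with tempfile.TemporaryDirectory() as tmp:
    npy_path = os.path.join(tmp, "my_array.npy")
    txt_path = os.path.join(tmp, "my_array.txt")

    np.save(npy_path, arr)
    np.savetxt(txt_path, arr)

    from_npy = np.load(npy_path)
    from_txt = np.loadtxt(txt_path)

# The binary format restores the original integer dtype.
print(from_npy.dtype == arr.dtype)    # True
# The text format comes back as floats unless dtype is passed to loadtxt.
print(from_txt.dtype)                 # float64
print(np.array_equal(from_txt, arr))  # True
```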

20. How can you concatenate two NumPy arrays? Explain the use of different axes.

You can concatenate two NumPy arrays using the numpy.concatenate() function. This function joins a sequence of arrays along an existing axis. The axis parameter specifies the axis along which the arrays will be joined. If axis is 0 (the default), the arrays are concatenated vertically (row-wise). If axis is 1, they are concatenated horizontally (column-wise). The arrays must have the same shape except along the axis being joined.

For example:

import numpy as np

a = np.array([[1, 2], [3, 4]])
b = np.array([[5, 6]])

# Concatenate along axis 0 (vertically)
result_axis_0 = np.concatenate((a, b), axis=0)

# Concatenate along axis 1 (horizontally)
c = np.array([[5], [6]])
result_axis_1 = np.concatenate((a, c), axis=1)

print(result_axis_0)
print(result_axis_1)

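numpy.stack() is a related function: instead of joining along an existing axis, it joins a sequence of equally shaped arrays along a new axis, so the result has one more dimension than the inputs.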
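The difference is easiest to see in the resulting shapes. A minimal sketch with two 1D arrays:

```python
import numpy as np

a = np.array([1, 2, 3])
b = np.array([4, 5, 6])

joined = np.concatenate((a, b))          # existing axis: still 1D
stacked_rows = np.stack((a, b), axis=0)  # new leading axis
stacked_cols = np.stack((a, b), axis=1)  # new trailing axis

print(joined.shape)        # (6,)
print(stacked_rows.shape)  # (2, 3)
print(stacked_cols.shape)  # (3, 2)
```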

NumPy interview questions for juniors

1. What is NumPy? Think of it like LEGOs for numbers, but what makes it special?

NumPy is a fundamental Python library for numerical computing. Think of it as the foundation upon which many scientific and data science tools are built. It's like LEGOs for numbers because it allows you to create and manipulate large arrays of numerical data efficiently.

What makes NumPy special is its ability to perform fast operations on arrays using vectorized operations. This avoids explicit looping and makes code significantly faster than using standard Python lists for numerical computations. NumPy also provides a wide range of mathematical functions optimized for working with arrays, such as linear algebra, Fourier transforms, and random number generation. For example, you can add two NumPy arrays like this:

import numpy as np

arr1 = np.array([1, 2, 3])
arr2 = np.array([4, 5, 6])

result = arr1 + arr2 # Element-wise addition
print(result) # Output: [5 7 9]

2. Can you describe a NumPy array? Imagine you're showing it to a friend who's never seen one.

Imagine a NumPy array like a grid, or a list of lists, but with some key differences. All the elements in a NumPy array must be of the same data type (like numbers or strings). This allows for efficient storage and operations. Think of it as a super-powered list optimized for numerical computations.

More technically, a NumPy array is the central data structure of the NumPy library. It's an n-dimensional array (ndarray), where 'n' represents the number of dimensions (e.g., 1D, 2D, 3D). It provides fast and efficient operations because it stores data in a contiguous block of memory, enabling vectorized calculations. For example:

import numpy as np

arr = np.array([1, 2, 3]) # A 1D NumPy array
matrix = np.array([[1, 2], [3, 4]]) # A 2D NumPy array

3. How is a NumPy array different from a regular Python list? What are their strengths?

NumPy arrays and Python lists differ significantly in functionality and performance. A NumPy array is designed for numerical operations, storing elements of the same data type contiguously in memory, enabling efficient vectorized operations. A Python list, on the other hand, can store elements of different data types and uses pointers to objects scattered in memory, making it more flexible but slower for numerical computations.

Strengths:

NumPy arrays: Excellent for numerical computations, efficient memory usage, and fast vectorized operations. Useful for scientific computing, data analysis, and machine learning.
Python lists: More flexible for storing heterogeneous data, easy to use for general-purpose tasks, and dynamic resizing. Good for tasks where data types vary and performance is not critical.
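Two of these trade-offs can be sketched directly: lists grow in place while arrays have a fixed size, and arrays force all elements to a single data type. The values are arbitrary examples.

```python
import numpy as np

# Lists resize in place.
values = [1, 2, 3]
values.append(4)

# Arrays have a fixed size: np.append returns a new, larger array.
arr = np.array([1, 2, 3])
bigger = np.append(arr, 4)

# Arrays are homogeneous: mixed inputs are converted to one common dtype.
mixed_numbers = np.array([1, 2.5])
mixed_text = np.array([1, "a"])

print(values)               # [1, 2, 3, 4]
print(arr.size, bigger.size)  # 3 4
print(mixed_numbers.dtype)  # float64
print(mixed_text.dtype.kind)  # U
```

The kind "U" means the integer 1 was stored as a Unicode string, which is rarely what numerical code wants.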

4. What does the 'shape' of a NumPy array tell you? How do you find out the shape?

The 'shape' of a NumPy array tells you the dimensions of the array. It's a tuple of integers, where each integer represents the number of elements along a particular axis (dimension). For example, an array with shape (3, 4) has 3 rows and 4 columns. This information is crucial for understanding the array's structure and performing operations like reshaping, transposing, or broadcasting.

You can find out the shape of a NumPy array using the .shape attribute. For example:

import numpy as np

arr = np.array([[1, 2, 3], [4, 5, 6]])
print(arr.shape) # Output: (2, 3)

5. What does the 'dtype' of a NumPy array mean? Why is it important?

The dtype of a NumPy array specifies the data type of the elements stored within the array. It determines the kind of values the array can hold (e.g., integers, floating-point numbers, booleans, strings) and the amount of memory each element occupies. For example, int32 means each element is a 32-bit integer.

dtype is important because it impacts memory usage, computational speed, and the types of operations that can be performed on the array. Choosing the appropriate dtype can optimize performance and prevent unexpected behavior or errors. For example, using float32 instead of float64 halves memory consumption when high precision isn't needed, while assigning a fractional value to an array with an integer dtype silently truncates it. An incorrect dtype can therefore lead to loss of precision or errors during calculations.
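Both effects can be observed in a few lines. This is an illustrative sketch; the array sizes and values are arbitrary.

```python
import numpy as np

# Memory: each float64 takes 8 bytes, each float32 takes 4.
as_f64 = np.zeros(1000, dtype=np.float64)
as_f32 = np.zeros(1000, dtype=np.float32)
print(as_f64.nbytes)   # 8000
print(as_f32.nbytes)   # 4000

# Truncation: an integer array cannot hold the fractional part.
ints = np.array([1, 2, 3])
ints[0] = 7.9
print(ints[0])         # 7

# Converting with astype() first keeps the fraction.
floats = ints.astype(np.float64)
floats[0] = 7.9
print(floats[0])       # 7.9
```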

6. How do you create a NumPy array filled with zeros? When might you use this?

To create a NumPy array filled with zeros, you can use the numpy.zeros() function. This function takes the desired shape of the array as input (either a single integer for a 1D array or a tuple for multi-dimensional arrays) and returns a new array filled with zeros of the specified data type (default is float64). For example:

import numpy as np

# Create a 1D array of 5 zeros
arr_1d = np.zeros(5)

# Create a 2x3 array of zeros
arr_2d = np.zeros((2, 3))

# Create a 3x3 array of zeros with an integer data type
arr_int = np.zeros((3, 3), dtype=int)

print(arr_1d)
print(arr_2d)
print(arr_int)

Arrays filled with zeros are often used for:

Initialization: As a starting point for accumulating values during a calculation.
Placeholders: When you need an array of a specific size and type before you know the actual values.
Masking: Creating a mask array where zero represents elements to be ignored or excluded.

7. How do you create a NumPy array filled with ones? When is that handy?

To create a NumPy array filled with ones, you can use the numpy.ones() function. This function takes the desired shape of the array as an argument and returns a new array filled with ones of the specified data type (default is float64).

It's handy in several situations. For example:

Initialization: Initializing an array before populating it with actual values.
Masking: Creating a mask where ones represent elements to be included or processed.
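Multiplication identity: Multiplying element-wise by an array of ones leaves the values unchanged. For matrix multiplication the identity is instead an identity matrix, created with numpy.identity() or numpy.eye().
Image processing: Creating white images (for example, when pixel values are floats scaled to the range 0 to 1).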

import numpy as np

# Create a 3x4 array filled with ones
array_of_ones = np.ones((3, 4))
print(array_of_ones)

8. How can you create a NumPy array with a specific range of numbers? Show an example.

You can create a NumPy array with a specific range of numbers using the arange() function. It's similar to Python's range() but creates a NumPy array instead of a list.

For example, numpy.arange(start, stop, step) creates an array with values starting from start (inclusive), up to stop (exclusive), with a step size of step. Here's an example:

import numpy as np

arr = np.arange(0, 10, 2) # Start=0, Stop=10, Step=2
print(arr) # Output: [0 2 4 6 8]

9. Explain how to reshape a NumPy array. Why might you want to do that?

Reshaping a NumPy array changes the dimensions of the array without altering its data. The reshape() method takes a tuple specifying the new shape. For example, an array of shape (4, 3) can be reshaped to (6, 2) or (12, 1) as long as the total number of elements remains the same (4 * 3 = 6 * 2 = 12 * 1).

You might want to reshape an array for several reasons:

To prepare data for a specific machine learning model that requires a certain input shape.
To change the layout of image data (e.g., from a flattened array to a height x width x channels array).
To facilitate calculations that are easier to perform with arrays of certain dimensions.

import numpy as np

arr = np.array([1, 2, 3, 4, 5, 6])
reshaped_arr = arr.reshape((2, 3))
print(reshaped_arr)

10. How do you access a specific element in a NumPy array? (Like picking out a specific LEGO brick).

You can access specific elements in a NumPy array using indexing, similar to Python lists. For a one-dimensional array, you simply specify the index within square brackets (e.g., my_array[3] retrieves the element at index 3). Remember that indexing starts at 0.

For multi-dimensional arrays, you provide a comma-separated tuple of indices. For example, in a 2D array, my_array[1, 2] accesses the element at row 1, column 2. You can also use slicing to extract portions of the array, like my_array[0:2, :] to get the first two rows and all columns.
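The indexing rules above can be sketched as follows. The arrays and their values are made up purely for illustration (grid is a hypothetical 2D array standing in for my_array in the 2D case).

```python
import numpy as np

# Hypothetical arrays used only for illustration
my_array = np.array([10, 20, 30, 40, 50])
grid = np.array([[1, 2, 3],
                 [4, 5, 6],
                 [7, 8, 9]])

print(my_array[3])    # element at index 3: 40
print(my_array[-1])   # negative indices count from the end: 50
print(grid[1, 2])     # row 1, column 2: 6
print(grid[0:2, :])   # first two rows, all columns
```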

11. What are some ways to grab multiple elements from a NumPy array at once? (Slicing)

NumPy provides several ways to grab multiple elements at once using slicing and advanced indexing. Slicing uses the colon operator (:) to specify a range of indices. For example, arr[start:stop:step] selects elements from start (inclusive) to stop (exclusive), with a step size of step. Omitting start defaults to 0, omitting stop defaults to the end of the array, and omitting step defaults to 1.

Advanced indexing allows you to select elements using integer arrays or boolean arrays. Integer array indexing uses arrays of indices to specify which elements to select. Boolean array indexing uses a boolean array of the same shape as the original array to select elements where the boolean array is True. For example:

import numpy as np

arr = np.array([10, 20, 30, 40, 50])

# Slicing
print(arr[1:4])  # Output: [20 30 40]

# Integer array indexing
indices = [0, 2, 4]
print(arr[indices])  # Output: [10 30 50]

# Boolean array indexing
mask = np.array([True, False, True, False, True])
print(arr[mask])  # Output: [10 30 50]

12. How can you change the value of an element in a NumPy array?

You can change the value of an element in a NumPy array using indexing or slicing. With indexing, you specify the index of the element you want to modify. For example, arr[2] = 10 changes the value of the element at index 2 to 10.

Alternatively, you can use slicing to modify a range of elements. For example, arr[1:4] = [20, 30, 40] changes the values of elements from index 1 up to (but not including) index 4 to the given values. Boolean indexing can also be used, like arr[arr > 5] = 0, which sets all elements greater than 5 to 0. Older code may use arr.itemset(index, value) for the same purpose, but that method was removed in NumPy 2.0, so plain index assignment is the safer choice.
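Here is a short sketch of the three assignment styles, using a made-up array. Note that each assignment modifies arr in place.

```python
import numpy as np

arr = np.array([1, 2, 3, 4, 5, 6, 7])

arr[2] = 10            # single element: [ 1  2 10  4  5  6  7]
arr[0:2] = [20, 30]    # slice assignment: [20 30 10  4  5  6  7]
arr[arr > 25] = 0      # boolean-mask assignment: [20  0 10  4  5  6  7]

print(arr)
```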

13. What does 'broadcasting' mean in NumPy? Give a simple example.

Broadcasting in NumPy refers to the ability of NumPy to perform arithmetic operations on arrays with different shapes. NumPy automatically expands the smaller array to match the shape of the larger array without creating copies of the data. This makes it possible to perform element-wise operations on arrays that aren't perfectly aligned in terms of dimensions.

For example:

import numpy as np

a = np.array([1, 2, 3])
b = 2

c = a + b  # Broadcasting occurs here; b is treated as [2, 2, 2]
print(c)  # Output: [3 4 5]

In this example, the scalar b is broadcast to the shape of array a before the addition is performed.

14. How do you perform element-wise addition on two NumPy arrays?

Element-wise addition on two NumPy arrays can be performed using the + operator or the numpy.add() function. Both achieve the same result, adding corresponding elements from the two arrays to produce a new array containing the sums.

For example:

import numpy as np

a = np.array([1, 2, 3])
b = np.array([4, 5, 6])

# Using the + operator
result_plus = a + b  # Output: [5 7 9]

# Using numpy.add()
result_add = np.add(a, b)  # Output: [5 7 9]

print(result_plus)
print(result_add)

15. How do you perform element-wise multiplication on two NumPy arrays?

Element-wise multiplication of two NumPy arrays can be achieved using the * operator or the numpy.multiply() function. Both methods perform the same operation, multiplying corresponding elements from the two arrays and returning a new array containing the results.

For example:

import numpy as np

arr1 = np.array([1, 2, 3])
arr2 = np.array([4, 5, 6])

# Using the * operator
result1 = arr1 * arr2  # Output: [ 4 10 18]

# Using numpy.multiply()
result2 = np.multiply(arr1, arr2)  # Output: [ 4 10 18]

16. What's the difference between the '*' operator and the '@' operator when used with NumPy arrays? (Focus on basic usage).

The * operator performs element-wise multiplication on NumPy arrays. This means that corresponding elements in the arrays are multiplied together. The arrays must be broadcastable, meaning they have compatible shapes or one of them is a scalar.

In contrast, the @ operator (or the numpy.matmul() function) performs matrix multiplication. For 2D arrays, this is the standard matrix product as defined in linear algebra. For higher-dimensional arrays, it behaves like a stack of matrix multiplications. The dimensions must be compatible for matrix multiplication, i.e., if a is of shape (m, n) and b is of shape (n, k), then a @ b will have shape (m, k).
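A small sketch makes the difference concrete. The matrices are arbitrary example values.

```python
import numpy as np

a = np.array([[1, 2],
              [3, 4]])
b = np.array([[5, 6],
              [7, 8]])

elementwise = a * b      # [[ 5 12], [21 32]]
matrix_product = a @ b   # [[19 22], [43 50]]

# Shape rule for @: (m, n) @ (n, k) gives (m, k)
shape_demo = (np.ones((2, 3)) @ np.ones((3, 4))).shape  # (2, 4)

print(elementwise)
print(matrix_product)
print(shape_demo)
```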

17. How do you calculate the sum of all elements in a NumPy array?

You can calculate the sum of all elements in a NumPy array using the np.sum() function or the sum() method of the NumPy array object.

For example:

import numpy as np

arr = np.array([1, 2, 3, 4, 5])

# Using np.sum()
sum_of_elements = np.sum(arr)
print(sum_of_elements)  # Output: 15

# Using the sum() method
sum_of_elements = arr.sum()
print(sum_of_elements) # Output: 15

18. How do you find the maximum and minimum values in a NumPy array?

To find the maximum and minimum values in a NumPy array, you can use the np.max() and np.min() functions, respectively. These functions efficiently iterate through the array and return the desired value. For example:

import numpy as np

arr = np.array([3, 1, 4, 1, 5, 9, 2, 6])

maximum_value = np.max(arr)
minimum_value = np.min(arr)

print("Maximum:", maximum_value)
print("Minimum:", minimum_value)

19. How do you calculate the mean (average) of the elements in a NumPy array?

To calculate the mean (average) of the elements in a NumPy array, you can use the numpy.mean() function. This function takes the NumPy array as input and returns the mean of its elements.

For example:

import numpy as np

arr = np.array([1, 2, 3, 4, 5])
mean_value = np.mean(arr)
print(mean_value) # Output: 3.0

20. How can you filter a NumPy array to select only elements that meet a certain condition? (e.g., greater than 5)

You can filter a NumPy array using boolean indexing. This involves creating a boolean array of the same shape as the original array, where each element is True if the corresponding element in the original array meets the specified condition, and False otherwise. This boolean array is then used to index the original array, returning a new array containing only the elements where the boolean array is True.

For example, to select elements greater than 5 from an array arr, you can use the following code:

import numpy as np

arr = np.array([1, 6, 3, 8, 2, 9])
filtered_arr = arr[arr > 5]
print(filtered_arr) # Output: [6 8 9]

21. What is a boolean mask in NumPy, and how is it used?

A boolean mask in NumPy is a boolean array (an array containing only True or False values) that is used to select elements from another NumPy array. The boolean mask has the same shape as the array you want to select from. Wherever the mask has a True value, the corresponding element from the original array is selected; where it has False, the element is ignored.

Boolean masks are commonly created using comparison operators (e.g., ==, >, <) on NumPy arrays. For example, arr > 5 creates a boolean mask where each element is True if the corresponding element in arr is greater than 5, and False otherwise. You can then use this mask to extract only the elements of arr that satisfy the condition:

import numpy as np

arr = np.array([1, 6, 3, 8, 2, 9])
mask = arr > 5
print(mask) # Output: [False  True False  True False  True]
selected_elements = arr[mask]
print(selected_elements) # Output: [6 8 9]

22. How can you combine two NumPy arrays horizontally?

You can combine two NumPy arrays horizontally using several methods:

np.hstack((arr1, arr2)): Stacks arrays in sequence horizontally (column-wise).
np.concatenate((arr1, arr2), axis=1): Concatenates arrays along the second axis (axis=1, which represents columns). The inputs must have at least two dimensions.
np.column_stack((arr1, arr2)): Stacks 1-D arrays as columns into a 2-D array.
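The following sketch shows the three functions side by side. The array names and values are illustrative only.

```python
import numpy as np

arr1 = np.array([[1, 2], [3, 4]])
arr2 = np.array([[5, 6], [7, 8]])

h = np.hstack((arr1, arr2))               # [[1 2 5 6], [3 4 7 8]]
c = np.concatenate((arr1, arr2), axis=1)  # same result as hstack for 2-D input

# With 1-D inputs the functions behave differently
x = np.array([1, 2, 3])
y = np.array([4, 5, 6])
flat = np.hstack((x, y))         # [1 2 3 4 5 6], still 1-D
cols = np.column_stack((x, y))   # [[1 4], [2 5], [3 6]], shape (3, 2)

print(h)
print(cols)
```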

23. How can you combine two NumPy arrays vertically?

You can combine two NumPy arrays vertically using the numpy.vstack() function or numpy.concatenate() with axis=0. numpy.vstack() is a convenience function specifically designed for vertical stacking, while numpy.concatenate() offers more general array concatenation capabilities.

For example:

import numpy as np

a = np.array([1, 2, 3])
b = np.array([4, 5, 6])

# Using vstack
c = np.vstack((a, b))
print(c)

# Using concatenate
d = np.concatenate((a.reshape(1,-1), b.reshape(1,-1)), axis=0)
print(d)

NumPy intermediate interview questions

1. How can you efficiently compute the moving average of a NumPy array?

To efficiently compute the moving average of a NumPy array, you can use the numpy.convolve() function. This function performs a discrete, linear convolution of two one-dimensional sequences. The general approach is to convolve the input array with a window of weights (e.g., an array of ones for a simple moving average) and then divide by the sum of the weights (window size). To prevent boundary effects you can use a convolution mode like 'valid' which returns only the central part of the convolution. Alternatively, padding can be used.

For example, to compute a moving average of size n on a NumPy array arr, you can use np.convolve(arr, np.ones(n)/n, mode='valid'). Here np.ones(n) creates the averaging window, dividing by n normalizes it, and mode='valid' ensures that only complete averages are returned, avoiding edge effects where the window extends beyond the array boundaries. Other efficient options include using scipy.ndimage.convolve1d which may offer performance benefits for larger arrays, or using a cumulative sum approach.
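As a minimal sketch, here are the convolution approach and the cumulative-sum alternative on a small made-up array. Both should give the same result.

```python
import numpy as np

arr = np.array([1, 2, 3, 4, 5, 6], dtype=float)
n = 3  # window size (arbitrary example value)

# Convolution with a normalized window of ones
ma_convolve = np.convolve(arr, np.ones(n) / n, mode='valid')

# Cumulative-sum alternative: the difference of running totals gives each window sum
csum = np.cumsum(np.insert(arr, 0, 0.0))
ma_cumsum = (csum[n:] - csum[:-n]) / n

print(ma_convolve)  # [2. 3. 4. 5.]
print(ma_cumsum)    # [2. 3. 4. 5.]
```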

2. Explain how to use NumPy's broadcasting feature and provide a practical example.

NumPy's broadcasting allows arithmetic operations between arrays with different shapes, making code more concise and efficient. NumPy automatically expands the smaller array to match the shape of the larger array without creating copies of the data, which saves memory and time. This expansion happens along dimensions of size 1.

For example, consider adding a vector to a matrix. If A is a (3, 3) matrix and b is a vector of shape (3,) (treated as (1, 3) under the broadcasting rules), NumPy will automatically "stretch" b into a (3, 3) matrix by virtually repeating it three times along the first dimension before adding it to A. Here's a practical example in code:

import numpy as np

A = np.array([[1, 2, 3], [4, 5, 6], [7, 8, 9]])
b = np.array([1, 0, 1])

C = A + b
print(C)

In this example, b will be broadcast to [[1, 0, 1], [1, 0, 1], [1, 0, 1]] before element-wise addition with A.

3. Describe how you would implement a convolution operation using NumPy.

To implement a convolution operation using NumPy, you can manually compute the convolution or leverage scipy.signal.convolve2d for 2D convolution or numpy.convolve for 1D. For the manual implementation, you'd iterate through the image, and for each pixel, perform element-wise multiplication with the kernel, and then sum the results. The summed result is placed in the corresponding location of the output feature map. The critical part is properly handling the boundaries of the image to prevent out-of-bounds access. A simple implementation may involve padding the image.

For example, using scipy.signal.convolve2d: from scipy import signal; result = signal.convolve2d(image, kernel, mode='valid'). The mode argument controls how boundaries are handled; 'valid' means no padding and the output will be smaller than the input, 'same' means the output will be of the same size as the input (typically with padding), and 'full' means the output will be the full discrete convolution.
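The manual approach described above could look roughly like this. It is a naive NumPy-only sketch, not an optimized implementation: convolve2d_valid is a hypothetical helper name, and it handles only the 'valid' case (no padding). The kernel is flipped first because a true convolution, unlike cross-correlation, reverses the kernel.

```python
import numpy as np

def convolve2d_valid(image, kernel):
    """Naive 2D convolution in 'valid' mode (hypothetical helper)."""
    kernel = np.flipud(np.fliplr(kernel))  # flip for true convolution
    kh, kw = kernel.shape
    out_h = image.shape[0] - kh + 1
    out_w = image.shape[1] - kw + 1
    out = np.zeros((out_h, out_w))
    for i in range(out_h):
        for j in range(out_w):
            # Element-wise multiply the current patch with the kernel, then sum
            out[i, j] = np.sum(image[i:i + kh, j:j + kw] * kernel)
    return out

image = np.arange(16, dtype=float).reshape(4, 4)   # made-up 4x4 "image"
kernel = np.array([[1.0, 0.0],
                   [0.0, -1.0]])

result = convolve2d_valid(image, kernel)
print(result.shape)  # (3, 3): smaller than the input, as expected for 'valid'
print(result)
```

The explicit loops keep the logic readable but are slow for large images, which is why library routines are preferred in practice.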

4. How can you sort a NumPy array by multiple columns?

You can sort a NumPy array by multiple columns using np.lexsort. np.lexsort performs an indirect sort using a sequence of keys. The last key in the sequence is the primary sorting key, the second to last is the secondary sorting key, and so on.

For example, to sort array a by column 'b' primarily and then by column 'a' secondarily, you would use: np.lexsort((a['a'], a['b'])). This returns the indices that would sort the array in the specified order. You can then use these indices to reorder the original array: sorted_array = a[np.lexsort((a['a'], a['b']))]
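Here is a small runnable sketch of that call. The structured array and its field values are invented for illustration.

```python
import numpy as np

# Hypothetical structured array with integer fields 'a' and 'b'
a = np.array([(3, 1), (1, 2), (2, 1), (1, 1)],
             dtype=[('a', 'i4'), ('b', 'i4')])

# Last key is the primary key: sort by 'b', break ties by 'a'
order = np.lexsort((a['a'], a['b']))
sorted_array = a[order]

print(order)          # [3 2 0 1]
print(sorted_array)
```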

5. How do you handle memory efficiently when working with extremely large NumPy arrays that don't fit in RAM?

When working with NumPy arrays that exceed available RAM, memory efficiency is crucial. Memory mapping with numpy.memmap allows accessing portions of the array from disk as if they were in memory, avoiding loading the entire dataset at once. This is useful when you need to read, write, or perform calculations on large datasets without fully loading them into RAM. Dask arrays are another option: they allow out-of-core computation by dividing the array into smaller chunks and processing them in parallel or sequentially, providing a high-level interface for distributed computing with NumPy-like syntax. Finally, using smaller data types such as np.float16 or np.int8 when the full precision or range of np.float64 or np.int64 is not needed can drastically reduce memory usage.

6. Explain how to create a structured array in NumPy, and provide a use case.

A structured array in NumPy allows you to store data of different data types in a single array. You define the structure by specifying the name and data type for each field. You can create a structured array by providing a list of tuples, where each tuple defines a field's name and data type (e.g., [('name', 'U20'), ('age', 'i4'), ('weight', 'f4')]). This defines fields 'name' (Unicode string of length 20), 'age' (32-bit integer), and 'weight' (32-bit float).

Use case: Consider storing employee data where you need to track employee name (string), employee ID (integer), and salary (float) together. A structured array allows you to efficiently store and access all this information for each employee in a single array. For example:

import numpy as np

dtype = [('name', 'U20'), ('id', 'i4'), ('salary', 'f4')]
employees = np.array([('Alice', 123, 60000.0), ('Bob', 456, 75000.0)], dtype=dtype)
print(employees['name'])

7. How can you use NumPy to perform a Fast Fourier Transform (FFT) on a signal?

You can use NumPy's fft module to perform a Fast Fourier Transform (FFT) on a signal. Specifically, the numpy.fft.fft() function computes the one-dimensional discrete Fourier Transform.

Here's a basic example:

import numpy as np

# Example signal (e.g., a sine wave)
sample_rate = 1000  # Samples per second
time = np.arange(0, 1, 1/sample_rate) # 1 second
signal = np.sin(2 * np.pi * 5 * time)  # 5 Hz sine wave

# Perform FFT
fft_result = np.fft.fft(signal)

# Get the frequencies corresponding to the FFT result
fft_freq = np.fft.fftfreq(signal.size, d=1/sample_rate)

# fft_result contains the complex-valued FFT coefficients
# fft_freq contains the corresponding frequencies

fft_result will contain the complex Fourier coefficients, and fft_freq holds the corresponding frequencies. You'll typically take the absolute value of fft_result to get the magnitude spectrum (amplitudes) for further analysis.

8. Explain the difference between `np.vectorize` and NumPy's broadcasting capabilities. When would you use one over the other?

np.vectorize and NumPy broadcasting both let you apply operations across arrays, but they work in fundamentally different ways. Broadcasting automatically handles element-wise operations on arrays with different shapes, as long as their dimensions are compatible: NumPy virtually expands the smaller array to match the shape of the larger one and runs the operation in compiled code. np.vectorize, on the other hand, wraps an ordinary Python function so that it can be called on arrays (applying broadcasting rules to its inputs), but it does not create a true compiled ufunc. As the NumPy documentation notes, it is provided mainly for convenience and is essentially a for loop over the elements, which can be slow.

When choosing between them, favor broadcasting whenever possible. Broadcasting is highly optimized and leverages NumPy's compiled code for performance. Use np.vectorize only when you need to apply a complex Python function that cannot be expressed using NumPy's built-in operations and broadcasting, and when performance is not critical. For example, if you have a Python function with if/else logic that depends on element values, np.vectorize might be useful. But be aware that it is generally much slower than broadcasting.
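To illustrate the trade-off, the sketch below applies a made-up branching function with np.vectorize and then expresses the same logic with np.where, which stays in compiled code. The function name double_if_positive is hypothetical.

```python
import numpy as np

def double_if_positive(x):
    # Plain Python scalar function with if/else logic
    return x * 2 if x > 0 else 0

arr = np.array([-2, -1, 0, 1, 2])

# np.vectorize: convenient, but loops over elements in Python
result_vectorize = np.vectorize(double_if_positive)(arr)

# Equivalent using built-in vectorized operations
result_where = np.where(arr > 0, arr * 2, 0)

print(result_vectorize)  # [0 0 0 2 4]
print(result_where)      # [0 0 0 2 4]
```

Simple branching like this can often be rewritten with np.where, so np.vectorize is best kept for functions that cannot easily be expressed that way.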

9. How do you calculate the eigenvalues and eigenvectors of a matrix using NumPy?

To calculate eigenvalues and eigenvectors of a matrix using NumPy, you can use the numpy.linalg.eig() function. This function takes a square matrix as input and returns a tuple containing an array of eigenvalues and a matrix whose columns are the corresponding eigenvectors (column i belongs to eigenvalue i).

Here's how you can do it:

import numpy as np

matrix = np.array([[4, 2],
                   [2, 1]])

eigenvalues, eigenvectors = np.linalg.eig(matrix)

print("Eigenvalues:", eigenvalues)
print("Eigenvectors:", eigenvectors)

10. Describe how to save and load NumPy arrays to and from disk efficiently.

NumPy provides efficient methods for saving and loading arrays to and from disk. The most common methods are numpy.save() and numpy.load() for saving to and loading from .npy files, respectively. These are suitable for single arrays. For multiple arrays, numpy.savez() saves multiple arrays into a single .npz archive, and numpy.load() can then be used to retrieve them.

For example:

Saving a single array:

import numpy as np
arr = np.arange(10)
np.save('my_array.npy', arr)

Loading a single array:
```
loaded_arr = np.load('my_array.npy')
```

Saving multiple arrays:

arr1 = np.arange(5)
arr2 = np.arange(5, 10)
np.savez('my_arrays.npz', array1=arr1, array2=arr2)

Loading multiple arrays:

loaded_data = np.load('my_arrays.npz')
array1 = loaded_data['array1']
array2 = loaded_data['array2']

11. How can you use NumPy to simulate a random walk?

NumPy can simulate a random walk efficiently using its array operations and random number generation capabilities. A random walk is a sequence of steps where the direction and magnitude of each step are random. Here's a basic approach:

First, generate random step directions (e.g., -1 or 1) using numpy.random.choice. Then, calculate the cumulative sum of these steps using numpy.cumsum. This cumulative sum represents the position at each point in the random walk. Here's a simple example:

import numpy as np

# Number of steps
n_steps = 1000

# Random steps (-1 or 1)
steps = np.random.choice([-1, 1], size=n_steps)

# Cumulative sum to get the walk
walk = np.cumsum(steps)

# walk is your random walk

12. Explain how to use NumPy to solve a system of linear equations.

NumPy's linalg.solve() function provides a straightforward way to solve systems of linear equations. Given a system represented as Ax = b, where A is the coefficient matrix, x is the vector of unknowns, and b is the vector of constants, you can find x using linalg.solve(A, b). For example:

import numpy as np

A = np.array([[2, 1], [1, 3]])
b = np.array([5, 8])

x = np.linalg.solve(A, b)
print(x) # Output: [1.4 2.2]

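In this example, A and b are defined as NumPy arrays, and linalg.solve(A, b) returns the solution x to the system of equations. Internally the function uses a LAPACK routine based on LU factorization rather than explicitly computing the inverse of A, which is both faster and more numerically stable than np.linalg.inv(A) @ b.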

13. How can you create a custom NumPy `dtype`?

You can create custom NumPy dtype using numpy.dtype. There are primarily two approaches:

Using a list of tuples: Each tuple defines a field name, its data type, and optionally its shape. For example:

import numpy as np

# np.str_ replaces np.unicode_, which was removed in NumPy 2.0
dt = np.dtype([('name', np.str_, 16), ('grades', np.float64, (2,))])
x = np.array([('Alice', (80.0, 85.5)), ('Bob', (90.0, 92.0))], dtype=dt)
print(x[0]['name'])
# Alice
print(x[1]['grades'])
# [90. 92.]

Using a dictionary: This approach specifies the fields, their data types, and offsets in bytes. While more complex, it offers more control over memory layout. You can define the names and formats for each field. Also, you can define offsets and itemsize if necessary. This approach is less commonly used for simple custom dtypes.
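A minimal sketch of the dictionary form follows. The field names, offsets, and itemsize are arbitrary values chosen to show explicit control over memory layout.

```python
import numpy as np

# Hypothetical layout: a 4-byte int at offset 0, 4 bytes of padding,
# then an 8-byte float at offset 8, for a 16-byte record in total.
dt = np.dtype({
    'names': ['id', 'score'],
    'formats': ['i4', 'f8'],
    'offsets': [0, 8],
    'itemsize': 16,
})

x = np.zeros(2, dtype=dt)
x['id'] = [1, 2]
x['score'] = [9.5, 7.25]

print(dt.itemsize)   # 16
print(x['score'])    # [9.5  7.25]
```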

14. Describe how you would implement a simple k-means clustering algorithm using NumPy.

K-means clustering with NumPy involves initializing k centroids randomly from the data points. Then, iteratively assign each data point to the nearest centroid based on Euclidean distance (using NumPy's linalg.norm). After assignment, recalculate the centroids by taking the mean of all data points assigned to each cluster. Repeat the assignment and centroid update steps until the centroids no longer change significantly or a maximum number of iterations is reached.

Here's a basic implementation outline:

Initialization: Randomly select k data points as initial centroids.
Assignment: For each data point:
- Calculate the Euclidean distance to each centroid.
- Assign the data point to the cluster with the nearest centroid.
Update: For each cluster:
- Calculate the new centroid as the mean of all data points in the cluster.
Iteration: Repeat steps 2 and 3 until convergence (centroids stop changing significantly) or a maximum number of iterations is reached.

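import numpy as np

def kmeans(data, k, max_iters=100):
    # Pick k distinct data points as the initial centroids
    centroids = data[np.random.choice(len(data), k, replace=False)]
    for _ in range(max_iters):
        # Distance from every point to every centroid: shape (n_points, k)
        distances = np.linalg.norm(data[:, np.newaxis] - centroids, axis=2)
        labels = np.argmin(distances, axis=1)
        # Recompute each centroid; keep the old one if its cluster is empty
        new_centroids = np.array([
            data[labels == i].mean(axis=0) if np.any(labels == i) else centroids[i]
            for i in range(k)
        ])
        if np.allclose(centroids, new_centroids):
            break
        centroids = new_centroids
    return centroids, labels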

15. How can you use NumPy to perform image manipulation tasks, such as resizing or color channel extraction?

NumPy is fundamental for image manipulation because images can be represented as multi-dimensional arrays. Resizing can be achieved through libraries like SciPy or OpenCV, which operate on NumPy arrays. For instance, OpenCV's cv2.resize() takes a NumPy array (image) as input and resizes it. Color channel extraction is straightforward with NumPy array indexing. If an image is represented as a NumPy array img with shape (height, width, channels) in RGB order, you can extract the red channel using red_channel = img[:, :, 0], the green channel with green_channel = img[:, :, 1], and the blue channel with blue_channel = img[:, :, 2]. Note that OpenCV's cv2.imread() returns channels in BGR order, so the red channel is at index 2 in that case. These extracted channels are also NumPy arrays, allowing for further processing.

For example, the following code shows the process:

import cv2
import numpy as np

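# Load an image (cv2.imread returns None if the file cannot be read)
img = cv2.imread('image.jpg')

# Resize the image; the target size here is an arbitrary example
width, height = 320, 240
resized_img = cv2.resize(img, (width, height))

# Extract the red channel
red_channel = img[:, :, 2]  # OpenCV stores colors as BGR, so red is index 2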

16. Explain how to parallelize NumPy operations using libraries like `multiprocessing`.

NumPy operations can be parallelized using multiprocessing to leverage multiple CPU cores. The basic idea is to split the large NumPy array into smaller chunks, process each chunk in parallel using separate processes, and then combine the results. The multiprocessing.Pool class is commonly used for this purpose. For example, to apply a function func to a NumPy array data in parallel: first, define a helper function that operates on a portion of data, then use Pool.map to apply this function to chunks of the data.

Here's a concise example:

import numpy as np
import multiprocessing

def process_chunk(chunk):
    return np.square(chunk) # Example operation: square each element

if __name__ == '__main__':
    data = np.arange(100)
    num_processes = multiprocessing.cpu_count()
    # array_split keeps every element even when len(data) is not divisible by num_processes
    chunks = np.array_split(data, num_processes)

    with multiprocessing.Pool(processes=num_processes) as pool:
        results = pool.map(process_chunk, chunks)

    squared_data = np.concatenate(results)
    print(squared_data)

17. How would you detect and replace NaN values in a NumPy array?

To detect NaN values in a NumPy array, you can use the np.isnan() function. This function returns a boolean array of the same shape as the input array, where True indicates the presence of a NaN value and False otherwise.

To replace NaN values, you can use np.nan_to_num() or boolean indexing with the mask returned by np.isnan(). By default, np.nan_to_num() replaces NaN with zero and positive or negative infinity with very large finite numbers of the matching sign. Boolean indexing is more explicit and allows for greater control. For instance, to replace all NaN values with 0, you would use: arr[np.isnan(arr)] = 0.
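The sketch below shows detection plus two replacement strategies on a made-up array. Filling with the mean of the non-NaN values is just one possible choice, used here for illustration.

```python
import numpy as np

arr = np.array([1.0, np.nan, 3.0, np.nan, 5.0])

mask = np.isnan(arr)                       # [False  True False  True False]
zero_filled = np.nan_to_num(arr, nan=0.0)  # returns a new array by default

# Boolean indexing: fill NaNs with the mean of the valid values (3.0 here)
mean_filled = arr.copy()
mean_filled[mask] = np.nanmean(arr)

print(mask)
print(zero_filled)   # [1. 0. 3. 0. 5.]
print(mean_filled)   # [1. 3. 3. 3. 5.]
```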

18. Describe a scenario where using NumPy's `memmap` would be beneficial.

NumPy's memmap is beneficial when working with large datasets that exceed available RAM. Instead of loading the entire dataset into memory, memmap creates a memory-like interface to a portion of a file on disk. This allows you to access and manipulate sections of the data as if it were in memory, without actually loading it all at once.

For example, imagine processing a massive image dataset stored as a binary file. Using memmap, you can open the image file and treat it as a NumPy array. You can then perform operations like:

Reading specific sections of the image.
Applying filters or transformations to portions of the image.
Performing calculations on the pixel data.

Because only the required portions of the file are loaded into memory on demand, memmap enables you to work with datasets much larger than your RAM can hold. This approach also avoids the overhead of repeatedly loading and saving the entire dataset to disk.
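A minimal sketch of this workflow is shown below. The file name, dtype, and shape are assumptions made for the example, and the array is kept small so the snippet runs quickly. A real use case would involve a file far larger than RAM.

```python
import os
import tempfile
import numpy as np

# Hypothetical data file in a temporary directory
path = os.path.join(tempfile.mkdtemp(), 'big_array.dat')

# Create a disk-backed array and write to part of it
mm = np.memmap(path, dtype='float32', mode='w+', shape=(1000, 1000))
mm[0, :] = 1.0
mm.flush()   # make sure changes reach the file
del mm

# Reopen read-only; only the pages that are touched get loaded
ro = np.memmap(path, dtype='float32', mode='r', shape=(1000, 1000))
row_sum = float(ro[0].sum())
print(row_sum)  # 1000.0
```

Because a raw memmap file stores no metadata, the dtype and shape must be supplied again when reopening it.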

19. Explain how to calculate the inverse of a matrix using NumPy, including checking if the matrix is invertible.

To calculate the inverse of a matrix using NumPy, you can use the numpy.linalg.inv() function. First, create a NumPy array representing your matrix. Then, pass this array to np.linalg.inv(). However, a matrix is invertible only if its determinant is non-zero. NumPy doesn't explicitly check for invertibility before computing the inverse; it will raise a LinAlgError if the matrix is singular (non-invertible). Therefore, a good practice is to wrap the np.linalg.inv() call in a try...except block to handle potential LinAlgError exceptions. Because of floating-point rounding, a nearly singular matrix may not raise an error yet still produce an unreliable inverse, so checking np.linalg.cond() or np.linalg.matrix_rank() beforehand is a useful extra safeguard. Here is an example:

import numpy as np

matrix = np.array([[1, 2], [3, 4]])

try:
    inverse_matrix = np.linalg.inv(matrix)
    print("Inverse:\n", inverse_matrix)
except np.linalg.LinAlgError:
    print("Matrix is not invertible.")
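For an explicit invertibility check, one option is to look at the condition number before inverting. The sketch below is only an illustration: safe_inverse is a hypothetical helper, and the max_cond threshold of 1e12 is an arbitrary assumption that should be tuned to the problem.

```python
import numpy as np

def safe_inverse(matrix, max_cond=1e12):
    """Return the inverse, or None if the matrix looks singular (hypothetical helper)."""
    if matrix.ndim != 2 or matrix.shape[0] != matrix.shape[1]:
        raise ValueError("matrix must be square")
    # A very large (or infinite) condition number signals a (near-)singular matrix
    if np.linalg.cond(matrix) > max_cond:
        return None
    return np.linalg.inv(matrix)

good = np.array([[1.0, 2.0], [3.0, 4.0]])
singular = np.array([[1.0, 2.0], [2.0, 4.0]])  # second row is 2x the first

inv_good = safe_inverse(good)
inv_singular = safe_inverse(singular)

print(inv_good)       # [[-2.   1. ], [ 1.5 -0.5]]
print(inv_singular)   # None
```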

20. How can you use NumPy to generate different types of random numbers (e.g., normal, uniform, exponential)?

NumPy's random module provides functions for generating various types of random numbers. For normal (Gaussian) distribution, use numpy.random.normal(loc=0.0, scale=1.0, size=None), where loc is the mean, scale is the standard deviation, and size is the shape of the output array. For uniform distribution between a low and high value, use numpy.random.uniform(low=0.0, high=1.0, size=None). To generate random numbers from an exponential distribution, use numpy.random.exponential(scale=1.0, size=None), where scale is the inverse of the rate parameter (lambda).

For example:

numpy.random.normal(0, 1, (2, 3)) generates a 2x3 array of normally distributed random numbers with mean 0 and standard deviation 1.
numpy.random.uniform(1, 5, 10) generates 10 uniformly distributed random numbers between 1 and 5.
numpy.random.exponential(2, 5) generates 5 exponentially distributed random numbers with a scale of 2.
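The same distributions are also available through the newer Generator interface (np.random.default_rng, available since NumPy 1.17), which the NumPy documentation recommends for new code. The sketch below swaps in that interface. The seed value is arbitrary and only makes the run reproducible.

```python
import numpy as np

rng = np.random.default_rng(seed=42)  # arbitrary seed for reproducibility

normal_samples = rng.normal(loc=0.0, scale=1.0, size=(2, 3))
uniform_samples = rng.uniform(low=1.0, high=5.0, size=10)
exponential_samples = rng.exponential(scale=2.0, size=5)

print(normal_samples.shape)        # (2, 3)
print(uniform_samples.min() >= 1)  # True
print(exponential_samples)
```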

21. Describe how you would perform element-wise string operations on a NumPy array of strings.

NumPy provides vectorized string operations via the np.char module, allowing efficient element-wise string manipulation on NumPy arrays. To perform element-wise string operations, access the desired function within np.char and pass the NumPy array of strings as input. For instance, to convert all strings to uppercase, you would use np.char.upper(my_array).

Common string operations include:

np.char.lower(arr): Converts to lowercase.
np.char.upper(arr): Converts to uppercase.
np.char.strip(arr): Removes leading/trailing whitespace.
np.char.add(arr1, arr2): Concatenates strings element-wise.
np.char.replace(arr, old, new): Replaces occurrences of a substring.
np.char.find(arr, sub): Finds the lowest index of a substring.

These functions operate element-wise, returning a new NumPy array with the modified strings.
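A few of these functions in action, using made-up strings:

```python
import numpy as np

names = np.array(['  alice ', 'Bob', ' carol'])
suffixes = np.array(['_1', '_2', '_3'])

stripped = np.char.strip(names)             # ['alice' 'Bob' 'carol']
upper = np.char.upper(stripped)             # ['ALICE' 'BOB' 'CAROL']
tagged = np.char.add(stripped, suffixes)    # ['alice_1' 'Bob_2' 'carol_3']
positions = np.char.find(stripped, 'o')     # [-1  1  3]; -1 means not found

print(upper)
print(tagged)
print(positions)
```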

22. How do you handle dealing with sparse matrices using NumPy (or related libraries)? Provide some possible advantages.

Sparse matrices can be handled efficiently in NumPy (and SciPy) using the scipy.sparse module. This module provides several sparse matrix formats like CSR (Compressed Sparse Row), CSC (Compressed Sparse Column), COO (Coordinate list), and LIL (List of Lists), each optimized for different operations and sparsity patterns. To create a sparse matrix, you can convert a dense NumPy array or directly initialize a sparse matrix with its non-zero elements.

Some advantages of using sparse matrices:

Memory Efficiency: Significant reduction in memory usage when dealing with matrices containing a large number of zero elements.
Computational Efficiency: Optimized algorithms for sparse matrix operations (e.g., multiplication, linear solvers) can lead to faster computations compared to dense matrix operations, as they avoid unnecessary operations on zero elements.
Suitable for Large Datasets: Enables handling and processing of very large matrices that would be impossible to store in dense format.

Example:

from scipy.sparse import csr_matrix
import numpy as np

data = np.array([1, 2, 3, 4, 5, 6])
row_ind = np.array([0, 0, 1, 2, 2, 2])
col_ind = np.array([0, 2, 2, 0, 1, 2])

sparse_matrix = csr_matrix((data, (row_ind, col_ind)), shape=(3, 3))

print(sparse_matrix)

NumPy interview questions for experienced

1. How would you optimize NumPy code for both memory usage and computational speed, especially when dealing with large datasets?

To optimize NumPy code for large datasets, focus on minimizing memory footprint and maximizing computational efficiency. Memory optimization can be achieved by using appropriate data types (e.g., int8 instead of int64 if the range allows) and by leveraging NumPy's array views to avoid unnecessary copies when slicing. You can also use del to drop references to arrays that are no longer needed so their memory can be reclaimed.

For speed optimization, vectorize operations to avoid explicit loops; NumPy's universal functions (ufuncs) are highly optimized. Use in-place operations (e.g., a += b instead of a = a + b) to modify arrays directly when possible. Choose efficient algorithms and leverage libraries like Numba or Cython to further accelerate critical sections of code. Consider using sparse matrices if the data is sparse. Parallelize computations where feasible using libraries like joblib or dask to utilize multi-core processors or distributed computing environments.
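Some of these ideas can be sketched in a few lines. The array sizes and values below are arbitrary and chosen only to make the effects visible.

```python
import numpy as np

# Smaller dtypes: float32 uses half the memory of float64
big = np.zeros(1_000_000, dtype=np.float64)
small = big.astype(np.float32)
print(big.nbytes, small.nbytes)   # 8000000 4000000

# Basic slicing returns a view, not a copy
view = big[::2]
print(np.shares_memory(view, big))  # True

# In-place operations avoid allocating temporary arrays
a = np.ones(5)
b = np.full(5, 2.0)
a += b                     # a is now all 3.0
np.multiply(a, b, out=a)   # a is now all 6.0, written into the same buffer
print(a)
```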

2. Can you describe advanced techniques for effectively handling and processing datasets with missing values (NaNs) in NumPy?

Beyond basic replacement with mean/median/mode, advanced NaN handling in NumPy involves considering the nature of missingness. For Missing Completely At Random (MCAR), simple imputation might suffice. However, for Missing At Random (MAR) or Missing Not At Random (MNAR), more sophisticated techniques are needed.

Techniques include:

k-NN imputation: Imputing missing values based on the average of 'k' nearest neighbors.
Multiple Imputation: Creating multiple plausible datasets by imputing different values for the missing entries and then combining the results.
Model-based imputation: Using regression models (linear regression, decision trees, etc.) to predict the missing values based on other features.
Using np.nan_to_num() with parameters: By default it replaces NaN with 0.0 and positive or negative infinity with the largest or smallest finite value of the dtype. The nan, posinf, and neginf parameters let you choose the replacement values yourself.
Masked arrays: Using np.ma.masked_array() to create arrays where missing values are masked out, allowing calculations to be performed without considering the NaNs.
Advanced Imputation Libraries: Utilize libraries like sklearn.impute (SimpleImputer, IterativeImputer) or fancyimpute (various matrix completion algorithms).
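The NumPy-only options above can be sketched as follows, assuming a small hypothetical matrix with two missing entries. Column-mean imputation is used here purely for illustration; it is only appropriate when the values are plausibly missing at random.

```python
import numpy as np

# Hypothetical measurements with two missing entries.
data = np.array([[1.0, np.nan, 3.0],
                 [4.0, 5.0, np.nan],
                 [7.0, 8.0, 9.0]])

# NaN-aware reductions skip missing values instead of propagating them.
col_means = np.nanmean(data, axis=0)

# Simple column-mean imputation: fill each NaN with the mean of its column.
filled = np.where(np.isnan(data), col_means, data)

# Masked arrays hide invalid entries so ordinary reductions ignore them.
masked = np.ma.masked_invalid(data)
print(col_means)
print(masked.mean(axis=0))
```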

3. Explain the concept of 'broadcasting' in NumPy and provide an example where it significantly simplifies array operations.

Broadcasting is the set of rules NumPy uses to perform element-wise operations on arrays of different but compatible shapes. Shapes are compared from the trailing dimension backwards, and two dimensions are compatible when they are equal or one of them is 1. NumPy then virtually stretches the size-1 dimensions to match the larger array, without creating copies of the data.

For example, consider adding a scalar to a matrix. Without broadcasting, you'd have to manually create a matrix with the same dimensions as the original matrix filled with the scalar value. With broadcasting, you can directly add the scalar to the matrix:

import numpy as np

a = np.array([[1, 2, 3], [4, 5, 6]])
b = 2

c = a + b  # Broadcasting `b` to match the shape of `a`
print(c)

This significantly simplifies the code and improves performance by avoiding unnecessary memory allocation and copy operations.
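A slightly larger sketch, assuming a hypothetical data matrix with samples in rows and features in columns, shows broadcasting between a matrix and a vector in both directions:

```python
import numpy as np

# Hypothetical data matrix: 4 samples (rows) by 3 features (columns).
X = np.array([[1.0, 10.0, 100.0],
              [2.0, 20.0, 200.0],
              [3.0, 30.0, 300.0],
              [4.0, 40.0, 400.0]])

col_means = X.mean(axis=0)               # shape (3,)
row_sums = X.sum(axis=1, keepdims=True)  # shape (4, 1)

centered = X - col_means     # (4, 3) - (3,)   -> (4, 3)
proportions = X / row_sums   # (4, 3) / (4, 1) -> (4, 3)
```

Using keepdims=True keeps the reduced axis as size 1, which is what lets the per-row sums line up against the rows rather than the columns.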

4. How does NumPy's memory model contribute to its performance, and what are some strategies for minimizing memory copies?

NumPy's memory model significantly boosts performance because it stores arrays in contiguous blocks of memory. This allows for efficient vectorized operations through SIMD (Single Instruction, Multiple Data) processing and reduces the overhead of memory access. Furthermore, NumPy arrays are homogeneous (all elements have the same data type), which enables optimized memory layout and calculations.

To minimize memory copies and improve performance, consider these strategies:

In-place operations: Use operators like +=, -=, *=, /= instead of creating new arrays.
Views instead of copies: Basic slicing (e.g., a[2:6]) returns a view that shares memory with the original, whereas integer-array or boolean (fancy) indexing returns a copy.
np.copyto(): When a copy is unavoidable, use np.copyto() to copy data into an existing array instead of creating a new one.
Avoid unnecessary type conversions: Ensure that data types are consistent to prevent implicit copies during operations.
Use np.asarray(): Convert lists to NumPy arrays only when necessary; if the input is already an ndarray, np.asarray() will return it without copying.
Beware of broadcasting: The inputs are not copied, but the result is allocated at the full broadcast shape, which can be large; reshaping the computation or passing a preallocated output array can help.
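Several of these strategies can be checked directly with np.shares_memory. The following is a small sketch with illustrative variable names:

```python
import numpy as np

a = np.arange(10, dtype=np.float64)

view = a[2:6]          # basic slicing: a view onto the same buffer
fancy = a[[2, 3, 4]]   # integer-array (fancy) indexing: a new copy

print(np.shares_memory(a, view))   # True
print(np.shares_memory(a, fancy))  # False

# The `out` argument writes the result into an existing array,
# avoiding the temporary that `a * 2` would allocate.
result = np.empty_like(a)
np.multiply(a, 2, out=result)

# np.asarray returns the same object when the input is already an ndarray.
print(np.asarray(a) is a)  # True
```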

5. Describe how you can leverage NumPy's universal functions (ufuncs) for custom operations and performance gains.

NumPy's universal functions (ufuncs) apply element-wise operations to arrays and are much faster than equivalent Python loops. A custom Python function can be wrapped with numpy.frompyfunc or numpy.vectorize so that it accepts arrays and broadcasts like a ufunc. These wrappers are a convenience: they still call the Python function once per element, so real speedups come from composing built-in ufuncs or compiling the function with a tool such as Numba.

For performance gains, ufuncs leverage optimized C implementations and SIMD (Single Instruction, Multiple Data) instructions. They also handle broadcasting automatically, simplifying operations between arrays of different shapes. Instead of iterating in Python, ufuncs perform computations in compiled code, resulting in faster execution times, especially for large arrays.
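As a rough illustration, built-in ufuncs also expose methods such as reduce, accumulate, and outer, while np.frompyfunc wraps a scalar Python function. The clipped_diff function below is a hypothetical custom operation, not part of NumPy:

```python
import numpy as np

x = np.arange(1, 6)  # [1 2 3 4 5]

# Built-in ufuncs expose methods such as reduce, accumulate and outer.
total = np.add.reduce(x)               # same as x.sum()
running = np.multiply.accumulate(x)    # running product
table = np.multiply.outer(x, x)        # 5x5 multiplication table

# np.frompyfunc wraps a Python function as a ufunc-like object.
# It broadcasts like a ufunc but still calls Python once per element
# and returns an object-dtype array.
def clipped_diff(a, b):
    # Hypothetical custom operation: difference floored at zero.
    return max(a - b, 0)

clipped = np.frompyfunc(clipped_diff, 2, 1)
out = clipped(x, 3).astype(np.int64)
print(total, running, out)
```

For this particular operation, np.maximum(x - 3, 0) gives the same values using built-in ufuncs only, which is the faster choice.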

6. Explain the trade-offs between using NumPy arrays and other data structures (e.g., lists, dictionaries) for numerical computations.

NumPy arrays excel in numerical computations due to their homogenous data type and contiguous memory allocation, leading to efficient storage and vectorized operations. This results in significantly faster execution compared to Python lists, which store pointers to objects scattered in memory and lack built-in support for element-wise operations. While lists are more flexible for mixed data types and dynamic resizing, they incur performance overhead for numerical tasks. Similarly, dictionaries, optimized for key-value lookups, are not designed for numerical computations and offer no performance advantages in this domain.

Other data structures like Pandas DataFrames or SciPy sparse matrices build upon NumPy arrays to offer specialized functionality. DataFrames provide labeled axes and data alignment, making them suitable for data analysis, but they introduce additional overhead compared to raw NumPy arrays. Sparse matrices are optimized for storing and operating on large matrices with many zero elements, offering memory efficiency over dense NumPy arrays when dealing with such data.

7. How do you handle scenarios where NumPy's default data types are insufficient for the precision required in a computation?

When NumPy's default data types (like int32 or float64) don't offer enough precision, I first check whether a wider NumPy type is available. For integers, I'd use int64 where the default is narrower, or, if that's still not enough, Python's built-in arbitrary-precision integers (for example in an object-dtype array), though this means losing much of NumPy's performance benefit. For floating-point numbers, I would consider np.longdouble (exposed as float128 on some platforms), keeping in mind that its actual precision depends on the hardware and compiler and is often 80-bit extended precision rather than true quadruple precision. If that is not enough, libraries like decimal provide arbitrary-precision decimal arithmetic, again with performance trade-offs.

I always consider the memory implications of using higher-precision data types, as they consume more memory. Additionally, I would verify that all NumPy functions and operations I'm using support the chosen data type correctly to avoid unexpected behavior or errors. If performance is critical and extremely high precision is needed, I would research specialized libraries designed for high-performance, high-precision numerical computations, possibly outside of the NumPy ecosystem, or consider using a combination of techniques to optimize the critical parts of the calculation.
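A short sketch of how these options can be inspected, using only NumPy and the standard library. The longdouble result is platform dependent, so no value is shown for it:

```python
import math
import numpy as np

# Object dtype holds Python ints, which have arbitrary precision
# at the cost of speed; int64 could not represent this sum.
big = np.array([2**62, 2**62], dtype=object)
print(big.sum())  # 9223372036854775808, i.e. exactly 2**63

# np.finfo reports the precision of each floating-point type on this platform.
print(np.finfo(np.float32).eps, np.finfo(np.float64).eps)
print(np.finfo(np.longdouble).eps)  # platform dependent

# math.fsum returns a correctly rounded sum of float64 values.
values = np.full(10, 0.1)
print(math.fsum(values) == 1.0)  # True
```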

8. Describe the process of integrating NumPy with other scientific computing libraries (e.g., SciPy, scikit-learn) in a complex project.

Integrating NumPy with SciPy and scikit-learn in a complex project typically involves leveraging NumPy arrays as the fundamental data structure. SciPy builds upon NumPy, providing advanced mathematical functions, signal processing, optimization, and more, often accepting NumPy arrays as input and returning NumPy arrays as output. Scikit-learn similarly uses NumPy arrays for machine learning tasks like model training, prediction, and evaluation.

Key aspects include data preprocessing with NumPy (e.g., handling missing values, normalization), utilizing SciPy for statistical analysis or signal processing, and then feeding the processed data (still as NumPy arrays) into scikit-learn models. It is crucial to ensure data type consistency (e.g., using .astype() to convert array dtypes) and to understand the expected input formats of functions from each library. For example:

import numpy as np
from scipy import signal
from sklearn.linear_model import LinearRegression

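data = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
filtered_data = signal.medfilt(data, kernel_size=3)  # SciPy: median filter, returns an ndarray

# Predict each value from the previous one; scikit-learn expects a 2-D feature matrix
X = filtered_data[:-1].reshape(-1, 1)
y = filtered_data[1:]
model = LinearRegression()
model.fit(X, y)  # scikit-learn consumes the NumPy arrays directly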

9. Can you explain how to use NumPy for advanced array manipulation, such as reshaping, transposing, and splitting arrays?

NumPy provides powerful functions for advanced array manipulation. Reshaping changes the dimensions of an array without altering its data, using np.reshape() or array.reshape(). For example, arr.reshape((2, 3)) transforms a 1D array into a 2x3 array. Transposing swaps the rows and columns, commonly done with np.transpose() or array.T. Splitting divides an array into multiple sub-arrays. np.split() splits along a specified axis into equal parts, while np.array_split() allows for splitting into unequal parts.

Here are a few examples:

Reshape:

import numpy as np
arr = np.arange(6)
reshaped_arr = arr.reshape((2, 3))
print(reshaped_arr)

Transpose:

arr = np.array([[1, 2], [3, 4]])
transposed_arr = arr.T
print(transposed_arr)

Split:

arr = np.arange(8)
split_arr = np.split(arr, 2)  # Split into 2 equal parts
print(split_arr)
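For the unequal case mentioned above, a minimal sketch with np.array_split, plus np.split at explicit indices:

```python
import numpy as np

arr = np.arange(7)

# np.split(arr, 3) would raise ValueError because 7 is not divisible by 3.
parts = np.array_split(arr, 3)
print([p.tolist() for p in parts])  # [[0, 1, 2], [3, 4], [5, 6]]

# Passing a list of indices to np.split cuts at those positions instead.
head, middle, tail = np.split(arr, [2, 5])
```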

10. How do you use structured arrays in NumPy to represent complex data structures, and what are their advantages?

Structured arrays in NumPy allow you to represent data with heterogeneous types in a single array. You define a dtype that specifies the name and data type of each field. For example, you could represent a person with fields like 'name' (string), 'age' (integer), and 'height' (float). You create the structured array using numpy.zeros or other array creation functions, passing in the defined dtype.

Advantages include:

Organization: Data is grouped logically by fields.
Efficiency: Data is stored contiguously in memory, potentially improving performance.
Readability: Code becomes more readable and easier to understand as you can access data using meaningful field names instead of just indices. For example, to access the name of the first person, you can use person_array[0]['name'] instead of having names in a separate array and using an index to refer to the first person. It also becomes easy to use NumPy's vectorized operations on specific fields, such as calculating the average age of all people in the array.
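The person example described above might look like this. The field names, string length, and sample values are assumptions made for illustration:

```python
import numpy as np

# Field names and sample values are hypothetical.
person_dtype = np.dtype([("name", "U10"), ("age", "i4"), ("height", "f8")])

people = np.zeros(3, dtype=person_dtype)
people[0] = ("Alice", 30, 1.65)
people[1] = ("Bob", 25, 1.80)
people[2] = ("Carol", 35, 1.72)

print(people[0]["name"])     # Alice
print(people["age"].mean())  # 30.0

# Boolean masks built from one field select whole records.
older_than_28 = people[people["age"] > 28]
print(older_than_28["name"])
```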

11. Discuss methods for efficient data input and output with NumPy, particularly when dealing with very large files.

Efficient data input and output with NumPy, especially for large files, involves leveraging memory mapping and optimized file formats. Memory mapping (using np.memmap) allows you to access portions of a file as if it were an in-memory array without loading the entire file into RAM. This is crucial for files that exceed available memory. For example:

import numpy as np

# Create a memory-mapped array
data = np.memmap('large_data.dat', dtype='float32', mode='w+', shape=(10000, 10000))

# Perform operations on a slice
data[:1000, :1000] = np.random.rand(1000, 1000)

# Flush changes to disk, then drop the reference to release the mapping
data.flush()
del data

For arrays that fit in memory, np.save and np.load read and write NumPy's binary .npy format, which is faster and more compact than text; np.load also accepts mmap_mode to memory-map a saved file. For even larger or more complex datasets, consider HDF5 (Hierarchical Data Format), which is designed for storing and managing large datasets. Libraries like h5py provide Python interfaces for reading and writing HDF5 files, and chunked storage within HDF5 files is important for out-of-core operations. Whatever the format, load data in blocks that fit in memory. For text-based data, np.loadtxt or np.genfromtxt can be used, with explicit data types to minimize memory usage. Smaller data types (e.g., float32 instead of float64) also reduce memory footprint and I/O time.
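A minimal sketch of block-wise reading from a saved .npy file, assuming a temporary directory and a hypothetical file name. The chunk size is arbitrary and would be tuned to the available memory:

```python
import os
import tempfile
import numpy as np

arr = np.arange(1_000_000, dtype=np.float32)

with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "example.npy")  # hypothetical file name
    np.save(path, arr)

    # mmap_mode="r" maps the file read-only instead of reading it all into RAM.
    mapped = np.load(path, mmap_mode="r")

    # Process the data block by block; only the touched pages are read.
    chunk_size = 100_000
    total = 0.0
    for start in range(0, mapped.shape[0], chunk_size):
        total += float(mapped[start:start + chunk_size].sum(dtype=np.float64))

    del mapped  # release the mapping before the directory is removed

print(total)
```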

12. How can you use NumPy to implement custom algorithms or mathematical functions?

NumPy provides a strong foundation for implementing custom algorithms and mathematical functions thanks to its efficient array operations and broadcasting. The fastest implementations express the algorithm directly with array operations and built-in ufuncs. When the logic only exists as a scalar Python function, np.vectorize or np.frompyfunc can apply it element-wise across arrays; these are conveniences that still loop in Python, so they improve readability more than speed. You can also extend NumPy's functionality by subclassing ndarray to implement custom array behavior.

For example:

import numpy as np

def custom_function(x, y):
    return x**2 + y

# Option 1: np.vectorize keeps a numeric output dtype
vectorized_func = np.vectorize(custom_function)

# Option 2: np.frompyfunc(function, number of inputs, number of outputs)
# returns an object-dtype array
frompyfunc_func = np.frompyfunc(custom_function, 2, 1)

a = np.array([1, 2, 3])
b = np.array([4, 5, 6])
print(vectorized_func(a, b))
print(frompyfunc_func(a, b))

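This shows how a standard Python function can be made to operate on NumPy arrays element-wise. For this particular function, writing a**2 + b directly on the arrays gives the same values and runs in compiled code, which is the faster option whenever the logic can be expressed with array operations.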

13. How would you approach debugging performance issues in NumPy code, and what tools would you use?

To debug performance issues in NumPy code, I would start by profiling the code to identify bottlenecks. timeit is useful for simple timing of code snippets. For more detailed profiling, I would use tools like cProfile or %prun within IPython/Jupyter notebooks. These tools help pinpoint the functions consuming the most time.

Once I've identified the slow parts, I would investigate potential causes. Common culprits include using loops instead of vectorized operations, unnecessary memory allocation, and inefficient use of NumPy functions. I would use tools like memory_profiler to track memory usage. I would try to rewrite the code to leverage NumPy's vectorized operations, ensure optimal data types, and minimize memory copies. If possible, I'd use techniques like broadcasting or in-place operations. For very large arrays, tools like Dask may be employed for parallel processing.
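As a rough sketch of the first step, timeit can compare a suspected bottleneck against a vectorized rewrite. Both functions below are hypothetical stand-ins for real code, and the timings will vary by machine:

```python
import timeit
import numpy as np

def loop_sum_of_squares(values):
    # Python-level loop: one interpreter step per element.
    total = 0.0
    for v in values:
        total += v * v
    return total

def vectorized_sum_of_squares(values):
    # Single call into compiled code.
    return float(np.dot(values, values))

data = np.random.default_rng(0).random(100_000)

loop_time = timeit.timeit(lambda: loop_sum_of_squares(data), number=5)
vec_time = timeit.timeit(lambda: vectorized_sum_of_squares(data), number=5)
print(f"loop: {loop_time:.4f} s, vectorized: {vec_time:.4f} s")
```

Confirming that both versions return the same result before comparing timings guards against optimizing the code into a different computation.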

14. Explain how you can extend NumPy with custom C or Fortran code for performance-critical applications.

NumPy can be extended with custom C or Fortran code to improve performance in computationally intensive tasks. This is typically achieved by writing a C/Fortran function that performs the core computation, then creating a wrapper using NumPy's C API or tools like f2py (for Fortran) to make this function callable from Python.

The general steps involve:

1. Writing the C/Fortran code, ensuring it handles NumPy data structures correctly (e.g., using ndarray pointers in C).
2. Creating a wrapper that translates NumPy arrays into the format expected by the C/Fortran function, and back from C/Fortran to NumPy.
3. Compiling the C/Fortran code into a shared library.
4. Using ctypes or a generated wrapper (e.g., by f2py) in Python to load the library and call the custom function.

Here's an example using f2py with Fortran:

! file: add.f90
subroutine add(a,b,out,n)
    integer, intent(in) :: n
    real(8), dimension(n), intent(in) :: a, b
    real(8), dimension(n), intent(out) :: out
    integer :: i
    do i = 1, n
        out(i) = a(i) + b(i)
    end do
end subroutine add

Compile with f2py -c -m add add.f90, then in Python:

import numpy as np
import add

a = np.array([1.0, 2.0, 3.0])
b = np.array([4.0, 5.0, 6.0])
# f2py turns intent(out) arguments into return values and infers n from a
result = add.add(a, b)
print(result)

15. Discuss the implications of using different NumPy array storage orders (e.g., row-major vs. column-major) on performance.

NumPy's array storage order significantly impacts performance due to memory access patterns. Row-major (C-style) means rows are stored contiguously in memory, while column-major (Fortran-style) stores columns contiguously. When accessing elements in the order they are stored, performance is optimized because of better cache utilization (spatial locality). Accessing elements in a non-contiguous manner, such as iterating through columns in a row-major array, leads to cache misses and slower performance.

Consider these points:

Row-major (C-style): Faster for operations that iterate through rows (e.g., calculating the sum of each row). Default in NumPy.
Column-major (Fortran-style): Faster for operations that iterate through columns (e.g., calculating the sum of each column).
Transposing: Transposing an array can change the effective storage order. A transposed C-style array is effectively F-style. Be mindful of this when performing computations after a transpose.
np.ascontiguousarray() and np.asfortranarray(): These functions can be used to ensure an array is stored in the desired order, potentially improving performance if the array was not originally contiguous in that order.
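The layout of an array can be inspected through its flags and strides. A small sketch, assuming float64 elements (8 bytes each):

```python
import numpy as np

c_arr = np.zeros((1000, 1000))     # C order (row-major) by default
f_arr = np.asfortranarray(c_arr)   # same values, column-major layout

print(c_arr.flags["C_CONTIGUOUS"], c_arr.flags["F_CONTIGUOUS"])  # True False
print(f_arr.flags["C_CONTIGUOUS"], f_arr.flags["F_CONTIGUOUS"])  # False True

# Strides show how many bytes separate neighbours along each axis.
print(c_arr.strides)  # (8000, 8)
print(f_arr.strides)  # (8, 8000)

# Transposing returns a view with swapped strides; no data is moved.
t = c_arr.T
print(t.flags["F_CONTIGUOUS"], np.shares_memory(c_arr, t))  # True True
```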

16. Explain how to leverage NumPy's linear algebra capabilities for solving systems of equations and performing matrix decompositions.

NumPy's linalg module provides tools for solving linear equations and performing matrix decompositions. To solve a system of linear equations (Ax = b), you can use numpy.linalg.solve(A, b). This function directly solves for x, given the coefficient matrix A and the constant vector b.

For matrix decompositions, NumPy offers functions like numpy.linalg.eig() for eigenvalue decomposition, numpy.linalg.svd() for Singular Value Decomposition (SVD), and numpy.linalg.qr() for QR decomposition. These functions return the factors of the decomposition (for SVD: U, the singular values as a 1-D array, and Vh, the conjugate transpose of V), which can then be used for applications like dimensionality reduction or solving least squares problems.
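A brief sketch of these calls on a small, made-up 2x2 system:

```python
import numpy as np

# Solve the system 3x + y = 9, x + 2y = 8 (exact solution: x = 2, y = 3).
A = np.array([[3.0, 1.0],
              [1.0, 2.0]])
b = np.array([9.0, 8.0])
x = np.linalg.solve(A, b)
print(x)

# SVD returns U, the singular values as a 1-D array, and V transposed (Vh).
U, s, Vh = np.linalg.svd(A)
reconstructed = U @ np.diag(s) @ Vh

# QR factorization: Q has orthonormal columns, R is upper triangular.
Q, R = np.linalg.qr(A)
```

np.linalg.solve is generally preferable to computing the inverse and multiplying, both for speed and for numerical accuracy.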

17. How do you approach working with sparse matrices in NumPy, and what are the benefits of using them?

NumPy itself has no sparse matrix type, so I use the scipy.sparse library. I first check that the matrix is truly sparse (a large percentage of zero values). If so, I choose an appropriate format from scipy.sparse, such as:

CSR (Compressed Sparse Row): Good for row-wise operations, arithmetic, and matrix-vector products.
CSC (Compressed Sparse Column): Good for column-wise operations.
COO (Coordinate list): Good for constructing sparse matrices.
LIL (List of Lists): Good for incremental construction.

I then use the chosen format to create the sparse matrix, e.g., sparse.csr_matrix(dense_matrix). Benefits include significantly reduced memory usage compared to dense NumPy arrays, as only non-zero elements are stored. This also leads to faster computations for certain operations, especially matrix multiplication and linear algebra, as unnecessary calculations with zero values are avoided.

18. Discuss the differences between NumPy arrays and memory-mapped arrays, and when you might choose one over the other.

Regular NumPy arrays store all their data in RAM, providing fast access and manipulation of data directly in memory. Memory-mapped arrays (np.memmap, itself an ndarray subclass) keep their data in a file on disk and only load portions of it into memory as needed. This allows working with datasets much larger than available RAM.

The choice depends on the dataset size and performance needs. In-memory arrays are preferred for datasets that fit in RAM and require fast computations, such as matrix multiplication. Memory-mapped arrays are suitable for very large datasets that exceed RAM capacity, trading some speed for the ability to process huge amounts of data. Consider memory-mapped arrays when you encounter MemoryError with regular arrays. For example, a large binary file of fixed-size numeric records can be analyzed through a memory-mapped array without loading it entirely into memory.

19. Explain the relationship between NumPy and vectorized operations, and how this impacts code performance.

NumPy leverages vectorized operations to perform calculations on entire arrays at once, instead of iterating through individual elements. This is significantly faster than traditional Python loops because the operations are implemented in optimized C code under the hood, which avoids Python's interpreter overhead for each element. Vectorization allows NumPy to efficiently process large datasets.

The performance impact is substantial. Vectorized operations can lead to orders-of-magnitude speed improvements compared to equivalent Python loops. For example, consider adding two arrays: using array1 + array2 in NumPy (vectorized) is much faster than manually iterating through the arrays and adding elements one by one. This optimization is key to NumPy's efficiency in numerical computations.
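Conditional logic can be vectorized too. The following sketch, with made-up prices and an assumed discount rule, replaces an if/else inside a loop with np.where:

```python
import numpy as np

prices = np.array([12.0, 45.0, 8.0, 99.0, 30.0])  # hypothetical values

# Loop version: one interpreter step per element.
discounted_loop = []
for p in prices:
    discounted_loop.append(p * 0.9 if p > 25 else p)

# Vectorized version: the comparison, selection and multiplication
# each run over the whole array in compiled code.
discounted_vec = np.where(prices > 25, prices * 0.9, prices)
```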

20. How can NumPy be used in the context of image processing?

NumPy is fundamental to image processing in Python because images are essentially multi-dimensional arrays of pixel data. NumPy's ndarray object provides an efficient way to store and manipulate this data. Operations like cropping, resizing, color manipulation, and filtering can be implemented using NumPy's array slicing, broadcasting, and mathematical functions.

For instance, to access the red channel of an RGB image represented as a NumPy array img, you can simply use red_channel = img[:, :, 0]. Image filtering can be implemented using convolution with NumPy's array operations. Libraries like OpenCV and scikit-image build upon NumPy, providing higher-level functions, but NumPy remains the underlying workhorse for efficient array-based image manipulation.
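These operations can be sketched on a small synthetic image. The random pixel values and the brightness offset are arbitrary, and the grayscale weights are the common BT.601 luma coefficients:

```python
import numpy as np

# Synthetic 4x6 RGB image (height, width, channels) with random pixel values.
rng = np.random.default_rng(0)
img = rng.integers(0, 256, size=(4, 6, 3), dtype=np.uint8)

red_channel = img[:, :, 0]   # view of the first channel
cropped = img[1:3, 2:5]      # rows 1 and 2, columns 2 through 4
flipped = img[:, ::-1]       # horizontal mirror

# Grayscale as a weighted sum of the three channels.
weights = np.array([0.299, 0.587, 0.114])
gray = (img @ weights).astype(np.uint8)

# Brighten in a wider dtype, then clip back to the valid 0..255 range.
brighter = np.clip(img.astype(np.int16) + 40, 0, 255).astype(np.uint8)
```

Converting to a wider integer type before adding avoids the silent wrap-around that uint8 arithmetic would otherwise produce.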

NumPy MCQ

Question 1.

Given the NumPy array arr = np.array([[1, 2, 3, 4], [5, 6, 7, 8], [9, 10, 11, 12]]), what will be the output of arr[1:, 2:]?

Options:

[[7 8] [11 12]]

[[6 7 8] [10 11 12]]

[[3 4] [7 8] [11 12]]

[[1 2] [5 6] [9 10]]

Question 2.

Which of the following np.zeros calls creates a 2x2 array filled with zeros whose elements are specifically of type int32?

Options:

np.zeros((2, 2), dtype=np.int32)

np.zeros((2, 2), dtype='float64')

np.array([[0, 0], [0, 0]], dtype='i4')

np.zeros([2, 2])

Question 3.

What is the shape of the resulting NumPy array when you perform a broadcasting operation between an array of shape (5, 1) and an array of shape (1, 5)?

Options:

(5, 1)

(1, 5)

(5, 5)

(1, 1)

Question 4.

Consider a NumPy array arr = np.array([1, 2, 3, 4, 5, 6]). What is the sum of the elements of the array after it has been reshaped to a 2x3 matrix and then multiplied by 2?

Options:

21

42

12

14

Question 5.

Given a NumPy array arr = np.array([1, 2, 3, 4, 5, 6, 7, 8, 9, 10]), which of the following options correctly filters the array to select elements that are greater than 3 AND less than 8?

Options:

arr[(arr > 3) & (arr < 8)]

arr[(arr > 3) || (arr < 8)]

arr[(arr > 3) and (arr < 8)]

arr[arr > 3, arr < 8]

Question 6.

What is the shape of the matrix product of the following two NumPy arrays?

a = np.array([[1, 2], [3, 4]])
b = np.array([[5, 6], [7, 8]])
c = np.matmul(a, b)

Options:

(2, 2)

(4,)

(2,)

(4, 4)

Question 7.

Given a NumPy array arr with shape (3, 4):

arr = np.array([[1, 2, 3, 4], [5, 6, 7, 8], [9, 10, 11, 12]])

What is the result of np.mean(arr, axis=1)?

Options:

array([2.5, 6.5, 10.5])

array([5., 6., 7., 8.])

array([6.])

array([7., 8., 9., 10.])

Question 8.

Given a NumPy array arr = np.array([[1, 2, 3], [4, 5, 6]]), what is the standard deviation of all its elements, and what would be the shape of the array if it were reshaped to a 1D array?

Options:

Standard deviation: 1.7078, Shape: (3, 2)

Standard deviation: 1.7078, Shape: (6,)

Standard deviation: 2.9155, Shape: (6,)

Standard deviation: 2.9155, Shape: (3, 2)

Question 9.

Given a NumPy array arr = np.array([3, 1, 4, 1, 5, 9, 2, 6, 5, 3, 5]), which of the following options correctly returns a new sorted array containing only the unique elements of arr?

Options:

np.unique(arr).sort()

np.sort(np.unique(arr))

arr.unique().sort()

np.sort(arr.unique())

Question 10.

What is the result of using np.hstack((a, b)) on the following two NumPy arrays?

a = np.array([[1, 2], [3, 4]])
b = np.array([[5, 6], [7, 8]])

Options:

[[1 2] [3 4] [5 6] [7 8]]

[[1 2 5 6] [3 4 7 8]]

[[1 2] [3 4]]

[[5 6] [7 8]]

Question 11.

Given a NumPy array arr = np.array([3, 1, 4, 1, 5, 9, 2, 6]), which of the following code snippets will return the index of the maximum value in the array?

Options:

np.max(arr)

np.argmax(arr)

np.where(arr == np.max(arr))

arr.max()

Question 12.

Given two NumPy arrays, a = np.array([[1, 2], [3, 4]]) and b = np.array([[5, 6], [7, 8]]), which of the following options correctly concatenates them vertically using NumPy?

Options:

np.concatenate((a, b), axis=0)

np.concatenate((a, b), axis=1)

np.stack((a, b), axis=0)

np.hstack((a, b))

Question 13.

Given a NumPy array arr = np.array([[1, 2, 3], [4, 5, 6]]), what is the result of np.cumsum(arr, axis=1)?

Options:

np.array([[1, 3, 6], [4, 9, 15]])

np.array([[1, 2, 3], [5, 7, 9]])

np.array([[1, 4], [2, 5], [3, 6]])

np.array([[1, 2, 3], [4, 5, 6]])

Question 14.

Given a NumPy array arr = np.array([1, 2, 3, 4, 5]), calculate its variance using np.var(arr). Which of the following statements is true regarding the variance and the standard deviation of the array?

Options:

The variance is equal to the square root of the standard deviation.

The variance is equal to the square of the standard deviation.

The variance is half of the standard deviation.

The variance and the standard deviation are unrelated.

Question 15.

Consider the following NumPy array:

arr = np.array([1, 5, 2, 9, 3, 7])

What is the median of the array and what is the data type of the median?

Options:

Median: 4.0, Data type: float64

Median: 5, Data type: int64

Median: 4.0, Data type: int64

Median: 5, Data type: float64

Question 16.

Given two NumPy arrays, array1 = np.array([5, 10, 15, 20]) and array2 = np.array([1, 2, 3, 4]), what is the result of calculating the element-wise difference (array1 - array2)?

Options:

[4, 8, 12, 16]

[6, 12, 18, 24]

[5, 20, 45, 80]

[1, 2, 3, 4]

Question 17.

What is the number of dimensions of the NumPy array created by the following code?

import numpy as np
arr = np.array([[[1, 2], [3, 4]], [[5, 6], [7, 8]]])

Options:

1

2

3

4

Question 18.

What is the sum of all elements in the dot product of the following two NumPy arrays?

import numpy as np
a = np.array([[1, 2], [3, 4]])
b = np.array([[5, 6], [7, 8]])

Options:

80

134

86

92

Question 19.

What is the correct NumPy function to find the common elements between two NumPy arrays arr1 and arr2?

Options:

np.intersect1d(arr1, arr2)

np.union1d(arr1, arr2)

np.setdiff1d(arr1, arr2)

np.setxor1d(arr1, arr2)

Question 20.

What is the result of applying the np.exp() function to the NumPy array arr = np.array([0, 1, 2, 3])?

Options:

[0, 1, 2, 3]

[1, 2.718, 7.389, 20.086]

[0, 1, 4, 9]

[1, 1, 1, 1]

Question 21.

Given a NumPy array arr = np.array([-2, -1, 0, 1, 2]), what is the maximum value of the absolute values of its elements?

Options:

0

1

2

-2

Question 22.

Consider a NumPy array arr = np.array([1, 2, 3], dtype=np.int8). What will be the data type of the resulting array after subtracting a float scalar value of 1.5 from it (arr - 1.5)?

Options:

int8

int64

float64

float32

Question 23.

Consider the following NumPy array:

arr = numpy.array([[1, 2, 3], [4, 5, 6]], dtype=numpy.int16)

What is the size of the array (number of elements) and the data type of its elements?

Options:

Size: 6, Data type: int16

Size: 6, Data type: int32

Size: 2, Data type: int16

Size: 3, Data type: int32

Question 24.

What is the resulting NumPy array when you create a 3x3 identity matrix using np.eye() and multiply it by the scalar value of 5?

Options:

[[5. 5. 5.] [5. 5. 5.] [5. 5. 5.]]

[[5. 0. 0.] [0. 5. 0.] [0. 0. 5.]]

[[1. 0. 0.] [0. 1. 0.] [0. 0. 1.]]

[[0. 0. 5.] [0. 5. 0.] [5. 0. 0.]]

Question 25.

What is the correlation coefficient between array1 = np.array([1, 2, 3, 4, 5]) and array2 = np.array([2, 4, 5, 4, 5]) using NumPy?

Options:

Approximately 0.65

Approximately 0.77

Approximately 0.87

Approximately 0.98

Which NumPy skills should you evaluate during the interview phase?

Assessing a candidate's NumPy expertise in a single interview is challenging, but focusing on core skills ensures you identify strong candidates. Look for candidates who understand the fundamental concepts and can apply them practically. Here are key NumPy skills to evaluate during the interview phase.

Array Creation and Manipulation

An assessment test can help quickly filter candidates on this skill. Our NumPy test includes targeted MCQs to evaluate a candidate's proficiency in array creation and manipulation.

To assess this skill, ask candidates to perform a specific array manipulation task. This allows you to see their approach and problem-solving skills.

How would you create a 3x3 NumPy array with values ranging from 2 to 10, and then transpose it?

Look for candidates who can correctly use np.arange() or np.linspace() to create the initial array and then use .reshape() and .T to transpose it. Bonus points if they can explain the difference between .T and .transpose()!
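One possible answer, shown as a sketch rather than the only acceptable solution, along with the .T versus .transpose() distinction on a 3D array:

```python
import numpy as np

grid = np.arange(2, 11).reshape(3, 3)  # integers 2..10 inclusive
print(grid.T)

# .T reverses all axes; .transpose() also accepts an explicit axis order,
# which matters once an array has more than two dimensions.
cube = np.zeros((2, 3, 4))
print(cube.T.shape)                   # (4, 3, 2)
print(cube.transpose(1, 0, 2).shape)  # (3, 2, 4)
```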

Indexing and Slicing

Use an assessment test with relevant MCQs to assess indexing and slicing skills. Our NumPy test has questions to check the candidate's knowledge of different indexing techniques.

Ask a question that requires the candidate to extract specific data from an array using indexing and slicing. This will help evaluate their ability to apply these concepts.

Given a NumPy array arr = np.array([1, 2, 3, 4, 5, 6, 7, 8, 9, 10]), how would you extract all elements at odd indices?

The ideal answer should use array slicing with a step: arr[1::2]. Look for candidates who understand how to use the start, stop, and step parameters in slicing.
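For reference, a short sketch of the slicing answer and its even-index counterpart:

```python
import numpy as np

arr = np.array([1, 2, 3, 4, 5, 6, 7, 8, 9, 10])

# start=1, stop omitted (end of array), step=2 -> indices 1, 3, 5, 7, 9
odd_index_elements = arr[1::2]
print(odd_index_elements)  # [ 2  4  6  8 10]

# Omitting start as well selects the even indices 0, 2, 4, 6, 8.
even_index_elements = arr[::2]
```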

Broadcasting

An assessment test can quickly identify candidates familiar with broadcasting concepts. Our NumPy test can help assess a candidate's understanding through multiple-choice questions.

Present a scenario where broadcasting can be used to simplify an operation. This assesses their understanding of when and how to apply broadcasting.

How would you add a 1D array [1, 2, 3] to each row of a 2D array of shape (4, 3)?

Look for candidates who understand that NumPy will automatically broadcast the 1D array to match the shape of the 2D array. The correct solution involves simply adding the two arrays together.
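A sketch of the expected solution, with a follow-up case a strong candidate might mention (adding one value per row instead of per column); the arrays are made up for illustration:

```python
import numpy as np

matrix = np.zeros((4, 3))
row = np.array([1, 2, 3])

# Shapes (4, 3) and (3,) are compatible: the 1D array is applied to every row.
result = matrix + row

# A length-4 vector does not broadcast against (4, 3) as is;
# adding a trailing axis makes it (4, 1), one value per row.
per_row = np.array([10, 20, 30, 40])
result_rows = matrix + per_row[:, np.newaxis]
```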

Hire Skilled NumPy Developers with Adaface

Looking to hire a NumPy expert? It's important to accurately assess their NumPy skills to ensure they meet your requirements.

Using skills tests is the best way to assess candidates. Consider using Adaface's NumPy test or the broader Python online test to evaluate candidates' abilities.

Once you've used a skills test, you can quickly identify top performers. Shortlist the best applicants and invite them for interviews to assess fit and experience.

Ready to find your next NumPy star? Sign up for a free trial on our assessment platform and start testing today!

NumPy Online Test

30 mins | 15 MCQs

The NumPy online test uses multiple-choice questions to evaluate a candidate's knowledge and skills related to NumPy arrays and operations, indexing and slicing, linear algebra and statistics, broadcasting, ufuncs and vectorization, and data input and output. The test aims to assess the candidate's proficiency in NumPy and their ability to apply numerical computing and data analysis techniques using Python.

NumPy Interview Questions FAQs

What are some NumPy interview questions for freshers?

There are 20 NumPy interview questions for freshers. These questions cover basic NumPy concepts.

What kind of NumPy interview questions are suitable for junior developers?

There are 23 NumPy interview questions for junior developers. These questions test understanding of core NumPy functionalities.

What are the important NumPy interview questions for intermediate-level developers?

There are 22 NumPy interview questions for intermediate-level developers. These explore advanced NumPy features and usage.

What type of NumPy interview questions are recommended for experienced developers?

There are 20 NumPy interview questions for experienced developers. These focus on performance optimization and complex problem-solving using NumPy.

Why is NumPy important in data science and software engineering?

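NumPy provides fast N-dimensional arrays and vectorized routines for numerical computation. Many data science and scientific Python libraries build on it, which makes it a core skill in both data science and software engineering roles.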

40 min skill tests.
No trick questions.
Accurate shortlisting.

We make it easy for you to find the best candidates in your pipeline with a 40 min skills test.

Join 1200+ companies in 80+ countries.

Singapore (HQ)
32 Carpenter Street, Singapore 059911
Contact: +65 9447 0488

India
WeWork Prestige Atlanta, 80 Feet Main Road, Koramangala 1A Block, Bengaluru, Karnataka, 560034
Contact: +91 6305713227