NumPy

Overview

In this class, you will be writing algorithms built on linear algebra operations, and you should use NumPy to implement them efficiently. NumPy's array operations are implemented in compiled, optimized C code, and matrix products are typically delegated to an optimized (often multithreaded) BLAS library. Vectorizing your code with NumPy, that is, expressing computations as whole-array operations instead of Python loops, will make it faster, more concise, and more elegant.

Working with Arrays

Here are some common NumPy operations that you will use in this class.

Create a Matrix in NumPy

import numpy as np
# Create the array:
# [[1, 2, 3]
#  [4, 5, 6]]
a = np.array([[1, 2, 3], [4, 5, 6]])

Calculate the Shape of a Matrix in NumPy

Consider the same array as in the example above. Its shape, the number of elements along each dimension, is stored in the attribute a.shape, which returns the tuple (2, 3): 2 rows and 3 columns.

Size of a Matrix in NumPy

The size of a matrix represents the total number of elements in the array. For the same array a, the size can be calculated with a.size, which returns 6 because the matrix has 2 rows and 3 columns.

Example: Shape and Size of a Matrix

import numpy as np
# Define the matrix
a = np.array([[1, 2, 3], [4, 5, 6]])

# Get the shape and size
shape = a.shape  # Output: (2, 3)
size = a.size    # Output: 6

print(f"Shape: {shape}")
print(f"Size: {size}")
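Two related attributes are also worth knowing: ndim gives the number of dimensions, and the built-in len() returns only the length of the first axis. A one-dimensional array has a shape with a single entry, which is easy to mistake for a row vector. A short sketch:

```python
import numpy as np

a = np.array([[1, 2, 3], [4, 5, 6]])
v = np.array([1, 2, 3])  # a 1-D array, not a 1x3 matrix

print(a.ndim)   # number of dimensions: 2
print(len(a))   # length of the first axis (number of rows): 2
print(v.shape)  # a 1-D array has a one-element shape tuple: (3,)
print(v.ndim)   # 1
```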

Converting a Row Vector (1-D Array) to a Column Vector

import numpy as np
A = np.array([1, 2, 3, 4])
B = A.reshape(-1, 1)

# A = [1, 2, 3, 4]
# B = [[1],
#      [2],
#      [3],
#      [4]]
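Note that A above is a 1-D array with shape (4,); the -1 in reshape(-1, 1) asks NumPy to infer that dimension from the total number of elements. The sketch below contrasts a few equivalent ways to add the missing axis:

```python
import numpy as np

A = np.array([1, 2, 3, 4])   # shape (4,)

# -1 tells NumPy to infer that dimension from the array's size
col = A.reshape(-1, 1)       # shape (4, 1): column vector
row = A.reshape(1, -1)       # shape (1, 4): explicit row vector

# Indexing with np.newaxis inserts a length-1 axis, giving the same column vector
col2 = A[:, np.newaxis]

print(col.shape, row.shape, col2.shape)  # (4, 1) (1, 4) (4, 1)
```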

Matrix Multiplication

Use either the @ operator or the np.matmul function for matrix multiplication. For NumPy arrays the two are equivalent; @ is often preferred because it is shorter and reads like mathematical notation.

Infix:

import numpy as np
A = np.array([[1, 2, 3], [4, 5, 6]])
B = np.array([[1, 2], [3, 4], [5, 6]])
C = A @ B

# C = [[22, 28],
#      [49, 64]]

Function (np.matmul):

import numpy as np
A = np.array([[1, 2, 3], [4, 5, 6]])
B = np.array([[1, 2], [3, 4], [5, 6]])
C = np.matmul(A, B)

# C = [[22, 28],
#      [49, 64]]

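Remember that matrix multiplication is not commutative, so the order of the operands matters: A @ B and B @ A are generally different. In addition, the number of columns of the first matrix must equal the number of rows of the second; otherwise NumPy raises a ValueError.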
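Using the same A and B as in the examples above, a quick check of both points (order and dimension compatibility):

```python
import numpy as np

A = np.array([[1, 2, 3], [4, 5, 6]])    # shape (2, 3)
B = np.array([[1, 2], [3, 4], [5, 6]])  # shape (3, 2)

print((A @ B).shape)  # (2, 2)
print((B @ A).shape)  # (3, 3): reversing the order gives a different result

# A (2, 3) matrix cannot be multiplied by another (2, 3) matrix:
# the inner dimensions (3 and 2) do not match
try:
    A @ A
except ValueError as err:
    print("ValueError:", err)
```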

Element-wise Multiplication

Not to be confused with matrix multiplication, np.multiply (or the * operator) multiplies two arrays element-wise. The arrays must have the same shape, or shapes that are compatible under broadcasting (described below).

import numpy as np
A = np.array([[1, 2], [3, 4]])
B = np.array([[1, 2], [3, 4]])
C = A * B  # Can also use np.multiply(A, B)

# C = [[1, 4],
#      [9, 16]]

You can also multiply an array by a scalar:

import numpy as np
A = np.array([[1, 2], [3, 4]])
A *= 3

# A = [[3, 6],
#      [9, 12]]
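Because * and @ are easy to confuse, it can help to apply both to the same matrix and compare the results:

```python
import numpy as np

A = np.array([[1, 2], [3, 4]])

elementwise = A * A   # multiplies matching entries
matrix = A @ A        # row-by-column dot products

print(elementwise)  # [[ 1  4]
                    #  [ 9 16]]
print(matrix)       # [[ 7 10]
                    #  [15 22]]
```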

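Element-wise Division

Like np.multiply, np.divide (or the / operator) divides two arrays element-wise. Note that / always performs true division, so the result has a floating-point dtype even when both inputs are integers.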

import numpy as np
A = np.array([[1, 2], [3, 4]])
B = np.array([[1, 2], [3, 4]])
C = A / B  # Can also use np.divide(A, B)

# C = [[1., 1.],
#      [1., 1.]]

You can also divide an array by a scalar:

import numpy as np
A = np.array([[1, 2], [3, 4]])
A = A / 3

# A = [[0.33333333, 0.66666667],
#      [1.0, 1.33333333]]
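The example above writes A = A / 3 rather than A /= 3 for a reason: A holds integers, and in-place true division cannot store float results back into an integer array. A sketch of the behavior and two common workarounds (floor division, or converting to floats first):

```python
import numpy as np

A = np.array([[1, 2], [3, 4]])   # integer dtype

print((A / 3).dtype)   # float64: "/" is true division and always returns floats
print(A // 3)          # floor division keeps integers: [[0 0]
                       #                                 [1 1]]

# In-place true division cannot store floats back into an integer array
try:
    A /= 3
except TypeError as err:   # NumPy raises a subclass of TypeError here
    print("TypeError:", err)

# Convert to a float dtype first if you want to divide in place
A = A.astype(float)
A /= 3
```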

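Stacking Matrices

NumPy allows you to stack arrays vertically (row-wise) with np.vstack or horizontally (column-wise) with np.hstack. For vertical stacking the arrays must have the same number of columns; for horizontal stacking, the same number of rows.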

Vertical Stacking:

import numpy as np
A = np.array([[1, 2, 3], [4, 5, 6]])
B = np.array([[7, 8, 9]])
C = np.vstack((A, B))

# C = [[1, 2, 3],
#      [4, 5, 6],
#      [7, 8, 9]]

Horizontal Stacking:

import numpy as np
A = np.array([[1, 2], [3, 4]])
B = np.array([[5, 6], [7, 8]])
C = np.hstack((A, B))

# C = [[1, 2, 5, 6],
#      [3, 4, 7, 8]]
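For 2-D inputs, both functions are special cases of np.concatenate with an explicit axis, which is worth recognizing when reading other people's code:

```python
import numpy as np

A = np.array([[1, 2], [3, 4]])
B = np.array([[5, 6], [7, 8]])

# For 2-D arrays, vstack joins along axis 0 (rows) and hstack along axis 1 (columns)
v = np.concatenate((A, B), axis=0)
h = np.concatenate((A, B), axis=1)

print(np.array_equal(v, np.vstack((A, B))))  # True
print(np.array_equal(h, np.hstack((A, B))))  # True
print(v.shape, h.shape)                      # (4, 2) (2, 4)
```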

Vectorization and Broadcasting

Vectorization allows you to apply operations to entire arrays without explicit Python loops, making your code more concise and usually much faster.

Broadcasting lets NumPy perform element-wise operations on arrays of different shapes by virtually stretching dimensions of size 1 to match, without copying data. For more details, refer to the NumPy broadcasting rules.

Broadcasting Rules (Simplified):

  1. If arrays have a different number of dimensions, the shape of the smaller array is padded with ones on the left.

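  2. Arrays are compatible for broadcasting if, in every dimension, their sizes are either equal or one of them is 1. A dimension of size 1 is stretched to match the other array; if the sizes differ and neither is 1, NumPy raises a ValueError.

  3. Dimensions are compared starting from the trailing (last) dimension and moving toward the first.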
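The rules can be checked directly on arrays of ones, where only the resulting shapes matter. One compatible case, one incompatible case, and a fix via reshape:

```python
import numpy as np

col = np.ones((3, 1))   # shape (3, 1)
row = np.ones(4)        # shape (4,), treated as (1, 4) by rule 1

# Trailing dims: 1 vs 4 -> 4; leading dims: 3 vs 1 -> 3
print((col + row).shape)   # (3, 4)

M = np.ones((2, 3))
v = np.ones(2)          # shape (2,) -> (1, 2); trailing dims 3 vs 2 are incompatible
try:
    M + v
except ValueError as err:
    print("ValueError:", err)

# Reshaping v into a column (2, 1) makes it compatible: (2, 3) with (2, 1) -> (2, 3)
print((M + v.reshape(-1, 1)).shape)  # (2, 3)
```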

Example of Vectorization:

import numpy as np
A = np.array([1, 2, 3, 4, 5])
B = A ** 2  # Squaring each element

# B = [1, 4, 9, 16, 25]
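For comparison, here is the squaring example written with an explicit Python loop. Both versions produce identical results; the vectorized form is shorter and, on large arrays, typically much faster because the per-element work happens in compiled code rather than in the Python interpreter.

```python
import numpy as np

A = np.arange(1, 6)   # [1, 2, 3, 4, 5]

# Loop version: one Python-level operation per element
squares_loop = np.empty_like(A)
for i in range(len(A)):
    squares_loop[i] = A[i] ** 2

# Vectorized version: a single expression evaluated in compiled code
squares_vec = A ** 2

print(np.array_equal(squares_loop, squares_vec))  # True
```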

Example of Broadcasting:

import numpy as np
A = np.array([[1, 2, 3], [4, 5, 6]])
B = np.array([1, 0, 1])
C = A + B  # B (shape (3,)) is broadcast across both rows of A (shape (2, 3))

# C = [[2, 2, 4],
#      [5, 5, 7]]

Broadcasting simplifies arithmetic operations between arrays of different shapes and is a fundamental feature for efficient numerical computations in NumPy.
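A common practical use of broadcasting is centering data by subtracting per-column or per-row means. In this sketch, rows are treated as samples and columns as features (an assumption made for illustration); keepdims=True keeps the reduced axis so the row means broadcast correctly:

```python
import numpy as np

X = np.array([[1.0, 2.0, 3.0],
              [4.0, 5.0, 6.0]])   # rows are samples, columns are features

col_means = X.mean(axis=0)        # shape (3,): one mean per column
X_centered = X - col_means        # (2, 3) - (3,) broadcasts across rows

row_means = X.mean(axis=1, keepdims=True)  # shape (2, 1): keepdims preserves the axis
X_row_centered = X - row_means             # (2, 3) - (2, 1) broadcasts across columns

print(X_centered)      # [[-1.5 -1.5 -1.5]
                       #  [ 1.5  1.5  1.5]]
print(X_row_centered)  # [[-1.  0.  1.]
                       #  [-1.  0.  1.]]
```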

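Masking

NumPy lets you select many elements of an array at once by indexing it with another array. With an integer array, each entry picks out a row (or column) by its index; with a boolean array, you can filter elements based on a condition, which is very useful in practice.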

import numpy as np
A = np.array([[1,2,3],
              [4,5,6],
              [7,8,9]])
B = np.array([2,1])
C = A[B] # Selects the rows of A corresponding to each index of B
# C = [[7,8,9],
#      [4,5,6]]
D = A[:,B] # Selects the columns of A corresponding to each index of B
# D = [[3,2],
#      [6,5],
#      [9,8]]

To extend this to filtering, index into an array with a boolean array of True and False values (a mask) whose shape matches the array, or the axis being indexed. Each element is included if and only if its corresponding mask entry is True.

Filtering Example

import numpy as np
A = np.array([[0,1,2],
              [3,4,5],
              [6,7,8]])
B = A[A%2==0] # Selects the even elements of A
# B = [0,2,4,6,8]
# Can get more complex:
C = A[:,A[-1]%2==0] # Selects the columns of A that have an even number in their last row
# C = [[0,2],
#      [3,5],
#      [6,8]]
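Masks can be combined with the element-wise operators & (and), | (or), and ~ (not), and they can also appear on the left-hand side of an assignment to modify the selected elements in place:

```python
import numpy as np

A = np.array([[0, 1, 2],
              [3, 4, 5],
              [6, 7, 8]])

# Combine conditions with & (and), | (or), ~ (not); the parentheses are
# required because & and | bind more tightly than comparison operators
mid = A[(A > 2) & (A < 7)]     # [3 4 5 6]
ends = A[(A < 2) | (A > 7)]    # [0 1 8]

# Boolean masks can also be used on the left-hand side to assign in place
B = A.copy()
B[B % 2 == 1] = -1             # replace odd entries with -1
print(B)  # [[ 0 -1  2]
          #  [-1  4 -1]
          #  [ 6 -1  8]]
```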

If you would instead like to do a find-and-replace, np.where is quite useful. It takes the form np.where(CONDITION, IF, ELSE) and returns an array that takes its value from IF at every position where CONDITION is True and from ELSE everywhere else.

import numpy as np
A = np.arange(-5,6)
B = np.where(A>0,A,-A)
# B = [5,4,3,2,1,0,1,2,3,4,5]
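The IF and ELSE arguments may also be scalars, and with only a condition, np.where returns the indices where the condition holds. A sketch of both forms, plus a check that the example above matches np.abs:

```python
import numpy as np

A = np.arange(-5, 6)   # [-5, -4, ..., 5]

# The IF/ELSE arguments may be scalars: replace negatives with zero
clipped = np.where(A > 0, A, 0)
print(clipped)         # [0 0 0 0 0 0 1 2 3 4 5]

# With only a condition, np.where returns a tuple of index arrays (one per axis)
idx = np.where(A > 3)  # indices 9 and 10
print(A[idx])          # [4 5]

# The example above is equivalent to np.abs(A)
print(np.array_equal(np.where(A > 0, A, -A), np.abs(A)))  # True
```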

Loading Data into NumPy

NumPy provides convenient functions for loading data from text files (such as .txt or .csv) directly into arrays.

Loading a Simple Text File with np.loadtxt

Use np.loadtxt when your file contains numeric data arranged in a regular, rectangular format with no missing values.

import numpy as np

# Example file (data.txt):
# 1.0  2.0  3.0
# 4.0  5.0  6.0

data = np.loadtxt("data.txt")

# data =
# [[1. 2. 3.]
#  [4. 5. 6.]]

You can also specify a delimiter, such as a comma for CSV files:

data = np.loadtxt("data.csv", delimiter=",")
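np.loadtxt also accepts skiprows (to skip lines such as a header) and usecols (to load only some columns). The sketch below uses an in-memory io.StringIO object with made-up contents in place of a real file, since loadtxt accepts file-like objects as well as filenames:

```python
import io
import numpy as np

# An in-memory stand-in for a CSV file with a header row (hypothetical data)
csv_text = io.StringIO("a,b,c\n1,2,3\n4,5,6\n")

# skiprows=1 skips the header; usecols picks columns 0 and 2
data = np.loadtxt(csv_text, delimiter=",", skiprows=1, usecols=(0, 2))
print(data)   # [[1. 3.]
              #  [4. 6.]]
```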

Loading Named Columns

If your file includes a header row with column names, you can preserve them by using np.genfromtxt with names=True (np.loadtxt does not support this option):

import numpy as np

# Example file (data_named.csv):
# x,y,z
# 1,2,3
# 4,5,6

data = np.genfromtxt("data_named.csv", delimiter=",", names=True)

# data["x"] -> array([1., 4.])
# data["y"] -> array([2., 5.])
# data["z"] -> array([3., 6.])

This loads the data as a structured array, allowing you to access columns by name.
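A structured array is one-dimensional: each element is a whole record. The sketch below reproduces the data_named.csv example in memory (via io.StringIO), lists the field names, and stacks the named columns back into an ordinary 2-D array when that is more convenient:

```python
import io
import numpy as np

# In-memory copy of the data_named.csv example above
f = io.StringIO("x,y,z\n1,2,3\n4,5,6\n")
data = np.genfromtxt(f, delimiter=",", names=True)

print(data.dtype.names)   # ('x', 'y', 'z')
print(data.shape)         # (2,): one structured record per row, not a 2-D array

# Stack the named columns back into an ordinary 2-D float array if needed
plain = np.column_stack([data[name] for name in data.dtype.names])
print(plain.shape)        # (2, 3)
```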

Loading Data with Mixed Data Types Using np.genfromtxt

Use np.genfromtxt when different columns in your file have different data types (for example, strings and numbers).

import numpy as np

# Example file (students.csv):
# name,age,gpa
# Alice,20,3.8
# Bob,22,3.5
# Carol,19,3.9

data = np.genfromtxt(
    "students.csv",
    delimiter=",",
    names=True,
    dtype=None,
    encoding="utf-8"
)

Because dtype=None is specified, NumPy infers a data type for each column separately (here a string, an integer, and a float column).

# Access individual columns
names = data["name"]   # array(['Alice', 'Bob', 'Carol'], dtype='<U5')
ages  = data["age"]    # array([20, 22, 19])
gpas  = data["gpa"]    # array([3.8, 3.5, 3.9])

Each row is stored as a structured record, and each column can be accessed by name.
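Boolean masks from the Masking section work on structured arrays too: a condition on one field selects whole records. A sketch using an in-memory copy of the students.csv example:

```python
import io
import numpy as np

# In-memory copy of the students.csv example above
f = io.StringIO("name,age,gpa\nAlice,20,3.8\nBob,22,3.5\nCarol,19,3.9\n")
data = np.genfromtxt(f, delimiter=",", names=True, dtype=None, encoding="utf-8")

# A mask on one field selects whole records
older = data[data["age"] >= 20]
print(older["name"])          # ['Alice' 'Bob']

# Fields behave like ordinary NumPy arrays
print(data["gpa"].mean())     # mean of 3.8, 3.5 and 3.9
```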

In summary:

  • Use np.loadtxt for clean, purely numeric data.

  • Use np.genfromtxt when dealing with column names or varying dtypes.

NumPy Documentation

It is often useful to refer to the official NumPy documentation when you are unsure which function to use. For NumPy beginners, the beginner guide is quite helpful, and the user guide and API reference are great resources for documentation about specific NumPy functions.

Understanding Function Details

The documentation details each function's parameters, return values, and possible errors. For example, np.dot takes two arrays and returns their dot product (for 2-D arrays, this is matrix multiplication). It raises a ValueError if the dimensions of the input arrays are incompatible.
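A quick way to confirm the behavior described above is to try it on small inputs, including one that should fail:

```python
import numpy as np

u = np.array([1, 2, 3])
v = np.array([4, 5, 6])
print(np.dot(u, v))   # 1*4 + 2*5 + 3*6 = 32

# For 2-D arrays, np.dot performs matrix multiplication (same result as @ here)
A = np.array([[1, 2], [3, 4]])
print(np.array_equal(np.dot(A, A), A @ A))  # True

# Incompatible dimensions raise ValueError
try:
    np.dot(u, np.array([1, 2]))
except ValueError as err:
    print("ValueError:", err)
```

In an interactive session, help(np.dot) (or np.dot? in IPython and Jupyter) prints the docstring that the online documentation page is generated from.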

The documentation may also include:

  • Links to related functions: Useful if you need variations of a function.

  • Notes: Additional explanations or special cases.

  • Examples: Practical code snippets that demonstrate how to use the function.

NumPy documentation: Dot function

By exploring these resources, you can efficiently learn and implement NumPy functions in your code.