NumPy Array Manipulation: Indexing, Slicing, Reshaping, Joining, and Splitting

In our previous deep-dive, we explored the hidden memory costs of standard Python lists and learned how to generate lightning-fast, fixed-type NumPy arrays from scratch.
But generating data is only the first step. Data manipulation in Python is virtually synonymous with NumPy array manipulation: even newer, hugely popular tools like Pandas are built on top of the NumPy array.
Whether you are cropping a bounding box out of an image for Computer Vision, appending a new column of features to a dataset, or splitting your data into training and testing sets for a Deep Learning neural network, you will be relying on these foundational array manipulations.
In this comprehensive guide, we will cover six core categories of array operations:
Attributes of Arrays: Determining size, shape, memory consumption, and data types.
Indexing of Arrays: Getting and setting the value of individual array elements.
Slicing of Arrays: Getting and setting smaller subarrays within a larger array.
Reshaping of Arrays: Changing the dimensional structure of an array.
Joining Arrays: Combining multiple distinct arrays into a single structure.
Splitting Arrays: Breaking a single array down into multiple smaller arrays.
Let's begin by generating some sample data.
1. NumPy Array Attributes: Inspecting Your Data
Before we manipulate arrays, we need to generate a few standard multi-dimensional arrays. We will use NumPy's random number generator.
Pro-Tip: The Random Seed
Whenever you generate random data for machine learning, you should set a seed. This ensures that the pseudo-random number generator produces the exact same "random" arrays every time the code is run, which is critical for reproducibility when debugging models.
import numpy as np
# Seed the generator for reproducibility
np.random.seed(0)
# Generate three different arrays
x1 = np.random.randint(10, size=6) # 1D array (Vector)
x2 = np.random.randint(10, size=(3, 4)) # 2D array (Matrix)
x3 = np.random.randint(10, size=(3, 4, 5)) # 3D array (Tensor/Volume)
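To make the reproducibility claim concrete, here is a minimal sketch that reseeds the generator and checks that the same draws come back. The variable names (first_run, second_run, rng_a, rng_b) are illustrative only. The second half uses np.random.default_rng, the newer Generator interface, as an alternative; it is not what the rest of this guide uses, and its random stream is different from the one produced by np.random.seed.

```python
import numpy as np

# Seeding the legacy global generator twice with the same value
# replays the same sequence of draws.
np.random.seed(0)
first_run = np.random.randint(10, size=6)

np.random.seed(0)
second_run = np.random.randint(10, size=6)

print(np.array_equal(first_run, second_run))  # True

# Newer NumPy code often uses an explicit Generator object instead.
# The same seed is reproducible here as well, but this is a separate
# random stream from the legacy np.random.seed one.
rng_a = np.random.default_rng(0)
rng_b = np.random.default_rng(0)
same_draws = np.array_equal(rng_a.integers(10, size=6),
                            rng_b.integers(10, size=6))
print(same_draws)  # True
```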
Every NumPy array comes with built-in attributes that allow you to instantly inspect its structure.
Dimensional Attributes
ndim: The number of dimensions (axes).
shape: A tuple representing the exact size of each dimension.
size: The total number of individual elements across the entire array.
print("x3 ndim: ", x3.ndim)
print("x3 shape:", x3.shape)
print("x3 size: ", x3.size)
# Output:
# x3 ndim: 3
# x3 shape: (3, 4, 5)
# x3 size: 60
Memory Attributes
Knowing exactly how much RAM your dataset consumes is a vital skill. NumPy provides instant access to this metadata:
dtype: The exact data type of the elements (e.g., int64).
itemsize: The size (in bytes) of a single array element.
nbytes: The total size (in bytes) of the entire array.
print("dtype:", x3.dtype)
print("itemsize:", x3.itemsize, "bytes")
print("nbytes:", x3.nbytes, "bytes")
# Output (int64 is typical on 64-bit systems; the default integer dtype is platform dependent):
# dtype: int64
# itemsize: 8 bytes
# nbytes: 480 bytes
Mathematical check: nbytes is exactly equal to itemsize multiplied by size (8 x 60 = 480).
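The relationship can be verified directly. The sketch below fixes the dtype explicitly (an assumption made here so the numbers do not depend on the platform's default integer type) and shows how a smaller dtype shrinks the memory footprint. The names a64 and a8 are illustrative.

```python
import numpy as np

# Fix the dtype explicitly so the numbers do not depend on the platform.
a64 = np.zeros((3, 4, 5), dtype=np.int64)
a8 = a64.astype(np.int8)  # same shape, 1 byte per element

print(a64.itemsize, a64.size, a64.nbytes)  # 8 60 480
print(a8.itemsize, a8.size, a8.nbytes)     # 1 60 60

# nbytes is always itemsize * size
print(a64.nbytes == a64.itemsize * a64.size)  # True
```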
2. Array Indexing: Accessing Single Elements
If you are familiar with standard Python list indexing, NumPy's 1D indexing will feel entirely natural. It uses a zero-based index system.
One-Dimensional Indexing
# Our array: [5, 0, 3, 3, 7, 9]
print(x1[0]) # Output: 5 (The first element)
print(x1[4]) # Output: 7 (The fifth element)
You can also use negative indices to count backward from the end of the array. This is incredibly useful in time-series data when you want the "most recent" entry.
print(x1[-1]) # Output: 9 (The last element)
print(x1[-2]) # Output: 7 (The second to last element)
Multi-Dimensional Indexing (The NumPy Way)
This is where NumPy diverges from standard Python. If you have a list of lists in Python, accessing a nested element requires chaining brackets: my_list[0][1].
NumPy arrays use a much cleaner comma-separated tuple of indices.
# Our 2D array (x2):
# [[ 3, 5, 2, 4],
# [ 7, 6, 8, 8],
# [ 1, 6, 7, 7]]
print(x2[0, 0]) # Output: 3 (Row 0, Column 0)
print(x2[2, 0]) # Output: 1 (Row 2, Column 0)
print(x2[2, -1]) # Output: 7 (Row 2, Last Column)
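For a side-by-side comparison of the two indexing styles, here is a small sketch using a hand-written grid (the names nested and arr are illustrative). It also shows that the comma-separated form is specific to arrays: a plain list rejects it.

```python
import numpy as np

nested = [[3, 5, 2, 4],
          [7, 6, 8, 8],
          [1, 6, 7, 7]]
arr = np.array(nested)

print(nested[2][0])  # 1  (chained brackets on a list of lists)
print(arr[2, 0])     # 1  (single comma-separated index)
print(arr[2][0])     # 1  (chaining also works on arrays, via an intermediate row)

# A tuple index on a plain list is an error.
try:
    nested[2, 0]
    list_error = None
except TypeError as exc:
    list_error = type(exc).__name__
print(list_error)  # TypeError
```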
Modifying Values and The Silent Truncation Pitfall
You can use standard index notation to overwrite elements.
x2[0, 0] = 12
⚠️ DANGER: The Fixed-Type Truncation Trap
Unlike Python lists, NumPy arrays have a fixed data type. If you try to insert a floating-point value into an integer array, NumPy will silently truncate the decimal without throwing an error or warning.
# x1 is an integer array
x1[0] = 3.14159
print(x1)
# Output: [3, 0, 3, 3, 7, 9]
Notice that 3.14159 became 3. If you do not monitor your dtypes, this silent truncation can completely ruin mathematical accuracy in a machine learning model!
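One way to avoid the trap is to convert the array to a float dtype before assigning fractional values. The sketch below is an illustration under that assumption, not the only remedy; ints and floats are hypothetical names. It also shows that truncation goes toward zero, so negative values do not round down.

```python
import numpy as np

ints = np.array([5, 0, 3, 3, 7, 9])
ints[0] = 3.14159            # silently stored as 3
ints[1] = -2.7               # truncates toward zero, so -2 (not -3)
print(ints[:2].tolist())     # [3, -2]

# Converting to a float dtype first keeps the fractional part.
floats = ints.astype(np.float64)
floats[0] = 3.14159
print(floats[0])             # 3.14159

# dtype.kind is 'i' for signed integers and 'f' for floats.
print(ints.dtype.kind, floats.dtype.kind)  # i f
```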
3. Array Slicing: Accessing Subarrays
To access an entire sub-section of an array, we use slice notation, marked by the colon (:) character. The syntax universally follows this pattern:
x[start:stop:step]
If any of these are unspecified, they default to start=0, stop=size of dimension, and step=1.
One-Dimensional Subarrays
x = np.arange(10)
# Array: [0, 1, 2, 3, 4, 5, 6, 7, 8, 9]
print(x[:5]) # First five elements: [0, 1, 2, 3, 4]
print(x[5:]) # Elements from index 5 onward: [5, 6, 7, 8, 9]
print(x[4:7]) # Middle subarray: [4, 5, 6]
print(x[::2]) # Every other element (step by 2): [0, 2, 4, 6, 8]
print(x[1::2]) # Every other element, starting at index 1: [1, 3, 5, 7, 9]
Reversing an Array: A highly elegant trick in Python/NumPy is using a negative step value. When the step is negative, the defaults for start and stop are swapped, giving you a perfectly reversed array instantly.
print(x[::-1]) # All elements, reversed: [9, 8, 7, 6, 5, 4, 3, 2, 1, 0]
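A negative step can also be combined with explicit start and stop values, which makes the swapped defaults easier to see. A short sketch on the same 0 to 9 array:

```python
import numpy as np

x = np.arange(10)

# Start at index 5 and walk backward two at a time.
print(x[5::-2].tolist())   # [5, 3, 1]

# Start defaults to the last element; stop before index 5.
print(x[:5:-1].tolist())   # [9, 8, 7, 6]

# Explicit start and stop: index 7 down to (but not including) index 2.
print(x[7:2:-1].tolist())  # [7, 6, 5, 4, 3]
```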
Multi-Dimensional Subarrays
Multi-dimensional slices follow the exact same logic, simply separated by commas.
# First two rows, first three columns
print(x2[:2, :3])
# Output:
# [[12, 5, 2],
# [ 7, 6, 8]]
# All rows, every other column
print(x2[:3, ::2])
# Output:
# [[12, 2],
# [ 7, 8],
# [ 1, 7]]
# Reversing an entire 2D matrix (both rows and columns reversed)
print(x2[::-1, ::-1])
# Output:
# [[ 7, 7, 6, 1],
# [ 8, 8, 6, 7],
# [ 4, 2, 5, 12]]
The Power of No-Copy Views
In standard Python lists, slicing creates a copy of the data. If you modify the slice, the original list remains untouched. NumPy array slices return views rather than copies. When you extract a subarray, you are simply looking at the exact same physical memory buffer through a smaller window. Modifying the slice modifies the original dataset! This is incredibly efficient for processing massive datasets "in-place" without crashing your RAM.
(If you explicitly need an isolated copy, use the .copy() method: x2[:2, :2].copy())
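The difference between a view, a copy, and a Python list slice can be checked with a few lines. This is a minimal sketch on a fresh array (data, view, and safe are illustrative names); np.shares_memory reports whether two arrays overlap in memory.

```python
import numpy as np

data = np.arange(12).reshape(3, 4)

view = data[:2, :2]          # a slice: shares memory with data
view[0, 0] = 99
print(data[0, 0])                      # 99, the original changed too
print(np.shares_memory(data, view))    # True

safe = data[:2, :2].copy()   # an independent copy
safe[0, 0] = -1
print(data[0, 0])                      # still 99
print(np.shares_memory(data, safe))    # False

# A plain Python list slice, by contrast, is a (shallow) copy.
py_list = [0, 1, 2, 3]
py_slice = py_list[:2]
py_slice[0] = 99
print(py_list[0])                      # 0
```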
4. Reshaping Arrays
In machine learning, algorithms are incredibly strict about the dimensional shape of the data they receive. For example, Scikit-Learn expects a 2D matrix of features (samples, features), even if you only have one feature.
The most flexible way to alter dimensional structure is the reshape() method.
# Put the numbers 1 through 9 into a 3x3 grid
grid = np.arange(1, 10).reshape((3, 3))
print(grid)
# Output:
# [[1, 2, 3],
# [4, 5, 6],
# [7, 8, 9]]
Note: For reshape to work, the initial size must exactly match the reshaped size (9 = 3 x 3).
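The size rule is easy to trip over, so here is a small sketch of what happens when the sizes do not match. It also shows -1, which asks NumPy to infer one dimension from the total size; that feature is not covered above and is included here only as a convenience. The names values and inferred are illustrative.

```python
import numpy as np

values = np.arange(1, 10)          # 9 elements

# 2 x 5 = 10 does not match 9, so reshape raises ValueError.
try:
    values.reshape((2, 5))
    reshape_failed = False
except ValueError as exc:
    reshape_failed = True
    print("reshape failed:", exc)

# -1 asks NumPy to infer that dimension from the total size.
inferred = values.reshape((3, -1))
print(inferred.shape)              # (3, 3)
```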
1D to 2D Conversion (Row and Column Vectors)
Converting a flat 1D array into a 2D row or column vector is a daily task in data engineering. You can use reshape(), or the visually explicit np.newaxis constant (an alias for None) inside a slice operation.
x = np.array([1, 2, 3]) # Currently a 1D array of shape (3,)
# Convert to a 1x3 Row Vector
x[np.newaxis, :]
# Output: array([[1, 2, 3]])
# Convert to a 3x1 Column Vector
x[:, np.newaxis]
# Output:
# array([[1],
# [2],
# [3]])
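For comparison, the same row and column vectors can be produced with reshape(). The sketch below checks that both approaches give identical results; row and col are illustrative names.

```python
import numpy as np

x = np.array([1, 2, 3])

row = x.reshape((1, 3))   # 1x3 row vector
col = x.reshape((3, 1))   # 3x1 column vector

print(row.shape, col.shape)                     # (1, 3) (3, 1)
print(np.array_equal(row, x[np.newaxis, :]))    # True
print(np.array_equal(col, x[:, np.newaxis]))    # True

# np.newaxis is simply another name for None.
print(np.newaxis is None)                       # True
```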
5. Joining Arrays: Concatenation and Stacking
Often, you will have multiple datasets that you need to merge. For instance, combining data from two different sensors, or adding a new column of engineered features to an existing matrix.
np.concatenate
The most basic joining routine is np.concatenate. It takes a tuple or list of arrays as its first argument.
x = np.array([1, 2, 3])
y = np.array([3, 2, 1])
# Joining two 1D arrays
np.concatenate([x, y])
# Output: array([1, 2, 3, 3, 2, 1])
# You can join more than two at once!
z = [99, 99, 99]
np.concatenate([x, y, z])
# Output: array([ 1, 2, 3, 3, 2, 1, 99, 99, 99])
When concatenating 2D arrays, you must pay attention to the axis parameter.
axis=0 (the default) stacks them vertically (adding rows).
axis=1 stacks them horizontally (adding columns).
grid = np.array([[1, 2, 3],
[4, 5, 6]])
# Concatenate along the first axis (axis=0, vertical)
np.concatenate([grid, grid])
# Output:
# [[1, 2, 3],
# [4, 5, 6],
# [1, 2, 3],
# [4, 5, 6]]
# Concatenate along the second axis (axis=1, horizontal)
np.concatenate([grid, grid], axis=1)
# Output:
# [[1, 2, 3, 1, 2, 3],
# [4, 5, 6, 4, 5, 6]]
Stacking with Mixed Dimensions (vstack and hstack)
np.concatenate can be strict and confusing when you are trying to combine arrays of different dimensions (like putting a 1D array on top of a 2D matrix). For these tasks, it is vastly cleaner to use np.vstack (vertical stack) and np.hstack (horizontal stack).
x = np.array([1, 2, 3])
grid = np.array([[9, 8, 7],
[6, 5, 4]])
# Vertically stack a 1D array onto a 2D grid
np.vstack([x, grid])
# Output:
# [[1, 2, 3],
# [9, 8, 7],
# [6, 5, 4]]
# Horizontally stack a column vector to a 2D grid
y = np.array([[99],
[99]])
np.hstack([grid, y])
# Output:
# [[ 9, 8, 7, 99],
# [ 6, 5, 4, 99]]
(There is also np.dstack which stacks arrays along the third axis, representing depth.)
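Two points from this section can be checked in a few lines: np.concatenate rejects inputs with different numbers of dimensions, and np.dstack adds a third axis. This is a sketch reusing the same x and grid values; manual and depth are illustrative names.

```python
import numpy as np

x = np.array([1, 2, 3])
grid = np.array([[9, 8, 7],
                 [6, 5, 4]])

# np.concatenate requires all inputs to have the same number of dimensions.
try:
    np.concatenate([x, grid])
    concat_failed = False
except ValueError as exc:
    concat_failed = True
    print("concatenate failed:", exc)

# Promoting x to shape (1, 3) first makes it work, which gives the
# same result as np.vstack for this input.
manual = np.concatenate([x[np.newaxis, :], grid])
print(np.array_equal(manual, np.vstack([x, grid])))  # True

# np.dstack joins along a third axis: two (2, 3) grids -> (2, 3, 2).
depth = np.dstack([grid, grid])
print(depth.shape)  # (2, 3, 2)
```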
6. Splitting Arrays
The exact opposite of concatenation is splitting. In Machine Learning, this is the fundamental operation used to break a massive dataset into a "Training Set" and a "Testing Set", or to separate your Features (X) from your Target Labels (y).
The routines are np.split, np.hsplit (horizontal), and np.vsplit (vertical).
np.split accepts either an integer (the number of equal-sized sections you want) or a list of indices marking the split points. The examples here pass a list of split points.
The Golden Rule of Splitting:
N split points will always lead to N + 1 subarrays.
x = [1, 2, 3, 99, 99, 3, 2, 1]
# We pass two split points (index 3 and index 5).
# This results in 3 separate arrays.
x1, x2, x3 = np.split(x, [3, 5])
print(x1) # Elements up to index 3 (not inclusive): [1, 2, 3]
print(x2) # Elements from index 3 up to index 5: [99, 99]
print(x3) # Elements from index 5 to the end: [3, 2, 1]
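The N + 1 rule holds for any number of split points. The sketch below uses three split points, and also shows that np.split accepts a plain integer, in which case it produces that many equal-sized sections and raises an error if the length does not divide evenly. The names pieces and halves are illustrative.

```python
import numpy as np

x = np.array([1, 2, 3, 99, 99, 3, 2, 1])

# Three split points -> four pieces.
pieces = np.split(x, [2, 4, 6])
print(len(pieces))                       # 4
print([p.tolist() for p in pieces])      # [[1, 2], [3, 99], [99, 3], [2, 1]]

# An integer asks for that many equal-sized sections instead.
halves = np.split(x, 2)
print([h.tolist() for h in halves])      # [[1, 2, 3, 99], [99, 3, 2, 1]]

# Equal sections must divide the length exactly, otherwise ValueError.
try:
    np.split(x, 3)
    uneven_failed = False
except ValueError as exc:
    uneven_failed = True
    print("split failed:", exc)
```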
Splitting Multi-Dimensional Grids
The specialized directional splitters (vsplit and hsplit) are perfect for 2D matrices.
grid = np.arange(16).reshape((4, 4))
# grid is:
# [[ 0, 1, 2, 3],
# [ 4, 5, 6, 7],
# [ 8, 9, 10, 11],
# [12, 13, 14, 15]]
# Split vertically after the 2nd row (index 2)
upper, lower = np.vsplit(grid, [2])
print(upper)
# [[0 1 2 3]
# [4 5 6 7]]
print(lower)
# [[ 8 9 10 11]
# [12 13 14 15]]
# Split horizontally after the 2nd column (index 2)
left, right = np.hsplit(grid, [2])
print(left)
# [[ 0 1]
# [ 4 5]
# [ 8 9]
# [12 13]]
(Similarly, np.dsplit will split 3D arrays along the third axis, the depth.)
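To connect this back to the features/labels and train/test use cases mentioned at the start of this section, here is a minimal sketch on a made-up dataset. The layout (three feature columns followed by one label column) and the 4/2 row split are assumptions for illustration, and no shuffling is done here, although real workflows typically shuffle before splitting.

```python
import numpy as np

# Hypothetical dataset: 6 samples, 3 feature columns plus 1 label column.
dataset = np.arange(24).reshape((6, 4))

# Separate features (first 3 columns) from labels (last column).
X, y = np.hsplit(dataset, [3])
y = y.ravel()                       # flatten (6, 1) to (6,)
print(X.shape, y.shape)             # (6, 3) (6,)

# Keep the first 4 rows for training and the last 2 for testing.
X_train, X_test = np.vsplit(X, [4])
y_train, y_test = np.split(y, [4])
print(X_train.shape, X_test.shape)  # (4, 3) (2, 3)
print(y_test.tolist())              # [19, 23]
```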
Free Resources to Dive Deeper
Mastering these manipulations takes practice. If you want to test these exact functions and read more about the computer science behind them, check out these free resources:
Official NumPy Documentation - Indexing on ndarrays: The definitive guide to how NumPy handles complex slicing, indexing, and no-copy views.
Official NumPy Documentation - Array Manipulation Routines: A complete cheat sheet of every single function used to reshape, join, or split arrays.
Python Data Science Handbook - The Basics of NumPy Arrays: A fantastic, free, interactive Jupyter Notebook chapter that walks through these exact concatenation and splitting techniques.