Tensors: The Data Containers of Machine Learning

If you are diving into Machine Learning, you will immediately encounter the word "Tensor." From TensorFlow to PyTorch, everything revolves around them. But what exactly is a tensor?
At its simplest, a tensor is a container for storing numbers. While humans understand data through text, images, or sounds, machine learning models only understand numbers. Tensors are the standardized mathematical structures we use to organize these numbers so algorithms can process them.
Before we look at the different types of tensors, let's define three critical terms you will see everywhere in ML: Rank, Axes, and Shape.
The Anatomy of a Tensor: Rank, Axes, and Shape
Axis (plural: Axes): A specific dimension of a tensor. For example, a spreadsheet has two axes: rows and columns.
Rank (or Number of Dimensions): The total number of axes a tensor has. Number of Axes = Rank = Dimension of the Tensor.
Shape: A tuple (a sequence of numbers) that tells us exactly how many elements exist along each axis.
Size: The total number of individual elements inside the tensor. You calculate this by multiplying all the values in the shape together (e.g., a shape of (3, 4) has a size of 12).
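As a quick sketch, here is how NumPy reports each of these properties for a tensor with the shape (3, 4) mentioned above:

```python
import numpy as np

# A tensor with 2 axes: 3 elements along the first, 4 along the second
t = np.zeros((3, 4))

print("Rank (number of axes):", t.ndim)  # 2
print("Shape:", t.shape)                 # (3, 4)
print("Size (3 * 4):", t.size)           # 12
```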
Let's explore tensors from the simplest 0D point to the massive 5D structures used in advanced AI.
0D Tensors: The Scalar
A 0D (Zero-Dimensional) tensor is known as a Scalar. It stores a single, isolated numeric value. It has zero axes, zero rank, and an empty shape.
Think of it as a single point of data, like the temperature outside right now: 32.
import numpy as np
# Creating a 0D tensor (Scalar)
scalar_tensor = np.array(3)
print("Value:", scalar_tensor)
print("Number of Dimensions (Rank):", scalar_tensor.ndim)
print("Shape:", scalar_tensor.shape)
# Output:
# Value: 3
# Number of Dimensions (Rank): 0
# Shape: ()
1D Tensors: The Vector
When you group multiple scalars together into a list, you create a 1D tensor, commonly called a Vector (or a 1D array). It has exactly one axis.
⚠️ A Crucial Distinction: There is a common trap here! If a vector has 4 elements (like [1, 2, 3, 4]), mathematicians often call it a "4-dimensional vector" because it exists in a 4D space. However, in Machine Learning, this is still a 1D tensor. The tensor dimension (rank) is 1 because it only has one axis, even though that axis contains 4 elements.
import numpy as np
# Creating a 1D tensor (Vector)
vector_tensor = np.array([1, 2, 3, 4])
print("Value:", vector_tensor)
print("Number of Dimensions (Rank):", vector_tensor.ndim)
print("Shape:", vector_tensor.shape) # Notice it has 4 elements on its 1 axis
# Output:
# Value: [1 2 3 4]
# Number of Dimensions (Rank): 1
# Shape: (4,)
2D Tensors: The Matrix
If you group multiple vectors together, you get a 2D tensor, known as a Matrix. A matrix has two axes: rows and columns. This is exactly how data looks in a standard Excel spreadsheet or a CSV file.
import numpy as np
# Creating a 2D tensor (Matrix)
matrix_tensor = np.array([
[1, 2, 3],
[4, 5, 6]
])
print("Number of Dimensions (Rank):", matrix_tensor.ndim)
print("Shape:", matrix_tensor.shape) # 2 rows, 3 columns
print("Size (Total elements):", matrix_tensor.size) # 2 * 3 = 6
# Output:
# Number of Dimensions (Rank): 2
# Shape: (2, 3)
# Size (Total elements): 6
3D Tensors to 5D Tensors
As we keep grouping lower-dimensional tensors, we build higher-dimensional ones. The logic remains the same:
3D Tensor: A grouping of 2D matrices. Visually, think of this as a cube or a "cuboid" of numbers. It has a row, a column, and a depth axis.
4D Tensor: A grouping (or vector) of 3D tensors.
5D Tensor: A grouping (or matrix) of 4D tensors.
While theoretically, you can have infinite dimensions, in everyday Machine Learning, 5D or 6D is usually the absolute maximum we deal with. Let's look at how these translate to real-world data.
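The nesting pattern above can be sketched directly in NumPy. The shapes here are arbitrary placeholders chosen just to show how each rank wraps the one below it:

```python
import numpy as np

# Each rank is built by grouping tensors of the rank below it
tensor_3d = np.zeros((2, 3, 4))        # 2 matrices, each of shape (3, 4)
tensor_4d = np.zeros((5, 2, 3, 4))     # 5 of the 3D tensors above
tensor_5d = np.zeros((6, 5, 2, 3, 4))  # 6 of the 4D tensors above

print("Ranks:", tensor_3d.ndim, tensor_4d.ndim, tensor_5d.ndim)  # Ranks: 3 4 5
```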
Real-World Examples: What Data Looks Like at Each Dimension
It is much easier to understand tensors when you map them to actual data domains.
1D & 2D Tensors: Standard Tabular Data
1D Example: A single row of data about a house (e.g., [bedrooms, bathrooms, square_feet, price]).
2D Example: A full dataset of 1,000 houses. The shape would be (1000, 4). This is a 2D tensor because it is a collection of 1,000 1D vectors.
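A minimal sketch of this tabular layout, using one made-up house repeated as a stand-in for real data:

```python
import numpy as np

# One house: a 1D vector of 4 features (bedrooms, bathrooms, square_feet, price)
one_house = np.array([3, 2, 1500, 250000])

# A dataset of 1,000 houses: stacking 1,000 such vectors gives a 2D tensor
dataset = np.tile(one_house, (1000, 1))

print("One house shape:", one_house.shape)  # (4,)
print("Dataset shape:", dataset.shape)      # (1000, 4)
print("Dataset rank:", dataset.ndim)        # 2
```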
3D Tensors: Natural Language Processing (NLP)
In NLP, we convert text into numbers (vectorization) so the model can read it. Imagine we have a batch of sentences.
We have 128 sentences in our batch.
We standardize each sentence to be exactly 50 words long (sequence length).
Every single word is converted into a vector of 300 numbers (word embeddings) to capture its meaning.
The resulting data structure is a 3D tensor with the shape: (128, 50, 300). It is a collection of 2D matrices (where each matrix represents a single sentence).
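The batch described above can be sketched with random numbers standing in for real word embeddings:

```python
import numpy as np

batch_size, seq_len, embed_dim = 128, 50, 300

# Random stand-in values for a batch of embedded sentences
nlp_batch = np.random.rand(batch_size, seq_len, embed_dim)

print("Batch shape:", nlp_batch.shape)                     # (128, 50, 300)
print("One sentence (a 2D matrix):", nlp_batch[0].shape)   # (50, 300)
```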
4D Tensors: Computer Vision (Images)
Images are essentially grids of pixels, and every pixel is a numeric value. If you have a standard color image (RGB), it actually has 3 layers of color (Red, Green, and Blue channels).
An image with a resolution of 1200x800 pixels is stored as (3 channels, 1200 height, 800 width). In ML, we rarely process one image at a time. We process batches. If we load a batch of 32 images, our tensor becomes 4D: (32, 3, 1200, 800).
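The same shapes in code, with zeros as placeholder pixel values:

```python
import numpy as np

# A single RGB image: (channels, height, width)
image = np.zeros((3, 1200, 800))

# A batch of 32 such images: a 4D tensor
batch = np.zeros((32, 3, 1200, 800))

print("Single image rank:", image.ndim)  # 3
print("Batch rank:", batch.ndim)         # 4
print("Batch shape:", batch.shape)       # (32, 3, 1200, 800)
```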
5D Tensors: Video Processing
Videos are just sequences of images (frames) playing at a very fast rate. Because a single image is 3D (Channels, Height, Width), a single video becomes a 4D tensor (Frames, Channels, Height, Width).
Let's break down the math for a 5D tensor involving a batch of videos:
Resolution: Let's take a 480p video (480 pixels tall by 720 pixels wide).
Color: It's RGB, so 3 channels.
Time: A 60-second video at 30 frames per second (fps) contains \(60 \times 30 = 1,800\) frames.
- One Single Video Tensor Shape: (1800, 3, 480, 720) -> This is a 4D tensor.
Now, if we want to train our model on a batch of 4 videos at the same time, we group them together into a 5D tensor:
- Final 5D Tensor Shape: (4, 1800, 3, 480, 720)
The Memory Cost: This is where things get heavy! Let's calculate the size. \(4 \text{ videos} \times 1800 \text{ frames} \times 3 \text{ channels} \times 480 \text{ height} \times 720 \text{ width} = 7,464,960,000\) individual numeric elements. If we store each number as a standard 32-bit float (which takes 4 bytes of memory), this single 5D tensor will consume roughly 29.9 Gigabytes of RAM. This is exactly why training video-based AI requires incredibly powerful GPUs!
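We can verify this arithmetic in a few lines of Python:

```python
import math

# Shape of the 5D batch: (videos, frames, channels, height, width)
shape = (4, 1800, 3, 480, 720)

total_elements = math.prod(shape)
bytes_needed = total_elements * 4  # float32 = 4 bytes per element
gigabytes = bytes_needed / 1e9

print("Total elements:", total_elements)              # 7464960000
print("Approx. memory:", round(gigabytes, 1), "GB")   # 29.9 GB
```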
Free Resources to Learn More
If you want to dig deeper into tensors and practice manipulating them in code, here are some excellent free resources:
NumPy Quickstart Tutorial: NumPy is the foundational library for tensor/array math in Python. Their official guide on array basics is fantastic.
PyTorch "Tensors" Tutorial: PyTorch is an industry-standard ML framework. This short, interactive tutorial shows exactly how tensors are used directly in machine learning.
TensorFlow Core: Introduction to Tensors: Google's deep dive into how their framework handles multidimensional arrays, complete with visual diagrams.
Tech Is Exhilarating!