PyTorch#

PyTorch is an open-source optimized tensor library for deep learning on GPUs and CPUs. It is a deep learning framework that provides a complete stack to preprocess data, model it, and deploy the models to the cloud. PyTorch leverages CUDA (Compute Unified Device Architecture), an API (Application Programming Interface) developed by Nvidia for general-purpose computing on GPUs (Graphics Processing Units). Running computations on a GPU (if one is available) accelerates the modelling process considerably.

import torch
import numpy as np
torch.__version__
'2.7.0+cu126'

What is a Tensor?#

Tensors are simply arrays used to represent data, similar to NumPy’s ndarrays, except that tensors can also run on GPUs and other hardware accelerators. In PyTorch, we use tensors to encode the inputs and outputs of a model, as well as the model’s parameters.

A scalar is a single number; in tensor terms it is a zero-dimensional tensor.

scalar = torch.tensor(1)
scalar
tensor(1)

Dimensions of a tensor

scalar.ndim
0

Converting a tensor to a Python number

scalar.item()
1

A vector is a one-dimensional tensor.

# Vector
vector = torch.tensor([7, 7])
vector
tensor([7, 7])
vector.ndim
1

A matrix is a two-dimensional tensor.

# Matrix
M = torch.tensor([[7, 8],
                  [9, 10]])
M
tensor([[ 7,  8],
        [ 9, 10]])
M.ndim
2

A tensor can have an arbitrary number of dimensions. Creating a three-dimensional tensor:

TENSOR = torch.tensor([[[1, 2, 3],
                        [4, 5, 6],
                        [7, 8, 9]]])
TENSOR.shape
torch.Size([1, 3, 3])
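The number of dimensions equals the number of nested bracket levels, and indexing peels off the outermost dimension. A quick sanity check on the TENSOR above:

TENSOR.ndim
3
# indexing removes the outer dimension, leaving the inner 3x3 matrix
TENSOR[0]
tensor([[1, 2, 3],
        [4, 5, 6],
        [7, 8, 9]])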

Random Tensors

random_tensor = torch.rand(3,4)
random_tensor
tensor([[0.2737, 0.8310, 0.8006, 0.9075],
        [0.5493, 0.3890, 0.8400, 0.7622],
        [0.9302, 0.1014, 0.2331, 0.7443]])

A random tensor with the same shape as an image tensor: the \(1^{st}\) dimension is the color channels, the \(2^{nd}\) is the height, and the \(3^{rd}\) is the width.

rand_image_tensor = torch.rand(size = (3,224,224))
rand_image_tensor
tensor([[[0.0630, 0.5679, 0.6759,  ..., 0.1176, 0.3749, 0.2978],
         [0.2935, 0.5852, 0.1575,  ..., 0.7756, 0.2829, 0.5329],
         [0.4083, 0.8007, 0.7083,  ..., 0.3091, 0.9156, 0.3664],
         ...,
         [0.6021, 0.0944, 0.5473,  ..., 0.9868, 0.7072, 0.1648],
         [0.8186, 0.7864, 0.2941,  ..., 0.5029, 0.9712, 0.1800],
         [0.5375, 0.7352, 0.7405,  ..., 0.3646, 0.8880, 0.7781]],

        [[0.6218, 0.1442, 0.1092,  ..., 0.1807, 0.3614, 0.6528],
         [0.0428, 0.0704, 0.7689,  ..., 0.8425, 0.2056, 0.6117],
         [0.4680, 0.7592, 0.1936,  ..., 0.0849, 0.1556, 0.4307],
         ...,
         [0.6817, 0.0646, 0.3813,  ..., 0.0628, 0.0771, 0.3918],
         [0.5976, 0.8478, 0.1157,  ..., 0.7134, 0.7329, 0.4841],
         [0.8081, 0.1711, 0.7638,  ..., 0.7552, 0.1960, 0.0996]],

        [[0.9566, 0.5724, 0.8422,  ..., 0.6180, 0.0205, 0.8487],
         [0.8026, 0.5053, 0.9624,  ..., 0.6132, 0.6074, 0.7101],
         [0.4665, 0.5141, 0.3195,  ..., 0.3887, 0.0969, 0.9523],
         ...,
         [0.3810, 0.6603, 0.6463,  ..., 0.0382, 0.3075, 0.6614],
         [0.3321, 0.7622, 0.5076,  ..., 0.0762, 0.7127, 0.6388],
         [0.7218, 0.9143, 0.9537,  ..., 0.3262, 0.7709, 0.4682]]])

Zeros and ones

zero_tensor = torch.zeros(size = (3,4))
zero_tensor
tensor([[0., 0., 0., 0.],
        [0., 0., 0., 0.],
        [0., 0., 0., 0.]])
one_tensor = torch.ones(size = (3,4))
one_tensor
tensor([[1., 1., 1., 1.],
        [1., 1., 1., 1.],
        [1., 1., 1., 1.]])

The default (floating-point) datatype in PyTorch is float32.

one_tensor.dtype
torch.float32

Creating a tensor over a range of values, and tensors-like (tensors that mirror another tensor’s shape)

range_tensor = torch.arange(0,10)
range_tensor
tensor([0, 1, 2, 3, 4, 5, 6, 7, 8, 9])
# creating tensors-like
torch.zeros_like(range_tensor)
tensor([0, 0, 0, 0, 0, 0, 0, 0, 0, 0])
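torch.arange also accepts a step argument, and ones_like follows the same pattern as zeros_like; a short sketch:

# arange with a step of 2
torch.arange(0, 10, 2)
tensor([0, 2, 4, 6, 8])
# a tensor of ones with the same shape and dtype as range_tensor
torch.ones_like(range_tensor)
tensor([1, 1, 1, 1, 1, 1, 1, 1, 1, 1])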

The three most important parameters when creating tensors are dtype, device, and requires_grad:

  1. Some datatypes are specific to GPUs and some to CPUs. Generally, if you see torch.cuda anywhere, the tensor is being used on a GPU (Nvidia GPUs use a computing toolkit called CUDA). The number of bits in a datatype (8, 16, 32) determines the precision of its values: higher precision means more detail, and hence more data, used to express each number. This matters in deep learning and numerical computing because so many operations are performed; the more detail you calculate on, the more compute you need. Lower-precision datatypes are therefore generally faster to compute on, but may sacrifice some performance on evaluation metrics such as accuracy (faster to compute but less accurate).

  2. The device argument refers to the device the tensor is stored on. If one of your tensors is on the CPU and another is on the GPU, you get an error when you perform a computation involving both.

  3. requires_grad controls whether gradients are tracked through operations on the tensor (a short autograd sketch follows the example below).

x_tensor = torch.tensor([1.0, 2.0, 3.0],
                        dtype=torch.float32,
                        device='cpu',
                        requires_grad=False)
print('Datatype of the tensor is ', x_tensor.dtype)
print('Device the tensor is saved on is ', x_tensor.device)
Datatype of the tensor is  torch.float32
Device the tensor is saved on is  cpu
# converting dtype of a tensor
x16_tensor = x_tensor.type(torch.float16)
x16_tensor.dtype
torch.float16
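To see what requires_grad actually enables, here is a minimal autograd sketch: with tracking on, PyTorch records the operations performed on the tensor and backward() computes the gradients.

g_tensor = torch.tensor([1.0, 2.0, 3.0], requires_grad=True)
y = (g_tensor ** 2).sum()  # y = 1 + 4 + 9 = 14
y.backward()               # computes dy/dx = 2x for each element
g_tensor.grad
tensor([2., 4., 6.])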

Basic operations#

# element-wise multiplication
torch.mul(x16_tensor,10)
x16_tensor * 10
tensor([10., 20., 30.], dtype=torch.float16)
torch.subtract(x16_tensor,2)
x16_tensor - 2
tensor([-1.,  0.,  1.], dtype=torch.float16)
torch.add(x16_tensor,3)
x16_tensor + 3
tensor([4., 5., 6.], dtype=torch.float16)
torch.divide(x16_tensor,2)
x16_tensor/2
tensor([0.5000, 1.0000, 1.5000], dtype=torch.float16)
rand_tensor = torch.rand(size = (3,3))
# you can use @ to perform matrix multiplication
rand_tensor @ rand_tensor
# you can use the predefined method
torch.matmul(rand_tensor,rand_tensor)
# mm is short for matmul
torch.mm(rand_tensor,rand_tensor)
tensor([[0.6276, 0.0443, 0.3012],
        [0.5305, 0.1390, 0.6947],
        [0.2866, 0.0977, 0.6306]])
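Matrix multiplication requires the inner dimensions to match: an (m, n) tensor can only be multiplied with an (n, p) tensor, giving an (m, p) result. Transposing one operand is the usual fix when shapes clash; a quick sketch:

A = torch.rand(size=(3, 2))
B = torch.rand(size=(3, 2))
# torch.matmul(A, B) raises a RuntimeError: inner dimensions (2 and 3) differ
torch.matmul(A, B.T).shape  # (3, 2) @ (2, 3) -> (3, 3)
torch.Size([3, 3])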

There are several other functions and methods that are more or less similar to those in NumPy.

In-place operations. Operations that store the result in the operand are called in-place; they are denoted by a trailing underscore (_) suffix.

rand_tensor.add_(5)
rand_tensor
tensor([[5.0711, 5.1179, 5.7007],
        [5.6805, 5.2314, 5.4194],
        [5.7740, 5.0124, 5.2883]])
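Other in-place variants follow the same naming pattern, e.g. mul_ and sub_. In-place operations save memory, but they can interfere with autograd’s recorded history, so avoid them on tensors that require gradients. A small deterministic sketch:

t = torch.ones(3)
t.mul_(2)  # multiply in place: same memory, values doubled
t.sub_(1)  # subtract in place
t
tensor([1., 1., 1.])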

PyTorch and NumPy#

Transforming a NumPy array into a PyTorch tensor

arr = np.arange(1.0,10.0)
arr_tensor = torch.from_numpy(arr)
arr, arr_tensor
(array([1., 2., 3., 4., 5., 6., 7., 8., 9.]),
 tensor([1., 2., 3., 4., 5., 6., 7., 8., 9.], dtype=torch.float64))
arr.dtype, arr_tensor.dtype
(dtype('float64'), torch.float64)

Converting a tensor to a NumPy array

tensor_numpy = arr_tensor.numpy()
tensor_numpy
array([1., 2., 3., 4., 5., 6., 7., 8., 9.])
tensor_numpy.dtype
dtype('float64')
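One subtlety worth knowing: torch.from_numpy and Tensor.numpy() share the underlying memory on the CPU, so mutating one side is visible on the other. A quick demonstration:

shared = np.ones(3)
shared_tensor = torch.from_numpy(shared)
shared[0] = 5.0  # mutate the NumPy side
shared_tensor    # the change shows up on the tensor side
tensor([5., 1., 1.], dtype=torch.float64)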

Setting a random seed. When working in a Jupyter notebook, you should set the random seed every time you use randomness; otherwise, re-running cells out of order produces different results.

torch.manual_seed(42)
<torch._C.Generator at 0x7f423815c510>
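Reseeding before each random call makes the pseudo-random sequence reproducible, as a quick check confirms:

torch.manual_seed(42)
a = torch.rand(2, 2)
torch.manual_seed(42)
b = torch.rand(2, 2)
torch.equal(a, b)
True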

Accessing a GPU#

# configuration of the GPU
!nvidia-smi
/bin/bash: line 1: nvidia-smi: command not found

Check for GPU access

# if False it means GPU is not available
torch.cuda.is_available()
False

It’s not likely that you will always have access to a GPU, so we can write device-agnostic code as below.

# setup device agnostic code
device = 'cuda' if torch.cuda.is_available() else 'cpu'
device
'cpu'
# count number of devices
torch.cuda.device_count()
0

Since I am using a Mac, I can check what type of accelerator I have.

if torch.accelerator.is_available():
    device = torch.accelerator.current_accelerator()
    print(device)

MPS in the context of PyTorch refers to the Metal Performance Shaders backend.

  • Metal: This is Apple’s proprietary API (Application Programming Interface) for programming their GPUs (Graphics Processing Units). It provides low-level access to the graphics hardware for maximum performance in graphics rendering and parallel computations.

  • Metal Performance Shaders (MPS): This is a framework built on top of Metal. It’s a collection of highly optimized compute and graphics shaders specifically designed to integrate into applications using the Metal API. These shaders are fine-tuned for the unique characteristics of Apple’s GPUs (found in Apple Silicon and some older AMD-based Macs).

  • MPS Backend in PyTorch: PyTorch has integrated MPS as a backend, allowing you to run tensor computations and train neural networks on Apple GPUs. By moving your PyTorch tensors and models to an “mps” device (e.g., torch.device("mps")), you can leverage the power of the Apple GPU for significantly faster computations compared to running on the CPU.

Key benefits of using the MPS backend in PyTorch:

  • Accelerated Training and Inference: Utilizing the GPU’s parallel processing capabilities can drastically reduce the time required for training complex models and performing inference.

  • Optimized Performance: MPS is specifically designed for Apple’s hardware, meaning the operations are highly optimized for their architecture.

  • Ease of Use: PyTorch’s integration allows you to switch to GPU computation with minimal code changes, similar to using CUDA on NVIDIA GPUs.

In summary, MPS in PyTorch enables high-performance training and inference on Apple Silicon and compatible AMD GPUs by utilizing Apple’s Metal Performance Shaders framework.
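The device-agnostic pattern from above can be extended to prefer MPS when CUDA is absent. A sketch (the variable name preferred_device is illustrative; torch.backends.mps.is_available() reports whether the Metal backend is usable):

# prefer CUDA, then MPS, then fall back to the CPU
if torch.cuda.is_available():
    preferred_device = 'cuda'
elif torch.backends.mps.is_available():
    preferred_device = 'mps'  # taken on Apple Silicon Macs
else:
    preferred_device = 'cpu'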

Putting Tensors and Models on the GPUs#

cpu_tensor = torch.tensor([1,2,3], device=device)
cpu_tensor.device
device(type='cpu')
# use 'to' to change the device; 'to' returns a copy on the target device, so reassign the result.
# In case a GPU is available, the device shows as cuda with the index of the GPU used.
cpu_tensor = cpu_tensor.to(device)
cpu_tensor
tensor([1, 2, 3])

Converting a tensor that lives on the GPU directly to NumPy is not possible. The tensor has to be moved to the CPU first and then converted to a NumPy array.

cpu_tensor.cpu().numpy()
array([1, 2, 3])
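The same .to(device) call moves a whole model, transferring all of its parameters at once. A minimal sketch with a built-in linear layer; note that on a module, unlike on a tensor, .to moves the parameters in place:

model = torch.nn.Linear(in_features=2, out_features=1)
model.to(device)  # moves every parameter to the target device
next(model.parameters()).device
device(type='cpu')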

Preparing and converting dataset#

The nn module of torch provides all the building blocks for neural networks.

from torch import nn
# subclass nn.Module, which contains all the building blocks required
class LinearRegression(nn.Module):
    def __init__(self, *args, **kwargs) -> None:
        super().__init__(*args, **kwargs)
        # randomly initialized parameters that training will adjust
        self.weights = nn.Parameter(torch.randn(1, requires_grad=True, dtype=torch.float))
        self.bias = nn.Parameter(torch.randn(1, requires_grad=True, dtype=torch.float))

    # forward defines the computation performed at every call
    def forward(self, x: torch.Tensor):
        return x * self.weights + self.bias
torch.manual_seed(42)
la = LinearRegression()
list(la.parameters())
[Parameter containing:
 tensor([0.3367], requires_grad=True),
 Parameter containing:
 tensor([0.1288], requires_grad=True)]
# list of named parameters
la.state_dict()
OrderedDict([('weights', tensor([0.3367])), ('bias', tensor([0.1288]))])
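With the parameters in place, calling the model runs forward. A quick sketch that passes a small input through la; inference_mode disables gradient tracking, which speeds up evaluation:

X = torch.tensor([1.0, 2.0, 3.0])
with torch.inference_mode():
    preds = la(X)
# preds = X * 0.3367 + 0.1288, element-wise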