Internal Implementation of Tensors in PyTorch

Tensors serve as the fundamental data structure in PyTorch, enabling efficient computation and manipulation of multi-dimensional arrays. Understanding the underlying implementation of tensors is crucial for comprehending PyTorch's computational graph execution, automatic differentiation, and GPU acceleration capabilities.

In this OpenGenus article, we'll delve into the internals of how tensors are implemented in PyTorch, referencing code from several files in the PyTorch source code repository.

Key Terminologies
PyTorch's tensor implementations
i. TensorImpl.h
ii. TensorOptions.h
iii. StorageImpl.h
Conclusion

Key Terminologies

The following terms appear throughout the PyTorch files discussed below:

Gradients:
A gradient is the vector of partial derivatives of the loss with respect to the model's parameters. It points in the direction in which the loss increases fastest, so adjusting the parameters in the opposite direction reduces the loss and improves the model's predictions. In PyTorch, the autograd engine computes these gradients automatically, so when we train a neural network it handles the calculus behind the scenes and we only need to define the forward computation.
Backpropagation:
The backpropagation algorithm efficiently computes gradients within a computational graph. It operates in two phases: the forward pass, where input data is used to make predictions, and the backward pass, where gradients are calculated by applying the chain rule from the output layer back through the network. These gradients are subsequently utilized to adjust network parameters via optimization techniques such as Adagrad, stochastic gradient descent (SGD), etc.
Loss Function:
A loss function quantifies the difference between the model's predicted outputs and the actual targets. Common loss functions include Mean Squared Error (MSE) and cross-entropy loss.
Optimization Algorithms:
Optimization algorithms reduce prediction error by adjusting model parameters such as weights and biases. They move the parameters in the direction opposite to the gradient of the loss function, ideally converging towards parameter values that minimize the loss.
Tensor Options:
Tensor options within PyTorch enable users to define different settings when initializing tensors. These settings encompass device (CPU or GPU), dtype (data type), requires_grad (tracking gradients), and layout (memory arrangement). By adjusting these tensor options, users can manage tensor storage location, memory representation, and participation in gradient calculations.
Sparse Matrices:
Sparse matrices are matrices in which most entries are zero. Storing and manipulating large matrices full of zeros in dense form wastes memory and computation, so sparse formats such as Coordinate (COO) and Compressed Sparse Row (CSR) store only the non-zero entries. COO stores each non-zero value together with its row and column indices. CSR stores the non-zero values and their column indices, plus a compressed array of row pointers that records where each row's entries begin, so it needs only one row pointer per row (plus one) instead of one row index per value.
Metal Backend for Apple Devices:
PyTorch supports running models on Metal for acceleration on Apple devices. Metal is a low-level GPU programming framework used on Apple platforms (iOS, macOS, tvOS).
MKL-DNN:
MKL-DNN (Math Kernel Library for Deep Neural Networks, since renamed oneDNN) is an Intel library of optimized deep learning primitives. PyTorch uses it to speed up certain operations on CPUs, and tensors stored in the MKL-DNN format use a memory arrangement tuned for these kernels.
Autograd:
Autograd is PyTorch's automatic differentiation engine. It computes the gradients of a scalar value (typically the loss) with respect to the tensors that produced it, which lets PyTorch calculate gradients automatically for any computational graph built from PyTorch operations. This is what makes backpropagation, and therefore gradient-based optimization, possible without hand-written derivatives.
Quantized Tensor:
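Quantized tensors store data as reduced-precision integers (for example, 8-bit) instead of floating-point numbers, together with quantization parameters such as a scale and a zero point that map the integers back to approximate real values. This reduces memory usage and can improve performance, especially on hardware with optimized support for integer operations.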
Strided Tensor:
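A strided tensor (PyTorch's default layout, torch.strided) stores its elements in a single block of memory and locates each one using a stride per dimension: the number of elements to skip in memory to move one position along that dimension. The element at index (i0, i1, ..., in) is found at storage_offset + i0*stride0 + i1*stride1 + ... + in*striden. Because the strides can be arbitrary, a strided tensor is not necessarily contiguous; a transposed view, for example, shares memory with the original tensor but has its strides swapped.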
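The stride arithmetic can be sketched in a few lines of C++. The helper names below are hypothetical, not PyTorch functions: the sketch computes the strides of a contiguous (row-major) tensor and the storage position of an element, which is enough to see that a transposed view addresses the same storage with swapped strides.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Strides of a contiguous (row-major) tensor: the last dimension has stride 1,
// and each earlier stride is the product of all later sizes.
std::vector<int64_t> contiguous_strides(const std::vector<int64_t>& sizes) {
  std::vector<int64_t> strides(sizes.size());
  int64_t running = 1;
  for (std::size_t i = sizes.size(); i-- > 0;) {
    strides[i] = running;
    // Treat empty dimensions as size 1 so earlier strides stay non-zero.
    running *= std::max(sizes[i], int64_t{1});
  }
  return strides;
}

// Position of the element at `index` within the flat storage buffer:
// storage_offset + sum over d of index[d] * strides[d].
int64_t storage_position(const std::vector<int64_t>& index,
                         const std::vector<int64_t>& strides,
                         int64_t storage_offset = 0) {
  assert(index.size() == strides.size());
  int64_t pos = storage_offset;
  for (std::size_t d = 0; d < index.size(); ++d) {
    pos += index[d] * strides[d];
  }
  return pos;
}
```

For a 2×3 tensor the strides are {3, 1}, so element (1, 2) sits at position 1*3 + 2*1 = 5. Its transposed 3×2 view has strides {1, 3} and reaches the same position for index (2, 1), without moving any data.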

PyTorch's Tensor implementations

Let us explore the core classes behind PyTorch tensors (TensorImpl, TensorOptions and StorageImpl) and the methods they provide for describing a tensor's layout, device, gradient settings and memory.

TensorImpl.h

The TensorImpl.h file contains the implementation details for the TensorImpl class, which represents the internal representation of a tensor in PyTorch. Let's explore the key components of this file and how they contribute to the functionality of PyTorch tensors.

Functions

is_sparse_csr() const and is_sparse_compressed() const:

bool is_sparse_csr() const
  { return layout() == kSparseCsr; }

This function checks whether a tensor is in the Compressed Sparse Row (CSR) format or not. We simply compare the layout of the tensor with the CSR layout and return true if they match, indicating that the tensor is in CSR format.

bool is_sparse_compressed() const { return key_set_.has_all(c10::sparse_csr_ks); }

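is_sparse_compressed() checks whether a tensor uses any compressed sparse format: CSR (Compressed Sparse Row), CSC (Compressed Sparse Column), BSR (Block Compressed Sparse Row) or BSC (Block Compressed Sparse Column). Unlike is_sparse_csr(), it does not compare the layout; instead it checks whether the tensor's dispatch key set (key_set_) contains all the keys in sparse_csr_ks, the key set used by these compressed formats.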
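To see what the CSR layout actually stores, here is a small C++ sketch that converts a dense matrix into both COO and CSR form. The Coo/Csr structs and the to_coo/to_csr helpers are hypothetical names for illustration; PyTorch's own CSR tensors expose the corresponding three arrays through crow_indices(), col_indices() and values().

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

using Dense = std::vector<std::vector<double>>;

// Coordinate (COO) format: one (row, column, value) triple per non-zero entry.
struct Coo {
  std::vector<int64_t> rows;
  std::vector<int64_t> cols;
  std::vector<double> values;
};

// Compressed Sparse Row (CSR): values and their column indices, plus row pointers.
// row_ptr has (number of rows + 1) entries; the non-zeros of row r occupy
// positions [row_ptr[r], row_ptr[r + 1]) of cols and values.
struct Csr {
  std::vector<int64_t> row_ptr;
  std::vector<int64_t> cols;
  std::vector<double> values;
};

Coo to_coo(const Dense& m) {
  Coo out;
  for (std::size_t r = 0; r < m.size(); ++r) {
    for (std::size_t c = 0; c < m[r].size(); ++c) {
      if (m[r][c] != 0.0) {
        out.rows.push_back(static_cast<int64_t>(r));
        out.cols.push_back(static_cast<int64_t>(c));
        out.values.push_back(m[r][c]);
      }
    }
  }
  return out;
}

Csr to_csr(const Dense& m) {
  Csr out;
  out.row_ptr.push_back(0);  // row 0 starts at position 0
  for (const auto& row : m) {
    for (std::size_t c = 0; c < row.size(); ++c) {
      if (row[c] != 0.0) {
        out.cols.push_back(static_cast<int64_t>(c));
        out.values.push_back(row[c]);
      }
    }
    // Each row pointer is the running count of non-zeros seen so far.
    out.row_ptr.push_back(static_cast<int64_t>(out.values.size()));
  }
  return out;
}
```

For example, the 3×3 matrix with rows {0, 5, 0}, {0, 0, 0} and {7, 0, 8} yields row_ptr = {0, 1, 1, 3}: the empty middle row starts and ends at the same position, so it costs one integer instead of any stored values.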

is_mkldnn() const:

  bool is_mkldnn() const {
    return key_set_.has_all(c10::mkldnn_ks);
  }

This function returns whether a tensor is stored in the MKL-DNN (Math Kernel Library for Deep Neural Networks) format or not.
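Checks such as is_mkldnn() and is_sparse_compressed() come down to membership tests on the tensor's dispatch key set. The C++ sketch below models that idea with a 64-bit mask. The Key enum, KeySet class, mkldnn_ks constant and is_mkldnn() function here are hypothetical simplifications; the real c10::DispatchKeySet encodes backend and functionality keys in a more involved way. has() tests a single key, while has_all() tests that every key of another set is present.

```cpp
#include <cassert>
#include <cstdint>
#include <initializer_list>

// Hypothetical, heavily trimmed set of dispatch keys.
enum class Key : uint8_t { CPU, CUDA, SparseCsr, Mkldnn, Quantized, NestedTensor };

class KeySet {
 public:
  KeySet() = default;
  KeySet(std::initializer_list<Key> keys) {
    for (Key k : keys) bits_ |= bit(k);
  }

  // True if this single key is present (compare key_set_.has(...)).
  bool has(Key k) const { return (bits_ & bit(k)) != 0; }

  // True if every key in `other` is present (compare key_set_.has_all(...)).
  bool has_all(KeySet other) const { return (bits_ & other.bits_) == other.bits_; }

 private:
  static uint64_t bit(Key k) { return uint64_t{1} << static_cast<uint8_t>(k); }
  uint64_t bits_ = 0;
};

// Hypothetical stand-in for c10::mkldnn_ks and TensorImpl::is_mkldnn().
const KeySet mkldnn_ks{Key::Mkldnn};
bool is_mkldnn(const KeySet& key_set) { return key_set.has_all(mkldnn_ks); }
```

Because each key is one bit, both checks cost a single AND and compare, which is why such predicates can be called freely on hot paths.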

AutogradMetaInterface, AutogradMetaFactory and AutogradMetaFactoryRegisterer:

i. AutogradMetaInterface:
This is a pure virtual class defining an interface for autograd metadata. It provides methods to interact with autograd-related properties of tensors.

set_requires_grad(bool requires_grad, at::TensorImpl* self_impl): sets whether gradients should be computed for the tensor.
mutable_grad() and grad() const: return a mutable reference and a constant reference, respectively, to the gradient tensor associated with this tensor.
set_fw_grad(const at::TensorBase& new_grad, const at::TensorBase& self, uint64_t level, bool is_inplace_op) and fw_grad(uint64_t level, const at::TensorBase& self) const: set and return, respectively, the forward-mode gradient of the tensor for a given forward AD level.

ii. AutogradMetaFactory:
This struct defines an interface for creating instances of AutogradMetaInterface.

make() const is a virtual method to create a new instance of AutogradMetaInterface.
undefined_tensor() const returns a reference to an undefined tensor. This method provides a placeholder tensor for cases where a tensor is expected but not defined.

iii. AutogradMetaFactoryRegisterer:
This struct is used for registering a factory that creates instances of AutogradMetaInterface.
It takes a pointer to an AutogradMetaFactory instance in its constructor and sets it using SetAutogradMetaFactory.
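The interface/factory/registerer trio follows a registration pattern: low-level code creates objects through an abstract factory, and a higher-level library plugs in the concrete implementation by constructing a registerer during static initialization. A minimal C++ sketch of that pattern follows; every name in it (MetaInterface, MetaFactory, meta_factory_slot and so on) is hypothetical and far simpler than PyTorch's AutogradMeta machinery.

```cpp
#include <cassert>
#include <memory>

// Hypothetical interface, standing in for AutogradMetaInterface.
struct MetaInterface {
  virtual ~MetaInterface() = default;
  virtual void set_requires_grad(bool value) = 0;
  virtual bool requires_grad() const = 0;
};

// Hypothetical factory interface, standing in for AutogradMetaFactory.
struct MetaFactory {
  virtual ~MetaFactory() = default;
  virtual std::unique_ptr<MetaInterface> make() const = 0;
};

// A single global slot holding the registered factory (initially empty).
MetaFactory*& meta_factory_slot() {
  static MetaFactory* factory = nullptr;
  return factory;
}

// Constructing one of these registers a factory, like AutogradMetaFactoryRegisterer.
struct MetaFactoryRegisterer {
  explicit MetaFactoryRegisterer(MetaFactory* f) { meta_factory_slot() = f; }
};

// --- A "higher-level" library supplies the concrete implementation. ---
struct SimpleMeta : MetaInterface {
  bool flag = false;
  void set_requires_grad(bool value) override { flag = value; }
  bool requires_grad() const override { return flag; }
};

struct SimpleMetaFactory : MetaFactory {
  std::unique_ptr<MetaInterface> make() const override {
    return std::make_unique<SimpleMeta>();
  }
};

// These globals are constructed before main(), so the factory is registered
// before any code asks meta_factory_slot() for it.
SimpleMetaFactory simple_factory;
MetaFactoryRegisterer register_simple_factory(&simple_factory);
```

The benefit of the pattern is a one-way dependency: code that only sees MetaInterface and MetaFactory can create metadata objects without linking against the library that implements them.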

device_type() const :

DeviceType device_type() const {
    TORCH_CHECK(
        device_opt_.has_value(),
        "device_type cannot be run on undefined Tensor");
    return (*device_opt_).type();
}

device_type() returns the device type of the tensor. It first uses TORCH_CHECK to verify that the tensor has a device (device_opt_ holds a value), which is not the case for an undefined tensor, and raises an error otherwise.

is_metal() const:
This function checks whether the tensor's device is Metal (a graphics API for macOS and iOS).
The condition below handles tensors with a custom device policy (device_policy_ is set), meaning the tensor implementation customizes how its device is reported. In that case, the device is obtained through device_custom() and its own is_metal() check is used.
if (C10_UNLIKELY(device_policy_)) { return device_custom().is_metal(); }

If there's no custom device policy specified, it checks if the tensor has a device specified (device_opt_.has_value()) and whether that device's type is Metal (device_opt_->type() == kMetal).
return device_opt_.has_value() && device_opt_->type() == kMetal;
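The device_type() guard and the non-custom branch of is_metal() can be mimicked with std::optional. The ToyTensor struct and DeviceType enum below are hypothetical; the sketch throws std::runtime_error where PyTorch's TORCH_CHECK throws c10::Error, and it leaves out the custom device policy branch.

```cpp
#include <cassert>
#include <optional>
#include <stdexcept>

// Hypothetical device types for illustration.
enum class DeviceType { CPU, CUDA, Metal };

struct ToyTensor {
  std::optional<DeviceType> device_opt;  // empty for an undefined tensor

  DeviceType device_type() const {
    // Mirrors the TORCH_CHECK guard: refuse to answer for an undefined tensor.
    if (!device_opt.has_value()) {
      throw std::runtime_error("device_type cannot be run on undefined Tensor");
    }
    return *device_opt;
  }

  // Mirrors the non-custom branch of is_metal(): no device means "not Metal".
  bool is_metal() const {
    return device_opt.has_value() && *device_opt == DeviceType::Metal;
  }
};
```

Note the asymmetry: asking an undefined tensor for its device type is an error, while asking whether it is on Metal simply answers false.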

is_quantized() const:
This function checks whether the tensor is quantized by testing whether its dispatch key set (key_set_) contains all the keys in quantized_ks. If it does, the function returns true; otherwise, it returns false.
return key_set_.has_all(quantized_ks);
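is_quantized() only reports whether a tensor is quantized; the integer values themselves come from a quantization scheme. A common one is affine quantization with a scale and a zero point, which, as we understand it, is the scheme behind PyTorch's per-tensor affine quantization (torch.quantize_per_tensor with torch.quint8). The sketch below implements that formula in plain C++; the function names are hypothetical, and the exact rounding and overflow behaviour of PyTorch's kernels is not reproduced.

```cpp
#include <algorithm>
#include <cassert>
#include <cmath>
#include <cstdint>

// Affine quantization to an unsigned 8-bit range [0, 255]:
//   q = clamp(round(x / scale) + zero_point, 0, 255)
uint8_t quantize(float x, float scale, int32_t zero_point) {
  assert(scale > 0.0f);
  // Round and clamp in floating point first so out-of-range inputs
  // never overflow the integer conversion.
  float q = std::nearbyint(x / scale) + static_cast<float>(zero_point);
  q = std::clamp(q, 0.0f, 255.0f);
  return static_cast<uint8_t>(q);
}

// Inverse mapping: x is approximately (q - zero_point) * scale.
float dequantize(uint8_t q, float scale, int32_t zero_point) {
  return static_cast<float>(static_cast<int32_t>(q) - zero_point) * scale;
}
```

Values far outside the representable range saturate at 0 or 255; that loss of range and precision is the price paid for storing one byte per element instead of four.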
is_nested() const:

bool is_nested() const {
    return key_set_.has(DispatchKey::NestedTensor);
  }

is_nested() checks whether the tensor carries the NestedTensor dispatch key, that is, whether it is a nested tensor. A nested tensor packs a list of tensors that may have different shapes (for example, variable-length sequences) into a single tensor object.

support_as_strided() const:

 inline bool support_as_strided() const {
    if (is_nested()) {
      return false;
    }
    if (key_set_.has(DispatchKey::Functionalize)) {
      return false;
    }
    return device().supports_as_strided();
  }

The support_as_strided() method checks if the tensor supports the as_strided() operation, which creates a new view of the tensor with different shape and strides without copying the data. It returns true if the tensor is not nested, does not have the Functionalize dispatch key, and if its device supports the operation; otherwise, it returns false.
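What as_strided() does can be sketched as a non-owning view that reads a shared buffer through its own sizes, strides and offset. The View2D struct below is a hypothetical two-dimensional illustration, not PyTorch code. With sizes {4, 3} and strides {1, 1} over a six-element buffer, each row is a sliding window of three consecutive elements, roughly what torch.as_strided(x, (4, 3), (1, 1)) produces for a six-element tensor x.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// A hypothetical non-owning 2-D view: it reads a shared buffer through
// sizes/strides/offset instead of copying, which is the idea behind as_strided().
struct View2D {
  const std::vector<float>* storage;
  int64_t sizes[2];
  int64_t strides[2];
  int64_t offset;

  float at(int64_t i, int64_t j) const {
    assert(0 <= i && i < sizes[0] && 0 <= j && j < sizes[1]);
    const int64_t pos = offset + i * strides[0] + j * strides[1];
    assert(0 <= pos && pos < static_cast<int64_t>(storage->size()));
    return (*storage)[static_cast<std::size_t>(pos)];
  }
};
```

Neighbouring rows of this view overlap in memory, so several view elements alias the same storage element; no data is copied to build the windows.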

is_contiguous():

bool is_contiguous(at::MemoryFormat memory_format = at::MemoryFormat::Contiguous) const {
    if (C10_UNLIKELY(matches_policy(SizesStridesPolicy::CustomStrides))) {
        return is_contiguous_custom(memory_format);
    }
    return is_contiguous_default(memory_format);
}

The above snippet defines a method is_contiguous() that checks whether the tensor is laid out contiguously in memory for the requested memory_format (by default MemoryFormat::Contiguous, the usual row-major order; other formats such as MemoryFormat::ChannelsLast can also be queried).

It first checks whether the tensor follows a custom strides policy (matches_policy(SizesStridesPolicy::CustomStrides)). If so, it calls is_contiguous_custom(memory_format), which lets the tensor implementation decide.
Otherwise, it calls is_contiguous_default(memory_format) to determine contiguity using the default logic.
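The default row-major check can be sketched as follows: walk the dimensions from last to first and require each stride to equal the product of the sizes after it. This is a simplified, hypothetical helper, not the actual is_contiguous_default() code; in particular it ignores other memory formats such as channels-last.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Sketch of a row-major contiguity check.
bool is_contiguous_rowmajor(const std::vector<int64_t>& sizes,
                            const std::vector<int64_t>& strides) {
  assert(sizes.size() == strides.size());
  // A tensor with no elements is trivially contiguous.
  for (int64_t s : sizes) {
    if (s == 0) return true;
  }
  int64_t expected = 1;
  for (std::size_t i = sizes.size(); i-- > 0;) {
    // Size-1 dimensions never move the address, so their stride is irrelevant.
    if (sizes[i] == 1) continue;
    if (strides[i] != expected) return false;
    expected *= sizes[i];
  }
  return true;
}
```

A 2×3 tensor with strides {3, 1} passes, while its transposed 3×2 view with strides {1, 3} fails, even though both read the same storage.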

size(int64_t d), stride(int64_t d) and sym_size(int64_t d):
These functions return the size, the stride and the symbolic (SymInt) size of the tensor along dimension d. Like is_contiguous(), they defer to custom implementations when the tensor's sizes/strides policy requires it.

TensorOptions.h

The TensorOptions.h file provides the implementation for TensorOptions, which represents the configuration options for creating new tensors in PyTorch. Let's explore the key components of this file and how they contribute to tensor creation and initialization.

dtype():

  bool has_dtype() const noexcept {
    return has_dtype_;
  }

TensorOptions& dtype(at::ScalarType dtype) noexcept {
  dtype_ = dtype;
  return *this;
}

has_dtype() reports whether a data type has been set explicitly, and dtype(...) sets the data type (scalar type) that new tensors will use for their elements. The setter returns a reference to the options object, so calls can be chained.

device():

  Device device() const noexcept {
    return device_or_default(device_opt());
  }

inline TensorOptions device(Device device) {
  return TensorOptions().device(device);
}

The free function device(Device) is a convenience helper: it creates a new TensorOptions object and sets its device, so device(kCUDA) can be written instead of TensorOptions().device(kCUDA). The member function device() const returns the device stored in the options (device_opt_); if none was set explicitly, device_or_default() supplies the default device, the CPU.

dispatchKeyToTensorOptions(DispatchKey dispatch_key):

inline TensorOptions dispatchKeyToTensorOptions(DispatchKey dispatch_key) {
  return TensorOptions()
      .layout(dispatchKeyToLayout(dispatch_key))
      .device(dispatchKeyToDeviceType(dispatch_key));
}

dispatchKeyToTensorOptions() builds a TensorOptions object from a dispatch key by combining two helpers: dispatchKeyToLayout() supplies the layout and dispatchKeyToDeviceType() supplies the device. The first helper, dispatchKeyToLayout(), maps a dispatch key to the memory layout of the tensor data.

It takes the DispatchKey associated with a tensor operation as input and uses a switch statement to map it to a layout.
Sparse (COO) dispatch keys map to the Sparse layout. The C10_FORALL_BACKEND_COMPONENTS macro expands to one case label per backend component (CPU, CUDA, and so on), so every sparse dispatch key is covered without listing each one by hand.
Sparse CSR dispatch keys cannot be mapped to a single layout, because CSR, CSC, BSR and BSC tensors all share them, so for these keys the function raises an error with TORCH_CHECK instead of guessing.
MkldnnCPU maps to the Mkldnn layout.
Any other dispatch key defaults to the Strided layout.
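The shape of that switch can be sketched with trimmed-down, hypothetical enums. The sketch covers only the sparse (COO), MKL-DNN and default cases, lists the sparse keys by hand instead of generating them with C10_FORALL_BACKEND_COMPONENTS, and leaves out the sparse CSR keys entirely.

```cpp
#include <cassert>

// Hypothetical subsets of c10's DispatchKey and Layout enums.
enum class DispatchKey { CPU, CUDA, SparseCPU, SparseCUDA, MkldnnCPU };
enum class Layout { Strided, Sparse, Mkldnn };

Layout dispatch_key_to_layout(DispatchKey key) {
  switch (key) {
    // One case label per backend; the real code generates these with a macro.
    case DispatchKey::SparseCPU:
    case DispatchKey::SparseCUDA:
      return Layout::Sparse;
    case DispatchKey::MkldnnCPU:
      return Layout::Mkldnn;
    default:
      // Dense backends such as CPU and CUDA use the default strided layout.
      return Layout::Strided;
  }
}
```

Stacking several case labels on one return is what lets a single branch cover the same sparse functionality across every backend.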

requires_grad():

  bool requires_grad() const noexcept {
    return has_requires_grad_ ? requires_grad_ : false;
  }

inline TensorOptions requires_grad(bool requires_grad = true) {
  return TensorOptions().requires_grad(requires_grad);
}

  C10_NODISCARD TensorOptions
  requires_grad(std::optional<bool> requires_grad) const noexcept {
    TensorOptions r = *this;
    r.set_requires_grad(requires_grad);
    return r;
  }

requires_grad() const reports whether gradient tracking is enabled in these options; if the option was never set (has_requires_grad_ is false), it returns false. When it is true, operations on the resulting tensor are recorded so that gradients can be computed during backpropagation.

The setters enable or disable gradient tracking. The free function requires_grad(bool) creates a new TensorOptions with the flag set (true by default), while the member overload takes a std::optional<bool> and returns a modified copy of the options, leaving the original unchanged, so requires_grad is only set when the optional holds a value.

Together, these functions let code query and set a tensor's configuration (data type, device, layout, memory format and gradient tracking), which is exactly what tensor factory functions need in order to create tensors with the requested properties.
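The pattern shared by these functions (optional fields, defaults applied on read, and setters that return a modified copy, like the requires_grad(std::optional<bool>) overload above) can be sketched in self-contained C++. The Options class, its enums and the chosen defaults (Float, CPU, no gradient tracking) are assumptions made for illustration; the real TensorOptions stores more fields and uses c10's own types.

```cpp
#include <cassert>
#include <optional>

// Hypothetical, trimmed-down stand-ins for c10's ScalarType and DeviceType.
enum class ScalarType { Float, Double, Long };
enum class DeviceType { CPU, CUDA };

class Options {
 public:
  // Setters return a modified copy, so an existing Options value never changes.
  [[nodiscard]] Options dtype(ScalarType t) const { Options r = *this; r.dtype_ = t; return r; }
  [[nodiscard]] Options device(DeviceType d) const { Options r = *this; r.device_ = d; return r; }
  [[nodiscard]] Options requires_grad(std::optional<bool> g) const {
    Options r = *this;
    r.requires_grad_ = g;
    return r;
  }

  // "Unset" is distinct from any explicit value; getters fall back to defaults.
  bool has_dtype() const { return dtype_.has_value(); }
  ScalarType dtype() const { return dtype_.value_or(ScalarType::Float); }
  DeviceType device() const { return device_.value_or(DeviceType::CPU); }
  bool requires_grad() const { return requires_grad_.value_or(false); }

 private:
  std::optional<ScalarType> dtype_;
  std::optional<DeviceType> device_;
  std::optional<bool> requires_grad_;
};
```

Keeping "unset" separate from an explicit value lets code that merges options tell a deliberate choice apart from a default.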

StorageImpl.h

In addition to TensorImpl, the PyTorch codebase also includes the StorageImpl class, which represents the underlying storage for tensor data. Here's an overview of StorageImpl and its role in PyTorch's tensor implementation:

set_nbytes(size_t size_bytes) and set_nbytes(c10::SymInt size_bytes):

// Set size of the tensor's storage in bytes
void StorageImpl::set_nbytes(size_t size_bytes) {
    size_bytes_ = static_cast<int64_t>(size_bytes);
    size_bytes_is_heap_allocated_ = false;
}

// Set size of the tensor's storage symbolically
void StorageImpl::set_nbytes(c10::SymInt size_bytes) {
    size_bytes_ = std::move(size_bytes);
}

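These functions set the size of the storage in bytes, either as a regular integer (size_t) or as a symbolic integer (c10::SymInt), which represents a size that may not be known concretely, for example while shapes are traced symbolically by torch.compile. The size_t overload also clears size_bytes_is_heap_allocated_, recording that the stored size is a plain integer. The size of the storage determines how many bytes of tensor data it can hold.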

data_ptr() and mutable_data():

  const at::DataPtr& data_ptr() const {
    return data_ptr_;
  }

  void* mutable_data() {
    if (C10_UNLIKELY(has_data_ptr_check_)) {
      if (throw_on_mutable_data_ptr_) {
        throwNullDataPtrError();
      }
      if (warn_deprecated_on_mutable_data_ptr_) {
        warnDeprecatedDataPtr();
      }
      maybe_materialize_cow();
    }
    return data_ptr_.mutable_get();
  }

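data_ptr() returns a const reference to the storage's data pointer (data_ptr_), while mutable_data() returns a raw pointer through which the tensor's data can be modified.

Before returning, mutable_data() takes a slower path only when has_data_ptr_check_ is set: it throws an error if mutable access has been forbidden (throw_on_mutable_data_ptr_) and emits a deprecation warning if one has been requested (warn_deprecated_on_mutable_data_ptr_).
It then calls maybe_materialize_cow(), which, if the data is shared copy-on-write (COW), ensures the storage owns its data exclusively before any writes happen.
Finally, it returns the mutable pointer to the data.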
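Copy-on-write can be illustrated with a reference-counted buffer that is cloned the first time a shared copy asks for mutable access. The CowBuffer class below is a hypothetical, single-threaded sketch built on std::shared_ptr; PyTorch's actual COW implementation is more involved, and this only mirrors the idea behind maybe_materialize_cow().

```cpp
#include <cassert>
#include <memory>
#include <utility>
#include <vector>

// Copies of a CowBuffer share one allocation until one of them asks for
// mutable access; that copy then "materializes" a private clone.
class CowBuffer {
 public:
  explicit CowBuffer(std::vector<float> data)
      : data_(std::make_shared<std::vector<float>>(std::move(data))) {}

  // Read-only access never copies.
  const float* data() const { return data_->data(); }

  // Mutable access first makes sure this object owns its buffer exclusively.
  float* mutable_data() {
    if (data_.use_count() > 1) {
      data_ = std::make_shared<std::vector<float>>(*data_);  // clone on first write
    }
    return data_->data();
  }

  bool shares_with(const CowBuffer& other) const { return data_ == other.data_; }

 private:
  std::shared_ptr<std::vector<float>> data_;
};
```

Reads stay cheap because they never trigger the clone; only the first write through a shared copy pays for the copy.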

set_data_ptr(at::DataPtr&& data_ptr) and set_data_ptr_noswap(at::DataPtr&& data_ptr):

  // Returns the previous data_ptr
  at::DataPtr set_data_ptr(at::DataPtr&& data_ptr) {
    // We need to materialize the old COW DataPtr because it is
    // being returned as mutable.
    maybe_materialize_cow();
    return set_data_ptr_no_materialize_cow(std::move(data_ptr));
  }

  void set_data_ptr_noswap(at::DataPtr&& data_ptr) {
    data_ptr_ = std::move(data_ptr);
    refresh_has_data_ptr_check();
  }

set_data_ptr() installs a new data pointer and returns the previous one; because the old pointer is handed back to the caller as mutable, it first materializes any copy-on-write data. set_data_ptr_noswap() simply replaces the data pointer without returning the old one, then refreshes the cached data-pointer check flag. Both functions handle copy-on-write semantics correctly and keep the storage's internal state consistent.

device_type(), allocator(), and set_allocator(at::Allocator* allocator):

 at::DeviceType device_type() const {
    return data_ptr_.device().type();
  }

  at::Allocator* allocator() {
    return allocator_;
  }
  
  void set_allocator(at::Allocator* allocator) {
    allocator_ = allocator;
  }

These functions deal with the device type and allocator of the tensor's storage. The device_type() function returns the type of device where the tensor's data is stored. The allocator() function returns the allocator used for memory allocation, while set_allocator() allows setting a new allocator for the storage.

UniqueStorageShareExternalPointer():

  void UniqueStorageShareExternalPointer(
      void* src,
      size_t size_bytes,
      DeleterFnPtr d = nullptr) {
    UniqueStorageShareExternalPointer(
        at::DataPtr(src, src, d, data_ptr_.device()), size_bytes);
  }

  void UniqueStorageShareExternalPointer(
      at::DataPtr&& data_ptr,
      size_t size_bytes) {
    data_ptr_ = std::move(data_ptr);
    size_bytes_ = static_cast<int64_t>(size_bytes);
    size_bytes_is_heap_allocated_ = false;
    allocator_ = nullptr;
    resizable_ = false;
  }

These overloads let a storage adopt memory that was allocated elsewhere, instead of allocating new memory and copying the data into it.

The first overload accepts a raw pointer src, its size size_bytes, and an optional deleter function pointer d. It wraps them in a DataPtr on the storage's current device and calls the second overload.
The second overload accepts a DataPtr and size_bytes directly. It installs the new data pointer and size, clears the allocator (allocator_ = nullptr), and marks the storage as non-resizable, since the storage did not allocate this memory itself. This enables interoperability with external data sources or libraries, allowing tensors to share data without duplicating it.
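Adopting external memory comes down to keeping the caller's pointer together with a rule for releasing it. The ExternalStorage class below is a hypothetical sketch of that idea using std::unique_ptr with a function-pointer deleter; unlike PyTorch's DataPtr, it does not record a device.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdlib>
#include <memory>

using DeleterFn = void (*)(void*);

// A hypothetical storage that can adopt memory it did not allocate.
class ExternalStorage {
 public:
  void share_external_pointer(void* src, std::size_t size_bytes, DeleterFn d = nullptr) {
    // No copy: keep the caller's pointer and remember how to release it.
    // With no deleter, the caller keeps responsibility for freeing the memory.
    data_ = std::unique_ptr<void, DeleterFn>(src, d ? d : &no_op);
    size_bytes_ = size_bytes;
    resizable_ = false;  // we cannot grow memory we did not allocate
  }

  void* data() const { return data_.get(); }
  std::size_t nbytes() const { return size_bytes_; }
  bool resizable() const { return resizable_; }

 private:
  static void no_op(void*) {}

  std::unique_ptr<void, DeleterFn> data_{nullptr, &no_op};
  std::size_t size_bytes_ = 0;
  bool resizable_ = true;
};
```

The deleter runs when the storage is destroyed or re-pointed, so ownership of foreign memory is handed over without a copy.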

received_cuda() and set_received_cuda(bool received_cuda):

  void set_received_cuda(bool received_cuda) {
    received_cuda_ = received_cuda;
  }

  bool received_cuda() {
    return received_cuda_;
  }

These functions manage a flag indicating whether the tensor's storage has received data from a CUDA device.

set_received_cuda() sets the flag, and received_cuda() returns its current value.
This information is used for memory management and synchronization, particularly in multi-device environments.

pyobj_slot():

  impl::PyObjectSlot* pyobj_slot() {
    return &pyobj_slot_;
  }

  const impl::PyObjectSlot* pyobj_slot() const {
    return &pyobj_slot_;
  }

pyobj_slot() returns a pointer to the storage's PyObjectSlot (a const pointer in the const overload).

The slot holds the Python object associated with this storage, which lets PyTorch's Python bindings connect the C++ storage to its Python counterpart.

set_throw_on_mutable_data_ptr() and set_warn_deprecated_on_mutable_data_ptr():

  void set_throw_on_mutable_data_ptr() {
    throw_on_mutable_data_ptr_ = true;
    refresh_has_data_ptr_check();
  }

  void set_warn_deprecated_on_mutable_data_ptr() {
    warn_deprecated_on_mutable_data_ptr_ = true;
    refresh_has_data_ptr_check();
  }

These functions set flags that make mutable access to the storage's data (for example, through mutable_data()) either throw an error or emit a deprecation warning. Each setter also calls refresh_has_data_ptr_check(), so that the has_data_ptr_check_ flag tested by mutable_data() stays up to date.

They let PyTorch enforce safety measures or issue warnings when code tries to obtain a mutable pointer to data that should not be modified through it.
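Caching the OR of several rarely set flags keeps the common path of a mutable access down to a single branch. The sketch below assumes that refresh_has_data_ptr_check() works this way for the two flags shown; the GuardedPtr class is hypothetical, and it counts warnings instead of emitting them so the behaviour is easy to observe.

```cpp
#include <cassert>
#include <stdexcept>

class GuardedPtr {
 public:
  explicit GuardedPtr(void* p) : ptr_(p) {}

  void set_throw_on_mutable() { throw_on_mutable_ = true; refresh_check(); }
  void set_warn_on_mutable() { warn_on_mutable_ = true; refresh_check(); }

  void* mutable_get() {
    // Fast path: a single branch when no flag is set.
    if (has_check_) {
      if (throw_on_mutable_) throw std::logic_error("mutable access is not allowed");
      if (warn_on_mutable_) ++warning_count_;  // stand-in for emitting a warning
    }
    return ptr_;
  }

  int warning_count() const { return warning_count_; }

 private:
  // Recompute the cached summary flag whenever an individual flag changes.
  void refresh_check() { has_check_ = throw_on_mutable_ || warn_on_mutable_; }

  void* ptr_;
  bool throw_on_mutable_ = false;
  bool warn_on_mutable_ = false;
  bool has_check_ = false;
  int warning_count_ = 0;
};
```

The cost of this design is that every setter must remember to refresh the summary flag; forgetting it would silently skip the checks.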

nbytes() and sym_nbytes():

  size_t nbytes() const {
    // OK to do this instead of maybe_as_int as nbytes is guaranteed positive
    TORCH_CHECK(!size_bytes_is_heap_allocated_);
    return size_bytes_.as_int_unchecked();
  }

  SymInt sym_nbytes() const {
    return size_bytes_;
  }
  
  void set_nbytes(c10::SymInt size_bytes) {
    size_bytes_ = std::move(size_bytes);
  }

nbytes() returns the size of the storage in bytes as a size_t, after checking with TORCH_CHECK that the size is not a heap-allocated symbolic value, while sym_nbytes() returns the size as a c10::SymInt, which may be symbolic. The set_nbytes() function sets the size.

resizable():

  bool resizable() const {
    return resizable_;
  }

This function indicates whether the storage is resizable. Only resizable storage can have its size changed in place, for example when a tensor is resized with resize_(); storage that wraps external memory through UniqueStorageShareExternalPointer() is marked non-resizable.

The above functions collectively contribute to the efficient management and manipulation of tensors, providing functionality for sharing data, managing CUDA-related information, interacting with Python objects, and controlling safety measures during data pointer access.

Conclusion

PyTorch emphasizes performance, flexibility, and usability through its tensor implementations, such as TensorImpl, TensorOptions, and StorageImpl. Delving into the inner workings of tensors and their related elements empowers developers to harness PyTorch's features for constructing scalable, efficient, and high-performing deep learning solutions. As PyTorch evolves, improvements in tensor management and optimization will bolster its standing as a premier deep learning framework.
