XPOS Documentation

Table of Contents

  1. Introduction
  2. Purpose and Functionality
  3. Class: XPOS
       • Initialization
       • Parameters
       • Forward Method
  4. Functions
       • fixed_pos_embedding
       • rotate_every_two
       • duplicate_interleave
       • apply_rotary_pos_emb
  5. Usage Examples
       • Using the XPOS Class
       • Using the Functions
  6. Additional Information
       • Positional Embeddings in Transformers
  7. References

1. Introduction

Welcome to the Zeta documentation for the XPOS class and its related functions! Zeta is a library for deep learning in PyTorch, and this page explains the purpose, parameters, and usage of XPOS and the helper functions it builds on.


2. Purpose and Functionality

The XPOS class and its related functions are designed to generate and apply rotary positional embeddings to input tensors. XPOS implements the extrapolatable (xPos) variant of rotary embeddings, which adds a per-position exponential scale to the rotation so that attention remains better behaved on sequences longer than those seen in training. These embeddings are crucial for sequence-to-sequence models, particularly in transformer architectures. Below, we will explore their purpose and functionality.


3. Class: XPOS

The XPOS class is used to apply rotary positional embeddings to input tensors. These embeddings are essential for transformers to understand the positional information of elements in a sequence.

Initialization

To create an instance of the XPOS class, you need to specify the following parameters:

XPOS(
    head_dim: int = None,
    scale_base: int = 512
)

Parameters

  • head_dim (int, optional): The dimensionality of the positional embeddings, i.e. the per-head feature size of the tensors the module will be applied to. Defaults to None, in which case the dimension is derived from the input tensor; setting it explicitly is recommended for consistency (see the construction example after this list).

  • scale_base (int, optional): The base value for scaling the positional embeddings. Default is 512.
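In multi-head attention, head_dim is typically the per-head feature size, i.e. the model width divided by the number of heads. Below is a minimal construction example under that assumption; the width and head count are hypothetical.

from zeta.nn import XPOS

# Hypothetical model: width 1024 split across 16 attention heads -> 64 dims per head.
head_dim = 1024 // 16
xpos = XPOS(head_dim=head_dim, scale_base=512)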

Forward Method

The forward method of the XPOS class applies rotary positional embeddings to the input tensor and returns a tensor of the same shape. It can be called as follows; a conceptual sketch of the computation is given after the parameter list.

output = xpos(input_tensor, offset=0, downscale=False)
  • input_tensor (Tensor): The input tensor to which positional embeddings will be applied, of shape (batch_size, seq_len, head_dim).

  • offset (int, optional): An offset added to the positions used for the embeddings, useful when the input is a chunk of a longer sequence. Default is 0.

  • downscale (bool, optional): If True, the per-position scale is inverted ("downscaled"); in the xPos scheme this is typically used for the key projections so that the scaling cancels in the attention scores. Default is False.
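Conceptually, the forward pass ties together the helper functions documented in the next section: it builds per-position scale factors, derives sine and cosine tables from them, and applies the rotary update. The following is an illustrative sketch of that flow based on the xPos formulation, not Zeta's exact internal code; in particular, passing the per-position scale tensor as the scale argument is an assumption about the internals.

import torch

from zeta.nn import apply_rotary_pos_emb, fixed_pos_embedding


def xpos_forward_sketch(x, scale_base=512, offset=0, downscale=False):
    # x: (batch_size, seq_len, head_dim)
    seq_len, head_dim = x.shape[1], x.shape[2]

    # Per-channel-pair decay factors, following the xPos formulation.
    channel_scale = (torch.arange(0, head_dim, 2) + 0.4 * head_dim) / (1.4 * head_dim)

    # Position-dependent scaling, centred on the middle of the (offset) sequence.
    min_pos = -(seq_len + offset) // 2
    positions = torch.arange(min_pos, min_pos + seq_len + offset, dtype=torch.float)
    pos_scale = channel_scale ** (positions[:, None] / scale_base)  # (seq_len + offset, head_dim // 2)

    # Sinusoidal tables over the scaled positions, trimmed to the last seq_len rows.
    sin, cos = fixed_pos_embedding(pos_scale)
    pos_scale, sin, cos = (t[-seq_len:] for t in (pos_scale, sin, cos))

    if downscale:
        pos_scale = 1 / pos_scale
    # Assumption: the per-position scale tensor is threaded through as `scale`.
    return apply_rotary_pos_emb(x, sin, cos, scale=pos_scale)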


4. Functions

In addition to the XPOS class, there are several functions provided for working with positional embeddings.

fixed_pos_embedding

This function generates fixed sine and cosine positional embedding tables whose size is taken from the input tensor's shape (its sequence length and feature dimension).

sin, cos = fixed_pos_embedding(x)
  • x (Tensor): Input tensor of shape (seq_len, dim).
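As a reference point, the standard sinusoidal construction that such a function typically implements looks like the sketch below; it is illustrative rather than Zeta's exact code, and only the shape of x is used.

import torch


def fixed_pos_embedding_sketch(x):
    # x: (seq_len, dim); only its shape matters for building the tables.
    seq_len, dim = x.shape
    inv_freq = 1.0 / (10000 ** (torch.arange(0, dim, dtype=torch.float) / dim))
    # Outer product of positions and inverse frequencies: (seq_len, dim).
    angles = torch.einsum("i,j->ij", torch.arange(seq_len, dtype=torch.float), inv_freq)
    return torch.sin(angles), torch.cos(angles)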

rotate_every_two

This function rearranges the elements of the input tensor by rotating every two elements.

output_tensor = rotate_every_two(input_tensor)
  • input_tensor (Tensor): Input tensor of shape (batch_size, seq_len, dim).
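Concretely, each consecutive pair of channels (x1, x2) in the last dimension is mapped to (-x2, x1), a 90-degree rotation within the pair. A small illustrative sketch of this behavior (not necessarily the library's exact code):

import torch


def rotate_every_two_sketch(x):
    # x: (batch_size, seq_len, dim) with an even dim.
    x1 = x[:, :, ::2]   # even-indexed channels
    x2 = x[:, :, 1::2]  # odd-indexed channels
    # Each pair (x1, x2) becomes (-x2, x1), then the pairs are flattened back.
    return torch.stack((-x2, x1), dim=-1).flatten(-2)


# Example: a last dimension [a, b, c, d] becomes [-b, a, -d, c].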

duplicate_interleave

This function duplicates a matrix while interleaving the copy.

duplicated_matrix = duplicate_interleave(matrix)
  • matrix (Tensor): Input matrix.
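In other words, every column of the matrix is repeated so that the copy sits next to the original, doubling the last dimension: a row [a, b] becomes [a, a, b, b]. A minimal sketch of this behavior:

import torch


def duplicate_interleave_sketch(m):
    # m: (rows, cols) -> (rows, 2 * cols), with each entry duplicated in place.
    rows = m.shape[0]
    return m.view(-1, 1).repeat(1, 2).view(rows, -1)


print(duplicate_interleave_sketch(torch.tensor([[1.0, 2.0]])))  # tensor([[1., 1., 2., 2.]])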

apply_rotary_pos_emb

This function applies rotary positional embeddings to the input tensor.

output_tensor = apply_rotary_pos_emb(input_tensor, sin, cos, scale=1)
  • input_tensor (Tensor): Input tensor of shape (batch_size, seq_len, dim).
  • sin (Tensor): Sine positional embeddings of shape (seq_len, dim // 2); they are interleave-duplicated internally to match the input's feature dimension.
  • cos (Tensor): Cosine positional embeddings of shape (seq_len, dim // 2).
  • scale (float, optional): Scaling factor applied to the sine and cosine tables. Default is 1.
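The rotary update combines the pieces above: the sine and cosine tables are scaled, interleave-duplicated to the full feature dimension, and the rotation is applied as x * cos + rotate_every_two(x) * sin. A minimal sketch under the standard rotary formulation (illustrative, not necessarily Zeta's exact code):

from zeta.nn import duplicate_interleave, rotate_every_two


def apply_rotary_pos_emb_sketch(x, sin, cos, scale=1):
    # sin, cos: (seq_len, dim // 2), duplicated to (seq_len, dim) to match x.
    sin = duplicate_interleave(sin * scale)
    cos = duplicate_interleave(cos * scale)
    return (x * cos) + (rotate_every_two(x) * sin)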

5. Usage Examples

Let's explore some usage examples of the XPOS class and related functions to understand how to use them effectively.

Using the XPOS Class

import torch

from zeta.nn import XPOS

# Create an XPOS instance
xpos = XPOS(head_dim=256, scale_base=512)

# Apply positional embeddings to an input tensor
input_tensor = torch.rand(16, 32, 256)  # Example input tensor
output = xpos(input_tensor, offset=0, downscale=False)
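In the xPos scheme, the same module is typically applied to both the query and key projections, with downscale=True for the keys so that the per-position scaling cancels in the attention scores. A sketch of that usage pattern (the tensors and shapes here are hypothetical):

# Continuing from the example above.
queries = torch.rand(16, 32, 256)  # (batch_size, seq_len, head_dim)
keys = torch.rand(16, 32, 256)

q_embedded = xpos(queries, offset=0, downscale=False)
k_embedded = xpos(keys, offset=0, downscale=True)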

Using the Functions

import torch

from zeta.nn import (
    apply_rotary_pos_emb,
    duplicate_interleave,
    fixed_pos_embedding,
    rotate_every_two,
)

# Generate fixed positional embeddings.
# The table covers half of the target feature dimension (128 for a 256-dim input),
# because apply_rotary_pos_emb interleave-duplicates sin and cos to full width.
scale_tensor = torch.rand(32, 128)  # Example (seq_len, head_dim // 2) tensor
sin, cos = fixed_pos_embedding(scale_tensor)

# Rotate every two elements in a tensor
input_tensor = torch.rand(16, 64, 256)  # Example input tensor
output_tensor = rotate_every_two(input_tensor)

# Duplicate and interleave a matrix
input_matrix = torch.rand(8, 8)  # Example input matrix
duplicated_matrix = duplicate_interleave(input_matrix)

# Apply rotary positional embeddings
input_tensor = torch.rand(16, 32, 256)  # Example input tensor
output_tensor = apply_rotary_pos_emb(input_tensor, sin, cos, scale=1)

6. Additional Information

Positional Embeddings in Transformers

Positional embeddings play a crucial role in transformers and other sequence-to-sequence models. They enable the model to understand the order of elements in a sequence, which is essential for tasks like natural language processing, machine translation, and text generation.


7. References

This documentation provides a comprehensive guide to the XPOS class and related functions in the Zeta library, explaining their purpose, functionality, parameters, and usage. You can now effectively integrate these components into your deep learning models, particularly in transformer-based architectures, for various sequence-based tasks.

For further information on the underlying concepts and principles of extrapolatable positional embeddings in transformers, you may refer to the original xPos paper, "A Length-Extrapolatable Transformer" (Sun et al., 2022).

Please consult the official PyTorch documentation for any specific PyTorch-related details: PyTorch Documentation.