XPOS Documentation

Table of Contents

  1. Introduction
  2. Purpose and Functionality
  3. Class: XPOS
       • Initialization
       • Parameters
       • Forward Method
  4. Functions
       • fixed_pos_embedding
       • rotate_every_two
       • duplicate_interleave
       • apply_rotary_pos_emb
  5. Usage Examples
       • Using the XPOS Class
       • Using the Functions
  6. Additional Information
       • Positional Embeddings in Transformers
  7. References

1. Introduction

Welcome to the Zeta documentation for the XPOS class and its related functions! Zeta is a library for deep learning in PyTorch, and this page explains the purpose, parameters, and usage of XPOS and the helper functions it builds on.


2. Purpose and Functionality

The XPOS class and its related functions are designed to generate and apply rotary positional embeddings to input tensors. XPOS implements the extrapolatable (xPos) variant of rotary embeddings, which adds a per-position exponential scale to the rotation so that attention remains better behaved on sequences longer than those seen in training. These embeddings are crucial for sequence-to-sequence models, particularly in transformer architectures. Below, we will explore their purpose and functionality.


3. Class: XPOS

The XPOS class is used to apply rotary positional embeddings to input tensors. These embeddings are essential for transformers to understand the positional information of elements in a sequence.

Initialization

To create an instance of the XPOS class, you need to specify the following parameters:

XPOS(
    head_dim: int = None,
    scale_base: int = 512
)

Parameters

  • head_dim (int, optional): The dimensionality of the positional embeddings, i.e. the per-head feature size of the tensors the module will be applied to. Defaults to None, in which case the dimension is derived from the input tensor; setting it explicitly is recommended for consistency (see the construction example after this list).

  • scale_base (int, optional): The base value for scaling the positional embeddings. Default is 512.
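In multi-head attention, head_dim is typically the per-head feature size, i.e. the model width divided by the number of heads. Below is a minimal construction example under that assumption; the width and head count are hypothetical.

from zeta.nn import XPOS

# Hypothetical model: width 1024 split across 16 attention heads -> 64 dims per head.
head_dim = 1024 // 16
xpos = XPOS(head_dim=head_dim, scale_base=512)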

Forward Method

The forward method of the XPOS class applies rotary positional embeddings to the input tensor and returns a tensor of the same shape. It can be called as follows; a conceptual sketch of the computation is given after the parameter list.

output = xpos(input_tensor, offset=0, downscale=False)
  • input_tensor (Tensor): The input tensor to which positional embeddings will be applied, of shape (batch_size, seq_len, head_dim).

  • offset (int, optional): An offset added to the positions used for the embeddings, useful when the input is a chunk of a longer sequence. Default is 0.

  • downscale (bool, optional): If True, the per-position scale is inverted ("downscaled"); in the xPos scheme this is typically used for the key projections so that the scaling cancels in the attention scores. Default is False.
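Conceptually, the forward pass ties together the helper functions documented in the next section: it builds per-position scale factors, derives sine and cosine tables from them, and applies the rotary update. The following is an illustrative sketch of that flow based on the xPos formulation, not Zeta's exact internal code; in particular, passing the per-position scale tensor as the scale argument is an assumption about the internals.

import torch

from zeta.nn import apply_rotary_pos_emb, fixed_pos_embedding


def xpos_forward_sketch(x, scale_base=512, offset=0, downscale=False):
    # x: (batch_size, seq_len, head_dim)
    seq_len, head_dim = x.shape[1], x.shape[2]

    # Per-channel-pair decay factors, following the xPos formulation.
    channel_scale = (torch.arange(0, head_dim, 2) + 0.4 * head_dim) / (1.4 * head_dim)

    # Position-dependent scaling, centred on the middle of the (offset) sequence.
    min_pos = -(seq_len + offset) // 2
    positions = torch.arange(min_pos, min_pos + seq_len + offset, dtype=torch.float)
    pos_scale = channel_scale ** (positions[:, None] / scale_base)  # (seq_len + offset, head_dim // 2)

    # Sinusoidal tables over the scaled positions, trimmed to the last seq_len rows.
    sin, cos = fixed_pos_embedding(pos_scale)
    pos_scale, sin, cos = (t[-seq_len:] for t in (pos_scale, sin, cos))

    if downscale:
        pos_scale = 1 / pos_scale
    # Assumption: the per-position scale tensor is threaded through as `scale`.
    return apply_rotary_pos_emb(x, sin, cos, scale=pos_scale)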


4. Functions

In addition to the XPOS class, there are several functions provided for working with positional embeddings.

fixed_pos_embedding

This function generates fixed sine and cosine positional embedding tables whose size is taken from the input tensor's shape (its sequence length and feature dimension).

sin, cos = fixed_pos_embedding(x)
  • x (Tensor): Input tensor of shape (seq_len, dim).
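As a reference point, the standard sinusoidal construction that such a function typically implements looks like the sketch below; it is illustrative rather than Zeta's exact code, and only the shape of x is used.

import torch


def fixed_pos_embedding_sketch(x):
    # x: (seq_len, dim); only its shape matters for building the tables.
    seq_len, dim = x.shape
    inv_freq = 1.0 / (10000 ** (torch.arange(0, dim, dtype=torch.float) / dim))
    # Outer product of positions and inverse frequencies: (seq_len, dim).
    angles = torch.einsum("i,j->ij", torch.arange(seq_len, dtype=torch.float), inv_freq)
    return torch.sin(angles), torch.cos(angles)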

rotate_every_two

This function rearranges the elements of the input tensor by rotating every two elements.

output_tensor = rotate_every_two(input_tensor)
  • input_tensor (Tensor): Input tensor of shape (batch_size, seq_len, dim).
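Concretely, each consecutive pair of channels (x1, x2) in the last dimension is mapped to (-x2, x1), a 90-degree rotation within the pair. A small illustrative sketch of this behavior (not necessarily the library's exact code):

import torch


def rotate_every_two_sketch(x):
    # x: (batch_size, seq_len, dim) with an even dim.
    x1 = x[:, :, ::2]   # even-indexed channels
    x2 = x[:, :, 1::2]  # odd-indexed channels
    # Each pair (x1, x2) becomes (-x2, x1), then the pairs are flattened back.
    return torch.stack((-x2, x1), dim=-1).flatten(-2)


# Example: a last dimension [a, b, c, d] becomes [-b, a, -d, c].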

duplicate_interleave

This function duplicates a matrix while interleaving the copy.

duplicated_matrix = duplicate_interleave(matrix)
  • matrix (Tensor): Input matrix.
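In other words, every column of the matrix is repeated so that the copy sits next to the original, doubling the last dimension: a row [a, b] becomes [a, a, b, b]. A minimal sketch of this behavior:

import torch


def duplicate_interleave_sketch(m):
    # m: (rows, cols) -> (rows, 2 * cols), with each entry duplicated in place.
    rows = m.shape[0]
    return m.view(-1, 1).repeat(1, 2).view(rows, -1)


print(duplicate_interleave_sketch(torch.tensor([[1.0, 2.0]])))  # tensor([[1., 1., 2., 2.]])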

apply_rotary_pos_emb

This function applies rotary positional embeddings to the input tensor.

output_tensor = apply_rotary_pos_emb(input_tensor, sin, cos, scale=1)
  • input_tensor (Tensor): Input tensor of shape (batch_size, seq_len, dim).
  • sin (Tensor): Sine positional embeddings of shape (seq_len, dim // 2); they are interleave-duplicated internally to match the input's feature dimension.
  • cos (Tensor): Cosine positional embeddings of shape (seq_len, dim // 2).
  • scale (float, optional): Scaling factor applied to the sine and cosine tables. Default is 1.
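The rotary update combines the pieces above: the sine and cosine tables are scaled, interleave-duplicated to the full feature dimension, and the rotation is applied as x * cos + rotate_every_two(x) * sin. A minimal sketch under the standard rotary formulation (illustrative, not necessarily Zeta's exact code):

from zeta.nn import duplicate_interleave, rotate_every_two


def apply_rotary_pos_emb_sketch(x, sin, cos, scale=1):
    # sin, cos: (seq_len, dim // 2), duplicated to (seq_len, dim) to match x.
    sin = duplicate_interleave(sin * scale)
    cos = duplicate_interleave(cos * scale)
    return (x * cos) + (rotate_every_two(x) * sin)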

5. Usage Examples

Let's explore some usage examples of the XPOS class and related functions to understand how to use them effectively.

Using the XPOS Class

import torch

from zeta.nn import XPOS

# Create an XPOS instance
xpos = XPOS(head_dim=256, scale_base=512)

# Apply positional embeddings to an input tensor
input_tensor = torch.rand(16, 32, 256)  # Example input tensor
output = xpos(input_tensor, offset=0, downscale=False)
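In the xPos scheme, the same module is typically applied to both the query and key projections, with downscale=True for the keys so that the per-position scaling cancels in the attention scores. A sketch of that usage pattern (the tensors and shapes here are hypothetical):

# Continuing from the example above.
queries = torch.rand(16, 32, 256)  # (batch_size, seq_len, head_dim)
keys = torch.rand(16, 32, 256)

q_embedded = xpos(queries, offset=0, downscale=False)
k_embedded = xpos(keys, offset=0, downscale=True)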

Using the Functions

import torch

from zeta.nn import (
    apply_rotary_pos_emb,
    duplicate_interleave,
    fixed_pos_embedding,
    rotate_every_two,
)

# Generate fixed positional embeddings.
# The table covers half of the target feature dimension (128 for a 256-dim input),
# because apply_rotary_pos_emb interleave-duplicates sin and cos to full width.
scale_tensor = torch.rand(32, 128)  # Example (seq_len, head_dim // 2) tensor
sin, cos = fixed_pos_embedding(scale_tensor)

# Rotate every two elements in a tensor
input_tensor = torch.rand(16, 64, 256)  # Example input tensor
output_tensor = rotate_every_two(input_tensor)

# Duplicate and interleave a matrix
input_matrix = torch.rand(8, 8)  # Example input matrix
duplicated_matrix = duplicate_interleave(input_matrix)

# Apply rotary positional embeddings
input_tensor = torch.rand(16, 32, 256)  # Example input tensor
output_tensor = apply_rotary_pos_emb(input_tensor, sin, cos, scale=1)

6. Additional Information

Positional Embeddings in Transformers

Positional embeddings play a crucial role in transformers and other sequence-to-sequence models. They enable the model to understand the order of elements in a sequence, which is essential for tasks like natural language processing, machine translation, and text generation.


7. References

This documentation provides a comprehensive guide to the XPOS class and related functions in the Zeta library, explaining their purpose, functionality, parameters, and usage. You can now effectively integrate these components into your deep learning models, particularly in transformer-based architectures, for various sequence-based tasks.

For further information on the underlying concepts and principles of extrapolatable positional embeddings in transformers, you may refer to the original xPos paper, "A Length-Extrapolatable Transformer" (Sun et al., 2022).

Please consult the official PyTorch documentation for any specific PyTorch-related details: PyTorch Documentation.