XPOS Documentation
Table of Contents
- Introduction
- Purpose and Functionality
- Class: XPOS
  - Initialization
  - Parameters
  - Forward Method
- Functions
  - fixed_pos_embedding
  - rotate_every_two
  - duplicate_interleave
  - apply_rotary_pos_emb
- Usage Examples
  - Using the XPOS Class
  - Using the Functions
- Additional Information
  - Positional Embeddings in Transformers
- References
1. Introduction
Welcome to the Zeta documentation for the XPOS class and related functions! Zeta is a powerful library for deep learning in PyTorch, and this documentation provides a comprehensive understanding of the XPOS class and its associated functions.
2. Purpose and Functionality
The XPOS class and its related functions are designed to generate and apply rotary positional embeddings to input tensors. These embeddings are crucial for sequence-to-sequence models, particularly in transformer architectures. Below, we explore their purpose and functionality.
3. Class: XPOS
The XPOS class is used to apply rotary positional embeddings to input tensors. These embeddings are essential for transformers to understand the positional information of elements in a sequence.
Initialization
To create an instance of the XPOS class, you need to specify the following parameters:
Parameters
- head_dim (int, optional): The dimensionality of the positional embeddings. If not specified, it defaults to None and the dimension is inferred from the input tensor. It is recommended to set this value explicitly for consistency.
- scale_base (int, optional): The base value for scaling the positional embeddings. Default is 512.
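For instance, a minimal construction sketch (the same pattern appears in the full usage example in section 5):
from zeta.nn import XPOS

# Explicitly set head_dim; scale_base keeps its default of 512
xpos = XPOS(head_dim=256, scale_base=512)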
Forward Method
The forward method of the XPOS class applies rotary positional embeddings to the input tensor. It accepts the following arguments:
- input_tensor (Tensor): The input tensor to which positional embeddings will be applied.
- offset (int, optional): An offset value for the positional embeddings. Default is 0.
- downscale (bool, optional): If True, the positional embeddings are downscaled. Default is False.
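A minimal sketch of calling forward with these arguments; the specific offset and downscale values here are illustrative only:
import torch
from zeta.nn import XPOS

xpos = XPOS(head_dim=64)
x = torch.rand(2, 128, 64)  # (batch_size, seq_len, head_dim)

# Default call
out = xpos(x)

# With a position offset and downscaled embeddings
out_shifted = xpos(x, offset=16, downscale=True)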
4. Functions
In addition to the XPOS class, several functions are provided for working with positional embeddings.
fixed_pos_embedding
This function generates fixed sine and cosine positional embeddings based on the input tensor's scale.
- x (Tensor): Input tensor of shape (seq_len, dim).
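For intuition, here is a minimal sketch of how such fixed sinusoidal embeddings are commonly computed in rotary-embedding implementations; the function name is hypothetical and the exact Zeta implementation may differ:
import torch

def fixed_pos_embedding_sketch(x):
    # x has shape (seq_len, dim); only its shape is used here
    seq_len, dim = x.shape
    inv_freq = 1.0 / (10000 ** (torch.arange(0, dim) / dim))
    # Outer product of positions and inverse frequencies
    sinusoid = torch.einsum("i,j->ij", torch.arange(seq_len, dtype=torch.float), inv_freq)
    return torch.sin(sinusoid), torch.cos(sinusoid)

sin, cos = fixed_pos_embedding_sketch(torch.rand(32, 64))  # each of shape (32, 64)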
rotate_every_two
This function rearranges the elements of the input tensor by rotating every two elements.
- input_tensor (Tensor): Input tensor of shape (batch_size, seq_len, dim).
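The pairwise rotation is the usual building block of rotary embeddings: each adjacent pair (x1, x2) becomes (-x2, x1). A minimal sketch, with a hypothetical name, assuming this common behaviour:
import torch

def rotate_every_two_sketch(x):
    # Split the last dimension into even and odd positions,
    # then interleave them back as (-odd, even)
    x1 = x[..., ::2]
    x2 = x[..., 1::2]
    return torch.stack((-x2, x1), dim=-1).flatten(-2)

y = rotate_every_two_sketch(torch.rand(16, 64, 256))  # same shape as the input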
duplicate_interleave
This function duplicates a matrix while interleaving the copy.
- matrix (Tensor): Input matrix.
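A minimal sketch of the typical behaviour, with a hypothetical name: every element is doubled in place, so a (seq_len, dim) sine or cosine table can line up with a feature dimension of size 2 * dim.
import torch

def duplicate_interleave_sketch(m):
    # [[a, b], [c, d]] -> [[a, a, b, b], [c, c, d, d]]
    dim0 = m.shape[0]
    m = m.view(-1, 1)        # flatten into a single column
    m = m.repeat(1, 2)       # duplicate each element
    return m.view(dim0, -1)  # restore rows, now twice as wide

out = duplicate_interleave_sketch(torch.rand(8, 8))  # shape (8, 16)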
apply_rotary_pos_emb
This function applies rotary positional embeddings to the input tensor.
- input_tensor (Tensor): Input tensor of shape (batch_size, seq_len, dim).
- sin (Tensor): Sine positional embeddings of shape (seq_len, dim).
- cos (Tensor): Cosine positional embeddings of shape (seq_len, dim).
- scale (float): Scaling factor for the positional embeddings.
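Conceptually, this combines the helpers above. A minimal sketch, assuming the scaled sine and cosine tables are expanded to the full feature dimension before being mixed with the rotated input; the function name is hypothetical:
import torch
from zeta.nn import duplicate_interleave, rotate_every_two

def apply_rotary_pos_emb_sketch(x, sin, cos, scale=1):
    # Scale and widen the tables so they broadcast against x,
    # then blend the original and pairwise-rotated tensors
    sin = duplicate_interleave(sin * scale)
    cos = duplicate_interleave(cos * scale)
    return (x * cos) + (rotate_every_two(x) * sin)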
5. Usage Examples
Let's explore some usage examples of the XPOS class and related functions to understand how to use them effectively.
Using the XPOS Class
import torch
from zeta.nn import XPOS
# Create an XPOS instance
xpos = XPOS(head_dim=256, scale_base=512)
# Apply positional embeddings to an input tensor
input_tensor = torch.rand(16, 32, 256) # Example input tensor
output = xpos(input_tensor, offset=0, downscale=False)
Using the Functions
import torch
from zeta.nn import (
apply_rotary_pos_emb,
duplicate_interleave,
fixed_pos_embedding,
rotate_every_two,
)
# Generate fixed positional embeddings
input_tensor = torch.rand(32, 128)  # Example input tensor of shape (seq_len, dim)
sin, cos = fixed_pos_embedding(input_tensor)
# Rotate every two elements in a tensor
input_tensor = torch.rand(16, 64, 256) # Example input tensor
output_tensor = rotate_every_two(input_tensor)
# Duplicate and interleave a matrix
input_matrix = torch.rand(8, 8) # Example input matrix
duplicated_matrix = duplicate_interleave(input_matrix)
# Apply rotary positional embeddings, reusing the sin and cos generated above
input_tensor = torch.rand(16, 32, 256)  # Example input tensor
output_tensor = apply_rotary_pos_emb(input_tensor, sin, cos, scale=1)
6. Additional Information
Positional Embeddings in Transformers
Positional embeddings play a crucial role in transformers and other sequence-to-sequence models. They enable the model to understand the order of elements in a sequence, which is essential for tasks like natural language processing, machine translation, and text generation.
7. References
This documentation provides a comprehensive guide to the XPOS class and related functions in the Zeta library, explaining their purpose, functionality, parameters, and usage. You can now effectively integrate these components into your deep learning models, particularly transformer-based architectures, for various sequence-based tasks.
For further information on the underlying concepts and principles of positional embeddings in transformers, you may refer to the original paper.
Please consult the official PyTorch documentation for any specific PyTorch-related details: PyTorch Documentation.