# DynamicPositionBias Documentation

## Overview and Introduction
The `DynamicPositionBias` class from the `zeta` library computes positional biases dynamically from the relative distances between positions in a sequence. This module can be crucial in attention mechanisms where relative position matters, as is common in Transformers.

Key concepts:

- **Relative Position**: The difference in position between two tokens in a sequence.
- **Positional Bias**: A bias introduced based on the relative position, indicating how two positions are related.
- **MLP (Multi-Layer Perceptron)**: A feedforward neural network consisting of multiple layers of nodes in a directed graph.
## Class Definition

### Parameters
- `dim` (`int`): The dimension of the intermediary layer in the MLP.
- `heads` (`int`): The number of attention heads. This also dictates the output dimension of the bias.
### Attributes

- `mlp` (`nn.Sequential`): Multi-Layer Perceptron used to compute the bias based on relative distance.
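
To make the roles of `dim` and `heads` concrete, here is a simplified, self-contained sketch of how such a module can be structured. It is an illustration based on the parameter and attribute descriptions above, not the actual `zeta` source: the class name, layer count, and activation choice are assumptions.

```python
import torch
from torch import nn


class SimplifiedDynamicPositionBias(nn.Module):
    """Illustrative stand-in for zeta's DynamicPositionBias (not the library code)."""

    def __init__(self, dim: int, heads: int):
        super().__init__()
        # MLP that maps a scalar relative distance to one bias value per head.
        self.mlp = nn.Sequential(
            nn.Linear(1, dim),
            nn.SiLU(),
            nn.Linear(dim, heads),
        )

    def forward(self, i: int, j: int) -> torch.Tensor:
        assert j >= i, "j must be greater than or equal to i"
        # Positions of the query block (length i) and the key block (length j).
        pos_i = torch.arange(j - i, j, dtype=torch.float)
        pos_j = torch.arange(j, dtype=torch.float)
        # Pairwise relative distances, shape (i, j).
        rel_dist = (pos_i[:, None] - pos_j[None, :]).abs()
        # Feed each distance through the MLP: (i, j, 1) -> (i, j, heads) -> (heads, i, j).
        bias = self.mlp(rel_dist.unsqueeze(-1))
        return bias.permute(2, 0, 1)
```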
## Functionality and Usage

### Method: `forward(i: int, j: int) -> torch.Tensor`
Computes the positional bias based on the relative distance between positions `i` and `j`.
#### Parameters
- `i` (`int`): Starting position in the sequence.
- `j` (`int`): Ending position in the sequence.
#### Returns
- `bias` (`torch.Tensor`): A tensor representing the bias, of shape `(heads, i, j)`.
#### Usage
The positional bias can be utilized in attention mechanisms to provide awareness of relative position between tokens.
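
Concretely, because the returned bias has shape `(heads, i, j)`, it can be broadcast-added to the raw attention scores before the softmax. A minimal sketch, assuming the `from zeta import DynamicPositionBias` import used in the integration example below and purely illustrative tensor sizes:

```python
import torch
from zeta import DynamicPositionBias  # import path as used in the examples below

batch, heads, seq_len, dim_head = 2, 8, 10, 64
pos_bias = DynamicPositionBias(dim=64, heads=heads)

q = torch.randn(batch, heads, seq_len, dim_head)
k = torch.randn(batch, heads, seq_len, dim_head)

scores = q @ k.transpose(-2, -1) * dim_head ** -0.5  # (batch, heads, i, j)
bias = pos_bias(seq_len, seq_len)                    # (heads, i, j)

scores = scores + bias           # broadcasts the bias across the batch dimension
attn = scores.softmax(dim=-1)    # attention weights now account for relative position
```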
#### Examples
- Basic Usage:
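
A minimal sketch based on the constructor parameters and `forward` signature documented above; the sizes (`dim=64`, `heads=8`) and the sequence length are illustrative:

```python
import torch
from zeta import DynamicPositionBias  # import path assumed; adjust to your zeta version

# Build the module: 64-dim intermediary MLP layer, 8 attention heads.
pos_bias = DynamicPositionBias(dim=64, heads=8)

# Compute the bias for a sequence of length 10.
bias = pos_bias(10, 10)

print(bias.shape)  # expected: torch.Size([8, 10, 10]), i.e. (heads, i, j)
```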
- Integration with a Transformer:

```python
import torch
from torch.nn import MultiheadAttention

from zeta import DynamicPositionBias


class CustomAttention(MultiheadAttention):
    def __init__(self, embed_dim, num_heads):
        super().__init__(embed_dim, num_heads)
        self.pos_bias = DynamicPositionBias(dim=embed_dim, heads=num_heads)

    # Override the forward method to include positional bias
    # ... (implementation details)
```
- Inspecting the Bias:
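
A short sketch for looking at the computed values, again assuming the constructor and `forward` behavior described above (sizes are illustrative):

```python
from zeta import DynamicPositionBias  # import path assumed; adjust to your zeta version

pos_bias = DynamicPositionBias(dim=32, heads=4)

bias = pos_bias(5, 5)   # shape (4, 5, 5) == (heads, i, j)

print(bias.shape)
print(bias[0])          # bias matrix that head 0 would add to its attention scores
```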
## Additional Information and Tips
- Ensure that `j >= i` when calling the forward method.
- The module relies on the `einops` library for tensor rearrangement. Ensure you have this dependency installed.
- This module primarily assists in capturing the relative positional information between two positions in a sequence. It can be beneficial when absolute positional embeddings are unavailable or not preferred.
## References and Resources

- Attention Is All You Need (Vaswani et al., 2017): introduces the attention mechanism, which can benefit from positional information.
- Einops documentation: covers the tensor rearrangement operations used in the implementation.
## Mathematical Representation
Given a sequence of positions from \( i \) to \( j \), the relative distance \( R \) between any two elements \( s_x \) and \( s_y \) of this sequence (at positions \( x \) and \( y \)) is:

\[
R(s_x, s_y) = |x - y|
\]
The bias for a specific head \( h \) and relative distance \( r \) can be represented as:

\[
\text{bias}_h(r) = \mathrm{MLP}_h(r)
\]

where \( \mathrm{MLP}_h \) is the output of the Multi-Layer Perceptron specific to head \( h \).
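
If the bias depends only on the relative distance, as described above, then output entries corresponding to the same distance should share the same value for a given head. A small check sketch, using only the interface documented on this page (sizes illustrative):

```python
import torch
from zeta import DynamicPositionBias  # import path assumed; adjust to your zeta version

pos_bias = DynamicPositionBias(dim=16, heads=2)
bias = pos_bias(4, 4)  # shape (2, 4, 4)

# bias[:, 0, 1] and bias[:, 1, 2] both correspond to relative distance r = 1,
# so they should hold the same per-head values.
print(torch.allclose(bias[:, 0, 1], bias[:, 1, 2]))  # expected: True
```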