Class Name: Andromeda¶
Module Description¶
This documentation provides details on the functionality of the Andromeda class from the zeta.models library.
The Andromeda class is a transformer-based model helper class that acts as a wrapper for the Transformer and AutoRegressiveWrapper modules, defaulting or accepting user-specified values in its configuration.
Features of the Andromeda model include, but are not limited to:

- Configurable model dimensions, including token count, maximum sequence length, layer depth, and head dimensions.
- Absolute position embeddings, ALiBi position biases, rotary positions, attention settings, and buffer elements, all of which are modifiable by the user.
Class Definition:¶
```python
class Andromeda(Module):
    """
    Andromeda is a transformer-based model architecture. It initializes with
    a Transformer and AutoRegressiveWrapper with default or user-specified parameters.
    """
```
Initialization (init) Function:¶
The init function is where the Transformer and AutoRegressiveWrapper objects are assigned to self.Andromeda and self.decoder, respectively.
```python
def __init__(
    self,
    num_tokens=50432,
    max_seq_len=8192,
    dim=2560,
    depth=32,
    dim_head=128,
    heads=24,
    use_abs_pos_emb=False,
    alibi_pos_bias=True,
    alibi_num_heads=12,
    rotary_xpos=True,
    attn_flash=True,
    attn_kv_heads=2,
    qk_norm=True,
    attn_qk_norm=True,
    attn_qk_norm_dim_scale=True,
):
```
The parameters and their default values used in initialization are listed below:

| Parameter | Default Value | Description |
| --- | --- | --- |
| num_tokens | 50432 | Number of tokens in the vocabulary |
| max_seq_len | 8192 | Maximum sequence length |
| dim | 2560 | Dimension of the model |
| depth | 32 | Depth (number of layers) of the model |
| dim_head | 128 | Dimension of each attention head |
| heads | 24 | Number of attention heads |
| use_abs_pos_emb | False | Whether to use absolute position embeddings |
| alibi_pos_bias | True | Whether to use ALiBi position bias |
| alibi_num_heads | 12 | Number of heads that use ALiBi position bias |
| rotary_xpos | True | Whether to use rotary (xpos) position embeddings |
| attn_flash | True | Whether to use flash attention |
| attn_kv_heads | 2 | Number of attention key/value heads |
| qk_norm | True | Whether to apply query-key normalization |
| attn_qk_norm | True | Whether to apply query-key normalization in attention |
| attn_qk_norm_dim_scale | True | Whether to apply dimension scaling to the attention query-key normalization |
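As a sketch, any subset of these defaults can be overridden at construction time. The values below are purely illustrative, not a recommended configuration:

```python
from zeta.models import Andromeda

# Illustrative smaller configuration; values chosen for brevity, not quality.
small_model = Andromeda(
    num_tokens=32000,    # smaller vocabulary
    max_seq_len=2048,    # shorter context window
    dim=1024,            # narrower model
    depth=12,            # fewer layers
    heads=8,             # fewer attention heads
    dim_head=64,         # smaller per-head dimension
    alibi_num_heads=4,   # keep the ALiBi head count at or below the total head count
)
```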
Forward Function¶
Forward propagation in PyTorch involves defining the computation performed at every call. In the Andromeda class, this computation involves passing input text tokens through the decoder. If an exception occurs during this forward propagation, an error message will be printed and an exception will be thrown.
```python
def forward(self, text_tokens, **kwargs):
    """
    Forward pass through the model. It expects the input text_tokens.
    """
```
The parameters used in the forward function are listed below:

| Parameter | Description |
| ------------- | ------------- |
| text_tokens | Input tokens |
| `**kwargs` | Additional keyword arguments |
The forward function returns the output from the decoder.
Code Example:¶
Below is a simple example of instantiating the Andromeda class and using it for forward propagation:
```python
# Import necessary libraries and modules
import torch

from zeta.models import Andromeda

# Initialize the Andromeda class with default parameters
model = Andromeda()

# Define your input text tokens: integer token ids within the vocabulary,
# with a sequence length no longer than max_seq_len
text_tokens = torch.randint(0, 50432, (1, 8192))

# Perform a forward pass through the model
output = model(text_tokens)
```
Note: Techniques such as query-key normalization aid in aligning the query's distribution to that of the key, reducing the negative impact of inputs with wildly different distributions. As such, the parameters related to normalization (qk_norm, attn_qk_norm, attn_qk_norm_dim_scale) default to True, but can be toggled off based on the specific needs of your application.
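For example, a minimal sketch of turning these normalization options off at construction time, using only the keyword arguments listed above:

```python
from zeta.models import Andromeda

# Hypothetical configuration with query-key normalization disabled;
# whether this helps depends on your data and training setup.
model_no_qk_norm = Andromeda(
    qk_norm=False,
    attn_qk_norm=False,
    attn_qk_norm_dim_scale=False,
)
```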
Also, it's important to ensure that the defined text tokens fit within the limits defined by num_tokens and max_seq_len; otherwise, you might encounter an error during the forward pass.
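A quick sanity check along these lines (a sketch assuming the default num_tokens and max_seq_len values):

```python
import torch

num_tokens, max_seq_len = 50432, 8192  # defaults from the table above

text_tokens = torch.randint(0, num_tokens, (1, 4096))

# Token ids must lie within the vocabulary and the sequence must fit the context window.
assert int(text_tokens.max()) < num_tokens
assert text_tokens.shape[-1] <= max_seq_len
```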
For more information on the underlying Transformer and AutoRegressiveWrapper modules, please check the zeta library documentation and the official PyTorch documentation.
Additional Information & Tips¶
The Andromeda class is notable for its robust set of flexible features, which suit it to a variety of use cases, and it is inherently versatile thanks to its Transformer and AutoRegressiveWrapper architecture. The model emphasizes accepting user-specified parameters for a high level of customization.
However, due to its complexity and high-dimensional nature, this model may not be preferable under constraints of memory, processing power, or the need for simplicity.
References & External Resources¶
- Official PyTorch Docs for more information on underlying classes and modules.
- Understanding Transformers in NLP for conceptual knowledge on Transformer models.
- Autoregressive Models for an understanding of autoregressive models.
Enjoy exploring the Andromeda class from the zeta.models library!