LLama2¶
Class Overview¶
The LLama2 class is a custom transformer model built for Natural Language Processing (NLP) tasks. Its goal is to provide a compact yet powerful decoder-only transformer that can be applied to a range of NLP tasks, from translation to text generation.
The LLama2 transformer exposes a broad range of configurable parameters, allowing it to be tailored to specific tasks and datasets. It accepts arguments for the vocabulary size, maximum sequence length, model dimension, layer depth, number of attention heads, and several other options, providing extensive adaptability for various NLP tasks.
Class Structure¶
```python
class LLama2:
    def __init__(
        self,
        num_tokens=50432,
        max_seq_len=8192,
        dim=2560,
        depth=32,
        dim_head=128,
        heads=24,
        rotary_xpos=True,
        attn_flash=True,
    ):
        super().__init__()

        # Core decoder-only transformer, built from the constructor arguments
        self.llama2 = Transformer(
            num_tokens=num_tokens,
            max_seq_len=max_seq_len,
            attn_layers=Decoder(
                dim=dim,
                depth=depth,
                dim_head=dim_head,
                heads=heads,
                attn_flash=attn_flash,
                rotary_xpos=rotary_xpos,
            ),
        )

        # Wrap the transformer for autoregressive (next-token) decoding
        self.decoder = AutoRegressiveWrapper(self.llama2)

    def forward(self, text):
        # Run the wrapped decoder on the input tokens, then decode the result again
        model_input = self.decoder.forward(text)[0]
        return self.decoder(model_input, padded_x=model_input[0])
```
Function Name: __init__
Purpose: Initializes the LLama2 class.
Parameter | Data Type | Default Value | Description |
---|---|---|---|
num_tokens | int | 50432 | The total number of tokens in the input vocabulary. |
max_seq_len | int | 8192 | The maximum sequence length that the model can accept. |
dim | int | 2560 | The model's embedding dimensionality. |
depth | int | 32 | The number of transformer layers in the model. |
dim_head | int | 128 | The dimensionality of each attention head in the transformer's self-attention mechanism. |
heads | int | 24 | The number of heads in the transformer's multi-head self-attention mechanism. |
rotary_xpos | bool | True | Whether to apply rotary positional embeddings to the input sequence. |
attn_flash | bool | True | Whether to use the flash attention mechanism. |
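Since dim, depth, dim_head, and heads jointly determine the model's size, it can be useful to sanity-check a configuration before training. Below is a minimal sketch, assuming the underlying Transformer is a standard torch.nn.Module whose parameters can be counted:

```python
from zeta.models import LLama2

# Smaller configuration for quick experimentation
model = LLama2(
    num_tokens=32000,   # match your tokenizer's vocabulary size
    max_seq_len=2048,
    dim=1024,
    depth=8,
    dim_head=64,
    heads=16,
)

# Assumes the wrapped Transformer (model.llama2) is a torch.nn.Module; if so,
# its parameter count gives a rough idea of the model's memory footprint.
num_params = sum(p.numel() for p in model.llama2.parameters())
print(f"Approximate parameter count: {num_params / 1e6:.1f}M")
```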
Function Name: forward
Purpose: Defines the forward pass of the model.
Parameter | Data Type | Description |
---|---|---|
text | torch.Tensor | A tensor of token IDs representing the input text to be processed. |
Returns: A tensor representing the model's output for the given input.
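The exact form of the returned value depends on the underlying AutoRegressiveWrapper. Below is a minimal sketch of inspecting the output, assuming the model accepts a (batch, sequence) tensor of token IDs:

```python
import torch

from zeta.models import LLama2

model = LLama2(num_tokens=1000, max_seq_len=512, dim=512, depth=4, dim_head=64, heads=4)

# A batch of two tokenized sequences (all token IDs must be < num_tokens)
tokens = torch.tensor([[1, 2, 3, 4], [5, 6, 7, 8]])

output = model.forward(tokens)
# Inspect the result rather than assuming its shape; the wrapper determines
# whether this is a logits tensor, a loss, or something else.
print(type(output), getattr(output, "shape", None))
```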
Usage Examples¶
Example 1: Text Processing¶
This example illustrates how to instantiate the model and pass a sample tokenized input through it.
```python
import torch

from zeta.models import LLama2

# Initialize the model with its default configuration
llama2_model = LLama2()

# A batch containing one tokenized sequence (truncate long inputs or pad
# short ones so they fit within max_seq_len)
text = torch.tensor([[1, 2, 3, 4]])

# Pass the token IDs through the model
output = llama2_model.forward(text)
print(output)
```
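Inputs longer than max_seq_len must be truncated, and batching usually requires padding shorter sequences to a common length. Below is a minimal sketch of both, assuming a pad token ID of 0 (the actual pad ID depends on your tokenizer):

```python
import torch
from torch.nn.functional import pad

pad_id = 0  # assumed pad token ID; use your tokenizer's actual value


def prepare(token_ids, length):
    """Truncate or right-pad a 1D tensor of token IDs to a fixed length."""
    token_ids = token_ids[:length]  # cut off long text
    return pad(token_ids, (0, length - token_ids.shape[0]), value=pad_id)


batch = torch.stack([
    prepare(torch.tensor([1, 2, 3, 4]), 8),
    prepare(torch.tensor([5, 6]), 8),
])
print(batch.shape)  # torch.Size([2, 8])
```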
Example 2: Customizing Model Parameters¶
This example illustrates how to instantiate the model with custom parameters.
```python
import torch

from zeta.models import LLama2

llama2_model = LLama2(
    num_tokens=1000, max_seq_len=512, dim=512, depth=4, dim_head=64, heads=4
)

text = torch.tensor([[1, 2, 3, 4]])
output = llama2_model.forward(text)
print(output)
```
Example 3: Sequence Classification¶
This example illustrates how you could use this model for a sequence classification task.
```python
llama2_model = LLama2(
    num_tokens=5000, max_seq_len=256, dim=128, depth=2, dim_head=32, heads=2
)

text_sequences = torch.tensor([[1, 2, 3, 4], [2, 3, 1, 4]])
target_sequences = torch.tensor([1, 0])  # one class label per sequence

outputs = llama2_model.forward(text_sequences)
# loss_function is a placeholder; in practice you would add a classification
# head on top of the model's outputs before computing a loss.
loss = loss_function(outputs, target_sequences)
```
Note: The provided code is a basic example and will require adjustments, such as adding an appropriate classifier layer on top of the model, depending on the specific task requirements; a sketch of one possible approach follows.
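Below is a minimal sketch of one possible classifier head, assuming the model's forward pass returns per-token logits of shape (batch, sequence, num_tokens); the mean-pooling, linear head, and loss shown here are illustrative choices, not part of the LLama2 class:

```python
import torch
import torch.nn as nn

from zeta.models import LLama2

num_classes = 2
llama2_model = LLama2(
    num_tokens=5000, max_seq_len=256, dim=128, depth=2, dim_head=32, heads=2
)

# Illustrative classifier head: mean-pool over the sequence dimension, then
# project the pooled features to class logits. The input size of 5000 assumes
# the model emits per-token logits over its vocabulary.
classifier = nn.Linear(5000, num_classes)
loss_function = nn.CrossEntropyLoss()

text_sequences = torch.tensor([[1, 2, 3, 4], [2, 3, 1, 4]])
target_sequences = torch.tensor([1, 0])

outputs = llama2_model.forward(text_sequences)  # assumed shape: (batch, seq, num_tokens)
pooled = outputs.mean(dim=1)                    # (batch, num_tokens)
logits = classifier(pooled)                     # (batch, num_classes)
loss = loss_function(logits, target_sequences)
print(loss)
```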