
LLama2

Class Overview

The LLama2 class is a custom transformer model built for Natural Language Processing (NLP) tasks. Its objective is to provide a compact yet powerful transformer that can be applied to a wide range of NLP tasks, from translation to text generation and more.

The LLama2 transformer exposes a broad range of customizable parameters, allowing it to be tailored to specific tasks and datasets. It supports arguments for the maximum sequence length, model dimensionality, layer depth, number of attention heads, and several other options, providing extensive adaptability across NLP tasks.

Class Structure

class LLama2:
    def __init__(
        self,
        num_tokens=50432,
        max_seq_len=8192,
        dim=2560,
        depth=32,
        dim_head=128,
        heads=24,
        rotary_xpos=True,
        attn_flash=True,
    ):
        super().__init__()

        # Build the underlying decoder-only transformer from the constructor arguments
        self.llama2 = Transformer(
            num_tokens=num_tokens,
            max_seq_len=max_seq_len,
            attn_layers=Decoder(
                dim=dim,
                depth=depth,
                dim_head=dim_head,
                heads=heads,
                attn_flash=attn_flash,
                rotary_xpos=rotary_xpos,
            ),
        )
        # Wrap the transformer for autoregressive decoding
        self.decoder = AutoRegressiveWrapper(self.llama2)

    def forward(self, text):
        model_input = self.decoder.forward(text)[0]
        return self.decoder(model_input, padded_x=model_input[0])
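
The class above relies on Transformer, Decoder, and AutoRegressiveWrapper already being in scope. A minimal import block for the class definition might look like the following; treating zeta.structs as the source of all three names is an assumption, and the exact module paths may differ across zeta versions.

# Assumed imports for the class definition above (module paths may vary by zeta version)
from zeta.structs import AutoRegressiveWrapper, Decoder, Transformer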

Function Name: __init__

Purpose: Initializes the LLama2 class.

Parameter   | Data Type | Default Value | Description
num_tokens  | int       | 50432         | The total number of tokens in the input vocabulary.
max_seq_len | int       | 8192          | The maximum sequence length that the model can accept.
dim         | int       | 2560          | The model's embedding dimensionality.
depth       | int       | 32            | The number of transformer layers in the model.
dim_head    | int       | 128           | The dimensionality of the head in the self-attention mechanism of the transformer model.
heads       | int       | 24            | The number of heads for the multi-head self-attention mechanism of the transformer model.
rotary_xpos | bool      | True          | Whether to apply rotary positional embeddings to the input sequence.
attn_flash  | bool      | True          | Whether to use the flash attention mechanism.

Function Name: forward

Purpose: Defines the forward pass of the model.

Parameter | Data Type    | Default Value | Description
text      | torch.Tensor | n/a           | The tokenized input (a tensor of token indices) that the model processes.

Returns: A tensor containing the model's output for the given input.

Usage Examples

Example 1: Text Processing

This example illustrates how to instantiate the model with its default parameters and pass a sample tokenized input through it.

import torch

from zeta.models import LLama2

# Initializing model
llama2_model = LLama2()

# Sample tokenized input: a batch containing one sequence of token indices
text = torch.tensor([[1, 2, 3, 4]])

# Passing text through model
output = llama2_model.forward(text)

print(output)
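
Because the transformer is wrapped in an AutoRegressiveWrapper, the same model can also be used for autoregressive text generation. The sketch below is a minimal example under the assumption that the wrapper exposes an x_transformers-style generate(start_tokens, seq_len) method; check the signature in your zeta version before relying on it.

import torch

from zeta.models import LLama2

llama2_model = LLama2()

# A batch containing one prompt of token indices
prompt = torch.tensor([[1, 2, 3, 4]])

# Assumed x_transformers-style generation call: sample 32 tokens after the prompt
generated = llama2_model.decoder.generate(prompt, 32)

print(generated)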

Example 2: Customizing Model Parameters

This example illustrates how to instantiate the model with custom parameters.

llama2_model = LLama2(
    num_tokens=1000, max_seq_len=512, dim=512, depth=4, dim_head=64, heads=4
)

text = torch.tensor([[1, 2, 3, 4]])

output = llama2_model.forward(text)

print(output)

Example 3: Sequence Classification

This example illustrates how you could use this model for a sequence classification task.

llama2_model = LLama2(
    num_tokens=5000, max_seq_len=256, dim=128, depth=2, dim_head=32, heads=2
)

text_sequences = torch.tensor([[1, 2, 3, 4], [2, 3, 1, 4]])
target_sequences = torch.tensor([1, 0])  # one class label for each of the two sequences

# Example classification loss; the raw outputs typically need a classifier head
# before this can be computed (see the note and sketch below)
loss_function = torch.nn.CrossEntropyLoss()

outputs = llama2_model.forward(text_sequences)
loss = loss_function(outputs, target_sequences)

In this usage example, an instance of the LLama2 class is created with custom parameters. A tensor of token-index sequences is passed to the model and the output is computed. You would typically use a loss function suited to classification, such as the cross-entropy loss above, and compute the loss against the target labels.

Note: The provided code is a basic example and might require adjustments, such as adding an appropriate classifier layer at the end, depending on the specific task requirements; a possible head is sketched below.
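
As a hedged illustration of the note above, one option is to pool the model's token-level output and pass it through a small linear classifier. This sketch assumes the forward pass returns a tensor of shape (batch, seq_len, num_tokens); the pooling strategy, head size, and loss are illustrative choices rather than part of the LLama2 API.

import torch
import torch.nn as nn

from zeta.models import LLama2

# Backbone configured as in Example 3
llama2_model = LLama2(
    num_tokens=5000, max_seq_len=256, dim=128, depth=2, dim_head=32, heads=2
)

# Hypothetical classification head: maps the pooled backbone output to 2 classes.
# The input size of 5000 assumes the backbone returns per-token logits over the
# vocabulary; adjust it to the actual output dimensionality in your setup.
classifier = nn.Linear(5000, 2)
loss_function = nn.CrossEntropyLoss()

text_sequences = torch.tensor([[1, 2, 3, 4], [2, 3, 1, 4]])
target_sequences = torch.tensor([1, 0])

hidden = llama2_model.forward(text_sequences)  # assumed shape: (batch, seq_len, 5000)
pooled = hidden.mean(dim=1)                    # mean-pool over the sequence dimension
logits = classifier(pooled)                    # shape: (batch, 2)

loss = loss_function(logits, target_sequences)
print(loss)

Mean pooling is used here for simplicity; taking the last token's representation or prepending a dedicated classification token are common alternatives.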