# GPT4 Class
GPT4 is a class that implements the architecture of a transformer-based language model. It consists of two main components: a Transformer and an AutoRegressiveWrapper.
Modeled after the approach used by OpenAI's GPT-3, this GPT4 implementation builds on that base with user-specified or default parameters. These parameters let users customize the architecture, depth, and functionality of the model for specific use cases.
## Initialize the class
The class is initialized with the following arguments:
| Argument | Type | Default | Description |
|---|---|---|---|
| num_tokens | int | 50432 | Number of tokens in the vocabulary |
| max_seq_len | int | 8192 | Maximum sequence length |
| dim | int | 2560 | Model dimension |
| depth | int | 32 | Number of decoder layers |
| dim_head | int | 128 | Dimension of each attention head |
| heads | int | 24 | Number of attention heads |
| use_abs_pos_emb | bool | False | Whether to use absolute position embeddings |
| alibi_pos_bias | bool | True | Whether to use ALiBi positional bias |
| alibi_num_heads | int | 12 | Number of heads that receive the ALiBi bias |
| rotary_xpos | bool | True | Whether to use rotary (xPos) position embeddings |
| attn_flash | bool | True | Whether to use Flash Attention |
| attn_one_kv_head | bool | True | Whether to use a single key/value head (multi-query attention) |
| qk_norm | bool | True | Whether to apply query-key normalization |
| attn_qk_norm | bool | True | Whether to apply query-key normalization in attention |
| attn_qk_norm_dim_scale | bool | True | Whether to scale the query-key normalization by dimension |
Each of these arguments can be modified to suit the specific needs of the user.
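For example, a smaller configuration can be built by overriding a handful of the arguments above. The values below are purely illustrative, not recommended settings.

```python
from zeta.models import GPT4

# Illustrative only: override a few of the constructor arguments from the table above.
small_model = GPT4(
    num_tokens=32000,   # smaller vocabulary
    max_seq_len=4096,   # shorter context window
    dim=2048,           # narrower model dimension
    depth=24,           # fewer decoder layers
    heads=16,           # fewer attention heads
)
```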
## Implementing the Transformer class
The Transformer architecture used in the GPT4 model forms the backbone of the class. It uses an attention mechanism to focus on different tokens in a sequence while processing the input data.
In this case, the Transformer is configured as a Decoder, which receives the depth, dim_head, heads, alibi_pos_bias, alibi_num_heads, rotary_xpos, attn_flash, attn_one_kv_head, qk_norm, attn_qk_norm, and attn_qk_norm_dim_scale values from the GPT4 arguments.
If initialization fails for any reason, the exception is caught, logged to the console, and re-raised.
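The sketch below illustrates the constructor pattern this section describes. It is a hypothetical reading of the description, not the library's source: the Transformer and Decoder classes are named in this document, but the import path, exact signatures, and the use of an `attn_layers` argument are assumptions.

```python
import torch.nn as nn

# Assumed import path for the building blocks named in this document.
from zeta.structs import Transformer, Decoder


class GPT4Sketch(nn.Module):
    """Hypothetical sketch of the initialization flow described above."""

    def __init__(
        self,
        num_tokens: int = 50432,
        max_seq_len: int = 8192,
        dim: int = 2560,
        depth: int = 32,
        dim_head: int = 128,
        heads: int = 24,
        use_abs_pos_emb: bool = False,
        attn_flash: bool = True,
    ):
        super().__init__()
        try:
            # The Decoder receives the depth/head/attention settings; the
            # remaining arguments from the table (alibi_*, rotary_xpos,
            # qk_norm, ...) would be forwarded the same way.
            self.decoder = Transformer(
                num_tokens=num_tokens,
                max_seq_len=max_seq_len,
                use_abs_pos_emb=use_abs_pos_emb,
                attn_layers=Decoder(
                    dim=dim,
                    depth=depth,
                    dim_head=dim_head,
                    heads=heads,
                    attn_flash=attn_flash,
                ),
            )
        except Exception as e:
            # Log the failure to the console and re-raise, as described above.
            print(f"Failed to initialize the Transformer: {e}")
            raise
```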
## AutoRegressiveWrapper
As a next step, the Transformer is wrapped with an AutoRegressiveWrapper. Autoregressive models are those in which the output of one step is fed as input to the next step. This makes them effective at modeling sequential data, and therefore well suited to tasks such as text generation and language modelling.
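Building on the hypothetical sketch above, the wrapping step amounts to a single call. The AutoRegressiveWrapper name comes from this document, while its import path is an assumption.

```python
from zeta.structs import AutoRegressiveWrapper  # assumed import path

# Hypothetical: the last step of GPT4Sketch.__init__ would wrap the decoder so
# that each generated token feeds back in as input for the next step, e.g.
#     self.decoder = AutoRegressiveWrapper(self.decoder)
# The same wrapping can be applied to an existing instance for illustration:
sketch = GPT4Sketch()
sketch.decoder = AutoRegressiveWrapper(sketch.decoder)
```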
## Forward function
The `forward` function of the GPT4 class takes `text_tokens` as input, which represents the tokenized input sentences.
Inside the forward function, the wrapped Transformer (the decoder) is applied to `text_tokens`. The result, `model_input`, is then passed back into the decoder along with the `padded_x` parameter.
If an exception occurs during the forward pass, it is caught, logged to the console, and re-raised.
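The snippet below is a hypothetical rendering of the forward pass just described, building on the earlier sketch; the exact return value and the way `padded_x` is derived are assumptions taken from this description.

```python
import torch


class GPT4ForwardSketch(GPT4Sketch):
    """Extends the earlier sketch with the forward pass described above."""

    def forward(self, text_tokens: torch.Tensor):
        try:
            # Apply the wrapped Transformer (the decoder) to the tokenized input.
            model_input = self.decoder(text_tokens)
            # Pass the result back into the decoder along with padded_x
            # (how padded_x is built is an assumption from the text above).
            return self.decoder(model_input, padded_x=model_input)
        except Exception as e:
            # Log the failure to the console and re-raise, as described above.
            print(f"Failed in the forward pass: {e}")
            raise
```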
## Usage
Here's how you can use the GPT4 class:
```python
import torch
from zeta.models import GPT4

# Initialize with default parameters
model = GPT4()

# Three sequences at the maximum sequence length of 8192
text_tokens = torch.randint(0, 50432, (3, 8192))

# Pass the input through the model (this invokes the forward method)
output = model(text_tokens)
```
## Conclusion
The GPT4 class is a powerful tool for creating Transformer-based language models. With the flexibility it provides, users can customize the model to their requirements and specifications. Whether it is altering the dimensionality, changing the number of heads in multi-head attention, or deciding whether to use absolute position embeddings, the GPT4 class provides a versatile and flexible architecture for your next natural language processing project.