FusedDenseGELUDense

Overview

The FusedDenseGELUDense module is a neural network layer designed for efficient computation of dense layers with GELU (Gaussian Error Linear Unit) activations. This documentation provides an in-depth look at the module's architecture, purpose, parameters, and usage examples.
Table of Contents
- Introduction
- Architecture
- Purpose
- Class Definition
- Functionality and Usage
- Examples
- Additional Information
- References
1. Introduction

The FusedDenseGELUDense module combines dense layers with GELU activations in a single neural network layer. This fusion improves computational efficiency and is useful across a wide range of deep learning applications.
2. Architecture

The FusedDenseGELUDense layer consists of two dense sub-layers with a GELU activation function between them. The input tensor is projected by the first dense layer, passed through the GELU activation, and then projected by the second dense layer to produce the final output.
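Conceptually, this is the same computation as the following unfused PyTorch composition. The sketch is purely illustrative: the widths (512 -> 1024 -> 512) and the projection back to the input dimension are inferred from the basic example later in this document, not taken from the fused implementation itself.

```python
import torch
from torch import nn

# Unfused reference composition; widths follow the basic example below.
dim, dim_out = 512, 1024
unfused = nn.Sequential(
    nn.Linear(dim, dim_out),  # plays the role of dense1
    nn.GELU(),                # plays the role of act
    nn.Linear(dim_out, dim),  # plays the role of dense2
)

x = torch.randn(1, dim)
print(unfused(x).shape)  # torch.Size([1, 512])
```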
3. Purpose

The primary purpose of the FusedDenseGELUDense layer is to compute dense transformations with GELU activations efficiently. It provides a convenient way to incorporate this common pattern into deep learning models.
4. Class Definition
Parameters

- dim (int): Input dimension.
- dim_out (int): Output dimension.
- bias (bool, optional): Whether to include bias terms. Defaults to True.
- has_fp16_weights (bool, optional): Whether to use fp16 weights. Defaults to False.
- threshold (float, optional): Threshold for quantization. Defaults to 6.0.
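Putting these together, a construction sketch with every documented parameter spelled out looks as follows; dim and dim_out are illustrative values, while the remaining keywords simply restate the documented defaults.

```python
from zeta.nn import FusedDenseGELUDense

# Illustrative construction; the last three keywords restate the defaults.
layer = FusedDenseGELUDense(
    dim=512,                 # input dimension
    dim_out=1024,            # output dimension
    bias=True,               # include bias terms (default)
    has_fp16_weights=False,  # full-precision weights (default)
    threshold=6.0,           # quantization threshold (default)
)
```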
Internal Layers

The FusedDenseGELUDense layer consists of the following internal layers (see the snippet after this list):

- dense1: The first dense layer.
- act: The GELU activation function.
- dense2: The second dense layer.
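Assuming these attribute names, the sub-layers of an instance can be inspected directly:

```python
from zeta.nn import FusedDenseGELUDense

model = FusedDenseGELUDense(dim=512, dim_out=1024)

# Print the registered sub-layers, using the attribute names listed above.
print(model.dense1)
print(model.act)
print(model.dense2)
```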
5. Functionality and Usage

Forward Pass

The forward method of the FusedDenseGELUDense layer performs the following operations (see the sketch after this list):

- Applies the first dense layer (dense1) to the input tensor.
- Applies the GELU activation function (act) to the result.
- Applies the second dense layer (dense2) to the GELU-activated output.
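As a minimal sketch of those three steps, using plain torch.nn modules as stand-ins for dense1, act, and dense2 (shapes follow the basic example below):

```python
import torch
from torch import nn

# Stand-ins for the internal sub-layers.
dense1 = nn.Linear(512, 1024)
act = nn.GELU()
dense2 = nn.Linear(1024, 512)

x = torch.randn(1, 512)
h = dense1(x)    # 1. first dense layer applied to the input
h = act(h)       # 2. GELU activation applied to the result
out = dense2(h)  # 3. second dense layer applied to the activated output
print(out.shape)  # torch.Size([1, 512])
```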
6. Examples

Basic Usage

Here's a basic example of using the FusedDenseGELUDense layer:
import torch
from zeta.nn import FusedDenseGELUDense
# Create an instance of FusedDenseGELUDense
model = FusedDenseGELUDense(dim=512, dim_out=1024)
# Generate random input tensor
x = torch.randn(1, 512)
# Forward pass
out = model(x)
# Check the output shape
print(out.shape) # torch.Size([1, 512])
Custom Configuration
You can customize the layer by specifying different parameters:
import torch
from zeta.nn import FusedDenseGELUDense

# Create a custom FusedDenseGELUDense layer
custom_model = FusedDenseGELUDense(
    dim=256, dim_out=512, bias=False, has_fp16_weights=True, threshold=4.0
)

# Generate random input tensor
x = torch.randn(1, 256)

# Forward pass with the custom configuration
out = custom_model(x)
Quantization with bitsandbytes

You can enable quantization via the bitsandbytes library, which supplies quantized implementations of the dense layers:
# Install bitsandbytes if not already installed:
# pip install bitsandbytes
import torch
from zeta.nn import FusedDenseGELUDense

# Create an instance of FusedDenseGELUDense with quantization-related options
quantized_model = FusedDenseGELUDense(
    dim=512, dim_out=1024, has_fp16_weights=True, threshold=4.0
)

# Generate random input tensor
x = torch.randn(1, 512)

# Forward pass through the quantization-enabled layer
out = quantized_model(x)
7. Additional Information

- The FusedDenseGELUDense layer efficiently combines dense and GELU activation operations.
- Custom configurations for bias, weight precision, and quantization threshold are supported.
- Quantization can be enabled using the bitsandbytes library for further efficiency.
8. References

For more information on GELU activations (torch.nn.GELU) and dense layers (torch.nn.Linear), refer to the official PyTorch documentation.