AccurateGELUActivation¶
Overview¶
The AccurateGELUActivation class is a PyTorch nn.Module that applies a Gaussian Error Linear Unit (GELU) approximation which is faster than the default GELU and more accurate than QuickGELU. This is useful when the default GELU is considered computationally expensive or its speed is an issue. The implementation was added to support MEGA (Moving Average Equipped Gated Attention) in neural networks.
The class has been designed following the work on GELUs available at: https://github.com/hendrycks/GELUs
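Concretely, the forward method shown in the class definition below implements the standard tanh-based GELU approximation from that work:

$$
\mathrm{GELU}(x) \approx 0.5\,x\left(1 + \tanh\left(\sqrt{\tfrac{2}{\pi}}\left(x + 0.044715\,x^{3}\right)\right)\right)
$$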
Class Definition¶
Here is a look at the parameters and methods of the AccurateGELUActivation class:
import math

import torch
from torch import Tensor, nn


class AccurateGELUActivation(nn.Module):
    """
    Applies GELU approximation that is faster than default and more accurate than QuickGELU. See:
    https://github.com/hendrycks/GELUs

    Implemented along with MEGA (Moving Average Equipped Gated Attention)
    """

    def __init__(self):
        super().__init__()
        self.precomputed_constant = math.sqrt(2 / math.pi)

    def forward(self, input: Tensor) -> Tensor:
        return (
            0.5
            * input
            * (
                1
                + torch.tanh(
                    self.precomputed_constant
                    * (input + 0.044715 * torch.pow(input, 3))
                )
            )
        )
The class does not require any parameters during initialization. Here are the explanations for the various attributes and methods in the class:
| Method/Attribute | Description | Argument |
|---|---|---|
| `__init__` | The constructor method that gets called when an object is created from the class. | None |
| `forward` | The standard PyTorch method for forward propagation in a Module or neural network layer. It accepts a tensor input and returns a tensor. | `input: Tensor` |
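As a quick sanity check, the output of forward should closely match PyTorch's built-in tanh-approximate GELU. Below is a minimal sketch; the zeta.nn import path is taken from the model example further down and may differ in your installation.

import torch
import torch.nn.functional as F

from zeta.nn import AccurateGELUActivation

x = torch.randn(8, 16)
custom = AccurateGELUActivation()(x)

# PyTorch's built-in GELU with the tanh approximation uses the same
# formula, so the two results should agree up to floating-point error.
reference = F.gelu(x, approximate="tanh")
print(torch.allclose(custom, reference, atol=1e-6))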
Class Usage¶
Now, let's look at some examples of how to use this class.
Example 1: Basic Usage¶
import torch

from zeta import AccurateGELUActivation

# Create an instance of the class
gelu_activation = AccurateGELUActivation()

# Create a PyTorch tensor
input = torch.tensor(
    [[-1.0, -0.1, 0.1, 1.0], [0.5, -0.2, -2.1, 3.2]], dtype=torch.float32
)

# Use the AccurateGELUActivation instance to activate the input
output = gelu_activation(input)
print(output)
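Because the module is built entirely from differentiable torch operations, gradients flow through it as usual. Here is a brief sketch continuing the example above:

import torch

from zeta import AccurateGELUActivation

gelu_activation = AccurateGELUActivation()

# Enable gradient tracking on the input so we can backpropagate
# through the activation.
x = torch.randn(2, 4, requires_grad=True)
y = gelu_activation(x)

# The output has the same shape as the input; reduce it to a scalar
# and backpropagate.
y.sum().backward()
print(x.grad.shape)  # torch.Size([2, 4])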
Example 2: Applying on Neural Network¶
The AccurateGELUActivation module can also be used as an activation layer in a PyTorch model.
import torch
from torch import Tensor
from torch.nn import Linear, Module

from zeta.nn import AccurateGELUActivation


class Net(Module):
    def __init__(self):
        super().__init__()
        self.fc1 = Linear(10, 5)
        self.fc2 = Linear(5, 2)
        self.activation = AccurateGELUActivation()

    def forward(self, x: Tensor) -> Tensor:
        x = self.fc1(x)
        x = self.activation(x)
        x = self.fc2(x)
        return x


# Create a model from the neural network class
model = Net()
input = torch.randn(3, 10)

# Pass the input to the model
output = model(input)
print(output)
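To round out the example, here is a sketch of a single training step with the Net defined above; the optimizer, loss function, and random targets are illustrative assumptions, not part of the original example.

import torch
from torch.nn import MSELoss
from torch.optim import SGD

# Reuses the Net class defined above; the optimizer, loss, and target
# below are illustrative choices only.
model = Net()
optimizer = SGD(model.parameters(), lr=0.01)
criterion = MSELoss()

inputs = torch.randn(3, 10)
targets = torch.randn(3, 2)  # matches the 2-dimensional output of fc2

optimizer.zero_grad()
loss = criterion(model(inputs), targets)
loss.backward()
optimizer.step()
print(loss.item())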
Note: Before applying an activation function like GELU to your models, make sure you understand what it does and what benefits it can bring to your architecture.