Module Name: MMFusionFFN

Overview

MMFusionFFN is a position-wise feedforward layer intended for multimodal models that process image and text features.

Class Definition

MMFusionFFN(input_dim, hidden_dim, output_dim, dropout=0.1)

Args

Name	Type	Description	Default
input_dim	int	Size of the last dimension of the input tensor	-
hidden_dim	int	Size of the hidden (intermediate) representation	-
output_dim	int	Size of the last dimension of the output tensor	-
dropout	float	Dropout probability	0.1
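
To give a rough sense of how these arguments determine the size of the module, the sketch below counts trainable parameters for the layer stack described under Functionality and Usage (one layer normalization followed by two fully connected layers). The helper name ffn_param_count is hypothetical, and the count assumes PyTorch defaults: a learnable scale and shift in the layer normalization and a bias term in each linear layer. It is an estimate under those assumptions, not a figure taken from the library.

```python
def ffn_param_count(input_dim: int, hidden_dim: int, output_dim: int) -> int:
    """Estimate trainable parameters (assumes default affine LayerNorm and biased Linear layers)."""
    layer_norm = 2 * input_dim                             # scale and shift vectors
    first_linear = input_dim * hidden_dim + hidden_dim     # weight matrix + bias
    second_linear = hidden_dim * output_dim + output_dim   # weight matrix + bias
    return layer_norm + first_linear + second_linear


total = ffn_param_count(input_dim=512, hidden_dim=1024, output_dim=512)
print(total)  # 1051136
```

Dropout and the SiLU activation add no parameters, so the dropout argument does not affect this count.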

Functionality and Usage

The MMFusionFFN module is a subclass of the nn.Module class and contains a forward method which computes the output of the positionwise feedforward layer.

The method performs the following operations:

1. Apply layer normalization to the input tensor.
2. Pass the resulting tensor through a linear transformation (fully connected layer) with a SiLU (Sigmoid Linear Unit) activation function.
3. Apply dropout to the tensor.
4. Repeat steps 2 and 3 with a second fully connected layer.
5. Return the output tensor.
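
The operations above can be sketched as a plain nn.Sequential stack. This is a minimal reconstruction from the description in this document, not the library's actual source: the class name FFNSketch is hypothetical, and the layer order simply follows the listed steps.

```python
import torch
from torch import nn


class FFNSketch(nn.Module):
    """Hypothetical stand-in that mirrors the steps described above."""

    def __init__(self, input_dim: int, hidden_dim: int, output_dim: int, dropout: float = 0.1):
        super().__init__()
        self.layers = nn.Sequential(
            nn.LayerNorm(input_dim),            # step 1: normalize over the last dimension
            nn.Linear(input_dim, hidden_dim),   # step 2: first fully connected layer
            nn.SiLU(),                          #         with SiLU activation
            nn.Dropout(dropout),                # step 3: dropout
            nn.Linear(hidden_dim, output_dim),  # step 4: second fully connected layer,
            nn.SiLU(),                          #         activation,
            nn.Dropout(dropout),                #         and dropout again
        )

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # step 5: return the transformed tensor
        return self.layers(x)


sketch = FFNSketch(input_dim=512, hidden_dim=1024, output_dim=512)
x = torch.randn(5, 32, 512)
y = sketch(x)
print(y.shape)  # torch.Size([5, 32, 512])
```

Because every layer acts only on the last dimension, the leading dimensions (here 5 and 32) pass through unchanged, which is what makes the layer position-wise.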

Usage Examples

import torch
from torch import nn

from zeta.nn import MMFusionFFN

# Define the input, hidden, and output dimensions and the dropout probability
input_dim = 512
hidden_dim = 1024
output_dim = 512
dropout = 0.1

# Create an instance of MMFusionFFN
ffn = MMFusionFFN(input_dim, hidden_dim, output_dim, dropout)

# Example 1 - Forward pass with random input data
input_data = torch.randn(
    5, 32, input_dim
)  # Random input data of shape (5, 32, input_dim)
output = ffn(input_data)
print(output.shape)  # Output tensor shape

# Example 2 - Create an instance with default dropout
ffn_default_dropout = MMFusionFFN(input_dim, hidden_dim, output_dim)

# Example 3 - Forward pass with another input data
input_data2 = torch.randn(
    8, 16, input_dim
)  # Random input data of shape (8, 16, input_dim)
output2 = ffn_default_dropout(input_data2)
print(output2.shape)  # Output tensor shape

Additional Information and Tips

The MMFusionFFN module is intended for multimodal machine learning applications, where it transforms feature tensors that combine different modalities, such as image and text.
The main parameters to choose when creating an instance of MMFusionFFN are input_dim, hidden_dim, and output_dim. input_dim must match the last dimension of the input tensor, while hidden_dim and output_dim can be adjusted to the capacity needed and the shape expected by downstream layers.
The dropout parameter sets the probability that an element is zeroed during training, which can help prevent overfitting. As with standard PyTorch dropout, it has no effect once the module is switched to evaluation mode.
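
The effect of the dropout probability can be seen on a standalone nn.Dropout layer. The sketch below assumes MMFusionFFN relies on standard PyTorch dropout, which this document implies but does not show; the variable names are illustrative.

```python
import torch
from torch import nn

torch.manual_seed(0)  # fixed seed so repeated runs behave the same

drop = nn.Dropout(p=0.5)
x = torch.ones(1000)

drop.train()        # training mode: elements are zeroed at random
y_train = drop(x)   # survivors are scaled by 1 / (1 - p) = 2

drop.eval()         # evaluation mode: dropout is a no-op
y_eval = drop(x)

zero_fraction = (y_train == 0).float().mean().item()
print(zero_fraction)            # close to 0.5
print(torch.equal(y_eval, x))   # True
```

In practice this means calling ffn.eval() before inference and ffn.train() before resuming training, so that dropout is only active while the model is being fitted.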

References and Resources

PyTorch Documentation: nn.Module
Hugging Face Documentation: SiLU Activation Function
