Monday, March 10, 2025

Multi-Layer Perceptron (MLP)



The multi-layer perceptron (MLP) is a fundamental building block of the transformer: it serves as the feed-forward network inside each transformer block. Its role is to introduce non-linearity and learn complex relationships within the embedded representations. When defining an MLP module, an important parameter is n_embed, the dimensionality of the input embedding.

The MLP typically consists of a hidden linear layer that expands the input dimension by some factor (often 4, which we will use here), followed by a non-linear activation function, commonly ReLU. This wider intermediate representation allows the network to learn more complex features. Finally, a projection linear layer maps the expanded representation back to the original embedding dimension. This sequence of transformations enables the MLP to refine the representations produced by the attention mechanism.
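
As a concrete illustration, here is a minimal PyTorch sketch of such a module. Only n_embed, the 4x expansion, and the ReLU come from the description above; the class name MLP and the tensor shapes in the usage example are assumptions for illustration.

    import torch
    import torch.nn as nn

    class MLP(nn.Module):
        """Feed-forward block: expand by 4x, apply ReLU, project back."""

        def __init__(self, n_embed):
            super().__init__()
            self.net = nn.Sequential(
                nn.Linear(n_embed, 4 * n_embed),  # hidden layer: expand by a factor of 4
                nn.ReLU(),                        # non-linearity
                nn.Linear(4 * n_embed, n_embed),  # projection back to n_embed
            )

        def forward(self, x):
            return self.net(x)

Because the projection layer restores the original dimension, the block's output shape matches its input shape, so it can sit inside a transformer block without changing any downstream shapes:

    mlp = MLP(n_embed=64)
    x = torch.randn(8, 16, 64)  # hypothetical (batch, sequence length, n_embed)
    out = mlp(x)                # same shape: (8, 16, 64)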

