Saturday, May 24, 2025

How to initialise a neural network with hidden layers using ReLU activation, and how the parameters at each layer are calculated.

# Imports (TensorFlow Keras)
from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import Dense

# Determine the number of input features
input_dim = X_train.shape[1]

# Build the Sequential model
model = Sequential([
    # First hidden layer with ReLU activation
    Dense(64, activation='relu', input_shape=(input_dim,)),
    # Second hidden layer with ReLU activation
    Dense(32, activation='relu'),
    # Output layer with sigmoid activation for binary classification
    Dense(1, activation='sigmoid')
])


Why does the first hidden layer have 64 neurons, the second 32, and the output layer just 1?


The choice of layer sizes (64, 32, 1) is somewhat arbitrary; it is usually determined through experimentation and guided by the complexity of the problem and the size of the dataset.


Input Layer: The number of neurons in the input layer is determined by the number of features in your dataset.


In this case, we have 10 features, so the input layer effectively has 10 neurons (though it's implicitly defined by the input_shape in the first Dense layer).


First Hidden Layer (64 neurons): Starting with a larger number of neurons (like 64) in the first hidden layer allows the network to learn a rich set of initial representations from the raw input features. It provides enough capacity to capture various patterns and combinations within the data.


Second Hidden Layer (32 neurons): Reducing the number of neurons in the second hidden layer (to 32) is a common practice. This layer learns more abstract, compressed representations from the output of the first hidden layer. It captures higher-level patterns, reduces computational cost, and helps prevent overfitting by forcing the network to learn more compact representations. The idea is to progressively reduce the dimensionality and complexity as we move deeper into the network, extracting more meaningful features.


Output Layer (1 neuron): For a binary classification problem (predicting 0 or 1), the output layer needs to produce a single value that can be interpreted as the probability of belonging to one of the classes. A single neuron with a sigmoid activation function outputs a value between 0 and 1, representing the estimated probability of the positive class (target = 1). If the output is > 0.5, the prediction is typically classified as 1, otherwise as 0.
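
As a quick illustration (a minimal sketch; X_test is assumed to be a feature array shaped like X_train), this is how the sigmoid output is typically turned into hard 0/1 labels:

# Probabilities from the sigmoid output neuron, one value per sample
probs = model.predict(X_test)

# Apply the 0.5 threshold to get 0/1 class labels
preds = (probs > 0.5).astype("int32")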


In summary, the numbers 64 and 32 are common starting points for hidden layer sizes in many neural network architectures. They provide sufficient capacity for many tasks without being excessively large, which could lead to overfitting on smaller datasets. The output layer size is dictated by the nature of the prediction task (1 for binary classification, number of classes for multi-class classification, etc.).

Now, if we print the summary of the model, it looks like the output below. How are the numbers of parameters calculated?
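
Calling model.summary() prints the per-layer parameter counts. The exact layer names depend on how many models have been built in the session, but the output is roughly of this shape (an indicative sketch consistent with the counts derived below):

model.summary()

# Model: "sequential"
# _________________________________________________________________
#  Layer (type)                Output Shape              Param #
#  dense (Dense)               (None, 64)                704
#  dense_1 (Dense)             (None, 32)                2080
#  dense_2 (Dense)             (None, 1)                 33
# =================================================================
#  Total params: 2817
#  Trainable params: 2817
#  Non-trainable params: 0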


Explanation of Parameter Calculation:

Total parameters in a Dense layer are calculated as:

(number of neurons in previous layer + 1) * number of neurons in current layer

The '+ 1' accounts for the bias term for each neuron in the current layer. 
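
The same rule can be written as a one-line helper (dense_params is just an illustrative name here, not a Keras function):

def dense_params(n_inputs, n_neurons):
    # Weights: n_inputs * n_neurons, plus one bias per neuron
    return (n_inputs + 1) * n_neurons

print(dense_params(10, 64))   # 704
print(dense_params(64, 32))   # 2080
print(dense_params(32, 1))    # 33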


Layer 1 (Dense, 64 neurons, ReLU):

Input layer has X_train.shape[1] features (which is 10).

Parameters = (number of inputs + 1) * number of neurons

Parameters = (10 + 1) * 64 = 11 * 64 = 704

That is 10 * 64 = 640 weights connecting the inputs to the 64 neurons, plus 64 bias terms (one per neuron).


Layer 2 (Dense, 32 neurons, ReLU):

Previous layer (Layer 1) has 64 neurons.

Parameters = (number of neurons in previous layer + 1) * number of neurons

Parameters = (64 + 1) * 32 = 65 * 32 = 2080

That is 64 * 32 = 2048 weights connecting the first hidden layer to the second, plus 32 bias terms (one per neuron).


Layer 3 (Dense, 1 neuron, Sigmoid):

Previous layer (Layer 2) has 32 neurons.

Parameters = (number of neurons in previous layer + 1) * number of neurons

Parameters = (32 + 1) * 1 = 33 * 1 = 33

That is 32 weights connecting the second hidden layer to the single output neuron, plus 1 bias term.


Total parameters = Parameters from Layer 1 + Parameters from Layer 2 + Parameters from Layer 3

Total parameters = 704 + 2080 + 33 = 2817

The model summary confirms this total number of parameters.
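
To double-check these figures programmatically (a minimal sketch using the model built above), each Keras layer exposes its own parameter count:

# Per-layer parameter counts, then the overall total
for layer in model.layers:
    print(layer.name, layer.count_params())

print("Total:", model.count_params())   # expected: 2817 for the architecture above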
