Convolutional Neural Network


What is a CNN (Convolutional Neural Network)?

A Convolutional Neural Network (CNN) is a type of deep neural network used primarily for image-related tasks such as classification, object detection, and segmentation. CNNs are designed to automatically and adaptively learn spatial hierarchies of features from input images. They are highly effective at capturing local dependencies, and because their filters share weights across the whole image, they need far fewer parameters than traditional fully connected layers.

Key Components of a CNN:

  1. Convolutional Layer:
    This layer applies convolutional operations to the input data, which helps extract features from the input image (e.g., edges, textures, patterns). The convolution operation uses filters (also called kernels) that slide over the input image.

  2. Pooling Layer:
    Pooling layers are used to downsample the feature maps, reducing their spatial dimensions and making the network more computationally efficient while also helping with translation invariance (the ability to recognize objects even if they are moved around in the image).

  3. Fully Connected Layer:
    After the convolution and pooling layers, fully connected layers are used to classify the features extracted from the previous layers. The final output layer typically uses a softmax or sigmoid activation function for classification tasks.

  4. Activation Function (ReLU):
    After each convolution or fully connected layer, an activation function like ReLU (Rectified Linear Unit) is often used to introduce non-linearity into the model, allowing it to learn more complex patterns.

CNN Architecture Example:

A typical CNN chains these components in sequence: Input → [Convolution → ReLU → Pooling] × N → Fully Connected → Softmax/Sigmoid output.

CNN Implementation from Scratch (without using frameworks like TensorFlow/PyTorch)

Here’s a simple implementation of a CNN using NumPy. This will give you a basic idea of how the operations (convolution, ReLU, pooling, etc.) work in a CNN.

We’ll implement a basic CNN with:

  1. A convolution layer
  2. A ReLU activation layer
  3. A pooling layer
  4. A fully connected layer

This is a deliberately simplified CNN, omitting advanced features such as batch normalization and dropout.

Step 1: Convolution Layer

We will implement the convolution operation, which involves sliding a filter (kernel) over the input image.

import numpy as np

def convolve2d(input_image, kernel):
    # "Valid" convolution: stride 1, no padding. Note that, like most deep
    # learning libraries, this actually computes cross-correlation (the
    # kernel is not flipped), which is what CNNs call "convolution".
    kernel_height, kernel_width = kernel.shape
    image_height, image_width = input_image.shape
    
    # Output dimensions after convolution
    output_height = image_height - kernel_height + 1
    output_width = image_width - kernel_width + 1
    
    output = np.zeros((output_height, output_width))
    
    # Sliding the kernel over the input image
    for i in range(output_height):
        for j in range(output_width):
            region = input_image[i:i+kernel_height, j:j+kernel_width]
            output[i, j] = np.sum(region * kernel)  # Element-wise multiplication and sum
    return output
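
To sanity-check the implementation, you can compare it against SciPy’s built-in correlation routine (an optional check that assumes SciPy is installed; scipy.signal.correlate2d with mode='valid' performs the same sliding-window sum, since deep-learning-style “convolution” is cross-correlation):

from scipy.signal import correlate2d

test_image = np.arange(25, dtype=float).reshape(5, 5)
test_kernel = np.array([[1., 0., -1.],
                        [1., 0., -1.],
                        [1., 0., -1.]])

# mode='valid' keeps only fully overlapping positions, matching the
# no-padding behavior of convolve2d above
reference = correlate2d(test_image, test_kernel, mode='valid')
assert np.allclose(convolve2d(test_image, test_kernel), reference)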

Step 2: ReLU Activation

ReLU is applied element-wise on the convolution output.

def relu(input_image):
    return np.maximum(0, input_image)  # ReLU operation
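
As a quick illustration (a made-up 2x2 array, just to show the behavior):

x = np.array([[-1.0, 2.0],
              [3.0, -4.0]])
print(relu(x))
# [[0. 2.]
#  [3. 0.]]  -- negative values are clamped to zero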

Step 3: Pooling Layer (Max Pooling)

We will implement a simple max pooling layer with a 2x2 window and stride 2.

def max_pooling(input_image, pool_size=2, stride=2):
    image_height, image_width = input_image.shape
    output_height = (image_height - pool_size) // stride + 1
    output_width = (image_width - pool_size) // stride + 1
    
    output = np.zeros((output_height, output_width))
    
    # Apply max pooling
    for i in range(0, image_height - pool_size + 1, stride):
        for j in range(0, image_width - pool_size + 1, stride):
            region = input_image[i:i+pool_size, j:j+pool_size]
            output[i // stride, j // stride] = np.max(region)
    
    return output
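
For example, on a 4x4 input (a hypothetical array chosen so each 2x2 window has a clear maximum):

x = np.array([[1, 3, 2, 4],
              [5, 6, 7, 8],
              [3, 2, 1, 0],
              [1, 2, 3, 4]])
print(max_pooling(x))
# [[6. 8.]
#  [3. 4.]]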

Step 4: Fully Connected Layer

The fully connected layer is a dense layer that flattens the output of the previous layers and computes a weighted sum of its inputs plus a bias.

def fully_connected(input_image, weights, bias):
    # Flatten the input image (if it's multi-dimensional)
    flattened_input = input_image.flatten()
    
    # Calculate the output of the fully connected layer
    output = np.dot(flattened_input, weights) + bias
    return output
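
For a multi-class problem, the weights become a matrix with one column per class. A minimal sketch (the 10-class and 4x4-input shapes here are assumptions for illustration):

num_classes = 10                    # assumed number of output classes
flattened_size = 16                 # size of the flattened 4x4 input
weights = np.random.randn(flattened_size, num_classes)
bias = np.random.randn(num_classes)

scores = fully_connected(np.random.randn(4, 4), weights, bias)
print(scores.shape)  # (10,) -- one raw score per class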

Step 5: Putting it All Together

Now let’s put everything together: we take an image, apply convolution, ReLU, and pooling, and then pass the result through a fully connected layer to produce a prediction.

# Example image (5x5)
image = np.array([[1, 2, 3, 0, 1],
                  [4, 5, 6, 1, 2],
                  [7, 8, 9, 2, 3],
                  [1, 2, 3, 4, 5],
                  [6, 7, 8, 5, 6]])

# Define a simple kernel (3x3)
kernel = np.array([[1, 0, -1],
                   [1, 0, -1],
                   [1, 0, -1]])

# Convolution operation
conv_output = convolve2d(image, kernel)
print("Convolution Output:")
print(conv_output)

# Apply ReLU activation
relu_output = relu(conv_output)
print("ReLU Output:")
print(relu_output)

# Apply Max Pooling
pool_output = max_pooling(relu_output)
print("Max Pooling Output:")
print(pool_output)

# Fully connected layer (flattened output, 1D weights and bias)
weights = np.random.randn(pool_output.size)  # Random weights
bias = np.random.randn()  # Random bias
fc_output = fully_connected(pool_output, weights, bias)
print("Fully Connected Output:")
print(fc_output)
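
In a real classifier, the fully connected layer would produce one score per class, and those scores would be passed through a softmax to turn them into probabilities. A minimal sketch (the three-class scores below are made up for illustration):

def softmax(scores):
    # Subtract the max score for numerical stability before exponentiating
    exp_scores = np.exp(scores - np.max(scores))
    return exp_scores / np.sum(exp_scores)

class_scores = np.array([2.0, 1.0, 0.1])  # hypothetical 3-class scores
probabilities = softmax(class_scores)
print(probabilities)             # non-negative values that sum to 1.0
print(np.argmax(probabilities))  # index of the predicted class (0 here)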

Explanation of the Process:

  1. Input Image: We create a simple 5x5 image.
  2. Convolution: We apply a 3x3 filter to the image, producing a 3x3 feature map (5 - 3 + 1 = 3).
  3. ReLU: We apply the ReLU function element-wise to introduce non-linearity.
  4. Pooling: We downsample the feature map using max pooling with a 2x2 window and stride 2, leaving a 1x1 output.
  5. Fully Connected: We flatten the pooled output and apply a fully connected layer with random weights and a bias, yielding a single scalar score.

This is a very simplified CNN, but it should give you a sense of how convolution, ReLU, pooling, and fully connected layers work.

Conclusion

This implementation provides a basic framework for understanding how a CNN works at a low level. In practice, frameworks like TensorFlow, Keras, and PyTorch handle many of the optimizations, efficiency improvements, and additional features (e.g., batch normalization, dropout) that make CNNs more powerful and easier to train on large datasets.
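
For comparison, a roughly equivalent model in PyTorch might look like the sketch below (assuming a single-channel 5x5 input and a single output score, mirroring the toy example above; the layer sizes are illustrative, not a recommended architecture):

import torch
import torch.nn as nn

model = nn.Sequential(
    nn.Conv2d(in_channels=1, out_channels=1, kernel_size=3),  # 5x5 -> 3x3
    nn.ReLU(),
    nn.MaxPool2d(kernel_size=2, stride=2),                    # 3x3 -> 1x1
    nn.Flatten(),
    nn.Linear(1, 1),                                          # 1 value -> 1 score
)

x = torch.randn(1, 1, 5, 5)  # (batch, channels, height, width)
print(model(x).shape)        # torch.Size([1, 1])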


