Deep Learning · Intermediate

Neural Networks: From Fundamentals to Modern AI

13 Chapters · 65 Lessons

The course covers neural networks end to end: from mathematical foundations (linear algebra, calculus, statistics), through the backpropagation mechanism, to the deep learning architectures used in industry and research. Participants study fully connected networks (MLP), convolutional networks (CNN), recurrent networks (RNN, LSTM, GRU), attention mechanisms, and the fundamentals of Transformers. The material is grounded in the PyTorch ecosystem: each implementation is first coded from scratch and then refactored into idiomatic framework code.

Prerequisites: Python scripting and basic NumPy. No prior experience with ML libraries or advanced mathematics is required; all necessary concepts are introduced in the course.

Not covered: large language models (LLMs), diffusion models, reinforcement learning, production deployment (MLOps), and regularization beyond the practical level.

Outcome: graduates can independently design deep network experiments, interpret training results, and join PyTorch-based projects without senior support.

Chapters

MODULE 01

What Is a Neural Network: Your Mental Model of AI

4 lessons

A beginner-friendly introductory chapter: what AI, ML and deep learning are, how an artificial neural network works, the three learning paradigms, and the lifecycle of an ML project. No code, no formulas — only intuition and everyday-life analogies.

MODULE 02

Math and Tools: Tensors, Gradients, Python, NumPy

6 lessons

The mathematical foundation needed before PyTorch: scalars, vectors, matrices, and tensors with geometric intuition, tensor operations, the derivative and the chain rule, the gradient of a multi-variable function, gradient descent on a simple 1D function, and Python + NumPy as a bridge to PyTorch. No epsilon-delta proofs: only intuition, directions, and arrows on the loss landscape.
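As an illustration of gradient descent on a simple 1D function, here is a minimal sketch in plain Python. The function f(x) = (x - 3)^2, the starting point, the learning rate, and the step count are all assumptions chosen for this example, not values taken from the course.

```python
def f(x):
    """Toy loss (assumed for illustration) with its minimum at x = 3."""
    return (x - 3.0) ** 2


def grad_f(x):
    """Analytic derivative of f."""
    return 2.0 * (x - 3.0)


def gradient_descent(x0, lr=0.1, steps=100):
    """Repeatedly step against the gradient, starting from x0."""
    x = x0
    for _ in range(steps):
        x -= lr * grad_f(x)
    return x


x_min = gradient_descent(x0=0.0)
```

With this learning rate, each step shrinks the distance to the minimum by a constant factor of 0.8, so `x_min` ends up very close to 3.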

MODULE 03

Your First End-to-End Training — From Data to Prediction

5 lessons

Your first working classifier: how data becomes a prediction. The chapter walks through the dataset, the loss function, the training loop (forward pass, loss, gradient, update), and evaluation, and ends with coding an XOR classifier in pure NumPy.
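The four steps of the training loop (forward, loss, gradient, update) can be sketched in a few lines of NumPy. To keep the gradient short, this sketch deliberately swaps in a simpler problem than the chapter's XOR classifier: logistic regression on the logical AND function, which a linear model can separate. The dataset, learning rate, and step count are assumptions made for illustration.

```python
import numpy as np

# Toy dataset (assumed): logical AND, which a linear model can separate.
X = np.array([[0.0, 0.0], [0.0, 1.0], [1.0, 0.0], [1.0, 1.0]])
y = np.array([0.0, 0.0, 0.0, 1.0])

w = np.zeros(2)
b = 0.0
lr = 1.0
losses = []

for step in range(2000):
    # 1. Forward: logits -> probabilities via the sigmoid.
    p = 1.0 / (1.0 + np.exp(-(X @ w + b)))
    # 2. Loss: mean binary cross-entropy (clipped for numerical safety).
    p_safe = np.clip(p, 1e-12, 1 - 1e-12)
    losses.append(-np.mean(y * np.log(p_safe) + (1 - y) * np.log(1 - p_safe)))
    # 3. Gradient of the mean loss with respect to w and b.
    grad_w = X.T @ (p - y) / len(y)
    grad_b = np.mean(p - y)
    # 4. Update: step against the gradient.
    w -= lr * grad_w
    b -= lr * grad_b

# Evaluation: threshold the predicted probabilities at 0.5.
predictions = (1.0 / (1.0 + np.exp(-(X @ w + b))) > 0.5).astype(float)
```

XOR itself is not linearly separable, so it needs at least one hidden layer; the loop structure stays the same, only the forward pass and the gradient computation grow.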

MODULE 04

PyTorch Environment and Tensor Foundations

4 lessons

PyTorch fundamentals: tensors and their operations, autograd and the computational graph, layers via nn.Module, and a full training cycle with metrics and GPU usage.

MODULE 05

From Neuron to MLP: Architecture and Forward Pass

6 lessons

From a single perceptron to a multilayer perceptron (MLP): activation functions (sigmoid, ReLU, GELU, tanh), the Universal Approximation Theorem, forward pass mechanics, loss functions (MSE and cross-entropy), and implementing a 2-layer network from scratch in pure NumPy.
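The forward pass of a 2-layer network can be sketched in NumPy as follows. The weights are hand-picked (an assumption for illustration, not learned and not taken from the course) so that the network reproduces XOR, which shows what a single hidden ReLU layer adds over a lone perceptron.

```python
import numpy as np


def relu(z):
    return np.maximum(0.0, z)


def mlp_forward(x, W1, b1, W2, b2):
    """Two-layer MLP: linear -> ReLU -> linear."""
    h = relu(x @ W1 + b1)  # hidden activations, shape (batch, hidden)
    return h @ W2 + b2     # outputs, shape (batch, 1)


# Hand-picked weights (assumed for illustration) that reproduce XOR.
W1 = np.array([[1.0, 1.0],
               [1.0, 1.0]])
b1 = np.array([0.0, -1.0])
W2 = np.array([[1.0],
               [-2.0]])
b2 = np.array([0.0])

X = np.array([[0.0, 0.0], [0.0, 1.0], [1.0, 0.0], [1.0, 1.0]])
outputs = mlp_forward(X, W1, b1, W2, b2)
```

Here the first hidden unit computes x1 + x2, the second fires only when both inputs are 1, and the output layer subtracts twice the second from the first.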

MODULE 06

Backpropagation: How a Network Learns

5 lessons

The backpropagation algorithm from its mathematical foundation to practical implementation: the chain rule as the core of backprop, the symmetry between the forward and backward passes, building a small autograd engine in the style of Karpathy's micrograd, deriving gradients by hand through cross-entropy, a linear layer, and tanh, and the impact of Xavier and He initialization on healthy gradient flow.
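A scalar autograd engine in the spirit of micrograd can be sketched in plain Python as below. This is a simplified illustration, not the course's or micrograd's actual code: the `Value` class and its method names are assumptions, and only addition, multiplication, and tanh are supported.

```python
import math


class Value:
    """A scalar that remembers how it was computed, so gradients can flow back."""

    def __init__(self, data, parents=()):
        self.data = data
        self.grad = 0.0
        self._parents = parents
        self._backward = lambda: None

    def __add__(self, other):
        out = Value(self.data + other.data, (self, other))

        def _backward():
            self.grad += out.grad
            other.grad += out.grad

        out._backward = _backward
        return out

    def __mul__(self, other):
        out = Value(self.data * other.data, (self, other))

        def _backward():
            self.grad += other.data * out.grad
            other.grad += self.data * out.grad

        out._backward = _backward
        return out

    def tanh(self):
        t = math.tanh(self.data)
        out = Value(t, (self,))

        def _backward():
            self.grad += (1.0 - t * t) * out.grad

        out._backward = _backward
        return out

    def backward(self):
        # Topologically order the graph, then apply the chain rule in reverse.
        order, seen = [], set()

        def visit(v):
            if v not in seen:
                seen.add(v)
                for p in v._parents:
                    visit(p)
                order.append(v)

        visit(self)
        self.grad = 1.0
        for v in reversed(order):
            v._backward()


# One tanh neuron: y = tanh(w * x + b)
x, w, b = Value(2.0), Value(-0.5), Value(0.25)
y = (w * x + b).tanh()
y.backward()
```

After `y.backward()`, `w.grad`, `x.grad`, and `b.grad` hold the derivatives of `y` with respect to each input, matching what the chain rule gives by hand: (1 - tanh²(z)) times x, w, and 1 respectively, with z = w·x + b.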

MODULE 07

Training in Practice: Optimizers and Diagnostics

6 lessons

The practical side of training neural networks: geometry of the loss landscape and mini-batch SGD, momentum and Adam (from velocity accumulation to adaptive per-parameter step sizes), learning rate schedules (step decay, cosine annealing, warmup), systematic training diagnostics (overfitting a single batch, sanity-checking the loss at initialization), gradient histograms, the dead neuron problem and gradient clipping, and the classical bias-variance tradeoff as a framework for diagnosing underfitting and overfitting.
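One common way to combine warmup with cosine annealing is sketched below in plain Python. The function name `lr_at` and all default values (base learning rate, warmup length, total steps) are assumptions for illustration; the course may use a different variant.

```python
import math


def lr_at(step, base_lr=1e-3, warmup_steps=100, total_steps=1000, min_lr=0.0):
    """Linear warmup to base_lr, then cosine annealing down to min_lr."""
    if step < warmup_steps:
        return base_lr * (step + 1) / warmup_steps
    progress = (step - warmup_steps) / max(1, total_steps - warmup_steps)
    progress = min(progress, 1.0)  # hold at min_lr after total_steps
    return min_lr + 0.5 * (base_lr - min_lr) * (1.0 + math.cos(math.pi * progress))


schedule = [lr_at(s) for s in range(1001)]
```

The learning rate rises linearly over the first 100 steps, peaks at `base_lr`, and then follows half a cosine period down to `min_lr` at step 1000.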

MODULE 08

Regularization: How to Avoid Overfitting

5 lessons

Regularization as a set of techniques that improve model generalization: dropout as stochastic neuron suppression that behaves differently in train and eval mode, weight decay and L2 as a penalty on large weights, batch normalization (originally motivated as a remedy for internal covariate shift), layer normalization as an alternative for small batches and variable-length sequences, and early stopping together with systematic training monitoring (loss curves, train/val splits, stopping criteria).
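The difference between train and eval behavior in dropout can be illustrated with a short NumPy sketch of the "inverted dropout" formulation. The function signature here is an assumption made for this example and is not PyTorch's API.

```python
import numpy as np


def dropout(x, p=0.5, training=True, rng=None):
    """Inverted dropout: zero each element with probability p, rescale survivors."""
    if not training or p == 0.0:
        return x  # eval mode: identity, no rescaling needed
    rng = np.random.default_rng() if rng is None else rng
    mask = rng.random(x.shape) >= p  # keep each element with probability 1 - p
    return x * mask / (1.0 - p)


x = np.ones((4, 5))
train_out = dropout(x, p=0.5, training=True, rng=np.random.default_rng(0))
eval_out = dropout(x, p=0.5, training=False)
```

Rescaling by 1 / (1 - p) during training keeps the expected activation unchanged, which is why eval mode can simply pass the input through.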

MODULE 09

Convolutional Neural Networks (CNN)

5 lessons

Convolutional networks as the foundation of modern computer vision: 2D convolution with a filter as a feature detector, the role of padding and stride, and translation equivariance; pooling and the flow of spatial dimensions through successive layers; the evolution of architectures from AlexNet through VGG to ResNet, with a focus on what changed at each step and why; skip connections and residual blocks, which address the degradation problem in very deep networks (He et al. 2015); transfer learning as feature extraction and fine-tuning of pretrained models.
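How spatial dimensions flow through convolution and pooling layers follows from one formula, out = floor((in + 2·padding - kernel) / stride) + 1. The sketch below applies it in plain Python; the helper name and the specific layer sizes are assumptions chosen for illustration.

```python
def conv_output_size(size, kernel, stride=1, padding=0):
    """Output length along one spatial axis for a convolution or pooling layer."""
    return (size + 2 * padding - kernel) // stride + 1


size = 224
after_conv = conv_output_size(size, kernel=7, stride=2, padding=3)        # 7x7 conv, stride 2
after_pool = conv_output_size(after_conv, kernel=3, stride=2, padding=1)  # 3x3 pooling, stride 2
same_size = conv_output_size(32, kernel=3, stride=1, padding=1)           # "same" padding
```

A 3x3 kernel with stride 1 and padding 1 preserves the spatial size, while each stride-2 layer roughly halves it.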

MODULE 10

Interpretation and Visualization of Neural Networks

4 lessons

How to open the black box of a deep network: visualization of learned filters and activation maps in a CNN (Zeiler & Fergus 2014); Grad-CAM as a gradient-weighted class activation map (Selvaraju et al. 2017); adversarial examples and FGSM as a demonstration of how fragile a model's decisions can be (Goodfellow et al. 2015); model profiling, with parameter count, FLOPs, and inference latency as concrete computational cost metrics.
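The FGSM update, x_adv = x + eps · sign(∇x loss), can be sketched without a deep learning framework by swapping the CNN for a logistic regression model, whose input gradient has a closed form. The weights, input, and epsilon below are hypothetical values chosen only for illustration.

```python
import numpy as np


def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))


def bce_loss(x, y, w, b):
    """Binary cross-entropy of a logistic regression model on one example."""
    p = sigmoid(x @ w + b)
    return -(y * np.log(p) + (1 - y) * np.log(1 - p))


def fgsm(x, y, w, b, eps):
    """One FGSM step: move x by eps in the sign of the loss gradient w.r.t. x."""
    p = sigmoid(x @ w + b)
    grad_x = (p - y) * w  # analytic d(loss)/dx for logistic regression
    return x + eps * np.sign(grad_x)


# Hypothetical model and input, chosen only for illustration.
w = np.array([1.5, -2.0, 0.5])
b = 0.1
x = np.array([0.4, -0.3, 0.8])
y = 1.0

x_adv = fgsm(x, y, w, b, eps=0.1)
clean_loss = bce_loss(x, y, w, b)
adv_loss = bce_loss(x_adv, y, w, b)
```

Each input coordinate moves by at most `eps`, yet the loss on the perturbed input is higher than on the clean one. For a deep network the same step applies, with the gradient obtained from autograd instead of a formula.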

MODULE 11

Sequences: RNN, LSTM and GRU

5 lessons

Why feedforward networks are insufficient for sequential data and how recurrence addresses this. The classical RNN and its training via BPTT (backpropagation through time, Werbos 1990). Gradient pathologies over long time unrollings: vanishing and exploding gradients (Bengio et al. 1994). LSTM (Hochreiter & Schmidhuber 1997) as the answer to vanishing gradients, presented in its now-standard form with forget, input, and output gates. GRU as a simplified LSTM alternative with fewer gates (Cho et al. 2014).
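A single LSTM time step with forget, input, and output gates can be sketched in NumPy as follows. The stacked weight layout, the gate ordering, and the tensor sizes are assumptions made for this example; frameworks differ in how they pack these parameters.

```python
import numpy as np


def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))


def lstm_step(x, h_prev, c_prev, W, U, b):
    """One LSTM time step. W, U, b stack four blocks: input, forget, cell, output."""
    hidden = h_prev.shape[-1]
    z = x @ W + h_prev @ U + b                 # shape (batch, 4 * hidden)
    i = sigmoid(z[:, 0 * hidden:1 * hidden])   # input gate
    f = sigmoid(z[:, 1 * hidden:2 * hidden])   # forget gate
    g = np.tanh(z[:, 2 * hidden:3 * hidden])   # candidate cell values
    o = sigmoid(z[:, 3 * hidden:4 * hidden])   # output gate
    c = f * c_prev + i * g                     # additive cell-state update
    h = o * np.tanh(c)                         # gated hidden state
    return h, c


rng = np.random.default_rng(0)
batch, n_in, hidden, T = 2, 3, 4, 5
W = rng.normal(scale=0.1, size=(n_in, 4 * hidden))
U = rng.normal(scale=0.1, size=(hidden, 4 * hidden))
b = np.zeros(4 * hidden)

h = np.zeros((batch, hidden))
c = np.zeros((batch, hidden))
for x_t in rng.normal(size=(T, batch, n_in)):  # unroll over T time steps
    h, c = lstm_step(x_t, h, c, W, U, b)
```

The additive update of the cell state `c` is what gives gradients a path through time that is not repeatedly squashed by a nonlinearity.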

MODULE 12

The Attention Mechanism and the Transformer

6 lessons

The attention mechanism is the invention that replaced recurrence as the foundation of sequence modeling and gave rise to the Transformer architecture (Vaswani et al. 2017). The chapter starts from the motivation, namely the limitations of RNNs (difficulty with long-range dependencies due to vanishing gradients, and lack of parallelism across time steps), then covers scaled dot-product attention with the Query/Key/Value triple, multi-head attention and positional encoding, the full encoder block (FFN, residual connections, layer normalization), BPE tokenization, and a mini-Transformer implemented from scratch in PyTorch.
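Scaled dot-product attention, softmax(QKᵀ / √d_k) V, can be sketched for a single head in NumPy as below. The sketch omits masking, batching, and the learned projections; the tensor sizes are assumptions for illustration.

```python
import numpy as np


def softmax(z, axis=-1):
    z = z - z.max(axis=axis, keepdims=True)  # subtract max for numerical stability
    e = np.exp(z)
    return e / e.sum(axis=axis, keepdims=True)


def scaled_dot_product_attention(Q, K, V):
    """softmax(Q K^T / sqrt(d_k)) V for a single head, no masking."""
    d_k = Q.shape[-1]
    scores = Q @ K.T / np.sqrt(d_k)     # (n_queries, n_keys)
    weights = softmax(scores, axis=-1)  # each row sums to 1
    return weights @ V, weights


rng = np.random.default_rng(0)
Q = rng.normal(size=(3, 8))   # 3 queries, d_k = 8
K = rng.normal(size=(5, 8))   # 5 keys
V = rng.normal(size=(5, 16))  # 5 values, d_v = 16
output, weights = scaled_dot_product_attention(Q, K, V)
```

Each output row is a weighted average of the value vectors, with weights determined by how well the query matches each key. Dividing by √d_k keeps the scores from growing with the key dimension.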

MODULE 13

Generative Models: Autoencoders, VAEs, and GANs

4 lessons

This chapter introduces deep generative models: autoencoders and their latent space, variational autoencoders (VAEs) with the reparameterization trick and the ELBO, and GANs, from adversarial training to common pathologies (mode collapse, instability) and techniques to mitigate them.
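The reparameterization trick and the KL term of the ELBO can be sketched in NumPy as below, assuming a diagonal Gaussian encoder and a standard normal prior (the usual VAE setup, though the course may present it differently). NumPy has no autograd, so this only shows the forward computation; the function names are assumptions for this example.

```python
import numpy as np


def reparameterize(mu, log_var, rng):
    """Sample z = mu + sigma * eps with eps ~ N(0, I).

    Given eps, z is a deterministic function of mu and log_var,
    which is what lets gradients pass through the sampling step.
    """
    eps = rng.standard_normal(mu.shape)
    return mu + np.exp(0.5 * log_var) * eps


def kl_to_standard_normal(mu, log_var):
    """KL(N(mu, sigma^2) || N(0, I)), summed over latent dimensions."""
    return -0.5 * np.sum(1.0 + log_var - mu ** 2 - np.exp(log_var), axis=-1)


rng = np.random.default_rng(0)
mu = np.zeros((2, 4))
log_var = np.zeros((2, 4))
z = reparameterize(mu, log_var, rng)
kl = kl_to_standard_normal(mu, log_var)
```

When the encoder output already equals the prior (zero mean, unit variance), the KL term is zero; it grows as the mean moves away from zero or the variance away from one.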