Robots Atlas

โ† Courses

Sieci neuronowe od podstaw do nowoczesnej AI Cover

Deep Learning · Intermediate

Neural Networks: From Fundamentals to Modern AI

13 Chapters · 65 Lessons

The course covers the full scope of neural networks, from mathematical foundations (linear algebra, calculus, statistics), through the backpropagation mechanism, to modern deep learning architectures used in industry and research. Participants study fully connected networks (MLP), convolutional networks (CNN), recurrent networks (RNN, LSTM, GRU), attention mechanisms, and the fundamentals of transformers. All material is grounded in the PyTorch ecosystem: every implementation is coded from scratch and then refactored to idiomatic framework code. Prerequisites: Python scripting and basic NumPy; no prior ML library experience or advanced mathematics required (all necessary concepts are introduced in-course). Not covered: large language models (LLMs), diffusion models, reinforcement learning, production deployment (MLOps), or advanced regularization beyond the practical level. Graduates are ready to independently design deep network experiments, interpret training results, and join PyTorch-based projects without senior support.

Chapters

MODULE 01

What is a neural network: your mental model of AI

A beginner-friendly introductory chapter: what AI, ML and deep learning are, how an artificial neural network works, the three learning paradigms, and the lifecycle of an ML project. No code, no formulas, only intuition and everyday-life analogies.

  1. 1.1 What are artificial intelligence, machine learning, and deep learning
  2. 1.2 What is an artificial neural network: the biological analogy and its limits
  3. 1.3 Three learning paradigms: supervised, unsupervised, reinforcement
  4. 1.4 ML project lifecycle: data → training → evaluation → deployment
MODULE 02

Math and tools: tensors, gradients, Python, NumPy

Mathematical foundation before PyTorch: scalar, vector, matrix and tensor with geometric intuition, tensor operations, derivative and chain rule, gradient of a multi-variable function, gradient descent on a simple 1D function, and Python + NumPy as a bridge to PyTorch. No epsilon-delta proofs, only intuition, directions, and arrows on the loss map.

  1. 2.1 Scalar, vector, matrix, tensor: geometric intuition
  2. 2.2 Tensor operations: addition, multiplication, matrix multiplication
  3. 2.3 Derivative and chain rule: intuition of the direction of fastest growth
  4. 2.4 Gradient of a multi-variable function: an arrow on the loss map
  5. 2.5 Gradient descent on a simple function: walking downhill step by step
  6. 2.6 Python, NumPy and the first tensor: a bridge to PyTorch
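
As an illustration of the "walking downhill step by step" idea from lesson 2.5, here is a minimal sketch of gradient descent on a 1D function. The function f(x) = (x - 3)^2, the starting point, the learning rate, and the step count are all assumptions chosen for this example, not material taken from the course.

```python
def f(x):
    """Illustrative loss: a parabola with its minimum at x = 3."""
    return (x - 3.0) ** 2


def df(x):
    """Analytic derivative of f."""
    return 2.0 * (x - 3.0)


def gradient_descent(x0, lr=0.1, steps=50):
    """Repeatedly step against the derivative, recording every position."""
    x = x0
    history = [x]
    for _ in range(steps):
        x = x - lr * df(x)  # move downhill, scaled by the learning rate
        history.append(x)
    return x, history


x_final, history = gradient_descent(x0=0.0)
print(f"x after {len(history) - 1} steps: {x_final:.5f}, f(x) = {f(x_final):.2e}")
```

With this learning rate each step shrinks the distance to the minimum by a constant factor of 0.8, so the iterate approaches x = 3 geometrically; a learning rate above 1.0 would make the same loop diverge on this function.
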
MODULE 03

Your First End-to-End Training: From Data to Prediction

Your first working classifier: how data becomes a prediction. You learn the dataset, the loss, the training loop (forward, loss, gradient, update), evaluation, and code an XOR classifier in pure NumPy.

  1. 3.1 Data: features, labels, dataset, batch, epoch
  2. 3.2 Loss function: measuring how wrong the network is
  3. 3.3 The training loop: forward → loss → gradient → update
  4. 3.4 Evaluation: train/val/test split, accuracy, when to stop training
  5. 3.5 XOR classifier in pure NumPy: a 2-2-1 network from scratch
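
The four-stage loop from lesson 3.3 (forward → loss → gradient → update) can be sketched on the XOR task from lesson 3.5. This is one possible version, not the course's own code: the sigmoid activations, binary cross-entropy loss, random seed, learning rate, and step count are all assumptions made for the example.

```python
import numpy as np


def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))


# The four XOR examples: two binary features, one binary label.
X = np.array([[0.0, 0.0], [0.0, 1.0], [1.0, 0.0], [1.0, 1.0]])
y = np.array([[0.0], [1.0], [1.0], [0.0]])

# 2-2-1 network: 2 inputs -> 2 hidden units -> 1 output.
rng = np.random.default_rng(0)          # seed is an arbitrary choice
W1 = rng.normal(0.0, 1.0, size=(2, 2))
b1 = np.zeros((1, 2))
W2 = rng.normal(0.0, 1.0, size=(2, 1))
b2 = np.zeros((1, 1))

lr = 0.5                                # assumed learning rate
eps = 1e-12                             # keeps log() away from zero
losses = []

for step in range(5000):
    # 1. Forward pass
    h = sigmoid(X @ W1 + b1)            # hidden activations, shape (4, 2)
    p = sigmoid(h @ W2 + b2)            # predicted probabilities, shape (4, 1)

    # 2. Loss: mean binary cross-entropy
    loss = -np.mean(y * np.log(p + eps) + (1.0 - y) * np.log(1.0 - p + eps))
    losses.append(loss)

    # 3. Gradients via the chain rule (sigmoid + cross-entropy gives p - y)
    dz2 = (p - y) / len(X)
    dW2 = h.T @ dz2
    db2 = dz2.sum(axis=0, keepdims=True)
    dz1 = (dz2 @ W2.T) * h * (1.0 - h)
    dW1 = X.T @ dz1
    db1 = dz1.sum(axis=0, keepdims=True)

    # 4. Parameter update
    W1 -= lr * dW1
    b1 -= lr * db1
    W2 -= lr * dW2
    b2 -= lr * db2

print(f"loss: {losses[0]:.4f} -> {losses[-1]:.4f}")
print("predictions:", p.ravel().round(3))
```

A network this small does not reach a perfect XOR solution from every initialization, so the printed predictions depend on the seed; what the sketch is meant to show is the structure of the loop, not a guaranteed result.
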
MODULE 04

PyTorch Environment and Tensor Foundations

PyTorch fundamentals: tensors and their operations, autograd and the computational graph, layers via nn.Module, and a full training cycle with metrics and GPU usage.

  1. 4.1 Tensor: shape, dtype, operations and broadcasting
  2. 4.2 Autograd: computational graph, backward() and grad_fn
  3. 4.3 nn.Module, nn.Parameter, layers and their registration
  4. 4.4 Train/val/test cycle, metrics and working with GPU (AMP, mixed precision)
MODULE 05

From Neuron to MLP: Architecture and Forward Pass

From a single perceptron to a multilayer MLP: activation functions (sigmoid, ReLU, GELU, tanh), the Universal Approximation Theorem, forward pass mechanics, loss functions (MSE and Cross-Entropy), and implementing a 2-layer network from scratch in pure NumPy.

  1. 5.1 Perceptron: input, weight, bias, activation
  2. 5.2 Activation functions: sigmoid, ReLU, GELU, tanh (when and why)
  3. 5.3 The Universal Approximation Theorem: why non-linearity is necessary
  4. 5.4 Multilayer network (MLP) and the forward pass step by step
  5. 5.5 Loss functions: MSE and Cross-Entropy (intuition and choice)
  6. 5.6 Implementing a 2-layer MLP from scratch (no autograd, pure NumPy)
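
For reference, the activation functions and the two losses named in this module can be written in a few lines of NumPy. The sketch below is illustrative only: the function names are hypothetical, GELU uses the common tanh approximation rather than the exact form, and cross-entropy is assumed to take raw logits with integer class labels.

```python
import numpy as np


def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))


def relu(x):
    return np.maximum(0.0, x)


def gelu(x):
    # tanh approximation of GELU (the exact form uses the Gaussian CDF)
    return 0.5 * x * (1.0 + np.tanh(np.sqrt(2.0 / np.pi) * (x + 0.044715 * x**3)))


def mse(pred, target):
    """Mean squared error, the usual choice for regression."""
    return np.mean((pred - target) ** 2)


def cross_entropy(logits, labels):
    """Mean cross-entropy from raw logits of shape (N, C) and integer labels of shape (N,)."""
    shifted = logits - logits.max(axis=1, keepdims=True)   # numerical stability
    log_probs = shifted - np.log(np.exp(shifted).sum(axis=1, keepdims=True))
    return -np.mean(log_probs[np.arange(len(labels)), labels])


x = np.linspace(-3.0, 3.0, 7)
for fn in (sigmoid, np.tanh, relu, gelu):
    print(f"{fn.__name__:>8}:", np.round(fn(x), 3))
```

Subtracting the row maximum before exponentiating does not change the softmax result but avoids overflow for large logits.
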
MODULE 06

Backpropagation: How a Network Learns

The backpropagation algorithm from its mathematical foundation to practical implementation: the chain rule as the core of backprop, symmetry between forward and backward pass, building a Karpathy-style micrograd autograd, hand-deriving gradients through cross-entropy, a linear layer and tanh, and the impact of Xavier and He initialization on healthy gradient flow.

  1. 6.1 Chain rule: the foundation of backpropagation
  2. 6.2 Forward pass vs backward pass: symmetry and gradient flow
  3. 6.3 Building micrograd: Value, backward(), graph visualization (Karpathy)
  4. 6.4 Backprop Ninja: manual backward through cross-entropy, linear, tanh and batch-norm
  5. 6.5 Weight initialization: Xavier and He (how the start decides gradient flow)
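
To make the "Value plus backward()" idea from lesson 6.3 concrete, here is a heavily reduced sketch in the spirit of micrograd. It is not Karpathy's actual code: only addition, multiplication, and tanh are supported, and the attribute names are assumptions chosen for this example.

```python
import math


class Value:
    """A scalar that remembers how it was computed so gradients can flow back."""

    def __init__(self, data, parents=()):
        self.data = float(data)
        self.grad = 0.0
        self._parents = parents
        self._backward = lambda: None   # leaf nodes have nothing to propagate

    def __add__(self, other):
        other = other if isinstance(other, Value) else Value(other)
        out = Value(self.data + other.data, (self, other))

        def _backward():
            self.grad += out.grad       # d(a + b)/da = 1
            other.grad += out.grad      # d(a + b)/db = 1

        out._backward = _backward
        return out

    def __mul__(self, other):
        other = other if isinstance(other, Value) else Value(other)
        out = Value(self.data * other.data, (self, other))

        def _backward():
            self.grad += other.data * out.grad   # d(a * b)/da = b
            other.grad += self.data * out.grad   # d(a * b)/db = a

        out._backward = _backward
        return out

    def tanh(self):
        t = math.tanh(self.data)
        out = Value(t, (self,))

        def _backward():
            self.grad += (1.0 - t * t) * out.grad

        out._backward = _backward
        return out

    def backward(self):
        # Topologically sort the graph, then apply the chain rule in reverse.
        order, seen = [], set()

        def visit(v):
            if v not in seen:
                seen.add(v)
                for parent in v._parents:
                    visit(parent)
                order.append(v)

        visit(self)
        self.grad = 1.0
        for v in reversed(order):
            v._backward()


a, b = Value(2.0), Value(3.0)
c = a * b + a            # c = a*b + a, so dc/da = b + 1 and dc/db = a
c.backward()
print(c.data, a.grad, b.grad)
```

Gradients are accumulated with `+=` rather than assigned because `a` feeds into the result along two paths, and the chain rule sums the contributions of every path.
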
MODULE 07

Training in practice: optimizers and diagnostics

The practical side of training neural networks: geometry of the loss landscape and mini-batch SGD, momentum and the Adam family of adaptive optimizers, learning rate schedules (step decay, cosine annealing, warmup), systematic training diagnostics (overfit a single batch, sanity-check loss at init), gradient histograms, the dead neurons problem and gradient clipping, and the classical bias-variance tradeoff as a framework for diagnosing underfitting and overfitting.

  1. 7.1 Gradient descent geometrically: loss surface, learning rate and mini-batch SGD
  2. 7.2 Momentum and Adam: adaptive learning rates and when to use them
  3. 7.3 LR schedules: step decay, cosine annealing, warmup
  4. 7.4 Systematic diagnostics: overfit single batch, init loss, learning curves
  5. 7.5 Gradient histograms, dead neurons and gradient clipping
  6. 7.6 Bias-variance tradeoff and diagnosing underfitting vs overfitting
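
As a small illustration of the schedules in lesson 7.3, the sketch below combines linear warmup with cosine annealing in one function. The function name `lr_at` and all default values are hypothetical; real projects would more likely use a scheduler from their framework.

```python
import math


def lr_at(step, base_lr=1e-3, min_lr=1e-5, warmup_steps=100, total_steps=1000):
    """Learning rate for a given step: linear warmup, then cosine decay to min_lr."""
    if step < warmup_steps:
        # Ramp linearly from base_lr / warmup_steps up to base_lr.
        return base_lr * (step + 1) / warmup_steps
    # Fraction of the decay phase completed, clamped to [0, 1].
    progress = (step - warmup_steps) / max(1, total_steps - warmup_steps)
    progress = min(progress, 1.0)
    return min_lr + 0.5 * (base_lr - min_lr) * (1.0 + math.cos(math.pi * progress))


for step in (0, 50, 99, 100, 550, 1000):
    print(f"step {step:4d}: lr = {lr_at(step):.6f}")
```

Warmup keeps the first updates small while optimizer statistics and activations are still settling; the cosine phase then lowers the rate smoothly instead of in abrupt steps.
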
MODULE 08

Regularization: how to avoid overfitting

Regularization as a set of techniques that preserve model generalization: dropout as stochastic neuron suppression with different behavior in train vs eval mode, weight decay and L2 as a penalty for large weights, batch normalization addressing internal covariate shift, layer normalization as an alternative for small batches and variable-length sequences, and early stopping together with systematic training monitoring (loss curves, train/val splits, stopping criteria).

  1. 8.1 Dropout: mechanism, train vs eval mode, implementation
  2. 8.2 Weight decay and L2 regularization: penalizing large weights
  3. 8.3 Batch Normalization: the internal covariate shift problem and its solution
  4. 8.4 Layer Normalization: when BatchNorm fails and how to replace it
  5. 8.5 Early stopping and training monitoring strategies
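
The train vs eval difference in dropout (lesson 8.1) is easiest to see in code. The following is a minimal sketch of inverted dropout in NumPy, under the assumption that activations are rescaled during training so that evaluation needs no correction; the function signature is hypothetical.

```python
import numpy as np


def dropout(x, p=0.5, training=True, rng=None):
    """Inverted dropout: zero each element with probability p while training.

    Surviving elements are scaled by 1 / (1 - p) so the expected activation
    stays the same, which lets eval mode return the input unchanged.
    """
    if not training or p == 0.0:
        return x
    if rng is None:
        rng = np.random.default_rng()
    keep_mask = rng.random(x.shape) >= p     # True with probability 1 - p
    return x * keep_mask / (1.0 - p)


x = np.ones(1000)
rng = np.random.default_rng(0)               # seed is an arbitrary choice
train_out = dropout(x, p=0.5, training=True, rng=rng)
eval_out = dropout(x, p=0.5, training=False)

print("train: fraction zeroed =", np.mean(train_out == 0.0))
print("train: mean activation =", train_out.mean())
print("eval:  unchanged       =", np.array_equal(eval_out, x))
```

Forgetting to switch a model to eval mode leaves this random masking active at inference time, which is a common source of noisy validation metrics.
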
MODULE 09

Convolutional Neural Networks (CNN)

Convolutional networks as the foundation of modern computer vision: 2D convolution with a filter as a feature detector, the role of padding, stride, and translation equivariance; pooling and the flow of spatial dimensions through successive layers; the evolution of architectures from AlexNet through VGG to ResNet and the answer to what changed and why; skip connections and residual blocks that solve the degradation problem in very deep networks (He et al. 2015); transfer learning as feature extraction and fine-tuning of pretrained models.

  1. 9.1 2D convolution: filter as feature detector, padding, stride, equivariance
  2. 9.2 Pooling, feature maps, and dimension flow through the network
  3. 9.3 Architecture evolution: AlexNet → VGG → ResNet (what changed and why)
  4. 9.4 Skip connections and residual blocks: solving the degradation problem (He 2015)
  5. 9.5 Transfer learning: feature extraction vs fine-tuning (how to leverage ImageNet)
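
To show how padding and stride determine the spatial size of a feature map (lessons 9.1 and 9.2), here is a deliberately naive single-channel 2D convolution in NumPy. It is a sketch for intuition only: the function names are hypothetical, and like deep learning frameworks it computes cross-correlation (the kernel is not flipped).

```python
import numpy as np


def conv_output_size(n, k, padding=0, stride=1):
    """Output length along one axis for input size n and kernel size k."""
    return (n + 2 * padding - k) // stride + 1


def conv2d(image, kernel, padding=0, stride=1):
    """Naive single-channel 2D cross-correlation with zero padding."""
    if padding > 0:
        image = np.pad(image, padding)       # zero-pad both axes
    kh, kw = kernel.shape
    out_h = (image.shape[0] - kh) // stride + 1
    out_w = (image.shape[1] - kw) // stride + 1
    out = np.zeros((out_h, out_w))
    for i in range(out_h):
        for j in range(out_w):
            patch = image[i * stride:i * stride + kh, j * stride:j * stride + kw]
            out[i, j] = np.sum(patch * kernel)   # filter response at (i, j)
    return out


# A vertical edge: left half dark, right half bright.
image = np.zeros((6, 6))
image[:, 3:] = 1.0
vertical_edge = np.array([[-1.0, 0.0, 1.0]] * 3)

print(conv2d(image, vertical_edge))
print("same-size output with padding=1:", conv2d(image, vertical_edge, padding=1).shape)
```

The filter responds only in the columns where the brightness changes, which is the sense in which a kernel acts as a feature detector.
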
MODULE 10

Interpretation and Visualization of Neural Networks

How to open the black box of a deep network: visualization of learned filters and activation maps in a CNN (Zeiler & Fergus 2014); Grad-CAM as a gradient-weighted class saliency map (Selvaraju et al. 2017); adversarial examples and FGSM as a proof of decision fragility (Goodfellow et al. 2015); model profiling, with parameter count, FLOPs, and inference latency as concrete computational cost metrics.

  1. 10.1 Visualizing filters and activation maps in a CNN
  2. 10.2 Grad-CAM: gradient-weighted class activation maps
  3. 10.3 Adversarial examples: when the network fails and why
  4. 10.4 Model profiling: parameters, FLOPs, inference time
MODULE 11

Sequences: RNN, LSTM and GRU

Why feedforward networks are insufficient for sequential data and how recurrence solves this problem. The classical RNN and its training via BPTT (backpropagation through time, Werbos 1990). Gradient pathology in deep time unrollings: vanishing and exploding gradients (Bengio et al. 1994). LSTM as the answer to vanishing gradients with forget, input and output gates (Hochreiter & Schmidhuber 1997). GRU as a simplified LSTM alternative with fewer gates (Cho et al. 2014).

  1. 11.1 The sequence problem: why feedforward is not enough
  2. 11.2 RNN: hidden-state loop, BPTT and a first-step implementation
  3. 11.3 Vanishing and exploding gradients in deep RNNs
  4. 11.4 LSTM: gates, cell state and the constant error carousel
  5. 11.5 GRU: the simplified LSTM variant
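
The hidden-state loop from lesson 11.2 can be sketched as a plain forward pass in NumPy. This is an illustrative vanilla (Elman-style) RNN, not the course's implementation: the weight names, shapes, initialization scale, and seed are assumptions, and no training (BPTT) is shown.

```python
import numpy as np


def rnn_forward(xs, h0, W_xh, W_hh, b_h):
    """Run a vanilla RNN over a sequence and return every hidden state.

    xs:   (T, input_dim)  sequence of input vectors
    h0:   (hidden_dim,)   initial hidden state
    W_xh: (input_dim, hidden_dim), W_hh: (hidden_dim, hidden_dim), b_h: (hidden_dim,)
    """
    h = h0
    hidden_states = []
    for x_t in xs:
        # The same weights are reused at every time step.
        h = np.tanh(x_t @ W_xh + h @ W_hh + b_h)
        hidden_states.append(h)
    return np.stack(hidden_states)           # (T, hidden_dim)


T, input_dim, hidden_dim = 5, 3, 4
rng = np.random.default_rng(0)               # seed is an arbitrary choice
xs = rng.normal(size=(T, input_dim))
W_xh = rng.normal(scale=0.5, size=(input_dim, hidden_dim))
W_hh = rng.normal(scale=0.5, size=(hidden_dim, hidden_dim))
b_h = np.zeros(hidden_dim)

hs = rnn_forward(xs, np.zeros(hidden_dim), W_xh, W_hh, b_h)
print(hs.shape)
print(hs.round(3))
```

Because each hidden state depends on the previous one, the time steps cannot be computed in parallel, and gradients during training must pass through `W_hh` once per step, which is where the vanishing and exploding behavior discussed in lesson 11.3 comes from.
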
MODULE 12

The Attention Mechanism and the Transformer

The attention mechanism is the invention that replaced recurrence as the foundation of sequence modeling and gave rise to the Transformer architecture (Vaswani et al. 2017). The chapter covers the motivation (RNN limitations on long-range dependencies: vanishing gradients and lack of parallelism), followed by scaled dot-product attention with the Query/Key/Value triple, multi-head attention and positional encoding, the full encoder block (FFN, residual, Layer Norm), BPE tokenization, and a mini-Transformer implementation from scratch in PyTorch.

  1. 12.1 Motivation: RNN limitations and long-range dependencies
  2. 12.2 Self-attention: Query, Key, Value and scaled dot-product attention
  3. 12.3 Multi-head attention and positional encoding
  4. 12.4 Transformer architecture: encoder block, FFN, LayerNorm, residual
  5. 12.5 Tokenization and BPE: why text is neither characters nor words
  6. 12.6 Implementing a mini-Transformer from scratch in PyTorch
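
Scaled dot-product attention (lesson 12.2) computes softmax(QK^T / sqrt(d_k)) V. The sketch below expresses that formula in NumPy for a single head; the function names, the boolean mask convention (True means "may attend"), and the random inputs are assumptions made for this example.

```python
import numpy as np


def softmax(x, axis=-1):
    x = x - x.max(axis=axis, keepdims=True)      # numerical stability
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)


def scaled_dot_product_attention(Q, K, V, mask=None):
    """Single-head attention. Q: (n_q, d_k), K: (n_k, d_k), V: (n_k, d_v)."""
    d_k = Q.shape[-1]
    scores = Q @ K.swapaxes(-1, -2) / np.sqrt(d_k)   # similarity of each query to each key
    if mask is not None:
        scores = np.where(mask, scores, -1e9)        # blocked positions get ~zero weight
    weights = softmax(scores, axis=-1)               # each row sums to 1
    return weights @ V, weights


rng = np.random.default_rng(0)                       # seed is an arbitrary choice
seq_len, d_model = 4, 8
Q = rng.normal(size=(seq_len, d_model))
K = rng.normal(size=(seq_len, d_model))
V = rng.normal(size=(seq_len, d_model))

out, weights = scaled_dot_product_attention(Q, K, V)
causal_mask = np.tril(np.ones((seq_len, seq_len), dtype=bool))
_, causal_weights = scaled_dot_product_attention(Q, K, V, mask=causal_mask)

print(out.shape)
print(causal_weights.round(3))
```

Dividing by sqrt(d_k) keeps the dot products from growing with the key dimension, which would otherwise push the softmax into a regime with very small gradients. With the causal mask, each position attends only to itself and earlier positions.
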
MODULE 13

Generative Models: Autoencoders, VAEs, and GANs

This chapter introduces deep generative models: autoencoders and their latent space, variational autoencoders (VAEs) with the reparameterization trick and ELBO, and GANs, from adversarial training to common pathologies (mode collapse, instability) and techniques to mitigate them.

  1. 13.1 Autoencoders: encoder, decoder, and the latent space
  2. 13.2 VAE: the reparameterization trick and ELBO
  3. 13.3 GAN: adversarial training, generator vs discriminator
  4. 13.4 GAN training issues: mode collapse, instability, and stabilization techniques
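
The reparameterization trick from lesson 13.2 rewrites sampling z ~ N(mu, sigma^2) as z = mu + sigma * eps with eps ~ N(0, I), so the randomness sits outside the path that gradients take through mu and sigma. The NumPy sketch below shows that sampling step and the closed-form KL term of the ELBO for a diagonal Gaussian against a standard normal prior; the function names and example values are assumptions, and no encoder or decoder is included.

```python
import numpy as np


def reparameterize(mu, log_var, rng):
    """Sample z = mu + sigma * eps, with eps drawn from a standard normal."""
    eps = rng.standard_normal(mu.shape)
    sigma = np.exp(0.5 * log_var)            # log-variance -> standard deviation
    return mu + sigma * eps


def kl_to_standard_normal(mu, log_var):
    """KL( N(mu, diag(sigma^2)) || N(0, I) ), summed over the latent dimensions."""
    return -0.5 * np.sum(1.0 + log_var - mu**2 - np.exp(log_var), axis=-1)


rng = np.random.default_rng(0)               # seed is an arbitrary choice
mu = np.array([[0.0, 0.0, 0.0],
               [1.0, 1.0, 1.0]])
log_var = np.zeros_like(mu)                  # unit variance in every dimension

z = reparameterize(mu, log_var, rng)
kl = kl_to_standard_normal(mu, log_var)
print(z.shape)
print(kl)
```

The KL term is zero when the encoder output already matches the prior and grows as the mean moves away from zero or the variance moves away from one; in the ELBO it is traded off against the reconstruction term.
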