What is machine learning?
Machine learning is a method of building systems that, instead of being given ready-made rules by a programmer, learn those rules themselves from data.
Put differently, the program is not explicitly coded for a specific task; it learns to perform that task by analysing historical data. The most frequently cited definition is attributed to Arthur Samuel (1959):
"The field of study that gives computers the ability to learn without being explicitly programmed." — Arthur Samuel, 1959
In practice this means a system that takes a large set of observations as input and adjusts its internal parameters so that it gets steadily better at a task, whether that task is classification, value prediction, text generation or controlling a robotic arm.
ML is not a synonym for AI. Artificial intelligence is a broader umbrella that also covers systems built on hand-written rules (classical expert systems, the minimax algorithm in 1990s chess engines). Machine learning is a specific methodology for achieving AI, one in which the system's behaviour is derived from data rather than from rules authored by an engineer. Deep learning (DL) is in turn a subset of ML that uses multi-layer neural networks. It is DL that powers most breakthroughs of the past decade, from image recognition to ChatGPT, but classical ML (regression, decision trees, SVM, k-means) still dominates the many less spectacular, everyday business deployments. The hierarchy is therefore inclusive: all DL is ML and all ML is AI, but not the other way round. This is how IBM frames it in its educational documentation and how LeCun, Bengio and Hinton describe it in their 2015 Nature survey.
Overview diagram
The diagram below shows how AI, ML and DL nest inside each other and lays out the four main learning paradigms of machine learning, with examples of algorithms and applications in each category.
Figure: Machine learning hierarchy
Who is behind it?
Machine learning has no single father. The mathematical foundation was laid in 1943 by Warren McCulloch and Walter Pitts, who formalised the artificial neuron, and in 1949 by Donald Hebb, who introduced the synaptic learning rule (Hebb's rule). The term "machine learning" itself was coined in 1959 by Arthur Samuel at IBM, author of one of the first draughts (checkers) programs that improved its own play over time.
In 1957 Frank Rosenblatt introduced the perceptron, the first simple neural network capable of classifying visual patterns. The 1970s and 1980s brought backpropagation, popularised in 1986 by David Rumelhart, Geoffrey Hinton and Ronald Williams and later applied to convolutional networks by Yann LeCun, which made multi-layer networks trainable in practice. In the 1990s Vladimir Vapnik and colleagues developed Support Vector Machines. The modern explosion began in 2012, when AlexNet, a convolutional neural network by Alex Krizhevsky, Ilya Sutskever and Geoffrey Hinton, won the ImageNet competition by a wide margin, showing that deep networks trained on GPUs could beat the prior approaches to image classification. The full history and the basic taxonomy are summarised in the Machine learning entry on Wikipedia, backed by textbooks from Tom Mitchell and Goodfellow, Bengio and Courville's Deep Learning.
Why did ML explode only after 2012?
This is the question most people ask: if the perceptron dates back to 1957 and backpropagation to the 1980s, why did machine learning become ubiquitous only a decade ago? The answer is that the algorithms alone were not enough — three things had to come together at once.
- Data — the internet, smartphones and social platforms generated an unprecedented volume of labelled examples. The ImageNet dataset (over a million annotated images) gave deep networks something to learn from.
- Compute — graphics cards (GPUs), originally designed for gaming, turned out to be perfect for the parallel matrix operations at the heart of training networks. What once took weeks began to take hours.
- Algorithms and software — techniques for training deep networks matured (better activation functions, regularisation, weight initialisation), and open libraries like TensorFlow and PyTorch meant building models no longer required writing everything from scratch.
AlexNet’s 2012 win was the spark, but the real cause was this convergence of data, hardware and algorithms — which is why the revolution arrived exactly then, and not twenty years earlier.
How does it work?
At the heart of every ML system there is a loop: input data → model → prediction → comparison with ground truth → parameter update. The model is a function with parameters (for example the weights of a neural network), and learning is a process of mathematical optimisation — usually minimising a so-called loss function (the gap between the model's prediction and the desired output).
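A minimal sketch of that loop in Python, assuming a one-feature linear model, a mean-squared-error loss and plain gradient descent. The toy data, the function names `mse` and `train`, the step count and the learning rate are illustrative choices, not taken from any particular library.

```python
# Toy data generated from y = 2x + 1 (an illustrative assumption).
xs = [0.0, 1.0, 2.0, 3.0, 4.0]
ys = [2.0 * x + 1.0 for x in xs]


def mse(w, b):
    """Loss function: mean squared gap between prediction and ground truth."""
    return sum((w * x + b - y) ** 2 for x, y in zip(xs, ys)) / len(xs)


def train(steps=2000, learning_rate=0.05):
    w, b = 0.0, 0.0  # parameters start from an arbitrary guess
    for _ in range(steps):
        # input data -> model -> prediction -> comparison with ground truth
        errors = [(w * x + b) - y for x, y in zip(xs, ys)]
        # gradient of the loss with respect to each parameter
        grad_w = 2 * sum(e * x for e, x in zip(errors, xs)) / len(xs)
        grad_b = 2 * sum(errors) / len(xs)
        # parameter update: a small step against the gradient
        w -= learning_rate * grad_w
        b -= learning_rate * grad_b
    return w, b


w, b = train()
print(f"w={w:.3f}, b={b:.3f}, loss={mse(w, b):.6f}")
```

With these settings the parameters settle close to the values used to generate the data (w near 2, b near 1). A real system differs mainly in scale: more parameters, more data and a library that computes the gradients automatically.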
Figure: The ML training loop
A typical pipeline goes through several steps:
- Gather the data
- Clean it — remove duplicates, normalise, fill in missing values
- Split into three subsets — training (typically 70–80%), validation and test
- Train — fit the model on the training set
- Tune hyperparameters — on the validation set
- Final evaluation — on the test set the model has never seen
This last step is critical: without it you cannot tell whether the model has learned real dependencies or merely memorised the training data (the overfitting problem).
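The role of the held-out subsets can be sketched with a deliberately overfitted model. Everything here is an assumption made for illustration: the synthetic noisy data, the 70/15/15 split, and the two hypothetical models (`memoriser`, which stores the training set verbatim, and `threshold_model`, which fits a single cut-off).

```python
import random

rng = random.Random(0)  # fixed seed so the shuffle is reproducible

# Synthetic data (illustrative assumption): the true rule is x > 0.5,
# but roughly 20% of labels are flipped to imitate noisy real-world data.
n = 200
data = []
for i in range(n):
    x = i / n
    label = x > 0.5
    if rng.random() < 0.2:
        label = not label
    data.append((x, label))

# Split into training (70%), validation (15%) and test (15%) subsets.
rng.shuffle(data)
n_train = n * 70 // 100
n_val = n * 15 // 100
train_set = data[:n_train]
val_set = data[n_train:n_train + n_val]
test_set = data[n_train + n_val:]


def accuracy(predict, rows):
    """Fraction of rows for which the prediction matches the label."""
    return sum(predict(x) == y for x, y in rows) / len(rows)


# Model A memorises every training example and guesses the majority label
# for anything it has not seen: a deliberately overfitted model.
memory = dict(train_set)
majority_label = 2 * sum(y for _, y in train_set) >= len(train_set)


def memoriser(x):
    return memory.get(x, majority_label)


# Model B fits a single threshold on the training set.
def fit_threshold(rows):
    candidates = [i / 20 for i in range(1, 20)]
    return max(candidates, key=lambda t: accuracy(lambda x: x > t, rows))


threshold = fit_threshold(train_set)


def threshold_model(x):
    return x > threshold


for name, model in [("memoriser", memoriser), ("threshold", threshold_model)]:
    print(name,
          "train:", round(accuracy(model, train_set), 2),
          "val:", round(accuracy(model, val_set), 2),
          "test:", round(accuracy(model, test_set), 2))
```

The memoriser scores perfectly on the training set by construction, so its training accuracy says nothing about quality. Only the validation and test figures show how each model behaves on examples it has not seen.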
The simplest illustration of the mechanics is Rosenblatt's perceptron — a single artificial neuron that makes a yes/no decision from a handful of inputs. Five elements govern how it works:
- Features: the individual, measurable properties of an object that you feed into the model as input. For an image these can be pixel values; for a loan application, age, income and repayment history; for text, word frequencies. The model never sees the “raw” object, only this set of numbers, and choosing good features often decides whether anything can be learned at all.
- Weights: every input has an associated number describing how strongly it influences the decision. A large positive weight means “this feature strongly argues for,” a large negative one “strongly against,” and one near zero “irrelevant.” The weights are exactly what the model learns: they typically start out at zero or at small random values, and training gradually tunes them so the answers come out right.
- Bias: an extra number that tips the neuron toward “yes” or “no” before it even looks at the inputs, a bit like an innate predisposition. Thanks to it the model can learn that something is more or less likely by default, instead of always starting from a perfectly neutral position. It is another parameter that training tunes alongside the weights.
- Activation function: it takes the weighted sum of the inputs (plus the bias) and turns it into the neuron’s output. In the classic perceptron this is a simple threshold: above a certain value the neuron returns 1, otherwise 0. Modern networks use non-linear functions such as ReLU or sigmoid that are differentiable almost everywhere, so gradients can be computed through them; without the non-linearity even the deepest network could only model linear relationships.
- Weight correction: after each wrong prediction the algorithm checks in which direction and how far the output missed, then nudges every weight (and the bias) slightly in the direction that reduces the error. The size of the update depends on the magnitude of the error and on the learning rate, which sets how big a step the model takes.
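The five elements listed above fit into a few lines of Python. This is a sketch of the classic perceptron learning rule under stated assumptions: the toy task (logical AND), zero initial weights, a learning rate of 1.0 and the function names are all chosen for illustration and reproducibility.

```python
def step(z):
    """Threshold activation: fire (1) only when the weighted sum is positive."""
    return 1 if z > 0 else 0


def predict(weights, bias, features):
    # Weighted sum of the features plus the bias, passed through the activation.
    z = sum(w * x for w, x in zip(weights, features)) + bias
    return step(z)


def train_perceptron(samples, learning_rate=1.0, epochs=20):
    n_features = len(samples[0][0])
    weights = [0.0] * n_features  # zero start keeps the run reproducible
    bias = 0.0
    for _ in range(epochs):
        for features, target in samples:
            error = target - predict(weights, bias, features)  # -1, 0 or +1
            # Weight correction: shift each weight in the direction that
            # reduces the error, scaled by the learning rate.
            weights = [w + learning_rate * error * x
                       for w, x in zip(weights, features)]
            bias += learning_rate * error
    return weights, bias


# Toy task (assumption): learn logical AND from its four input/output pairs.
and_samples = [((0, 0), 0), ((0, 1), 0), ((1, 0), 0), ((1, 1), 1)]
weights, bias = train_perceptron(and_samples)
predictions = [predict(weights, bias, f) for f, _ in and_samples]
print(weights, bias, predictions)
```

AND is linearly separable, so the rule converges after a handful of passes over the data. A task such as XOR is not, and a single perceptron cannot learn it however long it trains.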
Modern deep networks do essentially the same thing, only with up to billions of such weights spread over many layers, and they use backpropagation to compute the gradients that indicate how to correct each one.
What are its key components?
A production-grade ML system consists of several layers:
- Data — raw material: transactional databases, images, audio, sensor logs, text corpora.
- Labels or training signal — what the model should learn (e.g. “this image shows a cat,” “this customer will default,” or, in reinforcement learning, a reward or penalty).
- Model — the architecture itself (logistic regression, decision tree, neural network, transformer).
- Learning algorithm — how the parameters are updated (gradient descent and its variants, Bayesian methods, evolutionary algorithms).
- Loss function — defines what exactly “good” means — that is, how far the model’s prediction is from the correct answer.
- Infrastructure — CPUs for classical ML, GPUs and TPUs for deep networks, and MLOps pipelines for training, versioning and deployment.
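To make the loss function component concrete, here is a sketch of two common choices: mean squared error for value prediction and binary cross-entropy for yes/no classification. The function names, the sample numbers and the clamping constant `eps` are illustrative assumptions.

```python
import math


def mean_squared_error(predictions, targets):
    """Average squared distance between predicted and true values."""
    return sum((p - t) ** 2 for p, t in zip(predictions, targets)) / len(targets)


def binary_cross_entropy(probabilities, labels, eps=1e-12):
    """Penalty for predicted probabilities of class 1, given 0/1 labels."""
    total = 0.0
    for p, y in zip(probabilities, labels):
        p = min(max(p, eps), 1.0 - eps)  # clamp to avoid log(0)
        total += -(y * math.log(p) + (1 - y) * math.log(1.0 - p))
    return total / len(labels)


regression_loss = mean_squared_error([2.5, 0.0], [3.0, -0.5])
confident_right = binary_cross_entropy([0.9, 0.1], [1, 0])
confident_wrong = binary_cross_entropy([0.1, 0.9], [1, 0])
print(regression_loss, confident_right, confident_wrong)
```

Cross-entropy penalises a confident wrong answer far more heavily than a confident right one is rewarded, which is one reason it is a frequent choice for classifiers.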
What can it be used for?
The list of applications today is close to the list of industries in the economy. In medicine, convolutional neural networks support the interpretation of X-rays, CT scans and histopathology slides, while regression algorithms help personalise drug dosing. In banking, ML automatically detects fraudulent transactions, scores creditworthiness and drives algorithmic trading. In Industry 4.0, predictive maintenance models analyse vibration, temperature and power consumption to predict a machine failure before it happens. In e-commerce and media, the recommender systems behind Netflix, Spotify and Amazon rely heavily on ML. NLP drives machine translation, voice assistants and chatbots.
In robotics ML is today the perception layer — and, increasingly, the control layer. Autonomous cars and drones fuse camera, LiDAR and radar data to classify objects in real time. Reinforcement learning teaches industrial arms optimal grasp points, and surgical robots — for instance experiments with the RAVEN platform at Berkeley — have tested autonomous tissue suturing. The direction is clear: from rigidly programmed assembly arms toward machines that adapt to a variable, uncertain environment.
What kinds of machine learning are there?
The four main paradigms of machine learning break down as follows:
- Supervised learning (the market-dominant paradigm) — needs a set of examples together with their correct answers; the model learns to map input to output. A classic example is spam filtering in an email inbox.
- Unsupervised learning — receives data without labels and looks for structure on its own: clusters, anomalies, hidden dimensions. A classic example is segmenting customers into groups with similar purchasing behaviour.
- Semi-supervised learning — a hybrid: a small pool of labelled data plus a large pool of unlabelled data. It is often the economic compromise wherever manual annotation is expensive. A classic example is medical image analysis, where having a doctor label every scan is expensive.
- Reinforcement learning — diverges from the rest: there are no “correct answers,” only an agent, an environment and a reward signal, and the agent learns a policy by trial and error. This is the paradigm behind AlphaGo, autonomous vehicles and much of modern adaptive robotics.
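As an illustration of the unsupervised paradigm from the list above, the following sketch clusters customers by monthly spend using k-means in one dimension. The spend figures, the choice of two clusters and the starting centroids (the smallest and largest value) are assumptions made for the example; no labels are supplied at any point.

```python
def kmeans_1d(points, centroids, max_iters=100):
    """Plain k-means on scalar values, starting from the given centroids."""
    centroids = list(centroids)
    clusters = []
    for _ in range(max_iters):
        # Assignment step: each point joins its nearest centroid.
        clusters = [[] for _ in centroids]
        for p in points:
            nearest = min(range(len(centroids)),
                          key=lambda i: abs(p - centroids[i]))
            clusters[nearest].append(p)
        # Update step: each centroid moves to the mean of its cluster
        # (an empty cluster keeps its previous centroid).
        new_centroids = [sum(c) / len(c) if c else centroids[i]
                         for i, c in enumerate(clusters)]
        if new_centroids == centroids:
            break  # assignments are stable, so the algorithm has converged
        centroids = new_centroids
    return centroids, clusters


# Hypothetical monthly spend per customer.
spend = [12, 15, 14, 10, 95, 102, 99, 110]
centroids, clusters = kmeans_1d(spend, centroids=[min(spend), max(spend)])
print(centroids)
print(clusters)
```

The algorithm separates low and high spenders on its own. Deciding what those groups mean for the business is still left to a person.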
How does it differ from other approaches?
Compared with hand-written, rule-based software, ML wins on flexibility and scalability but loses on interpretability. Compared with large generative models (LLMs), classical ML is cheaper, more predictable and often perfectly sufficient for structured problems such as demand forecasting, scoring and anomaly detection.
Key limitations and challenges
Machine learning also has several significant limitations:
- The “black box” problem — deep networks with billions of parameters make decisions in ways that not even their creators can trace step by step. In medicine or law this is a barrier to adoption — hence the rise of the Explainable AI (XAI) field.
- Bias — a model learning from historical data internalises the structural inequalities in that data. High-profile US recidivism risk-assessment systems showed how easily discrimination can be encoded inside seemingly objective mathematics.
- Energy cost — training large models generates a significant carbon footprint.
- Vulnerability to adversarial attacks — deliberately crafted inputs can fool even highly accurate networks.
- Data quality — the “garbage in, garbage out” rule remains painfully relevant in ML — no algorithm fixes a biased or incomplete dataset.
The EU AI Act formalises these issues as obligations for high-risk systems.
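The adversarial-attack limitation listed above can be shown on even the simplest model. The sketch below perturbs an input to a linear classifier in the spirit of the fast gradient sign method: every feature moves by at most `epsilon` in the direction that most reduces the score for the current class. The weights, bias, input and `epsilon` are hypothetical values picked for the demonstration.

```python
def score(weights, bias, x):
    return sum(w * xi for w, xi in zip(weights, x)) + bias


def classify(weights, bias, x):
    return 1 if score(weights, bias, x) > 0 else 0


def sign(v):
    return (v > 0) - (v < 0)


def adversarial_example(weights, x, label, epsilon):
    # For a linear model the gradient of the score with respect to the
    # input is simply the weight vector, so its sign gives the direction
    # in which each feature changes the score fastest.
    direction = -1 if label == 1 else 1
    return [xi + direction * epsilon * sign(w) for w, xi in zip(weights, x)]


# Hypothetical classifier and input.
weights = [2.0, -3.0, 1.0]
bias = 0.5
x = [1.0, 0.5, 0.2]

original_label = classify(weights, bias, x)
x_adv = adversarial_example(weights, x, original_label, epsilon=0.25)
adversarial_label = classify(weights, bias, x_adv)
print(original_label, adversarial_label, x_adv)
```

No feature changes by more than 0.25, yet the predicted class flips. Attacks on deep networks rely on the same idea, with the gradient obtained by backpropagation.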
Why does it matter?
Economists increasingly classify machine learning as a general-purpose technology, the same category as electricity at the start of the 20th century or the internet in the 1990s. It is not a single product or a single industry but a new computational layer on top of which successive applications, devices and entire business models are built. Every big name of recent years (GPT-4, Gemini, Claude, AlphaFold, Waymo's autonomous taxis, content filtering systems, the perception stacks of humanoids from Figure and Unitree) is a different application of the same family of methods.
For an engineer this means that knowing ML is ceasing to be a "data scientist" specialisation and is becoming part of the general toolbox, much as database literacy did a generation ago. For companies it means that the ability to collect and curate proprietary data is becoming a strategic asset. For society it means that tools that decide on our credit, insurance, diagnosis or career are increasingly learning from data whose origins and distortions we do not know. That is why the conversation about ML cannot stay purely technical — it has to run in parallel with regulation, audit and explainability mechanisms.
Machine learning is therefore neither a new "magic" nor intelligence in the human sense. It is a set of statistical optimisation tools — extremely powerful in domains with plenty of data and clearly defined tasks, yet still naive outside those areas. The coming decade will decide whether we learn to exploit that asymmetry consciously or keep confusing it with real understanding.
Sources
- IBM, "What is machine learning?"
- Yann LeCun, Yoshua Bengio, Geoffrey Hinton, "Deep Learning", Nature 521 (2015)
- Ian Goodfellow, Yoshua Bengio, Aaron Courville, Deep Learning (MIT Press, 2016)
- Wikipedia, "Machine learning"
- ImageNet, the project and competition that sparked the DL revolution
- European Commission, "Regulatory framework for AI" (AI Act)
