On May 11, 2026, a team of researchers from European institutions published a roadmap article in Nature Machine Intelligence that, for the first time, systematically describes how to apply explainable artificial intelligence (XAI) to protein language models. The paper argues that the growing opacity of these models has become a real challenge — not only scientifically, but also for biological safety.
Key takeaways
- Publication: "Towards the explainability of protein language models" — Nature Machine Intelligence, May 11, 2026
- XAI divided into four analysis layers: training data, input, internal model structure, and input-output behavior
- Five XAI roles identified: evaluator, multi-tasker, engineer, trainer, and teacher — the teacher role considered the ultimate goal
- Lack of interpretability in protein models identified as a biosafety risk — a model may silently embed immune evasion motifs
Protein models reach SOTA but do not explain their decisions
Protein language models (pLMs) have become the leading tool in computational biology in recent years. AlphaFold 3 predicts protein structures with accuracy comparable to experimental methods. Generative pLMs design new enzymes, antiviral peptides, and sequences tailored to specific therapeutic functions. In tasks such as drug-target interaction prediction and protein design, these models achieve state-of-the-art (SOTA) results.
The problem appears when you ask why. Why does the model generate this particular sequence? Which amino acid residues determine the predicted function? What patterns are actually encoded in the 650 million parameters of ESM-2 or similar transformer-based architectures? Contemporary pLMs offer no answers to these questions — they are black boxes in the fullest sense.
The paper's authors, Andrea Hunklinger and Noelia Ferruz, argue that the prevailing logic of progress — more data, more parameters, deeper transformer architectures — does not automatically translate into understanding. Standard benchmarks measure effectiveness, not transparency. This is the gap their roadmap aims to address systematically.
Four layers of explainability for biological AI
The paper proposes a classification framework for XAI dedicated to protein models. Explainability can be analyzed across four levels:
- Training data layer — which protein sequences actually shape model behavior and where biases in the training set emerge
- Input layer — identification of the amino acid residues that actually drive a given prediction (analogous to saliency and attribution methods in NLP; see the occlusion sketch below)
- Internal model structure — analysis of attention heads, individual neurons, sparse autoencoders (SAEs), and the residual stream in transformer architectures (see the attention sketch directly after this list)
- Input-output behavior — explaining model decisions via perturbations, proxy models (e.g., LIME, SHAP), and counterfactual methods
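To make the internal-structure layer concrete, the sketch below pulls per-head attention maps out of a small ESM-2 checkpoint via Hugging Face transformers and ranks heads by how much long-range attention they carry, a rough first step toward testing whether attention tracks known residue contacts. This is a minimal illustration, not a method prescribed by the paper; the checkpoint choice and the head-ranking heuristic are assumptions.

```python
# Minimal sketch: inspect attention heads of a small ESM-2 checkpoint.
# Assumptions: the facebook/esm2_t6_8M_UR50D checkpoint is reachable via the
# Hugging Face hub; ranking heads by long-range attention mass is one of
# several reasonable heuristics, not the paper's prescribed method.
import torch
from transformers import AutoTokenizer, EsmModel

MODEL_NAME = "facebook/esm2_t6_8M_UR50D"
tokenizer = AutoTokenizer.from_pretrained(MODEL_NAME)
model = EsmModel.from_pretrained(MODEL_NAME).eval()

sequence = "MKTAYIAKQRQISFVKSHFSRQLEERLGLIEVQ"  # toy sequence
inputs = tokenizer(sequence, return_tensors="pt")

with torch.no_grad():
    out = model(**inputs, output_attentions=True)

# out.attentions is a tuple with one tensor per layer, each shaped
# (batch, heads, tokens, tokens). Drop the special tokens (<cls>, <eos>)
# and symmetrize so attention(i, j) can be compared with residue contacts.
attn = torch.stack(out.attentions)[:, 0, :, 1:-1, 1:-1]  # (layers, heads, L, L)
attn = 0.5 * (attn + attn.transpose(-1, -2))

# Score each head by its mean attention on pairs at least 6 residues apart,
# a crude proxy for structure-aware behavior.
L = attn.shape[-1]
long_range = (torch.arange(L)[:, None] - torch.arange(L)[None, :]).abs() >= 6
scores = attn[..., long_range].mean(dim=-1)               # (layers, heads)
layer, head = divmod(int(scores.argmax()), scores.shape[1])
print(f"Head with most long-range attention: layer {layer}, head {head}")
```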
The framework is not limited to transformer architectures. The authors note it can also be applied to diffusion models, graph neural networks (GNNs), and AlphaFold-style hybrids — covering the full ecosystem of modern bioinformatics tools.
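A model-agnostic baseline that serves both the input layer and the input-output behavior layer is occlusion attribution: perturb one residue at a time and measure how much a sequence-level score drops. The sketch below is a minimal illustration under stated assumptions (a small ESM-2 masked language model from the Hugging Face hub, the mean log-probability of the observed residues as the score, and replacement with "X" as the perturbation); it is not the method proposed in the paper.

```python
# Minimal occlusion-attribution sketch for a pLM (illustrative only).
# Assumptions: small ESM-2 checkpoint; mean log-probability of observed
# residues as the sequence score; replacing a residue with "X" as the
# perturbation.
import torch
from transformers import AutoTokenizer, AutoModelForMaskedLM

MODEL_NAME = "facebook/esm2_t6_8M_UR50D"
tokenizer = AutoTokenizer.from_pretrained(MODEL_NAME)
model = AutoModelForMaskedLM.from_pretrained(MODEL_NAME).eval()

def sequence_score(seq: str) -> float:
    """Mean log-probability the model assigns to each residue in context."""
    enc = tokenizer(seq, return_tensors="pt")
    with torch.no_grad():
        logp = model(**enc).logits.log_softmax(dim=-1)
    ids = enc["input_ids"][0]
    positions = torch.arange(1, len(ids) - 1)          # skip <cls> and <eos>
    return logp[0, positions, ids[positions]].mean().item()

def occlusion_saliency(seq: str) -> list[float]:
    """Per-position attribution: score drop when that residue is set to X."""
    base = sequence_score(seq)
    return [base - sequence_score(seq[:i] + "X" + seq[i + 1:])
            for i in range(len(seq))]

if __name__ == "__main__":
    toy = "MKTAYIAKQRQISFVKSHFSRQLEERLGLIEVQ"
    saliency = occlusion_saliency(toy)
    top = sorted(range(len(toy)), key=saliency.__getitem__, reverse=True)[:5]
    print("Most influential positions (0-based):", top)
```

Because each perturbation requires its own forward pass, occlusion scales linearly with sequence length; gradient-based saliency or SHAP-style approximations trade this exactness for speed.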
From evaluator to teacher: five XAI roles in protein science
Based on their literature review, the authors identified five roles that XAI plays (or could play) in protein research. Currently, the evaluator role dominates: interpretability methods are mainly used to check whether a model has learned patterns already known to biologists — such as structural motifs or ligand binding sites. This is a useful validation function but a limited one: it does not support discovering new biological relationships, improving model architectures, or flagging training-data artifacts.
The remaining roles — multi-tasker (transferring knowledge across tasks), engineer (improving architectures), and trainer (guiding the learning process) — are progressively less represented in the literature, indicating the field's immaturity. The ultimate role is the teacher: an XAI system that allows researchers to extract genuinely new biological knowledge from a model — patterns previously unknown to humans, discovered because the model processed millions of sequences from databases such as UniProt.
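A typical evaluator-style analysis is a probing classifier: a simple linear model trained on frozen per-residue embeddings to test whether a property biologists already know, such as annotated binding-site residues, is linearly decodable. The sketch below is a minimal illustration; the random arrays are stand-ins for real pLM embeddings and real annotations.

```python
# Minimal probing sketch for the "evaluator" role. The random arrays below
# are stand-ins for real per-residue pLM embeddings (e.g., from ESM-2) and
# for real binding-site annotations; both are assumptions for illustration.
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import roc_auc_score
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(0)
embeddings = rng.normal(size=(5000, 1280))   # stand-in for per-residue embeddings
labels = rng.integers(0, 2, size=5000)       # stand-in for 0/1 binding-site labels

X_train, X_test, y_train, y_test = train_test_split(
    embeddings, labels, test_size=0.2, random_state=0
)

# Frozen-embedding linear probe: ROC-AUC well above 0.5 would indicate the
# property is linearly encoded; on the random stand-in data it stays near 0.5.
probe = LogisticRegression(max_iter=1000).fit(X_train, y_train)
auc = roc_auc_score(y_test, probe.predict_proba(X_test)[:, 1])
print(f"Probe ROC-AUC: {auc:.3f}")
```

In practice, probe results are compared against controls such as random embeddings or shuffled labels, so that high accuracy reflects information in the representation rather than the probe's own capacity.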
Biosafety: why lack of explainability is not just an academic problem
The paper dedicates particular attention to the implications for biological safety. If a generative pLM is uninterpretable, it may — without the operator's knowledge — embed immune-evasion motifs in a designed protein: sequence fragments that help the protein escape the immune response. A researcher looking only at the output (the finished sequence) may fail to detect such a risk.
The authors argue that only XAI operating at the teacher level — capable of unambiguously flagging that a given sequence segment is conserved because it disrupts a specific host receptor — enables researchers to stop the design process before a real threat materializes. This is the argument that explainability is not an academic luxury but a necessary condition for responsible deployment of AI in biodesign.
Context: how does XAI in proteins compare to NLP?
In natural language processing, explainability methods — from LIME and SHAP through attention analysis to probing classifiers — have been applied for years and enjoy a mature tooling ecosystem. For pLMs, analogous tools are only beginning to emerge. A key structural difference is that proteins have three-dimensional conformations, and the same sequence can perform different functions in different cellular contexts. A direct transfer of NLP methods is therefore impossible — adaptation that accounts for amino acid physicochemistry and the evolutionary history of sequences is required.
Works like the one described here represent an attempt to draw a roadmap for that adaptation — both at the methodological level and in the infrastructure layer for evaluating and benchmarking biological interpretability.
Why it matters
Computational biology is entering a phase in which large-scale models generate research hypotheses rather than merely processing data provided by humans. In this context, the question of explainability ceases to be an academic problem bordering on philosophy of science and becomes a technical issue with direct practical consequences. A researcher designing a drug with a generative pLM must be able to verify what the model "knows" about protein mechanisms — otherwise, the result is a black box on top of a black box.
The work by Hunklinger and Ferruz is the first comprehensive XAI review dedicated exclusively to this class of models and proposes a concrete taxonomy that other research groups can adopt or extend. For the entire field, this is an important step toward standardization: without a shared language for interpretability, it is impossible to compare solutions or formulate regulatory requirements in the context of biosafety. If the framework proposed in this paper is adopted by the community, it could accelerate the development of tools that make biological AI not only effective but also auditable.
What's next?
- The authors identify wet lab experimental validation as the next necessary step — XAI methods must be validated not only in silico but by confirming biological hypotheses in the laboratory
- The paper identifies the absence of standardized interpretability benchmarks for pLMs as a gap to fill — analogous to the role GLUE and SuperGLUE played for NLP
- Growing interest from regulatory bodies and biosafety agencies in AI used in biological sciences suggests that frameworks like those described could become a reference point for auditability requirements on models used in therapeutic protein design
Sources
- Nature Machine Intelligence — Towards the explainability of protein language models
- Phys.org — Roadmap for safer protein AI


