What is Token Introspection?
Token Introspection is an extension of the OAuth 2.0 protocol described in RFC 7662, published by the IETF in 2015. It defines a standardized endpoint, most commonly exposed under the /introspect path, to which a resource server (a protected API) sends a token and receives back information about its current state along with associated metadata.
It is worth stating right away what Token Introspection is not. It is not a token format or a cryptographic algorithm. Nor is it a user login mechanism. It is an integration standard: a back-channel communication protocol (a server-to-server channel that operates outside the user's browser and is invisible to them) between two server-side components, the protected API and the authorization server that issues tokens and keeps their state.
The reason this mechanism exists is best seen through the lens of two kinds of access tokens:
- Opaque token: essentially a random string, an identifier pointing to a record in the authorization server's database. An API that receives such a token cannot read on its own whom it belongs to, what permissions it carries, or whether it is still valid.
- JWT (JSON Web Token): carries all of that data within itself and is signed cryptographically, so the API can validate it locally without any network call.
Token Introspection was conceived primarily so that a resource server could handle opaque tokens, although the standard also permits introspecting JWTs.
Who is behind it?
The standard is the work of the IETF (Internet Engineering Task Force), the body responsible for developing the internet's fundamental protocols. Token Introspection emerged from the OAuth Working Group, the same group that earlier produced the core OAuth 2.0 standard (RFC 6749). RFC 7662 carries the status of Proposed Standard, meaning it is an official, stable recommendation underpinning both commercial and open source implementations.
The mechanism has been widely adopted by identity providers. Native support is offered by Keycloak and Okta, among others, and on the library side by mature solutions such as Spring Security in the Java ecosystem. One exception is worth noting: the Auth0 platform (now part of Okta as Customer Identity Cloud) for a long time did not expose a standard introspection endpoint for typical tenants, steering users toward local JWT validation instead.
How does it work?
Introspection involves three parties that should not be confused:
- The client: an application that wants to use data on someone's behalf, for example a banking app that wants to read an account balance. It is the first to log in to the authorization server and receive a token.
- The resource server: a service with a public API that holds this data and lets in only holders of a valid token, for example a bank's API.
- The authorization server: the service that issues tokens and knows their current state, that is the identity provider, for example Keycloak or Okta.
The full cycle looks like this: the client first obtains a token from the authorization server, then attaches it to a request sent to the resource server. The resource server cannot judge on its own whether the token is valid, so it asks the authorization server, and that last step is introspection.
The asking party here is always the resource server (a backend with a public API), not the client's frontend application. Watch out for the overloaded word "client": the application that wants the data is an OAuth client, but the resource server, when it queries introspection, also acts as an OAuth client, this time toward the authorization server and authenticated with its own credentials. It queries the introspection endpoint, a special address on the authorization server that works like an information desk you can walk up to and ask "is this token still valid?".
The simplest way to picture it is a cloakroom ticket. The ticket itself means nothing to the person at the door; it is just a piece of plastic. Only the cloakroom that issued it knows whose it is and whether it still counts. The resource server is like that person at the door: instead of guessing, it walks up to the cloakroom window (the introspection endpoint) and asks, "number 447, still valid, and whose is it?". The biggest advantage shows when the ticket was reported lost a minute earlier: the window will immediately answer "invalid", without waiting for it to expire on its own.
The request
Technically, the resource server sends a POST request to that address, transmitting data in application/x-www-form-urlencoded format, in which name-value pairs are joined with = and separated by & (the same format ordinary web forms use). It takes two parameters:
- `token` (required): the string under inspection.
- `token_type_hint` (optional): a hint about whether it is an access token (access_token) or a refresh token (refresh_token). It lets the server find the token faster, but per the standard it may be ignored, and if the token is not found in the suggested area, the server is obliged to search all the others.
A typical request looks like this:
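As a minimal sketch in Python, assuming a hypothetical endpoint URL and made-up client credentials (none of these values come from a real deployment), such a request could be assembled with the standard library as follows. The function only builds the request and does not send it.

```python
import base64
import urllib.parse
import urllib.request
from typing import Optional

# Hypothetical values, for illustration only.
INTROSPECTION_URL = "https://auth.example.com/oauth2/introspect"
CLIENT_ID = "orders-api"
CLIENT_SECRET = "s3cr3t"


def build_introspection_request(
    token: str, hint: Optional[str] = None
) -> urllib.request.Request:
    """Build (but do not send) a form-encoded introspection request."""
    params = {"token": token}
    if hint is not None:
        params["token_type_hint"] = hint
    body = urllib.parse.urlencode(params).encode("ascii")

    # The resource server authenticates itself with HTTP Basic credentials.
    # These sample values contain no reserved characters, so they are
    # joined as-is.
    credentials = f"{CLIENT_ID}:{CLIENT_SECRET}".encode("utf-8")
    basic = base64.b64encode(credentials).decode("ascii")

    return urllib.request.Request(
        INTROSPECTION_URL,
        data=body,
        method="POST",
        headers={
            "Content-Type": "application/x-www-form-urlencoded",
            "Accept": "application/json",
            "Authorization": f"Basic {basic}",
        },
    )


request = build_introspection_request("2YotnFZFEjr1zCsicMWpAA", "access_token")
```

Passing the resulting object to `urllib.request.urlopen` would perform the actual call; a production service would more likely use its HTTP client of choice with timeouts and connection pooling.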
The response
The heart of the response is the boolean field active. By the RFC 7662 definition, a token is active when it satisfies four conditions at once:
- it was issued by this server,
- it has not yet expired,
- it has not been revoked,
- it is permitted to operate in the context of the requesting API.
If any of these conditions fails, the server simply returns {"active": false}, without revealing any detail that might help an attacker. Importantly, such a denial still carries an HTTP 200 OK code, because from the protocol's point of view the request itself was handled correctly.
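The four conditions can be pictured from the authorization server's side. The sketch below is an assumption-laden illustration, not how any particular server is implemented: `TokenRecord`, its fields, and the in-memory `store` dictionary are hypothetical stand-ins for the server's token database.

```python
from dataclasses import dataclass
from typing import Dict, FrozenSet, Optional


@dataclass(frozen=True)
class TokenRecord:
    """Hypothetical row in the authorization server's token store."""
    expires_at: float           # Unix timestamp
    revoked: bool
    audiences: FrozenSet[str]   # resource servers allowed to use the token


def introspect(
    store: Dict[str, TokenRecord], token: str, caller: str, now: float
) -> dict:
    record: Optional[TokenRecord] = store.get(token)
    active = (
        record is not None               # 1. issued by this server
        and now < record.expires_at      # 2. not yet expired
        and not record.revoked           # 3. not revoked
        and caller in record.audiences   # 4. usable by the requesting API
    )
    # Every failure collapses into the same body; the reason is not disclosed.
    return {"active": True} if active else {"active": False}
```

A real server would add the token's metadata to the active response; it is left out here to keep the focus on the verdict.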
When the token is active, the response expands with metadata corresponding to the typical claims familiar from JWT:
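As an illustration with made-up values, the sketch below shows a small active response and how a resource server might turn it into an allow or deny decision. The field values, the `is_allowed` helper, and the scope names are assumptions chosen for the example.

```python
import json

# Illustrative response body; every value here is invented.
SAMPLE_BODY = """{
  "active": true,
  "sub": "user-123",
  "client_id": "banking-app",
  "scope": "accounts:read payments:write",
  "exp": 1900000000
}"""


def is_allowed(status_code: int, body: str, required_scope: str) -> bool:
    """Decide whether the request may proceed, based on the introspection reply."""
    if status_code != 200:
        # A non-200 status means the introspection call itself failed
        # (for example 401 for bad resource-server credentials).
        raise RuntimeError(f"introspection call failed with HTTP {status_code}")
    data = json.loads(body)
    if data.get("active") is not True:
        return False  # HTTP 200 with active=false is a normal denial
    granted = data.get("scope", "").split()
    return required_scope in granted
```

Treating a failed introspection call as an error rather than as "inactive" keeps outages of the authorization server distinguishable from genuine denials.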
What are its key components?
From a structural standpoint, Token Introspection comes down to two things: authentication of the caller (who is allowed to query the endpoint) and the response structure (what the server sends back). The introspection endpoint is a protected address on the authorization server, accessible only over TLS (Transport Layer Security, the protocol that encrypts a network connection, the same one behind HTTPS, protecting data in transit from eavesdropping). It exposes sensitive data about users and permissions, so it cannot be open to anonymous requests.
Authenticating the caller
The caller is the resource server itself (a backend), never a frontend application. It must prove to the authorization server who it is. The methods used are:
- HTTP Basic Authentication: base64-encoded `client_id` and `client_secret`, the most common method (base64 represents binary data as text using 64 characters, A-Z, a-z, 0-9, + and /; it is not encryption, and anyone can decode such a string back),
- credentials in the request body: `client_id` and `client_secret` passed as POST parameters,
- a dedicated access token granted to the resource server,
- Private Key JWT: signing the request with a private key, in higher-security variants.
If authentication fails, the server responds with an HTTP 401 Unauthorized error.
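The first three methods differ only in where the credentials travel. The following sketch, using hypothetical helper names and sample credentials, returns the headers and form body for each variant; Private Key JWT is left out because it needs a cryptography library.

```python
import base64
from typing import Dict, Tuple
from urllib.parse import quote_plus, urlencode

HeadersAndBody = Tuple[Dict[str, str], str]


def basic_auth(client_id: str, client_secret: str, token: str) -> HeadersAndBody:
    # OAuth 2.0 form-encodes both values before joining them with ":".
    pair = f"{quote_plus(client_id)}:{quote_plus(client_secret)}"
    encoded = base64.b64encode(pair.encode("utf-8")).decode("ascii")
    return {"Authorization": f"Basic {encoded}"}, urlencode({"token": token})


def body_auth(client_id: str, client_secret: str, token: str) -> HeadersAndBody:
    # Credentials travel as ordinary POST parameters next to the token.
    body = urlencode(
        {"token": token, "client_id": client_id, "client_secret": client_secret}
    )
    return {}, body


def bearer_auth(access_token: str, token: str) -> HeadersAndBody:
    # The resource server presents its own access token.
    return {"Authorization": f"Bearer {access_token}"}, urlencode({"token": token})


headers, _ = basic_auth("orders-api", "s3cr3t", "abc")
# Base64 is reversible by anyone, which is why TLS is still required.
decoded = base64.b64decode(headers["Authorization"].split()[1]).decode("utf-8")
```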
The response structure
When the token is active, the server sends back a JSON object with a full set of metadata. A typical rich response looks like this:
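As a hedged illustration, the sketch below pairs an invented rich response with a small typed wrapper that a resource server might use to read it. All values, the `IntrospectionResult` class, and the `parse_introspection` helper are assumptions made for this example.

```python
import json
from dataclasses import dataclass, field
from typing import Any, List, Optional

# Illustrative response; every value is made up.
RICH_RESPONSE = """{
  "active": true,
  "sub": "user-123",
  "username": "jdoe",
  "client_id": "banking-app",
  "scope": "read write",
  "iat": 1700000000,
  "nbf": 1700000000,
  "exp": 1700003600,
  "iss": "https://auth.example.com",
  "aud": "https://api.example.com",
  "token_type": "Bearer",
  "jti": "a1b2c3"
}"""


@dataclass
class IntrospectionResult:
    active: bool
    sub: Optional[str] = None
    username: Optional[str] = None
    client_id: Optional[str] = None
    scopes: List[str] = field(default_factory=list)
    iat: Optional[int] = None
    nbf: Optional[int] = None
    exp: Optional[int] = None
    iss: Optional[str] = None
    aud: Any = None  # may be a single string or a list of strings
    token_type: Optional[str] = None
    jti: Optional[str] = None


def parse_introspection(body: str) -> IntrospectionResult:
    data = json.loads(body)
    if data.get("active") is not True:
        return IntrospectionResult(active=False)
    return IntrospectionResult(
        active=True,
        scopes=data.get("scope", "").split(),  # space-separated string
        **{k: data.get(k) for k in (
            "sub", "username", "client_id", "iat", "nbf", "exp",
            "iss", "aud", "token_type", "jti",
        )},
    )
```

Because every field except `active` defaults to empty, the same wrapper copes with servers that return only a subset of the metadata.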
The key rule: only one field is mandatory, `active`. Everything else is optional, and the server includes it at its discretion. Here is what the most common fields mean:
Status
- `active`: whether the token is valid right now (the only mandatory field).
Identity and permissions
- `sub`: subject, the identifier of the token's owner.
- `username`: human-readable name of the token's owner.
- `client_id`: identifier of the application that received the token.
- `scope`: space-separated list of permissions (e.g. "read write").
Lifetime
- `iat`: issued-at timestamp.
- `nbf`: "not before", the moment from which the token becomes valid.
- `exp`: expiration timestamp (Unix format).
Origin and recipient
- `iss`: issuer, the authorization server that issued the token.
- `aud`: audience, the service or API the token is intended for.
Other
- `token_type`: the token type, usually "Bearer".
- `jti`: unique token identifier.
When the token is invalid, the response shrinks to a single line, {"active": false}, with no metadata, so as not to hand an attacker any clues.
The structure can be extended both ways. The server may add its own fields beyond this list, for example roles, a tenant identifier, or organization attributes. Standardized extensions go into the IANA "OAuth Token Introspection Response" registry, while purely company-specific fields sit alongside them. It can also omit fields: it is not obliged to return everything, and may show different resource servers a different subset of the same token's data, so that surplus attributes do not leak.
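Showing different resource servers a different subset can be sketched as a simple filter on the authorization server's side. The `filter_claims` helper and the per-caller policy table below are hypothetical; real servers configure this in their own ways.

```python
from typing import Dict, Iterable

# Hypothetical policy: which fields each resource server may see.
VISIBLE_FIELDS: Dict[str, Iterable[str]] = {
    "billing-api": ("sub", "scope", "tenant"),
    "audit-api": ("sub", "client_id", "roles", "jti"),
}


def filter_claims(claims: dict, allowed: Iterable[str]) -> dict:
    """Return only the fields this caller may see; `active` is always kept."""
    if not claims.get("active"):
        return {"active": False}
    keep = set(allowed) | {"active"}
    return {name: value for name, value in claims.items() if name in keep}


full = {
    "active": True,
    "sub": "user-123",
    "client_id": "banking-app",
    "scope": "read",
    "tenant": "acme",     # custom, company-specific field
    "roles": ["admin"],   # custom, company-specific field
    "jti": "a1b2c3",
}
billing_view = filter_claims(full, VISIBLE_FIELDS["billing-api"])
```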
Common use cases
Token Introspection shines wherever the resource server needs an authoritative, up-to-date answer about a token rather than just a local read of it. The most common cases are:
- Validating opaque tokens: when an API receives an opaque token (a random string with no readable content, whose meaning and validity only the issuing authorization server knows), introspection is the only way to learn whether it is valid and what permissions it carries.
- Immediate revocation detection: if an administrator blocks an account or withdraws consent, the next introspection request instantly returns `active: false`, before the token has a chance to expire naturally.
- Reading encrypted tokens: when a token is encrypted and only the authorization server holds the key to read it, introspection is the only way for the resource server to learn its contents.
- Machine-to-machine communication (B2B): when the client is another service rather than a person (the Client Credentials grant, in which an application authenticates with its own credentials and no user login is involved), introspection gives a central point where access for a machine can be revoked in real time.
The Phantom Token pattern
An interesting extension is the so-called Phantom Token pattern. Keycloak allows you to send an introspection request with the Accept: application/jwt header instead of the standard application/json. In response, the server returns a full, signed JWT. At the network edge, an API gateway accepts a lightweight opaque token from the external client, exchanges it through introspection for a rich JWT, and forwards that downstream to the microservices. As a result, the internal network enjoys the benefits of local validation while the token details remain hidden from the outside world.
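The gateway step can be sketched as a header swap. In the sketch below, `introspect_as_jwt` is a hypothetical callable standing in for the HTTP call made with `Accept: application/jwt`; it is assumed to return the signed JWT for an active token and `None` otherwise.

```python
from typing import Callable, Dict, Optional

# Hypothetical stand-in for the introspection call that returns a signed JWT.
IntrospectAsJwt = Callable[[str], Optional[str]]


def swap_token(
    headers: Dict[str, str], introspect_as_jwt: IntrospectAsJwt
) -> Optional[Dict[str, str]]:
    """Replace an incoming opaque bearer token with a JWT for internal services.

    Returns the headers to forward downstream, or None if the request
    should be rejected at the edge.
    """
    scheme, _, opaque = headers.get("Authorization", "").partition(" ")
    if scheme.lower() != "bearer" or not opaque:
        return None
    jwt = introspect_as_jwt(opaque)
    if jwt is None:
        return None  # inactive or unknown token never reaches the microservices
    forwarded = dict(headers)
    forwarded["Authorization"] = f"Bearer {jwt}"
    return forwarded
```

Gateways that follow this pattern typically also cache the opaque-to-JWT mapping for a short time, which brings in the caching trade-offs that apply to introspection in general.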
How does it differ from other approaches?
First, let us clear up the most common misunderstanding. JWT and Token Introspection are not two competing things of the same kind. JWT is a token format: how the token is built. Token Introspection is a way of checking it: a query to the server. You can even check a JWT through introspection. The real question is not "JWT or introspection" but where the verification happens: locally on your side, or remotely on the server that issued the token.
Put simply: a JWT is like an ID card with a hologram. You can check the hologram and the expiry date yourself, on the spot: fast, without calling anyone. The downside is that you will not learn whether the document was revoked this morning. Introspection is a phone call to the office that issued the card, asking "is this number valid right now?". Slower, but you catch revocations instantly. That is the whole difference; the rest of this section is just the consequences of that one choice.
Local validation (JWKS / JWT)
The main alternative to introspection is local JWT signature validation. In this approach the API fetches public cryptographic keys from a JWKS (JSON Web Key Set) endpoint, where the authorization server publishes the keys used to verify a JWT's signature. Then, on every request, it validates the token's signature on its own and checks fields such as expiry time, issuer, and audience. Apart from the occasional key refresh, the entire operation happens locally, on the resource server itself, without any network call.
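The field checks that follow signature verification can be sketched with the standard library alone. This sketch deliberately omits the signature check itself, which needs a JWT or cryptography library and the keys from the JWKS endpoint; it assumes `claims` is the payload of a token whose signature has already been verified. The function name and the `leeway` parameter are illustrative.

```python
import time
from typing import Optional


def claims_acceptable(
    claims: dict,
    issuer: str,
    audience: str,
    now: Optional[float] = None,
    leeway: float = 0.0,
) -> bool:
    """Check exp, nbf, iss and aud on an already signature-verified payload."""
    now = time.time() if now is None else now

    exp = claims.get("exp")
    if exp is None or now >= exp + leeway:
        return False  # expired, or no expiry at all

    nbf = claims.get("nbf")
    if nbf is not None and now + leeway < nbf:
        return False  # not valid yet

    if claims.get("iss") != issuer:
        return False  # issued by someone else

    aud = claims.get("aud")
    audiences = [aud] if isinstance(aud, str) else list(aud or [])
    return audience in audiences
```

Nothing in these checks can detect a revocation, which is exactly the gap that introspection fills.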
The fundamental trade-off
The differences amount to a fundamental trade-off. Introspection burdens every request with an additional network call that adds anywhere from a dozen to several dozen milliseconds of latency, and the authorization server becomes a potential bottleneck and single point of failure under heavy traffic. In exchange, it provides real-time knowledge of the token's state. Local validation is practically unlimited in scalability and runs with zero network latency, but carries a significant drawback: until the token expires, the API treats it as valid even if the account has since been blocked.
| Property | Introspection (RFC 7662) | Local JWT validation (JWKS) |
|---|---|---|
| Latency | tens of ms per request | zero, a CPU operation |
| Scalability | authorization server as bottleneck | practically unlimited |
| Revocation | real-time | only after the token expires |
| Token format | opaque and JWT | JWT only |
Other competing approaches
Introspection and local validation are not the only options. In practice three further approaches compete with them, each striking a different balance between security and performance:
- Short-lived JWTs with a refresh token: the access token is valid for only a few minutes, after which the client fetches a new one. Validation stays local and fast, while the "window" of potential misuse shrinks to the token's lifetime (for example 5 minutes), although it does not disappear entirely.
- Deny-list (blocklist): the resource server keeps a list of revoked tokens in a fast cache, for example Redis, fed by events from the identity server. This gives near-instant revocation without querying the server on every request, at the cost of extra infrastructure to keep the list in sync.
- Probing the OIDC `/userinfo` endpoint: some platforms (for example Auth0 without introspection) suggest checking a token's validity by calling `/userinfo`. It works as a stopgap but is not a full equivalent of introspection, strains that endpoint's rate limits, and mixes resource authorization with fetching user profile data.
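The deny-list option is the easiest of the three to sketch. Below, an in-memory dictionary stands in for Redis, and the `DenyList` class with its methods is a hypothetical design: revoked token identifiers (`jti`) are remembered only until the token would have expired anyway.

```python
from typing import Dict


class DenyList:
    """In-memory stand-in for a shared cache of revoked token IDs."""

    def __init__(self) -> None:
        self._revoked: Dict[str, float] = {}  # jti -> token expiry (Unix time)

    def revoke(self, jti: str, exp: float) -> None:
        # Called when a revocation event arrives from the identity server.
        self._revoked[jti] = exp

    def is_revoked(self, jti: str, now: float) -> bool:
        self._purge(now)
        return jti in self._revoked

    def _purge(self, now: float) -> None:
        # Once a token has expired, local validation rejects it anyway,
        # so its entry no longer needs to be kept.
        for jti in [j for j, exp in self._revoked.items() if exp <= now]:
            del self._revoked[jti]


deny_list = DenyList()
deny_list.revoke("a1b2c3", exp=2000.0)
```

With a real cache the purge step is usually unnecessary, because each entry can be stored with an expiry equal to the token's remaining lifetime.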
Hybrid approaches
That is why hybrid approaches are popular in practice. A typical pattern is local JWT validation for most read operations, reaching for introspection only on critical operations (transfers, permission changes, data modification) where one needs absolute certainty that access has not been withdrawn. Another variant is maintaining a deny-list of revoked tokens in a cache, for example in Redis, fed by events from the identity server.
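The first hybrid pattern might be sketched as follows. The set of critical operations and the two injected callables, `validate_locally` and `introspect`, are hypothetical placeholders for a real JWT check and a real introspection call.

```python
from typing import Callable

# Hypothetical classification of operations that justify a remote check.
CRITICAL_OPERATIONS = {"transfer", "change_permissions", "modify_data"}


def authorize(
    token: str,
    operation: str,
    validate_locally: Callable[[str], bool],
    introspect: Callable[[str], bool],
) -> bool:
    """Cheap local check for everything, remote check only when it matters."""
    if not validate_locally(token):
        return False  # bad signature or expired: no need to call the server
    if operation in CRITICAL_OPERATIONS:
        return introspect(token)  # pay the latency to catch revocations
    return True
```

Which operations count as critical is a business decision; the sketch only shows where that decision plugs in.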
Advantages and disadvantages
Gathered in one place, the strengths and weaknesses of introspection look like this.
Advantages:
- Real-time revocation: a revoked or blocked token is rejected immediately, without waiting for expiry.
- The only standard way to validate opaque tokens, as well as encrypted tokens whose key only the authorization server holds.
- Very simple to implement on the API side: just a POST request and a read of the `active` field.
- Full server control over which metadata it returns and to whom: exposure of sensitive attributes can be limited.
Disadvantages:
- Network latency on every request, on the order of tens of milliseconds.
- The authorization server becomes a bottleneck and single point of failure under heavy traffic.
- Caching results eases the cost but weakens the immediacy of revocation and risks errors when scopes are ignored.
- The need to securely store and rotate credentials (`client_secret`) on the resource server.
Key limitations and challenges
The most serious limitation is performance. An extra network call on every API request can drastically reduce throughput. A common remedy is caching introspection results in the API gateway, but this introduces its own pitfalls.
A cached result must never outlive the token's exp value, otherwise the system would start accepting authorizations that are already invalid. Too long a TTL also negates introspection's main advantage: if a result sits in the cache for five minutes, a revoked token will be treated as valid for just as long. In practice a TTL (Time To Live, how long an entry stays valid in a cache before it is refreshed or discarded) on the order of tens of seconds is recommended as a reasonable compromise.
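Both constraints reduce to taking a minimum. In the sketch below, the 30-second ceiling is an assumed example of "tens of seconds", and `cache_ttl` is a hypothetical helper name.

```python
from typing import Optional


def cache_ttl(exp: Optional[float], now: float, max_ttl: float = 30.0) -> float:
    """Seconds an introspection result may stay cached.

    Never longer than max_ttl, and never past the token's own expiry.
    """
    if exp is None:
        return max_ttl  # the server did not say when the token expires
    return max(0.0, min(max_ttl, exp - now))
```

A result of zero means the entry should not be cached at all.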
Caching also hides a subtler risk. If the cache key is the token alone, it is easy to make the mistake of ignoring the permission context. A classic example was a vulnerability in the Ory Oathkeeper gateway in 2021, where an authorization result for one scope was wrongly returned for a request requiring a different scope. A correct implementation must rigorously separate results for different security contexts.
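One way to keep security contexts apart, sketched here as an assumption rather than as the fix any particular gateway adopted, is to make the required scopes part of the cache key. The `cache_key` helper is hypothetical.

```python
import hashlib
from typing import FrozenSet, Iterable, Tuple


def cache_key(token: str, required_scopes: Iterable[str]) -> Tuple[str, FrozenSet[str]]:
    """Key a cached authorization decision by token AND required scopes."""
    # Hash the token so the raw secret is not kept around as a dictionary key.
    token_digest = hashlib.sha256(token.encode("utf-8")).hexdigest()
    return token_digest, frozenset(required_scopes)


decisions = {}
decisions[cache_key("tok", ["accounts:read"])] = True
# A request needing a different scope misses the cache and is evaluated afresh.
hit = decisions.get(cache_key("tok", ["payments:write"]))
```

An alternative design caches only the raw introspection response per token and re-evaluates the scope requirement on every request.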
The endpoint itself is a separate challenge. Left without rate limiting, it is exposed to token fishing, where an attacker bulk-queries the server with automated guesses until an active token is hit. That is why the standard requires callers to authenticate and the endpoint to be served over TLS, and calls for a uniform {"active": false} response for every invalid token; rate limiting is a further safeguard on top of that.
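Rate limiting can take many forms; the sketch below uses a simple refillable-permit scheme per caller, with an explicit clock so it is easy to test. The `RateLimiter` class and its parameters are illustrative assumptions, not part of RFC 7662.

```python
class RateLimiter:
    """Allows short bursts up to `capacity`, then `refill_per_second` calls."""

    def __init__(self, capacity: int, refill_per_second: float, now: float) -> None:
        self.capacity = float(capacity)
        self.refill_per_second = refill_per_second
        self.permits = float(capacity)
        self.updated = now

    def allow(self, now: float) -> bool:
        # Top up permits for the time that has passed, capped at capacity.
        elapsed = max(0.0, now - self.updated)
        self.permits = min(self.capacity, self.permits + elapsed * self.refill_per_second)
        self.updated = now
        if self.permits >= 1.0:
            self.permits -= 1.0
            return True
        return False  # the caller should receive an error, e.g. HTTP 429


limiter = RateLimiter(capacity=2, refill_per_second=1.0, now=0.0)
burst = [limiter.allow(0.0), limiter.allow(0.0), limiter.allow(0.0)]
```

In practice one limiter would be kept per authenticated caller, so that a single misbehaving resource server cannot starve the others.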
What use does it have in artificial intelligence?
Token Introspection is not an artificial intelligence technique and has no connection to training or running models. It is a security layer. Its relevance to AI comes from the fact that modern AI systems increasingly act as autonomous API clients that reach for external tools and data on a user's behalf.
The most important point of contact is agentic AI. An agent that plans and executes actions on its own must authenticate against APIs, databases, and services. Unlike a human clicking through an app, an agent can make hundreds of calls in a short time and operate without direct supervision. That raises the stakes of immediate access revocation: if an agent starts misbehaving or its token leaks, an administrator must be able to revoke access in real time rather than wait for the token to expire. That is exactly what introspection provides.
A second area is the Model Context Protocol (MCP), the standard by which AI assistants connect to external tool servers. The MCP authorization specification is built on OAuth 2.x, and an MCP server acts as a resource server that must verify the token presented by the AI client. Token Introspection is one acceptable way to perform that verification, especially when the server issues opaque tokens and wants to retain control over revoking them.
A third point is gateways for language models. Enterprise LLM deployments usually place a gateway (AI gateway) in front of the model that authenticates requests, meters usage, and enforces quotas. Such a gateway can use introspection to check access tokens before passing a request to an expensive model, with the same performance and caching trade-offs described above.
It is worth keeping the right perspective. In all these cases AI is the party whose access is being secured, not a part of the introspection mechanism itself. Token Introspection remains a classic OAuth tool; it is the rise of autonomous agents that makes its advantage, real-time revocation, matter more than it did in the era of applications operated solely by humans.
Why does it matter?
Token Introspection solves a concrete, recurring problem of distributed architecture: how can a protected API trust a token it cannot read on its own, without inventing its own non-standard protocol for talking to the identity server? Every such bespoke solution is a potential gap in security logic, and IETF standardization made it possible to close that space with a single, well-documented mechanism.
Although the market is largely dominated by JWKS-based JWT validation, owing to its scalability and zero latency, introspection remains irreplaceable wherever the cost of failing to revoke access immediately is higher than the cost of network latency. The domains that deliberately reach for it, with opaque tokens or hybrid patterns, are those where consistency of authorization state is at stake:
- finance,
- healthcare,
- transactional systems,
- B2B service-to-service communication.
When an administrator strips someone of their permissions, the very next introspection call rejects access; without this mechanism, the token would keep working until its built-in timer ran out. The conscious choice between these approaches is today one of the measures of security engineering maturity.
Token Introspection, then, is not an outdated alternative to JWT but a complementary tool in the authorization designer's arsenal. Knowing it lets you make informed decisions where performance meets security, matching the validation method to actual risk instead of applying one solution to everything.
Sources
- IETF, RFC 7662: OAuth 2.0 Token Introspection
- OAuth.com, OAuth 2.0 Token Introspection Endpoint
- Connect2id, OAuth 2.0 token introspection
- Keycloak, Server Administration Guide (Token Introspection)
- Spring Security, OAuth 2.0 Resource Server: Opaque Token
- Okta Developer, Validate access tokens / Introspection
