Apple is working to bring a Gemini-powered Siri to the iPhone — but not in the way privacy advocates hoped. According to a report from The Information, the hybrid Siri and Gemini system will process complex requests in the cloud, not locally on device. Apple has signed a deal with NVIDIA for Confidential Computing to handle that cloud workload.
Key takeaways
- Apple is distilling Google's cloud-based Gemini models — compressing trillion-parameter models to run on the iPhone
- Complex Siri requests will be routed to Google's cloud infrastructure, not Apple's M-chip-based Private Cloud Compute
- Apple signed a deal with NVIDIA to use Confidential Computing: AI processing on cloud GPUs inside a hardware-isolated, encrypted environment
- Smartphones lack the RAM to keep trillion-parameter models in memory — local models top out at a few billion parameters
- Details expected to be announced at WWDC 2026
Why distillation, not Gemini Nano
Google offers Gemini Nano — a version optimized for mobile devices. Nano handles contextual features well: note summarization, reply suggestions, basic commands. But that is not what Siri needs.
Siri is designed to be a conversational assistant: understanding complex commands, completing multi-step tasks, and maintaining natural dialogue. That requires a significantly more capable model. On Android, Google itself does not attempt to run conversational Gemini locally; those requests route straight to the cloud.
Apple is caught in a bind. The company has spent years promoting on-device AI as a privacy advantage. Now it wants to deliver Gemini quality on the iPhone, but the hardware's physical constraints leave no easy path.
Distillation is the attempted compromise. The technique trains a small "student" model to mimic a large "teacher" model, repeatedly imitating the teacher's outputs until the student captures the key capabilities in far fewer weights. Done well, it transfers cloud-model behavior to a local version with a fraction of the parameters and memory footprint. But even after distillation, according to The Information, Siri will still need to reach for the cloud for tasks that demand full Gemini capability.
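As a rough illustration of what "mimicking outputs" means in practice, here is a minimal sketch of one common distillation objective: the KL divergence between temperature-softened teacher and student distributions. The report does not say which training objective Apple uses, so the loss, the temperature value, and the logits below are all assumptions chosen for illustration.

```python
import math

def softmax(logits, temperature=1.0):
    """Turn raw scores into probabilities; a higher temperature flattens them."""
    scaled = [x / temperature for x in logits]
    peak = max(scaled)  # subtract the max for numerical stability
    exps = [math.exp(x - peak) for x in scaled]
    total = sum(exps)
    return [e / total for e in exps]

def distillation_loss(teacher_logits, student_logits, temperature=2.0):
    """KL divergence from the teacher's softened distribution to the student's."""
    p = softmax(teacher_logits, temperature)
    q = softmax(student_logits, temperature)
    return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q) if pi > 0)

# Made-up logits over three candidate next tokens (not real model outputs).
teacher = [4.0, 1.0, 0.5]
close_student = [3.8, 1.1, 0.4]   # nearly agrees with the teacher
far_student = [0.5, 1.0, 4.0]     # prefers a different token

print(distillation_loss(teacher, close_student))
print(distillation_loss(teacher, far_student))
```

Training pushes this loss toward zero across many prompts, so the student with fewer parameters learns to rank candidate outputs the way the teacher does.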
The iPhone's physical limits
Every new Apple chip generation arrives with AI performance claims. The Neural Engine — Apple's dedicated AI accelerator in A- and M-series chips — is designed for efficient contextual inference: real-time image classification, voice transcription, on-device features. It is not built for trillion-parameter conversational models.
On-device models for phones top out at a few billion parameters. Cloud Gemini models reportedly have trillions, a gap of about three orders of magnitude. Local models are also quantized, running at lower numerical precision to save memory and gain speed at the cost of some output quality.
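To make the quantization trade-off concrete, the sketch below applies simple symmetric 8-bit quantization to a handful of weights and measures the rounding error. This is one basic scheme picked for illustration; the report does not describe which quantization method the on-device models use, and the weight values are invented.

```python
def quantize_int8(weights):
    """Map floats to integers in [-127, 127] using a single shared scale."""
    max_abs = max(abs(w) for w in weights)
    scale = max_abs / 127 if max_abs > 0 else 1.0
    return [round(w / scale) for w in weights], scale

def dequantize(quantized, scale):
    """Recover approximate floats from the stored integers."""
    return [q * scale for q in quantized]

weights = [0.42, -1.27, 0.003, 0.9]  # illustrative values, not real model weights
q, scale = quantize_int8(weights)
restored = dequantize(q, scale)
errors = [abs(w - r) for w, r in zip(weights, restored)]

print(q)            # integers that fit in one byte each
print(max(errors))  # worst-case rounding error, at most half of `scale`
```

Each weight now needs one byte instead of two or four, but small values such as 0.003 collapse to zero. That lost detail is where the quality cost comes from.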
Memory is the other wall. A phone with 8 GB of RAM cannot keep a trillion-parameter model loaded: even at one byte per parameter, the weights alone would occupy about a terabyte. On-the-fly layer swapping is technically possible but too slow for real-time conversation. The result is that local phone AI sounds less intelligent than its cloud counterparts. Even large cloud models make obvious mistakes at times; small local models are more limited still.
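The arithmetic behind the memory wall is short enough to check directly. The sketch below counts only the bytes needed to store the weights and ignores activations, caches, and the operating system, so real requirements would be higher. The parameter counts and bit widths are round illustrative figures, not disclosed specifications.

```python
def weights_gb(num_params, bits_per_param):
    """Gigabytes (10^9 bytes) needed to hold the weights alone."""
    return num_params * bits_per_param / 8 / 1e9

PHONE_RAM_GB = 8  # the RAM figure used in the article

scenarios = [
    ("1T params, 16-bit", 1e12, 16),
    ("1T params, 4-bit", 1e12, 4),
    ("3B params, 4-bit", 3e9, 4),
]

for label, params, bits in scenarios:
    size = weights_gb(params, bits)
    verdict = "fits" if size < PHONE_RAM_GB else "does not fit"
    print(f"{label}: {size:,.1f} GB, {verdict} in {PHONE_RAM_GB} GB of RAM")
```

Even aggressive 4-bit quantization leaves a trillion-parameter model at hundreds of gigabytes, while a model with a few billion parameters fits with room to spare.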
NVIDIA as privacy shield
Complex Siri queries will route to Google's cloud infrastructure — not Apple's Private Cloud Compute. Apple's PCC, built on M-series Mac chips, reportedly cannot run undistilled Gemini models without issues. Apple turned to an external cloud partner instead.
NVIDIA Confidential Computing protects data while it is being processed on cloud GPUs. The model runs inside a hardware-isolated trusted execution environment: data is encrypted on its way to and from the GPU and is decrypted only inside that protected region, so in theory even the data center operator cannot access it in plaintext. This allows Apple to claim it still protects user privacy even when requests leave the iPhone.
The protection comes with a performance cost. Confidential Computing on NVIDIA GPUs adds overhead compared with unprotected processing, and a cloud round trip adds network delay on top of that. Users will likely notice the latency difference when Siri decides to route a request to the cloud rather than handling it on device.
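How Siri would decide between the local model and the cloud has not been described publicly. Purely as a toy sketch of the hybrid pattern, the example below routes a request based on two made-up signals: prompt length and the number of steps the request implies. The `Request` type, the `route` function, and both thresholds are hypothetical and do not reflect Apple's design.

```python
from dataclasses import dataclass

# Hypothetical thresholds, chosen only to make the example concrete.
LOCAL_STEP_LIMIT = 1    # the local model handles single-step requests
LOCAL_WORD_LIMIT = 20   # and only short prompts

@dataclass
class Request:
    text: str
    steps: int  # number of actions the request implies, assumed to be estimated upstream

def route(request: Request) -> str:
    """Return "local" or "cloud" for a request, using the toy thresholds above."""
    short = len(request.text.split()) <= LOCAL_WORD_LIMIT
    simple = request.steps <= LOCAL_STEP_LIMIT
    return "local" if short and simple else "cloud"

print(route(Request("Set a timer for ten minutes", steps=1)))
print(route(Request("Find my flight, add it to my calendar, and text Sam the arrival time", steps=3)))
```

Whatever the real heuristic looks like, every request sent down the cloud branch pays the latency cost described above, which is why the share of tasks the distilled model can handle locally matters.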
User experience: seamless by design
The iPhone will probably not tell users which version of Gemini is handling a given Siri request. Manufacturers designing hybrid systems — local and cloud AI — tend to market them as "seamless." In practice, the latency difference may be perceptible.
Apple will likely retain the Private Cloud Compute branding even as a portion of processing runs on Google and NVIDIA infrastructure. That creates a messaging tension — "your data never leaves your device" reads differently when Siri is querying Gemini in the cloud.
Why it matters
Apple built its AI brand on the narrative of privacy and on-device processing. WWDC 2024 and 2025 were demonstrations of control, locality, and security. Now the company has designed a system where the part of Siri that requires genuine intelligence depends on the infrastructure of two other companies: Google and NVIDIA.
That doesn't disqualify the approach. Hybrid AI is standard practice — Google does it on Android, Microsoft does it with Copilot+, and now Apple. The challenge isn't technical; it's narrative. Apple spent years differentiating itself from competitors with "privacy by design." Now it must sustain that positioning while sending a portion of user queries to an external cloud.
For users, the bottom line is straightforward: a Gemini-powered Siri will almost certainly be far smarter than the current version. The question is whether the intelligence gain comes with a privacy trade-off that Apple spent years criticizing in competitors.
What's next?
- WWDC 2026: Apple is expected to announce details of the new Gemini-powered Siri — likely the first public demonstration of the hybrid system
- Apple must resolve the tension between its privacy narrative and cloud processing on Google and NVIDIA infrastructure — the marketing message will be critical
- The pace of Gemini distillation for iPhone will determine how many Siri tasks can be handled locally versus routed to the cloud





