On-device AI mobile apps.
The 2026 explainer.
On-device AI runs the model on the phone itself, not in the cloud. In 2026 that became real for ordinary apps, with Apple and Google both shipping on-device models you can call from your own code. This guide covers what on-device AI mobile apps are, where they beat the cloud and where they do not, and what it takes to reach the on-device models from React Native and Expo.
The short answer
What on-device AI actually means.
On-device AI means the model runs on the phone’s own chip rather than on a server. You send it text, it answers locally, and the input never leaves the device. That buys you three things a cloud call cannot: privacy by default, no network round trip, and a feature that still works with no signal.
As of 2026, this is no longer a research demo. Apple ships its Foundation Models framework on iOS 26, and Google ships Gemini Nano on supported Android phones, so apps can use a built-in on-device model through native code. The honest limit is size: a model small enough to fit on a phone is good at focused jobs like summarizing, rewriting, or classifying text, while a large open-ended request still runs better in the cloud. So most on-device AI mobile apps in 2026 mix the two, using the local model for the fast, private work and a cloud model for the heavy lifting.
On-device vs cloud
The tradeoffs, side by side.
On-device and cloud are not rivals so much as different tools. Here is how they compare on the things that decide which one a feature should use.
| Factor | On-device AI | Cloud AI |
|---|---|---|
| Privacy | Input stays on the phone; nothing is sent to a server | Requests travel to a server, so data leaves the device |
| Latency | No network round trip, so responses start fast | Adds a round trip; speed depends on the connection |
| Cost | No per-request server bill once the app is built | You pay per call, and it scales with usage |
| Offline | Works with no signal, on a plane or underground | Needs a connection; fails or stalls without one |
| Capability | A small model good at focused, scoped jobs | A far larger model for open-ended, heavy requests |
| Device reach | Only recent, capable hardware; needs a fallback | Runs the same on every phone, old or new |
The pattern most teams land on: on-device for the private, instant, everyday work, and cloud for the requests that need a bigger model.
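One way to picture that pattern is a small routing function that picks a backend per request. This is a minimal sketch, not a prescribed design: the names `Task`, `Backend`, `RouteInput`, and `chooseBackend` are hypothetical, and the rule for offline open-ended requests is an assumption you may want to change.

```typescript
// All names here are hypothetical and exist only for illustration.
type Task = "summarize" | "rewrite" | "classify" | "open-ended";
type Backend = "on-device" | "cloud" | "unavailable";

interface RouteInput {
  task: Task;
  onDeviceAvailable: boolean; // result of a native availability check
  online: boolean;            // current network state
}

// Focused jobs that a small local model tends to handle well.
const FOCUSED_TASKS: ReadonlySet<Task> = new Set<Task>([
  "summarize",
  "rewrite",
  "classify",
]);

function chooseBackend({ task, onDeviceAvailable, online }: RouteInput): Backend {
  // Private, instant, everyday work stays on the phone when it can.
  if (FOCUSED_TASKS.has(task) && onDeviceAvailable) return "on-device";
  // Everything else goes to the bigger cloud model while there is a connection.
  if (online) return "cloud";
  // Assumption: when offline, a weaker local answer beats no answer.
  return onDeviceAvailable ? "on-device" : "unavailable";
}
```

A summarize request on a capable phone resolves to `"on-device"`, while an open-ended request on the same phone resolves to `"cloud"` as long as there is a connection.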
Apple
The Foundation Models framework.
Apple introduced the Foundation Models framework at WWDC25 in June 2025. It gives apps a Swift API to Apple’s on-device model, a model of roughly 3 billion parameters, and it ships in iOS 26, iPadOS 26, and macOS 26 on Apple Intelligence-capable devices. This is the path for on-device AI on iOS, and it is built into the system, so you are not bundling a model of your own.
The API is built for app developers, not just researchers. Guided generation, through the @Generable macro, hands you structured output as real Swift types instead of a blob of text you have to parse. Tool calling lets the model invoke functions you define, so it can pull in your app’s data or take an action mid-response. All of it runs on the device, so the text stays on the phone and there is no per-request server cost.
- Ships in iOS 26, iPadOS 26, and macOS 26 on Apple Intelligence-capable devices
- A Swift API to Apple's roughly 3-billion-parameter on-device model
- Guided generation with the @Generable macro returns structured Swift types
- Tool calling lets the model invoke functions you define in the app
Android
Gemini Nano and ML Kit GenAI.
On Android, the on-device model is Gemini Nano. It runs inside the AICore system service, and apps do not poke at it directly. Instead you reach it through ML Kit’s GenAI APIs, which today cover Summarization, Proofreading, Rewriting, and Image Description, with a Prompt API in alpha for more open-ended use. So the common text jobs come as ready-made APIs rather than something you prompt from scratch.
The catch is hardware. Gemini Nano only runs on supported high-end devices, such as recent Pixel and Samsung Galaxy phones, not on every Android phone in the wild. That makes availability checks and a fallback non-optional: your app asks whether the feature is ready on this device, and routes to a cloud model or a plain screen when it is not. Plan for both, or the feature simply will not exist for a large share of users.
- Gemini Nano runs in the AICore system service on the device
- Reached through ML Kit's GenAI APIs, not a raw model handle
- Summarization, Proofreading, Rewriting, and Image Description, plus a Prompt API in alpha
- Only on supported high-end devices, like recent Pixel and Samsung Galaxy phones
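The availability check described above can be sketched on the JavaScript side as a pure decision function. The four status names are an assumption modeled on the feature-status values ML Kit's GenAI APIs report, so verify them against the current documentation. `Decision` and `decide` are hypothetical names, and the status itself would have to come from native code.

```typescript
// Assumed status values, modeled on ML Kit GenAI feature status; verify before use.
type FeatureStatus = "UNAVAILABLE" | "DOWNLOADABLE" | "DOWNLOADING" | "AVAILABLE";

// Hypothetical result type: where to send this request, and whether to
// ask the native side to start fetching the model in the background.
interface Decision {
  route: "on-device" | "cloud" | "plain-screen";
  startDownload: boolean;
}

function decide(status: FeatureStatus, online: boolean): Decision {
  // The model is ready on this device, so use it.
  if (status === "AVAILABLE") {
    return { route: "on-device", startDownload: false };
  }
  // Not ready yet (or never will be): cloud when online, plain screen otherwise.
  const route = online ? "cloud" : "plain-screen";
  // Only kick off a download when the device supports it and there is a connection.
  return { route, startDownload: status === "DOWNLOADABLE" && online };
}
```

Keeping the decision separate from the native call means the fallback paths can be tested without a supported phone on hand.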
React Native and Expo
Reaching it from React Native and Expo.
Here is the part that trips people up. Both platform frameworks are native, so a React Native or Expo app cannot call them directly from JavaScript. Four points cover what it actually takes.
1. There is no JavaScript API. Apple's Foundation Models framework is Swift, and Android's ML Kit GenAI APIs are Kotlin and Java. Neither one exposes a JavaScript entry point, so a React Native or Expo app cannot call them straight from the JavaScript layer. The model lives behind native code on both platforms.
2. You bridge it with a native module. To reach the framework you write a native module, or use a community package that ships one, plus a config plugin so Expo wires the native pieces in during the build. The module calls the native API and returns the result to your JavaScript, so the rest of the app sees a normal async function.
3. It does not run in Expo Go. Expo Go only contains the native code Expo ships with it, and these frameworks are not in that bundle. So you run npx expo prebuild and make a development build or an EAS build that compiles the native module in. From then on the on-device model is inside your binary and you test on a real, capable device.
4. You handle availability and fallback. On-device AI is not on every phone. iOS needs an Apple Intelligence-capable device, and Gemini Nano needs supported high-end Android hardware. The native module checks whether the feature exists on the device, and your code falls back to a cloud model or a plain screen when it does not, so older phones still get a working app.
The one-line version: prebuild, then a development build
On-device AI has no JavaScript API, so you bridge it with a native module or a community package, run npx expo prebuild, and ship a development or EAS build. It does not work in Expo Go, because Expo Go cannot load native code it was not compiled with.
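From the JavaScript side, the four points above might come together as a thin wrapper around whatever the native module exposes. This is a sketch under stated assumptions: `OnDeviceModel`, `CloudSummarize`, `SummaryResult`, and `createSummarizer` are hypothetical names, and the real shape of the native module depends on the Swift and Kotlin code you or a community package write.

```typescript
// Hypothetical shape of what a native module (Swift or Kotlin) exposes to JS.
interface OnDeviceModel {
  isAvailable(): Promise<boolean>;
  summarize(text: string): Promise<string>;
}

// Hypothetical cloud call, for example a request to your own backend.
type CloudSummarize = (text: string) => Promise<string>;

interface SummaryResult {
  text: string;
  source: "on-device" | "cloud";
}

// `native` is null when the module is not compiled into the binary,
// which is the situation in Expo Go.
function createSummarizer(native: OnDeviceModel | null, cloud: CloudSummarize) {
  return async function summarize(text: string): Promise<SummaryResult> {
    if (native !== null) {
      try {
        if (await native.isAvailable()) {
          return { text: await native.summarize(text), source: "on-device" };
        }
      } catch {
        // Native call failed; fall through to the cloud path.
      }
    }
    return { text: await cloud(text), source: "cloud" };
  };
}
```

The rest of the app calls `summarize(text)` as a normal async function and can read `source` to see which path answered. If a feature promises that text stays on the phone, route to a plain screen instead of the cloud, since a silent cloud fallback would break that promise.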
Where Newly fits
Where Newly fits in all this.
Newly is a paid AI mobile-app builder, from $25 a month. You describe the app in plain language and it generates a React Native and Expo codebase that you own. Newly’s own built-in AI chatbot feature is cloud-based, and that is the honest starting point: there is no on-device toggle to flip.
The advantage is the code you walk away with. Because you own a real React Native and Expo project rather than rows in a closed builder, you can add a native module for Apple’s Foundation Models or Android’s Gemini Nano yourself and ship it in a development build. A closed builder with one built-in AI cannot offer that, since you never get to the native layer. So on-device AI mobile apps built on a Newly codebase are possible in a way they are not on a locked platform.
The caveat is the same one this whole guide has been making. On-device access needs a custom native module and a development build, not Expo Go, and that is real work. For many features a cloud model, reached through the backend bundled with every project, is simpler, cheaper to wire up, and runs on every phone. Treat on-device as a deliberate choice for the features that benefit from privacy, speed, or offline use, not a default for everything.
Start with the cloud chatbot, keep the on-device door open
A practical route: ship the built-in cloud AI chatbot first, since it works on every device with no native build, then add an on-device model later for the features that need it. Both live in the same code you own.
FAQ
On-device AI mobile apps, answered.
What are on-device AI mobile apps?
On-device AI mobile apps run an AI model directly on the phone instead of sending requests to a server. The model lives on the device, so the work happens on the phone's own chip. That keeps the input private, removes the network round trip, and lets a feature keep working with no signal. In 2026 the two big platform options are Apple's Foundation Models framework on iOS and Gemini Nano on Android, both reachable through native code. The tradeoff is capability: a small model that fits on a phone handles focused jobs like summarizing or rewriting text well, while large, open-ended reasoning still tends to run better in the cloud.
Build it on code you own.
From $25 a month, Newly generates a real React Native and Expo app from a prompt. Ship the built-in cloud AI today, and because the codebase is yours, add an on-device model later when a feature calls for it.