DEV Community

Timothy Fosteman

White Paper FM v Public API

Apple Foundation Models: White Paper vs Real API — What Actually Matches Up

So I was planning a simple bubble classification experiment with Apple's Foundation Models (FM), and the first thing that became obvious was this: the white paper and the actual API surface describe the same system, but at very different layers of reality.

The technical report is an optimistic read: it describes the model's full capabilities. But then I exported the top-level Swift documentation for FoundationModels, with its developer-facing type signatures, and the picture was a lot sadder: the API exposes only a slice of the advertised functionality.

https://pastebin.com/4N2PNgDc

1. What the white paper actually claims

The paper describes a fairly ambitious multimodal system:

  • native support for text + image inputs
  • vision encoders integrated into the model stack
  • reasoning over images, multi-image inputs, and mixed modality prompts
  • structured tool-use and JSON-style outputs
  • on-device inference with optimized small models (~3B-parameter scale)
  • grounding tasks (OCR, region reasoning, visual understanding)

What more could a man want? The 3B Swiss Army knife is compared head-on against Qwen 2.5, which is impressive, since Qwen is the smartest model available on llama.cpp at 9B parameters or below. (P.S. It's probably the best alternative at the moment.)

Me, hopeful: from the paper's perspective, image classification should already be possible:

  • image → encoded tokens → reasoning → structured output

What the FoundationModels API actually exposes:

  • strongly structured around prompt → response: check
  • optimized for deterministic, schema-driven responses: check
  • integrated with Apple system frameworks rather than raw model control: check
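For a text-only version of the experiment, the prompt → structured response path looks roughly like this. It's a minimal sketch, assuming the bubble's content is already available as plain text (the public API takes text prompts, not images). `BubbleClassification`, its fields, the label set, `BubbleClassifierError`, and `classifyBubbleText` are hypothetical names I made up for illustration; `SystemLanguageModel`, `LanguageModelSession`, `@Generable`, and `@Guide` are the framework's own types as I understand them, so double-check the exact signatures against the exported interface for your SDK.

```swift
import FoundationModels

// Hypothetical output schema. @Generable lets the framework constrain
// generation so the response is decoded straight into this struct.
@Generable
struct BubbleClassification {
    // Hypothetical label set, described to the model via @Guide.
    @Guide(description: "Exactly one of: question, statement, greeting, other")
    var label: String

    @Guide(description: "A one-sentence justification for the label")
    var reason: String
}

enum BubbleClassifierError: Error {
    case modelUnavailable
}

/// Classifies a single bubble's text with the on-device system model.
func classifyBubbleText(_ text: String) async throws -> BubbleClassification {
    // The system model may be unavailable (unsupported device,
    // Apple Intelligence turned off, model assets not downloaded yet).
    guard SystemLanguageModel.default.isAvailable else {
        throw BubbleClassifierError.modelUnavailable
    }

    let session = LanguageModelSession(
        instructions: "You classify short chat-bubble texts. Fill in only the requested fields."
    )

    // Text prompt in, typed value out: there is no image parameter here.
    let prompt = "Classify this bubble: \(text)"
    let response = try await session.respond(
        to: prompt,
        generating: BubbleClassification.self
    )
    return response.content
}
```

From any async context you'd call it as `let result = try await classifyBubbleText("Are we still on for lunch?")` and read `result.label`. The schema-first design is exactly the "deterministic, schema-driven" part of the API: you describe the shape of the answer and the framework fills it in.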

But as soon as I hit CMD+F and search for "Image", or anything close to pixel buffers: 0 results. It's just not there as of May 2026, on macOS Tahoe 26.2.

  • no explicit image input type in the public API
  • no direct “vision prompt” interface
  • no raw multimodal session builder
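The CMD+F check is easy to make repeatable, e.g. to re-run it on every new SDK drop. A minimal sketch, assuming you have a plain-text dump of the framework's interface like the pastebin export above; the term list and the `imageRelatedHits` helper are my own hypothetical choices, not anything from the framework. It does a case-insensitive substring count, which is what a CMD+F search does, so "Image" also matches inside "CGImage".

```swift
import Foundation

/// Counts case-insensitive occurrences of each term in an interface dump.
/// Terms with zero matches are left out of the result.
/// The default term list is a hypothetical starting point; extend as needed.
func imageRelatedHits(
    in interfaceText: String,
    terms: [String] = ["Image", "CGImage", "CIImage", "CVPixelBuffer", "NSImage", "UIImage"]
) -> [String: Int] {
    let haystack = interfaceText.lowercased()
    var hits: [String: Int] = [:]
    for term in terms {
        // Splitting on the term yields (occurrences + 1) pieces.
        let count = haystack.components(separatedBy: term.lowercased()).count - 1
        if count > 0 {
            hits[term] = count
        }
    }
    return hits
}

// Usage: swift scan.swift path/to/interface-dump.txt
if CommandLine.arguments.count > 1,
   let text = try? String(contentsOfFile: CommandLine.arguments[1], encoding: .utf8) {
    let hits = imageRelatedHits(in: text)
    if hits.isEmpty {
        print("0 results for image-related terms")
    } else {
        for (term, count) in hits.sorted(by: { $0.key < $1.key }) {
            print("\(term): \(count)")
        }
    }
}
```

Run against the exported FoundationModels interface, an empty result is the scripted version of the "0 results" above.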

This gap between the paper and the public API sends me off to look for another solution...
