Inference costs are climbing. Anthropic, OpenAI, and Microsoft have all tightened their token quotas this year. The era of subsidized generative AI is quietly ending.
Not every task needs a frontier model. Burning cloud tokens on a formatting job or a quick summary is just a waste of money.
Privacy and data protection add a second constraint. Some workloads simply can't leave your perimeter. That's especially true for European users, where GDPR is enforced, and the upcoming EU Cloud Act will add more constraints.
That's where local models come in. Ollama is one of the best references: solid API, runs well with tools like OpenCode, and comes with plenty of models.
When I got a new laptop with an NPU, I took Microsoft Foundry Local for a spin. The premise is simple: run inference locally on an AI accelerator, with no cloud provider in the loop.
Foundry Local ships as an SDK (Windows, Linux, macOS) and as a CLI (Windows and macOS) in preview. This post focuses on the CLI.
To install Foundry Local, run a single line in a shell.
On Windows
winget install Microsoft.FoundryLocal
On macOS
brew install microsoft/foundrylocal/foundrylocal
To manage models, there are four commands: list shows a detailed list of available models, download pulls a model into the local cache, load puts it into the running service, and run does all of that in one shot, which is useful for quick tests.
foundry model list

# one shot, useful for a quick test
foundry model run deepseek-r1-14b

# or step by step
foundry model download deepseek-r1-14b
foundry model load deepseek-r1-14b
foundry model run deepseek-r1-14b
The CLI is great for interactive use. But what if you want to automate it from PowerShell?
Two integration paths: the SDK (Python, C#, Rust, JavaScript) or the REST API exposed by the CLI. Since PowerShell has no direct SDK binding, we're taking the REST route.
Three prerequisites before any API call: the service must be running, a model must be loaded, and you need the REST API URI.
This script starts the service if it's stopped and retrieves the REST API URI.
function Get-FoundryServiceStatus {
    return & foundry service status
}

$serviceStatus = Get-FoundryServiceStatus
if ($serviceStatus -like "*service is not running*") {
    & foundry service start | Out-Null
    $serviceStatus = Get-FoundryServiceStatus
}

# load a model: having one loaded is a prerequisite for the chat calls below
& foundry model load phi-3-mini-128k | Out-Null

# the status output contains the service URL; strip the /openai/status suffix to keep only the base URI
$pattern = 'https?://[^\s"]+'
$uri = [regex]::Match($serviceStatus, $pattern).Value
$uri = $uri -replace '/openai/status$', ''
$uri
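Before going further, a quick sanity check doesn't hurt: a minimal sketch, assuming the status endpoint we just stripped from the URI keeps answering at /openai/status once the service is up.
# sanity check (assumption: the status endpoint lives at the path we removed from the base URI)
Invoke-RestMethod -Uri "$uri/openai/status" -Method Get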
Now we can create a function that will invoke the REST API with the correct format.
function Invoke-FoundryRequest {
    param(
        [Parameter(Mandatory)]
        [string]$Method,
        [Parameter(Mandatory)]
        [string]$FoundryBaseUrl,
        [Parameter(Mandatory)]
        [string]$Path,
        [hashtable]$Headers,
        $Body
    )
    $uri = "$FoundryBaseUrl$Path"
    $params = @{
        Method = $Method
        Uri    = $uri
    }
    if ($Headers) { $params.Headers = $Headers }
    if ($Body) {
        $params.Body        = ($Body | ConvertTo-Json -Depth 10)
        $params.ContentType = "application/json"
    }
    return Invoke-RestMethod @params
}
From here on we can start playing with the API.
To get the list of locally available models:
Invoke-FoundryRequest -Method GET -Path "/openai/models" -FoundryBaseUrl $uri
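If you want to keep the result around, for example to grab the full model name needed later, capture it in a variable and inspect it; a small sketch, making no assumption about the exact shape of the response.
# capture the model list; inspect its shape before relying on specific properties
$models = Invoke-FoundryRequest -Method GET -Path "/openai/models" -FoundryBaseUrl $uri
$models | Format-List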
But the most important part of a local agent is the chat itself. There are several things to know here.
The Foundry Local REST API is OpenAI Chat Completions-compatible. You don't send raw text; you send a structured array of role/content pairs. Roles are system, user, and assistant. The content is the prompt.
To send a prompt to the model you need two roles: system sets the behaviour, tone, and context for the assistant, and user carries the input. assistant is the role attached to the content generated by the model.
You need to provide the name of the model you want to use: not the short name used for loading or running a model, but the full name, the one listed by the previous command. Here I use phi-3-mini-128k-instruct-qnn-npu:3.
Creativity is adjusted with the temperature parameter, a value between 0 and 2 (higher values mean more creative responses).
There are many other parameters available, they are listed on this page https://learn.microsoft.com/en-us/azure/foundry-local/reference/reference-rest#post-v1chatcompletions
To use the chat completion API, a POST needs to be made to the path /v1/chat/completions with a JSON body.
A function will help handle the task.
function New-FoundryChatCompletion {
    [CmdletBinding()]
    param(
        [string]
        $Model = "phi-3-mini-128k-instruct-qnn-npu:3",
        [Parameter(Mandatory)]
        [array]
        $Messages,
        [Parameter(Mandatory)]
        [string]
        $FoundryBaseUrl,
        [double]
        $Temperature,
        [double]
        $TopP,
        [int]
        $MaxTokens = 2048
    )
    $body = @{
        model                 = $Model
        messages              = $Messages
        max_tokens            = $MaxTokens
        max_completion_tokens = $MaxTokens
    }
    if ($PSBoundParameters.ContainsKey('Temperature')) { $body.temperature = $Temperature }
    if ($PSBoundParameters.ContainsKey('TopP')) { $body.top_p = $TopP }
    Invoke-FoundryRequest -Method POST -Path "/v1/chat/completions" -Body $body -FoundryBaseUrl $FoundryBaseUrl
}
To use it, a message array with the two roles is needed:
$messages = @(
@{ role = "system"; content = "You are a PowerShell coding assistant. Only give code if requested." },
@{ role = "user"; content = "Give me a PowerShell script to list all files in a directory." }
)
$chat = New-FoundryChatCompletion -FoundryBaseUrl "http://127.0.0.1:52236" -Temperature 0.2 -Messages $messages
The model response is found in an array named choices:
$chat.choices[0].message.content
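To avoid digging into choices every time, a small convenience wrapper can return just the generated text. This is a sketch built on the functions above; the name Get-FoundryAnswer and the default system prompt are mine.
function Get-FoundryAnswer {
    param(
        [Parameter(Mandatory)]
        [string]$Prompt,
        [string]$System = "You are a helpful assistant.",
        [Parameter(Mandatory)]
        [string]$FoundryBaseUrl
    )
    # build the two-role message array and return only the generated text
    $messages = @(
        @{ role = "system"; content = $System },
        @{ role = "user"; content = $Prompt }
    )
    $chat = New-FoundryChatCompletion -FoundryBaseUrl $FoundryBaseUrl -Messages $messages -Temperature 0.2
    return $chat.choices[0].message.content
}

Get-FoundryAnswer -FoundryBaseUrl $uri -Prompt "Give me a PowerShell one-liner to count files in a directory."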
The REST API gets you surprisingly far, but it has limits. The next post covers the .NET SDK, which unlocks the full surface from PowerShell.