Inference costs are climbing. Anthropic, OpenAI, and Microsoft have all tightened their token quotas this year. The era of subsidized generative AI is quietly ending.
Not every task needs a frontier model. Burning cloud tokens on a formatting job or a quick summary is just a waste of money.
Privacy and data protection add a second constraint. Some workloads simply can't leave your perimeter. That's especially true for European users, where GDPR is enforced, and the upcoming EU Cloud Act will add more constraints.
That's where local models come in. Ollama is one of the best references: solid API, runs well with tools like OpenCode, and comes with plenty of models.
When I got a new laptop with an NPU, I took Microsoft Foundry Local for a spin. The premise is simple: run inference locally on an AI accelerator, with no cloud provider in the loop.
Foundry Local ships as an SDK (Windows, Linux, macOS) and as a CLI (Windows and macOS) in preview. This post focuses on the CLI.
To install Foundry Local, run a single line in a shell.
On Windows
winget install Microsoft.FoundryLocal
On macOS
brew install microsoft/foundrylocal/foundrylocal
To manage models, there are four commands: list shows a detailed list of available models, download pulls a model into the local cache, load puts it into the running service, and run does all of that in one shot, which is useful for quick tests.
foundry model list

# one shot, useful for a quick test
foundry model run deepseek-r1-14b

# or step by step
foundry model download deepseek-r1-14b
foundry model load deepseek-r1-14b
foundry model run deepseek-r1-14b
The CLI is great for interactive use. But what if you want to automate it from PowerShell?
Two integration paths: the SDK (Python, C#, Rust, JavaScript) or the REST API exposed by the CLI. Since PowerShell has no direct SDK binding, we're taking the REST route.
Three prerequisites before any API call: the service must be running, a model must be loaded, and you need the REST API URI.
This script starts the service if it's stopped and retrieves the REST API URI.
function Get-FoundryServiceStatus {
    return & foundry service status
}

$serviceStatus = Get-FoundryServiceStatus
if ($serviceStatus -like "*service is not running*") {
    & foundry service start | Out-Null
    $serviceStatus = Get-FoundryServiceStatus
}

# load a model: having one loaded is a prerequisite for the chat calls below
& foundry model load phi-3-mini-128k | Out-Null

# the status output contains the service URL; strip the /openai/status suffix to keep only the base URI
$pattern = 'https?://[^\s"]+'
$uri = [regex]::Match($serviceStatus, $pattern).Value
$uri = $uri -replace '/openai/status$', ''
$uri
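Before going further, a quick sanity check doesn't hurt: a minimal sketch, assuming the status endpoint we just stripped from the URI keeps answering at /openai/status once the service is up.
# sanity check (assumption: the status endpoint lives at the path we removed from the base URI)
Invoke-RestMethod -Uri "$uri/openai/status" -Method Get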
Now we can create a function that will invoke the REST API with the correct format.
function Invoke-FoundryRequest {
    param(
        [Parameter(Mandatory)]
        [string]$Method,
        [Parameter(Mandatory)]
        [string]$FoundryBaseUrl,
        [Parameter(Mandatory)]
        [string]$Path,
        [hashtable]$Headers,
        $Body
    )
    $uri = "$FoundryBaseUrl$Path"
    $params = @{
        Method = $Method
        Uri    = $uri
    }
    if ($Headers) { $params.Headers = $Headers }
    if ($Body) {
        $params.Body        = ($Body | ConvertTo-Json -Depth 10)
        $params.ContentType = "application/json"
    }
    return Invoke-RestMethod @params
}
From here on we can start playing with the API.
To get the list of locally available models:
Invoke-FoundryRequest -Method GET -Path "/openai/models" -FoundryBaseUrl $uri
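If you want to keep the result around, for example to grab the full model name needed later, capture it in a variable and inspect it; a small sketch, making no assumption about the exact shape of the response.
# capture the model list; inspect its shape before relying on specific properties
$models = Invoke-FoundryRequest -Method GET -Path "/openai/models" -FoundryBaseUrl $uri
$models | Format-List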
But the most important part of a local agent is the chat itself. There are several things to know here.
The Foundry Local REST API is OpenAI Chat Completions-compatible. You don't send raw text; you send a structured array of role/content pairs. Roles are system, user, and assistant. The content is the prompt.
To send a prompt to the model you need two roles: system sets the behaviour, tone, and context for the assistant, and user carries the input. assistant is the role attached to the content generated by the model.
You need to provide the name of the model you want to use: not the short name used for loading or running a model, but the full name, the one listed by the previous command. Here I use phi-3-mini-128k-instruct-qnn-npu:3.
Creativity is adjusted with the temperature parameter, a value between 0 and 2 (higher values mean more creative responses).
There are many other parameters available, they are listed on this page https://learn.microsoft.com/en-us/azure/foundry-local/reference/reference-rest#post-v1chatcompletions
To use the chat completion API, a POST needs to be made to the path /v1/chat/completions with a JSON body.
A function will help handle the task.
function New-FoundryChatCompletion {
    [CmdletBinding()]
    param(
        [string]
        $Model = "phi-3-mini-128k-instruct-qnn-npu:3",
        [Parameter(Mandatory)]
        [array]
        $Messages,
        [Parameter(Mandatory)]
        [string]
        $FoundryBaseUrl,
        [double]
        $Temperature,
        [double]
        $TopP,
        [int]
        $MaxTokens = 2048
    )
    $body = @{
        model                 = $Model
        messages              = $Messages
        max_tokens            = $MaxTokens
        max_completion_tokens = $MaxTokens
    }
    if ($PSBoundParameters.ContainsKey('Temperature')) { $body.temperature = $Temperature }
    if ($PSBoundParameters.ContainsKey('TopP')) { $body.top_p = $TopP }
    Invoke-FoundryRequest -Method POST -Path "/v1/chat/completions" -Body $body -FoundryBaseUrl $FoundryBaseUrl
}
To use it, a message array with the two roles is needed:
$messages = @(
@{ role = "system"; content = "You are a PowerShell coding assistant. Only give code if requested." },
@{ role = "user"; content = "Give me a PowerShell script to list all files in a directory." }
)
$chat = New-FoundryChatCompletion -FoundryBaseUrl "http://127.0.0.1:52236" -Temperature 0.2 -Messages $messages
The model response is found in an array named choices:
$chat.choices[0].message.content
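To avoid digging into choices every time, a small convenience wrapper can return just the generated text. This is a sketch built on the functions above; the name Get-FoundryAnswer and the default system prompt are mine.
function Get-FoundryAnswer {
    param(
        [Parameter(Mandatory)]
        [string]$Prompt,
        [string]$System = "You are a helpful assistant.",
        [Parameter(Mandatory)]
        [string]$FoundryBaseUrl
    )
    # build the two-role message array and return only the generated text
    $messages = @(
        @{ role = "system"; content = $System },
        @{ role = "user"; content = $Prompt }
    )
    $chat = New-FoundryChatCompletion -FoundryBaseUrl $FoundryBaseUrl -Messages $messages -Temperature 0.2
    return $chat.choices[0].message.content
}

Get-FoundryAnswer -FoundryBaseUrl $uri -Prompt "Give me a PowerShell one-liner to count files in a directory."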
The REST API gets you surprisingly far, but it has limits. The next post covers the .NET SDK, which unlocks the full surface from PowerShell.