The 2am page nobody wants
Last month I got the page every developer dreads. A scheduled job that had been quietly running for two years started spitting 401 Unauthorized at every request. The target was a fleet of network-connected hardware sitting on our office LAN. No code had changed. No certificates had expired. The only thing that had moved was the vendor's firmware, which had auto-updated overnight.
If you've ever built tooling against a vendor's local-network API and watched it die after a forced firmware push, this one's for you. I want to walk through how I diagnosed it, the workaround I shipped that afternoon, and the architectural changes I made afterward so I never have to do this again.
What actually broke
The symptom was clean: every HTTP call to the device's LAN endpoint returned 401. The device still responded to ICMP. The web UI on the device still worked. But anything calling the documented local API now demanded a token it had never required before.
First instinct is always to blame your own code, so I rolled the integration back two versions. Same error. Then I pulled the firmware changelog. Buried under "stability improvements" was a single line about "enhanced authentication for network requests." Translation: the vendor had quietly retrofitted a cloud-issued bearer token into what used to be an open LAN endpoint.
This is a pattern. A device ships with a permissive local API, an ecosystem of community tooling grows up around it, and then a later firmware tightens the screws. The community tools break. The vendor's official app keeps working because it knows the new handshake.
Step 1: confirm the wire-level change
Before you guess, capture. I always reach for tcpdump first because it tells you what the device is actually doing, not what the docs claim.
# Capture traffic between this host and the device on the LAN
# -i any : listen on all interfaces
# -w : write to file so we can open in Wireshark
sudo tcpdump -i any -w /tmp/device.pcap host 192.168.1.42
Open the pcap in Wireshark, filter for http (or tls if the device upgraded to HTTPS), and compare a known-good capture against the broken one. In my case the broken request was missing an Authorization: Bearer ... header that the official app was sending. The vendor's app was first hitting their cloud, getting a short-lived token, and presenting it to the device on the LAN.
That single observation collapsed the search space. The device wasn't broken. It was now demanding proof that the caller had recently authenticated against the vendor's cloud.
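The diff step can be done by eye, but it is easy to script once you have copied the headers out of both captures. Here is a minimal sketch, assuming you have transcribed each request's headers into a plain dict by hand; `diff_headers` and the sample values are my own illustration, not anything the vendor or Wireshark provides.

```python
def diff_headers(good: dict[str, str], broken: dict[str, str]) -> dict[str, set[str]]:
    """Compare header names (case-insensitively) between two captured requests."""
    good_names = {name.lower() for name in good}
    broken_names = {name.lower() for name in broken}
    return {
        "missing_from_broken": good_names - broken_names,
        "extra_in_broken": broken_names - good_names,
    }


# Headers transcribed by hand from the two captures (values are placeholders).
official_app = {
    "Host": "192.168.1.42",
    "Content-Type": "application/json",
    "Authorization": "Bearer <redacted>",
}
my_job = {
    "Host": "192.168.1.42",
    "Content-Type": "application/json",
}

result = diff_headers(official_app, my_job)
print(result["missing_from_broken"])  # {'authorization'}
```

Comparing names rather than values keeps secrets out of your terminal history and still points straight at the header you are missing.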
Step 2: find the local escape hatch
Most vendors who do this still leave a developer-mode pressure valve, partly because their own QA teams need it. Common patterns I've seen:
- A long-lived "access code" or "LAN key" you can read off the device's screen and pass as a header.
- A toggle in the device settings to enable a legacy/local mode.
- A setup credential stored in a config file on the device's storage.
In my case it was the first one. The device showed an 8-character code under a hidden settings panel. Passing it as X-Local-Access got me back a 200. That's the minimum-viable fix, and it's enough to get your CI green again.
import os
import httpx  # https://www.python-httpx.org/

DEVICE_HOST = os.environ["DEVICE_HOST"]
# Read the LAN code off the device's settings panel once,
# then store it in your secrets manager; it does not rotate.
LAN_CODE = os.environ["DEVICE_LAN_CODE"]

def call(path: str, payload: dict) -> dict:
    # The header name is vendor-specific; pull yours from the pcap.
    headers = {"X-Local-Access": LAN_CODE}
    r = httpx.post(f"http://{DEVICE_HOST}{path}", json=payload, headers=headers, timeout=10)
    r.raise_for_status()
    return r.json()
If your vendor went a step further and signed the requests, you'll need to replay the handshake. That's a bigger lift, but the diagnostic approach is the same: capture, diff, replicate.
Step 3: stop trusting the vendor's surface
The afternoon fix is the easy part. The harder question is: how do I stop this happening again? The honest answer is that you can't stop a vendor from changing their own product. What you can do is reduce the blast radius when they do.
The single most useful change I made was wrapping the device behind a small adapter interface that the rest of my codebase talks to. Before, my job called httpx.post directly against the device. Now it calls a DeviceClient, and the only file that knows about authentication, headers, or hostnames is the adapter.
from typing import Protocol

class DeviceClient(Protocol):
    # The methods my application actually cares about.
    # Nothing here mentions HTTP, auth, or vendors.
    def send_job(self, job_id: str, payload: bytes) -> None: ...
    def status(self, job_id: str) -> str: ...

class LanHttpClient:
    """Implementation that talks to the vendor's LAN API."""
    def __init__(self, host: str, lan_code: str):
        self._host = host
        self._lan_code = lan_code

    def send_job(self, job_id: str, payload: bytes) -> None:
        # Vendor-specific request shape lives here and only here.
        ...

    def status(self, job_id: str) -> str:
        # Elided like send_job; present so the class satisfies DeviceClient.
        ...

class MqttClient:
    """Alternative path over an open protocol if the device speaks it."""
    ...
The payoff is that when the vendor changes their API again — and they will — only LanHttpClient has to move. Your scheduled jobs, your tests, your alerting, none of that cares.
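One side effect worth showing: because the application only sees the interface, tests can swap in an in-memory stand-in and never touch the network. This is a rough sketch, not my production code. `FakeClient`, `run_nightly_job`, and the "queued"/"unknown" status strings are hypothetical names made up for the example, and the protocol is repeated so the snippet runs on its own.

```python
from typing import Protocol


class DeviceClient(Protocol):
    def send_job(self, job_id: str, payload: bytes) -> None: ...
    def status(self, job_id: str) -> str: ...


class FakeClient:
    """In-memory stand-in (hypothetical) for use in tests only."""

    def __init__(self) -> None:
        self.sent: dict[str, bytes] = {}

    def send_job(self, job_id: str, payload: bytes) -> None:
        self.sent[job_id] = payload

    def status(self, job_id: str) -> str:
        # Status strings are invented for the sketch, not a vendor format.
        return "queued" if job_id in self.sent else "unknown"


def run_nightly_job(client: DeviceClient, job_id: str, payload: bytes) -> str:
    """Application code: depends only on the DeviceClient interface."""
    client.send_job(job_id, payload)
    return client.status(job_id)


fake = FakeClient()
print(run_nightly_job(fake, "job-1", b"hello"))  # queued
```

Nothing in `run_nightly_job` would need to change if the real implementation moved from one transport to another.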
Step 4: lean on open protocols where you can
A lot of network-connected hardware now speaks at least one open protocol underneath the vendor wrapper. MQTT is the big one in IoT, and many devices expose it on the LAN even when the vendor doesn't advertise it. If your device does, prefer that path. Open protocols have specifications you can read, brokers you can host yourself, and they don't change because a product manager decided to push everyone through a cloud.
I haven't tested this on every category of hardware, so check your specific device's docs and community forums. But as a rule of thumb: if there are two ways to reach the device and one of them is a documented open protocol, take the documented open protocol.
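If you do end up with two routes to the device, one way to express "prefer the open protocol, fall back to the vendor API" is a small wrapper that tries transports in order. The sketch below is only that: `FallbackClient` and the two stand-in classes are hypothetical, and it assumes every transport exposes the same `send_job` method.

```python
from typing import Protocol, Sequence


class DeviceClient(Protocol):
    def send_job(self, job_id: str, payload: bytes) -> None: ...


class FallbackClient:
    """Tries each transport in order and stops at the first that succeeds."""

    def __init__(self, clients: Sequence[DeviceClient]) -> None:
        if not clients:
            raise ValueError("at least one client is required")
        self._clients = list(clients)

    def send_job(self, job_id: str, payload: bytes) -> None:
        errors: list[Exception] = []
        for client in self._clients:
            try:
                client.send_job(job_id, payload)
                return
            except Exception as exc:  # broad on purpose: transports fail differently
                errors.append(exc)
        raise RuntimeError(f"all {len(errors)} transports failed: {errors!r}")


# Stand-ins so the sketch runs without a device on the network.
class BrokenOpenProtocolClient:
    def send_job(self, job_id: str, payload: bytes) -> None:
        raise ConnectionError("broker unreachable")


class RecordingLanClient:
    def __init__(self) -> None:
        self.sent: list[str] = []

    def send_job(self, job_id: str, payload: bytes) -> None:
        self.sent.append(job_id)


lan = RecordingLanClient()
client = FallbackClient([BrokenOpenProtocolClient(), lan])
client.send_job("job-1", b"payload")
print(lan.sent)  # ['job-1']
```

One caveat: if a transport can fail after the device has already accepted the job, falling back will submit it twice, so this only makes sense when jobs are safe to repeat.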
Prevention checklist
A few habits that have saved me since:
- Pin firmware versions on production devices. Auto-update is convenient until it breaks your job at 2am.
- Wrap vendor APIs in an adapter so the surface area of vendor knowledge in your codebase is exactly one file.
- Capture a known-good pcap the first time you integrate. When something breaks later, you have a baseline to diff against.
- Subscribe to firmware changelogs with a feed reader or a curl | diff cron. "Enhanced authentication" is almost always a breaking change in disguise.
- Prefer open protocols when the device offers them. Vendor-specific JSON APIs are convenient and disposable; MQTT topics from five years ago still work.
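The changelog habit is the easiest one to automate. Below is a rough sketch of the diff half, assuming you already save each fetched changelog as text somewhere; the `RISKY_WORDS` list and both function names are my own invention, and the sample changelog is made up apart from the one line quoted earlier.

```python
import difflib

# Hypothetical watch-list; tune it to your vendor's phrasing.
RISKY_WORDS = ("auth", "security", "token", "cloud", "deprecat")


def new_changelog_lines(old: str, new: str) -> list[str]:
    """Return lines that appear in the new changelog but not in the old one."""
    diff = difflib.ndiff(old.splitlines(), new.splitlines())
    # ndiff prefixes added lines with "+ "; strip that two-character marker.
    return [line[2:] for line in diff if line.startswith("+ ")]


def risky_lines(lines: list[str]) -> list[str]:
    """Keep only lines that mention something on the watch-list."""
    return [line for line in lines if any(word in line.lower() for word in RISKY_WORDS)]


old = "v1.0\n- Initial release\n"
new = (
    "v1.1\n"
    "- Stability improvements\n"
    "- Enhanced authentication for network requests\n"
    "v1.0\n"
    "- Initial release\n"
)

added = new_changelog_lines(old, new)
print(risky_lines(added))  # ['- Enhanced authentication for network requests']
```

Wire the output into whatever already pages you; the point is to hear about the change before the device does.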
The uncomfortable truth is that any closed surface you integrate against is borrowed time. Building as if that's true from day one makes the inevitable firmware push a footnote instead of an incident.