What I actually want from an AI evidence layer (and why I built NexArt)

I want to talk about a problem that doesn't get enough attention in AI engineering: when something goes wrong, most teams cannot actually prove what their AI did.

Not "show me the logs." Not "here's a screenshot of the dashboard." I mean prove it. Cryptographically. To an auditor, a regulator, or a customer who has no reason to trust you.

This isn't a hypothetical. It comes up the moment a customer disputes a refund decision, or compliance asks for evidence, or a partner wants to verify the inputs that led to an output. Logs describe what you remember. They don't prove what actually happened.

I'm the founder of NexArt, and we built it specifically to close that gap. In this post I want to walk you through what NexArt is, why it exists, and how to drop it into your stack in roughly an hour.

So what is NexArt, in one paragraph

NexArt is verifiable execution infrastructure for AI systems. Every time your AI runs, NexArt produces a Certified Execution Record (CER), which is a tamper-evident, cryptographically signed artifact that binds the inputs, outputs, parameters, and context into a single record. Anyone (auditors, partners, regulators, your own customers) can verify that record independently at verify.nexart.io. No account, no API key, no dependency on us.

The model we follow is simple: free to create, paid to certify, public to verify.

It is not observability. It is not a logging tool. It is not an eval framework. Think of it as the evidence layer underneath your AI: proof that what ran is what you say ran.

What it does not do

This part is worth being clear about, because the category is new and easy to mix up with other tools.

NexArt does not claim the model's output is correct. It does not score quality or alignment. It does not replace your observability stack or your evals. What it does is prove what executed, with what inputs, at what time, against what parameters, and bind all of that into one artifact that anyone can verify without trusting you.

The promise is integrity of the record, not quality of the model. Different problem.

Why this matters now

A few situations where the lack of execution evidence really bites:

  • Audit and compliance work. ISO 42001, SOC 2, and the EU AI Act all push toward demonstrable, reproducible evidence of AI decisions.
  • Customer disputes. Someone claims your AI denied them unfairly. What is your proof of the exact inputs and parameters at that moment?
  • Agent workflows. Multi-step agents make many decisions per run. Without a sealed trail, you cannot reliably reconstruct which step did what.
  • Debugging. Reproducing a failed run from a tamper-evident record is a very different experience from stitching together five different log sources.

The pattern I keep seeing is that teams discover this gap during an audit or a dispute. By then, the executions they need to prove are already gone.

How it works

The flow is intentionally small. Four steps, each with one job:

  1. Capture. Inputs, outputs, and execution context are recorded at runtime.
  2. Seal. A canonical, tamper-evident fingerprint (the certificateHash) binds them into a single record. This happens locally and offline.
  3. Attest. An independent attestation node signs the record and issues a receipt plus a verification envelope.
  4. Verify. Anyone can re-derive the hash, validate the signature, and check the envelope. No trust in NexArt or in you is required.

Verification itself happens in three independent layers:

  • Layer 1, Integrity. Recompute SHA-256 over the canonicalized whitelist and compare it with certificateHash.
  • Layer 2, Receipt. Validate the Ed25519 signature from the attestation node.
  • Layer 3, Envelope. Validate the bundle-level signature over the attestation projection.

Each layer reports independently as PASS, FAIL, or SKIPPED. A sealed-only (offline) bundle gives you PASS / SKIPPED / SKIPPED. A certified bundle gives you three PASS lines. SKIPPED is not a failure, it just means that layer doesn't apply to that bundle yet.
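To make the layer model concrete, here's a minimal, self-contained sketch of the idea. This is not NexArt's actual wire format or SDK: the bundle shape, the `canonicalize` helper (a simplified stand-in for RFC 8785 JCS that only sorts keys), and the toy Layer 2 check (which merely tests for a receipt's presence, where the real layer validates an Ed25519 signature) are all illustrative assumptions.

```typescript
import { createHash } from "node:crypto";

// Simplified stand-in for RFC 8785 JCS: serialize with lexicographically
// sorted keys so the same logical object always yields the same bytes.
// (Real JCS also normalizes numbers and strings; this sketch does not.)
function canonicalize(value: unknown): string {
  if (Array.isArray(value)) {
    return "[" + value.map(canonicalize).join(",") + "]";
  }
  if (value !== null && typeof value === "object") {
    const obj = value as Record<string, unknown>;
    const entries = Object.keys(obj)
      .sort()
      .map((k) => JSON.stringify(k) + ":" + canonicalize(obj[k]));
    return "{" + entries.join(",") + "}";
  }
  return JSON.stringify(value);
}

type LayerResult = "PASS" | "FAIL" | "SKIPPED";

interface ToyBundle {
  snapshot: Record<string, unknown>;
  certificateHash: string;   // hash the sealer claims
  receiptSignature?: string; // present only after attestation
}

function sealHash(snapshot: Record<string, unknown>): string {
  return "sha256:" + createHash("sha256").update(canonicalize(snapshot)).digest("hex");
}

// Layer 1: recompute the hash and compare it with the claimed value.
// Layer 2 here is only a presence check (toy model); a sealed-only
// bundle has no receipt, so that layer reports SKIPPED, not FAIL.
function verifyToyBundle(bundle: ToyBundle): { integrity: LayerResult; receipt: LayerResult } {
  const integrity: LayerResult =
    sealHash(bundle.snapshot) === bundle.certificateHash ? "PASS" : "FAIL";
  const receipt: LayerResult = bundle.receiptSignature ? "PASS" : "SKIPPED";
  return { integrity, receipt };
}

const snapshot = { model: "gpt-4o-mini", input: "2 + 2?", output: "4" };
const sealed: ToyBundle = { snapshot, certificateHash: sealHash(snapshot) };
console.log(verifyToyBundle(sealed)); // sealed-only: integrity PASS, receipt SKIPPED
```

The key property this illustrates: flipping a single byte in the snapshot changes the recomputed hash, so Layer 1 fails closed, while a layer that simply doesn't apply yet reports SKIPPED.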

Quick implementation

This is the canonical integration: one function call wrapped around your model inference. No proxy, no middleware, no infrastructure change.

1. Install the SDK

```shell
npm install @nexart/ai-execution
```

2. Set two environment variables

```shell
export NEXART_NODE_URL="https://node.nexart.io"
export NEXART_API_KEY="<your-api-key>"
```

3. Wrap an execution

```typescript
import {
  certifyLangChainRun,
  verifyAiCerBundleDetailed,
} from "@nexart/ai-execution";

async function main() {
  // Capture, seal, and certify in one node round-trip.
  const { bundle, certificateHash, verificationUrl } =
    await certifyLangChainRun({
      provider: "openai",
      model: "gpt-4o-mini",
      input: {
        messages: [
          { role: "user", content: "Should this refund be approved?" },
        ],
      },
      output: { decision: "approve", reason: "policy_passed" },
      nodeUrl: process.env.NEXART_NODE_URL!,
      apiKey: process.env.NEXART_API_KEY!,
    });

  console.log("certificateHash :", certificateHash);
  console.log("verificationUrl :", verificationUrl);

  // Independent verification of the returned bundle.
  const report = await verifyAiCerBundleDetailed(bundle);
  console.log("Integrity (Layer 1) :", report.integrity);
  console.log("Receipt   (Layer 2) :", report.receipt);
  console.log("Envelope  (Layer 3) :", report.envelope);
}

main().catch((err) => {
  console.error("FAILED:", err);
  process.exit(1);
});
```

4. Run it

Save the snippet above as test-harness.ts, then:

```shell
npx tsx test-harness.ts
```

You should see something like this:

```
certificateHash : sha256:9f2b1c8e4a7d6f3b...
verificationUrl : https://verify.nexart.io/c/sha256:9f2b1c8e4a7d6f3b...
Integrity (Layer 1) : PASS
Receipt   (Layer 2) : PASS
Envelope  (Layer 3) : PASS
```

Three PASS lines tell you a few things at once. The bundle is byte-identical to what the node attested. The receipt signature validates against the node's key. And the verification envelope correctly binds the attestation back to the bundle.

Open the verificationUrl in any browser. Anyone, including someone who has never heard of you, can confirm the record. No account, no API key. That's it. You now have audit-grade execution evidence for that AI call.

Offline mode, when you don't want a network call

Not every execution needs to hit the node. NexArt supports a fully offline flow that's useful for air-gapped environments, local testing, or batching certification for later.

```typescript
import { sealCer, verifyAiCerBundleDetailed } from "@nexart/ai-execution";

const { bundle, certificateHash } = sealCer({
  provider: "openai",
  model: "gpt-4o-mini",
  input: { messages: [{ role: "user", content: "What is 2 + 2?" }] },
  output: { text: "4" },
});

const report = await verifyAiCerBundleDetailed(bundle);
// report.integrity === "PASS"
// report.receipt   === "SKIPPED"
// report.envelope  === "SKIPPED"
```

Two things to keep in mind here. SKIPPED is expected for a sealed-only artifact, so don't treat it as a failure. And when you're later ready to make the record publicly verifiable, submit the bundle to the attestation node. The same certificateHash will receive a signed receipt. The identity of the record never changes.

Multi-step workflows: Project Bundles

Real systems aren't a single call. Agents plan, retrieve, decide, and act across many steps. For those, NexArt has Project Bundles. Each step becomes its own CER, and the bundle ties them into one verifiable artifact with its own projectHash.

```typescript
import { startWorkflow } from "@nexart/agent-kit";

const workflow = startWorkflow({ projectTitle: "Contract review" });

const clauses = await workflow.step("Extract clauses", async () => {
  return await llm.call("Extract key clauses...");
});

const risks = await workflow.step("Summarize risks", async () => {
  return await llm.call("Summarize risks from: " + clauses);
});

const bundle = workflow.finish();
// bundle.integrity.projectHash is the verifiable hash for the whole workflow
```

One workflow. Many executions. One artifact you can hand to anyone.
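The post doesn't specify how projectHash is derived, so treat the following as a purely illustrative construction, not NexArt's actual scheme: one plausible design is a hash chain over the per-step hashes, which commits the project-level hash to every step and to their order, so dropping or reordering a step changes the result.

```typescript
import { createHash } from "node:crypto";

const sha256 = (s: string) => createHash("sha256").update(s).digest("hex");

// Illustrative only: fold each step's hash into a running digest. The
// final value commits to every step hash *and* the order they ran in.
function projectHash(stepHashes: string[]): string {
  return stepHashes.reduce((acc, h) => sha256(acc + h), sha256("project:v1"));
}

const steps = [sha256("Extract clauses"), sha256("Summarize risks")];
const full = projectHash(steps);

// Reordering the steps yields a different project-level hash.
const reordered = projectHash([steps[1], steps[0]]);
console.log(full !== reordered); // true
```

The design choice this sketches: a flat hash over a set of step hashes would detect tampering but not reordering; chaining makes order part of the identity.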

The point of independent verification

Here is the part most evidence systems quietly skip. Verification cannot require trust in the system that created the record. If it does, it isn't really evidence.

A few design choices follow from that:

  • The SDK creates records locally.
  • The attestation node certifies them independently.
  • The verifier at verify.nexart.io is a separate public surface, with no login and no API key.
  • Verification can also run offline, using only the bundle and the node's published public key.

To verify a record, anyone can open https://verify.nexart.io/c/{certificateHash} directly; paste an execution ID or certificate hash, or upload the JSON bundle, on the home page; or run verifyAiCerBundleDetailed(bundle) locally in their own code. All three paths land at the same answer.
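The offline path deserves a closer look, because it's what makes the "no trust in NexArt" claim checkable. Here's a hedged sketch of receipt-style verification using Node's built-in Ed25519 support; the payload (signing the certificate hash directly) and key handling are illustrative assumptions, not NexArt's actual schema. In practice you'd load the node's published public key rather than generate a keypair, and the private key never leaves the attestation node.

```typescript
import { generateKeyPairSync, sign, verify } from "node:crypto";

// Stand-in for the attestation node's keypair. Generated here only so the
// sketch is self-contained; real verification uses the node's published
// public key and never sees the private half.
const { publicKey, privateKey } = generateKeyPairSync("ed25519");

// The node signs the certificate hash (illustrative payload choice).
// For Ed25519, Node's sign/verify take null as the algorithm argument.
const certificateHash = "sha256:9f2b1c8e4a7d6f3b";
const receiptSignature = sign(null, Buffer.from(certificateHash), privateKey);

// Anyone holding the public key can validate the receipt fully offline.
function verifyReceipt(hash: string, signature: Buffer): boolean {
  return verify(null, Buffer.from(hash), publicKey, signature);
}

console.log(verifyReceipt(certificateHash, receiptSignature)); // true
```

Note that this check needs no network call and no account: the bundle bytes plus one public key are the entire trust surface.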

This is what makes the evidence portable. You can hand it to an auditor, a regulator, a counterparty, or a customer, and they do not have to trust you for it to mean something.

A few pitfalls worth mentioning

Some mistakes I've seen in real integrations, in case they save you some time:

  • Don't mutate a certified bundle. Re-ordering keys, adding fields, or re-serializing will break certificateHash. Persist bundles byte-for-byte after certification.
  • Don't look records up by executionId. Always use certificateHash, which is the canonical identity. Two attempts of the same execution can share an executionId but produce different hashes.
  • Don't hash the full bundle yourself. Hashing is over a strict whitelist (bundleType, version, createdAt, snapshot, plus optional context and contextSummary) using JCS canonicalization (RFC 8785). The SDK does this for you.
  • Don't treat SKIPPED as a failure. It's the correct result for sealed-only artifacts on Layers 2 and 3.
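The "don't hash the full bundle yourself" pitfall is worth seeing concretely. The snippet below is a generic demonstration (the `canonical` helper is a simplified sorted-key stand-in for RFC 8785 JCS, not NexArt's implementation): naive `JSON.stringify` hashing is key-order-sensitive, canonical hashing is not, and either way any added field changes the hash, which is exactly why a certified bundle must be persisted byte-for-byte and never mutated.

```typescript
import { createHash } from "node:crypto";

const sha256 = (s: string) => createHash("sha256").update(s).digest("hex");

// Simplified sorted-key canonicalization (stand-in for RFC 8785 JCS).
// Passing a sorted key array as the replacer fixes property order.
function canonical(obj: Record<string, unknown>): string {
  return JSON.stringify(obj, Object.keys(obj).sort());
}

const a = { model: "gpt-4o-mini", provider: "openai" };
const b = { provider: "openai", model: "gpt-4o-mini" }; // same data, reordered

// Naive hashing is order-sensitive: same logical record, different hashes.
const naiveStable = sha256(JSON.stringify(a)) === sha256(JSON.stringify(b));

// Canonical hashing is order-insensitive...
const canonicalStable = sha256(canonical(a)) === sha256(canonical(b));

// ...but an added field still changes the hash, which is the point:
// mutation after sealing is always detectable.
const mutated = { ...a, note: "added later" };
const survivesMutation = sha256(canonical(a)) === sha256(canonical(mutated));

console.log({ naiveStable, canonicalStable, survivesMutation });
// → naiveStable false, canonicalStable true, survivesMutation false
```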

Where to dig in next

  • Main site: nexart.io
  • Docs and quickstart: docs.nexart.io
  • Public verification surface: verify.nexart.io
  • LangChain example repo: github.com/artnames/nexart-langchain
  • n8n example repo: github.com/artnames/nexart-n8n

One last thought

We've spent the last decade making AI systems more capable. The next decade is going to be about making them accountable. As more decisions get automated, the cost of not having execution evidence quietly compounds. Every uncertified run is a record you can't recover later.

If you're building anything where someone might one day ask "prove it," I'd start certifying now. One function call, one record, one link anyone can verify is a much better position to be in than scrambling to reconstruct logs the week before an audit.

That's NexArt. If you try it, I'd genuinely love to hear what works and what breaks. Feedback from real integrations is how the protocol gets sharper.

Thanks for reading.

Jeremy, founder of NexArt
