Whose hash, whose key, whose pin — supply chain is the sovereignty question

AI Governance Supply Chain Provenance Model Risk Self-Hosted AI

A1 named four trust primitives that make sovereign-by-default deployment defensible. This piece develops the governance layer behind them. Whose hash backstops the weights, whose key signs the inference log, whose pin authority holds the runtime. A three-question framework for self-hosted AI in regulated firms.

The audit conversation has moved

The audit conversation about self-hosted AI is moving from "do you have inference logs" to "whose key signs them". The first question is answerable inside any procurement cycle. The second is a governance decision the firm has to own, and the supervisory examination after the next material incident will not be satisfied with "the vendor provided the signing infrastructure". The first question is technical. The second is constitutional.

The previous piece in this series laid out the case for sovereign-by-default deployment and named four trust primitives that make the architecture defensible: hash-verified weights, signed inference logs, pinned runtimes, and fixed inference seeds. Each of those primitives is a technical artefact. None of them is a governance answer. A hash is a number. A signature is bytes. A pin is a string. The governance question, the one supervisors are starting to ask and the question this piece develops, is whose authority chain stands behind each of those artefacts. Three questions, in order: whose hash, whose key, whose pin.

This is the Schneier framing applied to model supply chains. Cryptographic primitives do not produce trust on their own. They transport trust across a boundary, where the trust originates from a key, a certificate authority, a registry, or a contractual obligation. Self-hosting an open-weights model does not exit the supply chain. It changes which links in the chain the firm controls and which links the firm has delegated, often without naming the delegation.

The four primitives, re-cast as authority questions

Reading A1 back from the supply-chain lens, each primitive carries an unstated authority assumption.

Hash-verify weights at download. The implied authority is: whoever published the hash. If the hash sits on the same server that hosts the weights, the integrity guarantee covers the transport but not the publication. A compromised publisher serves a matching hash for a tampered weight file, and the firm's download check returns green. The integrity question is not whether the bytes arrived intact. It is whose statement that these are the right bytes can be trusted.

Sign the inference log. The implied authority is: whoever holds the signing key. If the key is generated and managed by the inference runtime vendor, the audit log proves the vendor's runtime ran. It does not prove the firm controlled the runtime. The audit-defensibility question is not whether the log is cryptographically intact. It is whose signature on the log a supervisor will treat as evidence of the firm's control.

Pin the runtime, not just the weights. The implied authority is: whoever decides what version is the pinned version. If the firm pulls from an upstream registry without mirroring, every dependency-resolution pass is an opportunity for the registry to substitute. Pinning is a discipline only if the pin authority is internalised. Otherwise, pinning is a vendor's prerogative dressed as a firm's control.

Fix the inference seed. The implied authority is: whoever validates that the seed produces the same output. Reproducibility on the firm's hardware proves the firm can reproduce. It does not prove the seed has not been chosen to produce a specific known output. The seed-fixing primitive defends against drift, not against intentional manipulation upstream of the seed parameter.
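The drift-versus-manipulation distinction can be shown with a toy sketch. Python's standard `random` module stands in for a sampler here; the vocabulary, the weights, and the `sample_tokens` function are illustrative assumptions, not any runtime's API.

```python
import random


def sample_tokens(seed: int, vocab: list[str], weights: list[float], n: int) -> list[str]:
    """Toy stand-in for seeded decoding: draw n tokens from a fixed distribution."""
    rng = random.Random(seed)  # private generator, isolated from global random state
    return rng.choices(vocab, weights=weights, k=n)


# Illustrative vocabulary and weights only.
vocab = ["approve", "decline", "refer"]
weights = [0.5, 0.3, 0.2]

# Drift defence: the same seed on the same inputs reproduces the same output.
first_run = sample_tokens(42, vocab, weights, 8)
second_run = sample_tokens(42, vocab, weights, 8)
print(first_run == second_run)

# What reproducibility does not show: whoever picks the seed can search for one
# that yields a wanted first token, and that seed reproduces just as faithfully.
chosen_seed = next(
    s for s in range(10_000) if sample_tokens(s, vocab, weights, 1) == ["approve"]
)
print(sample_tokens(chosen_seed, vocab, weights, 1))
```

Both runs are perfectly reproducible. Only the first says anything about drift, and neither says anything about who chose the seed or why.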

The four primitives are necessary. They are not, by themselves, sufficient for a regulated deployment that intends to survive a supervisory examination after a material incident. The sufficient layer is the authority chain.

Whose hash backstops the weights

The model weight file is the most valuable artefact in the deployment. Compromise it, and every downstream trust primitive is operating on a corrupted base. The integrity-verification practice, in the open-weights ecosystem as of May 2026, has three layers.

The first layer is the publisher's hash, typically a SHA-256 digest posted alongside the weight file on the model card. This is the floor. Verifying against this hash defends against bit-rot and against accidental corruption during transport. It does not defend against publisher compromise.

The second layer is an ecosystem hash, which can mean a digest pinned on the Hugging Face model card, a Sigstore transparency-log entry for the weight file, or an independent registry such as the open-weights mirror IndiaAI maintains for Indian-jurisdiction deployments. Ecosystem-layer hashing distributes the trust root across multiple parties. It defends against publisher compromise, provided the firm verifies against more than one source.

The third layer is firm-controlled hashing. The firm downloads, verifies against publisher and ecosystem layers, then computes its own hash and registers it in the firm's evidence vault. From that point forward, the firm's hash is the only one that matters for the firm's audit trail. Re-downloading the weight file from any source must verify against the firm's hash. This is the layer most commonly skipped, and the layer that the supervisory question most commonly lands on after an incident.
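A minimal sketch of the three layers, assuming the publisher and ecosystem digests have already been fetched out of band. The function names and the JSON file standing in for the evidence vault are hypothetical; a real vault would be an append-only, access-controlled store.

```python
import hashlib
import json
from pathlib import Path


def sha256_file(path: Path, chunk_size: int = 1 << 20) -> str:
    """Stream the file so a multi-gigabyte weight file never sits fully in memory."""
    digest = hashlib.sha256()
    with path.open("rb") as fh:
        for chunk in iter(lambda: fh.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def admit_weights(path: Path, publisher_hash: str,
                  ecosystem_hashes: list[str], vault: Path) -> str:
    """Layers 1 and 2 gate admission; layer 3 records the firm's own digest."""
    local = sha256_file(path)
    if local != publisher_hash.lower():
        raise ValueError("publisher hash mismatch")
    if not ecosystem_hashes or any(local != h.lower() for h in ecosystem_hashes):
        raise ValueError("ecosystem hash missing or mismatched")
    # Hypothetical evidence vault: a JSON file keyed by weight file name.
    records = json.loads(vault.read_text()) if vault.exists() else {}
    records[path.name] = local
    vault.write_text(json.dumps(records, indent=2))
    return local


def verify_against_firm_hash(path: Path, vault: Path) -> bool:
    """After admission, only the firm's registered hash counts."""
    records = json.loads(vault.read_text())
    return records.get(path.name) == sha256_file(path)
```

Admission happens once, with all three layers in play. Every later load or re-download calls only `verify_against_firm_hash`, which is the point of the third layer: the audit trail no longer depends on what the publisher serves today.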

The pattern of attack the three-layer approach defends against is not theoretical. Through 2024 and 2025, several high-profile open-weights repositories experienced compromise events. The Hugging Face platform alone surfaced multiple cases of model files modified post-publication, model cards rewritten to point at backdoored weights, and uploader accounts compromised through credential reuse from unrelated breaches. The PyPI and npm ecosystems have a longer history of the same pattern. Generalised to model supply chains, the lesson is that publisher trust is an operational variable that decays over time, not a one-time procurement decision.

A hash without a named key behind it is just a useful coincidence. Provenance is what makes a hash mean something to a supervisor after an incident.

ṛtaPulse research, May 2026

For a UK FCA-supervised firm, SYSC 6 and the PRA's SS1/23 model risk management principles place the integrity of model inputs squarely inside the firm's accountability perimeter. The supervisory question after an incident will not be "did the hash match" but "who certified the hash, and what is the firm's evidence chain back to that certification". The three-layer approach answers that question. The single-layer approach does not.

Whose key signs the inference log

A signed inference log is the audit primitive that turns model output from a black-box assertion into a verifiable record. Ed25519 has settled into the practitioner default for new deployments through 2025 and 2026, with widespread runtime support and acceptable signing throughput on production loads. The technical primitive is mature. The governance question is not.

Three key custody patterns are worth distinguishing.

Vendor-managed keys. The inference runtime ships with a signing key, the runtime signs each inference, and the firm receives signed logs. The signature proves the runtime executed. It does not prove the firm controlled the runtime, and it places the signing-key custody inside the vendor's organisational boundary. For a regulated firm, this pattern fails the basic control-attestation test. The firm cannot, in a supervisory conversation, claim independent control of a logging primitive whose root key is held by a third party.

Third-party CA-rooted keys. The firm generates the signing key, but the certificate is chained to a third-party certificate authority. This pattern is common in TLS deployments and translates poorly to inference logs. The CA can revoke the certificate. The CA's compromise propagates downstream. The retention question is also non-trivial: certificates expire on a CA-controlled schedule, and audit logs need to remain verifiable across regulatory retention periods, which run to seven years in several jurisdictions and longer in others.

Firm-HSM-rooted keys. The signing key is generated inside a hardware security module under the firm's exclusive control. The HSM enforces non-extraction of the private key. The signing operation requires HSM authorisation. The audit log signature chains back to a key the firm can prove no other party has held. This is the pattern that survives the supervisory examination. It is also the pattern that imposes the heaviest operational burden, because HSM key rotation, recovery procedures, and HSM lifecycle management become first-class governance items.
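The custody patterns differ only in who stands behind the signing call, which a short sketch can make concrete. Loud assumptions: the signer below is an HMAC-SHA256 stand-in from the standard library so the example runs anywhere. It is symmetric, so it is not a substitute for an Ed25519 key held in an HSM. The `key_id` values, the record fields, and the hash chain linking entries are illustrative extras, not something this piece prescribes.

```python
import hashlib
import hmac
import json
from typing import Callable

Signer = Callable[[bytes], bytes]
FIELDS = ("seq", "prev_hash", "key_id", "alg", "payload")


def make_stand_in_signer(key: bytes) -> Signer:
    """STAND-IN ONLY: a real deployment would call an HSM-held Ed25519 key here."""
    return lambda message: hmac.new(key, message, hashlib.sha256).digest()


def canonical_bytes(body: dict) -> bytes:
    """Stable serialisation so signer and verifier hash identical bytes."""
    return json.dumps(body, sort_keys=True, separators=(",", ":")).encode()


def append_entry(log: list[dict], payload: dict, sign: Signer, key_id: str) -> dict:
    """Append a signed record that also commits to the previous record's hash."""
    prev = log[-1]["entry_hash"] if log else "0" * 64
    body = {"seq": len(log), "prev_hash": prev, "key_id": key_id,
            "alg": "HMAC-SHA256 (stand-in)", "payload": payload}
    message = canonical_bytes(body)
    entry = dict(body, entry_hash=hashlib.sha256(message).hexdigest(),
                 signature=sign(message).hex())
    log.append(entry)
    return entry


def verify_log(log: list[dict], sign: Signer) -> bool:
    """Check ordering, chain links, content hashes, and signatures."""
    prev = "0" * 64
    for seq, entry in enumerate(log):
        message = canonical_bytes({k: entry[k] for k in FIELDS})
        if entry["seq"] != seq or entry["prev_hash"] != prev:
            return False
        if hashlib.sha256(message).hexdigest() != entry["entry_hash"]:
            return False
        if not hmac.compare_digest(sign(message).hex(), entry["signature"]):
            return False
        prev = entry["entry_hash"]
    return True
```

The `sign` parameter is where custody lives. Whoever implements that callable, and holds the key behind it, is the party whose signature the log actually carries. Recording `key_id` and `alg` in every entry keeps the custody answer visible in the evidence itself.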

The choice is not aesthetic. For US-supervised firms operating under SR 11-7 and OCC 2011-12 model risk frames, the audit log's key custody arrangement is a model risk control. For EU-supervised firms running high-risk AI systems under the AI Act, Article 14 human-oversight requirements imply a verifiable audit trail whose authenticity is independently provable. For MAS-supervised entities operating under Notice 655 critical-systems classification, the key custody and recovery procedures fall inside operational risk management obligations that must be evidenceable on examination.

A regulated firm self-hosting AI inference and accepting vendor-managed signing keys for the audit log is, in supervisory terms, running an outsourced audit log on a self-hosted compute platform. That posture is internally inconsistent. The sovereignty argument that makes self-hosting rational does not survive a vendor-managed audit log.

Whose pin authority on the runtime

Pinning the inference runtime is the primitive that turns "we ran the model on workstation hardware" into a reproducible, audit-defensible statement. The technical practice is straightforward: the firm pins a specific version of llama.cpp, vLLM, Ollama, Triton, or whichever runtime is in production, and freezes the dependency graph at that point. The governance question is whose pinning decision counts.

Upstream pinning is the default that most firms inherit without naming. The firm pulls the runtime from the upstream registry at the configured version tag. The pin authority sits with the registry: if the registry remaps the tag, the firm receives a different binary on the next pull, with no visible signal that anything changed. This reliance on upstream name and tag resolution is what the 2024 PyPI and npm typosquat-and-substitute classes of attack exploited, and the pattern generalises directly to ML runtime registries.

Downstream pinning is the firm-controlled discipline. The firm mirrors the runtime binary into its own artefact registry at the pinned version, with content-addressing rather than tag-addressing. Every deployment pulls from the firm's mirror, not from the upstream registry. The pin authority sits with the firm: tag remapping upstream is observable but not consumable until the firm mirrors a new version explicitly. This is the audit-defensible pattern.
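The difference between tag-addressing and content-addressing fits in a toy model. Both classes below are hypothetical in-memory stand-ins, not the API of any real registry, and the byte strings stand in for runtime binaries.

```python
import hashlib


class UpstreamRegistry:
    """Toy tag-addressed registry: the registry decides what a tag points at."""

    def __init__(self) -> None:
        self._tags: dict[str, bytes] = {}

    def publish(self, tag: str, artefact: bytes) -> None:
        self._tags[tag] = artefact  # silently remaps an existing tag

    def pull(self, tag: str) -> bytes:
        return self._tags[tag]


class FirmMirror:
    """Toy content-addressed mirror: the firm decides which digest a name resolves to."""

    def __init__(self) -> None:
        self._blobs: dict[str, bytes] = {}
        self._pins: dict[str, str] = {}

    def mirror(self, name: str, artefact: bytes) -> str:
        """Explicit firm action: store the bytes and pin the name to their digest."""
        digest = hashlib.sha256(artefact).hexdigest()
        self._blobs[digest] = artefact
        self._pins[name] = digest
        return digest

    def pull(self, name: str) -> bytes:
        """Resolve name -> pinned digest -> bytes, re-checking the digest on every pull."""
        digest = self._pins[name]
        artefact = self._blobs[digest]
        if hashlib.sha256(artefact).hexdigest() != digest:
            raise RuntimeError("mirror blob does not match its pinned digest")
        return artefact

    def upstream_drifted(self, name: str, upstream_artefact: bytes) -> bool:
        """Observable, not consumable: report drift without adopting the new bytes."""
        return hashlib.sha256(upstream_artefact).hexdigest() != self._pins[name]
```

If upstream republishes a tag with different bytes, `UpstreamRegistry.pull` returns the new build while `FirmMirror.pull` keeps returning the mirrored one. `upstream_drifted` reports the change, and nothing reaches production until someone calls `mirror` again.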

Frozen pinning is the change-management discipline layered on top of downstream pinning. Updates to the pinned runtime version require a change-management approval, a verification run against a regression test set, and a documented rollback path. The pin authority sits with the firm's change-management process: the runtime cannot move without the process moving it. This is the pattern that survives supervisory examination after a material incident.

The dependency graph beneath the inference runtime adds depth to the problem. A GPU build of llama.cpp, for example, has a transitive dependency graph that can include the CUDA toolkit, cuBLAS, system OpenSSL, and a long tail of OS-level libraries. Each transitive dependency carries its own supply-chain risk surface. The pin discipline at the runtime layer is the floor, not the ceiling. A serious deployment also pins the kernel version, the CUDA toolkit version, and where possible the GPU driver version.
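Frozen pinning across those layers can be sketched as a pin manifest plus a gate on changes. The component names, version strings, and `ChangeRequest` fields below are placeholders chosen for illustration; the three gate conditions mirror the approval, regression run, and rollback path described above.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ChangeRequest:
    component: str
    new_version: str
    approved_by: str         # empty string means no change-management approval
    regression_passed: bool  # result of the verification run on the regression set
    rollback_version: str    # empty string means no documented rollback path


def detect_drift(pinned: dict[str, str], observed: dict[str, str]) -> dict[str, tuple[str, str]]:
    """Return {component: (pinned, observed)} for every component off its pin."""
    return {
        component: (version, observed.get(component, "<missing>"))
        for component, version in pinned.items()
        if observed.get(component) != version
    }


def apply_change(pinned: dict[str, str], request: ChangeRequest) -> dict[str, str]:
    """Return a new manifest only when all three gate conditions hold."""
    if request.component not in pinned:
        raise KeyError(f"{request.component} is not under pin management")
    if not (request.approved_by and request.regression_passed and request.rollback_version):
        raise PermissionError("change lacks approval, regression evidence, or rollback path")
    return {**pinned, request.component: request.new_version}


# Placeholder versions for illustration only.
pinned = {"runtime": "r1", "kernel": "k1", "cuda_toolkit": "c1", "gpu_driver": "d1"}
observed = {"runtime": "r1", "kernel": "k1", "cuda_toolkit": "c1", "gpu_driver": "d2"}
print(detect_drift(pinned, observed))
```

A driver that moved outside the process shows up as drift. The same move through `apply_change` produces a new manifest and leaves the old one intact as the rollback reference.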

The 250-document poisoning research published jointly by Anthropic, the UK AI Security Institute, and the Alan Turing Institute in October 2025 demonstrated that small numbers of poisoned training documents can produce reliable behaviour-modification effects in large language models. The defence at the deployment layer is supply-chain integrity across the model weights, the runtime, and the document corpus consumed at inference time. None of those defences works without a pin authority discipline that the firm controls. Pinning without pin authority is the inference equivalent of locking the door and giving the key to the courier.

Pinning without internalised pin authority is a discipline a firm performs for itself in a vendor's office.

ṛtaPulse research, May 2026

The accountability layer

The technical primitives and the supply-chain disciplines compose into an architecture. The architecture, in a regulated firm, sits inside an accountability frame that supervisors are starting to articulate explicitly.

UK PRA SS1/23 model risk management principles require named accountable owners for each material model. The principle extends naturally to the model's supply chain: the named owner accounts for the weights, the runtime, the signing key custody, and the dependency pinning discipline, not only for the inference outputs. A firm that names a model owner without that owner having authority over the supply chain has named a placeholder.

EU AI Act Article 27 fundamental rights impact assessment, for high-risk systems under Annex III, requires the firm to assess and document the risks of the AI system. A supply-chain compromise is a class of risk the FRIA must address. Article 14 human-oversight requirements imply a verifiable audit trail; that trail is only verifiable to the extent the supply chain producing it is itself documented and controlled.

US SR 11-7 model risk management guidance has, since 2011, required model validation, ongoing monitoring, and a model inventory. The 2026 supervisory landscape extends model validation upstream: the model's training data, the model's weights provenance, and the model's runtime environment are all parts of the model in the sense the guidance intends. A firm running a self-hosted AI inference path and treating the runtime as out-of-scope for model validation has misread the guidance.

MAS Notice 655 critical-systems classification turns on how the system is used, not on what the system is technically. A self-hosted inference path supporting a critical workload acquires the full operational-risk-management apparatus, including supply-chain verification and recovery-time objectives. Firms running self-hosted AI for non-critical workloads today should keep the classification under review: workload re-classification is a function of business adoption, not of underlying technology.

The pattern across jurisdictions is consistent. The supervisory frame is moving from "can the firm prove the model behaved as expected" to "can the firm prove the model is the model the firm intended to run, end-to-end". The supply chain is the only layer at which that proof can originate.

A decision framework

Before a self-hosted AI inference path goes live in a regulated firm, three questions need clean answers. If any of them does not have one, the deployment is not yet defensible.

First, whose hash. The firm answers with a documented three-layer integrity practice: publisher hash, ecosystem hash, firm hash, with evidence retention for the regulatory horizon applicable in the jurisdiction. The named accountable owner for the model owns the integrity practice. The procurement record references the firm hash, not the publisher hash.

Second, whose key. The firm answers with a key custody pattern that the supervisor will accept as evidence of independent control. For most regulated firms running material workloads, this means firm-HSM-rooted signing keys with documented rotation, recovery, and lifecycle management procedures. Vendor-managed keys are acceptable only for non-material workloads where the audit-defensibility burden is correspondingly lower.

Third, whose pin authority. The firm answers with downstream pinning and frozen-pin change management. The runtime, the kernel version, the CUDA toolkit, and the dependency graph at the level the firm has visibility into all sit inside the firm's change-management perimeter. Upstream remapping is observable but not consumable without explicit firm action.

A firm with clear answers to all three questions has supply-chain provenance. A firm with one clear answer and two implicit ones has a partial defence. A firm with no documented answers has a sovereignty posture that exists on the architecture diagram but does not survive contact with the supervisor.

Where this pattern stops working: three honest gaps

The three-question framework does not close the entire supply chain. Three classes of risk sit outside it.

Hardware supply chain. GPU firmware, BMC firmware, CPU microcode, and the manufacturing supply chain that produces the workstation are upstream of every primitive discussed in this piece. The 2018 and 2019 ME-firmware-class vulnerabilities, the more recent BMC vulnerabilities affecting commodity workstation platforms, and the long tail of hardware-supply-chain risk all sit upstream of any pin authority the firm can exercise. Hardware-attestation primitives like Intel TDX and AMD SEV-SNP help. They do not close the gap, and they introduce their own attestation-authority questions.

Inference-time prompt injection in retrieved documents. A self-hosted RAG workload retrieves documents at inference time. If the retrieved documents have themselves been poisoned, the model output is corrupted at the inference step, downstream of every weight and runtime integrity check. The three-question framework defends the model. It does not defend the retrieval corpus. Document-integrity practices for the RAG corpus are a parallel discipline, often less mature than the model-integrity practices and frequently the weaker link in production deployments.

Multi-jurisdictional key custody. The firm-HSM-rooted key custody pattern works inside a single regulatory perimeter. A cross-border firm operating under multiple supervisors may face conflicting requirements: a UK FCA requirement for key custody within UK jurisdiction, an Indian DPDPA requirement for key custody within India for personal-data-related inference workloads, a MAS requirement for Singapore-resident keys for MAS-critical workloads. Resolving the conflict requires legal counsel that this piece does not substitute for. Firms with cross-border perimeters should treat key custody jurisdiction as a procurement input, not as a deployment afterthought.

The three gaps are not exhaustive. They are the gaps that, in practitioner conversations through 2025 and 2026, most commonly surface as second-order discoveries after the three-question framework is in place. Naming them at the start is cheaper than discovering them under supervisory pressure.

Closing

Sovereign-by-default is the deployment posture. Supply-chain provenance is the governance discipline that makes the posture survive a supervisory examination. The four trust primitives that A1 named are necessary; they are not sufficient. The three authority questions that this piece develops are the layer at which the primitives compose into something a regulator will treat as evidence.

The framework is one layer of a stack, not the whole stack. Above it sits the data and prompt supply chain: training corpora, fine-tuning datasets, prompt templates, retrieval indices, all of them supply-chain artefacts the three-question framework does not address. Below it sits the silicon and firmware supply chain: GPU firmware, BMC, microcode, the trust base on which every signature in this framework ultimately rests. Cutting across both layers is a time axis the framework does not yet name: cryptographic agility. Ed25519 has roughly a five-year obsolescence horizon before NIST post-quantum cryptography starts displacing classical signatures in production audit infrastructure, and the seven-year retention regimes in several jurisdictions mean an audit log signed today must verify across that migration. Each of those is a piece in its own right, not a gap in this one.

The narrower question this piece does not address, and which deserves its own treatment, is how the three-question framework interacts with the model lifecycle when the firm itself is the publisher of fine-tuned variants. Self-publishing a fine-tuned model collapses the publisher trust layer into the firm's own perimeter. That sounds like a simplification. In supervisory terms, it is a transfer of liability that needs to be explicitly named, not implicitly assumed.

A question for your board's risk register

For each material AI workload the firm runs self-hosted, name the three accountable parties: whose hash backstops the weights, whose key signs the inference log, whose pin authority holds the runtime. If the firm cannot name three parties, name the gap and the closure date.
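One way to hold a register entry to that rule, as a minimal sketch. The `Answer` record, the `register_entry` function, and the field names are hypothetical, and the only rule enforced is the one stated above: an unnamed party must come with a closure date.

```python
from dataclasses import dataclass
from datetime import date
from typing import Optional

QUESTIONS = ("whose hash", "whose key", "whose pin authority")


@dataclass
class Answer:
    accountable_party: Optional[str] = None  # named owner, or None if unnamed
    closure_date: Optional[date] = None      # required when the owner is unnamed


def register_entry(workload: str, answers: dict[str, Answer]) -> dict:
    """Return the named parties and dated gaps for one workload."""
    named: dict[str, str] = {}
    gaps: dict[str, date] = {}
    for question in QUESTIONS:
        answer = answers.get(question, Answer())
        if answer.accountable_party:
            named[question] = answer.accountable_party
        elif answer.closure_date is not None:
            gaps[question] = answer.closure_date
        else:
            # A gap with neither owner nor date is not a register entry.
            raise ValueError(f"{workload}: '{question}' has no owner and no closure date")
    return {"workload": workload, "named": named, "gaps": gaps}
```

An entry with three names has no gaps. An entry with fewer lists each gap with its date. An entry with an unnamed, undated question is rejected rather than recorded.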

This is a practitioner note on patterns in production use across multiple firms. It is not legal advice. It is not regulatory guidance. It is not a product or tool announcement. Production deployment in a regulated industry needs qualified legal and regulatory counsel and a genuine internal control review before the first inference call clears in the production environment. None of that is in scope here.

If the three-question framework is something your firm runs or is trying to run, the comments are where this conversation continues. The harder cases, the ones involving fine-tuned variants, cross-border key custody conflicts, and hardware-attestation chains, are the ones I would most like to compare notes on. If you have practitioner-grade material on any of those, please get in touch.

Sources and Further Reading

Credit where it is due. This piece draws on the regulatory frameworks, research, standards, and open-source projects listed below. Live links go to source-of-record where available; regulatory texts and research papers update mid-cycle, so always read at the source repository, not from this table.

| Category | Source | Link |
| --- | --- | --- |
| Predecessor in series | PULSE A1: Sovereign by default, hybrid at edges (May 2026) | rtapulse.com/ai-augmented-governance/field-notes/sovereign-by-default |
| Research | Davies, Gal et al., "Poisoning Attacks on LLMs Require a Near-constant Number of Poison Samples", Anthropic + UK AI Security Institute + Alan Turing Institute (October 2025) | anthropic.com/research/small-samples-poison |
| Regulatory framework | UK PRA SS1/23 (Model Risk Management Principles for Banks) | bankofengland.co.uk |
| Regulatory framework | UK FCA SYSC 6 (Compliance, internal audit and financial crime) | handbook.fca.org.uk |
| Regulatory framework | EU AI Act (Regulation 2024/1689), Articles 14, 27 | eur-lex.europa.eu |
| Regulatory framework | MAS Notice 655 (Cyber Hygiene; Outsourcing Risk Management) | mas.gov.sg |
| Regulatory framework | Federal Reserve SR 11-7 (Model Risk Management Guidance) | federalreserve.gov |
| Regulatory framework | OCC 2011-12 (Sound Practices for Model Risk Management) | occ.gov |
| Regulatory framework | India DPDPA 2023 (Digital Personal Data Protection Act) | meity.gov.in |
| Supply-chain primitive | Sigstore transparency log | sigstore.dev |
| Supply-chain primitive | Ed25519 signature scheme (RFC 8032) | datatracker.ietf.org |
| Hardware enclave | Intel TDX (Trust Domain Extensions) | intel.com |
| Hardware enclave | AMD SEV-SNP (Secure Encrypted Virtualization) | amd.com |
| Hardware enclave | NVIDIA Hopper Confidential Compute | nvidia.com |
| Indian AI ecosystem | IndiaAI Mission (Government of India) | indiaai.gov.in |
| Inference runtime | llama.cpp | github.com/ggerganov/llama.cpp |
| Inference runtime | vLLM | github.com/vllm-project/vllm |
| Inference runtime | Ollama | github.com/ollama/ollama |
| Inference runtime | NVIDIA Triton Inference Server | github.com/triton-inference-server/server |

About the Author

I am a CISA- and CISSP-certified governance practitioner. My day-to-day work spans technology risk, audit defensibility, and cross-border regulatory intelligence across the UK (FCA, PRA), India (RBI, SEBI, IFSCA), Southeast Asia (MAS), and the Gulf (CBUAE), with working knowledge of the EU AI Act's financial services implications.

My current research sits at the intersection of audit-defensible AI deployment patterns and supervisory expectations in regulated firms. The specific threads are multi-regulation reasoning architectures, sovereign open-weights deployment, supply-chain provenance for self-hosted inference paths, and the governance of inference pipelines in firms with cross-border regulatory perimeters. I am developing a comparative framework for sovereign-by-default AI deployment across South Asia, Southeast Asia, and the Gulf, and welcome engagement from practitioners, regulators, and institutions working at this intersection.

A footnote on Sentinel Engine

Sentinel Engine is the sovereign model deployment I run from my own backyard, currently in beta. The three-question framework above (whose hash, whose key, whose pin authority) is the discipline Sentinel operates under, not a hypothetical proposal. Where this article names a gap, the gap is one Sentinel has surfaced and not yet closed.

LinkedIn • sachin@rtapulse.com • rtapulse.com


Collaboration welcome: corrections, counterexamples, and build ideas. sachin@rtapulse.com

What ऋतPulse means

rtapulse.com (ऋतPulse) combines ऋत (ṛta / ṛtá: order, rule, truth, rightness) with Pulse (a living signal of health). It reflects how I think GRC should work: not a quarterly scramble, but a steady rhythm. Detect drift early, keep evidence ready, and translate risk into decisions leaders can act on.