Research 01 · 18 min read

VisiLock.

A cryptographic construction for display and transaction semantic alignment. The full account of the protocol, the engineering decisions, and the security reduction that anchors the whole argument.

Field: Applied cryptography

Code: Circom · snarkjs · Foundry

I. The night I started thinking about this.

In February of 2025, I read the ByBit forensic report in a single sitting. The Lazarus Group had taken $1.46 billion in a single transaction. A multisignature cold wallet. Signers in two countries, each holding a Ledger hardware wallet, each looking at their own screen. The forensic narrative was clean. JavaScript had been injected into the Safe wallet interface, replacing the transaction the signers thought they were approving with a delegatecall to a contract address that, once executed, would replace the wallet’s own implementation.

A normal flow would have made that operation legible. A label that said "upgrade implementation contract". A confirmation prompt. A second prompt for good measure. The signers saw none of that. The calldata for this particular delegatecall was longer than the Ledger firmware could decode. When the firmware cannot decode calldata, it falls back to the safest thing it knows how to render. Thirty two bytes of hexadecimal. A transaction hash. No structure. No selector. No semantics. Each signer looked at a hash and approved.

I read the same paragraph three times. I had been working in some form on wallet security for two years already. EIP-712. ERC-7730. Blockaid simulations. Ledger’s Clear Signing initiative. Each of those projects is real and well intentioned, and none of them would have prevented what happened that day. The trust they offered ran in the wrong direction.

Every wallet loss story for the past decade has had the same shape. A user approves a transaction. The transaction does something different from what the approval suggested.

II. The shape of the problem.

Cryptocurrency wallets do two things that should be one thing. They render a human readable summary of a transaction. They sign the underlying payload. Those operations execute on two independent code paths. The architecture provides no guarantee that the summary on the screen and the bytes on the wire mean the same thing.

When people say blind signing they often mean the case where calldata is opaque, as in the ByBit incident. The real blind signing problem is broader than that. Every transaction is approved against a representation. Every approval depends on that representation being faithful. The whole stack, from the front end JavaScript that constructs the call, through the wallet code that renders the summary, down to the device that displays the prompt, is responsible for that fidelity, and no single component proves it.

The BadgerDAO incident in December 2021 worked precisely because of this gap. Attackers compromised the Cloudflare account that fronted BadgerDAO’s interface. They injected a script that silently appended an additional approval call to user transactions. An unlimited token approval to an attacker controlled address. The user saw a routine deposit. The wallet signed the deposit and the trailing approval together. Five hundred wallets were drained over twenty two days. $120 million. The smart contracts were untouched.

Compromise the front end and you compromise everything below it. There is no component of the system that says: the summary the user is looking at and the bytes that are about to be signed mean the same thing.

III. Why the standards were not enough.

When I started thinking about this seriously, I sketched out every existing defense. Not to dismiss them, but to find the seam where each one stopped.

EIP-712 structures transaction data into typed fields. It makes the data more legible. It does not check that what reaches the screen matches those fields. ERC-7730 maps function selectors to human readable descriptions. Same problem. Both standards say "here is how to display the payload" and rely on the wallet to actually do it correctly.

Hardware wallets isolate the private key inside a secure element. They display transaction summaries on an isolated screen. They cannot decode arbitrary calldata. When they cannot decode, they fall back to a hash, which is the failure mode that drained ByBit.

Transaction simulators like Blockaid VTX predict what a transaction will do if executed. They are useful. They are also a trusted third party. The user, the wallet, and the chain all have to believe the simulator’s verdict, and there is no on chain check that the simulation matches the payload that actually gets signed.

Monitoring systems detect attacks after they happen. The Forta Network retrospective on BadgerDAO showed that the attack was visible in chain data hours before the major losses landed. Retrospective detection is valuable. It is not prevention.

The common factor across all of these is a missing primitive. Nothing in the stack proves the equality of two representations of the same intent. Everyone agreed that wallets should render faithfully. No one had encoded that as a cryptographic claim with a verifier.

IV. The move.

The clarifying move, when it came, was small. State the missing thing as a property.

Let D be the rendered display content. Let T be the transaction representation that will be signed. Define an extraction function Φ that maps either object to a canonical semantic tuple of (recipient, value, contract, function, parameters). A display and a transaction are semantically aligned exactly when Φ(D) equals Φ(T).
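As a minimal sketch of that definition, assume the display and the transaction have already been parsed into simple key value objects. The `phi` function, the field names, and the normalisation rules (lowercased addresses, integer values) are illustrative assumptions, not the extraction logic of the actual circuit.

```python
from dataclasses import dataclass
from typing import Any, Mapping, Tuple


@dataclass(frozen=True)
class SemanticTuple:
    """Canonical tuple (recipient, value, contract, function, parameters)."""
    recipient: str
    value: int
    contract: str
    function: str
    parameters: Tuple[str, ...]


def phi(obj: Mapping[str, Any]) -> SemanticTuple:
    """Hypothetical extraction over an already parsed object.

    Addresses are lowercased and the value is read as an integer, so a
    display string and a transaction field compare equal when they mean
    the same thing.
    """
    return SemanticTuple(
        recipient=str(obj["recipient"]).lower(),
        value=int(obj["value"]),
        contract=str(obj["contract"]).lower(),
        function=str(obj["function"]),
        parameters=tuple(str(p).lower() for p in obj["parameters"]),
    )


def aligned(display: Mapping[str, Any], transaction: Mapping[str, Any]) -> bool:
    """Semantic alignment: phi(D) == phi(T)."""
    return phi(display) == phi(transaction)


RECIPIENT = "0xAbC0000000000000000000000000000000000001"
TOKEN = "0xDeF0000000000000000000000000000000000002"

# What the screen shows (strings) versus what will be signed (typed fields).
display = {"recipient": RECIPIENT, "value": "10", "contract": TOKEN,
           "function": "transfer", "parameters": [RECIPIENT, "10"]}
transaction = {"recipient": RECIPIENT.lower(), "value": 10, "contract": TOKEN.lower(),
               "function": "transfer", "parameters": [RECIPIENT.lower(), 10]}
tampered = dict(transaction, function="approve")

print(aligned(display, transaction))
print(aligned(display, tampered))
```

The point of the sketch is only that alignment is an equality on the extracted tuples, not on the raw bytes of either representation.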

Phrased this way, the question becomes mechanical. How do you prove that two representations produce equal extractions without revealing the representations themselves? That is a textbook problem for zero knowledge proofs.

I wrote that definition on a whiteboard and underlined it. After a year of reading about wallet UX, I had the property in one line. Everything else in VisiLock follows from that line.

V. Why a TEE alone is not enough.

The first thing anyone asks at this point is: why not just have a TEE sign that the display and the transaction match?

That works. It is a real defense. It catches every attack a Groth16 proof catches, under the assumption that the TEE is honest. So why do more?

Verifiability. A TEE signed assertion is a signature. A relying party who wants to check it has to (a) trust the TEE’s signing key, (b) trust that the key has not been revoked, (c) trust that the attestation evidence is fresh. The chain stores a signature and a key registry. The chain does not check semantic equality. It checks that an authorized device asserted equality.

The shape of the trust matters. A TEE assertion turns the question "are these aligned" into the question "is this device honest right now". A zero knowledge proof turns the question into "does this proof verify against these public inputs". A third party with the same public inputs reaches the same verdict. There is no oracle in the loop.

VisiLock keeps the TEE because the TEE is the source of the display evidence. Only the secure world can read the framebuffer from the display controller. But the alignment check itself moves into a proof a smart contract can verify on chain. The TEE governs whether H_D, the commitment to the rendered display, is trustworthy. The Groth16 proof governs whether Φ(D) equals Φ(T). The two layers separate cleanly.

Phase 1 · Display capture · TEE display module

Modeled on ARM TrustZone. Reads the framebuffer from the display controller, hashes it with Poseidon, and produces an attestation envelope signed by a key that lives only inside the Secure World. The attestation binds the rendered display to the firmware measurement that produced it.

Fig. 1 · VisiLock protocol architecture · four phases across three trust regions.

VI. Choosing the cryptographic stack.

Once the property was clear, the next question was which proof system to use. I sketched a comparison.

Groth16 produces constant size proofs. Three group elements, 192 bytes on the wire. The verifier runs in milliseconds because Ethereum has BN254 pairing precompiles, with gas costs that EIP-1108 brought down to a level that makes on chain verification a practical fixed cost. The trade off is a per circuit trusted setup. The trade off was worth it. The application is a single fixed circuit, deployed once, used millions of times. The setup happens once. The verification happens forever, cheaply.

For commitments inside the circuit I picked Poseidon. SHA-256 is what the transaction commitment uses outside the circuit, but inside a SNARK, SHA-256 is ruinous. Each block costs tens of thousands of constraints. Poseidon is built for arithmetic circuits over a prime field, here the BN254 scalar field that the circuit itself is defined over. In circuit, a Poseidon hash costs roughly an order of magnitude less than SHA-256 for the same security level.

I used SHA-256 anyway. Not by choice. The on chain transaction commitment uses SHA-256, and the circuit has to bind to that exact commitment to make the proof meaningful. So Poseidon is the dominant hash inside, used for commitments and the font Merkle tree, and SHA-256 is the bridge to the world outside the circuit. The cost of that bridge shows up plainly in the constraint table. SHA-256 accounts for 27.4 percent of the non linear constraints in the 48 byte circuit, and there is no way around it.

VII. The hardest engineering decision: buckets.

The hardest decision in VisiLock has nothing to do with cryptography. It has to do with how much you let the circuit see.

A naive implementation would parse arbitrary unbounded calldata inside the circuit. The semantic extraction function Φ would walk the bytes, recognize the function selector, decode the parameters, build the canonical tuple. This works, in principle. An R1CS circuit can express any deterministic computation. The reason it does not work in practice is that the circuit size becomes a function of the worst case calldata length. A single circuit that handles every plausible DeFi summary is impractically large to prove.

The way out is buckets. VisiLock has three. 48 bytes, 100 bytes, 175 bytes. Each one is a separate circuit with a separate trusted setup. The wallet picks the smallest bucket that can hold the rendered semantic summary. The 48 byte bucket covers ETH transfers, ERC-20 transfers, and simple approvals. The 100 byte bucket covers compact router style swap summaries. The 175 byte bucket covers larger flows. An Across style bridge intent and an Aave V3 borrow summary both come in at about 115 bytes.
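The routing rule described above, "pick the smallest bucket that can hold the rendered summary", can be sketched in a few lines. The bucket sizes come from the text. The function name and the error behaviour for oversized summaries are assumptions made for illustration.

```python
BUCKETS = (48, 100, 175)  # bucket capacities in bytes, smallest first


def pick_bucket(summary: bytes) -> int:
    """Return the capacity of the smallest bucket that fits the summary.

    Raising on oversized summaries is an assumption of this sketch; the
    article does not say how the wallet handles that case.
    """
    for capacity in BUCKETS:
        if len(summary) <= capacity:
            return capacity
    raise ValueError(f"summary of {len(summary)} bytes exceeds the largest bucket")


# A 115 byte summary (the size quoted for the bridge and borrow examples)
# does not fit in the 100 byte bucket, so it routes to the 175 byte one.
print(pick_bucket(b"x" * 40))
print(pick_bucket(b"x" * 115))
```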

The benchmarks show why this matters. The 48 and 100 byte buckets both live inside the 2¹⁸ proving domain of Groth16. Their constraint counts differ by about 15 percent, but their proof times differ by under 1 percent. The 175 byte bucket has only 1.1 percent more constraints than the 100 byte bucket, but it crosses the 2¹⁸ to 2¹⁹ FFT domain boundary. Proof time jumps by a factor of 1.56. The discontinuity is invisible in constraint counts and devastating in latency.
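The discontinuity is easier to see with the arithmetic written out. The sketch below assumes the prover works over a power of two domain at least as large as the constraint count. A real prover may count a few extra rows, for example for public inputs, so treat the exact boundary as approximate. Only the 228,514 figure comes from the article.

```python
def domain_log2(n_constraints: int) -> int:
    """Smallest k such that 2**k >= n_constraints."""
    if n_constraints < 1:
        raise ValueError("constraint count must be positive")
    return (n_constraints - 1).bit_length()


def headroom(n_constraints: int) -> int:
    """Constraints that can still be added before the domain doubles."""
    return (1 << domain_log2(n_constraints)) - n_constraints


N_48_BYTE = 228_514  # constraint count of the 48 byte circuit, from the text

print(domain_log2(N_48_BYTE), headroom(N_48_BYTE))
print(domain_log2(2**18), domain_log2(2**18 + 1))
```

One constraint past 2¹⁸ doubles the domain, which is why a roughly one percent increase in constraints can cost far more than one percent in proving time.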

If you ran a single circuit at the 175 byte size, every transaction would pay the larger bucket cost. The bucket routing keeps the cheap circuit on routine flows and only pays the larger cost when the rendered summary actually demands it.

VIII. Inside the constraints.

It is worth opening the 48 byte circuit because the constraint distribution surprised me when I first ran it.

228,514 R1CS constraints. 156,396 non linear and 72,118 linear. The split by component:

Component                    Non linear constraints   Share
Text rendering (32 chars)                    81,951   52.4%
SHA-256 (2 blocks)                           42,852   27.4%
Comparators and routing                      30,498   19.5%
Poseidon commitments                          1,095    0.7%

Text rendering dominates. Every character in the summary is checked against a Poseidon Merkle tree of the approved font glyphs. Depth seven. Seven Poseidon hashes per character. For thirty two characters of summary that is a lot of constraints, more than the SHA-256 work and the comparators combined.
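A depth seven membership check looks roughly like the following. This is a sketch under loud assumptions: SHA-256 from the standard library stands in for Poseidon, and the leaf encoding (a hash of the character code) is a placeholder for whatever glyph signature the real font tree commits to.

```python
import hashlib
from typing import List

DEPTH = 7  # 2**7 = 128 glyph slots, one path hash per level


def h(left: bytes, right: bytes) -> bytes:
    # Stand-in for Poseidon. The circuit uses Poseidon, not SHA-256, here.
    return hashlib.sha256(left + right).digest()


def build_levels(leaves: List[bytes]) -> List[List[bytes]]:
    """All tree levels, from the leaves up to the single root."""
    levels = [leaves]
    while len(levels[-1]) > 1:
        cur = levels[-1]
        levels.append([h(cur[i], cur[i + 1]) for i in range(0, len(cur), 2)])
    return levels


def merkle_path(levels: List[List[bytes]], index: int) -> List[bytes]:
    """Sibling hashes from leaf to root for the leaf at `index`."""
    path = []
    for level in levels[:-1]:
        path.append(level[index ^ 1])
        index >>= 1
    return path


def verify_glyph(leaf: bytes, index: int, path: List[bytes], root: bytes) -> bool:
    """Recompute the root with one hash per level and compare."""
    node = leaf
    for sibling in path:
        node = h(node, sibling) if index % 2 == 0 else h(sibling, node)
        index >>= 1
    return node == root


# Hypothetical leaf encoding: one leaf per character code 0..127.
glyph_leaves = [hashlib.sha256(b"glyph:" + bytes([c])).digest() for c in range(2**DEPTH)]
levels = build_levels(glyph_leaves)
font_root = levels[-1][0]

idx = ord("A")
path = merkle_path(levels, idx)
print(len(path), verify_glyph(glyph_leaves[idx], idx, path, font_root))
```

Each rendered character repeats this check inside the circuit, which is where the seven hashes per character come from.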

The reason it has to be there is one of the small ugly truths of display attestation. A SNARK can certify that a sequence of bytes is what was rendered. It cannot, by itself, certify that those bytes look like a normal letter. Without the font tree an adversary could feed in a sequence of glyphs that visually decode to a different recipient address. The font tree prevents that by forcing every rendered glyph to be one of the approved characters at a known visual signature. The cost is high. The alternative is silent visual substitution. The high cost is worth paying.

Poseidon commitments cost almost nothing. SHA-256, the necessary bridge to on chain payloads, dominates everything except the font work. Comparators and routing handle the actual Φ(D) = Φ(T) equality. The most cryptographically essential piece of the proof, the equality check, sits inside a component that accounts for under a fifth of the circuit.


IX. The attestation registry.

There is a problem with TEEs in any production system, which is that you have to trust them at some layer. The cryptographic guarantees of Groth16 do not extend through the TEE. If the TEE is compromised, the attacker can compute a perfectly valid display commitment over a display the user never saw.

I will not pretend I solved that. VisiLock does the next best thing. The on chain verifier maintains a registry of approved TEE firmware and code measurement hashes. Every accepted proof has to come with an attestation envelope signed by a key in the registry. The registry enforces freshness. Each quote is valid for five minutes from the time of issue, and the attestation evidence is bound to the proof so an old attestation cannot be reused. Separately, the registry applies its own validity window of twenty four hours, which keeps stale evidence out without requiring a real time oracle for every approval.
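The acceptance rule can be sketched as three checks: the measurement is allowlisted, the quote is fresh, and the envelope is bound to the proof being verified. The `Attestation` fields and the `accept_attestation` name are hypothetical, and signature verification is deliberately left out of this sketch.

```python
from dataclasses import dataclass
from typing import Set

QUOTE_TTL_SECONDS = 5 * 60  # five minute quote validity, from the text


@dataclass(frozen=True)
class Attestation:
    measurement: str        # firmware or code measurement hash (hex)
    issued_at: int          # unix time at which the quote was issued
    bound_commitment: str   # hypothetical: the display commitment it attests


def accept_attestation(att: Attestation, approved: Set[str], now: int,
                       display_commitment: str) -> bool:
    """Allowlist, freshness, and binding checks. Signature checks are omitted."""
    if att.measurement not in approved:
        return False
    if not 0 <= now - att.issued_at <= QUOTE_TTL_SECONDS:
        return False
    return att.bound_commitment == display_commitment


approved = {"fw-measurement-a"}
att = Attestation("fw-measurement-a", issued_at=1_000, bound_commitment="hd-1")

print(accept_attestation(att, approved, now=1_200, display_commitment="hd-1"))
print(accept_attestation(att, approved, now=1_301, display_commitment="hd-1"))
```

Removing a measurement from `approved` is the revocation path: the same envelope then fails the first check.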

What the registry buys is narrowing. The set of TEEs that can contribute valid display commitments is small and visible. If a vulnerability is published against a specific firmware version, the operator removes that measurement from the registry and the verifier rejects every proof produced under it from then on. The trust is human administered, and the registry makes that trust explicit. You can see which firmware measurements you are accepting.

The font tree has a separate timelock. Seven days from the moment a new font root is announced before the verifier accepts proofs against it. The reason is the silent substitution worry. If the font tree could be replaced atomically, an attacker who compromised the font registry could swap in glyphs that visually resemble one character but resolve to a different code point. The timelock gives anyone watching the registry a week to notice and yell.
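The timelock rule itself is a single comparison. A minimal sketch, assuming timestamps in seconds and a hypothetical function name:

```python
FONT_TIMELOCK_SECONDS = 7 * 24 * 60 * 60  # seven days, from the text


def font_root_accepted(announced_at: int, now: int) -> bool:
    """A new font root is usable only once the full timelock has elapsed."""
    return now - announced_at >= FONT_TIMELOCK_SECONDS


announced = 1_700_000_000
print(font_root_accepted(announced, announced + 6 * 24 * 60 * 60))
print(font_root_accepted(announced, announced + FONT_TIMELOCK_SECONDS))
```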

X. Batching, and the gas economics.

Most DeFi transactions are not single calls. Approve, then swap. Approve, then deposit. Approve, then bridge. Verifying each call independently requires an independent Groth16 proof per call. Each proof verifies cheaply, but the fixed on chain cost of verifying a proof is paid once per call and quickly dominates.

VisiLock has a batch mode. Up to four call pairs. Per call display and transaction commitments aggregate into a Poseidon binary Merkle tree. The circuit produces one proof that opens all four leaves. The on chain verifier checks one Groth16 proof and the Merkle root rather than four separate proofs.
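The aggregation step can be pictured as a depth two binary tree over per call leaves. In this sketch SHA-256 stands in for Poseidon, each leaf is assumed to be a hash of the pair (H_D, H_T), and unused slots are padded with a zero leaf. All three choices are illustrative assumptions rather than the circuit's actual encoding.

```python
import hashlib
from typing import List, Tuple

MAX_PAIRS = 4
EMPTY_LEAF = bytes(32)  # assumption: padding for batches with fewer than four calls


def h(left: bytes, right: bytes) -> bytes:
    # Stand-in for Poseidon. The article's tree uses Poseidon.
    return hashlib.sha256(left + right).digest()


def batch_root(pairs: List[Tuple[bytes, bytes]]) -> bytes:
    """Merkle root over up to four (display, transaction) commitment pairs."""
    if not 1 <= len(pairs) <= MAX_PAIRS:
        raise ValueError("a batch holds between one and four call pairs")
    leaves = [h(h_d, h_t) for h_d, h_t in pairs]
    leaves += [EMPTY_LEAF] * (MAX_PAIRS - len(leaves))
    return h(h(leaves[0], leaves[1]), h(leaves[2], leaves[3]))


approve = (b"display:approve", b"tx:approve")
swap = (b"display:swap", b"tx:swap")
root = batch_root([approve, swap])
print(root.hex())
```

Changing any single commitment in the batch changes the root, so one root check on chain covers every call in the batch.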

The batch circuit itself is small. 10,328 constraints, because all it does is open already extracted tuples and check field equality. The gas numbers are good. Four individual proofs cost about 1.08 million gas. One batch proof costs about 270,000. Seventy five percent reduction.

Network        Gas price     Cost per proof   Relative to L1
Ethereum L1    30 gwei       $27.75           baseline
Arbitrum       0.1 gwei      $0.092           300× cheaper
Optimism       0.01 gwei     $0.009           3000× cheaper
Base           0.005 gwei    $0.0046          6000× cheaper

The cost economics tell you where VisiLock should run. On Ethereum L1 at 30 gwei, a single matched proof costs about twenty eight dollars. Too much for routine wallet activity. On Optimism it is under a cent. On Base it is under half a cent. The deployment story is L2 native by necessity. Routine transactions stay on the TEE only path. High value transactions, the kinds that have driven the billion dollar losses, pay the 2.4x premium for the publicly verifiable claim.
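The dollar figures follow from gas used, gas price, and an ETH price. The sketch below assumes about 270,000 gas per verified proof (the batch figure, which is also one quarter of the four proof total) and an ETH price of roughly $3,426. That price is not stated in the article. It is back derived from the $27.75 row, so treat it as an assumption, and expect small rounding differences against the table.

```python
GAS_PER_PROOF = 270_000          # approximate, from the text
GAS_FOUR_INDIVIDUAL = 1_080_000  # four separate proofs, from the text
ETH_USD = 3_426.0                # ASSUMPTION: back derived, not from the article

GAS_PRICE_GWEI = {
    "Ethereum L1": 30.0,
    "Arbitrum": 0.1,
    "Optimism": 0.01,
    "Base": 0.005,
}


def cost_usd(gas: int, gas_price_gwei: float, eth_usd: float = ETH_USD) -> float:
    """Fee in dollars: gas * price (gwei -> ETH) * ETH price."""
    return gas * gas_price_gwei * 1e-9 * eth_usd


costs = {name: cost_usd(GAS_PER_PROOF, gwei) for name, gwei in GAS_PRICE_GWEI.items()}
batch_saving = 1 - GAS_PER_PROOF / GAS_FOUR_INDIVIDUAL

for name, usd in costs.items():
    print(f"{name:12s} ${usd:.4f}")
print(f"batch saving: {batch_saving:.0%}")
```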

XI. The adversarial games.

The way I talked myself into believing the construction was correct was to break it on purpose.

We built an automated adversarial harness. The harness implements what we call the DisplayBind game. A programmatic adversary picks (D, T) pairs with Φ(D) not equal to Φ(T) and tries to produce an accepting proof. Five strategies:

Recipient swap. The display shows one address, the transaction has another.
Value inflation. The display shows 10. The transaction sends 1000.
Function selector mismatch. The display says transfer. The transaction calls approve.
Hidden call injection. The display shows a single call. The transaction is a batch.
Parameter reorder. Same call, swapped argument positions.

A hundred adversarial pairs across those five categories. All blocked. The blocks happened at witness generation, before the proof was even attempted, because the circuit constraints became unsatisfiable.
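The five strategies above can be pictured as mutations of an honest transaction, each of which must leave Φ(D) different from Φ(T). The sketch below is a toy model of that idea in plain Python, not the actual harness. The `phi` function, the field names, and the way a hidden call is modelled (an extra entry in the parameter list) are all assumptions, and "blocked" here simply means the extracted tuples differ.

```python
import copy
from typing import Callable, Dict

ATTACKER = "0x" + "ee" * 20
ALICE = "0x" + "aa" * 20
TOKEN = "0x" + "cc" * 20


def phi(obj: dict) -> tuple:
    """Hypothetical extraction to (recipient, value, contract, function, parameters)."""
    return (obj["recipient"].lower(), int(obj["value"]), obj["contract"].lower(),
            obj["function"], tuple(str(p) for p in obj["parameters"]))


HONEST = {"recipient": ALICE, "value": 10, "contract": TOKEN,
          "function": "transfer", "parameters": [ALICE, 10]}


def mutate(**changes) -> Callable[[dict], dict]:
    """Build a strategy that copies the transaction and overrides some fields."""
    def strategy(tx: dict) -> dict:
        out = copy.deepcopy(tx)
        out.update(changes)
        return out
    return strategy


STRATEGIES: Dict[str, Callable[[dict], dict]] = {
    "recipient_swap": mutate(recipient=ATTACKER),
    "value_inflation": mutate(value=1000),
    "selector_mismatch": mutate(function="approve"),
    "hidden_call_injection": mutate(parameters=[ALICE, 10, "approve(attacker,max)"]),
    "parameter_reorder": mutate(parameters=[10, ALICE]),
}


def run_harness(display: dict) -> Dict[str, bool]:
    """Map each strategy to True when the misaligned pair is rejected."""
    return {name: phi(display) != phi(attack(display))
            for name, attack in STRATEGIES.items()}


results = run_harness(HONEST)
for name, blocked in results.items():
    print(f"{name:24s} blocked={blocked}")
```

In the real system the rejection comes from unsatisfiable circuit constraints rather than a Python comparison, but the shape of the game is the same.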

Three test families reproduced the BadgerDAO attack pattern. Three reproduced the ByBit pattern. Four were malicious wallet vendor scenarios where the wallet itself, not the front end, was the adversary. Every one blocked at witness generation.

We also ran a Poseidon sanity test. A thousand random pairs designed to look for hash collisions under random witness selection. Zero false acceptances.

The thing I found persuasive is that the failures happened at witness generation. Not at verification. For a misaligned pair there is no satisfying witness, so the prover has nothing to prove. The constraints fail at the moment of building the proof, on the adversary’s own machine, and no proof is produced. An adversary who skips the witness generator and forges proof bytes directly is a different question. That one is answered by the soundness argument, not by the harness.

XII. The argument I am proudest of.

The security argument for VisiLock has four pieces. Display Integrity says H_D reflects what was rendered. Zero Knowledge says the proof leaks nothing. Public Verifiability says the verdict can be checked from public inputs alone. The piece I worked hardest on is Semantic Binding.

Semantic Binding states: for any probabilistic polynomial time adversary, the probability of producing an accepting proof on commitments (H_D, H_T) while Φ(D) differs from Φ(T) is negligible. I want to walk through the argument because the shape of it is what convinced me the construction was correct.

Suppose there is an adversary that succeeds with non negligible probability ε. By Groth16’s knowledge soundness, which is proved in the generic bilinear group model rather than from a standard falsifiable assumption, there is an efficient extractor that pulls out the witness the adversary used. The witness satisfies every R1CS constraint of the alignment circuit on inputs (H_D, H_T). The constraints enforce three things by construction. First, that the Poseidon commitments H_D and H_T open to the witness representations. Second, that Φ is correctly applied to each representation inside the circuit. Third, that the two extracted tuples are equal component by component.

Now compare the extracted witness representations to the actual D and T the protocol is bound to. Two cases. Either the extracted representations are equal to D and T, in which case the in circuit equality of Φ contradicts the hypothesis that Φ(D) was not equal to Φ(T). Or one of the extracted representations differs from D or T, in which case the equality of Poseidon commitments between the extracted representation and the actual representation is a Poseidon collision.

The first case cannot happen at all, because it contradicts the circuit constraints. The second case requires a Poseidon collision, which occurs with negligible probability. The Groth16 extractor succeeds whenever the adversary does, except with negligible probability. So ε is bounded by negligible plus negligible, which is negligible. The non negligible assumption contradicts itself.
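Written out as a bound, the argument has roughly the following shape. The notation is introduced here for illustration and is not taken from the article: ε is the adversary's success probability, 𝓔 is the Groth16 extractor, (D′, T′) are the extracted representations, and λ is the security parameter.

```latex
\begin{aligned}
\varepsilon
  &= \Pr[\mathcal{A}\ \text{wins}] \\
  &\le \underbrace{\Pr[\mathcal{A}\ \text{wins} \wedge \mathcal{E}\ \text{fails}]}_{\le\, \mathsf{negl}(\lambda)\ \text{(knowledge soundness)}}
   + \underbrace{\Pr[\mathcal{A}\ \text{wins} \wedge (D',T') = (D,T)]}_{=\,0\ \text{(contradicts } \Phi(D) \ne \Phi(T)\text{)}}
   + \underbrace{\Pr[\mathcal{A}\ \text{wins} \wedge (D',T') \ne (D,T)]}_{\le\, \mathsf{negl}(\lambda)\ \text{(Poseidon collision)}} \\
  &\le \mathsf{negl}(\lambda).
\end{aligned}
```

The middle term vanishes because the circuit enforces Φ(D′) = Φ(T′), which is incompatible with Φ(D) ≠ Φ(T) when the extracted representations are the real ones.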

The reduction does the work I wanted it to. It tells me that if an adversary could ever produce a valid proof for a misaligned pair, that adversary breaks something cryptographically hard I am explicitly assuming. That is the strongest form of confidence available for a construction at this layer.

XIII. What VisiLock does not promise.

I want to be honest about the boundary.

If a TEE is compromised, an attacker can produce a valid attestation over a display the user never saw. In that case the Groth16 proof is still cryptographically valid. The math holds. The chain accepts the proof. The user loses funds. Display Integrity is broken at the attestation layer, not at the proof. Semantic Binding is intact in the formal sense and irrelevant in the operational sense.

I treat this as a Display Integrity failure rather than a Semantic Binding failure, because the proof itself faithfully witnesses the equality of what was committed. The trouble is that what was committed is no longer connected to the user’s screen.

The attestation registry narrows this. Allowlisted measurements. Bounded freshness. A clear governance surface. The registry does not eliminate the failure mode, and I am not going to claim that it does. Defense in depth is real here, and it is asymmetric. A break in the proof system kills semantic binding. A break in TEE capture kills display integrity. Both layers matter. Neither layer is sufficient on its own.

There are other things VisiLock does not yet promise. Gas related fields are not attested in the prototype, so gas griefing attacks fall outside Φ. Each additional attested field adds constraints proportional to its encoding. The current Φ covers the five fields that drive the losses I cared most about. Adding more is a constraint budget question.

XIV. Where this is going.

Mobile latency is the next thing I want to fix. snarkjs WASM gives 30 to 33 seconds on Android 10. That is a lot of seconds to ask a user to wait between approving and broadcasting. Native provers like rapidsnark report three to five times speedups over WASM on comparable circuits. Six to eleven seconds is acceptable for a high value approval. Native compilation is a build problem, not a cryptographic problem, and it is the most leveraged work left on the prototype.

The other direction is Proof Carrying Information Flow for multi agent systems. The same shape, semantic property plus zero knowledge proof, applied to agent ecosystems where indirect prompt injection and tool poisoning and chained compromise propagate across calls. Every piece of data carries a label. The labels propagate as the agent reasons. Before any dangerous action, a zero knowledge proof attests the lineage is policy compliant. The verifier learns the verdict and nothing else. Trust composes across agents through recursive aggregation, so a chain of length n produces a single constant size proof.

The protocol design for that work is done. The circuits are under construction. The threat model is what I want to publish next.

XV. Closing.

I keep coming back to one frame. Every wallet loss story for the past decade has had the same shape. A user approves a transaction. The transaction does something different from what the approval suggested. The gap between approval and execution carries every loss.

For most of that decade the response has been to make the display more honest. Better UX. Richer summaries. Smarter simulators. None of it changed the underlying architecture. The summary and the payload still travel independently. The trust still depends on the layers between them.

VisiLock changes the response. It makes the alignment of summary and payload into a cryptographic claim. A claim a smart contract can check. A claim that fails loudly, at witness generation, the moment something does not align. It does not solve every problem in wallet security. It solves one problem in a way that composes with the others.

If you have read this far and you build wallets, or you operate signing infrastructure, or you write standards, I would like to hear what you think.
