precursor « bunnie's blog

Archive for the ‘precursor’ Category

IRIS (Infra-Red, in situ) Project Updates

Sunday, March 10th, 2024

A goal of mine is to give everyday people tangible reasons to trust their hardware. Betrusted is a multi-year project of mine to deliver a full-stack verifiable “from logic gates to Rust crates” supply chain for security-critical applications such as password managers. At this point, many parts of the project have come together: Precursor is an FPGA-based open hardware implementation, and it runs Xous, our Rust-based microkernel message-passing OS. I currently use my Precursor on a daily basis with the “vault” application loaded to manage my passwords, TOTP tokens, and FIDO/U2F logins.

However, Precursor is expensive, because FPGAs are expensive. The device could be much cheaper with a dedicated security chip, but then we have no reason to trust these chips – security chip vendors don’t facilitate any form of user-side inspection, so we can’t tell if we have real or fake security chips in our device.

Kind of defeats the purpose, if you ask me.

Last March, I introduced the concept of Infra-Red, in situ (IRIS) inspection of silicon in a blog post and an arXiv paper. My hope has been that IRIS, plus some circuit-level scans and mathematical methods, could be the missing link that allows us to transition from our expensive FPGA-based Precursor solution, to a more pocketbook-friendly ASIC-based solution.

At the time when I released the initial paper, every picture was manually composed and focused; every sharp image was cherry-picked from dozens of fuzzy images. It was difficult to reproduce images, and unsuitable for automatically tiling multiple images together. The technique was good enough for a demo, but shaky as a foundation for full-chip verification.

Over the past year, I’ve refined the technique and implemented a fully automated system that can robustly and repeatably image whole chips at micron-scale resolution in a matter of minutes. The idea is not for everyone to have one of these robots in their home (but how cool would that be!); rather, the idea is that most users could utilize an inexpensive but somewhat fiddly setup and compare their results against reference images generated by the few users like me who have fully automated systems.

Here’s an example of the MPW7 run on SKY130A, courtesy of Matt Venn, imaged using my automated IRIS machine:

The above is just a thumbnail; click on the image to zoom into and browse the full-resolution version at siliconpr0n.org (and check out my collection on that server for some more full-chip IRIS images). Each of the mottled dots in the lighter-shaded rectangles in the image corresponds to a logic gate, flip flop, or a “fill cell” (dummy transistors wired up as decoupling capacitors). To create the image, I just had to tell the machine where three corners of the chip are, and it automatically focuses and scans the full area. A script then post-processes the tiles into the fully-composed image you see here.

A broad goal of the project is to democratize silicon imaging and improve the state of the art in hardware verification. This is an area where simply popularizing silicon imaging can move the needle on security for everyone, because a credible threat of being caught reduces the incentive for adversaries to invest in expensive Trojan-implantation capabilities.

The good news is that the project is now at a state where, over the next couple of months, I can share a series of posts that detail the methodology used to develop the automated IRIS system, as well as document the construction of the device.

To kick things off, I’m going to start with a review of the current state of the art in hardware verification.

Review: Current State of the Art in Hardware Verification

Is my computer trustworthy? How do I know it was built correctly, and will execute my instructions faithfully? These are the questions that hardware verification aims to answer.

Hardware verification happens at every level of the supply chain. Let’s start with the relatable, every-day problem of how do I pick a piece of hardware to buy, and from there dive all the way down to esoteric topics such as verifying the circuits and devices that make up the hardware itself.

Consumers shopping for a computer rely principally on reputation mechanics to make choices: is the manufacturer a reputable brand? And how do my peers rate it?

Many of us can relate to how these mechanisms can fail. Ratings can be inflated by purchasing fake reviews, and stores can sell counterfeit brand goods. “Surely, this is just a problem of the retail market”, one might think. Once we’ve waded through the swamp of advertisements and on-line storefronts, the powers that be ought to ensure the device we ordered is the device we get! Otherwise it’s like, fraud or something bad like that, right?

Unfortunately, the level of sophistication for verification at every level of the supply chain is precisely as much as it needs to be for the vendor to get away with it, and not one iota more. For example, when it comes to logistics, we largely rely upon anti-tamper seals and tracking numbers to make sure our package arrives intact.

These measures are generally effective at deterring petty theft, but most anti-tamper tape can be purchased by the roll on gray markets, and tracking updates are too infrequent to rule out package diversion and intervention. Besides, that, consumers are conditioned to accept packages that have been mis-routed or experienced an “exception” during delivery – few will return an item that was delivered a day late under fears that the item could have spent a night in a facility where back doors were installed. Our ready acceptance of delivery exceptions is just one example of how supply chains are only as tight as they need to be for broad consumer acceptance, and not one iota more.

Once you’ve received the shipping box, most high-end consumer electronics have additional seals on their packaging. Unfortunately, most seals rely on easy-to-copy anti-tamper solutions such as holograms or fine printing; or at best contain serial numbers that are easy to copy yet have no easy way to check for authenticity.

Solutions such as glitter seals or other stochastic seals that rely upon the randomness inherent in paper fiber or glue to create a unique, unforgeable seal for every package have been proposed, but adoption is low and there is a lack of standardized, easy-to-use verification tools for such seals. Again, packaging seals are just as good as they need to be for broad consumer acceptance, no more, no less.

At the product level, there is a modicum of good news, at least in certain classes of products.

The traditional route of verification – observing the “fit and finish” of a product to detect counterfeits – is still the dominant method for most consumer products. However, in mobile phones and some laptops, manufacturers deploy electronic serial numbers and tamper detection techniques to deter would-be thieves of components or sub-assemblies. The effectiveness of these techniques depend intimately upon the implementation details; but in any case, they incur a cost in repairability and often times the end consumer can’t access the vendor’s databases to check that everything is in order. Instead, consumers are forced to delegate their trust to the vendors; yet regular consumers have no way to audit the vendors. The deferred trust boogeyman haunts everything from Apple’s iPhone ecosystem, to Intel’s SGX remote attestation mechanisms.

One would hope that this increase in verification sophistication is a trend that improves the deeper you go into the underlying technology. Unfortunately, it’s quite the opposite.

At the component level, the standard to this day for verifying the authenticity of a component is to look at the top marking (that is, the laser-etched numbers and logo) and the fit and finish of the package. Counterfeiters will often miss subtle details such as the font of the numbering, the location of the pin 1 marking, the composition of the overmold material, etc. Factories train staff to inspect and detect defects based on these irregularities.

What happens when a counterfeiter gets all these factors right? Well … the component goes into production, and we find out later about problems, either due to the assemblies failing test in the factory, or perhaps failing in peculiar ways in the field. For better or for worse, these problems are rare, generally affecting less than single-digit percentages of end users, and absent specific requirements or payments from customers to do more, equipment makers do exactly this and nothing more to protect the supply chain.

Even though most modern microcontrollers ship with an electronic serial number, few device manufacturers take advantage of them, and, perhaps somewhat surprisingly, there is usually no easy way to authenticate serial numbers with the component maker. Often times the purpose of the serial number is to serve as a unique ID for tracking products once manufactured; they are not structured to serve as a cryptographic method for determining provenance of the chip itself. Some security-forward microcontrollers feature things like PUFs (physically unclonable functions), but their implementation is usually directed at preventing people from tampering with or servicing their devices, rather than enabling users to verify the construction of the device itself.

And that’s about it – this is where any attempt to verify our electronics stops. To a first order, nobody even looks at the wires inside the chip.

This is because prior to IRIS, your options for inspecting silicon are either destructive, or experiments conducted in high energy physics labs. If you’ve ever seen the brilliant teardowns done by companies like TechInsights, the chips are generally imaged at the circuit level with a SEM or FIB, which requires the chip to be removed from its package and some of the metal layers to be permanently stripped off. Thus, one can obtain extremely high-quality imagery of a sample chip, but these techniques cannot be used to verify the very chip you want to use inside your computer, as the imaged chip must be destroyed.

An alternative, non-destructive technique known as X-ray ptychography can be thought of as a very high resolution 3D scanner for circuits. It’s very impressive, but to date it can only be done in a handful of high energy physics labs and it takes a long time (about three hours for 20 cubic microns) to image a full chip.

The technique I’m developing, IRIS, is a non-destructive technique to acquire micron-resolution images of the first metal layer of a chip at a rate of a seconds per square millimeter. To the best of my knowledge, this is the first practical technique that gives users a glimpse of the actual circuits they will use after it has been mounted on a circuit board.

There’s one final layer deeper into the technology stack beyond imaging of the circuits: electrically testing the circuits directly with a technique known as a “scan chain”. The good news is that scan chains are a mature technology; the bad news is that it is almost never done by users because the details of the scan chain are kept secret, and that a scan chain inspection alone can be easily defeated by a malicious adversary.

The purpose of a scan chain is to assist with the rapid detection of fabrication defects. It works by adding an extra path to a finished design that strings every register into one or more chains. Bit patterns are loaded into the chain, and the resulting logical operations performed by the gates connected between the registers is observed on the output of the chain. With enough patterns, you can build up an idea of what logic is between every register. Of course, the space of bits to explore grows exponentially with the number of bits in a chain, so it’s not practical to brute-force a large state space.

As a result, scan chains are good for detecting flaws in known circuits introduced by mother nature, but ineffective at deterring a malicious adversary. This is true even if one could brute-force the entire state-space due to the epistemic circularity of trusting a circuit to test itself. More colloquially, one may have heard of the “Hawthorne Effect” or the “observer effect”, which describes a subject under study temporarily altering their behavior because they know they are being studied, thus affecting the results of the study. In this case, a scan chain knows when it’s being queried, and thus, a malicious modification to a scan chain can add hidden states that alter its behavior for the duration of a scan, allowing it to pretend to be correctly constructed, but only when a check is actively running.

Although almost every chip goes through a scan chain test before it is shipped, the test vectors are proprietary, and often times the scan chains are deliberately and permanently sealed in a way to make it impossible for users to access. Scan chains may be scuttled after the factory test for ostensible security reasons, as an adversary can use them to read out the full state of a chip. However, depending on the threat model, it may be preferable to give users the option to exercise the scan chain and then permanently fuse out the scan chain after inspection.

Filling in the Verification Gap

As the summary chart below shows, supply chain verification techniques, although imperfect, enjoy broad adoption at the component level and above. However, there is a significant gap in user verification starting at the chip level and below.

My work, the Infra-Red, in situ (IRIS) inspection of silicon, is a step toward filling in this verification gap. In its simplest form, chips are deliberately constructed for optical inspection with infra-red light shined through the back side of the chip – that is, the side facing “up” that is not bonded to the circuit board.

The technique works because although silicon looks opaque at visible light, it is transparent starting at near-infrared wavelengths (roughly 1000 nm and longer). Today’s commodity optics and CMOS cameras are actually capable of working with lights at this wavelength; thus, IRIS is a low-cost and effective technique for confirming the construction of chips down to block level. For example, IRIS can readily help determine if a chip has the correct amount of RAM, number of CPU cores, peripherals, bond pads, etc. This level of verification would be sufficient to deter most counterfeits or substitutions.

However, due to the diffraction limit of infra-red light, it is insufficient for transistor-level imaging. Furthermore, it can only reliably infer the existence of the metal layers closest to the transistors (in technical jargon, it can infer the existence of “standard cell” library elements); it cannot reveal too much information about the higher-level metal wires that route between logic gates.

Thus, for threat models which require protection against adversaries capable of manipulating wires on an integrated circuit, IRIS should be combined with scan chain techniques to robustly verify a chip’s construction.

Hybrid Verification For the Win

As alluded to previously, scan chains alone are insufficient for detecting modifications to a circuit, because an adversary may modify the scan chain in such a manner that it responds with correct answers during the scan itself, but behaves maliciously otherwise.

However, such modifications require the introduction of additional logic gates to track the scan state and compute both correct and malicious responses. Because IRIS can “see” logic gates, it is able to put a firm upper bound on the potential amount of additional logic present in an integrated circuit. Thus, the combination of IRIS and scan chain techniques may be able to effectively verify that a circuit is correctly constructed.

The diagram above illustrates how IRIS and scan chain techniques compliment each other to obtain a high-confidence verification of a chip’s structure.

At the largest scales, IRIS can trivially quantify the number of IP blocks, pads, analog functions and memories; this is in contrast to scan-chain techniques which may struggle to characterize analog functions and other macro-scale properties, due to factors such as the analog limitations of scan chains, and the exponential growth of state-space to explore at the macro-level.
At intermediate scales, IRIS can quantify the number of bits of memory, or bound the number of standard cells in a region. This places constraints on how much malicious logic could be injected that could otherwise defeat a scan chain test.
At the smallest scales, IRIS cannot discern individual wires or gates. However, scan chain excels at exploring the topology and function of logic at the smallest and most local increments.

Thus, in combination, the two techniques may be used to achieve a high confidence verification of a chip’s function – at a cost and time scale suitable for point-of-use (e.g. end user) verification. I say “high confidence” because in the end, there is a probabilistic nature to both imaging and scan chain pattern coverage. As future work I’d like to explore the possibility of using formal methods to mathematically rule out any escapes, but absent formal proofs, it is important to understand that the technique is probabilistic in nature. That being said, it is still vastly better than the current state of the art, which is doing nothing at all.

Hybrid verification could be a viable path toward filling in the verification gap at the most fundamental levels of the supply chain, assuming chip vendors are willing to facilitate such verification by designing and packaging their products in a manner conducive to optical verification, and assuming chip vendors are willing to share scan chain test vectors with end users.

IRIS: Where We Are, and Where We are Headed

Because of the 1000 nm wavelength limit imposed by the transparency of silicon, IRIS has a limit on the features it can resolve. Below is an example IRIS imaging a small part of a RISC-V core on a 130nm chip fabricated using the SKY130A open PDK from Matt Venn’s MPW7 run; you can browse the entire chip image here.

Use the slider to compare the base image against an overlay derived from the design files. Each colored rectangle in the overlay corresponds to a “standard cell”: blue are flip flops, pink are filler/capacitor cells, aqua are varieties of and-or-invert gates, etc. Note that the native resolution of the image above is 1469 pixels wide; it has been scaled down to fit the width of this page.

One can see that in a 130nm process, IRIS has a reasonable chance of conducting a gate-count census of an entire chip. And yes, it’s not atypical for chips to be limited not by logic density, but by wiring density; hence, the majority of a chip’s active area contains filler cells (the pink rectangles).

Above is of an identically scaled region of a 22nm chip, again of a RISC-V core, but this time almost the entire core is within view because the logic gates are, unsurprisingly, much smaller: a single gate can be as small as a few pixels. At this node, IRIS can place an upper bound on gate count to within a couple dozen extra flip flops.

An important caveat when comparing images above: the backside finish of the 130nm chip is a mirror polish, but the 22nm chip only went through backgrinding; in other words, the 22nm chip’s image clarity is degraded due to a series of small surface ridges that refract light. If the 22nm chip had the same mirror-finish quality as the 130nm chip, the imaging resolution of the 22nm chip would be improved. Back side polishing of chips is not a difficult or uncommon process, but it’s not done unless explicitly required. Thus, chips intended for optical verification should specify a high quality mirror finish for the back side.

IRIS goes hand-in-hand with electrical scan chains to achieve full chip verification. Scan chains are able to quickly confirm the wiring between small clusters of standard cells, but Trojans can evade detection by including an honest copy of the affected logic. IRIS confirms that a given cluster of logic being tested by a scan chain is approximately the right size. The search perimeter for rouge cells is reduced by running the scan chain test at high speeds. Ideally, the total bounds are tight enough to rule out the existence of sufficient extra logic to evade detection in scan chain testing.

In other words, what IRIS can’t directly image, the scan chain has to make up for with complexity of test. Thus, at 130nm, a simple bit-shift scan chain may be sufficient, since individual gates are resolvable; at 22nm, a more complicated technique splitting the scan chain into multiple segments, capable of challenging the system with mutually unpredictable data patterns, may be required to drive up the lower bound on circuit complexity to the “dozens of logic gates” range required to bypass the test. And at the most advanced nodes, even more scan chain segments may be required along with supplementary design techniques to drive the lower confidence bound into the “hundreds of logic gates” range for reliable Trojan detection with IRIS.

In general, when I use the term “IRIS” alone in a context where high-confidence gate level verification is required, readers should infer that I’m referring to some kind of hybrid verification technique of both IRIS and scan chain.

That being said, I envision IRIS mainly being used to verify high-value circuitry, such as those found in a discrete cryptographic enclave intended to store secrets such as root keys. These enclaves would not require the performance or density of the latest process nodes. With careful design, a 22nm or 28nm process can offer GHz clock speeds, sufficient for storing and processing bulk data with root secrets. The “2x” nm node is particularly interesting because it is the best “value per transistor” sweet-spot, and likely to stay that way for the foreseeable future: it’s the smallest process node that still uses the easier-to-fabricate planar CMOS transistors while requiring only single-patterning DUV lithography techniques.

Thus, the bulk of my on-going research will focus on samples produced in 130nm and 180nm (because there are Open PDKs available for those nodes today), and 22nm (because of the ultimate economic importance of the node). I am also betting that while the 2x nm node is not open source today, it will become more open within the next decade if the world continues on a “business as usual” scenario where technology continues to race down the commodification curve, and fabs continue to compete on price and need more designs to keep them busy and profitable.

While the imaging system has met its initial goals, the software still has miles to go before I sleep (And miles to go before I sleep). In particular, I’m still working on training a computer to automatically recognize patterns of gates in IRIS images and to generate a gate count census. I’d also ideally like to find a way to use formal methods to place an upper bound on the amount of logic one can hide in a scan chain for a given testing methodology, so designers can have a formally proven tool to validate that their scan chains are sufficiently dense so that IRIS can pick up any attempts to bypass them. Unfortunately, I’m neither a software engineer nor a mathematician, but the problems are interesting enough that I’ll still give them a go. Worst case, I will learn something new along the path to failure.

Above is the IRIS machine that I’ve built. There’s a lot going on here – but basically, it’s an IR camera attached to a microscope, a nanometer-resolution focusing mechanism, and a pair of 1050nm light sources that have continuously adjustable azimuth and zenith. This microscope assembly is mounted in a Jubilee motion platform. The Jubilee is open source, and was designed by Sonya Vasquez of Prof. Nadya Peek‘s Machine Agency group. I got it as a kit from Filastruder. The base motion platform is capable of 10 micron steps, and features a kinematically coupled Z-bed with three independent actuators, allowing me to dynamically compensate for planarity issues in the sample undergoing imaging.

Above is a short video loop showing the core mechanics in action. The weird thing on the bottom with the red and black wires coming out of it is the kinematically coupled nanometer-resolution fine focus stage; its motions are too small to be picked up by the camera.

An explicit goal of this project is to open source all of IRIS, so that anyone can replicate the imaging system. Democratizing chip verification is important because a credible threat of being caught reduces the incentive of adversaries to deploy expensive Trojan-implantation capabilities.

With little fear of being caught, there’s a payoff even if an adversary has to plow tens of millions of dollars into a capability for planting chip-level hardware Trojans in high-value targets. However, if chip inspection equipment can purchased in the ballpark of hundreds to perhaps thousands of dollars, and more than a handful of users are known to routinely inspect chips, the path to payoff for an adversary before they are caught becomes murky. In this case, a rational adversary may be deterred from targeting an IRIS-enabled design, instead reserving their capabilities only for the chips that are difficult to inspect.

Aside from that, I’ll be straight with you – a big motivation for IRIS is simply because I am curious, and I want to look inside chips and see how they are built (and it’s kind of fun and strangely satisfying to build robots). I haven’t been disappointed with what I’ve seen so far – I have to stop myself from wasting evenings browsing through the construction of chips. I’ve done a bit of chip design in the past, and it’s fascinating to see the diversity of techniques and new trends in chip designs. I’m excited to share this sense of wonder with kindred spirits!

Given the volume of material to cover, I’m going to break the documentation up into a series of blog posts that go into the methodology used to build the machine, as well as details about every custom component, and the design decisions that went into them. I’ll also summarize the status of the analysis software that accompanies the system – so stay tuned for more posts!

However, if you’re impatient and don’t want to wait for the documentation, you can already browse the source files for the microscope, control software, stitching software, and layout extraction software.

Finally, a big shout-out to NLnet and the European Commission. NLnet’s NGI0 Entrust fund, established with support from the European Commission’s Next Generation Internet Program, are instrumental in facilitating my work on IRIS. Also a big shout-out to my Github Sponsors for their incredible generosity and monthly support. Thanks to all these donors, I’m able to keep IRIS 100% open source and free of conflicts of interest with commercial investors.

❤️ Sponsor me on Github! ❤️

Posted in betrusted, Hacking, IRIS, open source, precursor | 18 Comments »

Fully Oxidizing `ring`: Creating a Pure Rust TLS Stack Based on `rustls` + `ring`

Friday, September 16th, 2022

I really want to understand all the software that runs on my secure devices.

It’s a bit of a quixotic quest, but so far we’ve made pretty good progress towards this goal: I’ve been helping to write the Xous OS from the ground up in pure Rust – from the bootloader to the apps. Xous now has facilities like secure storage, a GUI toolkit, basic networking, and a password vault application that can handle U2F/FIDO, TOTP, and plaintext passwords.

One of the biggest challenges has been keeping our SBOM (software bill of materials) as small as possible. I consider components of the SBOM to be part of our threat model, so we very selectively re-write crates and libraries that are too bloated. This trades off the risk of introducing new bugs in our hand-rolled code versus the risk of latent, difficult-to-discover bugs buried in more popular but bloated libraries. A side benefit of this discipline is that to this day, Xous builds on multiple platforms with nothing more than a default Rust compiler – no other tooling necessary. It does mean we’re putting a lot of trust in the intractably complicated `rustc` codebase, but better than also including, for example, `gcc`, `nasm`, and `perl` codebases as security-critical SBOM components.

Unfortunately, more advanced networking based on TLS is a huge challenge. This is because the “go-to” Rust library for TLS, `rustls`, uses `ring` for its cryptography. `ring` is in large part an FFI (foreign function interface) wrapper around a whole lot of assembly and C code that is very platform specific and lifted out of BoringSSL. And it requires `gcc`, `nasm`, and `perl` to build, pulling all these complicated tools into our SBOM.

Notwithstanding our bespoke concerns, `ring` turns out to be the right solution for probably 90%+ of the deployments by CPU core count. It’s based on the highly-regarded, well-maintained and well-vetted BoringSSL codebase (“never roll your own crypto”!), and because of all the assembly and C, it is high performance. Secure, high-performance code, wrapped in Rust. What else could you ask for when writing code that potentially runs on some of the biggest cloud services on the Internet? I definitely can’t argue with the logic of the maintainers – in Open Source, sustainability often requires catering to deep-pocketed patrons.

The problem, of course, is that Open Source includes The Bazaar, with a huge diversity of architectures. The problem is well-stated in this comment from a RedHat maintainer:

…I’m not really speaking as a member of the Packaging Committee here, but as the person who is primary maintainer for 2000+ packages for Rust crates.

In Fedora Linux, our supported architectures are x86_64, i686, aarch64, powerpc64le, s390x, and, up to Fedora 36, armv7 (will no longer supported starting with Fedora 37). By default, all packages are built on all architectures, and architecture support is opt-out instead of opt-in. […]

On the other hand, this also makes it rather painful to deal with Rust crates which only have limited architecture support: Builds of packages for the affected crates and every other package of a Rust crate that depends on them need to opt-out of building on, in this case, powerpc64le and s390x architectures. This is manageable for the 2-3 packages that we have which depend on ring, but right now, I’m in the process of actually removing optional features that need rustls where I can, because that support is unused and hard to support.

However, the problem will get much worse once widely-used crates, like hyper (via h3 and quinn) start adding a (non-optional) dependency on rustls / ring. At that point, it would probably be easier to stop building Rust crates on the two unsupported architectures completely – but we cannot do that, because some new distribution-critical components have been introduced, which were either written from scratch in Rust, or were ported from C or Python to Rust, and many of them are network stack related, with many of them using hyper.

Long story short, if Redhat/Fedora can’t convince `ring` to support their needs, then the prognosis for getting our niche RISC-V + Xous combo supported in `ring` does not look good, which would mean that `rustls`, in turn, is not viable for Xous.

Fortunately, Ellen Poe (ellenhp) reached out to me in response to a post I made back in July, and informed me that she had introduced a patch which adds RISC-V support for ESP32 targets to `ring`, and that this is now being maintained by the community as `ring-compat`. Her community graciously tried another go at submitting a pull request to get this patch mainlined, but it seems to not have made much progress on being accepted.

At this point, the following options remained:

Use WolfSSL with FFI bindings, through the wolfssl-sys crate.
Write our own crappy pure-Rust TLS implementation
Patch over all the `ring` FFI code with pure Rust versions

WolfSSL is appealing as it is a well-supported TLS implementation precisely targeted toward light-weight clients that fit our CPU profile: I was confident it could meet our space and performance metrics if we could only figure out how to integrate the package. Unfortunately, it is both license and language incompatible with Xous, which would require turning it into a stand-alone binary for integration. This also reduced efficiency of the code, because we would have to wrap every SSL operation into an inter-process call, as the WolfSSL code would be sandboxed into its own virtual memory space. Furthermore, it introduces a C compiler into our SBOM, something we had endeavoured to avoid from the very beginning.

Writing our own crappy TLS implementation is just a patently bad idea for so many reasons, but, when doing a clean-sheet architecture like ours, all options have to stay on the table.

This left us with one clear path: trying to patch over the `ring` FFI code with pure Rust versions.

The first waypoint on this journey was to figure out how `ring-compat` managed to get RISC-V support into `ring`. It turns out their trick only works for `ring` version 0.17.0 – which is an unreleased, as-of-yet still in development version.

Unfortunately, `rustls` depends on `ring` version 0.16.20; `ring` version 0.16.20 uses C code derived from BoringSSL that seems to be hand-coded, but carefully reviewed. So, even if we could get `ring-compat` to work for our platform, it still would not work with `rustls`, because 0.17.0 != 0.16.20.

Foiled!

…or are we?

I took a closer look at the major differences between `ring` 0.17.0 and 0.16.20. There were enough API-level differences that I would have to fork `rustls` to use `ring` 0.17.0.

However, if I pushed one layer deeper, within `ring` itself, one of the biggest changes is that ring’s “fipsmodule” code changes from the original, hand-coded version, to a machine-generated version that is derived from ciphers from the fiat-crypto project (NB: “Fiat Crypto” has nothing to do with cryptocurrency, and they’ve been at it for about as long as Bitcoin has been in existence. As they say, “crypto means cryptography”: fiat cryptography utilizes formal methods to create cryptographic ciphers that are guaranteed to be correct. While provably correct ciphers are incredibly important and have a huge positive benefit, they don’t have a “get rich quick” story attached to them and thus they have been on the losing end of the publicity-namespace battle for the words “fiat” and “crypto”). Because their code is machine-generated from formal proofs, they can more easily support a wide variety of back-ends; in particular, in 0.17.0, there was a vanilla C version of the code made available for every architecture, which was key to enabling targets such as WASM and RISC-V.

This was great news for me. I proceeded to isolate the fipsmodule changes and layer them into a 0.16.20 base (with Ellen’s patch applied); this was straightforward in part because cryptography APIs have very little reason to change (and in fact, changing them can have disastrous unintended consequences).

Now, I had a `rustls` API-compatible version of `ring` that also uses machine-generated, formally verified pure C code (that is: no more bespoke assembly targets!) with a number of pathways to achieve a pure Rust translation.

Perhaps the most “correct” method would have been to learn the entire Fiat Crypto framework and generate Rust back-ends from scratch, but that does not address the thin layer of remnant C code in `ring` still required to glue everything together.

Instead, Xobs suggested that we use `c2rust` to translate the existing C code into Rust. I was initially skeptical: transpilation is a very tricky proposition; but Xobs whipped together a framework in an afternoon that could at least drive the scripts and get us to a structure that we could rapidly iterate around. The transpiled code generated literally thousands of warnings, but because we’re transpiling machine-generated code, the warning mechanisms were very predictable and easy to patch using various regex substitutions.

Over the next couple of days, I kept plucking away at the warnings emitted by `rustc`, writing fix-up patches that could be automatically applied to the generated Rust code through a Python script, until I had a transpilation script that could take the original C code and spit out warning-free Rust code that integrates seamlessly into `ring`. The trickiest part of the whole process was convincing `c2rust`, which was running on a 64-bit x86 host, to generate 32-bit code; initially all our TLS tests were failing because the bignum arithmetic assumed a 64-bit target. But once I figured out that the `-m32` flag was needed in the C options, everything basically just worked! (hurray for `rustc`’s incredibly meticulous compiler warnings!)

The upshot is now we have a fork of `ring` in `ring-xous` that is both API-compatible with the current `rustls` version, and uses pure Rust, so we can compile TLS for Xous without need of gcc, clang, nasm, or perl.

But Is it Constant Time?

One note of caution is that the cryptographic primitives used in TLS are riddled with tricky timing side channels that can lead to the disclosure of private keys and session keys. The good news is that a manual inspection of the transpiled code reveals that most of the constant-time tricks made it through the transpilation process cleanly, assuming that I interpreted the barrier instruction correctly as the Rust `compiler_fence` primitive. Just to be sure, I built a low-overhead, cycle-accurate hardware profiling framework called perfcounter. With about 2 cycles of overhead, I’m able to snapshot a timestamp that can be used to calculate the runtime of any API call.

Inspired by DJB’s Cache-timing attacks on AES paper, I created a graphical representation of the runtimes of both our hardware AES block (which uses a hard-wired S-box for lookups, and is “very” constant-time) and the transpiled `ring` AES code (which uses program code that can leak key-dependent timing information due to variations in execution speed) to convince myself that the constant-time properties made it through the transpilation process.

Each graphic above shows a plot of runtime versus 256 keys (horizontal axis) versus 128 data values (vertical axis) (similar to figure 8.1 in the above-cited paper). In the top row, brightness corresponds to runtime; the bright spots correspond to periodic OS interrupts that hit in the middle of the AES processing routine. These bright spots are not correlated to the AES computation, and would average out over multiple runs. The next lower row is the exact same image, but with a random color palette, so that small differences in runtime are accentuated. Underneath the upper 2×2 grid of images is another 2×2 grid that corresponds to the same parameters, but averaged over 8 runs.

Here we can see that for the AES with hardware S-boxes, there is a tiny bit of texture, which represents a variability of about ±20 CPU cycles out of a typical time of 4168 cycles to encrypt a block; this variability is not strongly correlated with key or data bit patterns. For AES with transpiled ring code, we see a lot more texture, representing about ±500 cycles variability out of a typical time of 12,446 cycles to encrypt a block. It’s not as constant time as the hardware S-boxes, but more importantly the variance also does not seem to be strongly correlated with a particular key or data pattern over multiple runs.

Above is a histogram of the same data sets; on the left are the hardware S-boxes, and the right is the software S-box used in the `ring` transpilation; and across the top are results from a single run, and across the bottom are the average of 8 runs. Here we can see how on a single run, the data tends to bin into a couple of bands, which I interpret as timing differences based upon how “warm” the cache is (in particular, the I-cache). The banding patterns are easily disturbed: they do not replicate well from run-to-run, they tend to “average out” over more runs, and they only manifest when the profiling is very carefully instrumented (for example, introducing some debug counters in the profiling routines disrupts the banding pattern). I interpret this as an indicator that the banding patterns are more an artifact of external influences on the runtime measurement, rather than a pattern exploitable in the AES code itself.

More work is necessary to thoroughly characterize this, but it’s good enough for a first cut; and this points to perhaps optimizing `ring-xous` to use our hardware AES block for both better performance and more robust constant-time properties, should we be sticking with this for the long haul.

Given that Precursor is primarily a client and not a server for TLS, leakage of the session key is probably the biggest concern, so I made checking the AES implementation a priority. However, I also have reason to believe that the ECDSA and RSA implementation’s constant time hardening should have also made it through the transpilation process.

That being said, I’d welcome help from anyone who can recommend a robust and succinct way to test for constant time ECDSA and/or RSA operation. Our processor is fairly slow, so at 100MHz simply generating gobs of random keys and signing them may not give us enough coverage to gain confidence in face of some of the very targeted timing attacks that exist against the algorithm. Another alternative could be to pluck out every routine annotated with “constant time” in the source code and benchmark them; it’s a thing we could do but first I’m still not sure this would encompass everything we should be worried about, and second it would be a lot of effort given the number of routines with this annotation. The ideal situation would be a Wycheproof-style set of test vectors for constant time validation, but unfortunately the Wycheproof docs simply say “TBD” under Timing Attacks for DSA.

Summary

`ring-xous` is a fork of `ring` that is compatible with `rustls` (that is, it uses the 0.16.20 API), and is pure Rust. I am also optimistic that our transpilation technique preserved many of the constant-time properties, so while it may not be the most performant implementation, it should at least be usable; but I would welcome the review and input of someone who knows much more about constant-time code to confirm my hunch.

We’re able to use it as a drop-in replacement for `ring`, giving us TLS on Xous via `rustls` with a simple `Cargo.toml` patch in our workspace:

[patch.crates-io.ring]
git="https://github.com/betrusted-io/ring-xous"
branch="0.16.20-cleanup"

We’ve also confirmed this works with the `tungstenite` websockets framework for Rust, paving the way towards implementing higher-level secure messaging protocols.

This leads to the obvious question of “What now?” — we’ve got this fork of `ring`, will we maintain it? Will we try to get things upstreamed? I think the idea is to maintain a fork for now, and to drop it once something better comes along. At the very least, this particular fork will be deprecated once `ring` reaches full 0.17.0 and `rustls` is updated to use this new version of `ring`. So for now, this is a best-effort port for the time being that is good enough to get us moving again on application development. If you think this fork can also help your project get un-stuck, you may be able to get `ring-xous` to work with your OS/arch with some minor tweaks of the `cfg` directives sprinkled throughout; feel free to submit a PR if you’d like to share your tweaks with others!

Posted in Hacking, open source, precursor | 4 Comments »

The Plausibly Deniable DataBase (PDDB): It’s Real Now!

Thursday, July 28th, 2022

Earlier I described the Plausibly Deniable DataBase (PDDB). It’s a filesystem (like FAT or ext4), combined with plausibly deniable full disk encryption (similar to LUKS or VeraCrypt) in a “batteries included” fashion. Plausible deniability aims to make it difficult to prove “beyond a reasonable doubt” that additional secrets exist on the disk, even in the face of forensic evidence.

Since then, I’ve implemented, deployed, and documented the PDDB. Perhaps of most interest for the general reader is the extensive documentation now available in the Xous Book. Here you can find discussions about the core data structures, key derivations, native & std APIs, testing, backups, and issues affecting security and deniability.

Posted in betrusted, open source, precursor | 2 Comments »

Rust: A Critical Retrospective

Thursday, May 19th, 2022

Since I was unable to travel for a couple of years during the pandemic, I decided to take my new-found time and really lean into Rust. After writing over 100k lines of Rust code, I think I am starting to get a feel for the language and like every cranky engineer I have developed opinions and because this is the Internet I’m going to share them.

The reason I learned Rust was to flesh out parts of the Xous OS written by Xobs. Xous is a microkernel message-passing OS written in pure Rust. Its closest relative is probably QNX. Xous is written for lightweight (IoT/embedded scale) security-first platforms like Precursor that support an MMU for hardware-enforced, page-level memory protection.

In the past year, we’ve managed to add a lot of features to the OS: networking (TCP/UDP/DNS), middleware graphics abstractions for modals and multi-lingual text, storage (in the form of an encrypted, plausibly deniable database called the PDDB), trusted boot, and a key management library with self-provisioning and sealing properties.

One of the reasons why we decided to write our own OS instead of using an existing implementation such as SeL4, Tock, QNX, or Linux, was we wanted to really understand what every line of code was doing in our device. For Linux in particular, its source code base is so huge and so dynamic that even though it is open source, you can’t possibly audit every line in the kernel. Code changes are happening at a pace faster than any individual can audit. Thus, in addition to being home-grown, Xous is also very narrowly scoped to support just our platform, to keep as much unnecessary complexity out of the kernel as possible.

Being narrowly scoped means we could also take full advantage of having our CPU run in an FPGA. Thus, Xous targets an unusual RV32-IMAC configuration: one with an MMU + AES extensions. It’s 2022 after all, and transistors are cheap: why don’t all our microcontrollers feature page-level memory protection like their desktop counterparts? Being an FPGA also means we have the ability to fix API bugs at the hardware level, leaving the kernel more streamlined and simplified. This was especially relevant in working through abstraction-busting processes like suspend and resume from RAM. But that’s all for another post: this one is about Rust itself, and how it served as a systems programming language for Xous.

Rust: What Was Sold To Me

Back when we started Xous, we had a look at a broad number of systems programming languages and Rust stood out. Even though its `no-std` support was then-nascent, it was a strongly-typed, memory-safe language with good tooling and a burgeoning ecosystem. I’m personally a huge fan of strongly typed languages, and memory safety is good not just for systems programming, it enables optimizers to do a better job of generating code, plus it makes concurrency less scary. I actually wished for Precursor to have a CPU that had hardware support for tagged pointers and memory capabilities, similar to what was done on CHERI, but after some discussions with the team doing CHERI it was apparent they were very focused on making C better and didn’t have the bandwidth to support Rust (although that may be changing). In the grand scheme of things, C needed CHERI much more than Rust needed CHERI, so that’s a fair prioritization of resources. However, I’m a fan of belt-and-suspenders for security, so I’m still hopeful that someday hardware-enforced fat pointers will make their way into Rust.

That being said, I wasn’t going to go back to the C camp simply to kick the tires on a hardware retrofit that backfills just one poor aspect of C. The glossy brochure for Rust also advertised its ability to prevent bugs before they happened through its strict “borrow checker”. Furthermore, its release philosophy is supposed to avoid what I call “the problem with Python”: your code stops working if you don’t actively keep up with the latest version of the language. Also unlike Python, Rust is not inherently unhygienic, in that the advertised way to install packages is not also the wrong way to install packages. Contrast to Python, where the official docs on packages lead you to add them to system environment, only to be scolded by Python elders with a “but of course you should be using a venv/virtualenv/conda/pipenv/…, everyone knows that”. My experience with Python would have been so much better if this detail was not relegated to Chapter 12 of 16 in the official tutorial. Rust is also supposed to be better than e.g. Node at avoiding the “oops I deleted the Internet” problem when someone unpublishes a popular package, at least if you use fully specified semantic versions for your packages.

In the long term, the philosophy behind Xous is that eventually it should “get good enough”, at which point we should stop futzing with it. I believe it is the mission of engineers to eventually engineer themselves out of a job: systems should get stable and solid enough that it “just works”, with no caveats. Any additional engineering beyond that point only adds bugs or bloat. Rust’s philosophy of “stable is forever” and promising to never break backward-compatibility is very well-aligned from the point of view of getting Xous so polished that I’m no longer needed as an engineer, thus enabling me to spend more of my time and focus supporting users and their applications.

The Rough Edges of Rust

There’s already a plethora of love letters to Rust on the Internet, so I’m going to start by enumerating some of the shortcomings I’ve encountered.

“Line Noise” Syntax

This is a superficial complaint, but I found Rust syntax to be dense, heavy, and difficult to read, like trying to read the output of a UART with line noise:
Trying::to_read::<&'a heavy>(syntax, |like| { this. can_be( maddening ) }).map(|_| ())?;

In more plain terms, the line above does something like invoke a method called “to_read” on the object (actually `struct`) “Trying” with a type annotation of “&heavy” and a lifetime of ‘a with the parameters of “syntax” and a closure taking a generic argument of “like” calling the can_be() method on another instance of a structure named “this” with the parameter “maddening” with any non-error return values mapped to the Rust unit type “()” and errors unwrapped and kicked back up to the caller’s scope.

Deep breath. Surely, I got some of this wrong, but you get the idea of how dense this syntax can be.

And then on top of that you can layer macros and directives which don’t have to follow other Rust syntax rules. For example, if you want to have conditionally compiled code, you use a directive like
#[cfg(all(not(baremetal), any(feature = “hazmat”, feature = “debug_print”)))]
Which says if either the feature “hazmat” or “debug_print” is enabled and you’re not running on bare metal, use the block of code below (and I surely got this wrong too). The most confusing part of about this syntax to me is the use of a single “=” to denote equivalence and not assignment, because, stuff in config directives aren’t Rust code. It’s like a whole separate meta-language with a dictionary of key/value pairs that you query.

I’m not even going to get into the unreadability of Rust macros – even after having written a few Rust macros myself, I have to admit that I feel like they “just barely work” and probably thar be dragons somewhere in them. This isn’t how you’re supposed to feel in a language that bills itself to be reliable. Yes, it is my fault for not being smart enough to parse the language’s syntax, but also, I do have other things to do with my life, like build hardware.

Anyways, this is a superficial complaint. As time passed I eventually got over the learning curve and became more comfortable with it, but it was a hard, steep curve to climb. This is in part because all the Rust documentation is either written in eli5 style (good luck figuring out “feature”s from that example), or you’re greeted with a formal syntax definition (technically, everything you need to know to define a “feature” is in there, but nowhere is it summarized in plain English), and nothing in between.

To be clear, I have a lot of sympathy for how hard it is to write good documentation, so this is not a dig at the people who worked so hard to write so much excellent documentation on the language. I genuinely appreciate the general quality and fecundity of the documentation ecosystem.

Rust just has a steep learning curve in terms of syntax (at least for me).

Rust Is Powerful, but It Is Not Simple

Rust is powerful. I appreciate that it has a standard library which features HashMaps, Vecs, and Threads. These data structures are delicious and addictive. Once we got `std` support in Xous, there was no going back. Coming from a background of C and assembly, Rust’s standard library feels rich and usable — I have read some criticisms that it lacks features, but for my purposes it really hits a sweet spot.

That being said, my addiction to the Rust `std` library has not done any favors in terms of building an auditable code base. One of the criticisms I used to leverage at Linux is like “holy cow, the kernel source includes things like an implementation for red black trees, how is anyone going to audit that”.

Now, having written an OS, I have a deep appreciation for how essential these rich, dynamic data structures are. However, the fact that Xous doesn’t include an implementation of HashMap within its repository doesn’t mean that we are any simpler than Linux: indeed, we have just swept a huge pile of code under the rug; just the `collection`s portion of the standard library represents about 10k+ SLOC at a very high complexity.

So, while Rust’s `std` library allows the Xous code base to focus on being a kernel and not also be its own standard library, from the standpoint of building a minimum attack-surface, “fully-auditable by one human” codebase, I think our reliance on Rust’s `std` library means we fail on that objective, especially so long as we continue to track the latest release of Rust (and I’ll get into why we have to in the next section).

Ideally, at some point, things “settle down” enough that we can stick a fork in it and call it done by well, forking the Rust repo, and saying “this is our attack surface, and we’re not going to change it”. Even then, the Rust `std` repo dwarfs the Xous repo by several multiples in size, and that’s not counting the complexity of the compiler itself.

Rust Isn’t Finished

This next point dovetails into why Rust is not yet suitable for a fully auditable kernel: the language isn’t finished. For example, while we were coding Xous, a thing called `const generic` was introduced. Before this, Rust had no native ability to deal with arrays bigger than 32 elements! This limitation is a bit maddening, and even today there are shortcomings such as the `Default` trait being unable to initialize arrays larger than 32 elements. This friction led us to put limits on many things at 32 elements: for example, when we pass the results of an SSID scan between processes, the structure only reserves space for up to 32 results, because the friction of going to a larger, more generic structure just isn’t worth it. That’s a language-level limitation directly driving a user-facing feature.

Also over the course of writing Xous, things like in-line assembly and workspaces finally reached maturity, which means we need to go back a revisit some unholy things we did to make those critical few lines of initial boot code, written in assembly, integrated into our build system.

I often ask myself “when is the point we’ll get off the Rust release train”, and the answer I think is when they finally make “alloc” no longer a nightly API. At the moment, `no-std` targets have no access to the heap, unless they hop on the “nightly” train, in which case you’re back into the Python-esque nightmare of your code routinely breaking with language releases.

We definitely gave writing an OS in `no-std` + stable a fair shake. The first year of Xous development was all done using `no-std`, at a cost in memory space and complexity. It’s possible to write an OS with nothing but pre-allocated, statically sized data structures, but we had to accommodate the worst-case number of elements in all situations, leading to bloat. Plus, we had to roll a lot of our own core data structures.

About a year ago, that all changed when Xobs ported Rust’s `std` library to Xous. This means we are able to access the heap in stable Rust, but it comes at a price: now Xous is tied to a particular version of Rust, because each version of Rust has its own unique version of `std` packaged with it. This version tie is for a good reason: `std` is where the sausage gets made of turning fundamentally `unsafe` hardware constructions such as memory allocation and thread creation into “safe” Rust structures. (Also fun fact I recently learned: Rust doesn’t have a native allocater for most targets – it simply punts to the native libc `malloc()` and `free()` functions!) In other words, Rust is able to make a strong guarantee about the stable release train not breaking old features in part because of all the loose ends swept into `std`.

I have to keep reminding myself that having `std` doesn’t eliminate the risk of severe security bugs in critical code – it merely shuffles a lot of critical code out of sight, into a standard library. Yes, it is maintained by a talented group of dedicated programmers who are smarter than me, but in the end, we are all only human, and we are all fair targets for software supply chain exploits.

Rust has a clockwork release schedule – every six weeks, it pushes a new version. And because our fork of `std` is tied to a particular version of Rust, it means every six weeks, Xobs has the thankless task of updating our fork and building a new `std` release for it (we’re not a first-class platform in Rust, which means we have to maintain our own `std` library). This means we likewise force all Xous developers to run `rustup update` on their toolchains so we can retain compatibility with the language.

This probably isn’t sustainable. Eventually, we need to lock down the code base, but I don’t have a clear exit strategy for this. Maybe the next point at which we can consider going back to `nostd` is when we can get the stable `alloc` feature, which allows us to have access to the heap again. We could then decouple Xous from the Rust release train, but we’d still need to backfill features such as Vec, HashMap, Thread, and Arc/Mutex/Rc/RefCell/Box constructs that enable Xous to be efficiently coded.

Unfortunately, the `alloc` crate is very hard, and has been in development for many years now. That being said, I really appreciate the transparency of Rust behind the development of this feature, and the hard work and thoughtfulness that is being put into stabilizing this feature.

Rust Has A Limited View of Supply Chain Security

I think this position is summarized well by the installation method recommended by the rustup.rs installation page:
`curl --proto '=https' --tlsv1.2 -sSf https://sh.rustup.rs | sh`
“Hi, run this shell script from a random server on your machine.”

To be fair, you can download the script and inspect it before you run it, which is much better than e.g. the Windows .MSI installers for vscode. However, this practice pervades the entire build ecosystem: a stub of code called `build.rs` is potentially compiled and executed whenever you pull in a new crate from crates.io. This, along with “loose” version pinning (you can specify a version to be, for example, simply “2” which means you’ll grab whatever the latest version published is with a major rev of 2), makes me uneasy about the possibility of software supply chain attacks launched through the crates.io ecosystem.

Crates.io is also subject to a kind of typo-squatting, where it’s hard to determine which crates are “good” or “bad”; some crates that are named exactly what you want turn out to just be old or abandoned early attempts at giving you the functionality you wanted, and the more popular, actively-maintained crates have to take on less intuitive names, sometimes differing by just a character or two from others (to be fair, this is not a problem unique to Rust’s package management system).

There’s also the fact that dependencies are chained – when you pull in one thing from crates.io, you also pull in all of that crate’s subordinate dependencies, along with all their build.rs scripts that will eventually get run on your machine. Thus, it is not sufficient to simply audit the crates explicitly specified within your Cargo.toml file — you must also audit all of the dependent crates for potential supply chain attacks as well.

Fortunately, Rust does allow you to pin a crate at a particular version using the `Cargo.lock` file, and you can fully specify a dependent crate down to the minor revision. We try to mitigate this in Xous by having a policy of publishing our Cargo.lock file and specifying all of our first-order dependent crates to the minor revision. We have also vendored in or forked certain crates that would otherwise grow our dependency tree without much benefit.

That being said, much of our debug and test framework relies on some rather fancy and complicated crates that pull in a huge number of dependencies, and much to my chagrin even when I try to run a build just for our target hardware, the dependent crates for running simulations on the host computer are still pulled in and the build.rs scripts are at least built, if not run.

In response to this, I wrote a small tool called `crate-scraper` which downloads the source package for every source specified in our Cargo.toml file, and stores them locally so we can have a snapshot of the code used to build a Xous release. It also runs a quick “analysis” in that it searches for files called build.rs and collates them into a single file so I can more quickly grep through to look for obvious problems. Of course, manual review isn’t a practical way to detect cleverly disguised malware embedded within the build.rs files, but it at least gives me a sense of the scale of the attack surface we’re dealing with — and it is breathtaking, about 5700 lines of code from various third parties that manipulates files, directories, and environment variables, and runs other programs on my machine every time I do a build.

I’m not sure if there is even a good solution to this problem, but, if you are super-paranoid and your goal is to be able to build trustable firmware, be wary of Rust’s expansive software supply chain attack surface!

You Can’t Reproduce Someone Else’s Rust Build

A final nit I have about Rust is that builds are not reproducible between different computers (they are at least reproducible between builds on the same machine if we disable the embedded timestamp that I put into Xous for $reasons).

I think this is primarily because Rust pulls in the full path to the source code as part of the panic and debug strings that are built into the binary. This has lead to uncomfortable situations where we have had builds that worked on Windows, but failed under Linux, because our path names are very different lengths on the two and it would cause some memory objects to be shifted around in target memory. To be fair, those failures were all due to bugs we had in Xous, which have since been fixed. But, it just doesn’t feel good to know that we’re eventually going to have users who report bugs to us that we can’t reproduce because they have a different path on their build system compared to ours. It’s also a problem for users who want to audit our releases by building their own version and comparing the hashes against ours.

There’s some bugs open with the Rust maintainers to address reproducible builds, but with the number of issues they have to deal with in the language, I am not optimistic that this problem will be resolved anytime soon. Assuming the only driver of the unreproducibility is the inclusion of OS paths in the binary, one fix to this would be to re-configure our build system to run in some sort of a chroot environment or a virtual machine that fixes the paths in a way that almost anyone else could reproduce. I say “almost anyone else” because this fix would be OS-dependent, so we’d be able to get reproducible builds under, for example, Linux, but it would not help Windows users where chroot environments are not a thing.

Where Rust Exceeded Expectations

Despite all the gripes laid out here, I think if I had to do it all over again, Rust would still be a very strong contender for the language I’d use for Xous. I’ve done major projects in C, Python, and Java, and all of them eventually suffer from “creeping technical debt” (there’s probably a software engineer term for this, I just don’t know it). The problem often starts with some data structure that I couldn’t quite get right on the first pass, because I didn’t yet know how the system would come together; so in order to figure out how the system comes together, I’d cobble together some code using a half-baked data structure.

Thus begins the descent into chaos: once I get an idea of how things work, I go back and revise the data structure, but now something breaks elsewhere that was unsuspected and subtle. Maybe it’s an off-by-one problem, or the polarity of a sign seems reversed. Maybe it’s a slight race condition that’s hard to tease out. Nevermind, I can patch over this by changing a <= to a <, or fixing the sign, or adding a lock: I’m still fleshing out the system and getting an idea of the entire structure. Eventually, these little hacks tend to metastasize into a cancer that reaches into every dependent module because the whole reason things even worked was because of the “cheat”; when I go back to excise the hack, I eventually conclude it’s not worth the effort and so the next best option is to burn the whole thing down and rewrite it…but unfortunately, we’re already behind schedule and over budget so the re-write never happens, and the hack lives on.

Rust is a difficult language for authoring code because it makes these “cheats” hard – as long as you have the discipline of not using “unsafe” constructions to make cheats easy. However, really hard does not mean impossible – there were definitely some cheats that got swept under the rug during the construction of Xous.

This is where Rust really exceeded expectations for me. The language’s structure and tooling was very good at hunting down these cheats and refactoring the code base, thus curing the cancer without killing the patient, so to speak. This is the point at which Rust’s very strict typing and borrow checker converts from a productivity liability into a productivity asset.

I liken it to replacing a cable in a complicated bundle of cables that runs across a building. In Rust, it’s guaranteed that every strand of wire in a cable chase, no matter how complicated and awful the bundle becomes, is separable and clearly labeled on both ends. Thus, you can always “pull on one end” and see where the other ends are by changing the type of an element in a structure, or the return type of a method. In less strictly typed languages, you don’t get this property; the cables are allowed to merge and affect each other somewhere inside the cable chase, so you’re left “buzzing out” each cable with manual tests after making a change. Even then, you’re never quite sure if the thing you replaced is going to lead to the coffee maker switching off when someone turns on the bathroom lights.

Here’s a direct example of Rust’s refactoring abilities in action in the context of Xous. I had a problem in the way trust levels are handled inside our graphics subsystem, which I call the GAM (Graphical Abstraction Manager). Each Canvas in the system gets a `u8` assigned to it that is a trust level. When I started writing the GAM, I just knew that I wanted some notion of trustability of a Canvas, so I added the variable, but wasn’t quite sure exactly how it would be used. Months later, the system grew the notion of Contexts with Layouts, which are multi-Canvas constructions that define a particular type of interaction. Now, you can have multiple trust levels associated with a single Context, but I had forgotten about the trust variable I had previously put in the Canvas structure – and added another trust level number to the Context structure as well. You can see where this is going: everything kind of worked as long as I had simple test cases, but as we started to get modals popping up over applications and then menus on top of modals and so forth, crazy behavior started manifesting, because I had confused myself over where the trust values were being stored. Sometimes I was updating the value in the Context, sometimes I was updating the one in the Canvas. It would manifest itself sometimes as an off-by-one bug, other times as a concurrency error.

This was always a skeleton in the closet that bothered me while the GAM grew into a 5k-line monstrosity of code with many moving parts. Finally, I decided something had to be done about it, and I was really not looking forward to it. I was assuming that I messed up something terribly, and this investigation was going to conclude with a rewrite of the whole module.

Fortunately, Rust left me a tiny string to pull on. Clippy, the cheerfully named “linter” built into Rust, was throwing a warning that the trust level variable was not being used at a point where I thought it should be – I was storing it in the Context after it was created, but nobody every referred to it after then. That’s strange – it should be necessary for every redraw of the Context! So, I started by removing the variable, and seeing what broke. This rapidly led me to recall that I was also storing the trust level inside the Canvases within the Context when they were being created, which is why I had this dangling reference. Once I had that clue, I was able to refactor the trust computations to refer only to that one source of ground truth. This also led me to discover other bugs that had been lurking because in fact I was never exercising some code paths that I thought I was using on a routine basis. After just a couple hours of poking around, I had a clear-headed view of how this was all working, and I had refactored the trust computation system with tidy APIs that were simple and easier to understand, without having to toss the entire code base.

This is just one of many positive experiences I’ve had with Rust in maintaining the Xous code base. It’s one of the first times I’ve walked into a big release with my head up and a positive attitude, because for the first time ever, I feel like maybe I have a chance of being able deal with hard bugs in an honest fashion. I’m spending less time making excuses in my head to justify why things were done this way and why we can’t take that pull request, and more time thinking about all the ways things can get better, because I know Clippy has my back.

Caveat Coder

Anyways, that’s a lot of ranting about software for a hardware guy. Software people are quick to remind me that first and foremost, I make circuits and aluminum cases, not code, therefore I have no place ranting about software. They’re right – I actually have no “formal” training to write code “the right way”. When I was in college, I learned Maxwell’s equations, not algorithms. I could never be a professional programmer, because I couldn’t pass even the simplest coding interview. Don’t ask me to write a linked list: I already know that I don’t know how to do it correctly; you don’t need to prove that to me. This is because whenever I find myself writing a linked list (or any other foundational data structure for that matter), I immediately stop myself and question all the life choices that brought me to that point: isn’t this what libraries are for? Do I really need to be re-inventing the wheel? If there is any correlation between doing well in a coding interview and actual coding ability, then you should definitely take my opinions with the grain of salt.

Still, after spending a couple years in the foxhole with Rust and reading countless glowing articles about the language, I felt like maybe a post that shared some critical perspectives about the language would be a refreshing change of pace.

Posted in Hacking, open source, Ponderings, precursor | 38 Comments »

Precursor: From Boot to Root

Wednesday, February 16th, 2022

I have always wanted a computer that was open enough that it can be inspected for security, and also simple enough that I could analyze it in practice. Precursor is a step towards that goal.

As a test, I made a one hour video that walks through the Precursor tech stack, from hardware to root keys. I feel it’s a nice demo of what evidence-based trust should look like:

The video is a bit of a firehose, so please refer to our wiki for more info, or open an issue to further the discussion.

Erratum #1: I had mistakenly attributed SpinalHDL as a subset of Chisel. SpinalHDL is actually a separately developed HDL by Charles Papon. It was developed contemporaneously with Chisel and inspired by many concepts in it, such as using Scala as the underlying language; but it is not affiliated with Chisel.

Posted in Hacking, precursor | 6 Comments »