Archive for the ‘Ponderings’ Category

Non-Destructive Silicon Imaging (and Winner of Name that Ware December 2022)

Wednesday, January 4th, 2023

The ware for December 2022 is an AMD Radeon RX540 chip, part number 216-0905018. Congrats to SAM for guessing the ware; email me for your prize. The image is from Fritzchen Fritz’s Flickr feed; I recommend checking out his photos (or you can follow him on twitter). Even if you aren’t into photos of chips, he elevates it to an art. Even more amazingly, all of his work is public domain; hats off to him for contributing these photos to the commons with such a generous license, because it is not easy to prepare the material and take images of this quality. If any of my readers happens to know him and are willing to make an introduction, I’d appreciate that. I only discovered his work by chance while doing some background research.

First, here is the entire photo from which the ware was cropped:


Credit: Fritzchen Fritz

Interestingly, you can see the design of the chip in this photograph. This is not photoshop; based on the notes accompanying the photo, this was taken in “NIR”, or near-infrared, using a Sony NEX-5T.

Silicon is transparent to IR, and so, photographs taken in infra-red can be used to verify, at a coarse level, the construction of a chip!

I was pretty excited to see photos like this posted on the Internet, at full-resolution, because I have only read about this technique in journal articles. Silicon becomes very transparent in infrared:


Silicon’s absorption of light in the near infrared range. A lower value is more transparent. Generated using PV lighthouse.

This principle forms the foundation of my efforts to verify the construction of silicon in a non-destructive fashion.

The line between NIR/SWIR (near/shortwave infrared) depends on who you ask, but according to Edmud Optics, it places the line at 1000nm. By this definition, I’m inferring that the above photograph was probably taken using a powerful 900nm illuminator positioned to the left of the chip near the horizon. A bright light at that wavelength would have sufficient power to penetrate the ~1mm thickness of silicon to image the circuits on the other side, and placing it near the horizon prevents swamping the sensor with reflected light except for the bits of metal that happen to catch the light and reflect it upwards.

It’s also possible to do this with a SWIR sensor, using a wavelength closer to 1300nm (where silicon is as transparent as glass is to visible light), but the resolution of the photographs are much higher than the best SWIR sensor that I’m aware of. Unfortunately, it seems all interesting technologies are regulated by the US government’s ITAR, and SWIR area-scan sensors are no exception. I’m guessing they are also a critical component of night vision gear, and thus it is hard to obtain such sensors without a license. Regardless, even the photos taken at 900nm are a powerful demonstration of the utility of IR for inspecting the construction of silicon.

Here’s another image taken using what looks like the same technique:


Credit: Fritzchen Fritz

This is of the Via Centaur CHA, which has an excellently detailed Wikichip page complete with floorplans, such as the one shown below.


Credit: Wikichip

Remember, the IR image is from the back side of the die, so you have to mirror-image (and rotate) the front-side floorplan in your head to line it up with orientation of the photograph.

According to Wikichip, this is a TSMC 16FFC (16nm) process, with a 194mm^2 die area. This means the die above is about 13.9 mm on a side. The image as-is (which is 90% package and 10% die) resolves at about 18um/pixel, so perhaps if it was a die-only shot we could resolve at something close to 5um/pixel in a single image.

With image stitching, the resolution can be even higher:


Credit: Fritzchen Fritz


Credit: Fritzchen Fritz

In these two photos, it seems the light source was rotated 90 degrees with respect to the chip, so that different sets of components are highlighted, depending on the bias of the metal routes for that component. Note that I’m inferring this image is taken through the back side because of the presence of scratches that would be from the exposed surface of the silicon, and the orientation of the imaged die is consistent with a back-side shot.

The resolution of the above images boils down to about 3um/pixel — getting fairly close to the limit of what you can do with NIR light. To put this in perspective, TSMC 16FFC has minimum metal pitch of 64nm, so a 9-track standard cell would be 0.576um tall, and an SRAM bitcell has a size of 0.074um^2, so one pixel encompasses roughly 25 logic gates or 120 bits of SRAM. In these images, you can clearly make out variations in the density of standard cell logic, as well as the size and location of individual memory macros; the internal structure of the PCI-express drivers is also readily apparent.

I’ve been contemplating silicon supply chain attacks quite a bit, and I think that at this resolution, one can rule out the following forms of silicon supply chain attacks:

  • Replacement of the chip with an entirely different design that emulates the original
  • Insertion of a ROM larger than a few hundred bits containing alternate microcode or instruction codings
  • Insertion of a RAM macro for recording data — probably of any practical size for a RAM macro, due to the presence of line drivers/amplifiers creating a high-signal reflection
  • Insertion of extra I/O drivers
  • Potential detection of extra eFuse elements
  • Likely able to detect recompilation/resynthesis of standard cell blobs

This significantly constrains the types of attacks one has to worry about. Without backside imaging and just looking at the exterior package, it’s difficult to even know if a chip has been wholesale replaced for an inferior clone or an emulated version. The inability to add significant amounts of microcode ROM or RAM constrains the types of modifications one could make to a CPU and “get away with it”; with some additional design-level guard rails and open source RTL I suspect one could virtually eliminate effective CPU instruction-level modifications that doesn’t also introduce ISA-level flaws in every mode of operation that could be easily detected with a software-only test.

I have reasons to suspect that modifications to an eFuse box would be detectable, but because eFuses are carefully guarded black boxes such that even chip designers are not allowed to see their insides, it’s possible that a foundry could just build a back door into every eFuse box and we wouldn’t be able to tell the difference because it would be “normal”.

Finally, depending on the repeatability of the place/route tool, a modification to the RTL that triggers a re-synthesis and place/route could change the gross morphology of the standard cell blob. However, I’m not familiar enough with the latest industry-standard tools to know how big a difference that would create. I imagine there are ways to control the place and route seed so that results look very similar if only small changes are made to the RTL, such as inserting a patch wire on a single bit in a non-congested region of a design. However, a larger change, such as the insertion of a 64-bit sampling register in the datapath somewhere, would likely be detectable with this level of imaging.

There’s still a class of exploits that could be undetected with this level of imaging. This would include:

  • Small changes to test access paths; for example, patching existing ATPG scan chain logic to an existing but unused point on an I/O mux hard macro. This could facilitate unrestricted access to internal state with some additional off-chip circuitry.
  • Spare cell-only modifications that are manually patched using higher metal levels. These patches would be obscured from the back side due to masking by lower metal layers, and by definition no additional transistors are involved.
  • Dopant-level attacks, where transistor flavor or threshold voltages are modified, perhaps to bias a random number generator or to modify the function of a single gate.
  • Other careful modifications that disturb fewer than ~100 logic gates or ~100 bits of SRAM.

However, the attack surface of concern is by far smaller with this level of imaging than the current state-of-practice, which consists of squinting at the top markings on a chip package.

My hope for supply chain verification is that end users can establish a practical amount of trust in silicon chips through a combination of imaging and design analysis, without requiring a fully-open PDK (although it certainly is easier and better if the PDK is open). The missing link is an automated imager that can produce results similar to the ones demonstrated by Fritzchens Fritz. These images can then be compared against die shots released by the designer. These die shots would be low enough resolution to not violate foundry NDA, but still have enough detail to constrain the intended positions of blocks. The remaining verification gap (on the order of hundreds of gates / hundreds of bits of SRAM) could be filled in with design techniques that harden against small exploit vectors, perhaps by the use of redundant/byzantine fault tolerant logic, or by some combination of inducing faults and scan chain analysis to confirm correct construction at the gate level. And finally, open source RTL is required to help establish a link between what is visible, and what was intended by the designer (and of course also to help discover any bugs/backdoors introduced by the designer).

And now back to the Name that Ware competition. Confusingly, one of the first answers in the comments points to a tweet that also claims to have taken the photo. I did a bit of poking about and the image appears to be identical to the one on Fritzchen Fritz’s feed, down to the position of solder particulates and lint. There’s a number of possible explanations for this; I won’t speculate as to what is going on, but I will comment that the chip is not typically referred to as an “AMD M74AP” — M74AP is the lot code, so I couldn’t declare Taylan the winner, unfortunately (so close, though!). 216-0905018 is the canonical part number; if you search around for the part number, you will see several examples of chips that have the same part number, but a different lot code. This one, for example, has a lot code of M62K8.00.

Postscript

When going through Fitzchen Fritz’s photos, I was also considering using this image as the Name that Ware:


Credit: Fritzchen Fritz

It’s a tiny portion (1/400th the area) of an Intel i3-8121U (187MiB full-res mirror link), fabbed in a 10nm process. The region is cropped from a section centered in the top right quadrant of the image.

In terms of actual dimensions, the region is about 485um x 375um if I’ve done my math right – about the area covered by a medium sand particle. According to Wikichip, a 9-track standard cell would be 0.324um high, so if the area were covered with nothing but square 9-track standard cells, it would hold 1500 x 1150 cells (1.7M cells, or about one gate per pixel in the photo), or 700kiB of the densest SRAM cells (without sense amps etc.)

However, the area is not homogeneously covered with one or the other, and in fact has lots of unused silicon. The darker purplish regions are unused silicon — for one reason or the other (often times routing/floorplanning constraints, and sometimes schedule constraints), there are no logic transistors there. I think only the solid tan regions in the lower left hand corners contain high density SRAM cells; the smaller rectangles above them could contain SRAM, but could also be some other type of memory more optimized for performance or port count.

Each SRAM region is divided by sense amps and other driver logic. One solid, SRAM-cell-only region is about 48.7×28.7um, which is about 5.4kiB, so the overall region of larger rectangles holds about 22kiB of memory, including an overhead of about 35% for the drivers and amps. Likewise, the cauliflower-like structure in the center is about 750 gates wide by 900 gates high (if the gates were square — which they aren’t, so this is an upper bound), or about 600k gates (again, this image is at a resolution of about 1 pixel/gate). That would fit about a dozen VexRiscv cores, or a few 80486’s, so it’s not a small chunk of logic.

Finally, I think (but am not sure) that the rectangular cut-out regions within the cauliflower-region are clock drivers or repeaters. No transistors are placed in the trench around them probably to meet thermal flux constraints, and I also wouldn’t be surprised if they packed some local decoupling capacitors around the drivers using dummy transistors and/or MIM capacitors to reduce power droop and induced jitter in that region.

What I love about this image is how clouds of standard cells take on organic shapes when viewed at this resolution. To me it looks more like mold or bacteria growing in a petri dish than the pinnacle of precision manufactured goods. But perhaps this is just convergent evolution in action, driven by the laws of physics: signals diffuse through on-chip wires, much like nutrients in a media.

Towards a More Open Secure Element Chip

Tuesday, December 20th, 2022

Secure Element” (SE) chips have traditionally taken a very closed-source, NDA-heavy approach. Thus, it piqued my interest when an early-stage SE chip startup, Cramium (still in stealth mode), approached me to advise on open source strategy. This blog post explains my reasoning for agreeing to advise Cramium, and what I hope to accomplish in the future.

As an open source hardware activist, I have been very pleased at the progress made by the eFabless/Google partnership at creating an open-to-the-transistors physical design kit (PDK) for chips. This would be about as open as you can get from the design standpoint. However, the partnership currently supports only lower-complexity designs in the 90nm to 180nm technology nodes. Meanwhile, Cramium is planning to tape out their security chip in the 22nm node. A 22nm chip would be much more capable and cost-effective than one fabricated in 90nm (for reference, the RP2040 is fabricated in 40nm, while the Raspberry Pi 4’s CPU is fabricated in 28nm), but it would not be open-to-the-transistors.

Cramium indicated that they want to push the boundaries on what one can do with open source, within the four corners of the foundry NDAs. Ideally, a security chip would be fabricated in an open-PDK process, but I still feel it’s important to engage and help nudge them in the right direction because there is a genuine possibility that an open SDK (but still closed PDK) SE in a 22nm process could gain a lot of traction. If it’s not done right, it could establish poor de-facto standards, with lasting impacts on the open source ecosystem.

For example, when Cramium approached me, their original thought was to ship the chip with an ARM Cortex M7 CPU. Their reasoning is that developers prize a high-performance CPU, and the M7 is one of the best offerings in its class from that perspective. Who doesn’t love a processor with lots of MHz and a high IPC?

However, if Cramium’s chip were to gain traction and ship to millions of customers, it could effectively entrench the ARM instruction set — and more importantly — quirks such as the Memory Protection Unit (MPU) as the standard for open source SEs. We’ve seen the power of architectural lock-in as the x86 serially shredded the Alpha, Sparc, Itanium and MIPS architectures; so, I worry that every new market embracing ARM as a de-facto standard is also ground lost to fully open architectures such as RISC-V.

So, after some conversations, I accepted an advisory position at Cramium as the Ecosystem Engineer under the condition that they also include a RISC-V core on the chip. This is in addition to the Cortex M7. The good news is that a RISC-V core is royalty-free, and the silicon area necessary to add it at 22nm is basically a rounding error in cost, so it was a relatively easy sell. If I’m successful at integrating the RISC-V core, it will give software developers a choice between ARM and RISC-V.

So why is Cramium leaving the M7 core in? Quite frankly, it’s for risk mitigation. The project will cost upwards of $20 million to tape out. The ARM M7 core has been taped out and shipped in millions of products, and is supported by a billion-dollar company with deep silicon experience. The VexRiscv core that we’re planning to integrate, on the other hand, comes with no warranty of fitness, and it is not as performant as the Cortex M7. It’s just my word and sweat of brow that will ensure it hopefully works well enough to be usable. Thus, I find it understandable that the people writing the checks want a “plan B” that involves a battle-tested core, even if proprietary.

This will understandably ruffle the feathers of the open source purists who will only certify hardware as “Free” if and only if it contains solely libre components. I also sympathize with their position; however, our choices are either the open source community somehow provides a CPU core with a warranty of fitness, effectively underwriting a $20 million bill if there is a fatal bug in the core, or I walk away from the project for “not being libre enough”, and allow ARM to take the possibly soon-to-be-huge open source SE market without challenge.

In my view it’s better to compromise and have a seat at the table now, than to walk away from negotiations and simply cede green fields to proprietary technologies, hoping to retake lost ground only after the community has achieved consensus around a robust full-stack open source SE solution. So, instead of investing time arguing over politics before any work is done, I’m choosing to invest time building validation test suites. Once I have a solid suite of tests in hand, I’ll have a much stronger position to argue for the removal of any proprietary CPU cores.

On the Limit of Openness in a Proprietary Ecosystem

Advising on the CPU core is just one of many tasks ahead of me as their open source Ecosystem Engineer. Cramium’s background comes from the traditional chip world, where NDAs are the norm and open source is an exotic and potentially fatal novelty. Fatal, because most startups in this space exit through acquisition, and it’s much harder to negotiate a high acquisition price if prized IP is already available free-of-charge. Thus my goal is to not alienate their team with contumelious condescension about the obviousness and goodness of open source that is regrettably the cultural norm of our community. Instead, I am building bridges and reaching across the aisle, trying to understand their concerns, and explaining to them how and why open source can practically benefit a security chip.

To that end, trying to figure out where to draw the line for openness is a challenge. The crux of the situation is that the perceived fear/uncertainty/doubt (FUD) around a particular attack surface tends to have an inverse relation to the actual size of the attack surface. This illustrates the perceived FUD around a given layer of the security hierarchy:

Generally, the amount of FUD around an attack surface grows with how poorly understood the attack surface is: naturally we fear things we don’t understand well; likewise we have less fear of the familiar. Thus, “user error” doesn’t sound particularly scary, but “direct readout” with a focused ion beam of hardware security keys sounds downright leet and scary, the stuff of state actors and APTs, and also of factoids spouted over beers with peers to sound smart.

However, the actual size of the attack surface is quite the opposite:

In practice, “user error” – weak passwords, spearphishing, typosquatting, or straight-up fat fingering a poorly designed UX – is common and often remotely exploitable. Protocol errors – downgrade attacks, failures to check signatures, TOCTOUs – are likewise fairly common and remotely exploitable. Next in the order are just straight-up software bugs – buffer overruns, use after frees, and other logic bugs. Due to the sheer volume of code (and more significantly the rate of code turnover) involved in most security protocols, there are a lot of bugs, and a constant stream of newly minted bugs with each update.

Beneath this are the hardware bugs. These are logical errors in the implementation of a function of a piece of hardware, such as memory aliasing, open test access ports, and oversights such as partially mutable cryptographic material (such as an AES key that can’t be read out, but can be updated one byte at a time). Underneath logical hardware bugs are sidechannels – leakage of secret information through timing, power, and electromagnetic emissions that can occur even if the hardware is logically perfect. And finally, at the bottom layer is direct readout – someone with physical access to a chip directly inspecting its arrangement of atoms to read out secrets. While there is ultimately no defense against the direct readout of nonvolatile secrets short of zeroizing them on tamper detection, it’s an attack surface that is literally measured in microns and it requires unmitigated physical access to hardware – a far cry from the ubiquity of “user error” or even “software bugs”.

The current NDA-heavy status quo for SE chips creates an analytical barrier that prevents everyday users like us from determining how big the actual attack surface is. That analytical barrier actually extends slightly up the stack from hardware, into “software bugs”. This is because without intimate knowledge of how the hardware is supposed to function, there are important classes of software bugs we can’t analyze.

Furthermore, the inability of developers to freely write code and run it directly on SEs forces more functionality up into the protocol layer, creating an even larger attack surface.

My hope is that working with Cramium will improve this situation. In the end, we won’t be able to entirely remove all analytical barriers, but hopefully we arrive at something closer to this:

Due to various NDAs, we won’t be able to release things such as the mask geometries, and there are some blocks less relevant to security such as the ADC and USB PHY that are proprietary. However, the goal is to have the critical sections responsible for the security logic, such as the cryptographic accelerators, the RISC-V CPU core, and other related blocks shared as open source RTL descriptions. This will allow us to have improved, although not perfect, visibility into a significant class of hardware bugs.

The biggest red flag in the overall scenario is that the on-chip interconnect matrix is slated to be a core generated using the ARM NIC-400 IP generator, so this logic will not be available for inspection. The reasoning behind this is, once again, risk mitigation of the tapeout. This is unfortunate, but this also means we just need to be a bit more clever about how we structure the open source blocks so that we have a toolbox to guard against potential misbehavior in the interconnect matrix.

My personal goal is to create a fully OSS-friendly FPGA model of the RISC-V core and their cryptographic accelerators using the LiteX framework, so that researchers and analysts can use this to model the behavior of the SE and create a battery of tests and fuzzers to confirm the correctness of construction of the rest of the chip.

In addition to the work advising Cramium’s engagement with the open source community, I’m also starting to look into non-destructive optical inspection techniques to verify chips in earnest, thanks to a grant I received from NLNet’s NGI0 Entrust fund. More on this later, but it’s my hope that I can find a synergy between the work I’m doing at Cramium and my silicon verification work to help narrow the remaining gaps in the trust model, despite refractory foundry and IP NDAs.

Counterpoint: The Utility of Secrecy in Security

Secrecy has utility in security. After all, every SE vendor runs with this approach, and for example, we trust the security of nuclear stockpiles to hardware that is presumably entirely closed source.

Secrecy makes a lot of sense when:

  • Even a small delay in discovering a secret can be a matter of life or death
  • Distribution and access to hardware is already strictly controlled
  • The secrets would rather be deleted than discovered

Military applications check all these boxes. The additional days, weeks or months delay incurred by an adversary analyzing around some obfuscation can be a critical tactical advantage in a hot war. Furthermore, military hardware has controlled distribution; every mission-critical box can be serialized and tracked. Although systems are designed assuming serial number 1 is delivered to the Kremlin, great efforts are still taken to ensure that is not the case (or that a decoy unit is delivered), since even a small delay or confusion can yield a tactical advantage. And finally, in many cases for military hardware, one would rather have the device self-destruct and wipe all of its secrets, rather than have its secrets extracted. Building in booby traps that wipe secrets can measurably raise the bar for any adversary contemplating a direct-readout attack.

On the other hand, SEs like those found in bank cards and phones are:

  • Widely distributed – often directly and intentionally to potentially adversarial parties
  • Protecting data at rest (value of secret is constant or may even grow with time)
  • Used as a trust root for complicated protocols that typically update over time
  • Protecting secrets where extraction is preferable to self-destruction. The legal system offers remedies for recourse and recovery of stolen assets; whereas self-destruction of the assets offers no recourse

In this case, the role of the anti-tamper countermeasures and side-channel minimization is to raise the investment necessary to recover data from “trivial” to somewhere around “there’s probably an easier and cheaper way to go about this…right?”. After all, for most complicated cryptosystems, the bigger risk is an algorithmic or protocol flaw that can be exploited without any circumvention of hardware countermeasures. If there is a protocol flaw, employing an SE to protect your data is like using a vault, but leaving the keys dangling on a hook next to the vault.

It is useful to contemplate who bears the greatest risk in the traditional SE model, where chips are typically distributed without any way to update their firmware. While an individual user may lose the contents of their bank account, a chip maker may bear a risk of many tens of millions of dollars in losses from recalls, replacement costs and legal damages if a flaw were traced to their design issue. In this game, the player with the most to lose is the chipmaker, not any individual user protected by the chip. Thus, a chipmaker has little incentive to disclose their design’s details.

A key difference between a traditional SE and Cramium’s is that Cramium’s firmware can be updated (assuming an updateable SKU is released; this was a surprisingly controversial suggestion when I brought it up). This is thanks in part to the extensive use of non-volatile ReRAM to store the firmware. This likewise shifts the calculus on what constitutes a recall event. The open source firmware model also means that the code on the device comes, per letter of the license, without warranty; the end customer is ultimately responsible for building, certifying and deploying their own applications. Thus, for a player like Cramium, the potential benefits of openness outweigh those of secrecy and obfuscation embraced by traditional SE vendors.

Summary

My role is to advise Cramium on how to shift the norms around SEs from NDAs to openness. Cramium is not attempting to forge an open-foundry model – they are producing parts using a relatively advanced (compared to your typical stand-alone SE) 22nm process. This process is protected by the highly restrictive foundry NDAs. However, Cramium plans to release much of their design under an open source license, to achieve the following goals:

  • Facilitate white-box inspection of cryptosystems implemented using their primitives
  • Speed up discovery of errors; and perhaps more importantly, improve the rate at which they are patched
  • Reduce the risk of protocol and algorithmic errors, so that hardware countermeasures could be the actual true path of least resistance
  • Build trust
  • Promote wide adoption and accelerate application development

Cramium is neither fully open hardware, nor is it fully closed. My goal is to steer it toward the more open side of the spectrum, but the reality is there are going to be elements that are too difficult to open source in the first generation of the chip.

The Cramium chip complements the eFabless/Google efforts to build open-to-the-transistors chips. Today, one can build chips that are open to the mask level using 90 – 180nm processes. Unfortunately, the level of integration achievable with their current technology isn’t quite sufficient for a single-chip Secure Element. There isn’t enough ROM or RAM available to hold the entire application stack on chip, thus requiring a multi-chip solution and negating the HSM-like benefits of custom silicon. The performance of older processes is also not sufficient for the latest cryptographic systems, such as Post Quantum algorithms or Multiparty Threshold ECDSA with Identifiable Aborts. On the upside, one could understand the design down to the transistor level using this process.

However, it’s important to remember that knowing the mask pattern does not mean you’ve solved the supply chain problem, and can trust the silicon in your hands. There are a lot of steps that silicon goes through to go from foundry to product, and at any of those steps the chip you thought you’re getting could be swapped out with a different one; this is particularly easy given the fact that all of the chips available through eFabless/Google’s process use a standardized package and pinout.

In the context of Cramium, I’m primarily concerned about the correctness of the RTL used to generate the chip, and the software that runs on it. Thus, my focus in guiding Cramium is to open sufficient portions of the design such that anyone can analyze the RTL for errors and weaknesses, and less on mitigating supply-chain level attacks.

That being said, RTL-level transparency can still benefit efforts to close the supply chain gap. A trivial example would be using the RTL to fuzz blocks with garbage in simulation; any differences in measured hardware behavior versus simulated behavior could point to extra or hidden logic pathways added to the design. Extra backdoor circuitry injected into the chip would also add loading to internal nodes, impacting timing closure. Thus, we could also do non-destructive, in-situ experiments such as overclocking functional blocks to the point where they fail; with the help of the RTL we can determine the expected critical path and compare it against the observed failure modes. Strong outliers could indicate tampering with the design. While analysis like this cannot guarantee the absence of foundry-injected backdoors, it constrains the things one could do without being detected. Thus, the availability of design source opens up new avenues for verifying correctness and trustability in a way that would be much more difficult, if not impossible, to do without design source.

Finally, by opening as much of the chip as possible to programmers and developers, I’m hoping that we can get the open source SE chip ecosystem off on the right foot. This way, as more advance nodes shift toward open PDKs, we’ll be ready and waiting to create a full-stack open source solution that adequately addresses all the security needs of our modern technology ecosystem.

Book Review: Open Circuits

Wednesday, September 21st, 2022

There’s a profound beauty in well-crafted electronics.

Somehow, the laws of physics conspired with the evolution of human consciousness such that sound engineering solutions are also aesthetically appealing: from the ideal solder fillet, to the neat geometric arrangements of components on a circuit board, to the billowing clouds of standard cells laid down by the latest IC place-and-route tools, aesthetics both inspire and emerge from the construction of practical, everyday electronics.

Eric Schlaepfer (@TubeTimeUS) and Windell Oskay (co-founder of Evil Mad Scientist)’s latest book, Open Circuits, is a celebration of the electronic aesthetic, by literally opening circuits with mechanical cross-sections, accompanied by pithy explanations and illustrations. Their masterfully executed cross-sectioning process and meticulous photography blur the line between engineering and art, reminding us that any engineering task executed with soul and care results in something that can inspire feelings of awe (“wow!”) and reflection (“huh.”): that is art.

The pages of Open Circuits contain ample inspiration for both novices and grizzled veterans alike. Having been in electronics for four decades, I sometimes worry I’m becoming numb and cynical as I watch the world’s landfills brim with cheap electronics, built without care and purchased (and disposed of) with even less thought. However, as I thumb through the pages of Open Circuits, that excitement, that awe which I felt as a youth when I traced my fingers along the outlines of the resistors and capacitors of my first computer returns to me. Schlaepfer and Oskay render even the most mundane artifacts, such as the ceramic disc capacitor, in splendid detail – and in ways I’ve never seen before. Prior to now, I had no intuition for the dimensions of an actual capacitor’s dielectric material. I also didn’t realize that every thick film resistor bears the marks of lasers that trim it to its final value. Or just seeing the cross-section of a coaxial cable, as joined through a connector – all of a sudden, the telegrapher’s equations and the time domain reflectometry graphs take on a new and very tangible meaning to me. Ah, I think, so that’s the bump in the TDR graph at the connector interface!

Also breathtaking is the sheer scope of components addressed by Schlaepfer and Oskay. Nothing is too retro, nothing is too modern, nothing is too delicate: if you’ve ever wanted to see a vacuum tube cut in half, they managed to somehow slice straight through it without shattering the thin glass envelope; likewise, if you ever wondered what your smartphone motherboard might look like, they’ve gone and sliced clear through that as well.

One of my favorite tricks of the authors is when they slice through optoelectronic devices: somehow, they manage to cut through multiple LEDs and leave them in an operable state, leading to stunning images such as a 7-segment LED still displaying the number “5” yet revealed in cross-section. I really appreciate the effort that went into mounting that part onto a beautifully fabricated and polished (perhaps varnished?) copper-clad circuit board, so that not only are you treated to the spectacle of the still-functional cross sectioned device, you have the reflection of the device rippling off of a handsomely brushed copper surface. Like I said: any engineering executed with soul and care is also art.

In a true class act, Schlaepfer and Oskay conclude the book with an “Afterward” that shares the secrets of their cross-sectioning and photography techniques. Adhering to the principle of openness, this meta-chapter breaks down the fourth wall and gives you a peek into their atelier, showing you the tools and techniques used to generate the images within the book. Such sharing of hard-earned knowledge is a hallmark of true masters; while lesser authors would withold such trade secrets, fearing others may rise to compete with them, Schlaepfer and Oskay gain an even deeper respect from their fans by disclosing the effort and craft that went into creating the book. Sharing also plants the seeds for a broader community of circuit-openers, preserving the knowledge and techniques for new generations of electronics aficionados.

Even if you’re not a “hardware person”, or even if you’re “not into tech”, the images in Open Circuits are so captivating that they may just tempt you to learn a bit more about it. Or, perhaps more importantly, a wayward young mind may be influenced to realize that hardware isn’t scary: it’s okay to peel back the covers and discover that the fruits of engineering are not merely functional, but also deeply aesthetic as well. I know that a younger version of me would have carried a copy of this book everywhere I went, poring over its pages at every chance.

While I was only able to review an early access electronic copy of their book, I am excited to get the full-color, hard-cover edition of the book. Having published a couple books with No Starch Press myself, I know the passion with which its founder, Bill Pollock, conducts his trade. He does not scrimp on materials: for The Hardware Hacker, he sprung on silver ink for the endsheets and clear UV spot inks for the cover – extra costs that came out of his bottom line, but made the hardcover edition look and feel great. So, I’m excited to see these wonderful images rendered faithfully onto the pages of a coffee-table companion book that I will be proud to showcase for years to come.

If you’re also turned on to Open Circuits, pre-order it on No Starch Press’ website, with the discount code “BUNNIESTUDIOS25”, to receive 25% off (no affiliate code or trackback in that link – 100% goes to No Starch and the authors). The code expires Tuesday, October 4. Pre-orders will also receive exclusive phone and desktop wallpaper images that are not in the book!

Hydroponics: Growing an Appreciation for Plants

Tuesday, August 9th, 2022

I once heard a saying – “Don’t feel pity on plants because they can’t move. Feel pity on us, because we have to”. I really didn’t have an appreciation for what this meant until the COVID pandemic hit, which restricted my movement for a couple of years, and I decided to spend some of my new-found time at home learning how to raise plants in my little flat in central Singapore. The result is a small hydroponics system that now lines the sunny windows of my place, yielding fresh herbs weekly that I incorporate into my dishes.

For me, hydroponics really drove home how remarkable plants are: from a bin containing nothing but water and salts, a fully-formed plant emerges. No vitamins, amino acids, or other nutrients – just add sunlight, and the plant produces everything it needs starting from a single, tiny seed. The seed encodes every gene it needs to survive and reproduce – our basil plant, for example, is tetraploid, which means it has four copies of every gene. Perhaps this somewhat explains the adaptability of plant clones – it is almost as if every branch on our basil bush has a separate character, each one trying a different angle at survival. Some branches would grow large and leafy, others small and dense, and if you propagate by a cutting, the resulting plant would inherit the character of the cutting. Thus, a lone plant should not be mistaken as lonely: it needs not a mate to create diverse offspring. Every tetraploid cell contains the genetic diversity of two diploids (whereas a human is one diploid), allowing it to adapt without need of sex or seedlings.

I also did an experiment and grew some sage from seed, and planted one set in dirt and another in hydroponics. Even though from the same seed stock, the resulting individuals bore little resemblance to each other. The dirt-grown sage looked much like the herb you’re familiar with in the grocery store – dark green, covered with fine hair, and densely arranged on a stem. The hydroponically grown sage instead grows like a vine, with long thin green stems between each leaf, the leaves themselves having a lighter color and less hair. The flavor is even a bit different; the hydroponic sage emits a slightly sulfurous odor when disturbed, and exhibits a bit more mint on the palate when eaten.

Even more fascinating is how the plants seem to “groom the water”. I’ve noticed that the most successful plants we’ve tried to grow can lower the pH of the water on their own, and regulate it within a fairly consistent band (more on this later!). Furthermore, they seem to have recruited commensurate organisms to live among their roots. The basil grows long white or translucent roots with a pale white mycorrhiza, while the sage has a brownish symbiont and a short, bushy root ball. Thus I only fully replace the water of the hydroponic system as a last resort if a plant seems diseased; normally I will cycle the water by removing about a half of the reservoir and topping it up, so as not to displace the favored microbes from the ecosystem.

The Setup

The initial inspiration to try hydroponics actually came a bit by chance. We bought some locally grown hydroponic lettuce, and noticed that they were packaged as whole plants, complete with roots. We were curious – could we pluck most of the leaves of this lettuce, and then stick the plants in water, and grow another serving of hydroponic lettuce?

Surprisingly enough, it worked! Even with a crude setup consisting of a handful of generic plant fertilizer and a small aquarium bubbler, we were able to take a single plant and grow a couple more servings of lettuce from it. Unfortunately, with time, the plants started to grow very “stemmy” and pale, and eventually they succumbed to tiny mites that infested their leaves.

Inspired by this initial success, I started to read up a bit on how others did hydroponics. One of the top hits is a blog by Kyle Gabriel, detailing how he built an extremely sophisticated system based around a Raspberry Pi, and a multitude of sensors, valves and pumps. It was sort of a nerd’s dream of how farming could be fully automated. I figured I’m pretty handy with a soldering iron, so maybe I could give a go at building a system like his. So, I dug up a spare Raspberry Pi, some solid state relays and white LEDs left over from when I did the house lighting, and put together a simple system that just automated the lighting and took hourly photos of the plants as they grew. The time-lapses were fascinating!

You can’t really watch a plant grow in real time, but, over a period of days one can easily see patterns in how plants grow and adapt.

With this small success, I put my mind toward further automation – adding various pumps and regulators for the system. However, as I started to put together the BOM for this, I realized very quickly that there was going to be no return on investment for building out a system this complicated. Plus, I really didn’t like that the whole system ran on code – I did not relish the idea of coming home to a room flooded with water or a set of rotten plants because my control program hit a segfault.

So, I sat back and thought about things a bit. First, one observation I had was despite providing the plants with a 10,000 lux light source 12 hours a day, they still had a tendency to grow toward the nearby window. As an experiment, I took one bin and removed it from the regulated light source, and just stuck it up against the window. The plant grew much better with natural sunlight, so, I removed all the artificial lighting, unplugged the Raspberry Pi, and just stuck all the plants against the windowsill (it definitely helps that I live one degree off the equator – it’s eternally summer here, with sunrise at 7AM and sunset at 7PM, 365 days a year). I was happy to save the electricity while getting bigger plants in the process.

For water level automation, I replaced the computer with two float switches in series. One switch cuts off the pump if the water level gets too low in the feed reservoir; the other cuts off the pump if the water level gets too high in the plant’s growth bin. You can use the same type of switch for both purposes; by just mounting the switch upsidedown you can invert the function of the switch.

The current “automated” system, consisting of a reservoir on the left, with a peristaltic pump on top of the reservoir bin, and two float switches. The silicone tube that takes the solution from the reservoir to the plant bin is covered by an old sock to prevent algae from growing in the solution when it’s not moving. There is also an aeration pump, not visible in this photo.

The float switch mounted on the top, functioning as a “break-when-full” switch. You can see the plant’s roots have taken over the entire bin! A couple of spacers were also added to adjust the height of the water.

The float switch mounted on the bottom of a tank, functioning as a “break when empty” switch. In order to provide clearance for the switch on the bottom, a couple of wine corks were hot-glued to the bottom of the bin. The switch comes with a rubber o-ring, creating an effective seal and no leakage.

So, with a couple of storage bins from Daiso, two float switches and a peristaltic pump, I’ve constructed a system that automates the care of our plants for up to two weeks at a time for under $40. No transistors required – just old-school technology dating from the 1800’s!

There is one other small detail necessary for hydroponics – an aeration pump. Any aquarium pump will do – although we eventually upgraded to some fancy silent pumps instead of the cheaper but noisier diaphragm based ones. Some blogs say that the “roots need oxygen” to survive, but my suspicion is actually that the pumps mostly serve to circulate the nutrient solution. If you leave the pump off, the roots will rapidly deplete the water around them of nutrients, and without any circulation you’re relying purely on a slow process of diffusion for nutrients to reach the roots. I’ve noticed that on bins with a low air flow, the roots will grow thick and matted, but bins with a faster air flow, the roots barely need to grow at all – my hypothesis is this reflects the plant allocating less resources toward root growth in bins with greater circulation, because fresh solution is always available with faster circulation.

The Tricky Bit

The electronics were actually the easiest part of the whole enterprise; the hardest part was figuring out what, exactly, I had to add to the water to get the plants to flourish. Once I got this right, the plants basically take care of themselves; of course it helps to pick plant varietals that are pest-resistant, and have the innate ability to regulate the pH level around their roots.

When I started, I was naively aware that plants needed nitrogen-bearing fertilizers. Reading the label on packaged fertilizer solutions, they use an “NPK” system, which stands for nitrogen-phosphorous-potassium. OK, sure, so plants need a bit more than just nitrogen. Surely I could just pick up some of this NPK stuff, dissolve it in some water, and we’re good to go…

…but how much of this should I add? This deceptively simple question lead me down a several-month rat-hole that took many failed experiments and daily journals of observations to find an answer. The core problem is that most plant bloggers like to use units like “one handful” as a unit of measure; the more precise ones would write something to the effect of “one capful per gallon”. As an engineer, units of handfuls and capfuls are extremely dissatisfying: how many grams per liter, dammit!

This lead me to research several academic papers about plant nutrition, which lead to reading graphs about plant growth under “controlled” conditions that lead to astonishingly contradictory results to what the plant bloggers would write: the NPK ratios implied by some of the academic works were wildly different from what the plant bloggers relayed in their actual experience.

It turns out the truth is somewhere in between. A big confounding factor is probably the nature of the soil used in the research, versus the base quality of the water used in your hydroponic system. Most of the research I uncovered was written about fertilizing plants grown in soil, and for example “loamy diatomaceous earth” turns out to be quite a complicated mix of nutrients in and of itself.

The most informative bit of research that I uncovered was experiments done where they would ash a plant after it was grown and measure out all the base elements from the resulting dry weight of the plant. It was here that I learned that, for example, molybdenum is absolutely essential to the growth of plants. It’s almost never mentioned in soil cultures, because dirt almost always has sufficient trace quantities of molybdenum to sustain plants, but water cultures quickly become molybdenum-deficient, and the plants will become pale and sickly without a supplement.

I also learned that plants need calcium and magnesium in astonishingly large quantities; as much as they need phosphorous and potassium. Again, these two nutrients are less discussed in soil-based literature because many rocks are basically made of calcium and magnesium, and as such plants have no trouble extracting what they need from the soil.

Finally, there is the issue of iron. Iron turns out to be the hardest nutrient to balance in a hydroponic system. Despite being extremely plentiful on Earth, and indeed, possibly being the penultimate composition of the entire universe, it is extremely scarce as a free atom in the biosphere. This is in part because it gets strongly bound to other molecules. For example, oxygen binds to myoglobin with a log K1 of 6.18, which means that it is a million times more likely to find oxygen bound to myoglobin than unbound in solution. This may sound strong, but EDTA, a chelating agent, has a log K1 of something like 27.7, so it is one octillion (1,000,000,000,000,000,000,000,000,000) times more likely to exist as bound to iron than unbound in equilibrium. In a way, iron is so biologically important that organic life had an arms race to bind free iron, and some ridiculously potent molecules exist to rapidly sweep the tiniest amount of iron out of solution. Fortunately, as long as I (or more conveniently, the plant itself) can keep the pH of the water below 5.5, I can take advantage of the extremely strong binding of EDTA to iron to keep it dissolved in solution and out of reach of other organisms trying to scavenge it out of the water. The plants can somehow take in the bound iron-EDTA complex, degrade the EDTA and extract the iron for its use (this took a long time and many trials with various iron binding agents to figure out how to remedy the chlorosis that would eventually take over every plant I grew).

Alright, now that I have a vague understanding of the atoms that a plant needs to survive, the question is how do I get them to the plant – and in what ratios? The answer to this is equally as vague and frustrating. You can’t simply throw a chunk of magnesium metal into a bin of water and expect a plant to access it. The magnesium needs to be turned into a salt so that it can readily dissolve into the water. One of the easiest versions of this to buy is magnesium sulfate, MgSO4, also known as epsom salts. So, I can just read the blogs and find the ones that tell you how many grams of magnesium sulfate to add per liter of water and be done with it, right?

Wrong again! It turns out that MgSO4 has several “hydration states” (11 total). Even though it looks like a hard, translucent crystal, Epsom salt is actually more water than magnesium by weight, as 7 molecules of water are bound to every molecule of magnesium sulfate in that preparation.

Of course, no plant blogger ever specifies the hydration states of the salts that they use in their preparations; and many on-line listings for agricultural-grade salts also fail to list the exact hydration state of their salts. Unfortunately this means there can be extremely large deviations in actual nutrient availability if you purchase a dissimilar hydration state from that used by the plant blogger.

That left me with purchasing a set of salts and trying to calculate, from first principles, the ratios that I needed to add to my hydroponics bins. The salts I finally decided on purchasing are:

  • Monopotassium phosphate (anhydrous) K2PO4
  • Potassium sulfate (anhydrous) K2SO4
  • Calcium nitrate Ca(NO3)2•4H2O – hygroscopic
  • Magnesium sulfate MgSO4•7H2O

Plus a pre-mixed micronutrient from a local hydroponics shop that contains the remaining essential elements in the following ratios:

  • Iron as EDTA chelate 21.25 mg/mL
  • Manganese 5.684 mg/mL
  • Boron 0.483 mg/mL
  • Zinc 0.617 mg/mL
  • Copper 0.267 mg/mL
  • Molybdenum 0.471 mg/mL

For the salts, I computed a matrix that allows me to solve for the amount of nutrient I want in solution, by taking the mass fraction of each nutrient available, writing it in matrix form, and then inverting it (had to crack open my linear algebra book from high school to remember what determinants were! Who knew that determinants could be useful for farming…).

You can make the matrix yourself by expressing the ratio of the milligrams of nutrient (as derived by the atomic weight of the nutrient) per milligram of compound (as derived by summing up the weight of all the atoms in the molecular formula, including the hydration state), and putting it into a matrix form like this:

And then taking the coefficients into an inverse matrix calculator and deriving a final format that allows you to plug in your desired NPK ratio and compute the mass of the salts you need to dissolve in water to achieve that:

As a sanity check, I plug the calculated weights back into the forward matrix to make sure I didn’t mess up the math, and I also add up all the dissolved solids to a TDS (total dissolved salts) number, so I can cross-check the resulting solution easily using a cheap TDS meter (link without referral code). In case you want to start from a template, you can download the spreadsheet. The template contains the pre-computed ratio that I currently use for growing all my herbs with compounds that I can source easily from the local market, and it seems to work fairly well for plants ranging from brazilian spinach to basil to sage.

As a side note, calcium nitrate is pretty tricky to handle. It’s very hygroscopic, so if left in ambient humidity, it will absorb water from the atmosphere and “melt away” into a concentrated, syrupy liquid. I usually add a few percent extra by weight over the formula to compensate for the excess water it accumulates over time. Also, I store the substance in an air-tight bag, and I always wear nitrile gloves while handling the compound to avoid damaging my hands.

For the micronutrients, it’s a bit trickier to dose correctly. Fortunately, I have a micropipette set that can measure out solutions in the range from 1uL to 200uL, from back when I did some genetic engineering in my kitchen (pipettes are also surprisingly cheap (without referral code) now). Again, the blogs are not terribly helpful about dosing – you get advice along the lines of “one drop per bucket” or something like that. What’s a drop? What’s a bucket? The exact volume of a drop depends on the surface tension and viscosity of the liquid, but I went with the rule of thumb that one drop is 50uL (20 drops per mL) as a starting point.

Initially, I tried 60uL of micronutrients per 1.5L of solution, but the plants started to show evidence of boron poisoning (this is a great guide for diagnosing plant nutritional problems based on the appearance of the leaves), so after a few iterations and replacements of the water to flush out the excess accumulated micronutrients, I settled on 30uL of micronutrients per 1.5L of solution, with a 15uL per week bump for iron-hungry species like spinach.

At a microliters-per-week consumption rate, even the smallest bottle of 150mL micronutrient solution will last years, but the tricky part is storing it. In order to avoid contaminating the bottle, I aliquot the solution every couple months into a set of 1.5mL eppendorfs which I keep in my wine fridge, alongside the original bottle. Even though I try my best to avoid contaminating the eppendorfs, after a couple weeks a pellet forms at the bottom from some process that is causing the micronutrients to come out of solution, so I typically end up discarding the aliquot before it is entirely used up.

The Final Result

It’s pretty neat to go from a pile of salts to delicious herbs. About a gram of salts go in, and a week later a couple dozen grams of leaves come out!


In go salts…


Out comes basil!

Basil in particular has been a real champ at growing in our hydroponics bins – we are at the point now where between two plants, we’re regularly giving it away to friends because it yields more than we can eat, even though I cook Italian food almost every other night. A handful of basil, a bit of salt, olive oil, tomatoes and garlic and we have a flavorful bruschetta to kick off a meal! Our other favorite is sage, it’s great for flavoring pork and poultry, but it’s very hard to find for some reason in Singapore. So, having a bit of fresh sage around is convenient and saves us a bit of money since it can be quite expensive to buy in specialty stores.

It’s been less practical to grow bulk vegetables, such as spinach. Brazilian spinach has been fairly successful in terms of growth, but it takes about a month for a cutting to grow to maturity, and we need about four plants to make a salad, so we’d need several racks of bins to make a dent in our vegetable consumption. Also, in general our herbs have had less pests than leafy green vegetables; maybe their strong flavor comes from compounds they produce that also serves to repel bugs? So, in addition to being great flavor for our sauces, the herbs have required no pesticides.

Overall, it was satisfying to learn about plant biology while developing a better connection to my food through technology. It was also a calming way to pass time during the pandemic; agriculture requires patience and time, but the reward is visceral. Having kept a miniature farmer’s almanac to decode missing pieces of information from the blogosphere, I have an new appreciation for how such personal journals could lead to scientific discoveries. And, I’m a much better chef than I was a couple years ago. Somehow, just having the fresh herbs around inspired explorations into new and exciting pairings; it gave me a whole new way to think about food.

Rust: A Critical Retrospective

Thursday, May 19th, 2022

Since I was unable to travel for a couple of years during the pandemic, I decided to take my new-found time and really lean into Rust. After writing over 100k lines of Rust code, I think I am starting to get a feel for the language and like every cranky engineer I have developed opinions and because this is the Internet I’m going to share them.

The reason I learned Rust was to flesh out parts of the Xous OS written by Xobs. Xous is a microkernel message-passing OS written in pure Rust. Its closest relative is probably QNX. Xous is written for lightweight (IoT/embedded scale) security-first platforms like Precursor that support an MMU for hardware-enforced, page-level memory protection.

In the past year, we’ve managed to add a lot of features to the OS: networking (TCP/UDP/DNS), middleware graphics abstractions for modals and multi-lingual text, storage (in the form of an encrypted, plausibly deniable database called the PDDB), trusted boot, and a key management library with self-provisioning and sealing properties.

One of the reasons why we decided to write our own OS instead of using an existing implementation such as SeL4, Tock, QNX, or Linux, was we wanted to really understand what every line of code was doing in our device. For Linux in particular, its source code base is so huge and so dynamic that even though it is open source, you can’t possibly audit every line in the kernel. Code changes are happening at a pace faster than any individual can audit. Thus, in addition to being home-grown, Xous is also very narrowly scoped to support just our platform, to keep as much unnecessary complexity out of the kernel as possible.

Being narrowly scoped means we could also take full advantage of having our CPU run in an FPGA. Thus, Xous targets an unusual RV32-IMAC configuration: one with an MMU + AES extensions. It’s 2022 after all, and transistors are cheap: why don’t all our microcontrollers feature page-level memory protection like their desktop counterparts? Being an FPGA also means we have the ability to fix API bugs at the hardware level, leaving the kernel more streamlined and simplified. This was especially relevant in working through abstraction-busting processes like suspend and resume from RAM. But that’s all for another post: this one is about Rust itself, and how it served as a systems programming language for Xous.

Rust: What Was Sold To Me

Back when we started Xous, we had a look at a broad number of systems programming languages and Rust stood out. Even though its `no-std` support was then-nascent, it was a strongly-typed, memory-safe language with good tooling and a burgeoning ecosystem. I’m personally a huge fan of strongly typed languages, and memory safety is good not just for systems programming, it enables optimizers to do a better job of generating code, plus it makes concurrency less scary. I actually wished for Precursor to have a CPU that had hardware support for tagged pointers and memory capabilities, similar to what was done on CHERI, but after some discussions with the team doing CHERI it was apparent they were very focused on making C better and didn’t have the bandwidth to support Rust (although that may be changing). In the grand scheme of things, C needed CHERI much more than Rust needed CHERI, so that’s a fair prioritization of resources. However, I’m a fan of belt-and-suspenders for security, so I’m still hopeful that someday hardware-enforced fat pointers will make their way into Rust.

That being said, I wasn’t going to go back to the C camp simply to kick the tires on a hardware retrofit that backfills just one poor aspect of C. The glossy brochure for Rust also advertised its ability to prevent bugs before they happened through its strict “borrow checker”. Furthermore, its release philosophy is supposed to avoid what I call “the problem with Python”: your code stops working if you don’t actively keep up with the latest version of the language. Also unlike Python, Rust is not inherently unhygienic, in that the advertised way to install packages is not also the wrong way to install packages. Contrast to Python, where the official docs on packages lead you to add them to system environment, only to be scolded by Python elders with a “but of course you should be using a venv/virtualenv/conda/pipenv/…, everyone knows that”. My experience with Python would have been so much better if this detail was not relegated to Chapter 12 of 16 in the official tutorial. Rust is also supposed to be better than e.g. Node at avoiding the “oops I deleted the Internet” problem when someone unpublishes a popular package, at least if you use fully specified semantic versions for your packages.

In the long term, the philosophy behind Xous is that eventually it should “get good enough”, at which point we should stop futzing with it. I believe it is the mission of engineers to eventually engineer themselves out of a job: systems should get stable and solid enough that it “just works”, with no caveats. Any additional engineering beyond that point only adds bugs or bloat. Rust’s philosophy of “stable is forever” and promising to never break backward-compatibility is very well-aligned from the point of view of getting Xous so polished that I’m no longer needed as an engineer, thus enabling me to spend more of my time and focus supporting users and their applications.

The Rough Edges of Rust

There’s already a plethora of love letters to Rust on the Internet, so I’m going to start by enumerating some of the shortcomings I’ve encountered.

“Line Noise” Syntax

This is a superficial complaint, but I found Rust syntax to be dense, heavy, and difficult to read, like trying to read the output of a UART with line noise:

Trying::to_read::<&'a heavy>(syntax, |like| { this. can_be( maddening ) }).map(|_| ())?;

In more plain terms, the line above does something like invoke a method called “to_read” on the object (actually `struct`) “Trying” with a type annotation of “&heavy” and a lifetime of ‘a with the parameters of “syntax” and a closure taking a generic argument of “like” calling the can_be() method on another instance of a structure named “this” with the parameter “maddening” with any non-error return values mapped to the Rust unit type “()” and errors unwrapped and kicked back up to the caller’s scope.

Deep breath. Surely, I got some of this wrong, but you get the idea of how dense this syntax can be.

And then on top of that you can layer macros and directives which don’t have to follow other Rust syntax rules. For example, if you want to have conditionally compiled code, you use a directive like

#[cfg(all(not(baremetal), any(feature = “hazmat”, feature = “debug_print”)))]

Which says if either the feature “hazmat” or “debug_print” is enabled and you’re not running on bare metal, use the block of code below (and I surely got this wrong too). The most confusing part of about this syntax to me is the use of a single “=” to denote equivalence and not assignment, because, stuff in config directives aren’t Rust code. It’s like a whole separate meta-language with a dictionary of key/value pairs that you query.

I’m not even going to get into the unreadability of Rust macros – even after having written a few Rust macros myself, I have to admit that I feel like they “just barely work” and probably thar be dragons somewhere in them. This isn’t how you’re supposed to feel in a language that bills itself to be reliable. Yes, it is my fault for not being smart enough to parse the language’s syntax, but also, I do have other things to do with my life, like build hardware.

Anyways, this is a superficial complaint. As time passed I eventually got over the learning curve and became more comfortable with it, but it was a hard, steep curve to climb. This is in part because all the Rust documentation is either written in eli5 style (good luck figuring out “feature”s from that example), or you’re greeted with a formal syntax definition (technically, everything you need to know to define a “feature” is in there, but nowhere is it summarized in plain English), and nothing in between.

To be clear, I have a lot of sympathy for how hard it is to write good documentation, so this is not a dig at the people who worked so hard to write so much excellent documentation on the language. I genuinely appreciate the general quality and fecundity of the documentation ecosystem.

Rust just has a steep learning curve in terms of syntax (at least for me).

Rust Is Powerful, but It Is Not Simple

Rust is powerful. I appreciate that it has a standard library which features HashMaps, Vecs, and Threads. These data structures are delicious and addictive. Once we got `std` support in Xous, there was no going back. Coming from a background of C and assembly, Rust’s standard library feels rich and usable — I have read some criticisms that it lacks features, but for my purposes it really hits a sweet spot.

That being said, my addiction to the Rust `std` library has not done any favors in terms of building an auditable code base. One of the criticisms I used to leverage at Linux is like “holy cow, the kernel source includes things like an implementation for red black trees, how is anyone going to audit that”.

Now, having written an OS, I have a deep appreciation for how essential these rich, dynamic data structures are. However, the fact that Xous doesn’t include an implementation of HashMap within its repository doesn’t mean that we are any simpler than Linux: indeed, we have just swept a huge pile of code under the rug; just the `collection`s portion of the standard library represents about 10k+ SLOC at a very high complexity.

So, while Rust’s `std` library allows the Xous code base to focus on being a kernel and not also be its own standard library, from the standpoint of building a minimum attack-surface, “fully-auditable by one human” codebase, I think our reliance on Rust’s `std` library means we fail on that objective, especially so long as we continue to track the latest release of Rust (and I’ll get into why we have to in the next section).

Ideally, at some point, things “settle down” enough that we can stick a fork in it and call it done by well, forking the Rust repo, and saying “this is our attack surface, and we’re not going to change it”. Even then, the Rust `std` repo dwarfs the Xous repo by several multiples in size, and that’s not counting the complexity of the compiler itself.

Rust Isn’t Finished

This next point dovetails into why Rust is not yet suitable for a fully auditable kernel: the language isn’t finished. For example, while we were coding Xous, a thing called `const generic` was introduced. Before this, Rust had no native ability to deal with arrays bigger than 32 elements! This limitation is a bit maddening, and even today there are shortcomings such as the `Default` trait being unable to initialize arrays larger than 32 elements. This friction led us to put limits on many things at 32 elements: for example, when we pass the results of an SSID scan between processes, the structure only reserves space for up to 32 results, because the friction of going to a larger, more generic structure just isn’t worth it. That’s a language-level limitation directly driving a user-facing feature.

Also over the course of writing Xous, things like in-line assembly and workspaces finally reached maturity, which means we need to go back a revisit some unholy things we did to make those critical few lines of initial boot code, written in assembly, integrated into our build system.

I often ask myself “when is the point we’ll get off the Rust release train”, and the answer I think is when they finally make “alloc” no longer a nightly API. At the moment, `no-std` targets have no access to the heap, unless they hop on the “nightly” train, in which case you’re back into the Python-esque nightmare of your code routinely breaking with language releases.

We definitely gave writing an OS in `no-std` + stable a fair shake. The first year of Xous development was all done using `no-std`, at a cost in memory space and complexity. It’s possible to write an OS with nothing but pre-allocated, statically sized data structures, but we had to accommodate the worst-case number of elements in all situations, leading to bloat. Plus, we had to roll a lot of our own core data structures.

About a year ago, that all changed when Xobs ported Rust’s `std` library to Xous. This means we are able to access the heap in stable Rust, but it comes at a price: now Xous is tied to a particular version of Rust, because each version of Rust has its own unique version of `std` packaged with it. This version tie is for a good reason: `std` is where the sausage gets made of turning fundamentally `unsafe` hardware constructions such as memory allocation and thread creation into “safe” Rust structures. (Also fun fact I recently learned: Rust doesn’t have a native allocater for most targets – it simply punts to the native libc `malloc()` and `free()` functions!) In other words, Rust is able to make a strong guarantee about the stable release train not breaking old features in part because of all the loose ends swept into `std`.

I have to keep reminding myself that having `std` doesn’t eliminate the risk of severe security bugs in critical code – it merely shuffles a lot of critical code out of sight, into a standard library. Yes, it is maintained by a talented group of dedicated programmers who are smarter than me, but in the end, we are all only human, and we are all fair targets for software supply chain exploits.

Rust has a clockwork release schedule – every six weeks, it pushes a new version. And because our fork of `std` is tied to a particular version of Rust, it means every six weeks, Xobs has the thankless task of updating our fork and building a new `std` release for it (we’re not a first-class platform in Rust, which means we have to maintain our own `std` library). This means we likewise force all Xous developers to run `rustup update` on their toolchains so we can retain compatibility with the language.

This probably isn’t sustainable. Eventually, we need to lock down the code base, but I don’t have a clear exit strategy for this. Maybe the next point at which we can consider going back to `nostd` is when we can get the stable `alloc` feature, which allows us to have access to the heap again. We could then decouple Xous from the Rust release train, but we’d still need to backfill features such as Vec, HashMap, Thread, and Arc/Mutex/Rc/RefCell/Box constructs that enable Xous to be efficiently coded.

Unfortunately, the `alloc` crate is very hard, and has been in development for many years now. That being said, I really appreciate the transparency of Rust behind the development of this feature, and the hard work and thoughtfulness that is being put into stabilizing this feature.

Rust Has A Limited View of Supply Chain Security

I think this position is summarized well by the installation method recommended by the rustup.rs installation page:

`curl --proto '=https' --tlsv1.2 -sSf https://sh.rustup.rs | sh`

“Hi, run this shell script from a random server on your machine.”

To be fair, you can download the script and inspect it before you run it, which is much better than e.g. the Windows .MSI installers for vscode. However, this practice pervades the entire build ecosystem: a stub of code called `build.rs` is potentially compiled and executed whenever you pull in a new crate from crates.io. This, along with “loose” version pinning (you can specify a version to be, for example, simply “2” which means you’ll grab whatever the latest version published is with a major rev of 2), makes me uneasy about the possibility of software supply chain attacks launched through the crates.io ecosystem.

Crates.io is also subject to a kind of typo-squatting, where it’s hard to determine which crates are “good” or “bad”; some crates that are named exactly what you want turn out to just be old or abandoned early attempts at giving you the functionality you wanted, and the more popular, actively-maintained crates have to take on less intuitive names, sometimes differing by just a character or two from others (to be fair, this is not a problem unique to Rust’s package management system).

There’s also the fact that dependencies are chained – when you pull in one thing from crates.io, you also pull in all of that crate’s subordinate dependencies, along with all their build.rs scripts that will eventually get run on your machine. Thus, it is not sufficient to simply audit the crates explicitly specified within your Cargo.toml file — you must also audit all of the dependent crates for potential supply chain attacks as well.

Fortunately, Rust does allow you to pin a crate at a particular version using the `Cargo.lock` file, and you can fully specify a dependent crate down to the minor revision. We try to mitigate this in Xous by having a policy of publishing our Cargo.lock file and specifying all of our first-order dependent crates to the minor revision. We have also vendored in or forked certain crates that would otherwise grow our dependency tree without much benefit.

That being said, much of our debug and test framework relies on some rather fancy and complicated crates that pull in a huge number of dependencies, and much to my chagrin even when I try to run a build just for our target hardware, the dependent crates for running simulations on the host computer are still pulled in and the build.rs scripts are at least built, if not run.

In response to this, I wrote a small tool called `crate-scraper` which downloads the source package for every source specified in our Cargo.toml file, and stores them locally so we can have a snapshot of the code used to build a Xous release. It also runs a quick “analysis” in that it searches for files called build.rs and collates them into a single file so I can more quickly grep through to look for obvious problems. Of course, manual review isn’t a practical way to detect cleverly disguised malware embedded within the build.rs files, but it at least gives me a sense of the scale of the attack surface we’re dealing with — and it is breathtaking, about 5700 lines of code from various third parties that manipulates files, directories, and environment variables, and runs other programs on my machine every time I do a build.

I’m not sure if there is even a good solution to this problem, but, if you are super-paranoid and your goal is to be able to build trustable firmware, be wary of Rust’s expansive software supply chain attack surface!

You Can’t Reproduce Someone Else’s Rust Build

A final nit I have about Rust is that builds are not reproducible between different computers (they are at least reproducible between builds on the same machine if we disable the embedded timestamp that I put into Xous for $reasons).

I think this is primarily because Rust pulls in the full path to the source code as part of the panic and debug strings that are built into the binary. This has lead to uncomfortable situations where we have had builds that worked on Windows, but failed under Linux, because our path names are very different lengths on the two and it would cause some memory objects to be shifted around in target memory. To be fair, those failures were all due to bugs we had in Xous, which have since been fixed. But, it just doesn’t feel good to know that we’re eventually going to have users who report bugs to us that we can’t reproduce because they have a different path on their build system compared to ours. It’s also a problem for users who want to audit our releases by building their own version and comparing the hashes against ours.

There’s some bugs open with the Rust maintainers to address reproducible builds, but with the number of issues they have to deal with in the language, I am not optimistic that this problem will be resolved anytime soon. Assuming the only driver of the unreproducibility is the inclusion of OS paths in the binary, one fix to this would be to re-configure our build system to run in some sort of a chroot environment or a virtual machine that fixes the paths in a way that almost anyone else could reproduce. I say “almost anyone else” because this fix would be OS-dependent, so we’d be able to get reproducible builds under, for example, Linux, but it would not help Windows users where chroot environments are not a thing.

Where Rust Exceeded Expectations

Despite all the gripes laid out here, I think if I had to do it all over again, Rust would still be a very strong contender for the language I’d use for Xous. I’ve done major projects in C, Python, and Java, and all of them eventually suffer from “creeping technical debt” (there’s probably a software engineer term for this, I just don’t know it). The problem often starts with some data structure that I couldn’t quite get right on the first pass, because I didn’t yet know how the system would come together; so in order to figure out how the system comes together, I’d cobble together some code using a half-baked data structure.

Thus begins the descent into chaos: once I get an idea of how things work, I go back and revise the data structure, but now something breaks elsewhere that was unsuspected and subtle. Maybe it’s an off-by-one problem, or the polarity of a sign seems reversed. Maybe it’s a slight race condition that’s hard to tease out. Nevermind, I can patch over this by changing a <= to a <, or fixing the sign, or adding a lock: I’m still fleshing out the system and getting an idea of the entire structure. Eventually, these little hacks tend to metastasize into a cancer that reaches into every dependent module because the whole reason things even worked was because of the “cheat”; when I go back to excise the hack, I eventually conclude it’s not worth the effort and so the next best option is to burn the whole thing down and rewrite it…but unfortunately, we’re already behind schedule and over budget so the re-write never happens, and the hack lives on.

Rust is a difficult language for authoring code because it makes these “cheats” hard – as long as you have the discipline of not using “unsafe” constructions to make cheats easy. However, really hard does not mean impossible – there were definitely some cheats that got swept under the rug during the construction of Xous.

This is where Rust really exceeded expectations for me. The language’s structure and tooling was very good at hunting down these cheats and refactoring the code base, thus curing the cancer without killing the patient, so to speak. This is the point at which Rust’s very strict typing and borrow checker converts from a productivity liability into a productivity asset.

I liken it to replacing a cable in a complicated bundle of cables that runs across a building. In Rust, it’s guaranteed that every strand of wire in a cable chase, no matter how complicated and awful the bundle becomes, is separable and clearly labeled on both ends. Thus, you can always “pull on one end” and see where the other ends are by changing the type of an element in a structure, or the return type of a method. In less strictly typed languages, you don’t get this property; the cables are allowed to merge and affect each other somewhere inside the cable chase, so you’re left “buzzing out” each cable with manual tests after making a change. Even then, you’re never quite sure if the thing you replaced is going to lead to the coffee maker switching off when someone turns on the bathroom lights.

Here’s a direct example of Rust’s refactoring abilities in action in the context of Xous. I had a problem in the way trust levels are handled inside our graphics subsystem, which I call the GAM (Graphical Abstraction Manager). Each Canvas in the system gets a `u8` assigned to it that is a trust level. When I started writing the GAM, I just knew that I wanted some notion of trustability of a Canvas, so I added the variable, but wasn’t quite sure exactly how it would be used. Months later, the system grew the notion of Contexts with Layouts, which are multi-Canvas constructions that define a particular type of interaction. Now, you can have multiple trust levels associated with a single Context, but I had forgotten about the trust variable I had previously put in the Canvas structure – and added another trust level number to the Context structure as well. You can see where this is going: everything kind of worked as long as I had simple test cases, but as we started to get modals popping up over applications and then menus on top of modals and so forth, crazy behavior started manifesting, because I had confused myself over where the trust values were being stored. Sometimes I was updating the value in the Context, sometimes I was updating the one in the Canvas. It would manifest itself sometimes as an off-by-one bug, other times as a concurrency error.

This was always a skeleton in the closet that bothered me while the GAM grew into a 5k-line monstrosity of code with many moving parts. Finally, I decided something had to be done about it, and I was really not looking forward to it. I was assuming that I messed up something terribly, and this investigation was going to conclude with a rewrite of the whole module.

Fortunately, Rust left me a tiny string to pull on. Clippy, the cheerfully named “linter” built into Rust, was throwing a warning that the trust level variable was not being used at a point where I thought it should be – I was storing it in the Context after it was created, but nobody every referred to it after then. That’s strange – it should be necessary for every redraw of the Context! So, I started by removing the variable, and seeing what broke. This rapidly led me to recall that I was also storing the trust level inside the Canvases within the Context when they were being created, which is why I had this dangling reference. Once I had that clue, I was able to refactor the trust computations to refer only to that one source of ground truth. This also led me to discover other bugs that had been lurking because in fact I was never exercising some code paths that I thought I was using on a routine basis. After just a couple hours of poking around, I had a clear-headed view of how this was all working, and I had refactored the trust computation system with tidy APIs that were simple and easier to understand, without having to toss the entire code base.

This is just one of many positive experiences I’ve had with Rust in maintaining the Xous code base. It’s one of the first times I’ve walked into a big release with my head up and a positive attitude, because for the first time ever, I feel like maybe I have a chance of being able deal with hard bugs in an honest fashion. I’m spending less time making excuses in my head to justify why things were done this way and why we can’t take that pull request, and more time thinking about all the ways things can get better, because I know Clippy has my back.

Caveat Coder

Anyways, that’s a lot of ranting about software for a hardware guy. Software people are quick to remind me that first and foremost, I make circuits and aluminum cases, not code, therefore I have no place ranting about software. They’re right – I actually have no “formal” training to write code “the right way”. When I was in college, I learned Maxwell’s equations, not algorithms. I could never be a professional programmer, because I couldn’t pass even the simplest coding interview. Don’t ask me to write a linked list: I already know that I don’t know how to do it correctly; you don’t need to prove that to me. This is because whenever I find myself writing a linked list (or any other foundational data structure for that matter), I immediately stop myself and question all the life choices that brought me to that point: isn’t this what libraries are for? Do I really need to be re-inventing the wheel? If there is any correlation between doing well in a coding interview and actual coding ability, then you should definitely take my opinions with the grain of salt.

Still, after spending a couple years in the foxhole with Rust and reading countless glowing articles about the language, I felt like maybe a post that shared some critical perspectives about the language would be a refreshing change of pace.