
Hydroponics: Growing an Appreciation for Plants

Tuesday, August 9th, 2022

I once heard a saying – “Don’t feel pity on plants because they can’t move. Feel pity on us, because we have to”. I really didn’t have an appreciation for what this meant until the COVID pandemic hit, which restricted my movement for a couple of years, and I decided to spend some of my new-found time at home learning how to raise plants in my little flat in central Singapore. The result is a small hydroponics system that now lines the sunny windows of my place, yielding fresh herbs weekly that I incorporate into my dishes.

For me, hydroponics really drove home how remarkable plants are: from a bin containing nothing but water and salts, a fully-formed plant emerges. No vitamins, amino acids, or other nutrients – just add sunlight, and the plant produces everything it needs starting from a single, tiny seed. The seed encodes every gene it needs to survive and reproduce – our basil plant, for example, is tetraploid, which means it has four copies of every gene. Perhaps this somewhat explains the adaptability of plant clones – it is almost as if every branch on our basil bush has a separate character, each one trying a different angle at survival. Some branches would grow large and leafy, others small and dense, and if you propagate by a cutting, the resulting plant would inherit the character of the cutting. Thus, a lone plant should not be mistaken for a lonely one: it needs no mate to create diverse offspring. Every tetraploid cell contains the genetic diversity of two diploids (whereas a human is one diploid), allowing it to adapt without need of sex or seedlings.

I also did an experiment where I grew some sage from seed, planting one set in dirt and another in hydroponics. Even though they came from the same seed stock, the resulting individuals bore little resemblance to each other. The dirt-grown sage looked much like the herb you’re familiar with from the grocery store – dark green, covered with fine hair, and densely arranged on a stem. The hydroponically grown sage instead grows like a vine, with long thin green stems between each leaf, the leaves themselves having a lighter color and less hair. The flavor is even a bit different; the hydroponic sage emits a slightly sulfurous odor when disturbed, and exhibits a bit more mint on the palate when eaten.

Even more fascinating is how the plants seem to “groom the water”. I’ve noticed that the most successful plants we’ve tried to grow can lower the pH of the water on their own, and regulate it within a fairly consistent band (more on this later!). Furthermore, they seem to have recruited commensal organisms to live among their roots. The basil grows long white or translucent roots with a pale white mycorrhiza, while the sage has a brownish symbiont and a short, bushy root ball. Thus I only fully replace the water of the hydroponic system as a last resort, if a plant seems diseased; normally I cycle the water by removing about half of the reservoir and topping it up, so as not to displace the favored microbes from the ecosystem.

The Setup

The initial inspiration to try hydroponics actually came a bit by chance. We bought some locally grown hydroponic lettuce, and noticed that they were packaged as whole plants, complete with roots. We were curious – could we pluck most of the leaves of this lettuce, and then stick the plants in water, and grow another serving of hydroponic lettuce?

Surprisingly enough, it worked! Even with a crude setup consisting of a handful of generic plant fertilizer and a small aquarium bubbler, we were able to take a single plant and grow a couple more servings of lettuce from it. Unfortunately, with time, the plants started to grow very “stemmy” and pale, and eventually they succumbed to tiny mites that infested their leaves.

Inspired by this initial success, I started to read up a bit on how others did hydroponics. One of the top hits is a blog by Kyle Gabriel, detailing how he built an extremely sophisticated system based around a Raspberry Pi, and a multitude of sensors, valves and pumps. It was sort of a nerd’s dream of how farming could be fully automated. I figured I’m pretty handy with a soldering iron, so maybe I could give a go at building a system like his. So, I dug up a spare Raspberry Pi, some solid state relays and white LEDs left over from when I did the house lighting, and put together a simple system that just automated the lighting and took hourly photos of the plants as they grew. The time-lapses were fascinating!

You can’t really watch a plant grow in real time, but, over a period of days one can easily see patterns in how plants grow and adapt.

With this small success, I put my mind toward further automation – adding various pumps and regulators for the system. However, as I started to put together the BOM for this, I realized very quickly that there was going to be no return on investment for building out a system this complicated. Plus, I really didn’t like that the whole system ran on code – I did not relish the idea of coming home to a room flooded with water or a set of rotten plants because my control program hit a segfault.

So, I sat back and thought about things a bit. First, one observation I had was that despite providing the plants with a 10,000 lux light source 12 hours a day, they still had a tendency to grow toward the nearby window. As an experiment, I took one bin, removed it from the regulated light source, and just stuck it up against the window. The plant grew much better with natural sunlight, so I removed all the artificial lighting, unplugged the Raspberry Pi, and just stuck all the plants against the windowsill (it definitely helps that I live one degree off the equator – it’s eternally summer here, with sunrise at 7AM and sunset at 7PM, 365 days a year). I was happy to save the electricity while getting bigger plants in the process.

For water level automation, I replaced the computer with two float switches in series. One switch cuts off the pump if the water level gets too low in the feed reservoir; the other cuts off the pump if the water level gets too high in the plant’s growth bin. You can use the same type of switch for both purposes; by simply mounting the switch upside down, you can invert its function.

The current “automated” system, consisting of a reservoir on the left, with a peristaltic pump on top of the reservoir bin, and two float switches. The silicone tube that takes the solution from the reservoir to the plant bin is covered by an old sock to prevent algae from growing in the solution when it’s not moving. There is also an aeration pump, not visible in this photo.

The float switch mounted on the top, functioning as a “break-when-full” switch. You can see the plant’s roots have taken over the entire bin! A couple of spacers were also added to adjust the height of the water.

The float switch mounted on the bottom of a tank, functioning as a “break-when-empty” switch. In order to provide clearance for the switch on the bottom, a couple of wine corks were hot-glued to the bottom of the bin. The switch comes with a rubber o-ring, creating an effective seal with no leakage.

So, with a couple of storage bins from Daiso, two float switches and a peristaltic pump, I’ve constructed a system that automates the care of our plants for up to two weeks at a time for under $40. No transistors required – just old-school technology dating from the 1800s!

There is one other small detail necessary for hydroponics – an aeration pump. Any aquarium pump will do – although we eventually upgraded to some fancy silent pumps instead of the cheaper but noisier diaphragm-based ones. Some blogs say that the “roots need oxygen” to survive, but my suspicion is that the pumps mostly serve to circulate the nutrient solution. If you leave the pump off, the roots will rapidly deplete the water around them of nutrients, and without any circulation you’re relying purely on a slow process of diffusion for nutrients to reach the roots. I’ve noticed that in bins with a low air flow, the roots grow thick and matted, but in bins with a faster air flow, the roots barely need to grow at all – my hypothesis is that this reflects the plant allocating fewer resources toward root growth in bins with greater circulation, because fresh solution is always available.

The Tricky Bit

The electronics were actually the easiest part of the whole enterprise; the hardest part was figuring out what, exactly, I had to add to the water to get the plants to flourish. Once I got this right, the plants basically take care of themselves; of course it helps to pick plant varietals that are pest-resistant, and have the innate ability to regulate the pH level around their roots.

When I started, I was naively aware that plants needed nitrogen-bearing fertilizers. Reading the label on packaged fertilizer solutions, I learned that they use an “NPK” system, which stands for nitrogen-phosphorus-potassium. OK, sure, so plants need a bit more than just nitrogen. Surely I could just pick up some of this NPK stuff, dissolve it in some water, and we’re good to go…

…but how much of this should I add? This deceptively simple question led me down a several-month rat-hole of failed experiments and daily journals of observations before I found an answer. The core problem is that most plant bloggers like to use measures like “one handful”; the more precise ones would write something to the effect of “one capful per gallon”. As an engineer, units of handfuls and capfuls are extremely dissatisfying: how many grams per liter, dammit!

This led me to several academic papers about plant nutrition, full of graphs of plant growth under “controlled” conditions – and the results were astonishingly contradictory to what the plant bloggers would write: the NPK ratios implied by some of the academic works were wildly different from what the plant bloggers relayed from their actual experience.

It turns out the truth is somewhere in between. A big confounding factor is probably the nature of the soil used in the research, versus the base quality of the water used in your hydroponic system. Most of the research I uncovered was written about fertilizing plants grown in soil, and for example “loamy diatomaceous earth” turns out to be quite a complicated mix of nutrients in and of itself.

The most informative bit of research that I uncovered came from experiments where a plant would be ashed after it was grown, and all the base elements measured out of the resulting dry weight. It was here that I learned that, for example, molybdenum is absolutely essential to the growth of plants. It’s almost never mentioned in soil cultures, because dirt almost always has sufficient trace quantities of molybdenum to sustain plants, but water cultures quickly become molybdenum-deficient, and the plants will become pale and sickly without a supplement.

I also learned that plants need calcium and magnesium in astonishingly large quantities – as much as they need phosphorus and potassium. Again, these two nutrients are less discussed in soil-based literature because many rocks are basically made of calcium and magnesium, and as such plants have no trouble extracting what they need from the soil.

Finally, there is the issue of iron. Iron turns out to be the hardest nutrient to balance in a hydroponic system. Despite being extremely plentiful on Earth – and, indeed, possibly the ultimate composition of the entire universe – it is extremely scarce as a free atom in the biosphere. This is in part because it gets strongly bound to other molecules. For comparison, oxygen binds to myoglobin with a log K1 of 6.18, which means that it is about a million times more likely to find oxygen bound to myoglobin than unbound in solution. This may sound strong, but EDTA, a chelating agent, has a log K1 for iron of something like 27.7, so iron is roughly one octillion (1,000,000,000,000,000,000,000,000,000) times more likely to exist bound to EDTA than unbound at equilibrium. In a way, iron is so biologically important that organic life had an arms race to bind free iron, and some ridiculously potent molecules exist to rapidly sweep the tiniest amount of iron out of solution. Fortunately, as long as I (or more conveniently, the plant itself) can keep the pH of the water below 5.5, I can take advantage of the extremely strong binding of EDTA to iron to keep it dissolved in solution and out of reach of other organisms trying to scavenge it out of the water. The plants can somehow take in the bound iron-EDTA complex, degrade the EDTA, and extract the iron for their own use (it took a long time and many trials with various iron-binding agents to figure out how to remedy the chlorosis that would eventually take over every plant I grew).
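To put the formation-constant arithmetic in plain view (using the numbers above; Y⁴⁻ denotes the EDTA anion):

$$ K_1 = \frac{[\mathrm{FeY}^-]}{[\mathrm{Fe}^{3+}][\mathrm{Y}^{4-}]}, \qquad \log K_1 \approx 27.7 \;\Rightarrow\; K_1 = 10^{27.7} \approx 5\times10^{27} $$

That is, at equilibrium the bound complex is favored over free iron by nearly twenty-eight orders of magnitude – which is why EDTA can hold iron in solution even while everything else in the bin is trying to scavenge it.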

Alright, now that I have a vague understanding of the atoms that a plant needs to survive, the question is: how do I get them to the plant – and in what ratios? The answer to this is equally vague and frustrating. You can’t simply throw a chunk of magnesium metal into a bin of water and expect a plant to access it. The magnesium needs to be turned into a salt so that it can readily dissolve into the water. One of the easiest versions of this to buy is magnesium sulfate, MgSO4, also known as Epsom salt. So, I can just read the blogs, find the ones that tell you how many grams of magnesium sulfate to add per liter of water, and be done with it, right?

Wrong again! It turns out that MgSO4 has several “hydration states” (11 total). Even though it looks like a hard, translucent crystal, Epsom salt is actually more water than magnesium by weight, as 7 molecules of water are bound to every molecule of magnesium sulfate in that preparation.
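A quick check with molar masses shows just how much of the crystal is water:

$$ \frac{7 \times 18.02}{120.37 + 7 \times 18.02} = \frac{126.1}{246.5} \approx 51\%\ \text{water by mass} $$

So a gram of Epsom salt delivers only about 0.49g of actual MgSO4 – and a mere 0.099g of elemental magnesium. Dose using figures meant for the anhydrous salt (or vice versa) and your magnesium concentration is off by roughly a factor of two.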

Of course, no plant blogger ever specifies the hydration states of the salts that they use in their preparations; and many on-line listings for agricultural-grade salts also fail to list the exact hydration state of their salts. Unfortunately this means there can be extremely large deviations in actual nutrient availability if you purchase a dissimilar hydration state from that used by the plant blogger.

That left me with purchasing a set of salts and trying to calculate, from first principles, the ratios that I needed to add to my hydroponics bins. The salts I finally decided on purchasing are:

  • Monopotassium phosphate (anhydrous) KH2PO4
  • Potassium sulfate (anhydrous) K2SO4
  • Calcium nitrate Ca(NO3)2•4H2O – hygroscopic
  • Magnesium sulfate MgSO4•7H2O

Plus a pre-mixed micronutrient from a local hydroponics shop that contains the remaining essential elements in the following ratios:

  • Iron as EDTA chelate 21.25 mg/mL
  • Manganese 5.684 mg/mL
  • Boron 0.483 mg/mL
  • Zinc 0.617 mg/mL
  • Copper 0.267 mg/mL
  • Molybdenum 0.471 mg/mL

For the salts, I computed a matrix that allows me to solve for the amount of nutrient I want in solution, by taking the mass fraction of each nutrient available, writing it in matrix form, and then inverting it (had to crack open my linear algebra book from high school to remember what determinants were! Who knew that determinants could be useful for farming…).

You can make the matrix yourself by expressing the ratio of the milligrams of nutrient (as derived by the atomic weight of the nutrient) per milligram of compound (as derived by summing up the weight of all the atoms in the molecular formula, including the hydration state), and putting it into a matrix form like this:
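For concreteness, here is a sketch of the forward matrix for the four salts above (mass fractions computed from standard atomic weights, rounded to three places; a reconstruction for illustration):

$$
\begin{pmatrix} m_{\mathrm{N}} \\ m_{\mathrm{P}} \\ m_{\mathrm{K}} \\ m_{\mathrm{Mg}} \end{pmatrix}
=
\begin{pmatrix}
0 & 0 & 0.119 & 0 \\
0.228 & 0 & 0 & 0 \\
0.287 & 0.449 & 0 & 0 \\
0 & 0 & 0 & 0.099
\end{pmatrix}
\begin{pmatrix} m_{\mathrm{KH_2PO_4}} \\ m_{\mathrm{K_2SO_4}} \\ m_{\mathrm{Ca(NO_3)_2\cdot4H_2O}} \\ m_{\mathrm{MgSO_4\cdot7H_2O}} \end{pmatrix}
$$

(Calcium isn’t a free variable here – it rides along with the nitrate, at about 0.17mg of Ca per mg of salt.)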

And then taking the coefficients into an inverse matrix calculator and deriving a final format that allows you to plug in your desired NPK ratio and compute the mass of the salts you need to dissolve in water to achieve that:
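Because nitrogen, phosphorus, and magnesium each come from exactly one salt, the inverse has a simple closed form (again a reconstruction for illustration; an actual spreadsheet may arrange things differently):

$$
m_{\mathrm{KH_2PO_4}} = \frac{m_{\mathrm{P}}}{0.228}, \quad
m_{\mathrm{Ca(NO_3)_2\cdot4H_2O}} = \frac{m_{\mathrm{N}}}{0.119}, \quad
m_{\mathrm{MgSO_4\cdot7H_2O}} = \frac{m_{\mathrm{Mg}}}{0.099}, \quad
m_{\mathrm{K_2SO_4}} = \frac{m_{\mathrm{K}} - 0.287\,m_{\mathrm{KH_2PO_4}}}{0.449}
$$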

As a sanity check, I plug the calculated weights back into the forward matrix to make sure I didn’t mess up the math, and I also add up all the dissolved solids into a TDS (total dissolved solids) number, so I can cross-check the resulting solution easily using a cheap TDS meter (link without referral code). In case you want to start from a template, you can download the spreadsheet. The template contains the pre-computed ratio that I currently use for growing all my herbs with compounds that I can source easily from the local market, and it seems to work fairly well for plants ranging from Brazilian spinach to basil to sage.
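For the curious, here’s a minimal sketch of the whole calculation in code form (illustrative only – the target concentrations and structure are made up for the example, and the mass fractions are the rounded values from above, not exact):

```rust
// Sketch: solve for salt masses (mg per liter) from target nutrient
// concentrations (mg/L, i.e. ppm), then sanity-check with the forward
// computation and a TDS estimate. Mass fractions are mg of element per
// mg of compound, hydration water included.

const K_IN_KH2PO4: f64 = 0.287; // potassium in monopotassium phosphate
const P_IN_KH2PO4: f64 = 0.228;
const K_IN_K2SO4: f64 = 0.449;
const N_IN_CANO3: f64 = 0.119; // Ca(NO3)2·4H2O
const MG_IN_MGSO4: f64 = 0.099; // MgSO4·7H2O

/// Target concentrations in mg/L.
struct Target { n: f64, p: f64, k: f64, mg: f64 }

/// Salt masses in mg per liter of solution.
struct Recipe { kh2po4: f64, k2so4: f64, ca_no3: f64, mgso4: f64 }

fn solve(t: &Target) -> Recipe {
    let kh2po4 = t.p / P_IN_KH2PO4;  // P comes only from KH2PO4
    let k2so4 = (t.k - K_IN_KH2PO4 * kh2po4) / K_IN_K2SO4; // remaining K
    let ca_no3 = t.n / N_IN_CANO3;   // N comes only from the nitrate
    let mgso4 = t.mg / MG_IN_MGSO4;  // Mg comes only from Epsom salt
    Recipe { kh2po4, k2so4, ca_no3, mgso4 }
}

fn main() {
    // Illustrative target only – not the ratios actually in the spreadsheet.
    let t = Target { n: 150.0, p: 50.0, k: 200.0, mg: 50.0 };
    let r = solve(&t);
    // Forward check: recompute potassium from the salt masses.
    assert!((K_IN_KH2PO4 * r.kh2po4 + K_IN_K2SO4 * r.k2so4 - t.k).abs() < 1e-9);
    // TDS estimate: sum of everything weighed out, in mg/L.
    let tds = r.kh2po4 + r.k2so4 + r.ca_no3 + r.mgso4;
    println!("per liter: {:.0} mg KH2PO4, {:.0} mg K2SO4, {:.0} mg Ca(NO3)2·4H2O, {:.0} mg MgSO4·7H2O; TDS ≈ {:.0} mg/L",
        r.kh2po4, r.k2so4, r.ca_no3, r.mgso4, tds);
}
```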

As a side note, calcium nitrate is pretty tricky to handle. It’s very hygroscopic, so if left in ambient humidity, it will absorb water from the atmosphere and “melt away” into a concentrated, syrupy liquid. I usually add a few percent extra by weight over the formula to compensate for the excess water it accumulates over time. Also, I store the substance in an air-tight bag, and I always wear nitrile gloves while handling the compound to avoid damaging my hands.

For the micronutrients, it’s a bit trickier to dose correctly. Fortunately, I have a micropipette set that can measure out solutions in the range from 1uL to 200uL, from back when I did some genetic engineering in my kitchen (pipettes are also surprisingly cheap (without referral code) now). Again, the blogs are not terribly helpful about dosing – you get advice along the lines of “one drop per bucket” or something like that. What’s a drop? What’s a bucket? The exact volume of a drop depends on the surface tension and viscosity of the liquid, but I went with the rule of thumb that one drop is 50uL (20 drops per mL) as a starting point.

Initially, I tried 60uL of micronutrients per 1.5L of solution, but the plants started to show evidence of boron poisoning (this is a great guide for diagnosing plant nutritional problems based on the appearance of the leaves), so after a few iterations and replacements of the water to flush out the excess accumulated micronutrients, I settled on 30uL of micronutrients per 1.5L of solution, with a 15uL per week bump for iron-hungry species like spinach.
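For a sense of scale, the settled dose works out to sub-ppm iron levels (using the stock concentration from the micronutrient list above):

$$ \frac{30\,\mu\mathrm{L} \times 21.25\,\mathrm{mg/mL}}{1.5\,\mathrm{L}} = \frac{0.64\,\mathrm{mg}}{1.5\,\mathrm{L}} \approx 0.43\,\mathrm{mg/L}\ \text{of iron} $$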

At a microliters-per-week consumption rate, even the smallest bottle of 150mL micronutrient solution will last years, but the tricky part is storing it. In order to avoid contaminating the bottle, I aliquot the solution every couple of months into a set of 1.5mL eppendorfs which I keep in my wine fridge, alongside the original bottle. Even though I try my best to avoid contaminating the eppendorfs, after a couple of weeks a pellet forms at the bottom from some process that causes the micronutrients to come out of solution, so I typically end up discarding the aliquot before it is entirely used up.

The Final Result

It’s pretty neat to go from a pile of salts to delicious herbs. About a gram of salts goes in, and a week later a couple dozen grams of leaves come out!


In go salts…


Out comes basil!

Basil in particular has been a real champ at growing in our hydroponics bins – we are at the point now where, between two plants, we’re regularly giving it away to friends because it yields more than we can eat, even though I cook Italian food almost every other night. A handful of basil, a bit of salt, olive oil, tomatoes and garlic, and we have a flavorful bruschetta to kick off a meal! Our other favorite is sage: it’s great for flavoring pork and poultry, but for some reason it’s very hard to find in Singapore. So, having a bit of fresh sage around is convenient, and it saves us a bit of money since sage can be quite expensive to buy in specialty stores.

It’s been less practical to grow bulk vegetables, such as spinach. Brazilian spinach has been fairly successful in terms of growth, but it takes about a month for a cutting to grow to maturity, and we need about four plants to make a salad, so we’d need several racks of bins to make a dent in our vegetable consumption. Also, in general our herbs have had fewer pests than leafy green vegetables; maybe their strong flavor comes from compounds they produce that also serve to repel bugs? So, in addition to being great flavor for our sauces, the herbs have required no pesticides.

Overall, it was satisfying to learn about plant biology while developing a better connection to my food through technology. It was also a calming way to pass time during the pandemic; agriculture requires patience and time, but the reward is visceral. Having kept a miniature farmer’s almanac to decode missing pieces of information from the blogosphere, I have a new appreciation for how such personal journals could lead to scientific discoveries. And, I’m a much better chef than I was a couple years ago. Somehow, just having the fresh herbs around inspired explorations into new and exciting pairings; it gave me a whole new way to think about food.

Rust: A Critical Retrospective

Thursday, May 19th, 2022

Since I was unable to travel for a couple of years during the pandemic, I decided to take my new-found time and really lean into Rust. After writing over 100k lines of Rust code, I think I am starting to get a feel for the language; and, like every cranky engineer, I have developed opinions – and because this is the Internet, I’m going to share them.

The reason I learned Rust was to flesh out parts of the Xous OS written by Xobs. Xous is a microkernel message-passing OS written in pure Rust. Its closest relative is probably QNX. Xous is written for lightweight (IoT/embedded scale) security-first platforms like Precursor that support an MMU for hardware-enforced, page-level memory protection.

In the past year, we’ve managed to add a lot of features to the OS: networking (TCP/UDP/DNS), middleware graphics abstractions for modals and multi-lingual text, storage (in the form of an encrypted, plausibly deniable database called the PDDB), trusted boot, and a key management library with self-provisioning and sealing properties.

One of the reasons why we decided to write our own OS instead of using an existing implementation such as SeL4, Tock, QNX, or Linux, was we wanted to really understand what every line of code was doing in our device. For Linux in particular, its source code base is so huge and so dynamic that even though it is open source, you can’t possibly audit every line in the kernel. Code changes are happening at a pace faster than any individual can audit. Thus, in addition to being home-grown, Xous is also very narrowly scoped to support just our platform, to keep as much unnecessary complexity out of the kernel as possible.

Being narrowly scoped means we could also take full advantage of having our CPU run in an FPGA. Thus, Xous targets an unusual RV32-IMAC configuration: one with an MMU + AES extensions. It’s 2022 after all, and transistors are cheap: why don’t all our microcontrollers feature page-level memory protection like their desktop counterparts? Being an FPGA also means we have the ability to fix API bugs at the hardware level, leaving the kernel more streamlined and simplified. This was especially relevant in working through abstraction-busting processes like suspend and resume from RAM. But that’s all for another post: this one is about Rust itself, and how it served as a systems programming language for Xous.

Rust: What Was Sold To Me

Back when we started Xous, we had a look at a broad number of systems programming languages and Rust stood out. Even though its `no-std` support was then-nascent, it was a strongly-typed, memory-safe language with good tooling and a burgeoning ecosystem. I’m personally a huge fan of strongly typed languages, and memory safety is good not just for systems programming: it enables optimizers to do a better job of generating code, and it makes concurrency less scary. I actually wished for Precursor to have a CPU that had hardware support for tagged pointers and memory capabilities, similar to what was done on CHERI, but after some discussions with the team doing CHERI it was apparent they were very focused on making C better and didn’t have the bandwidth to support Rust (although that may be changing). In the grand scheme of things, C needed CHERI much more than Rust needed CHERI, so that’s a fair prioritization of resources. However, I’m a fan of belt-and-suspenders for security, so I’m still hopeful that someday hardware-enforced fat pointers will make their way into Rust.

That being said, I wasn’t going to go back to the C camp simply to kick the tires on a hardware retrofit that backfills just one poor aspect of C. The glossy brochure for Rust also advertised its ability to prevent bugs before they happened through its strict “borrow checker”. Furthermore, its release philosophy is supposed to avoid what I call “the problem with Python”: your code stops working if you don’t actively keep up with the latest version of the language. Also unlike Python, Rust is not inherently unhygienic, in that the advertised way to install packages is not also the wrong way to install packages. Contrast this with Python, where the official docs on packages lead you to add them to the system environment, only to be scolded by Python elders with a “but of course you should be using a venv/virtualenv/conda/pipenv/…, everyone knows that”. My experience with Python would have been so much better if this detail were not relegated to Chapter 12 of 16 in the official tutorial. Rust is also supposed to be better than e.g. Node at avoiding the “oops I deleted the Internet” problem when someone unpublishes a popular package, at least if you use fully specified semantic versions for your packages.

In the long term, the philosophy behind Xous is that eventually it should “get good enough”, at which point we should stop futzing with it. I believe it is the mission of engineers to eventually engineer themselves out of a job: systems should get stable and solid enough that it “just works”, with no caveats. Any additional engineering beyond that point only adds bugs or bloat. Rust’s philosophy of “stable is forever” and promising to never break backward-compatibility is very well-aligned from the point of view of getting Xous so polished that I’m no longer needed as an engineer, thus enabling me to spend more of my time and focus supporting users and their applications.

The Rough Edges of Rust

There’s already a plethora of love letters to Rust on the Internet, so I’m going to start by enumerating some of the shortcomings I’ve encountered.

“Line Noise” Syntax

This is a superficial complaint, but I found Rust syntax to be dense, heavy, and difficult to read, like trying to read the output of a UART with line noise:

Trying::to_read::<&'a heavy>(syntax, |like| { this. can_be( maddening ) }).map(|_| ())?;

In more plain terms, the line above does something like invoke a method called “to_read” on the object (actually `struct`) “Trying” with a type annotation of “&heavy” and a lifetime of ‘a with the parameters of “syntax” and a closure taking a generic argument of “like” calling the can_be() method on another instance of a structure named “this” with the parameter “maddening” with any non-error return values mapped to the Rust unit type “()” and errors unwrapped and kicked back up to the caller’s scope.

Deep breath. Surely, I got some of this wrong, but you get the idea of how dense this syntax can be.
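Spread out over multiple lines, the same made-up expression is at least easier to stare at (it still won’t compile – none of these identifiers are real APIs; this is just the joke example above, reformatted):

```rust
// The same nonsense expression, unpacked so each piece of syntax is visible.
Trying::to_read::<&'a heavy>( // "turbofish" supplying a type and lifetime
    syntax,                   // first argument
    |like| {                  // closure taking one parameter
        this.can_be(maddening) // method call in the closure body
    },
)
.map(|_| ())                  // map any Ok value to the unit type ()
?;                            // ? propagates an Err to the caller
```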

And then on top of that you can layer macros and directives which don’t have to follow other Rust syntax rules. For example, if you want to have conditionally compiled code, you use a directive like

#[cfg(all(not(baremetal), any(feature = "hazmat", feature = "debug_print")))]

This says: if either the feature “hazmat” or “debug_print” is enabled and you’re not running on bare metal, use the block of code below (and I surely got this wrong too). The most confusing part about this syntax to me is the use of a single “=” to denote equivalence and not assignment – because stuff in config directives isn’t Rust code. It’s like a whole separate meta-language with a dictionary of key/value pairs that you query.
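For a flavor of how this looks in real code, here’s a minimal sketch (the function and its body are invented for illustration; the cfg expression mirrors the one above):

```rust
// Compiled only when NOT on bare metal AND at least one of the
// "hazmat" or "debug_print" features is enabled in Cargo.toml.
#[cfg(all(not(baremetal), any(feature = "hazmat", feature = "debug_print")))]
fn debug_dump(msg: &str) {
    println!("debug: {}", msg);
}

// A no-op twin for every other configuration, so call sites don't need
// their own cfg directives.
#[cfg(not(all(not(baremetal), any(feature = "hazmat", feature = "debug_print"))))]
fn debug_dump(_msg: &str) {}

fn main() {
    debug_dump("hello, world");
}
```

(`baremetal` here is a custom cfg flag passed directly to the compiler, as opposed to a Cargo feature – one more wrinkle of the meta-language.)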

I’m not even going to get into the unreadability of Rust macros – even after having written a few Rust macros myself, I have to admit that I feel like they “just barely work” and probably thar be dragons somewhere in them. This isn’t how you’re supposed to feel in a language that bills itself as reliable. Yes, it is my fault for not being smart enough to parse the language’s syntax, but also, I do have other things to do with my life, like build hardware.

Anyways, this is a superficial complaint. As time passed I eventually got over the learning curve and became more comfortable with it, but it was a hard, steep curve to climb. This is in part because all the Rust documentation is written either in eli5 style (good luck figuring out “feature”s from that example), or as a formal syntax definition (technically, everything you need to know to define a “feature” is in there, but nowhere is it summarized in plain English), with nothing in between.

To be clear, I have a lot of sympathy for how hard it is to write good documentation, so this is not a dig at the people who worked so hard to write so much excellent documentation on the language. I genuinely appreciate the general quality and fecundity of the documentation ecosystem.

Rust just has a steep learning curve in terms of syntax (at least for me).

Rust Is Powerful, but It Is Not Simple

Rust is powerful. I appreciate that it has a standard library which features HashMaps, Vecs, and Threads. These data structures are delicious and addictive. Once we got `std` support in Xous, there was no going back. Coming from a background of C and assembly, Rust’s standard library feels rich and usable — I have read some criticisms that it lacks features, but for my purposes it really hits a sweet spot.

That being said, my addiction to the Rust `std` library has not done any favors in terms of building an auditable code base. One of the criticisms I used to level at Linux was “holy cow, the kernel source includes things like an implementation of red-black trees – how is anyone going to audit that?”

Now, having written an OS, I have a deep appreciation for how essential these rich, dynamic data structures are. However, the fact that Xous doesn’t include an implementation of HashMap within its repository doesn’t mean that we are any simpler than Linux: indeed, we have just swept a huge pile of code under the rug; just the `collections` portion of the standard library represents about 10k+ SLOC of very high complexity.

So, while Rust’s `std` library allows the Xous code base to focus on being a kernel and not also be its own standard library, from the standpoint of building a minimum attack-surface, “fully-auditable by one human” codebase, I think our reliance on Rust’s `std` library means we fail on that objective, especially so long as we continue to track the latest release of Rust (and I’ll get into why we have to in the next section).

Ideally, at some point, things “settle down” enough that we can stick a fork in it and call it done by, well, forking the Rust repo, and saying “this is our attack surface, and we’re not going to change it”. Even then, the Rust `std` repo dwarfs the Xous repo by several multiples in size, and that’s not counting the complexity of the compiler itself.

Rust Isn’t Finished

This next point dovetails into why Rust is not yet suitable for a fully auditable kernel: the language isn’t finished. For example, while we were coding Xous, a feature called `const generics` was introduced. Before this, Rust had no native ability to deal with arrays bigger than 32 elements! This limitation is a bit maddening, and even today there are shortcomings such as the `Default` trait being unable to initialize arrays larger than 32 elements. This friction led us to cap many things at 32 elements: for example, when we pass the results of an SSID scan between processes, the structure only reserves space for up to 32 results, because the friction of going to a larger, more generic structure just isn’t worth it. That’s a language-level limitation directly driving a user-facing feature.
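As a hedged illustration of the kind of code this produces (the type names are invented for the example, not the actual Xous structures):

```rust
// Fixed-size structures for message passing; everything caps at 32
// because that's where the standard library's trait impls used to stop.
#[derive(Clone, Copy, Default)]
struct SsidResult {
    rssi: u8,
    name: [u8; 32], // [u8; 32] derives Default; a [u8; 33] would not
}

struct ScanResults {
    results: [SsidResult; 32], // capped at 32 entries for the same reason
}

impl Default for ScanResults {
    fn default() -> Self {
        // With const generics, at least the impl can be written by hand.
        ScanResults { results: [SsidResult::default(); 32] }
    }
}

fn main() {
    let scans = ScanResults::default();
    assert_eq!(scans.results.len(), 32);
}
```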

Also over the course of writing Xous, things like inline assembly and workspaces finally reached maturity, which means we need to go back and revisit some unholy things we did to get those critical few lines of initial boot code, written in assembly, integrated into our build system.

I often ask myself “when is the point we’ll get off the Rust release train”, and the answer I think is when they finally make “alloc” no longer a nightly API. At the moment, `no-std` targets have no access to the heap, unless they hop on the “nightly” train, in which case you’re back into the Python-esque nightmare of your code routinely breaking with language releases.

We definitely gave writing an OS in `no-std` + stable a fair shake. The first year of Xous development was all done using `no-std`, at a cost in memory space and complexity. It’s possible to write an OS with nothing but pre-allocated, statically sized data structures, but we had to accommodate the worst-case number of elements in all situations, leading to bloat. Plus, we had to roll a lot of our own core data structures.

About a year ago, that all changed when Xobs ported Rust’s `std` library to Xous. This means we are able to access the heap in stable Rust, but it comes at a price: now Xous is tied to a particular version of Rust, because each version of Rust has its own unique version of `std` packaged with it. This version tie is for a good reason: `std` is where the sausage gets made of turning fundamentally `unsafe` hardware constructions such as memory allocation and thread creation into “safe” Rust structures. (Also, fun fact I recently learned: Rust doesn’t have a native allocator for most targets – it simply punts to the native libc `malloc()` and `free()` functions!) In other words, Rust is able to make a strong guarantee about the stable release train not breaking old features in part because of all the loose ends swept into `std`.

I have to keep reminding myself that having `std` doesn’t eliminate the risk of severe security bugs in critical code – it merely shuffles a lot of critical code out of sight, into a standard library. Yes, it is maintained by a talented group of dedicated programmers who are smarter than me, but in the end, we are all only human, and we are all fair targets for software supply chain exploits.

Rust has a clockwork release schedule – every six weeks, it pushes a new version. And because our fork of `std` is tied to a particular version of Rust, it means every six weeks, Xobs has the thankless task of updating our fork and building a new `std` release for it (we’re not a first-class platform in Rust, which means we have to maintain our own `std` library). This means we likewise force all Xous developers to run `rustup update` on their toolchains so we can retain compatibility with the language.

This probably isn’t sustainable. Eventually, we need to lock down the code base, but I don’t have a clear exit strategy for this. Maybe the next point at which we can consider going back to `no-std` is when we can get the stable `alloc` feature, which would give us access to the heap again. We could then decouple Xous from the Rust release train, but we’d still need to backfill features such as Vec, HashMap, Thread, and the Arc/Mutex/Rc/RefCell/Box constructs that enable Xous to be efficiently coded.

Unfortunately, the `alloc` crate is very hard, and has been in development for many years now. That being said, I really appreciate the transparency of Rust behind the development of this feature, and the hard work and thoughtfulness that is being put into stabilizing this feature.

Rust Has A Limited View of Supply Chain Security

I think this position is summarized well by the installation method recommended by the rustup.rs installation page:

`curl --proto '=https' --tlsv1.2 -sSf https://sh.rustup.rs | sh`

“Hi, run this shell script from a random server on your machine.”

To be fair, you can download the script and inspect it before you run it, which is much better than e.g. the Windows .MSI installers for vscode. However, this practice pervades the entire build ecosystem: a stub of code called `build.rs` is potentially compiled and executed whenever you pull in a new crate from crates.io. This, along with “loose” version pinning (you can specify a version to be, for example, simply “2”, which means you’ll grab whatever the latest published version with a major rev of 2 happens to be), makes me uneasy about the possibility of software supply chain attacks launched through the crates.io ecosystem.
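For example, in Cargo.toml terms (the crate names are placeholders, not real dependencies):

```toml
[dependencies]
# Loose: any semver-compatible 2.x.y release – whatever is newest at
# resolution time.
some-crate = "2"

# Tighter: >=2.4.0, <3.0.0.
other-crate = "2.4"

# Fully pinned: exactly this version and nothing else.
pinned-crate = "=2.4.1"
```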

Crates.io is also subject to a kind of typo-squatting, where it’s hard to determine which crates are “good” or “bad”; some crates that are named exactly what you want turn out to just be old or abandoned early attempts at giving you the functionality you wanted, and the more popular, actively-maintained crates have to take on less intuitive names, sometimes differing by just a character or two from others (to be fair, this is not a problem unique to Rust’s package management system).

There’s also the fact that dependencies are chained – when you pull in one thing from crates.io, you also pull in all of that crate’s subordinate dependencies, along with all their build.rs scripts that will eventually get run on your machine. Thus, it is not sufficient to simply audit the crates explicitly specified within your Cargo.toml file — you must also audit all of the dependent crates for potential supply chain attacks as well.

Fortunately, Rust does allow you to pin a crate at a particular version using the `Cargo.lock` file, and you can fully specify a dependent crate down to the minor revision. We try to mitigate this in Xous by having a policy of publishing our Cargo.lock file and specifying all of our first-order dependent crates to the minor revision. We have also vendored in or forked certain crates that would otherwise grow our dependency tree without much benefit.

That being said, much of our debug and test framework relies on some rather fancy and complicated crates that pull in a huge number of dependencies, and much to my chagrin even when I try to run a build just for our target hardware, the dependent crates for running simulations on the host computer are still pulled in and the build.rs scripts are at least built, if not run.

In response to this, I wrote a small tool called `crate-scraper` which downloads the source package for every source specified in our Cargo.toml file, and stores them locally so we can have a snapshot of the code used to build a Xous release. It also runs a quick “analysis” in that it searches for files called build.rs and collates them into a single file so I can more quickly grep through to look for obvious problems. Of course, manual review isn’t a practical way to detect cleverly disguised malware embedded within the build.rs files, but it at least gives me a sense of the scale of the attack surface we’re dealing with – and it is breathtaking: about 5,700 lines of code from various third parties that manipulate files, directories, and environment variables, and run other programs on my machine every time I do a build.

I’m not sure if there is even a good solution to this problem, but, if you are super-paranoid and your goal is to be able to build trustable firmware, be wary of Rust’s expansive software supply chain attack surface!

You Can’t Reproduce Someone Else’s Rust Build

A final nit I have about Rust is that builds are not reproducible between different computers (they are at least reproducible between builds on the same machine if we disable the embedded timestamp that I put into Xous for $reasons).

I think this is primarily because Rust pulls in the full path to the source code as part of the panic and debug strings that are built into the binary. This has led to uncomfortable situations where we have had builds that worked on Windows, but failed under Linux, because our path names have very different lengths on the two systems, and this would cause some memory objects to be shifted around in target memory. To be fair, those failures were all due to bugs we had in Xous, which have since been fixed. But it just doesn’t feel good to know that we’re eventually going to have users who report bugs to us that we can’t reproduce because they have a different path on their build system compared to ours. It’s also a problem for users who want to audit our releases by building their own version and comparing the hashes against ours.

There are some bugs open with the Rust maintainers to address reproducible builds, but with the number of issues they have to deal with in the language, I am not optimistic that this problem will be resolved anytime soon. Assuming the only driver of the unreproducibility is the inclusion of OS paths in the binary, one fix would be to re-configure our build system to run in some sort of chroot environment or virtual machine that fixes the paths in a way that almost anyone else could reproduce. I say “almost anyone else” because this fix would be OS-dependent, so we’d be able to get reproducible builds under, for example, Linux, but it would not help Windows users, where chroot environments are not a thing.

Where Rust Exceeded Expectations

Despite all the gripes laid out here, I think if I had to do it all over again, Rust would still be a very strong contender for the language I’d use for Xous. I’ve done major projects in C, Python, and Java, and all of them eventually suffer from “creeping technical debt” (there’s probably a software engineer term for this, I just don’t know it). The problem often starts with some data structure that I couldn’t quite get right on the first pass, because I didn’t yet know how the system would come together; so in order to figure out how the system comes together, I’d cobble together some code using a half-baked data structure.

Thus begins the descent into chaos: once I get an idea of how things work, I go back and revise the data structure, but now something breaks elsewhere that was unsuspected and subtle. Maybe it’s an off-by-one problem, or the polarity of a sign seems reversed. Maybe it’s a slight race condition that’s hard to tease out. Nevermind, I can patch over this by changing a <= to a <, or fixing the sign, or adding a lock: I’m still fleshing out the system and getting an idea of the entire structure. Eventually, these little hacks tend to metastasize into a cancer that reaches into every dependent module because the whole reason things even worked was because of the “cheat”; when I go back to excise the hack, I eventually conclude it’s not worth the effort and so the next best option is to burn the whole thing down and rewrite it…but unfortunately, we’re already behind schedule and over budget so the re-write never happens, and the hack lives on.

Rust is a difficult language for authoring code because it makes these “cheats” hard – as long as you have the discipline of not using “unsafe” constructions to make cheats easy. However, really hard does not mean impossible – there were definitely some cheats that got swept under the rug during the construction of Xous.

This is where Rust really exceeded expectations for me. The language’s structure and tooling was very good at hunting down these cheats and refactoring the code base, thus curing the cancer without killing the patient, so to speak. This is the point at which Rust’s very strict typing and borrow checker converts from a productivity liability into a productivity asset.

I liken it to replacing a cable in a complicated bundle of cables that runs across a building. In Rust, it’s guaranteed that every strand of wire in a cable chase, no matter how complicated and awful the bundle becomes, is separable and clearly labeled on both ends. Thus, you can always “pull on one end” and see where the other ends are by changing the type of an element in a structure, or the return type of a method. In less strictly typed languages, you don’t get this property; the cables are allowed to merge and affect each other somewhere inside the cable chase, so you’re left “buzzing out” each cable with manual tests after making a change. Even then, you’re never quite sure if the thing you replaced is going to lead to the coffee maker switching off when someone turns on the bathroom lights.

Here’s a direct example of Rust’s refactoring abilities in action in the context of Xous. I had a problem in the way trust levels are handled inside our graphics subsystem, which I call the GAM (Graphical Abstraction Manager). Each Canvas in the system gets a `u8` assigned to it that is a trust level. When I started writing the GAM, I just knew that I wanted some notion of trustability of a Canvas, so I added the variable, but wasn’t quite sure exactly how it would be used. Months later, the system grew the notion of Contexts with Layouts, which are multi-Canvas constructions that define a particular type of interaction. Now, you can have multiple trust levels associated with a single Context, but I had forgotten about the trust variable I had previously put in the Canvas structure – and added another trust level number to the Context structure as well. You can see where this is going: everything kind of worked as long as I had simple test cases, but as we started to get modals popping up over applications and then menus on top of modals and so forth, crazy behavior started manifesting, because I had confused myself over where the trust values were being stored. Sometimes I was updating the value in the Context, sometimes I was updating the one in the Canvas. It would manifest itself sometimes as an off-by-one bug, other times as a concurrency error.
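In code terms, the bug pattern looked something like this (a reconstruction for illustration – the field and type names are guesses, not the actual Xous source):

```rust
// Before: the same fact stored in two places, free to drift apart.
struct Canvas {
    trust_level: u8, // set when the Canvas is created...
}
struct Context {
    canvases: Vec<Canvas>,
    trust_level: u8, // ...and an independent copy added months later
}

// After: one source of ground truth; the Context derives its view of it.
struct CanvasFixed {
    trust_level: u8,
}
struct ContextFixed {
    canvases: Vec<CanvasFixed>,
}
impl ContextFixed {
    /// A context is only as trusted as its least-trusted canvas.
    fn trust_level(&self) -> u8 {
        self.canvases.iter().map(|c| c.trust_level).min().unwrap_or(0)
    }
}

fn main() {
    let ctx = ContextFixed {
        canvases: vec![
            CanvasFixed { trust_level: 3 },
            CanvasFixed { trust_level: 7 },
        ],
    };
    assert_eq!(ctx.trust_level(), 3);
}
```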

This was always a skeleton in the closet that bothered me while the GAM grew into a 5k-line monstrosity of code with many moving parts. Finally, I decided something had to be done about it, and I was really not looking forward to it. I was assuming that I messed up something terribly, and this investigation was going to conclude with a rewrite of the whole module.

Fortunately, Rust left me a tiny string to pull on. Clippy, the cheerfully named “linter” built into Rust, was throwing a warning that the trust level variable was not being used at a point where I thought it should be – I was storing it in the Context after it was created, but nobody ever referred to it after that. That’s strange – it should be necessary for every redraw of the Context! So, I started by removing the variable, and seeing what broke. This rapidly led me to recall that I was also storing the trust level inside the Canvases within the Context when they were being created, which is why I had this dangling reference. Once I had that clue, I was able to refactor the trust computations to refer only to that one source of ground truth. This also led me to discover other bugs that had been lurking because, in fact, I was never exercising some code paths that I thought I was using on a routine basis. After just a couple hours of poking around, I had a clear-headed view of how this was all working, and I had refactored the trust computation system with tidy APIs that were simple and easier to understand, without having to toss the entire code base.

This is just one of many positive experiences I’ve had with Rust in maintaining the Xous code base. It’s one of the first times I’ve walked into a big release with my head up and a positive attitude, because for the first time ever, I feel like maybe I have a chance of being able to deal with hard bugs in an honest fashion. I’m spending less time making excuses in my head to justify why things were done this way and why we can’t take that pull request, and more time thinking about all the ways things can get better, because I know Clippy has my back.

Caveat Coder

Anyways, that’s a lot of ranting about software for a hardware guy. Software people are quick to remind me that first and foremost, I make circuits and aluminum cases, not code, and therefore I have no place ranting about software. They’re right – I actually have no “formal” training to write code “the right way”. When I was in college, I learned Maxwell’s equations, not algorithms. I could never be a professional programmer, because I couldn’t pass even the simplest coding interview. Don’t ask me to write a linked list: I already know that I don’t know how to do it correctly; you don’t need to prove that to me. This is because whenever I find myself writing a linked list (or any other foundational data structure, for that matter), I immediately stop myself and question all the life choices that brought me to that point: isn’t this what libraries are for? Do I really need to be re-inventing the wheel? If there is any correlation between doing well in a coding interview and actual coding ability, then you should definitely take my opinions with a grain of salt.

Still, after spending a couple years in the foxhole with Rust and reading countless glowing articles about the language, I felt like maybe a post that shared some critical perspectives about the language would be a refreshing change of pace.

Fixing a Tiny Corner of the Supply Chain

Tuesday, December 14th, 2021

No product gets built without at least one good supply chain war story – especially true in these strange times. Before we get into the details of the story, I feel it’s worth understanding a bit more about the part that caused me so much trouble: what it does, and why it’s so special.

The Part that Could Not Be Found
There’s a bit of magic involved in the generation of internal timing signals in virtually every piece of modern electronics. It’s called an oscillator, and typically it’s implemented with a precisely cut fleck of quartz crystal that “rings” when stimulated, vibrating millions of times per second. The accuracy of the crystal is measured in parts-per-million, such that over a month – about 2.5 million seconds – a run-of-the-mill crystal with 50ppm accuracy might drift by about two minutes. In mechanical terms, it’s like producing 1kg (2.2 pound) bags of rice that differ from one another by no more than a grain of rice; or in CS terms, it’s about 15 bits of precision (it’s funny how one metric sounds hard, while the other sounds trivial).
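The drift arithmetic, spelled out:

$$ 50\,\mathrm{ppm} \times 2.5\times10^{6}\,\mathrm{s} = 50\times10^{-6} \times 2.5\times10^{6}\,\mathrm{s} = 125\,\mathrm{s} \approx 2\ \text{minutes} $$

(And $1/(50\times10^{-6}) = 20{,}000 \approx 2^{14.3}$, hence the “about 15 bits”.)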

One of the many problems with quartz crystals is that they are big. Here’s a photo from wikipedia of the inside of a typical oscillator:


CC BY-SA 4.0 by Binarysequence via Wikipedia

The disk on the left is the crystal itself. Because the frequency of the crystal is directly related to its size, there’s a “physics limit” on how small these things can be. Their large size also imposes a limit on how much power it takes to drive them (it’s a lot). As a result, they tend to be large, and power-hungry – it’s not uncommon for a crystal oscillator to be specified to consume a couple milliamperes of current in normal operation (and yes, there are also chips with built-in oscillator circuits that can drive crystals, which reduces power; but they, too, have to burn the energy to charge and discharge some picofarads of capacitance millions of times per second due to the macroscopic nature of the crystal itself).

A company called SiTime has been quietly disrupting the crystal industry by building MEMS-based silicon resonators that can outperform quartz crystals in almost every way. The part I’m using is the SiT8021, and it’s tiny (1.5×0.8mm), surface-mountable (CSBGA), consumes about 100x less power than the quartz-based competition, and has a comparable frequency stability of 100ppm. Remarkably, despite being better in almost every way, it’s also cheaper – if you can get your hands on it. More on that later.

Whenever something like this comes along, I always like to ask “how come this didn’t happen sooner?”. You usually learn something interesting in exploring that question. In the case of pure-silicon oscillators, they have been around forever, but they are extremely sensitive to temperature and aging. Anyone who has designed analog circuits in silicon is familiar with the problem that basically every circuit element is a “temperature-to-X” converter, where X is the parameter you wish you could control. For example, a run-of-the-mill “ring oscillator” with no special compensation would have an initial frequency accuracy of about 50% – going back to our analogies, it’d be like getting a bag of rice that nominally holds 1kg, but is filled to an actual weight of somewhere between 0.5kg and 1.5kg – and you would get swings of an additional 30% depending upon the ambient temperature. A silicon MEMS oscillator is a bit better than that, but its frequency output would still vary wildly with temperature; orders of magnitude more than the parts-per-million specified for a quartz crystal.

So, how do they turn something so innately terrible into something better-than-quartz? First I took a look at the devices under a microscope, and it’s immediately obvious that it’s actually two chips that have been bonded face-to-face with each other.


Edge-on view of an SiT8021 already mounted on a circuit board.

I deduced that a MEMS oscillator chip is nestled between the balls that bond the chip to the PCB. I did a quick trawl through the patents filed by SiTime, and I’m guessing the MEMS oscillator chip contains at least two separate oscillators. These oscillators are intentionally different, so that their frequency drifts with temperature follow different, but predictable, curves. By comparing the instantaneous difference between the two frequencies, the chip can very precisely measure the absolute temperature of the pair of oscillators. In other words, they took the exact problem that plagues silicon designs, and turned it into a feature: they built a very precise temperature sensor out of two silicon oscillators.
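A toy model of the trick (a sketch of the idea, not anything lifted from the patents): give each oscillator a linear temperature coefficient,

$$ f_1(T) = f_{1,0}\,(1 + \alpha_1\,\Delta T), \qquad f_2(T) = f_{2,0}\,(1 + \alpha_2\,\Delta T), \qquad \alpha_1 \neq \alpha_2 $$

Then the difference of the normalized frequencies depends only on temperature:

$$ \frac{f_1}{f_{1,0}} - \frac{f_2}{f_{2,0}} = (\alpha_1 - \alpha_2)\,\Delta T \quad\Rightarrow\quad \Delta T = \frac{f_1/f_{1,0} - f_2/f_{2,0}}{\alpha_1 - \alpha_2} $$

Neither oscillator needs to be accurate on its own; only the difference in their drift curves has to be characterized.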

With the temperature of the oscillators known to exquisite precision, one can now compensate for the temperature effects. That’s what the larger of the two chips (the one directly attached to the solder balls) presumably does. It computes an inverse mapping of temperature vs. frequency, constantly adjusting a PLL driven by one of the two MEMS oscillators, to derive a precise, temperature-stable net frequency. The controller chip presumably also contains a set of eFuses that are burned in the factory (or by the distributor) to calibrate and set the initial frequency of the device. I didn’t do an acid decap of the controller chip, but it’s probably not unreasonable for it to be fabricated in 28nm silicon; at this geometry you could fit an entire RISC-V CPU in there with substantial microcode and effectively “wrap a computer” around the temperature drift problem that plagues silicon designs.

Significantly, the small size of the MEMS resonator compared to a quartz crystal, along with its extremely intimate bonding to the control electronics, means a fundamentally lower limit on the amount of energy required to sustain resonance, which probably goes a long way towards explaining why this circuit is able to reduce active power by so much.

The tiny size of the controller chip means that a typical 300mm wafer will yield about 50,000 chips; going by the “rule of thumb” that a processed wafer is roughly $3k, that puts the price of a raw, untested controller chip at about $0.06. The MEMS device is presumably a bit more expensive, and the bonding process itself can’t be cheap, but at a “street price” of about $0.64 each in 10k quantities, I imagine SiTime is still making good margin. All that being said, a million of these oscillators would fit on about 20 wafers, and the standard “bulk” wafer cassette in a fab holds 25 wafers (and a single fab will pump out about 25,000 – 50,000 wafers a month); so, this is a device that’s clearly ready for mobile-phone scale production.

Despite the production capacity, the unique characteristics of the SiT8021 make it a strong candidate to be designed into mobile phones of all types, so I would likely be competing with companies like Apple and Samsung for my tiny slice of the supply chain.

The Supply Chain War Story
It’s clearly a great part for a low-power mobile device like Precursor, which is why I designed it into the device. Unfortunately, there’s also no real substitute for it. Nobody else makes a MEMS oscillator of comparable quality, and as outlined above, this device is smaller and orders of magnitude lower power than an equivalent quartz crystal. It’s so power-efficient that in many chips it takes less power to use this off-chip oscillator than to use the built-in crystal oscillator to drive a passive crystal. For example, the STM32H7 HSE burns 450uA, whereas the SiT8021 runs at 160uA. To be fair, one also has to drive the pad input capacitance of the STM32, but even with that considered you’re probably at around 250uA total.

To put it in customer-facing terms, if I were forced to substitute commonly available quartz oscillators for this part, the instant-on standby time of a Precursor device would be cut from a bit over 50 hours down to about 40 hours (standby current would go from 11mA up to 13mA).
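Working backwards from those numbers, the implied standby energy budget is about 550mAh:

    # Implied standby battery budget, inferred from the figures above.
    budget_mah = 50 * 11     # 50 hours at 11 mA -> ~550 mAh (inferred)
    print(budget_mah / 13)   # ~42 hours at 13 mA -- "about 40 hours"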

If this doesn’t make the part special enough, the fact that it’s an oscillator puts it in a special class with respect to electromagnetic compliance (EMC) regulations. These are the regulations that make sure radios don’t interfere with each other, and like them or not, countries take them very seriously as trade barriers – by requiring expensive certifications, you can eliminate competition from small upstarts and cheap import equipment on “radio safety” grounds. Because the quality of a radio signal depends directly upon the quality of the oscillator used to derive it, the regulations (quite reasonably) disallow substitutions of oscillators without re-certification. Thus, even if I wanted to take the hit on standby time and substitute the part, I’d have to go through the entire certification process again, at a cost of several thousand dollars and weeks of additional delay.

Thus this part, along with the FPGA, is one of the two parts on the entire BOM that I really could not do without. Of course, I focused a lot on securing the FPGA, because of its high cost and known sourcing difficulty; but for want of a $0.68 oscillator, a $565 product would not be shipped…

The supply chain saga starts when I ordered a couple thousand of these in January 2021, back when the part had about a 30-week lead time, putting delivery sometime in late August 2021. After waiting about 28 weeks, on August 12th, we got an email from our distributor informing us that they had to cancel our entire order for SiT8021s. That’s 28 weeks lost!

The nominal reason given was that the machine used to set the frequency of the chips was broken or otherwise unavailable, and due to supply chain problems it couldn’t be fixed anytime soon. Thus, we had to go to the factory to get the parts. But, in order to order direct from the factory, we had to order 18,000 pieces minimum – over 9x what I needed. Recall that one wafer yields about 50,000 chips, so this isn’t even half a wafer’s worth of oscillators. That being said, 18,000 chips would be about $12,000. This isn’t chump change for a project operating on a fixed budget. It’s expensive enough that I would have considered re-certifying the product with a different oscillator, were it not for the degradation in standby time.

Panic ensues. We immediately trawl all the white-market distributor channels and buy out all the stock, regardless of the price. Instead of paying our quoted rate of $0.68, we’re paying as much as $1.05 each, but we’re still short about 300 oscillators.

I instruct the buyers to search the gray-market channels, and they come back with offers at $5 or $6 for the $0.68 part, with no guarantee of fitness or function. In other words, I could pay 10x the value of the part and get a box of bricks, and the broker could just disappear into the night with my money.

No deal. I had to do better.

By this time, every distributor was repeating the “18k Minimum Order Quantity (MOQ) with long lead time” offer, and my buyers in China waved the white flag and asked me to intervene. After trawling the Internet for a couple hours, I discover that Element14 right here in Singapore (where I live) claims to be able to deliver small quantities of the oscillator before the end of the year. It seems too good to be true.

I ask my buyers in China to place an order, and they balk; the China office repeats that there is simply no stock. This has happened before: due to trade restrictions and regional differences, the inventory in one region may not be orderable in another. So, I agree to order the balance of the oscillators with a personal credit card, and consign them directly to the factory. At this point, Element14 is claiming a delivery of 10-12 weeks on the Singapore website: that would just meet the deadline for the start of SMT production at the end of November.

I try to convince myself that I’ve found the solution to the problem, but something smells rotten. A month later, I check back on the Element14 website to see the status of the order. The delivery date had slipped back almost day-for-day, to December. I start to suspect maybe they don’t even carry this part, and it’s just an automated listing to honeypot orders into their system. So, I get on the phone with an Element14 representative, and crazily enough, she can’t even find my order in her system, even though I can see it in my own Element14 account. She tells me this is not uncommon, and that if she can’t see an order in her system, the web order will never be filled. I’m like, is there any way you can cancel the order then? She’s like, “no, because I can’t see the order, I can’t cancel it.” So the order can’t be cancelled, yet because nobody can see it, it also doesn’t exist and will never be filled. She recommends I place the order again.

I’m like…what the living fuck. Now I’m starting to sweat bullets; we’re within a few weeks of production start, and I’m considering ordering 18,000 oscillators and reselling the excess as singles via Crowd Supply in a Hail Mary to recover the costs. The frustrating part is, the cost of 300 parts is small – under $200 – but the lack of these parts blocks the shipment of roughly $170,000 worth of orders. So, I place a couple of bets across the board. I go to Newark (Element14, but for the USA), which also claimed to be able to deliver, and place an order for 500 units; and I re-place the order with Element14 Singapore, but this time I put a Raspberry Pi into the cart with the oscillators, as a “trial balloon” to test if the order was actually in their system. They were able to ship the Raspberry Pi part of the order to me almost immediately, so I knew they couldn’t claim to “lose the order” like before – but the SiT8021 parts went from having a definitive delivery date to a “contact us for more information” note. Not very useful.

I also noticed that by this time (mid-October), Digikey was listing most of the SiT8021 parts for immediate delivery, with the exception of the 12MHz, 1.8V version that I need. Now I’m really starting to sweat – one of the hypotheses pushed back at me by the buyer in China was that there was no demand for this part, so it’s been canceled, and that’s why I can’t find it anywhere. If the part’s been canceled, I’m really screwed.

I decide it’s time to reach out to SiTime directly. By hook or by crook, I get in touch with a sales rep, who confirms that the 12MHz, 1.8V version is a valid and orderable part number, and that I should be able to go to Digikey to purchase it. I inform the sales rep that the Digikey website doesn’t list this part number, to which they reply “that’s strange, we’ll look into it”.

Not content to leave it there, I reach out to Digikey directly. I get connected to one of their technical sales representatives via an on-line chat, and after a bit of back-and-forth, I confirm that the parts are in fact shipped to Digikey as blanks, and that Digikey has a machine that can program the parts on-site. The technical sales rep also confirms the machine can program that exact configuration of the part, but because the part is not listed on the website, I have to go through a “custom part” quotation.

Aha! Now we are getting somewhere. I reach out to their custom-orders department and request a quotation. A lady responds to me fairly quickly and says she’ll get back to me. About a week passes, no response. I ping the department again, no response.

“Uh-oh.”

I finally do the “unthinkable” in the web age – I pick up the phone, and dial in hoping to reach a real human being to plead my case. I dial the extension for the custom department sales rep. It drops straight to voice mail. I call back again, this time punching the number to draw a lottery ticket for a new sales rep.

Luckily, I hit the jackpot! I got connected with a wonderful lady by the name of Mel, who heard out my problem and immediately took ownership of solving it. I could hear her typing queries into her terminal, hemming and hawing over how strange it is that there’s no order code, yet she can still pull up pricing. While I couldn’t look over her shoulder, I could piece together that the issue was a misconfiguration in their internal database. After about five minutes of tapping and poking, she informs me that she’ll send a message to their web department to correct the issue. Three days later, my part (along with three other missing SKUs) is orderable on Digikey, and a week later I have the 300 missing oscillators delivered directly to the factory – just in time for the start of SMT production.

I mailed Mel a hand-written thank-you card. I hope she received it, because people like her are a rare breed: she has the experience to quickly figure out how the system breaks, the judgment to send the right instructions to the right groups on how to fix it, and the authority to actually make it happen. And she’s still working a customer-facing job, not promoted into a corner-office management position where she would never be exposed to a real-world problem like mine.

So, while Mel saved my production run, the comedy of errors still plays on at Element14 and Newark. The “unfindable order” is still lodged in my Element14 account, probably to stay there until the end of time. Newark’s “international” department sent me a note saying there’s been an export compliance issue with the part (since when did jellybean oscillators become subject to ITAR restrictions?!), so I responded to their department to give them more details – but got no response back. I’ve since tried to cancel the order, got no response, and now it just shows a status of “red exclamation mark” on hold and a ship date of Jan 2022. The other Singapore Element14 order that was combined with the Raspberry Pi still shows the ominous “please contact us for delivery” on the ship date, and despite trying to contact them, nobody has responded to inquiries. But hey, Digikey’s Mel has got my back, and production is up and (mostly) running on schedule.

This is just one of many supply chain war stories from this production run, but it is perhaps the one with the most unexpected outcome. I feared that the issue was intense competition for the parts making them unavailable, but the ground truth turned out to be much more mundane: a misconfigured website. Fortunately, this small corner of the supply chain is now fixed, and anyone can buy the part that caused me so many sleepless nights.

What’s the Value of Hackable Hardware, Anyway?

Friday, December 11th, 2020

There is plenty of skepticism around the value of hackable products. Significantly, hackability is different from openness: cars are closed-source, yet support vibrant modding communities; gcc is one of the “real OG”s of open source, but few users find it easy to extend or enhance. Is it better to have a garden planted by the most knowledgeable botanists and maintained by experienced gardeners, or an open plot of land maintained by whoever has the interest and time?


Above left: Walled garden of Edzell Castle; above right: Thorncliffe Park community garden.

In the case of hardware products, consumer behavior consistently affirms a strong preference for well-curated gardens. Hardware is hard – not only is it difficult to design and validate, but supply chains also benefit from economies of scale and predictable user demand. The larger the captive audience, the more up-front money one can invest into developing a better hardware product. However, every decision to optimize comes with inherent trade-offs. For example, anytime symmetry is broken, one must optimize for either a right-handed or a left-handed version.


Above: touching the spot indicated by the red arrow would degrade antenna performance on an iPhone 4. This spot would naturally rest on the palm of a left-handed user. Image adapted from “iPhone 4” by marc.flores, licensed under CC BY 2.0.

Some may recall that a decade ago, when the iPhone 4 launched, left-handed people noticed the phone would frequently drop calls. It turned out the iPhone 4 was designed with a critical antenna element that would fail when the phone was held naturally by a left-handed person. The late Steve Jobs responded to the problem by telling users to “just avoid holding it that way”. Even if he didn’t mean it that way, I couldn’t help but feel he was saying the iPhone 4 was perfect, and left-handers like me were just defective humans who should be sent to re-education camps on how to hold things.

Of course, as a hardware engineer, I can also sympathize with why Steve Jobs might have felt this way – clearly, a huge amount of effort and thought went into designing a technical masterpiece that was also of museum-quality construction. It’s frustrating to be told, after spending years and billions of dollars trying to build “the perfect product”, that you somehow got it wrong because humans aren’t a homogeneous population. Rumor has it that Apple spent tens of millions of dollars building micron-precision production jigs out of injection-molding grade tooling to ensure the iPhone 4 was simply perfect in terms of production tolerances; duplicating all of those to make a mirror-image version for the left-handers that make up 10% of the market just made no business sense. It proved cheaper and easier, ultimately, to give full refunds or hand out rubber bumpers to the users who requested them.

I do think there is such a thing as “over-designing” a product. For example, contemporary “high concept” smartphone design is minimalist – phone companies have removed headphone jacks, hidden the front camera, and removed physical buttons. There is clearly no place for screws in this world; the love affair of smartphones and adhesives has proven to be … sticky. The adhesives used in place of screws in modern smartphones are so aggressive that removing them requires either special equipment, such as a hot plate and solvents, or destroying the bezel by breaking it off in pieces and replacing it with an entirely new one upon re-assembly. In other words, hacking a modern smartphone necessarily implies the destruction or damage of adhesive-bound parts.

With Precursor, I’m bringing screws back.

Precursor’s screws are unapologetic – I make no attempt to hide them or cover them with bits of tape or rubber inserts. Instead, I’ve sourced custom-made Torx T3 metric screws with a black oxide finish that complements the overall color scheme of Precursor. Six of them line the front, as a direct invitation for users to remove them and see what’s inside. I’ve already received ample criticism calling the visible screws “primitive”, “ugly”, and “out of touch with modern trends” – but in the end, I feel the visual clutter of these six screws is a small price to pay for the gain in hackability.

Of course, the anti-screw critics question the value of hackability. Surely, I am sacrificing mass-market appeal to enable a fringe market; if hackability were so important, wouldn’t Apple and Google already have incorporated it into their phones? Wouldn’t we see more good examples of hackability already?

This line of questioning is circular: you can’t get good examples of hacks until hackable products are widespread. However, the critics are correct, in a way: in order to bootstrap an ecosystem, we’re going to need some good examples of why hackability matters.

In the run-up to crowdfunding Precursor, I was contemplating a good demo hack for it. Fortuitously, a fellow named Matt Campbell opened a GitHub issue requesting a text-to-speech option for blind users. This led me to ask what might be helpful in a physical keyboard design for blind users. You can read the thread for yourself, but I’ll highlight that even the blind community itself is divided on whether or not there is such a thing as the “blind ghetto” – an epithet applied by some users who feel that blindness-specific products tend to lag behind modern smartphones, tablets, and laptops. However, given that most modern gadgets struggle to consider the needs of the 10% of the population that is left-handed, I’m readily sympathetic to the notion that gadgets make little to no concession to the even smaller number of blind users.

Matt was articulate in specifying his preferred design for a pocketable keyboard. He referred me to the “Braille ’n Speak” (shown above) as an example of an existing braille keyboard. Basically, it takes the six dots that make up braille and lines them up horizontally into three left and three right sets of buttons, adding a single button in the middle that functions as a space bar. Characters are entered by typing chords that correspond to the patterns of the six dots in the braille alphabet. Not being a braille user myself, I had to look up what the alphabet looked like. I made the guide below, based on a snippet from Wikipedia, to better understand how such a keyboard might be used.
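The chording scheme is compact enough to express directly in code. Here’s a sketch of a decoder for the first ten letters (k–t add dot 3 to the a–j patterns, and u–z add dots 3 and 6, with ‘w’ as the historical exception); the names and structure are mine, purely for illustration:

    # Braille chord decoding: each letter is a chord of the six dots.
    # Dots 1-3 run down the left column of the cell, dots 4-6 the right.
    CHORDS = {
        frozenset({1}): 'a',          frozenset({1, 2}): 'b',
        frozenset({1, 4}): 'c',       frozenset({1, 4, 5}): 'd',
        frozenset({1, 5}): 'e',       frozenset({1, 2, 4}): 'f',
        frozenset({1, 2, 4, 5}): 'g', frozenset({1, 2, 5}): 'h',
        frozenset({2, 4}): 'i',       frozenset({2, 4, 5}): 'j',
    }

    def decode(pressed_dots):
        # pressed_dots: the dot keys held down when the chord is released
        return CHORDS.get(frozenset(pressed_dots), '?')

    print(decode({1, 2, 5}))   # 'h'
    print(decode({2, 4, 5}))   # 'j'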

Ironically, even though Matt had linked me to the picture of the Braille ’n Speak, it still took a while to sink in that a braille variant of Precursor did not need a display. I’m a bit ashamed to admit that my first sketches involved trying to cram this set of switches into the existing keyboard area of the Precursor, without removing the display entirely. I had to overcome my own prejudice about how the world should look; it took time and empathy to understand this new perspective.

Once I had a better grasp of Matt’s request, I set about designing a customized braille variant. Precursor was designed for this style of hacking: the keyboard is a simple 2-layer PCB that’s cheap and easy to re-design, and the front bezel is also a PCB, which is a bit more expensive to redesign. Fortunately, I was able to amortize setup costs by bundling the braille front bezel variant with another variant that I had to fabricate anyway for the crowdfunding campaign. Beyond that, I also had to come up with some custom key caps to complement the switches.

The major challenge in designing any type of mobile-friendly keyboard is the trade-off between the hand feel of the switches and the thinness of the overall design. On one end of the spectrum, full-travel mechanical switches have a wonderful hand feel, but are thicker than a sausage. On the other end, dome switches and printed carbon ink patterns are thinner than a credit card, but can feel mushy and have a limited “sweet spot” – the region of a key switch with optimal tactile feedback and operational force curves. The generally well-regarded Thinkpad keyboards take a middle-ground approach that’s a few millimeters thick, using a “scissor” mechanism to stabilize the key caps over a silicone dome switch, giving individual keys a bit of travel while ensuring that the “sweet spot” covers the entire key cap. Optimizing key switch mechanisms is hard: some may recall the controversy over Apple’s re-design of the MacBook keyboard to use a “butterfly” mechanism, which shaved a couple of millimeters of thickness, but led to lawsuits over a defect where the keyboard allegedly stopped working when small bits of dust or other particles got trapped under it.

Given the short time frame and a shoestring budget, I decided to use an ultra-thin (0.35mm) tactile switch that I could buy off-the-shelf from Digikey and create custom key caps with small dimples to assist users in finding the relatively small sweet spots typical of such switches. I have sincere hopes this is a pretty good final solution; while it lacks a scissor mechanism to spread off-centered force, the simple mechanism meant I didn’t have to stick with a square key cap and could do something more comfortable and ergonomic to focus forces into the sweet spot. At the very least, the mechanism would be no worse than the current mechanism used in Precursor’s existing keyboard for sighted users, which is similarly a dome switch plus a hybrid-polymer key film.

Next, I had to figure out where to place the switches. To assist with this, I printed a 1:1 scale version of the Precursor case, dipped my fingertips in ink, and proceeded to tap on the printout in what felt like a natural fashion.

I then took the resulting ink spots and dimensioned their centers, to place the centroid of each key cap. I also asked my partner, who has smaller hands, to place her fingers over the spots and noted the differences in where her fingers lay to help shape the final key caps for different-sized hands.

Next, using the “master profile” discussed in the previous post on Precursor’s mechanical design, I translated this into a sketch to help create a set of key caps based on splines that matched the natural angle of fingers.

Above, you can see an early sketch of the key caps, showing the initial shape with dimples for centering the fingers.

Before moving ahead and spending a few hundred dollars to build a functional prototype, I decided to get Matt’s feedback on the design. We submitted the design to Shapeways and had a 3D print sent to Matt, which he graciously paid for. After receiving the plastic dummy, his feedback was that the center space bar should be a single key, instead of two separate keys, so I merged the two separate key caps of the space bar together into a single piece, while retaining two separate switches wired in parallel under the space bar. I felt this was a reasonable compromise that would allow for a “sweet spot” that serviced lefties as well as righties.

I then re-designed the keyboard PCB, which was a fairly simple task, because the braille keyboard consists of only eight switches. I just had to be careful to pick row/column pairs that would not conflict during chording, and to include the row/column pairs necessary to turn Precursor on after being put to sleep. I also redesigned the bezel; eliminating the display actually makes manufacturing a little easier, because it removes a beveling step from the process. I kept the RF antenna in exactly the same location, as its design was already well-characterized and it takes a lot of effort to tune an antenna. Finally, I decided to machine the key caps out of aluminum: the caps have a lot of fine features, and I needed a stiff material that could translate off-center force into the switch, enlarging the sweet spot as much as possible.
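The chording constraint is the classic “ghosting” problem: in a diode-less switch matrix, pressing three keys on the corners of a rectangle makes the fourth corner read as pressed too. A quick simulation can sanity-check a candidate row/column assignment – the assignment below is hypothetical, not the actual Precursor netlist:

    # Ghosting check for a diode-less key matrix. Pressed switches connect
    # their row and column nets; any (row, col) position whose nets end up
    # connected reads as pressed, whether or not it actually is.
    from itertools import combinations

    # Hypothetical key -> (row, col) assignment, for illustration only.
    KEYS = {
        'dot1': (0, 0), 'dot2': (0, 1), 'dot3': (1, 0), 'space_l': (1, 1),
        'dot4': (2, 0), 'dot5': (2, 1), 'dot6': (0, 2), 'space_r': (1, 2),
    }

    def scan(pressed):
        # Union-find over row/col nets joined by the pressed switches.
        parent = {}
        def find(x):
            parent.setdefault(x, x)
            while parent[x] != x:
                parent[x] = parent[parent[x]]
                x = parent[x]
            return x
        def union(a, b):
            parent[find(a)] = find(b)
        for name in pressed:
            r, c = KEYS[name]
            union(('row', r), ('col', c))
        # A position reads as pressed if its row and column nets connect.
        return {k for k, (r, c) in KEYS.items()
                if find(('row', r)) == find(('col', c))}

    # Flag every three-key chord that produces a phantom key press:
    for chord in combinations(KEYS, 3):
        ghosts = scan(set(chord)) - set(chord)
        if ghosts:
            print(f"chord {sorted(chord)} ghosts {sorted(ghosts)}")

Running this against the toy assignment flags plenty of phantom chords – exactly the kind of conflict the real row/column selection has to design out (with only eight switches, giving each its own row or column sidesteps the problem entirely).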


Above: The prototype of Precursor with braille keyboard.

About three weeks later, all the parts for the braille keyboard had arrived. I decided on purple anodization for the key caps which, combined with the organic key shapes, gives the final design a bit of a Wakanda-esque “Black Panther” aesthetic, especially when mounted in a brass case. The key feel is about in line with what I imagined, with the caveat that one of the switches feels a little less nice than the rest – but I think that’s due to a bad solder job on the switch itself. I haven’t had a chance to trace it down because…well, I’ve had to write a bunch of posts like this to fund Precursor. I have also been working with Xobs to refactor Xous in hopes of putting together enough code to send Matt a prototype he can evaluate without having to write gobs of embedded hardware driver code himself.

Above is a quick photo showing the alignment of fingers to keys. Naturally, it’s perfect for my hand because it was designed around it. I’m looking forward to hearing Matt’s opinion about the feel of the keys.

Above is a photo of the custom parts for the braille keyboard. At the top, you can see the custom bezel with key caps and the RF antenna matching circuitry on the top right. On the bottom, you can see the custom keyboard PCB mounted onto a Precursor motherboard. The keyboard PCB is mostly blank and, because of the small number of keys and the flexibility of the FPGA, there’s an option to mount more peripherals on the PCB.

Though the design is not yet finalized, I hope this exercise is sufficient to demonstrate the potential value of hackable products. The original design scope for Precursor (née Betrusted) did not explicitly include a braille keyboard option, but thanks to modular design principles and the use of accessible construction materials, I was able to produce a prototype in about a month that has a fit and finish similar to the mainstream product.

As long as humans choose to embrace diversity, I think hackability will have value. A notional “perfect” product implies there’s such a thing as a “perfect” user. In reality, even the simple conundrum of left- or right-handedness challenges the existence of a singular “perfect” product for all of humanity. Fortunately, accommodating the wonderfully diverse, quirky, and interesting range of humanity requires just a few simple engineering principles, such as embracing screws over adhesives, openness, and modularity. That we can’t hack our products isn’t a limitation of physics and engineering: Precursor demonstrates one can build something simultaneously secure and hackable, while remaining compact and pocketable. This suggests the relative lack of hackable products on the market isn’t a fundamental limitation. Maybe we just need a little more imagination, maybe we need to be a little more open-minded about aesthetics, and maybe companies need to be willing to take brave steps toward openness and inclusivity.

For Apple, true “courage to move on and do something new that betters all of us” was to remove the headphone jack, which locked users deeper into a walled-garden ecosystem. For hackers like myself, “courage” means facing blunt criticism for making “ugly” products with screws, so that mods like braille keyboards can expand the definition of “all of us” beyond a set of privileged, “perfect” users.

I hope this braille keyboard is just the first example of many mods for Precursor that adapt the product for unique end-users, bucking the trend of gaslighting users to mold their behavior and preferences to fit the product. If you’ve got an itch to develop your own yet-to-be-seen feature in a mobile device, please visit our crowdfunding campaign page to learn more about Precursor. We’re close to being funded, but we’ve only a few days left in the campaign. After the campaign concludes on December 15th, the limited edition will no longer be available, and pricing of the standard model goes up. If you like what you see, please consider helping us to bring Precursor to life!

On Liberating My Smartwatch From Cloud Services

Saturday, July 25th, 2020

I’ve often said that if we convince ourselves that technology is magic, we risk becoming hostages to it. Just recently, I had a brush with this fate, but happily, I was saved by open source.

At the time of writing, Garmin is suffering from a massive ransomware attack. I also happen to be a user of the Garmin Instinct watch. I’m very happy with it, and in many ways, it’s magical how much capability is packed into such a tiny package.

I also happen to have a hobby of paddling the outrigger canoe:

I consider the GPS watch to be an indispensable piece of safety gear, especially for the boat’s steer, because it’s hard to judge your water speed when you’re more than a few hundred meters from land. If you get stuck in a bad current, without situational awareness you could end up swept out to sea or worse.

The water currents around Singapore can be extreme. When the tides change, the South China Sea eventually finds its way to the Andaman Sea through the Singapore Strait, causing treacherous flows of current that shift over time. Thus, after every paddle, I upload my GPS data to the Garmin Connect cloud and review the route, in part to note dangerous changes in the ebb-and-flow patterns of currents.

While it’s a clear and present privacy risk to upload such data to the Garmin cloud, we’re all familiar with the trade-off: there’s only 24 hours in the day to worry about things, and the service just worked so well.

Until yesterday.

We had just wrapped up a paddle with particularly unusual currents, and my paddling partner wanted to know our speeds at a few of the tricky spots. I went to retrieve the data and…well, I found out that Garmin was under attack.

Garmin was being held hostage, and transitively, so was access to my paddling data: a small facet of my life had become a hostage to technology.

A bunch of my paddling friends recommended I try Strava. The good news is Garmin allows data files to be retrieved off of the Instinct watch, for upload to third-party services. All you have to do is plug the watch into a regular USB port, and it shows up as a mass storage device.

The bad news is that as I tried to create an account on Strava, all sorts of warning bells went off. The website is full of dark patterns; when I clicked to deny Strava access to my health-related data, I was met with this tricky series of dialog boxes:

Click “Decline”…

Click “Deny Permission”…

Click “OK”…

Three clicks to opt out, and if I wasn’t paying attention and just kept clicking the bottom box, I would have opted in by accident. After this, I was greeted by a creepy list of people to follow (how do they know so much about me from just an email?), and then there’s a tricky dialog box that, if answered incorrectly, routes you to a page to enter credit card information as part of your “free trial”.

Since Garmin at least made money by selling me a $200+ piece of hardware, collecting my health data is just icing on the cake; for Strava, my health data is the cake. It’s pretty clear to me that Strava made a pitch to its investors that they’ll make fat returns by monetizing my private data, including my health information.

This is a hard no for me. Instead of liberating myself from a hostage situation, going from Garmin to Strava would be like stepping out of the frying pan and directly into the fire.

So, even though this was a busy afternoon … I’m scheduled to paddle again the day after tomorrow, and it would be great to have my boat speed analytics before then. Plus, I was sufficiently miffed by the Strava experience that I couldn’t help but start searching around to see if I couldn’t cobble together my own privacy-protecting alternative.

I was very pleased to discover an open-source utility called gpsbabel (thank you gpsbabel! I donated!) that can unpack Garmin’s semi-(?)proprietary “.FIT” file format into the interoperable “.GPX” format. From there, I was able to cobble together bits and pieces of XML parsing code and merge them with OpenStreetMap tiles via the Folium API to create custom maps of my data.
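The conversion step itself is a single gpsbabel invocation; wrapped in Python, it looks something like this (file names are placeholders, and “garmin_fit” is, as best I can tell, gpsbabel’s identifier for the FIT format):

    # Unpack a Garmin .FIT file into interoperable .GPX via gpsbabel.
    import subprocess

    subprocess.run([
        'gpsbabel',
        '-i', 'garmin_fit', '-f', 'activity.fit',   # input format / file
        '-o', 'gpx',        '-F', 'activity.gpx',   # output format / file
    ], check=True)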

Even with getting “lost” on a detour of trying to use the Google Maps API that left an awful “for development only” watermark on all my map tiles, this only took an evening — it wasn’t the best possible use of my time all things considered, but it was mostly a matter of finding the right open-source pieces and gluing them together with Python (fwiw, Python is a great glue, but a terrible structural material. Do not build skyscrapers out of Python). The code quality is pretty crap, but Python allows that, and it gets the job done. Given those caveats, one could use it as a starting point for something better.

Now that I have full control over my data, I’m able to visualize it in ways that make sense to me. For example, I’ve plotted my speed as a heat map over the course, with circles proportional to the speed at that moment, and hover-text that shows my instantaneous speed and heart rate:

It’s exactly the data I need, in the format that I want; no more, and no less. Plus, the output is a single html file that I can share directly with nothing more than a simple link. No analytics, no cookies. Just the data I’ve chosen to share with you.

Here’s a snippet of the code that I use to plot the map data:
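(What follows is a minimal reconstruction of the idea, not the verbatim snippet: it parses the GPX with gpxpy, computes point-to-point speeds, and drops a speed-proportional circle at each point; library choices and parameters here are my assumptions.)

    # Sketch: read a GPX track, compute speed between successive points,
    # and plot speed-proportional circles on an OpenStreetMap base layer.
    # Garmin's heart-rate extension is elided here for brevity.
    import gpxpy
    import folium

    with open('activity.gpx') as f:
        gpx = gpxpy.parse(f)

    points = [p for trk in gpx.tracks
                for seg in trk.segments
                for p in seg.points]

    m = folium.Map(location=[points[0].latitude, points[0].longitude],
                   zoom_start=15)

    for prev, cur in zip(points, points[1:]):
        dt = (cur.time - prev.time).total_seconds()
        if dt <= 0:
            continue
        speed = cur.distance_2d(prev) / dt          # meters per second
        folium.CircleMarker(
            location=[cur.latitude, cur.longitude],
            radius=2 + 2 * speed,                   # grows with speed
            tooltip=f"{speed:.2f} m/s",
            fill=True,
        ).add_to(m)

    m.save('paddle_map.html')   # one self-contained, shareable HTML file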

Like I said, not the best quality code, but it works, and it was quick to write.

Even better yet, I’m no longer uploading my position or fitness data to the cloud — there is a certain intangible satisfaction in “going dark” for yet another surveillance leakage point in my life, without any compromise in quality or convenience.

It’s also an interesting meta-story about how healthy and vibrant the open-source ecosystem is today. When the Garmin cloud fell, I was able to replace the most important functions of it in just an afternoon by cutting and pasting together various open source frameworks.

The point of open source is not to ritualistically compile our stuff from source. It’s the awareness that technology is not magic: that there is a trail of breadcrumbs any of us could follow to liberate our digital lives in case of a potential hostage situation. Should we so desire, open source empowers us to create and run our own essential tools and services.

Edits: added details on how to take data off the watch, and noted the watch’s price.