Archive for the ‘precursor’ Category

Fully Oxidizing `ring`: Creating a Pure Rust TLS Stack Based on `rustls` + `ring`

Friday, September 16th, 2022

I really want to understand all the software that runs on my secure devices.

It’s a bit of a quixotic quest, but so far we’ve made pretty good progress towards this goal: I’ve been helping to write the Xous OS from the ground up in pure Rust – from the bootloader to the apps. Xous now has facilities like secure storage, a GUI toolkit, basic networking, and a password vault application that can handle U2F/FIDO, TOTP, and plaintext passwords.

One of the biggest challenges has been keeping our SBOM (software bill of materials) as small as possible. I consider components of the SBOM to be part of our threat model, so we very selectively re-write crates and libraries that are too bloated. This trades off the risk of introducing new bugs in our hand-rolled code versus the risk of latent, difficult-to-discover bugs buried in more popular but bloated libraries. A side benefit of this discipline is that to this day, Xous builds on multiple platforms with nothing more than a default Rust compiler – no other tooling necessary. It does mean we’re putting a lot of trust in the intractably complicated `rustc` codebase, but better than also including, for example, `gcc`, `nasm`, and `perl` codebases as security-critical SBOM components.

Unfortunately, more advanced networking based on TLS is a huge challenge. This is because the “go-to” Rust library for TLS, `rustls`, uses `ring` for its cryptography. `ring` is in large part an FFI (foreign function interface) wrapper around a whole lot of assembly and C code that is very platform specific and lifted out of BoringSSL. And it requires `gcc`, `nasm`, and `perl` to build, pulling all these complicated tools into our SBOM.

Notwithstanding our bespoke concerns, `ring` turns out to be the right solution for probably 90%+ of the deployments by CPU core count. It’s based on the highly-regarded, well-maintained and well-vetted BoringSSL codebase (“never roll your own crypto”!), and because of all the assembly and C, it is high performance. Secure, high-performance code, wrapped in Rust. What else could you ask for when writing code that potentially runs on some of the biggest cloud services on the Internet? I definitely can’t argue with the logic of the maintainers – in Open Source, sustainability often requires catering to deep-pocketed patrons.

The problem, of course, is that Open Source includes The Bazaar, with a huge diversity of architectures. The problem is well-stated in this comment from a RedHat maintainer:

…I’m not really speaking as a member of the Packaging Committee here, but as the person who is primary maintainer for 2000+ packages for Rust crates.

In Fedora Linux, our supported architectures are x86_64, i686, aarch64, powerpc64le, s390x, and, up to Fedora 36, armv7 (will no longer supported starting with Fedora 37). By default, all packages are built on all architectures, and architecture support is opt-out instead of opt-in. […]

On the other hand, this also makes it rather painful to deal with Rust crates which only have limited architecture support: Builds of packages for the affected crates and every other package of a Rust crate that depends on them need to opt-out of building on, in this case, powerpc64le and s390x architectures. This is manageable for the 2-3 packages that we have which depend on ring, but right now, I’m in the process of actually removing optional features that need rustls where I can, because that support is unused and hard to support.

However, the problem will get much worse once widely-used crates, like hyper (via h3 and quinn) start adding a (non-optional) dependency on rustls / ring. At that point, it would probably be easier to stop building Rust crates on the two unsupported architectures completely – but we cannot do that, because some new distribution-critical components have been introduced, which were either written from scratch in Rust, or were ported from C or Python to Rust, and many of them are network stack related, with many of them using hyper.

Long story short, if Redhat/Fedora can’t convince `ring` to support their needs, then the prognosis for getting our niche RISC-V + Xous combo supported in `ring` does not look good, which would mean that `rustls`, in turn, is not viable for Xous.

Fortunately, Ellen Poe (ellenhp) reached out to me in response to a post I made back in July, and informed me that she had introduced a patch which adds RISC-V support for ESP32 targets to `ring`, and that this is now being maintained by the community as `ring-compat`. Her community graciously tried another go at submitting a pull request to get this patch mainlined, but it seems to not have made much progress on being accepted.

At this point, the following options remained:

  • Use WolfSSL with FFI bindings, through the wolfssl-sys crate.
  • Write our own crappy pure-Rust TLS implementation
  • Patch over all the `ring` FFI code with pure Rust versions

WolfSSL is appealing as it is a well-supported TLS implementation precisely targeted toward light-weight clients that fit our CPU profile: I was confident it could meet our space and performance metrics if we could only figure out how to integrate the package. Unfortunately, it is both license and language incompatible with Xous, which would require turning it into a stand-alone binary for integration. This also reduced efficiency of the code, because we would have to wrap every SSL operation into an inter-process call, as the WolfSSL code would be sandboxed into its own virtual memory space. Furthermore, it introduces a C compiler into our SBOM, something we had endeavoured to avoid from the very beginning.

Writing our own crappy TLS implementation is just a patently bad idea for so many reasons, but, when doing a clean-sheet architecture like ours, all options have to stay on the table.

This left us with one clear path: trying to patch over the `ring` FFI code with pure Rust versions.

The first waypoint on this journey was to figure out how `ring-compat` managed to get RISC-V support into `ring`. It turns out their trick only works for `ring` version 0.17.0 – which is an unreleased, as-of-yet still in development version.

Unfortunately, `rustls` depends on `ring` version 0.16.20; `ring` version 0.16.20 uses C code derived from BoringSSL that seems to be hand-coded, but carefully reviewed. So, even if we could get `ring-compat` to work for our platform, it still would not work with `rustls`, because 0.17.0 != 0.16.20.

Foiled!

…or are we?

I took a closer look at the major differences between `ring` 0.17.0 and 0.16.20. There were enough API-level differences that I would have to fork `rustls` to use `ring` 0.17.0.

However, if I pushed one layer deeper, within `ring` itself, one of the biggest changes is that ring’s “fipsmodule” code changes from the original, hand-coded version, to a machine-generated version that is derived from ciphers from the fiat-crypto project (NB: “Fiat Crypto” has nothing to do with cryptocurrency, and they’ve been at it for about as long as Bitcoin has been in existence. As they say, “crypto means cryptography”: fiat cryptography utilizes formal methods to create cryptographic ciphers that are guaranteed to be correct. While provably correct ciphers are incredibly important and have a huge positive benefit, they don’t have a “get rich quick” story attached to them and thus they have been on the losing end of the publicity-namespace battle for the words “fiat” and “crypto”). Because their code is machine-generated from formal proofs, they can more easily support a wide variety of back-ends; in particular, in 0.17.0, there was a vanilla C version of the code made available for every architecture, which was key to enabling targets such as WASM and RISC-V.

This was great news for me. I proceeded to isolate the fipsmodule changes and layer them into a 0.16.20 base (with Ellen’s patch applied); this was straightforward in part because cryptography APIs have very little reason to change (and in fact, changing them can have disastrous unintended consequences).

Now, I had a `rustls` API-compatible version of `ring` that also uses machine-generated, formally verified pure C code (that is: no more bespoke assembly targets!) with a number of pathways to achieve a pure Rust translation.

Perhaps the most “correct” method would have been to learn the entire Fiat Crypto framework and generate Rust back-ends from scratch, but that does not address the thin layer of remnant C code in `ring` still required to glue everything together.

Instead, Xobs suggested that we use `c2rust` to translate the existing C code into Rust. I was initially skeptical: transpilation is a very tricky proposition; but Xobs whipped together a framework in an afternoon that could at least drive the scripts and get us to a structure that we could rapidly iterate around. The transpiled code generated literally thousands of warnings, but because we’re transpiling machine-generated code, the warning mechanisms were very predictable and easy to patch using various regex substitutions.

Over the next couple of days, I kept plucking away at the warnings emitted by `rustc`, writing fix-up patches that could be automatically applied to the generated Rust code through a Python script, until I had a transpilation script that could take the original C code and spit out warning-free Rust code that integrates seamlessly into `ring`. The trickiest part of the whole process was convincing `c2rust`, which was running on a 64-bit x86 host, to generate 32-bit code; initially all our TLS tests were failing because the bignum arithmetic assumed a 64-bit target. But once I figured out that the `-m32` flag was needed in the C options, everything basically just worked! (hurray for `rustc`’s incredibly meticulous compiler warnings!)

The upshot is now we have a fork of `ring` in `ring-xous` that is both API-compatible with the current `rustls` version, and uses pure Rust, so we can compile TLS for Xous without need of gcc, clang, nasm, or perl.

But Is it Constant Time?

One note of caution is that the cryptographic primitives used in TLS are riddled with tricky timing side channels that can lead to the disclosure of private keys and session keys. The good news is that a manual inspection of the transpiled code reveals that most of the constant-time tricks made it through the transpilation process cleanly, assuming that I interpreted the barrier instruction correctly as the Rust `compiler_fence` primitive. Just to be sure, I built a low-overhead, cycle-accurate hardware profiling framework called perfcounter. With about 2 cycles of overhead, I’m able to snapshot a timestamp that can be used to calculate the runtime of any API call.

Inspired by DJB’s Cache-timing attacks on AES paper, I created a graphical representation of the runtimes of both our hardware AES block (which uses a hard-wired S-box for lookups, and is “very” constant-time) and the transpiled `ring` AES code (which uses program code that can leak key-dependent timing information due to variations in execution speed) to convince myself that the constant-time properties made it through the transpilation process.

Each graphic above shows a plot of runtime versus 256 keys (horizontal axis) versus 128 data values (vertical axis) (similar to figure 8.1 in the above-cited paper). In the top row, brightness corresponds to runtime; the bright spots correspond to periodic OS interrupts that hit in the middle of the AES processing routine. These bright spots are not correlated to the AES computation, and would average out over multiple runs. The next lower row is the exact same image, but with a random color palette, so that small differences in runtime are accentuated. Underneath the upper 2×2 grid of images is another 2×2 grid that corresponds to the same parameters, but averaged over 8 runs.

Here we can see that for the AES with hardware S-boxes, there is a tiny bit of texture, which represents a variability of about ±20 CPU cycles out of a typical time of 4168 cycles to encrypt a block; this variability is not strongly correlated with key or data bit patterns. For AES with transpiled ring code, we see a lot more texture, representing about ±500 cycles variability out of a typical time of 12,446 cycles to encrypt a block. It’s not as constant time as the hardware S-boxes, but more importantly the variance also does not seem to be strongly correlated with a particular key or data pattern over multiple runs.

Above is a histogram of the same data sets; on the left are the hardware S-boxes, and the right is the software S-box used in the `ring` transpilation; and across the top are results from a single run, and across the bottom are the average of 8 runs. Here we can see how on a single run, the data tends to bin into a couple of bands, which I interpret as timing differences based upon how “warm” the cache is (in particular, the I-cache). The banding patterns are easily disturbed: they do not replicate well from run-to-run, they tend to “average out” over more runs, and they only manifest when the profiling is very carefully instrumented (for example, introducing some debug counters in the profiling routines disrupts the banding pattern). I interpret this as an indicator that the banding patterns are more an artifact of external influences on the runtime measurement, rather than a pattern exploitable in the AES code itself.

More work is necessary to thoroughly characterize this, but it’s good enough for a first cut; and this points to perhaps optimizing `ring-xous` to use our hardware AES block for both better performance and more robust constant-time properties, should we be sticking with this for the long haul.

Given that Precursor is primarily a client and not a server for TLS, leakage of the session key is probably the biggest concern, so I made checking the AES implementation a priority. However, I also have reason to believe that the ECDSA and RSA implementation’s constant time hardening should have also made it through the transpilation process.

That being said, I’d welcome help from anyone who can recommend a robust and succinct way to test for constant time ECDSA and/or RSA operation. Our processor is fairly slow, so at 100MHz simply generating gobs of random keys and signing them may not give us enough coverage to gain confidence in face of some of the very targeted timing attacks that exist against the algorithm. Another alternative could be to pluck out every routine annotated with “constant time” in the source code and benchmark them; it’s a thing we could do but first I’m still not sure this would encompass everything we should be worried about, and second it would be a lot of effort given the number of routines with this annotation. The ideal situation would be a Wycheproof-style set of test vectors for constant time validation, but unfortunately the Wycheproof docs simply say “TBD” under Timing Attacks for DSA.

Summary

`ring-xous` is a fork of `ring` that is compatible with `rustls` (that is, it uses the 0.16.20 API), and is pure Rust. I am also optimistic that our transpilation technique preserved many of the constant-time properties, so while it may not be the most performant implementation, it should at least be usable; but I would welcome the review and input of someone who knows much more about constant-time code to confirm my hunch.

We’re able to use it as a drop-in replacement for `ring`, giving us TLS on Xous via `rustls` with a simple `Cargo.toml` patch in our workspace:

[patch.crates-io.ring]
git="https://github.com/betrusted-io/ring-xous"
branch="0.16.20-cleanup"

We’ve also confirmed this works with the `tungstenite` websockets framework for Rust, paving the way towards implementing higher-level secure messaging protocols.

This leads to the obvious question of “What now?” — we’ve got this fork of `ring`, will we maintain it? Will we try to get things upstreamed? I think the idea is to maintain a fork for now, and to drop it once something better comes along. At the very least, this particular fork will be deprecated once `ring` reaches full 0.17.0 and `rustls` is updated to use this new version of `ring`. So for now, this is a best-effort port for the time being that is good enough to get us moving again on application development. If you think this fork can also help your project get un-stuck, you may be able to get `ring-xous` to work with your OS/arch with some minor tweaks of the `cfg` directives sprinkled throughout; feel free to submit a PR if you’d like to share your tweaks with others!

The Plausibly Deniable DataBase (PDDB): It’s Real Now!

Thursday, July 28th, 2022

Earlier I described the Plausibly Deniable DataBase (PDDB). It’s a filesystem (like FAT or ext4), combined with plausibly deniable full disk encryption (similar to LUKS or VeraCrypt) in a “batteries included” fashion. Plausible deniability aims to make it difficult to prove “beyond a reasonable doubt” that additional secrets exist on the disk, even in the face of forensic evidence.

Since then, I’ve implemented, deployed, and documented the PDDB. Perhaps of most interest for the general reader is the extensive documentation now available in the Xous Book. Here you can find discussions about the core data structures, key derivations, native & std APIs, testing, backups, and issues affecting security and deniability.

Rust: A Critical Retrospective

Thursday, May 19th, 2022

Since I was unable to travel for a couple of years during the pandemic, I decided to take my new-found time and really lean into Rust. After writing over 100k lines of Rust code, I think I am starting to get a feel for the language and like every cranky engineer I have developed opinions and because this is the Internet I’m going to share them.

The reason I learned Rust was to flesh out parts of the Xous OS written by Xobs. Xous is a microkernel message-passing OS written in pure Rust. Its closest relative is probably QNX. Xous is written for lightweight (IoT/embedded scale) security-first platforms like Precursor that support an MMU for hardware-enforced, page-level memory protection.

In the past year, we’ve managed to add a lot of features to the OS: networking (TCP/UDP/DNS), middleware graphics abstractions for modals and multi-lingual text, storage (in the form of an encrypted, plausibly deniable database called the PDDB), trusted boot, and a key management library with self-provisioning and sealing properties.

One of the reasons why we decided to write our own OS instead of using an existing implementation such as SeL4, Tock, QNX, or Linux, was we wanted to really understand what every line of code was doing in our device. For Linux in particular, its source code base is so huge and so dynamic that even though it is open source, you can’t possibly audit every line in the kernel. Code changes are happening at a pace faster than any individual can audit. Thus, in addition to being home-grown, Xous is also very narrowly scoped to support just our platform, to keep as much unnecessary complexity out of the kernel as possible.

Being narrowly scoped means we could also take full advantage of having our CPU run in an FPGA. Thus, Xous targets an unusual RV32-IMAC configuration: one with an MMU + AES extensions. It’s 2022 after all, and transistors are cheap: why don’t all our microcontrollers feature page-level memory protection like their desktop counterparts? Being an FPGA also means we have the ability to fix API bugs at the hardware level, leaving the kernel more streamlined and simplified. This was especially relevant in working through abstraction-busting processes like suspend and resume from RAM. But that’s all for another post: this one is about Rust itself, and how it served as a systems programming language for Xous.

Rust: What Was Sold To Me

Back when we started Xous, we had a look at a broad number of systems programming languages and Rust stood out. Even though its `no-std` support was then-nascent, it was a strongly-typed, memory-safe language with good tooling and a burgeoning ecosystem. I’m personally a huge fan of strongly typed languages, and memory safety is good not just for systems programming, it enables optimizers to do a better job of generating code, plus it makes concurrency less scary. I actually wished for Precursor to have a CPU that had hardware support for tagged pointers and memory capabilities, similar to what was done on CHERI, but after some discussions with the team doing CHERI it was apparent they were very focused on making C better and didn’t have the bandwidth to support Rust (although that may be changing). In the grand scheme of things, C needed CHERI much more than Rust needed CHERI, so that’s a fair prioritization of resources. However, I’m a fan of belt-and-suspenders for security, so I’m still hopeful that someday hardware-enforced fat pointers will make their way into Rust.

That being said, I wasn’t going to go back to the C camp simply to kick the tires on a hardware retrofit that backfills just one poor aspect of C. The glossy brochure for Rust also advertised its ability to prevent bugs before they happened through its strict “borrow checker”. Furthermore, its release philosophy is supposed to avoid what I call “the problem with Python”: your code stops working if you don’t actively keep up with the latest version of the language. Also unlike Python, Rust is not inherently unhygienic, in that the advertised way to install packages is not also the wrong way to install packages. Contrast to Python, where the official docs on packages lead you to add them to system environment, only to be scolded by Python elders with a “but of course you should be using a venv/virtualenv/conda/pipenv/…, everyone knows that”. My experience with Python would have been so much better if this detail was not relegated to Chapter 12 of 16 in the official tutorial. Rust is also supposed to be better than e.g. Node at avoiding the “oops I deleted the Internet” problem when someone unpublishes a popular package, at least if you use fully specified semantic versions for your packages.

In the long term, the philosophy behind Xous is that eventually it should “get good enough”, at which point we should stop futzing with it. I believe it is the mission of engineers to eventually engineer themselves out of a job: systems should get stable and solid enough that it “just works”, with no caveats. Any additional engineering beyond that point only adds bugs or bloat. Rust’s philosophy of “stable is forever” and promising to never break backward-compatibility is very well-aligned from the point of view of getting Xous so polished that I’m no longer needed as an engineer, thus enabling me to spend more of my time and focus supporting users and their applications.

The Rough Edges of Rust

There’s already a plethora of love letters to Rust on the Internet, so I’m going to start by enumerating some of the shortcomings I’ve encountered.

“Line Noise” Syntax

This is a superficial complaint, but I found Rust syntax to be dense, heavy, and difficult to read, like trying to read the output of a UART with line noise:

Trying::to_read::<&'a heavy>(syntax, |like| { this. can_be( maddening ) }).map(|_| ())?;

In more plain terms, the line above does something like invoke a method called “to_read” on the object (actually `struct`) “Trying” with a type annotation of “&heavy” and a lifetime of ‘a with the parameters of “syntax” and a closure taking a generic argument of “like” calling the can_be() method on another instance of a structure named “this” with the parameter “maddening” with any non-error return values mapped to the Rust unit type “()” and errors unwrapped and kicked back up to the caller’s scope.

Deep breath. Surely, I got some of this wrong, but you get the idea of how dense this syntax can be.

And then on top of that you can layer macros and directives which don’t have to follow other Rust syntax rules. For example, if you want to have conditionally compiled code, you use a directive like

#[cfg(all(not(baremetal), any(feature = “hazmat”, feature = “debug_print”)))]

Which says if either the feature “hazmat” or “debug_print” is enabled and you’re not running on bare metal, use the block of code below (and I surely got this wrong too). The most confusing part of about this syntax to me is the use of a single “=” to denote equivalence and not assignment, because, stuff in config directives aren’t Rust code. It’s like a whole separate meta-language with a dictionary of key/value pairs that you query.

I’m not even going to get into the unreadability of Rust macros – even after having written a few Rust macros myself, I have to admit that I feel like they “just barely work” and probably thar be dragons somewhere in them. This isn’t how you’re supposed to feel in a language that bills itself to be reliable. Yes, it is my fault for not being smart enough to parse the language’s syntax, but also, I do have other things to do with my life, like build hardware.

Anyways, this is a superficial complaint. As time passed I eventually got over the learning curve and became more comfortable with it, but it was a hard, steep curve to climb. This is in part because all the Rust documentation is either written in eli5 style (good luck figuring out “feature”s from that example), or you’re greeted with a formal syntax definition (technically, everything you need to know to define a “feature” is in there, but nowhere is it summarized in plain English), and nothing in between.

To be clear, I have a lot of sympathy for how hard it is to write good documentation, so this is not a dig at the people who worked so hard to write so much excellent documentation on the language. I genuinely appreciate the general quality and fecundity of the documentation ecosystem.

Rust just has a steep learning curve in terms of syntax (at least for me).

Rust Is Powerful, but It Is Not Simple

Rust is powerful. I appreciate that it has a standard library which features HashMaps, Vecs, and Threads. These data structures are delicious and addictive. Once we got `std` support in Xous, there was no going back. Coming from a background of C and assembly, Rust’s standard library feels rich and usable — I have read some criticisms that it lacks features, but for my purposes it really hits a sweet spot.

That being said, my addiction to the Rust `std` library has not done any favors in terms of building an auditable code base. One of the criticisms I used to leverage at Linux is like “holy cow, the kernel source includes things like an implementation for red black trees, how is anyone going to audit that”.

Now, having written an OS, I have a deep appreciation for how essential these rich, dynamic data structures are. However, the fact that Xous doesn’t include an implementation of HashMap within its repository doesn’t mean that we are any simpler than Linux: indeed, we have just swept a huge pile of code under the rug; just the `collection`s portion of the standard library represents about 10k+ SLOC at a very high complexity.

So, while Rust’s `std` library allows the Xous code base to focus on being a kernel and not also be its own standard library, from the standpoint of building a minimum attack-surface, “fully-auditable by one human” codebase, I think our reliance on Rust’s `std` library means we fail on that objective, especially so long as we continue to track the latest release of Rust (and I’ll get into why we have to in the next section).

Ideally, at some point, things “settle down” enough that we can stick a fork in it and call it done by well, forking the Rust repo, and saying “this is our attack surface, and we’re not going to change it”. Even then, the Rust `std` repo dwarfs the Xous repo by several multiples in size, and that’s not counting the complexity of the compiler itself.

Rust Isn’t Finished

This next point dovetails into why Rust is not yet suitable for a fully auditable kernel: the language isn’t finished. For example, while we were coding Xous, a thing called `const generic` was introduced. Before this, Rust had no native ability to deal with arrays bigger than 32 elements! This limitation is a bit maddening, and even today there are shortcomings such as the `Default` trait being unable to initialize arrays larger than 32 elements. This friction led us to put limits on many things at 32 elements: for example, when we pass the results of an SSID scan between processes, the structure only reserves space for up to 32 results, because the friction of going to a larger, more generic structure just isn’t worth it. That’s a language-level limitation directly driving a user-facing feature.

Also over the course of writing Xous, things like in-line assembly and workspaces finally reached maturity, which means we need to go back a revisit some unholy things we did to make those critical few lines of initial boot code, written in assembly, integrated into our build system.

I often ask myself “when is the point we’ll get off the Rust release train”, and the answer I think is when they finally make “alloc” no longer a nightly API. At the moment, `no-std` targets have no access to the heap, unless they hop on the “nightly” train, in which case you’re back into the Python-esque nightmare of your code routinely breaking with language releases.

We definitely gave writing an OS in `no-std` + stable a fair shake. The first year of Xous development was all done using `no-std`, at a cost in memory space and complexity. It’s possible to write an OS with nothing but pre-allocated, statically sized data structures, but we had to accommodate the worst-case number of elements in all situations, leading to bloat. Plus, we had to roll a lot of our own core data structures.

About a year ago, that all changed when Xobs ported Rust’s `std` library to Xous. This means we are able to access the heap in stable Rust, but it comes at a price: now Xous is tied to a particular version of Rust, because each version of Rust has its own unique version of `std` packaged with it. This version tie is for a good reason: `std` is where the sausage gets made of turning fundamentally `unsafe` hardware constructions such as memory allocation and thread creation into “safe” Rust structures. (Also fun fact I recently learned: Rust doesn’t have a native allocater for most targets – it simply punts to the native libc `malloc()` and `free()` functions!) In other words, Rust is able to make a strong guarantee about the stable release train not breaking old features in part because of all the loose ends swept into `std`.

I have to keep reminding myself that having `std` doesn’t eliminate the risk of severe security bugs in critical code – it merely shuffles a lot of critical code out of sight, into a standard library. Yes, it is maintained by a talented group of dedicated programmers who are smarter than me, but in the end, we are all only human, and we are all fair targets for software supply chain exploits.

Rust has a clockwork release schedule – every six weeks, it pushes a new version. And because our fork of `std` is tied to a particular version of Rust, it means every six weeks, Xobs has the thankless task of updating our fork and building a new `std` release for it (we’re not a first-class platform in Rust, which means we have to maintain our own `std` library). This means we likewise force all Xous developers to run `rustup update` on their toolchains so we can retain compatibility with the language.

This probably isn’t sustainable. Eventually, we need to lock down the code base, but I don’t have a clear exit strategy for this. Maybe the next point at which we can consider going back to `nostd` is when we can get the stable `alloc` feature, which allows us to have access to the heap again. We could then decouple Xous from the Rust release train, but we’d still need to backfill features such as Vec, HashMap, Thread, and Arc/Mutex/Rc/RefCell/Box constructs that enable Xous to be efficiently coded.

Unfortunately, the `alloc` crate is very hard, and has been in development for many years now. That being said, I really appreciate the transparency of Rust behind the development of this feature, and the hard work and thoughtfulness that is being put into stabilizing this feature.

Rust Has A Limited View of Supply Chain Security

I think this position is summarized well by the installation method recommended by the rustup.rs installation page:

`curl --proto '=https' --tlsv1.2 -sSf https://sh.rustup.rs | sh`

“Hi, run this shell script from a random server on your machine.”

To be fair, you can download the script and inspect it before you run it, which is much better than e.g. the Windows .MSI installers for vscode. However, this practice pervades the entire build ecosystem: a stub of code called `build.rs` is potentially compiled and executed whenever you pull in a new crate from crates.io. This, along with “loose” version pinning (you can specify a version to be, for example, simply “2” which means you’ll grab whatever the latest version published is with a major rev of 2), makes me uneasy about the possibility of software supply chain attacks launched through the crates.io ecosystem.

Crates.io is also subject to a kind of typo-squatting, where it’s hard to determine which crates are “good” or “bad”; some crates that are named exactly what you want turn out to just be old or abandoned early attempts at giving you the functionality you wanted, and the more popular, actively-maintained crates have to take on less intuitive names, sometimes differing by just a character or two from others (to be fair, this is not a problem unique to Rust’s package management system).

There’s also the fact that dependencies are chained – when you pull in one thing from crates.io, you also pull in all of that crate’s subordinate dependencies, along with all their build.rs scripts that will eventually get run on your machine. Thus, it is not sufficient to simply audit the crates explicitly specified within your Cargo.toml file — you must also audit all of the dependent crates for potential supply chain attacks as well.

Fortunately, Rust does allow you to pin a crate at a particular version using the `Cargo.lock` file, and you can fully specify a dependent crate down to the minor revision. We try to mitigate this in Xous by having a policy of publishing our Cargo.lock file and specifying all of our first-order dependent crates to the minor revision. We have also vendored in or forked certain crates that would otherwise grow our dependency tree without much benefit.

That being said, much of our debug and test framework relies on some rather fancy and complicated crates that pull in a huge number of dependencies, and much to my chagrin even when I try to run a build just for our target hardware, the dependent crates for running simulations on the host computer are still pulled in and the build.rs scripts are at least built, if not run.

In response to this, I wrote a small tool called `crate-scraper` which downloads the source package for every source specified in our Cargo.toml file, and stores them locally so we can have a snapshot of the code used to build a Xous release. It also runs a quick “analysis” in that it searches for files called build.rs and collates them into a single file so I can more quickly grep through to look for obvious problems. Of course, manual review isn’t a practical way to detect cleverly disguised malware embedded within the build.rs files, but it at least gives me a sense of the scale of the attack surface we’re dealing with — and it is breathtaking, about 5700 lines of code from various third parties that manipulates files, directories, and environment variables, and runs other programs on my machine every time I do a build.

I’m not sure if there is even a good solution to this problem, but, if you are super-paranoid and your goal is to be able to build trustable firmware, be wary of Rust’s expansive software supply chain attack surface!

You Can’t Reproduce Someone Else’s Rust Build

A final nit I have about Rust is that builds are not reproducible between different computers (they are at least reproducible between builds on the same machine if we disable the embedded timestamp that I put into Xous for $reasons).

I think this is primarily because Rust pulls in the full path to the source code as part of the panic and debug strings that are built into the binary. This has lead to uncomfortable situations where we have had builds that worked on Windows, but failed under Linux, because our path names are very different lengths on the two and it would cause some memory objects to be shifted around in target memory. To be fair, those failures were all due to bugs we had in Xous, which have since been fixed. But, it just doesn’t feel good to know that we’re eventually going to have users who report bugs to us that we can’t reproduce because they have a different path on their build system compared to ours. It’s also a problem for users who want to audit our releases by building their own version and comparing the hashes against ours.

There’s some bugs open with the Rust maintainers to address reproducible builds, but with the number of issues they have to deal with in the language, I am not optimistic that this problem will be resolved anytime soon. Assuming the only driver of the unreproducibility is the inclusion of OS paths in the binary, one fix to this would be to re-configure our build system to run in some sort of a chroot environment or a virtual machine that fixes the paths in a way that almost anyone else could reproduce. I say “almost anyone else” because this fix would be OS-dependent, so we’d be able to get reproducible builds under, for example, Linux, but it would not help Windows users where chroot environments are not a thing.

Where Rust Exceeded Expectations

Despite all the gripes laid out here, I think if I had to do it all over again, Rust would still be a very strong contender for the language I’d use for Xous. I’ve done major projects in C, Python, and Java, and all of them eventually suffer from “creeping technical debt” (there’s probably a software engineer term for this, I just don’t know it). The problem often starts with some data structure that I couldn’t quite get right on the first pass, because I didn’t yet know how the system would come together; so in order to figure out how the system comes together, I’d cobble together some code using a half-baked data structure.

Thus begins the descent into chaos: once I get an idea of how things work, I go back and revise the data structure, but now something breaks elsewhere that was unsuspected and subtle. Maybe it’s an off-by-one problem, or the polarity of a sign seems reversed. Maybe it’s a slight race condition that’s hard to tease out. Nevermind, I can patch over this by changing a <= to a <, or fixing the sign, or adding a lock: I’m still fleshing out the system and getting an idea of the entire structure. Eventually, these little hacks tend to metastasize into a cancer that reaches into every dependent module because the whole reason things even worked was because of the “cheat”; when I go back to excise the hack, I eventually conclude it’s not worth the effort and so the next best option is to burn the whole thing down and rewrite it…but unfortunately, we’re already behind schedule and over budget so the re-write never happens, and the hack lives on.

Rust is a difficult language for authoring code because it makes these “cheats” hard – as long as you have the discipline of not using “unsafe” constructions to make cheats easy. However, really hard does not mean impossible – there were definitely some cheats that got swept under the rug during the construction of Xous.

This is where Rust really exceeded expectations for me. The language’s structure and tooling was very good at hunting down these cheats and refactoring the code base, thus curing the cancer without killing the patient, so to speak. This is the point at which Rust’s very strict typing and borrow checker converts from a productivity liability into a productivity asset.

I liken it to replacing a cable in a complicated bundle of cables that runs across a building. In Rust, it’s guaranteed that every strand of wire in a cable chase, no matter how complicated and awful the bundle becomes, is separable and clearly labeled on both ends. Thus, you can always “pull on one end” and see where the other ends are by changing the type of an element in a structure, or the return type of a method. In less strictly typed languages, you don’t get this property; the cables are allowed to merge and affect each other somewhere inside the cable chase, so you’re left “buzzing out” each cable with manual tests after making a change. Even then, you’re never quite sure if the thing you replaced is going to lead to the coffee maker switching off when someone turns on the bathroom lights.

Here’s a direct example of Rust’s refactoring abilities in action in the context of Xous. I had a problem in the way trust levels are handled inside our graphics subsystem, which I call the GAM (Graphical Abstraction Manager). Each Canvas in the system gets a `u8` assigned to it that is a trust level. When I started writing the GAM, I just knew that I wanted some notion of trustability of a Canvas, so I added the variable, but wasn’t quite sure exactly how it would be used. Months later, the system grew the notion of Contexts with Layouts, which are multi-Canvas constructions that define a particular type of interaction. Now, you can have multiple trust levels associated with a single Context, but I had forgotten about the trust variable I had previously put in the Canvas structure – and added another trust level number to the Context structure as well. You can see where this is going: everything kind of worked as long as I had simple test cases, but as we started to get modals popping up over applications and then menus on top of modals and so forth, crazy behavior started manifesting, because I had confused myself over where the trust values were being stored. Sometimes I was updating the value in the Context, sometimes I was updating the one in the Canvas. It would manifest itself sometimes as an off-by-one bug, other times as a concurrency error.

This was always a skeleton in the closet that bothered me while the GAM grew into a 5k-line monstrosity of code with many moving parts. Finally, I decided something had to be done about it, and I was really not looking forward to it. I was assuming that I messed up something terribly, and this investigation was going to conclude with a rewrite of the whole module.

Fortunately, Rust left me a tiny string to pull on. Clippy, the cheerfully named “linter” built into Rust, was throwing a warning that the trust level variable was not being used at a point where I thought it should be – I was storing it in the Context after it was created, but nobody every referred to it after then. That’s strange – it should be necessary for every redraw of the Context! So, I started by removing the variable, and seeing what broke. This rapidly led me to recall that I was also storing the trust level inside the Canvases within the Context when they were being created, which is why I had this dangling reference. Once I had that clue, I was able to refactor the trust computations to refer only to that one source of ground truth. This also led me to discover other bugs that had been lurking because in fact I was never exercising some code paths that I thought I was using on a routine basis. After just a couple hours of poking around, I had a clear-headed view of how this was all working, and I had refactored the trust computation system with tidy APIs that were simple and easier to understand, without having to toss the entire code base.

This is just one of many positive experiences I’ve had with Rust in maintaining the Xous code base. It’s one of the first times I’ve walked into a big release with my head up and a positive attitude, because for the first time ever, I feel like maybe I have a chance of being able deal with hard bugs in an honest fashion. I’m spending less time making excuses in my head to justify why things were done this way and why we can’t take that pull request, and more time thinking about all the ways things can get better, because I know Clippy has my back.

Caveat Coder

Anyways, that’s a lot of ranting about software for a hardware guy. Software people are quick to remind me that first and foremost, I make circuits and aluminum cases, not code, therefore I have no place ranting about software. They’re right – I actually have no “formal” training to write code “the right way”. When I was in college, I learned Maxwell’s equations, not algorithms. I could never be a professional programmer, because I couldn’t pass even the simplest coding interview. Don’t ask me to write a linked list: I already know that I don’t know how to do it correctly; you don’t need to prove that to me. This is because whenever I find myself writing a linked list (or any other foundational data structure for that matter), I immediately stop myself and question all the life choices that brought me to that point: isn’t this what libraries are for? Do I really need to be re-inventing the wheel? If there is any correlation between doing well in a coding interview and actual coding ability, then you should definitely take my opinions with the grain of salt.

Still, after spending a couple years in the foxhole with Rust and reading countless glowing articles about the language, I felt like maybe a post that shared some critical perspectives about the language would be a refreshing change of pace.

Precursor: From Boot to Root

Wednesday, February 16th, 2022

I have always wanted a computer that was open enough that it can be inspected for security, and also simple enough that I could analyze it in practice. Precursor is a step towards that goal.

As a test, I made a one hour video that walks through the Precursor tech stack, from hardware to root keys. I feel it’s a nice demo of what evidence-based trust should look like:

The video is a bit of a firehose, so please refer to our wiki for more info, or open an issue to further the discussion.

Erratum #1: I had mistakenly attributed SpinalHDL as a subset of Chisel. SpinalHDL is actually a separately developed HDL by Charles Papon. It was developed contemporaneously with Chisel and inspired by many concepts in it, such as using Scala as the underlying language; but it is not affiliated with Chisel.

The Plausibly Deniable DataBase (PDDB)

Tuesday, February 8th, 2022

The problem with building a device that is good at keeping secrets is that it turns users into the weakest link.


From xkcd, CC-BY-NC 2.5

In practice, attackers need not go nearly as far as rubber-hose cryptanalysis to obtain passwords; a simple inspection checkpoint, verbal threat or subpoena is often sufficiently coercive.

Most security schemes facilitate the coercive processes of an attacker because they disclose metadata about the secret data, such as the name and size of encrypted files. This allows specific and enforceable demands to be made: “Give us the passwords for these three encrypted files with names A, B and C, or else…”. In other words, security often focuses on protecting the confidentiality of data, but lacks deniability.

A scheme with deniability would make even the existence of secret files difficult to prove. This makes it difficult for an attacker to formulate a coherent demand: “There’s no evidence of undisclosed data. Should we even bother to make threats?” A lack of evidence makes it more difficult to make specific and enforceable demands.

Thus, assuming the ultimate goal of security is to protect the safety of users as human beings, and not just their files, enhanced security should come hand-in-hand with enhanced plausible deniability (PD). PD arms users with a set of tools they can use to navigate the social landscape of security, by making it difficult to enumerate all the secrets potentially contained within a device, even with deep forensic analysis.

Precursor is a device we designed to keep secrets, such as passwords, wallets, authentication tokens, contacts and text messages. We also want it to offer plausible deniability in the face of an attacker that has unlimited access to a physical device, including its root keys, and a set of “broadly known to exist” passwords, such as the screen unlock password and the update signing password. We further assume that an attacker can take a full, low-level snapshot of the entire contents of the FLASH memory, including memory marked as reserved or erased. Finally, we assume that a device, in the worst case, may be subject to repeated, intrusive inspections of this nature.

We created the PDDB (Plausibly Deniable DataBase) to address this threat scenario. The PDDB aims to offer users a real option to plausibly deny the existence of secret data on a Precursor device. This option is strongest in the case of a single inspection. If a device is expected to withstand repeated inspections by the same attacker, then the user has to make a choice between performance and deniability. A “small” set of secrets (relative to the entire disk size, on Precursor that would be 8MiB out of 100MiB total size) can be deniable without a performance impact, but if larger (e.g. 80MiB out of 100MiB total size) sets of secrets must be kept, then archived data needs to be turned over frequently, to foil ciphertext comparison attacks between disk imaging events.

The API Problem

“Never roll your own crypto”, and “never roll your own filesystem”: two timeless pieces of advice worth heeding. Yet, the PDDB is both a bit of new crypto, and a lot of new filesystem (and I’m not particularly qualified to write either). So, why take on the endeavor, especially when deniability is not a new concept?

For example, one can fill a disk with random data, and then use a Veracrypt hidden volume, or LUKS with detached partition headers. So long as the entire disk is pre-filled with random data, it is difficult for an attacker to prove the existence of a hidden volume with these pre-existing technologies.

Volume-based schemes like these suffer from what I call the “API Problem”. While the volume itself may be deniable, it requires application programs to be PD-aware to avoid accidental disclosures. For example, they must be specifically coded to split user data into multiple secret volumes, and to not throw errors when the secret volumes are taken off-line. This substantially increases the risk of unintentional leakage; for example, an application that is PD-naive could very reasonably throw an error message informing the user (and thus potentially an attacker) of a supposedly non-existent volume that was taken off-line. In other words, having the filesystem itself disappear is not enough; all the user-level applications should be coded in a PD-aware fashion.

On the other hand, application developers are typically not experts in cryptography or security, and they might reasonably expect the OS to handle tricky things like PD. Thus, in order to reduce the burden on application developers, the PDDB is structured not as a traditional filesystem, but as a single database of dictionaries containing many key-value pairs.

Secrets are overlaid or removed from the database transparently, through a mechanism called “Bases” (the plural of a single “Basis”, similar to the concept of a vector basis from linear algebra). The application-facing view of data is computed as the union of the key/value pairs within the currently unlocked secret Bases. In the case that the same key exists within dictionaries with the same name in more than one Basis, by default the most recently unlocked key/value pairs take precedence over the oldest. By offering multiple views of the same dataset based on the currently unlocked set of secrets, application developers don’t have to do anything special to leverage the PD aspects of the PDDB.

The role of a Basis in PD is perhaps best demonstrated through a thought experiment centered around the implementation of a chat application. In Precursor, the oldest (first to be unlocked) Basis is always the “System” Basis, which is unlocked on boot by the user unlock password. Now let’s say the chat application has a “contact book”. Let’s suppose the contact book is implemented as a dictionary called “chat.contacts”, and it exists in the (default) System Basis. Now let’s suppose the user adds two contacts to their contact book, Alice and Bob. The contact information for both will be stored as key/value pairs within the “chat.contacts” dictionary, inside the System Basis.

Now, let’s say the user wants to add a new, secret contact for Trent. The user would first create a new Basis for Trent – let’s say it’s called “Trent’s Basis”. The chat application would store the new contact information in the same old dictionary called “chat.contacts”, but the PDDB writes the contact information to a key/value pair in a second copy of the “chat.contacts” dictionary within “Trent’s Basis”.

Once the chat with Trent is finished, the user can lock Trent’s Basis (it would also be best practice to refresh the chat application). This causes the key/value pair for Trent to disappear from the unionized view of “chat.contacts” – only Alice and Bob’s contacts remain. Significantly, the “chat.contacts” dictionary did not disappear – the application can continue to use the same dictionary, but future queries of the dictionary to generate a listing of available contacts will return only Alice and Bob’s key/value pairs; Trent’s key/value pair will remain hidden until Trent’s Basis is unlocked again.

Of course, nothing prevents a chat application from going out of its way to maintain a separate copy of the contact book, thus allowing it to leak the existence of Trent’s key/value pair after the secret Basis is closed. However, that is extra work on behalf of the application developer. The idea behind the PDDB is to make the lowest-effort method also the safest method. Keeping a local copy of the contacts directory takes effort, versus simply calling the dict_list() API on the PDDB to extract the contact list on the fly. However, the PDDB API also includes a provision for a callback, attached to every key, that can inform applications when a particular key has been subtracted from the current view of data due to a secret Basis being locked.

Contrast this to any implementation using e.g. a separate encrypted volume for PD. Every chat application author would have to include special-case code to check for the presence of the deniable volumes, and to unionize the contact book. Furthermore, the responsibility to not leak un-deniable references to deniable volumes inside the chat applications falls squarely on the shoulders of each and every application developer.

In short, by pushing the deniability problem up the filesystem stack to the application level, one greatly increases the chances of accidental disclosure of deniable data. Thus a key motivation for building the PDDB is to provide a set of abstractions that lower the effort for application developers to take advantage of PD, while reducing the surface of issues to audit for the potential leakage of deniable data.

Down the Rabbit Hole: The PDDB’s Internal Layout

Since we’re taking a clean-sheet look at building an encrypted, plausibly-deniable filesystem, I decided to start by completely abandoning any legacy notion of disks and volumes. In the case of the PDDB, the entire “disk” is memory-mapped into the virtual memory space of the processor, on 4k-page boundaries. Like the operating system Xous, the entire PDDB is coded to port seamlessly between both 32-bit and 64-bit architectures. Thus, on a 32-bit machine, the maximum size of the PDDB is limited to a couple GiB (since it has to share memory space with the OS), but on a 64-bit machine the maximum size is some millions of terabytes. While a couple GiB is small by today’s standards, it is ample for Precursor because our firmware blob-free, directly-managed, high write-lifetime FLASH memory device is only 128MiB in size.

Each Basis in the PDDB has its own private 64-bit memory space, which are multiplexed into physical memory through the page table. It is conceptually similar to the page table used to manage the main memory on the CPU: each physical page of storage maps to a page table entry by shifting its address to the right by 12 bits (the address width of a 4k page). Thus, Precursor’s roughly 100MiB of available FLASH memory maps to about 25,000 entries. These entries are stored sequentially at the beginning of the PDDB physical memory space.

Unlike a standard page table, each entry is 128 bits wide. An entry width of 128 bits allows each page table entry to be decrypted and encrypted independently with AES-256 (recall that even though the key size is 256 bits, the block size remains at 128 bits). 56 bits of the 128 bits in the page table entry are used to encode the corresponding virtual address of the entry, and the rest are used to store flags, a nonce and a checksum. When a Basis is unlocked, each of the 25,000 page table entries are scanned by decrypting them using the unlock key for candidates that have a matching checksum; all the candidates are stored in a hash table in RAM for later reference. We call them “entry candidates” because the checksum is too small to be cryptographically secure; however, the actual page data itself is protected using AES-GCM-SIV, which provides cryptographically strong authentication in addition to confidentiality.

Fortunately, our CPU, like most other modern CPUs, support accelerated AES instructions, so scanning 25k AES blocks for valid pages does not take a long time, just a couple of seconds.

To recap, every time a Basis is unlocked, the page table is scanned for entries that decrypt correctly. Thus, each Basis has its own private 64-bit memory space, multiplexed into physical memory via the page table, allowing us to have multiple, concurrent cryptographic Bases.

Since every Basis is orthogonal, every Basis can have the exact same virtual memory layout, as shown below. The layout always uses 64-bit addressing, even on 32-bit machines.

|   Start Address        |                                           |
|------------------------|-------------------------------------------|
| 0x0000_0000_0000_0000  |  Invalid -- VPAGE 0 reserved for Option   |
| 0x0000_0000_0000_0FE0  |  Basis root page                          |
| 0x0000_0000_00FE_0000  |  Dictionary[0]                            |
|                    +0  |    - Dict header (127 bytes)              |
|                   +7F  |    - Maybe key entry (127 bytes)          |
|                   +FE  |    - Maybe key entry (127 bytes)          |
|              +FD_FF02  |    - Last key entry start (128k possible) |
| 0x0000_0000_01FC_0000  |  Dictionary[1]                            |
| 0x0000_003F_7F02_0000  |  Dictionary[16382]                        |
| 0x0000_003F_8000_0000  |  Small data pool start  (~256GiB)         |
|                        |    - Dict[0] pool = 16MiB (4k vpages)     |
|                        |      - SmallPool[0]                       |
|                  +FE0  |      - SmallPool[1]
| 0x0000_003F_80FE_0000  |    - Dict[1] pool = 16MiB                 |
| 0x0000_007E_FE04_0000  |    - Dict[16383] pool                     |
| 0x0000_007E_FF02_0000  |  Unused                                   |
| 0x0000_007F_0000_0000  |  Medium data pool start                   |
|                        |    - TBD                                  |
| 0x0000_FE00_0000_0000  |  Large data pool start  (~16mm TiB)       |
|                        |    - Demand-allocated, bump-pointer       |
|                        |      currently no defrag                  |

The zeroth page is invalid, so we can have “zero-cost” Option abstractions in Rust by using the NonZeroU64 type for Basis virtual addresses. Also note that the size of a virtual page is 0xFE0 (4064) bytes; 32 bytes of overhead on each 4096-byte physical page are consumed by a journal number plus AES-GCM-SIV’s nonce and MAC.

The first page contains the Basis Root Page, which contains the name of the Basis as well as the number of dictionaries contained within the Basis. A Basis can address up to 16,384 dictionaries, and each dictionary can address up to 131,071 keys, for a total of up to 2 billion keys. Each of these keys can map a contiguous blob of data that is limited to 32GiB in size.

The key size cap can be adjusted to a larger size by tweaking a constant in the API file, but it is a trade-off between adequate storage capacity and simplicity of implementation. 32GiB is not big enough for “web-scale” applications, but it’s large enough to hold a typical Blu-Ray movie as a single key, and certainly larger than anything that Precursor can handle in practice with its 32-bit architecture. Then again, nothing about Precursor, or its intended use cases, are web-scale. The advantage of capping key sizes at 32GiB is that we can use a simple “bump allocator” for large data: every new key reserves a fresh block of data in virtual memory space that is 32GiB in size, and deleted keys simply “leak” their discarded space. This means that the PDDB can handle a lifetime total of 200 million unique key allocations before it exhausts the bump allocator’s memory space. Given that the write-cycle lifetime of the FLASH memory itself is only 100k cycles per sector, it’s more likely that the hardware will wear out before we run out of key allocation space.

To further take pressure off the bump allocator, keys that are smaller than one page (4kiB) are merged together and stored in a dedicated “small key” pool. Each dictionary gets a private pool of up to 16MiB storage for small keys before they fall back to the “large key” pool and allocate a 32GiB chunk of space. This allows for the efficient storage of small records, such as system configuration data which can consist of hundreds of keys, each containing just a few dozen bytes of data. The “small key” pool has an allocator that will re-use pages as they become free, so there is no lifetime limit on unique allocations. Thus the 200-million unique key lifetime allocation limit only applies to keys that are larger than 4kiB. We refer readers to YiFang Wang’s master thesis for an in-depth discussion of the statistics of file size, usage and count.

Keeping Secret Bases Secret

As mentioned previously, most security protocols do a good job of protecting confidentiality, but do little to hide metadata. For example, user-based password authentication protocols can do a good job of keeping the password a secret, but the existence of a user is plain to see, since the username, salt, and password hash are recorded in an unencrypted database. Thus, most standard authentication schemes that rely on unique salts would also directly disclose the existence of secret Bases through a side channel consisting of Basis authentication metadata.

Thus, we use a slightly modified key derivation function for secret Bases. The PDDB starts with a per-device unique block of salt that is fixed and common to all Bases on that device. This salt is hashed with the user-provided name of the Basis to generate a per-Basis unique salt, which is then combined with the password using bcrypt to generate a 24-byte password hash. This password is then expanded to 32 bytes using SHA512/256 to generate the final AES-256 key that is used to unlock a Basis.

This scheme has the following important properties:

  • No metadata to leak about the presence of secret Bases
  • A unique salt per device
  • A unique (but predictable given the per-device salt and Basis name) salt per-Basis
  • No storage of the password or metadata on-device
  • No compromise in the case that the root keys are disclosed
  • No compromise of other secret Bases if passwords are disclosed for some of the Bases

This scheme does not offer forward secrecy in and of itself, in part because this would increase the wear-load of the FLASH memory and substantially degrade filesystem performance.

Managing Free Space

The amount of free space available on a device is a potent side channel that can disclose the existence of secret data. If a filesystem always faithfully reports the amount of free space available, an attacker can deduce the existence of secret data by simply subtracting the space used by the data decrypted using the disclosed passwords from the total space of the disk, and compare it to the reported free space. If there is less free space reported, one may conclude that additional hidden data exists.

As a result PD is a double-edged sword. Locked, secret Bases must walk and talk exactly like free space. I believe that any effective PD system is fundamentally vulnerable to a loss-of-data attack where an attacker forces the user to download a very large file, filling all the putative “free space” with garbage, resulting in the erasure of any secret Bases. Unfortunately, the best work-around I have for this is to keep backups of your archival data on another Precursor device that is kept in a secure location.

That being said, the requirement for no sidechannel through free space is at direct odds with performance and usability. In the extreme, any time a user wishes to allocate data, they must unlock all the secret Bases so that the true extent of free space can be known, and then the secret Bases are re-locked after the allocation is completed. This scheme would leak no information about the amount of data stored on the device when the Bases are locked.

Of course, this is a bad user experience and utterly unusable. The other extreme is to keep an exact list of the actual free space on the device, which would protect any secret data without having to ever unlock the secrets themselves, but would provide a clear measure of the total amount of secret data on the device.

The PDDB adopts a middle-of-the-road solution, where a small subset of the true free space available on disk is disclosed, and the rest is maybe data, or maybe free space. We call this deliberately disclosed subset of free space the “FastSpace cache”. To be effective, the FastSpace cache is always a much smaller number than the total size of the disk. In the case of Precursor, the total size of the PDDB is 100MiB, and the FastSpace cache records the location of up to 8MiB of free pages.

Thus, if an attacker examines the FastSpace cache and it contains 1MiB of free space, it does not necessarily mean that 99MiB of data must be stored on the device; it could also be that the user has written only 7MiB of data to the device, and in fact the other 93MiB are truly free space, or any number of other scenarios in between.

Of course, whenever the FastSpace cache is exhausted, the user must go through a ritual of unlocking all their secret Bases, otherwise they run the risk of losing data. For a device like Precursor, this shouldn’t happen too frequently because it’s somewhat deliberately not capable of handling rich media types such as photos and videos. We envision Precursor to be used mainly for storing passwords, authentication tokens, wallets, text chat logs and the like. It would take quite a while to fill up the 8MiB FastSpace cache with this type of data. However, if the PDDB were ported to a PC and scaled up to the size of, for example, an entire 500GiB SSD, the FastSpace cache could likewise be grown to dozens of GiB, allowing a user to accumulate a reasonable amount of rich media before having to unlock all Bases and renew their FastSpace cache.

This all works fairly well in the scenario that a Precursor must survive a single point inspection with full PD of secret Bases. In the worst case, the secrets could be deleted by an attacker who fills the free space with random data, but in no case could the attacker prove that secret data existed from forensic examination of the device alone.

This scheme starts to break down in the case that a user may be subject to multiple, repeat examinations that involve a full forensic snapshot of the disk. In this case, the attacker can develop a “diff” profile of sectors that have not changed between each inspection, and compare them against the locations of the disclosed Bases. The remedy to this would be a periodic “scrub” where entries have their AES-GCM-SIV nonce renewed. This actually happens automatically if a record is updated, but for truly read-only archive data its location and nonce would remain fixed on the disk over time. The scrub procedure has not yet been implemented in the current release of the PDDB, but it is a pending feature. The scrub could be fast if the amount of data stored is small, but a re-noncing of the entire disk (including flushing true free space with new tranches of random noise) would be the most robust strategy in the face of an attacker that is known to take repeated forensic snapshots of a user’s device.

Code Location and Wrap-Up

If you want to learn more about the details of the PDDB, I’ve embedded extensive documentation in the source code directly. The API is roughly modeled after Rust’s std::io::File model, where a .get() call is similar to the usual .open() call, and then a handle is returned which uses Read and Write traits for interaction with the key data itself. For a “live” example of its usage, users can have a look at WiFi connection manager, which saves a list of access points and their WPA keys in a dictionary: code for saving an entry, code for accessing entries.

Our CI tests contain examples of more interesting edge cases. We also have the ability to save out a PDDB image in “hosted mode” (that is, with Xous directly running on your local machine) and scrub through its entrails with a Python script that greatly aids with debugging edge cases. The debugging script is only capable of dumping and analyzing a static image of the PDDB, and is coded to “just work well enough” for that job, but is nonetheless a useful resource if you’re looking to go deeper into the rabbit hole.

Please keep in mind that the PDDB is very much in its infancy, so there are likely to be bugs and API-breaking changes down the road. There’s also a lot of work that needs to be done on the UX for the PDDB. At the moment, we only have Rust API calls, so that applications can create and use secret Bases as well as dictionaries and keys. There currently are no user-facing command line tools, or a graphical “File Explorer” style tool available, but we hope to remedy this soon. However, I’m very happy that we were able to complete the core implementation of the PDDB. I’m hopeful that out-of-the-box, application developers for Precursor will start using the PDDB for data storage, so that we can start a community dialogue on the overall design.

This work was funded in part through the NGI0 PET Fund, a fund established by NLNet with financial support from the European Commission’s Next Generation Internet Programme.