Name that Ware, November 2020

November 30th, 2020

The ware for November 2020 is shown below.

I’m not even sure if these boards are all from the same chassis, but they are definitely all from the same manufacturer. However, I suspect these will be stumpers, to the point where I’ve left the model number on the boards and I’ve only very lightly obscured a couple of logos/names. I have an inkling of who they are made by and what they are for; it’ll be interesting to find out if anybody knows more about the history behind these boards!

Winner, Name that Ware October 2020

November 30th, 2020

The Ware for October 2020 is a VL53L1 time of flight distance sensor. It incorporates both a vertically-firing laser (small, square die on the left hand side) and the associated sensing circuitry to measure the time it takes for light to travel over short distances, into a package that’s smaller than a pinky nail — it’s the sort of thing that would have sounded like science fiction a decade ago. It’s neat to see how they embed two photodiodes onto a single silicon chip, and then use the molding compound to partition the part into optically isolated Tx and Rx halves. Presumably the photodiode in the Tx part (which gets exposed to stray laser light) is used to measure the outgoing power, to help calibrate the Rx side.

I was playing around with it for an art project as a candidate for a proximity sensor, but unfortunately, it does not work well through any sort of light-diffusive barrier, which is why I was desoldering the part. Gratz to ninja bomb for guessing it, email me for your prize!

Evaluating Precursor’s Hardware Security

November 23rd, 2020

Making and breaking security go hand in hand. I’ve talked a lot about how Precursor, a mobile hardware development platform for secure applications, was made. In this post, I try to break it.

Hardware security is a multi-faceted problem. First, there is the question of “can I trust this piece of hardware was built correctly?”; specifically, are there implants and back doors buried in the hardware? We refer to this as the “supply chain problem”. It is a particularly challenging problem, given the global nature of our supply chains, with parts pulled from the four corners of the world, passing through hundreds of hands before reaching our doorstep. Precursor addresses this problem head-on with open, verifiable hardware: the keyboard, display, and motherboard are easy to access and visually inspect for correct construction. No factory or third-party tool is ever trusted with secret material. Precursor is capable of generating its own secret keys and sealing them within the hardware, without additional tools.

We also use a special kind of logic chip for the CPU – an FPGA – configured by the user, not the factory, to be exactly the CPU that the user specified. Crucially, most users have no evidence-based reason to trust that a CPU contains exactly what it claims to contain; few have the inspection capability to verify a chip in a non-destructive manner. On the other hand, with an FPGA, individual users can craft and inspect CPU bitstreams with readily available tools. Furthermore, the design can be modified and upgraded to incorporate countermeasures against hardware exploits discovered in the FPGA’s underlying fabric. In other words, the current trustability situation for an ASIC-style CPU is basically “I surrender”, whereas with an FPGA, users have the power to configure and patch their CPUs.

See my previous post for more details on how an FPGA helps solve the problem of transparency in CPU designs

Trustable hardware is only one facet of hardware security. Beyond correct construction, there are also questions of “has anyone tampered with the hardware while I wasn’t looking?”, and “can my hardware keep its secrets if it falls into the wrong hands?” These are the “tamper evidence” and “tamper resistance” problems, respectively. The rest of this post will drill down into these two questions as they relate to Precursor.

Despite any claims you may have heard otherwise, tamper resistance is a largely unsolved problem. Any secrets committed to a non-volatile format are vulnerable to recovery by a sufficiently advanced adversary. The availability of near-atomic level microscopy, along with sophisticated photon and phonon based probing techniques, means that a lab equipped with a few million dollars worth of top-notch gear and well-trained technicians has a good chance of recovering secret key material out of virtually any non-volatile storage media. The hard part is figuring out where the secrets are located on the chip. Once this is known for a given make and model, the attack is easy to repeat. This means the incremental cost of recovering keys from a previously studied design is in the ballpark of $10k’s and a matter of days. For some popular types of silicon packaging, such as flip chips, so-called “back side probing” techniques may be able to non-destructively extract secret key material within a matter of hours.

Ptychographic X-ray imaging as referenced in this article reveals the detailed 3-D structure of a modern chip.

Keys stored in volatile memory – that is, battery-backed RAM – can significantly up the ante because any disruption of power to the key memory can disrupt its data. Cryogenic freezing of the circuitry can help preserve the data for readout in the event of power loss, but battery-backed key storage can also incorporate active countermeasures to detect environmental anomalies and reactively zeroize the keys. While this sounds great for storing extremely sensitive secrets, it’s also extremely risky because false triggers can erase the keys, millions of dollars of cryptocurrency could vanish with a single false move. Thus volatile keys are best suited for securing highly sensitive but ephemeral communications and not for the long-term storage of high-value secrets.

Tamper evidence, not to be confused with tamper resistance, is the ability of a secured hardware device to show evidence of tampering. An example of very basic tamper evidence is the “warranty void if removed” sticker. It is typically placed over a screw essential to holding the case together, so that any attempt to disassemble the unit requires puncturing the sticker. This very basic measure is easily defeated by duplicating and replacing the sticker, and it’s not hard to acquire even the holographic stickers if you know the right people.

The next level up in tamper evidence is the incorporation of a physically unclonable feature in the tamper-evident seal, such as meauring the random pattern of fibers in the paper of the sticker, or recording the position of glittery bits embedded in nail polish applied over a seam. Of course, none of these physical features can attest to any evidence of side-channel attacks, which do not require opening the case. In the case of side channel attacks, adversaries can gain enough information about secrets contained within a piece of hardware by just observing patterns in the power consumption or in the spurious radio waves emitted from the hardware. Attackers can also induce transient faults with the hope of tricking a device into disclosing secrets without breaking through labels by exposing the hardware to high levels of radiation or temperature extremes. Depending on the threat model, the hardware may simply need to hamper such attacks, or it may need to react to them and take appropriate action depending upon the nature of the attack.

Because hardware security is hard, the lack of a known exploit should never be taken as the existence of strong security. We take this fact to heart in Precursor: in addition to putting effort into designing a trustable, secured hardware solution, we have also put effort into exploiting it. Unlike most other secure hardware vendors, we also tell you about the exploits in detail, so you can make an informed decision about the practical limits of our implementation’s tamper evidence and resistance.

Precursor uses the Xilinx 7-series of FPGAs for its trusted root. This series of FPGAs is perhaps one of the most extensively studied from a security perspective; dozens of papers have been published about its architecture and vulnerabilities. As a result, its key stores have been reverse engineered and analyzed down to the transistor level.

An example of the analysis done by other researchers on the 7-Series FPGAs

The most serious known vulnerability in the chip was published in early 2020, by Maik Ender, Amir Moradi, and Christof Paar in a paper titled “The Unpatchable Silicon: A Full Break of the Bitstream Encryption of the 7-Series FPGAs”. In a nutshell, it allows an attacker equipped with nothing more than the ciphertext of the bitstream and access to the FPGA’s JTAG port to recover the plaintext of a bitstream. It relies on a feature known as the WBSTAR register, normally used to recover from a corrupted boot. By design, it loads a recovery address from a location encrypted in the bitstream, and places the address in a register that is necessarily disclosed to the boot media to set a new loading location for the fallback image. Unfortunately, the “recovery address” is just a piece of data at the pointy end of an unchecked pointer. This means it can be used to fetch one word of plaintext from anywhere in the bitstream. The attack also takes advantage of the malleability of AES CBC blocks to adjust the WBSTAR load offset, thus allowing the FPGA to be used as an “oracle” to decrypt ciphertext one word at a time, without any knowledge of the encryption key.

The paper suggested a mitigation involving a pair of pins on the FPGA that change value based on the contents of the WBSTAR register to trigger a key reset (assuming you’ve stored the key in battery-backed RAM and not burned it permanently into the chip). While trying to implement the proposed mitigation, we found several ways to easily circumvent it. In particular, the external pin update is not automatic, it is done by an additional command later in the bitstream. By just omitting the pin updating command, we can render this mitigation ineffective without affecting the ability to read out the plaintext. We have published the code necessary to reproduce this attack in our jtag-trace github repository.

This means that the tamper resistance of Precursor is equivalent to the strength of the glue applied over the SPI ROM chip and the JTAG port. The glue on the SPI ROM chip hampers the recovery of the ciphertext, necessary for the attack, while gluing the JTAG port shut hampers the upload of exploit code. Furthermore, the ciphertext should be treated as a secret, as knowledge of it is necessary for recovery of the plaintext. Therefore, the FPGA should still be burned with an encryption key and set to only accept encrypted bitstreams. This is because JTAG readout of the SPI ROM ciphertext can only happen with the help of a bitstream specifically crafted for the purpose. As long as such a bitstream is not encrypted to the secret key inside the FPGA, the ciphertext should be unavailable through the JTAG port.

Based on this evaluation, we’re planning on implementing the circuitry necessary to zeroize RAM-backed encryption keys, but not for the purpose of directly mitigating this exploit. Unfortunately, a separate oversight in the security features of the 7-Series means devices that boot from BBRAM must also accept unencrypted bitstreams. Therefore access to ciphertext is much easier on BBRAM-keyed devices than on devices that boot from fused keys. However, by including the BBRAM key zeroization circuitry, users could at least have a choice of which type of key (battery-backed or fused) to use, depending on their risk profile. Battery-backed keys complicate any hardware attack by requiring the hardware to receive uninterrupted power as it is disassembled, in exchange for the risk of key loss in case the user forgets to keep the battery charged. Fortunately, the power needed to preserve the key is miniscule, lower than the self-discharge current of the battery itself. It’s also useful for users who wish to have a “self-destruct timer” or “panic button” to completely, securely, and irrevocably wipe their devices in a fraction of a second. This is useful for protecting high-value secrets that have a short or pre-determined lifetime.

A schematic of Precursor’s proposed BBRAM key zeroization circuit. It is a pair of back-to-back NMOS-style inverters, which latches the system into a shutdown state until the battery is removed. There are additional circuits provisioned to rapidly discharge key-related voltage rails, not shown here.

Once Precursor has been glued shut, we propose the easiest method to recover the ciphertext and to gain access to the JTAG ports is to put the Precursor device into a precision CNC milling machine, mill out the PCB from the back side, and then place the remaining assembly into a pogo-pin based mechanism to perform the readout. This of course destroys the Precursor device in the process, but it is probably the most direct and reliable method of recovering the encryption keys, as it is very similar to an existing technique used for certain types of attacks on iPhones. Storing keys in BBRAM can greatly complicate the task of milling out the PCB by creating a high risk of accidental key erasure, but a sufficiently precise CNC with a non-conductive ceramic bit, or a precision laser-based ablation milling system can reduce the risk of key loss substantially. Cryogenic cooling of the FPGA chip itself may also help to preserve key material in the case of very short accidental power glitches.

CNC milling machines can be used to perform precision attacks on dense circuit boards. Above is a vendor demo of a CNC milling bit removing a NAND chip from an iPhone motherboard, without damage to adjacent components. The mill can be readily purchased for a couple thousand USD.

The level of equipment and skill required to execute such attacks is higher than available in a typical Makerspace or government field office, but could be made available within a specialized lab of any modern intelligence agency or large technology corporation. Of course, if Precursors became a popular method to store high-value secrets, an entrepreneurial hacker could also start a lucrative service to recover keys from devices with just a couple hundred thousand dollars in startup capital.

That being said, such an attack would likely be noticed. In other words, if your device is functional and its seals intact, your Precuror has probably not been tampered with. But, if it is confiscated or stolen, you can assume its secrets could be extracted in as little as a few hours by a well-prepared adversary. This is not ideal, but this barrier is still higher than countless other “secured systems” ranging from from game consoles to smartphones to crypto wallets that can be broken with nothing more than a data cable and a laptop. Of course, these “easy breaks” are typically due to a bug in the firmware of the device, but unlike most of these devices Precursor’s firmware is patchable, as well as being completely open for audit.

To assist users with gluing their Precursors shut, we include an easy-to-mix two-part epoxy in the box so that users may “pot” their boards once they have inspected their device and are satisfied it meets their standard for correct construction. Remember, we don’t seal the boards in the factory because that makes it impossible for users to confirm no back doors were inserted into the hardware. Upon completing inspection of the hardware, we recommend users perform the following actions to permanently seal their Precursor systems:

  1. Instruct the Precursor firmware to self-generate and burn an AES encryption key to the device, re-encrypt the FPGA bitstream to this key, and finally burn the fuse preventing boot from any other source.
  2. Disassemble the unit and snap the provided metal lid onto the motherboard. In addition to making it more difficult to physically access the hardware, it absorbs radio waves, thus making RF-based attacks more difficult.
  3. Hand-draw a memorable object or word onto a piece of tissue paper that is about 2.5 cm (1 inch) square.
  4. Use a cotton swab to drip the provided two-part clear epoxy through the holes of the RF shield, especially over the debug connector and SPI ROM.
  5. Lay the tissue paper over the wet epoxy, allowing the paper to become saturated with the epoxy.
  6. Wait 24 hours for the epoxy to dry, and re-assemble for use.

Precursor is designed to have the “Trusted” zone covered with a metal shield that can be glued down with epoxy.

The main risk of applying the epoxy is it leaking onto a connector that should not be glued down. The board is laid out to avoid that outcome, but caution is still needed during sealing. Pouring too much glue into the metal lid can result in glue leaking onto connectors that should not be sealed, preventing re-assembly. Notably, the sealing is done at the user’s risk; once a sealing attempt has been made, we can’t repair or replace a unit since the epoxy is unremovable by design.

The purpose of the tissue paper is to provide a simple but memorable way to confirm the uniqueness of your Precursor unit. It relies upon a human’s innate ability to recognize drawings or lettering made with their own hand. This method allows verification of the seal with simple visual inspection, instead of relying upon a third party tool to, for example, record and analyze the position of glitter dots or paper fibers for authenticity.

We provide Epotek 301 “Bi-packs” as the two-part epoxy for potting Precursor devices. It is commonly used for encapsulating optoelectronics and semiconductors such as LEDs and sensors. As such, it has excellent electrical properties, good optical clarity, robust mechanical durability, and a suitable viscosity for penetrating the nooks and crannies of small electronic parts. Once cured, any attempt to physically access the electronics contained within will leave some kind of mark as evidence of the attempt.

The Precursor device has a well-characterized hardware security level. We’ve designed the hardware to be easier to inspect and seal, so you have an evidenced-based reason to trust the hardware. As all devices are vulnerable to hardware tampering, we’ve taken the approach of preferring devices with well-characterized vulnerabilities. We picked the Xilinx 7-Series FPGAs in part because it has been studied for years: its fuse boxes and encryption engine have been analyzed down to the gate level and there have been no findings so far reporting special access modes for law enforcement, undocumented shadow key stores, or other bugs that could lead to fully remote exploits or back doors. We’ve also taken the time to reproduce and disclose the known vulnerabilities, so that you don’t have to trust our statements about the security of the hardware: you’re empowered to reproduce our findings and make your own evidence-based findings about the trustworthiness of Precursor. In parallel with Precursor, the Betrusted project aims to raise the bar even higher, but doing so will require fabricating a custom ASIC, which incidentally carries its own set of verifiability and supply chain risks that we hope to address with countermeasures that are beyond the scope of this post.

In the end, there is no such thing as perfect security, but we firmly believe that the best mitigation is an evidence-based ecosystem built around the principles of openness, transparency, and lots and lots of testing. If you’re interested in participating in our ecosystem, please visit the Precursor crowdfunding page.

What is a System-on-Chip (SoC), and Why Do We Care if They are Open Source?

November 10th, 2020

Note: This post originally appeared as an update in the crowdfunding campaign for Precursor.

Modern gadgets are typically built around a single, highly integrated chip, known as a “System on Chip” (SoC). While the earliest home computer motherboards consisted of around a hundred chips, Moore’s Law pushed that down to just a handful of chips by the time 80286 PC/AT clones were mainstream, and the industry has never looked back. Now, a typical SoC integrates a CPU core complex, plus dozens of peripherals, including analog, RF, and power functions; there are even “System in Package” solutions available that package the SoC, RAM, and sometimes even the FLASH die into a single plastic package.

Modern SoCs are exceedingly complex. The “full user’s manual” for a modern SoC is thousands of pages long, and the errata (“bug list”) – if you’re allowed to see it – can be hundreds of pages alone. I put “full user’s manual” in quotations because even the most open, well-documented SoCs (such as the i.MX series from NXP) require a strict NDA to access thousands of pages of documentation on third party Intellectual Property (IP) blocks for functions such as video decoding, graphics acceleration, and security. Beyond the NDA blocks, there is typically a deeper layer of completely unpublished documentation for disused silicon, such as peripherals that were designed-in but did not make the final cut, internal debugging facilities, and pre-boot facilities. Many of these disused features aren’t even well-known within the team that designed the chip!

Disused silicon is a thing because building chips is less like snapping together Legos, and more like a sculptor chiseling away at a marble block: adding a circuit is much harder than deactivating a circuit. Adding a circuit might cost around $1 million in new masks, while delaying the project by about 70 days (at a cost of 100,000 man-hours worth of additional wages); with proper planning, deactivating a circuit may be as simple as a code change, or a small edit to a single mask layer, at a cost of perhaps $10,000 and a few days (assuming wafers were held at intermediate stages to facilitate this style of edit).

photo credit: “Man With Mallet & Chisel Bas Relief (Washington, DC)” by takomabibelot is licensed under CC BY 2.0

Thus a typical SoC mask set starts with lots of extra features, spare logic, and debug facilities that are chiseled away (disused) until the final shape of the SoC emerges. As Michelangelo once said “every block of stone has a statue inside it, and it is the task of the sculptor to discover it,” we could say “every SoC mask set has a datasheet inside it, and it is the task of the validation team to discover it”. Sometimes the final chisel blow happens at boot: an errant feature may be turned off or patched over by pre-boot code that runs even before the CPU executes its first instruction. As a result, even the best documented SoCs will have a non-trivial fraction of transistors that are disused and unaccountable, theoretically invisible to end users.

From a security standpoint, the presence of such “dark matter” in SoCs is worrisome. Forget worrying about the boot ROM or CPU microcode – the BIST (Built in Self Test) infrastructure has everything you need to do code injection, if you can just cajole it into the right mode. Furthermore, SoC integrators all buy functional blocks such as DDR, PCI, and USB from a tiny set of IP vendors. This means the same disused logic motifs are baked into hundreds of millions of devices, even across competing brands and dissimilar product lines. Herein lies a hazard for an unpatchable, ecosystem-shattering security break!

Precursor sidesteps this hazard by implementing its SoC using an FPGA. FPGAs are user-reconfigurable, drastically changing the calculus on the cost of design errors; instead of chiseling away at a block of marble, we are once again building with a Lego set. Of course, this flexibility comes at a cost: an FPGA is perhaps 50x more expensive than a feature-equivalent SoC and about 5-10x slower in absolute MHz. However, it does mean there is no dark matter in Precursor, as every line of code used to describe the SoC is visible for inspection. It also means if logic bugs are found in the Precursor SoC, they can be patched with an update. This drastically reduces the cost to iterate the SoC, making it more economically compatible with an open source approach. In an ideal world, the Precursor SoC design will be thoroughly vetted and audited over the next couple of years, converging on a low-risk path toward a tape out in fixed silicon that can reduce production costs and improve performance all while maintaining a high standard of transparency.

LiteX: The Framework Behind Precursor’s SoC

Precursor’s SoC is built using LiteX. LiteX is a framework created by Florent Kermarrec for defining SoCs using the Migen/MiSoC FHDL, which itself is written in Python 3.6+. The heart of LiteX is a set of “handlers” that will automatically adapt between bus standards and widths. This allows designers to easily mix and match various controllers and peripherals with Wishbone, AXI, and CSR, bus interconnect standards. It is pretty magical to be able to glue an extra USB debug controller into a complex SoC with just a few lines of code, and have an entire infrastructure of bus arbiters and adapters figure themselves out automatically in response. If you want to learn more about LiteX and FPGAs, a great place to start is Florent’s “FPGA_101” mini-course.

A Brief Tour of Precursor’s SoC

Above is a block diagram of Precursor’s SoC, as of October 2020. It’s important to pay attention to the date on documentation, because an FPGA-based SoC can and will change over time. We generally eschew pretty, hand-drawn block diagrams like this because they are out of date almost the day they are finished. Instead, our equivalent of a “programmer’s manual” is dynamically generated by our CI system with every code push, and for Rust programmers we have a tool, svd2utra that automatically translates SVD files generated by LiteX into a Rust API crate. With an open source FPGA-based SoC, automated CI isn’t merely best practice, it’s essential, because small but sometimes important patches in submodule dependencies will regularly affect your design.

Core Complex

The “Core Complex” currently consists of one RISC-V core, implemented using Charles Papon’s VexRiscV. We configured it to support the “RV32IMAC” instruction subset, gave it an MMU, and beefed up the caches. The VexRiscV limits cache size to 4kiB, but effective capacity can be increased by upping the cache associativity. We get about a 10% performance boost by tuning the core to have a two-way I-cache, and a four-way D-cache. We also provision a 32 kiB boot ROM, which currently holds three instructions, but will someday be expanded to include signature checks on code loaded from external memory and a 128kiB on-board SRAM for tightly coupled/higher security operations. The CPU core is adapted to, and arbitrated into, a multi-controller Wishbone bus by LiteX and further adapted into a CSR bus by a dedicated CSR bridge that has been configured to automatically space peripherals on 4-kiB page boundaries, so that they can be individually remapped with the MMU. There’s also an IRQ handler that manages interrupts originating from peripherals sprinkled around the chip.

The Core Complex also includes a set of mostly boilerplate CSRs which perform the following functions:

  • “Reboot” allows us to specify a new location for the reset vector
  • “Ctrl” allows us to issue a soft reset
  • “Timer 0” is the default timer provided by LiteX. It is a high resolution 32-bit timer clocked at the same frequency as the CPU core.
  • “CRG” is an interface to control the FPGA’s clock generator. Right now we don’t do much with it, but eventually this is going to play a central role in power management and extending battery life.
  • “Git Info” is a static register that provides information about the state of the git repo from which Precursor was built.
  • “BtSeed” is a 64-bit number that can be randomized to force entropy into the place-and-route process, in case the end user desires a final FPGA netlist unique to their device without having to modify the code (otherwise the builds are entirely reproducible).
  • “Litex ID” is a human-readable text string that identifies the SoC design.
  • “TickTimer” is a low-resolution, 64-bit timer clocked in 1 ms increments. It serves as a source of time for the Xous OS.

Debug Block

Adjacent to the Core Complex is a Debug block. The Debug block features a full speed USB MAC/PHY that can tunnel Wishbone packets and serve as an alternate Wishbone controller to the CPU. We use this to drive the debug interface on the CPU, thus allowing GDB to connect to Precursor over USB even when the CPU is halted. In fact, one could build Precursor with no RISC-V CPU and just tunnel Wishbone packets over USB for debug and driver development. The debug block also includes a small CSR peripheral called the “Messible”, which is a 64-entry by 8-bit wide FIFO, useful as a mailbox/scratchpad during debugging.

Memory Mapped and CSR I/O

The memory space of the RISC-V CPU is mapped onto various peripherals and memory blocks via a Wishbone bus. For traditional SoC designers, Wishbone is kind of like AXI, but open source. Wishbone supports fancy features like multiple masters, pipelining, and block transfers. A portion of the Wishbone bus space is further mapped onto a bus called the Configuration and Status Register (CSR) bus.

While Wishbone is high-performance, it requires more interface logic and is happiest when the peripheral’s bit width matches the bus width. CSRs are area-efficient and gracefully accommodates registers of arbitrary bit-width from both a hardware and software API standpoint, but are lower performance. Thus CSRs are ideal for low-to-medium speed I/O tasks (such as the eponymous configuration and status registers), whereas Wishbone is ideal for memory-mapped I/O where improved bandwidth and latency are worth the area overhead.

From a design process, most peripherals start life mapped to CSR space, and are then upgraded to a memory-mapped implementation to meet performance demands. Thus, it’s no coincidence that most peripherals on Precursor are CSR-only devices. Here is a brief description of each CSR peripheral. As a reminder, you can always consult our reference manual for more details.

  • “COM SPI” is the SPI bus that connects to the Embedded Controller (EC) SoC. It’s a 20MHz SPI peripheral that has a fixed transfer width of 16-bits. This block is targeted for an upgrade to a memory mapped I/O block.
  • “I2C” is an I2C bus controller. Currently, only a real time clock (RTC) chip and an audio CODEC chip are are connected to this I2C bus.
  • “BtEvents” is a catch-all block for handling various external real-time interrupt sources. Currently it handles interrupts from the EC and RTC chips.
  • “KeyScan” is the keyboard controller. It’s designed to scan a 9×10 keyboard matrix for key hits, using a slow external 32kHz clock source. By decoupling the keyboard scanner from the system core clock, the system can go to a lower power state while waiting for keyboard presses, extending the number of days that Precursor can go between charges.
  • “BtPower” is a set of GPIOs dedicated specifically to managing power. It can turn the audio and discrete TRNG on and off, override the EC’s power control commands, activate boost mode for the USB type C port (allowing operation as a DFP or “host”), and engage the self-destruct mechanism.
  • “JTAG” is a set of GPIOs looped back to the FPGA’s JTAG pins. These are used in combination with our eFuse API drivers to self-provision AES bitstream encryption fuses on the 7 Series FPGA.
  • “XADC” is the interface for the 7-Series XADC block, which is a 12-bit, multi-channel ADC. This is primarily used for the self monitoring of system voltages. In the final production revision, at least one channel of the ADC will also be available as a configuration option on the GPIO internal header so that users have an easier path to integrating analog sensors into Precursor.
  • “UART” is a simple 115200, 8-N-1 serial interface which is connected to the debug header for console I/O.
  • “BtGpio” is a straight-forward digital I/O block for driving the pins on the GPIO internal header. Note that due to the nature of the FPGA’s implementation, it’s not possible to switch between a digital GPIO function and an analog GPIO function without updating the bitstream.

In addition to the CSR I/Os, a few I/O devices are memory-mapped for high performance:

  • “External SRAM” is a 32-bit wide, asynchronous interface that memory-maps 16 MiB of external SRAM. The SRAM is battery-backed so that it can retain state while the SoC is powered off. The intention is to optimize power by reducing sleep/wake overhead. However, this also means that the self-destruct procedure must first clear sensitive data from SRAM before activating the final blow that knocks out the SoC, as the self-destruct circuitry is also powered by the SRAM’s backup power supply. The External SRAM block also has a CSR interface to read out the configuration mode of the SRAM.
  • “Audio” is an I2S interface to an external audio CODEC. In addition to a CSR block that configures the I2S interface, it also includes a pair of 256×16 entry memory-mapped sample FIFOs.
  • “SPI OPI” is a high-speed SPI-like interface to external FLASH storage that memory-maps 12 8MiB of non-volatile storage. The “O” in OPI stands for octal – it’s an 8-bit bus that runs at 100MHz DDR speeds. It also includes a pre-fetcher that can hold several cache line’s worth of code, optimizing the case of straight-line code execution. High performance on this bus is essential, since the intention is for the CPU to run most code as XIP out of FLASH. It also features a CSR interface to control operations like block erase and page programming.
  • “MemLCD” is the frame buffer for the LCD. The Sharp Memory LCD contains its own internal memory, which allows it to retain an image even when the host is powered off. The MemLCD frame buffer is thus a cache for the LCD itself. It manages which lines of the LCD are dirty and will flush only the dirty lines to the LCD upon requests made via the CSR. This improves the perceived update rate of the LCD, which is limited to 10 Hz if the entire screen is being updated, but improves inversely proportional to the fraction of the screen that is static.

Cryptography Complex

All the features described thus far consume about 20% of the FPGA’s logic; the majority of the logic in Precursor’s FPGA is dedicated to the Cryptography Complex.

Above is an amoeba plot that visualizes the relative size of various functions within the Precursor SoC design. Some blocks, such as the semi-redundant SHA-512 and SHA-2 accelerators, are currently included simply because we could fit both of them in the FPGA, and not because we strictly needed both of them. Fortunately, removing the SHA-2 block is as easy as commenting out four lines of code, saving about 2800 SLICE LUTs or about 9% of the device’s resources. LiteX and the svd2rust scripts take care of everything else!

Here’s a quick run-down of the blocks inside the Cryptography Complex:

  • “Engine25519” is an arithmetic accelerator for operations in the prime field 2^255-19. It’s a microcoded, 256-bit arithmetic engine capable of computing a 256-bit multiply plus normalization in about one microsecond, about a 30x speedup over running the equivalent code on the RISC-V CPU. It consumes a huge amount of resources, but was deemed essential because the Betrusted secure communications application is built around the Double-Ratchet Algorithm, which relies heavily on this type of math. The CI documentation is probably the best starting point to understand more about the Engine25519 implementation. The block is big enough that later on it will get an entire post dedicated to explaining its function.
  • “SHA-512” and “SHA-2” are hardware-accelerated SHA hash blocks. They are derived from Google’s OpenTitan SystemVerilog source code. The SHA-2 block is directly from OpenTitan and included mostly because it was easy to integrate. The SHA-512 block is our own adaptation of the SHA-2 block. This is the historical reason for why we have both in the current build of Precursor, even though most applications will only need one hash or the other to be hardware accelerated.
  • “AES” is an AES accelerator also lifted directly from the Google OpenTitan project. It is capable of doing AES 128, 192, and 256, and supports encryption and decryption in ECB, CBC, and CTR modes.
  • “KeyROM” is a 256×32 ROM implemented using fixed-location LUTs in the FPGA. Since the ROM’s location is fixed, we can use PrjXray to determine the location of the KeyROM bits in the FPGA’s bitstream. This allows us to edit the key ROMs directly into the FPGA bitstream, thus enabling a transfer of trust from the low-level eFuse AES key into the higher-level functions of the Precursor SoC. We will discuss more about some important, recently-discovered vulerabilities in the FPGA eFuse AES key in a post coming soon.
  • “TRNG” is an on-chip, ring oscillator-based TRNG. It uses multiple small rings to collect entropy which are then merged into a single large ring for final measurement. The construction and validation of Precursor’s TRNGs will also get their own post at some point down the road.
  • “ICAPE2” is an explicit tie-down for an unused internal debug port in the FPGA fabric. ICAPE2 is Xilinx’s way of allowing an FPGA to introspect and access internal configuration state. We explicitly tie it down so that no other functions can try to claim it. Also, since the ICAPE2 is at a well-known location in the bitstream, it is possible to write a tool that does post-compilation inspection of the bitstream to verify that the ICAPE2 block is in fact deactivated.

Parting Thoughts

That’s it for our whistle-stop tour of the Precursor SoC! We’ve sculpted in the parts that are essential to functionality and security and hope the development community will add more. By commenting out a few lines of code, you can clear out unnecessary blocks and make space for your own creations. Precursor’s code base is entirely open and available for inspection – no hidden test logic or microcode blobs and no NDA required to trace an unambiguous, cycle-accurate path from the release of reset to the execution of the first instruction. This lack of “dark matter” and total transparency of design adds yet another argument in the evidence-based case to trust Precursor’s hardware with your private matters.

If you enjoyed this post, please check out Precursor’s campaign page for more details and project updates!

Name that Ware, October 2020

October 31st, 2020

The ware for October 2020 is shown below.

I was desoldering this from a board, and surprisingly, the case just melted off revealing this. I thought it was pretty neat to see what was inside!