Archive for the ‘IRIS’ Category

A Kinematically Coupled, Nanometer-Resolution Piezo Focus Stage

Sunday, March 31st, 2024

This post is part of a series about giving users a tangible reason to trust their hardware through my IRIS (Infra-Red, in-situ) technique for the non-destructive inspection of chips. Previously, I discussed the process of designing the IRIS light source in some detail, as well as my methodology for learning new things.

This post will describe the focus stage for IRIS.

The focus stage is the thing at the bottom of the above image, covered in a black foil and with red and black wires coming out of one side. It’s responsible for controlling the fine positioning of the sample in the “Z” direction.

Background

The depth of field of the 10x objective used on IRIS is estimated to be around 8.5 microns. This means that I need to be able to control the distance of the chip to the lens in steps much finer than 8.5 microns. Note that depth of field typically decreases with increasing magnification, so if we want to support even higher magnifications, we would need even smaller focus steps.

Above: example of a calibration image used to measure the effective resolution of IRIS, which is about 4.75 pixels per micron with a 10x objective.

The resolution limit of the Z-stepper motors on the Jubilee motion platform is about 10 microns, so we can’t get our samples into perfect focus using the stepper motors of the motion platform alone (the Jubilee motion platform is the cage-like structure that the microscope is mounted in, you can read more about it here). One solution to this is to use an additional fine-focus mechanism that has a limited range of motion, but a very fine increment.

A bit of searching around reveals that a couple ways to do this include a micropositioner (basically a very fine mechanical screw-type mechanism) or a piezoelectric (piezo) positioner. I’ve never used a micropositioner or a piezoelectric actuator before, but the piezoelectric actuator seemed appealing because it would be a solid-state design – more compact, and less mechanical parts to machine and tune. The downsides of a piezoelectric system seems to be a limited range of motion, a limited amount of actuation force, and some non-linearities in position versus voltage due to hysteresis mechanisms.

Unfortunately, piezoelectric actuators are expensive. They start around $1000, and go up from there if you need features like kinematic coupling. So, I decided I’d try to build one from scratch, because it seemed like the rare case where solving an interesting problem with a one-off solution is also cheaper than buying an off-the-shelf unit.

After a couple of days scouring the Internet for suitable actuators, I came across the PowerHap (TM) series of piezo actuators made by TDK. They are intended for automotive haptic interfaces, and are available as a stock item at Digi-Key for around $20 in single unit quantities.

The larger actuators can produce a 100 micron deflection with a few Newtons of force. There are hysteric non-linearities, but they can be reduced with some preload and/or feedback mechanisms. Since the design is intended to be used with a dynamic auto-focus algorithm, absolute linearity is less important than monotonicity and repeatability (in other words, the non-linearities are probably not going to be an issue because we’re wrapping things in a feedback loop).

Sidebar: Kinematic Coupling

It would be convenient to be able to remove the fine focus stage to tweak a sample, and then place it back into the machine without affecting the repeatability of measurements. This would require a coupling that can mate to the stage with sub-micron accuracy with minimal effort. Simply shoving a plate onto a set of brackets or screw holes would not be able to achieve this level of precision. However, it turns out there is a well-established technique for accomplishing this: kinematic coupling.

I hadn’t heard of kinematic coupling until Prof. Nadya Peek, a collaborator on the IRIS project, advised me on the topic. The TL;DR is that in the most abstract sense, an object’s position in space can be precisely constrained with exactly six points of contact. Any less, and the object has a degree of freedom to move; but more importantly, any attempt to use more than six points means there is more than one stable solution for the position of the object.

Why does this matter? Because when systems are over-constrained, small imperfections in fabrication will cause extra constraints to fight with each other. You need to have a little bit of slop to ensure you can put things together without forcing parts together in ways that can damage them.

For example, I generally specify my screw holes to be at least 0.1mm, and ideally 0.2mm, larger than the screw meant to go through them. This makes it easier to assemble things, but it also means that every time I take things part and put them back together again, the final alignment of things moves around by about a hundred microns.

A hundred microns! That’s pretty big compared to the few-micron target of the focus stage.

It turns out that if you can reduce the coupling between the focus stage and its actuators to six points of contact, you can remove the stage and replace it repeatedly with a precision comparable to the size of the contact points. This is really desirable for being able to fiddle with samples without disrupting the work flow.

There’s some pretty good open-access material on how to design systems that are “exactly constrained”; this chapter from MIT’s 2.76 course and this thesis get into the meat-and bones of the topic.

However, the TL;DR is that the practical way to create an idealized “point of contact” is to push a sphere into a cylinder or a plane: for example, a ball bearing pushed onto a pair of dowels, or into a V-groove cut into a plate. If these are constructed from hard, smooth materials, you get pretty close to a perfect single point of contact. If you cut three V-grooves into a plate, and push three spherical bearings into the grooves, you’ve got six exact points of contact: a kinematic coupling!

The final piece missing is the force needed to make the system find its unique solution, i.e., the thing that pushes the spheres into the grooves. In the case of my microscope stage, the force comes from gravity acting on the top plate assembly, and nothing else.

Mechanical Design

The idea behind the mechanical design is to keep it as simple as possible. The microscope stage itself is a simple slab of aluminum with three V-shaped grooves milled into it.

There’s a few holes drilled into the plate to help with mounting samples, but it’s about as simple as you can get. I’ll link to all the design files and where to order parts at the end of the post.

The piezoelectric actuators themselves have a simple shape:

They are basically slabs of piezoelectric material with round “cymbals” bonded to either side. The cymbals act as mechanical amplifiers that increase the displacement of the actuator.

The center of each cymbal has a flat, circular region. I adhere a hemispherical glass cabochon to the center of the region using VHB tape to create the spherical half of the kinematic coupling. “Glass cabochon” (without affiliate code) sounds fancy but it’s a common and cheap material, often used in craft projects to add a bit of a sparkle or a dew-drop effect. The actuators themselves are mounted in an aluminum plate with channels to help route the wires to the edge.

That’s basically it for the mechanical design – two aluminum plates, three piezo actuators, and three glass crafting beads with a bit of glue. “Kinematically coupled piezo actuated focus stage” sounds fancy, but one of the nice parts about kinematic coupling is that you don’t need to complicate them – you are trying to reduce a design to six points of contact, no more, no less.

Above is a view of the bottom plate with the actuators and cabochons glued in place, but with the focus stage taken off.

And above is the assembly with the focus stage resting on the cabochons. The only additions to the design that happened since the photos were taken is the addition of an Acktar black film to suppress stray light reflections, and a small L-shaped bracket to aid with squaring a chip relative to the microscope’s field of view.

An important additional benefit of using a kinematically coupled stage with three separate actuators for each pair of coupling points is that you also get the ability to trim the angle of the plane relative to the optics “for free” (the real cost is the headache of figuring out the maths to make it happen). As long as each of the actuators can be independently driven, you can trim the sample’s plane, thus keeping the entire region in focus.

Electronic Design

Conceptually, I wanted a Serial-to-Position widget: something that converts ASCII commands into different heights on the positioning stage.

I used an RP2040 with a small Rust program to convert serial commands into low voltages through a multi-channel, 14-bit DAC.

In theory, the 14-bit DAC could control the piezo position to a resolution of 6nm (100um stroke divided by 16384 steps), but I’m guessing the actual accuracy is closer to 10’s of nanometers because I was only modestly careful in isolating all the noise sources from the DAC. This is still plenty of resolution given that our target is to land within an 8 micron window. Unfortunately, I have no definitive way to characterize the absolute accuracy of the setup; a nanometer-precise laser interferometer capable of doing these types of measurements costs a mint, and are probably subject to strict export controls.

I then convert the low voltages into the hundreds of volts needed to drive the piezo actuators using a piezo driver chip. Thanks to the proliferation of haptic feedback interfaces in consumer electronics, I’m able to find several single-chip solutions for driving the actuators; for this design, I settled on the TI DRV2700. Since the DRV2700 is capable of higher voltages than I needed as well as negative voltages, the main challenge of the design was making sure I didn’t exceed the maximum specifications on the PowerHap piezo, and that I didn’t accidentally apply a negative voltage to it. This is compounded by the datasheet being a bit vague on some parameters, so I had to re-read it a couple of times and make some educated guesses.

Above: hand-soldered, wired up and ready to go!

Testing high voltage drivers is always an exciting experience: the first time you command it to go to 100 volts, you sort of take a deep breath, close your eyes (so you don’t damage a cornea from flying fragments of chip package), and hit enter. While that test went smoothly, I did end up frying one channel during bring up so badly that the inductor desoldered itself from the board; this was also when I discovered that it was a bad idea to put reverse voltage protection diodes on the channels, because at high positive biases the leakage current was enough to cause the protection diode to self-heat, leading to higher leakage, leading to more heat … and eventually the circuit cooking itself off its pads.

However, once I got past the teething issues, things went surprisingly smoothly. The piezo actuator worked reliably and seems remarkably linear. Perhaps the biggest problem I encountered was that the response is too quick: the focus steps during auto-focus search were energetic enough that it would cause the chip samples to bounce around on the stage, causing them to “walk” out of the field of view over time. This was fixed with a small change in the control software to ramp the waveform, instead of doing a square-wave step to the final position.

Above is a short video of the auto-focus mechanism in action. The focus steps are very fine, so the effect is subtle. However, if you look at the bright laser marking stipples in the lower part of the image, you’ll notice more of an effect. In this video, the algorithm refocuses, sweeping the Z-height over a few microns, doing curve fitting, and determining the optimal point for focus on that step, all within a couple of seconds. The software algorithm to run the autofocus was actually much more challenging to get right than the hardware itself, and I’ll go into some of that in a future post.

The neat part is that at the end of the day, the whole thing cost a few hundred bucks, even counting all of the costs for one-off custom machining of parts and building circuit boards (it helps that I hand-soldered the whole thing myself). I’m not sure how this compares in performance to the thousand-dollar plus commercial alternatives, but it worked well enough and the software is tailored to exactly what I needed.

It helps that I can get CNC aluminum parts made at a reasonable quality for a very good price (about $55 for each of the aluminum plates) through a friend in Shenzhen. His company is called Jiada, and if you have Wechat, you can contact him at “Victor-Jiada”, and the design files can be found here. Just let him know that you got his contact via my blog.

Not a Product

The success of the design begs the question if I will offer it as a product. I’m not actually sure it would be worth it, because of creeping regulatory enshittification – FCC/CE certifications & declarations, RoHS, REACH, probably some UL/safety weirdness because of the high voltages, and unpredictable export controls because it’s related to semiconductor fabrication – is pretty significant, both in terms of real money cost but more importantly in terms of distraction, aggravation, and risk. Just last month, I had to write a letter to DigiKey explaining to them that no, really, I hand-solder everything in my home, and I’m not some fly-by-night trader re-exporting parts to banned entities like Russia or China.

I’ve also been burned by products bounced out of customs because of some small irregularity in the paperwork, followed up by angry customer emails about non-delivery, and ultimately me eating the cost of return shipping and a full refund to the customer, compounded with unsold inventory because my market shrinks with every new regulatory barrier – sometimes over really stupid stuff like post-Brexit UK requiring the words “UKCA” to be printed on products in order to pass customs.

Yes, I’m bitter, because I grew up in a world with fewer trade barriers, and without a DMCA. I remember when we were free to think, travel, and trade, and I liked it. I feel kind of sorry for the youth today, growing up in a world shaped by narratives of safety, fear, and ultimately, control. Also, climate change.

I digress.

Maybe if a few like-minded creators were interested in a kit of mostly assembled parts, I can make a “lo-fi” offering that skirts the most onerous regulations. But in the post trade-war economy, there is a disincentive to creating overly-polished products: if it looks too professional, regulators hold you to a higher level of scrutiny, and that makes it not worth the effort to “do better”, especially for small, bespoke lab tools like this. Welcome to the world of enshittified hardware.

Besides, I don’t have access to nanometer-precision metrology tools, so I’m not actually 100% sure what the piezo displacement curve looks like: “it works for IRIS”, and that’s about all I can guarantee.

DIY

The good news is that all this work is open source, so you’re more than welcome to build your own version of this nm-resolution piezo focus stage based on my design files – you should be able to find everything you need to do that at this github repository! You’re even free to sell product based on these files, but, don’t expect me to support your customers – my work comes with no warranty of fitness. You sell it, you support it!

The next post will dive into the mechanical design of the light positioner itself. Thanks for reading, and thanks again to NLnet and my Github Sponsors for supporting my research.

Designing The Light Source for IRIS

Monday, March 25th, 2024

This post is part of a longer-running series about giving users a tangible reason to trust their hardware through my IRIS (Infra-Red, in-situ) technique. IRIS allows us to see the insides of certain types of chips, even after they are soldered to a circuit board. This is possible because under infrared light, silicon is practically transparent:

And this is what the current generation of IRIS machinery looks like:

Previously, I introduced the context of IRIS, and touched on my general methods for learning and exploring. This post will cover how I arrived at the final design for the light source featured in the above machine. It is structured as a case study on the general methods for learning that I covered in my previous post, so if you see foofy statements about “knowing it” or “being ignorant of it”, that’s where it comes from. Thus, this post will be a bit longer and more circuitous than usual; however, future posts will be more direct and to the point.

Readers interested in the TL;DR can scroll past most of this post and just look at the pretty pictures and video loops near the bottom.

As outlined in my methods post, the first step is to make an assessment of what you know and don’t know about a topic. One of the more effective rhetorical methods I use is to first try really hard to find someone else who has done it, and copy their work.

Try Really Hard to Copy Someone Else

As Tom Knight, my PhD advisor, used to quip, “did you know you could save a whole afternoon in the library by spending two weeks in the lab?” If there’s already something out there that’s pretty close to what I’m trying to do, perhaps my idea is not as interesting as I had thought. Maybe my time is better spent trying something else!

In practice, this means going back to the place where I had the “a-ha!” moment for the idea, and reading everything I can find about it. The original idea behind IRIS came from reading papers on key extraction that used the Hamamatsu Phemos series of failure analysis systems. These sophisticated systems use scanning lasers to non-destructively generate high-resolution images of chips with a variety of techniques. It’s an extremely capable system, but only available to labs with multi-million dollar budgets.

Above: except from a Hamamatsu brochure. Originally retrieved from this link, but hosted locally because the site’s link structure is not stable over time.

So, I tried to learn as much as I could about how it was implemented, and how I might be able to make a “shallow copy” of it. I did a bunch of dumpster-diving and acquired some old galvanometers, lasers, and a scrapped confocal microscope system to see what I could learn from reverse engineering it (reverse engineering is especially effective for learning about any system involving electromechanics).

Nvidia@5nm@AdaLovelace@AD102@GeForce_RTX_4090@S_TW_2324A1_U2F028.MOW_AD102-301-A1___DSCx6@IR

However, in the process of reading articles about laser scanning optics, I stumbled upon Fritzchens Fritz’s Flickr feed (you can browse a slideshow of his feed, above), where he uses a CMOS imager (i.e. a Sony mirrorless camera) to do bulk imaging of silicon from the backside, with an IR lamp as a light source. This is a perfect example of the “I am ignorant of it” stage of learning: I had negative emotions when I first saw it, because I had previously invested so much effort in laser scanning. How could I have missed something so obvious? Have I really been wasting my time? Surely, there must be a reason why it’s not widely adopted already… I recognized these feelings as my “ignorance smell”, so I pushed past the knee-jerk bad feelings I had about my previously misdirected efforts, and tried to learn everything I could about this new technique.

After getting past “I am ignorant of it” and “I am aware of it”, I arrived at the stage of “I know of it”. It turns out Fritz’s technique is a great idea, and much better than anything I had previously thought of. So, I abandoned my laser scanner plan and tried to move to the stage of “tried it out” by copying Fritzchen Fritz’s setup. I dug around on the Internet and found a post where some details about his setup were revealed:

I bought a used Sony camera from Kolari Vision with the IR filter removed to try it out (you can also swap out the filter yourself, but I wanted to be able to continue using my existing camera for visible light photos). The results were spectacular, and I shared my findings in a short arXiv paper.

Above is an example of an early image I collected using a Sony camera photographing an iPhone6 motherboard. The chip’s internal circuitry isn’t overlaid with Photoshop — it’s actually how it appears to the camera in infrared lighting.

Extending the Technique

Now that I was past the stage of “I have tried it out”, it was time to move towards “I know it” and beyond. The photographs are a great qualitative tool, but verification requires something more quantitative: in the end, we want a “green/red light” indicator for if a chip is true to its blueprint, or not. This would entail some sort of automated acquisition and analysis of a die image that can put tight bounds on things like the number of bits of RAM or how many logic gates are in chip. Imaging is just one part of several technologies that have to come together to achieve this.

I’m going to need:

  • A camera that can image the chip
  • A light source that can illuminate the chip
  • A CNC robot that can move things around so we can image large chips
  • Stitching software to put the images together
  • Analysis software to correlate the images against designs
  • Scan chain techniques to complement the gate count census

Unfortunately, the sensors in Sony’s Alpha-NEX cameras aren’t available in a format that is easily integrated with automated control software. However, Sony CMOS sensors from the Starvis2 line are available from a variety sources (for example, Touptek) in compact C-mount cases with USB connectors and automation-ready software interfaces. The Starvis2 line targets the surveillance camera market, where IR sensitivity is a key feature for low-light performance. In particular, the IMX678 is an 8-Mpix 16:9 sensor with a response close to 40% of peak at 1000nm (NB: since I started the project, Sony’s IMX676 sensor is now also available (see E3ISPM12000KPC), a 12-Mpix model with a 1:1 aspect ratio that would be a better match for the imaging I’m trying to do; I’m currently upgrading the machine to use this). While there are exotic and more sensitive III-V NIR sensors available, after talking to a few other folks doing chip imaging, I felt pretty comfortable that these silicon CMOS cameras were probably the best sensors I could get for a couple hundred dollars.

With the camera problem fully constrained within my resource limits, I turned my attention to the problems of the light source, and repeatability.

Light Sources Are Hard

The light source turns out to be the hard problem. Here are some of the things I learned the hard way about light sources:

  • They need to be intense
  • They need to be uniform
  • Because of the diffractive nature of imaging chips, the exact position of the light source relative to the sample turns out to be critical. Viewing a chip is like looking at a hologram: the position of your eyes changes the image you see. Thus, in addition to X, Y and Z positioning, I would need azimuth and zenith controls.
  • For heavily doped substrates (as found on Intel chips), spectral width is also important, as it seems that backscatter from short wavelength sidebands quickly swamp the desired signal (note: this mechanism is an assumption, I’m not 100% sure I understand the phenomena correctly)

Above is the coordinate system used by IRIS. I will frequently refer to theta/zenith and phi/azimuth to describe the position of the lightsource in the following text.

Of course, when starting out, I didn’t know what I didn’t know. So, to get a better feel for the problem, I purchased an off-the-shelf “gooseneck” LED lamp, and replaced the white LEDs with IR LEDs. Most LED lamps with variable intensity use current-based regulation to control the white LEDs, which means it is probably safe to swap the white LEDs for IR LEDs, so long as the maximum current doesn’t exceed the rating of the IR LEDs. Fortunately, most IR LEDs can handle a higher current relative to similarly packaged white LEDs, since they operate at a lower forward voltage.

With these gooseneck-mounted IR LEDs, I’m able to position a light source in three dimensional space over a chip, and see how it impacts the resulting image.

Above: using gooseneck-mounted IR LEDs to sweep light across a chip. Notice how the detail of the circuitry within the chip is affected by small tweaks to the LED’s position.

Sidebar: Iterate Through Low-Effort Prototypes (and not Rapid Prototypes)

With a rough idea of the problem I’m trying to solve, the next step is build some low-effort prototypes and learn why my ideas are flawed.

I purposely call this “low-effort” instead of “rapid” prototypes. “Rapid prototyping” sets the expectation that we should invest in tooling so that we can think of an idea in the morning and have it on the lab bench by the afternoon, under the theory that faster iterations means faster progress.

The problem with rapid prototyping is that it differs significantly from production processes. When you iterate using a tool that doesn’t mimic your production process, what you get is a solution that works in the lab, but is not suitable for production. This conclusion shouldn’t be too surprising – evolutionary processes respond to all selective pressures in the environment, not just the abstract goals of a project. For example, parts optimized for 3D printing consider factors like scaffolding, but have no concern for undercuts and cavities that are impossible to produce with CNC processes. Meanwhile CNC parts will gravitate toward base dimensions that match bar stock, while minimizing the number of reference changes necessary during processing.

So, I try to prototype using production processes – but with low-effort. “Low-effort” means reducing the designer’s total cognitive load, even if it comes at the cost of a longer processing time. Low effort prototyping may require more patience, but also requires less attention. It turns out that prototyping-in-production is feasible, and is actually the standard practice in vibrant hardware ecosystems like Shenzhen. The main trade-off is that instead of having an idea that morning and a prototype on your desk by the afternoon, it might take a few days. And yes – of course there ways to shave those few days down (already anticipating the comments informing me of this cool trick to speed things up) – but the whole point is to not be distracted by the obsession of shortening cycle times, and spend more attention on the design. Increasing the time between generations by an order of magnitude might seem fatally slow for a convergent process, but the direction of convergence matters as much as the speed of convergence.

More importantly, if I were driving a PCB printer, CNC, or pick-and-place machine by myself, I’d be spending all morning getting that prototype on my desk. By ordering my prototypes from third party service providers, I can spend my time on something else. It also forces me to generate better documentation at each iteration, making it easier to retrace my footsteps when I mess up. Generally, I shoot for an iteration to take 2-4 weeks – an eternity, I suppose, by Silicon Valley metrics – but the two-week mark is nice because I can achieve it with almost no cognitive burden, and no expedite fees.

I then spend at least several days to weeks characterizing the results of each iteration. It usually takes about 3-4 iterations for me to converge on a workable solution – about a few months in total. I know, people are often shocked when I admit to them that I think it will take me some years to finish this project.

A manager charged with optimizing innovation would point out that if I could cut the weeks out where I’m waiting to get the prototype back, I could improve the time constant on an exponential and therefore I’d be so much more productive: the compounding gains are so compelling that we should drop everything and invest heavily in rapid prototyping.

However, this calculus misses the point that I should be spending a good chunk of time evaluating and improving each iteration. If I’m able to think of the next improvement within a few minutes of receiving the prototype, then I wasn’t imaginative enough in designing that iteration.

That’s the other failure of rapid prototyping: when there’s near zero cost to iterate, it doesn’t pay to put anything more than near zero effort into coming up with the next iteration. Rapid-prototyping iterations are faster, but in much smaller steps. In contrast, with low-effort prototyping, I feel less pressure to rush. My deliberative process is no longer the limiting factor for progress; I can ponder without stress, and take the time to document. This means I can make more progress every step, and so I need to take fewer steps.

Alright, back to the main story — how we got to this endpoint:

The First Low-Effort Prototypes

I could think of two ways to create a source of light that had a controllable azimuth and zenith. One is to mount it to a mechanism that physically moves the light around. The other is to create a digital array of lights with lights in every position, and control the light source’s position electronically.

When I started out, I didn’t have a clue on how to build a 2-axis mechanical positioner; it sounded hard and expensive. So, I gravitated toward the all-digital concept of creating a hemispherical dome of LEDs with digitally addressable azimuth and zenith.

The first problem with the digital array approach is the cost of a suitable IR LED. On DigiKey, a single 1050nm LED costs around $12. A matrix of hundreds of these would be prohibitively expensive!

Fortunately, I could draw from prior experience to help with this. Back when I was running supply chain operations for Chibitronics, I had purchased over a million LEDs, so I had a good working relationship with an LED maker. It turns out the bare IR LED die were available off-the-shelf from a supplier in Taiwan, so all my LED vendor had to do was wirebond them into an existing lead frame that they also had in stock. With the help of AQS, my contract manufacturing partner, we had two reels of custom LEDs made, one with 1050nm chips, and another with 1200nm chips. This allowed me to drop the cost of LEDs well over an order of magnitude, for a total cost that was less than the sample cost of a few dozen LEDs from name-brand vendors like Marubeni, Ushio-Epitex, and Marktech.

With the LED cost problem overcome, I started prototyping arrays using paper and copper tape, and a benchtop power supply to control the current (and thus the overall brightness of the arrays).

Above: some early prototypes of LEDs mounted on paper using copper tape and a conventional leaded LED for comparison.

Since paper is flexible, I was also able to prototype three dimensional rings of LEDs and other shapes with ease. Playing with LEDs on paper was a quick way to build intuition for how the light interacts with the silicon. For example, I discovered through play that the grain of the polish on the backside of a chip can create parasitic specular reflections that swamp out the desired reflections from circuits inside the die. Thus, a 360-degree ring light without pixel switching would have too many off-target specular reflections, reducing image contrast.

Furthermore, since most of the wires on a chip are parallel to one of the die edges, it seemed like I could probably get away with just a pair of orthogonal pixel-based light sources illuminating at right angles to each other. In order to test this theory, I decided to build a compact LED bar with individually switchable pixels.

Evolving From Paper and Tape to Circuit Boards

As anyone who has played with RGB LED tape knows, individually addressable pixels are really easy to do when you have a driver IC embedded inside the LED package. For those unfamiliar with RGB LED tape, here’s a conceptual diagram of its construction:

Each RGB triple of LEDs is co-packaged with a controller chip (“serial driver IC”), that can individually control the current to each LED. The control chip translates serial input data to brightness levels. This “unit cell” of control + LEDs can be repeated hundreds of times, limited primarily by the resistance of copper wire, thanks to the series wiring topology.

What I wanted was something like this, but with IR LEDs in the package. Unfortunately, each IR LED can draw up to 100mA – more than an off-the-shelf controller IC can handle – and my custom LEDs are just simple, naked LEDs in 3528 packages. So, I had to come up with some sort of control circuit that allowed me to achieve pixel-level control of the LEDs, at a high brightnesses, without giving up the scalability of a serial topology.

Trade-Offs in Driver Topologies

For lighting applications, it’s important that every LED shines with equal brightness. The intensity of an LED’s light output is correlated with the current flowing through it; so in general if you have a set of LEDs that are from the same manufacturing process and “age” (hours illuminated), they will emit the same flux of light for the same amount of current. This is in contrast to applying the same voltage to every LED; in the scenario of a constant voltage, minute structural variations between the LEDs and local thermal differences can lead to exponential differences in brightness.

This means that, in general, we can’t wire every LED in parallel to a constant voltage; instead, every LED needs a regulator that adjusts the voltage across the LED to achieve the desired fixed current level.

Fortunately, this problem is common enough that there are several inexpensive, single-chip offerings from major chip makers that provide exactly this. A decade ago this would have been expensive and hard, but now one can search for “white LED driver IC” and come up with dozens of options.

The conceptually simplest way of doing this – giving each LED its own current regulator – does not scale well, because for N LEDs, you need N regulators with 2N wires. In addition to the regulation cost scaling with the number of LEDs, the wire routing becomes quite problematic as the LED bar becomes longer.

Parallel, switchable LED drive concept. N.B.: The two overlapping circles with an arrow through it is the symbol I learned for a variable current source.

Because of this scaling problem, the typical go-to industry technique for driving an array of identical-illumination LEDs is to string them in series, and use a single boost regulator to control the current going through the entire chain; the laws of physics demands that a string of LEDs in series all share the same current. The regulator adjusts the total voltage going into the string of LEDs, and nature “figures out” what the appropriate voltage is for every individual LED to achieve the desired current.

This series arrangement, shown above, allows N LEDs to share a single regulator, and is the typical solution used in most LED lamps.

Of course, with all the LEDs in series, you don’t have a switchable matrix of LEDs – reducing the current through one LED means the current through all the others identically!

The way to switch off individual LEDs in series is to short out the LEDs that should be turned off. So, conceptually, this is the circuit I needed:

In the above diagram, every LED has an individual switch that can shunt current around the LED. This has some problems in practice; for example, if all the LEDs are off, you have a short to ground, which creates problems for the boost regulator. Furthermore, switching several LEDs on and off simultaneously would require the regulator to step its voltage up and down quickly, which can lead to instability in the current regulation feedback loop.

Below is the actual, practical implementation of this idea:

Here, the logical function undergoes two steps of transformation to achieve the final circuit.

First, we implement the shunt switch using a P-channel FET, but also put a “regular” diode in series with the P-FET. The “regular” diode is chosen such that it has a lower forward voltage than the LED, but only just slightly lower. Because diodes have an exponential current flow with voltage, even a slightly lower voltage conventional diode in parallel with with an LED will effectively steal all the current from the LED and turn it off. In this case, instead of emitting light, all the current is turned into waste heat. While this is inefficient, it has the benefit that the current regulator loop transient is minimized as LEDs turn on and off, and also when all the LEDs are off, you don’t have a short to ground.

Finally, we implement the “regular” diode by abusing the P-channel FET. By flipping the P-channel FET around (biasing the drain higher than the source) and connecting the FET in the “off” state, we activate the intrinsic “body diode” of the P-channel FET. This is an “accidental” diode that’s inherent to the structure of all MOSFETs, but in the case of power transistors, device designers optimize for and specify its performance since it is used by circuit designers to do things like absorb the kick-back of an inductive load when it is suddenly switched off.

Using the body diode like this has several benefits. First, the body diode is “bad” in the sense that it has a high forward voltage. However, for this application, we actually want a high forward voltage: our goal is to approach the forward voltage of an LED (about 1.6V), but be slightly less than that. This requirement is the opposite of what most discrete diodes optimize for: most diodes optimize for the lowest possible forward voltage, since they are commonly used as power rectifiers and this voltage represents an efficiency loss. Furthermore, the body diode (at least in a power transistor) is optimized to handle high currents, so, passing 100mA through the body diode is no sweat. We also enjoy the enhanced thermal conductivity of a typical power transistor, which helps us pull the waste heat out. Finally, by doubling-down on a single component, we reduce our BOM line-item count and overall costs. It actually turns out that P-channel power FETs are cheaper per device, and come in far smaller packages, than diodes of similar capability!

With this technique, we’re actually able to fit the entire circuity of the switch PFET, diode dummy load, an NFET for gate control, and a shift-register flip-flop underneath the footprint of a single 3528 LED, allowing us to create a high-density, high-intensity pixel-addressable IR LED strip.

First Version

On the very first version of the strip, I illuminated two LEDs at a time because I thought I would need at least two LEDs to generate sufficient light flux for imaging. The overall width of the LED strip was kept to a minimum so the strip could be placed as close to the chip as possible. Each strip was placed on a single rotating axis driven by a small hobby servo. The position of the light on the strip would approximate the azimuth of the light, and the angle of the axis of the hobby servo would approximate the zenith. Finally, two of these strips were intended to be used at right angles to improve the azimuth range.

As expected, the first version had a lot of problems. The main source of problems was a poor assumption I made about the required light intensity: much less light was needed than I had estimated.

The optics were evolved concurrently with the light source design, and I was learning a lot along the way. I’ll go into the optics and mechanical aspects in other posts, but the short summary is that I had not appreciated the full impact of anti-reflective (AR) coatings (or rather, the lack thereof) in my early tests. AR coatings reduce the amount of light reflected by optics, thus improving the amount of light going in the “right direction”, at the expense of reducing the bandwidth of the optics.

In particular, my very first imaging tests were conducted using a cheap monocular inspection microscope I had sitting around, purchased years ago on a whim in the Shenzhen markets. The microscope is so cheap that none of the optics had anti-reflective coatings. While it performs worse than more expensive models with AR coating in visible light, I did not appreciate that it works much better than other models with AR-coating in the infra-red wavelengths.

The second optical testbench I built used the cheapest compound microscope I could find with a C-mount port, so I could play around with higher zoom levels. The images were much dimmer, which I incorrectly attributed to the higher zoom levels; in fact, most of the loss in performance was due to the visible-light optimized AR coatings used on all of the optics of the microscope.

When I put together the “final” optics path consisting of a custom monocular microscope cobbled together from a Thorlabs TTL200-B tube lens, SM1-series tubes, and a Boli Optics NIR objective, the impact of the AR coatings became readily apparent. The amount of light being put out by the light bar was problematically large; chip circuitry was being swamped by stray light reflections and I had to reduce the brightness down to the lowest levels to capture anything.

It was also readily apparent that ganging together two LEDs was not going to give me fine enough control of azimuth position, so, I quickly turned around a second version of the LED bar.

Second Version

The second version of the bar re-used the existing mechanical assembly, but featured individually switchable LEDs (instead of pairs of LEDs). A major goal of this iteration was to vet if I could achieve sufficient azimuth control from switching individual LEDs. I also placed a bank of 1200nm LEDs next to 1050nm LEDs. Early tests showed that 1200nm could be effective at imaging some of the more difficult-to-penetrate chips, so I wanted to explore that possibility further with this light source.

As one can see from the photo above, the second version was just a very slight modification from the first version, re-using most of the existing mounting hardware and circuitry.

While the second version worked well enough to start automated image collection, it became apparent that I was not going to get sufficient angular resolution through an array of LEDs alone. Here are some of the problems with the approach:

  • Fixing the LEDs to the stage instead of the moving microscope head means that as the microscope steps across the chip, the light direction and intensity is continuously changing. In other words, it’s very hard to compare one part of a chip to another part of a chip because the lighting angle is fundamentally different, especially on chips larger than a few millimeters on a side.
  • While it is trivial to align the LEDs with respect to the wiring on the chip (most wires are parallel to one of the edges of the chip), it’s hard to align the LEDs with respect to the grain of the finish on the back side of the chip.

Many chips are not polished, but “back-grinded”. Polished chips are mirror-smooth and image extremely well at all angles; back-grinded chips have a distinct grain to their finish. The grain does not run in any consistent angle with respect to the wires of the chip, and a light source will reflect off of the grain, resulting in bright streaks that hide the features underneath.

Above is an example of how the grain of a chip’s backside finish can reflect light and drown out the circuit features underneath.

Because of these effects, it ends up being very tricky to align a chip for imaging, involving half an hour of prodding and poking with tweezers until the chip is at just the right angle with respect to the light sources for imaging. Because the alignment is manual and fussy, it is virtually impossible to reproduce.

As a result of these findings, I decided it was time to bite the bullet and build a light source that is continuously variable along azimuth and zenith using mechanically driven axes. A cost-down commercial solution would likely end up using a hybrid of mechanical and electrical light source positioning techniques, but I wanted to characterize the performance of a continuously positionable light source in order to make the call on if and how to discretize the positioning.

Third and Current Version

The third and current version of the light source re-uses the driver circuity developed from the previous two iterations, but only for the purpose of switching between 1050 and 1200nm wavelengths. I had to learn a lot of things to design a mechanically positionable light source – this is an area I had no prior experience in. This post is already quite long, so I’ll save the details of the mechanical design of the light source for a future post, and instead describe the light source qualitatively.

As you can see from the above video loop, the light source is built coaxially around the optics. It consists of a hub that can freely rotate about the Z axis, a bit over 180 degrees in either direction, and a pair of LED panels on rails that follow a guide which keeps the LEDs aimed at the focal point of the microscope regardless of the zenith of the light.

It was totally worth it to build the continuously variable light source mechanism. Here’s a video of a chip where the zenith (or theta) of the light source is varied continuously:

And here’s a more dramatic video of a chip where the azimuth / psi of the light source is varied continuously:

The chip is a GF180 MPW test chip, courtesy of Google, and it has a mirror finish and thus has no “white-out” angles since there is no back-grind texture to interfere with the imaging as the light source rotates about the vertical axis.

And just as a reminder, here’s the coordinate system used by IRIS:

These early tests using continuously variable angular imaging confirm that there’s information to be gathered about the construction of a chip based not just on the intensity of light reflecting off the chip, but also based on how the intensity varies versus the angle of the illumination with respect to the chip. There’s additional “phase” information that can be gleaned from a chip which can help differentiate sub-wavelength features: in plain terms, by rotating the light around the vertical axis, we can gather more information about the type logic cells used in a chip.

In upcoming posts, I’ll talk more about the light positioning mechanism, autofocus and the software pipelines for image capture and stitching. Future posts will be more to-the-point; this is the only post where I give the full evolutionary blow-by-blow of a design aspect, but actually, every aspect of the project took about an equal number of twists and turns before arriving at the current solution.

Taking an even bigger step back, it’s sobering to remember that image capture is just the first step in the overall journey toward evidence-based verification of chips. There are whole arcs related to scan chain methodology and automated image analysis on which I haven’t even scratched the surface; but Rome wasn’t built in a day.

Again, a big thanks goes to NLnet for funding independent, non-academic researchers like me, and their patience in waiting for the results and the write-ups, as well as to my Github Sponsors. This is a big research project that will span many years, and I am grateful that I can focus on doing the work, instead of fundraising and/or metrics such as impact factor.

Sidebar on Meta-Knowledge

Saturday, March 23rd, 2024

IRIS (Infra-Red, in-situ) is a multidisciplinary project I’m developing to give people a tangible reason to trust their hardware.

Above: example of IRIS imaging a chip mounted on a circuit board.

When I set out to research this technique, there were many unknowns, and many skills I lacked to complete the project. This means I made many mistakes along the way, and had to iterate several times to reach the current solution.

Instead of presenting just the final solution, I thought it might be interesting to share some of the failures and missteps I made along the way. The propensity to show only final results can make technology feel less inclusive: if you aren’t already in the know, it’s easy to feel like everything is magic. Nothing can be farther from the truth.

This short “sidebar” post will wax philosophical and discuss my general methods for learning and exploration; if you have no interest in this topic, you can safely skip this post.

The Rule of Three

When I have no way to derive how many iterations it will take to get something right, I use the “rule of three”: generally, you can get somewhere interesting with three iterations of a methodical process. The rule of three has roots in the observation many natural phenomena can be described with relationships based on the natural logarithm, e. In particular, diffusive processes – that is, progress toward a goal that is driven by random walks over a concentration gradient – have shapes and time constants governed by this relationship. As a corollary it matters less the exact nature of the process, and more the magnitude and proximity of the realizable incentives to get it right.

Image credit: BruceBlaus, CC-BY 3.0

Such processes tend to get “63% of the way there” in the first interval, “86% of the way there” in the second interval, and “95% of the way there” by the third interval (these percentages correspond to inverse powers of e, that is: 63% ≈ 1 – e-1, 86% ≈ 1 – e-2, etc…). You can’t iterate to perfection, but 95% of the way there is usually good enough. So when I can’t find a better analysis to guide a process, I’ll apply the “rule of 3” to everything from project management for a complex system, to how many times I rinse a dish rag before I hang it to dry.

Meta-knowledge: Knowing what You Know

When it comes to planning a big project like IRIS, a realistic self-assessment improves my ability to estimate time and resource requirements; the rule of three only works if you’re realistic about what you can achieve with every iteration.

Thus, I have developed a series of criteria to keep myself grounded, and periodically I take some time to reflect and correct my behavior if it is out of line.

Here is my self-assessment criteria, presented as a series of statements I can make about my knowledge, followed by a set of tests I might use to prove the statement.

  • I am ignorant of it: the concept does not exist in my consciousness; there’s an instinct to reject the possibility of its existence, especially if it is adjacent to something I already know well. The path to knowledge starts with recognizing ignorance; learning the smell of my own ignorance (that is, the instinct to reject or be incredulous) helps me get over this barrier faster.
  • I am aware of it: I’ve heard enough about it that I can throw the term around in the right context and impress someone who hasn’t heard of it.
  • I know of it: I’ve seen others do it, read some articles or papers about it, perhaps even played with a toy version of it and/or correctly answered basic questions about it.

Everyone is different, but this is roughly the level of knowledge I felt I had when I finished my basic undergraduate-level courses in university.

  • I have tried it out: did a small “original” project with it, and it seemed to go OK. This is the point where it’s easy to fall into the trap of knowing enough to be dangerous, but not realizing it.

This is around the point I felt I got to after completing some thesis-level projects in university.

  • I know it: did at least two projects with it, one of which I struggled to finish, because I hit a limit of the example code, API, or physics

This is roughly where I felt when I was as a junior professional in my first jobs out of college.

  • I know it well: extended it with a previously unknown aspect, or built a version of it from near first-principles; can teach it to others, but pupils still come away overwhelmed by jargon. Usually requires at least one several-month period of not touching it, and then coming back to it before I can reach the next stage
  • I have mastered it: knowing what I don’t know about it, and what it might take to figure out the missing bits; can correctly identify which problems it can be used to solve, and effectively solve them; able to use it as a reference to explore other less-known things; can readily extend it to meet other people’s needs; can offer a lucid and compact explanation of the topic to a beginner, without relying on jargon.

This is roughly what I would expect out of a senior professional or professor.

  • I am overfitting it: using it to solve everything, and everything is solvable with it; learning new things is harder and riskier relative to converting all the problems into something solvable with it – so I stop learning new things and spend more of my time converting all problems into its domain. This is the point at which everything looks like a nail because you’ve got a really nice, fancy hammer and you can swing it like nobody else can.

Overfitting can happen at any stage of learning, but it tends to happen whenever you become the most skilled within a given peer group. It’s avoidable, but is often a terminal state of learning. Overfitting can prevent forward progress in other skills, because it can seem like there is no need to master any other technique since you’re already “the smartest person in the room”.

I find that the final stages of learning are a constant tension between overfitting and asymptotically approaching mastery; there is no clear answer as to when I’m overfitting or when I’m just judiciously applying a well-worn tool to a job. However, as a matter of habit, when I start to feel too comfortable with a tool or technique, I try to force myself out of my comfort zone and try something new, just to make sure I’m not overfitting.

There is a cost to this, however, since it almost always means passing up easy money or fame to make the time to explore. An excellent way to break the overfitting cycle is to create art. Art is a safer space for exploration; even technical failures, if sufficiently spectacular, may have artistic merit. I also learn a lot when I collaborate with artists, because they often see aspects of familiar topics that I’ve been blind to my entire life.

Working within my Limitations

Significantly, progress past the “know it well” stage often requires me to take a several month break from doing anything with the topic or tool. During this time, all my short-term memory of the subject is lost, so I have to re-acquire the knowledge when I return to the topic. Re-learning from experience is an important step because I get a fresh look on the topic. Because I’m already somewhat familiar with things, I have the surplus cognitive capacity to put everything into context, while having the awareness to identify and break bad habits.

This cool-down period on learning puts a fundamental cap on the rate at which I can learn any single topic, but, the process of forgetting is aided by rotating through other skills and learning other things. I can use this to my advantage to learn several things in parallel. As a result, I generally try to have at least two projects running at the same time, each exercising a different set of skills. For example, most recently I have been alternating between maintaining the Xous OS (Rust programming), designing IRIS (mechanical design), and designing the next-generation Precursor (chip design).

At least for me, another important aspect is also knowing when to stop learning. You don’t need to be a master of everything (if your goal is to build a thing and deliver it on time). The trick is to learn just enough to get the job done correctly. Since time is a limited resource, overlearning can be as problematic as overfitting. My usual rule is to learn enough to get the job done, and then just enough more to be aware of a few crucial things that I might be missing. If none of these things have a substantial impact on the outcome of the project, it’s time to move on.

In the next post, I’ll describe the process of creating a light source for IRIS as a case study of this self-assessment methodology in action.

IRIS (Infra-Red, in situ) Project Updates

Sunday, March 10th, 2024

A goal of mine is to give everyday people tangible reasons to trust their hardware. Betrusted is a multi-year project of mine to deliver a full-stack verifiable “from logic gates to Rust crates” supply chain for security-critical applications such as password managers. At this point, many parts of the project have come together: Precursor is an FPGA-based open hardware implementation, and it runs Xous, our Rust-based microkernel message-passing OS. I currently use my Precursor on a daily basis with the “vault” application loaded to manage my passwords, TOTP tokens, and FIDO/U2F logins.

However, Precursor is expensive, because FPGAs are expensive. The device could be much cheaper with a dedicated security chip, but then we have no reason to trust these chips – security chip vendors don’t facilitate any form of user-side inspection, so we can’t tell if we have real or fake security chips in our device.

Kind of defeats the purpose, if you ask me.

Last March, I introduced the concept of Infra-Red, in situ (IRIS) inspection of silicon in a blog post and an arXiv paper. My hope has been that IRIS, plus some circuit-level scans and mathematical methods, could be the missing link that allows us to transition from our expensive FPGA-based Precursor solution, to a more pocketbook-friendly ASIC-based solution.

At the time when I released the initial paper, every picture was manually composed and focused; every sharp image was cherry-picked from dozens of fuzzy images. It was difficult to reproduce images, and unsuitable for automatically tiling multiple images together. The technique was good enough for a demo, but shaky as a foundation for full-chip verification.

Over the past year, I’ve refined the technique and implemented a fully automated system that can robustly and repeatably image whole chips at micron-scale resolution in a matter of minutes. The idea is not for everyone to have one of these robots in their home (but how cool would that be!); rather, the idea is that most users could utilize an inexpensive but somewhat fiddly setup and compare their results against reference images generated by the few users like me who have fully automated systems.

Here’s an example of the MPW7 run on SKY130A, courtesy of Matt Venn, imaged using my automated IRIS machine:

The above is just a thumbnail; click on the image to zoom into and browse the full-resolution version at siliconprawn.org (and check out my collection on that server for some more full-chip IRIS images). Each of the mottled dots in the lighter-shaded rectangles in the image corresponds to a logic gate, flip flop, or a “fill cell” (dummy transistors wired up as decoupling capacitors). To create the image, I just had to tell the machine where three corners of the chip are, and it automatically focuses and scans the full area. A script then post-processes the tiles into the fully-composed image you see here.

A broad goal of the project is to democratize silicon imaging and improve the state of the art in hardware verification. This is an area where simply popularizing silicon imaging can move the needle on security for everyone, because a credible threat of being caught reduces the incentive for adversaries to invest in expensive Trojan-implantation capabilities.

The good news is that the project is now at a state where, over the next couple of months, I can share a series of posts that detail the methodology used to develop the automated IRIS system, as well as document the construction of the device.

To kick things off, I’m going to start with a review of the current state of the art in hardware verification.

Review: Current State of the Art in Hardware Verification

Is my computer trustworthy? How do I know it was built correctly, and will execute my instructions faithfully? These are the questions that hardware verification aims to answer.

Hardware verification happens at every level of the supply chain. Let’s start with the relatable, every-day problem of how do I pick a piece of hardware to buy, and from there dive all the way down to esoteric topics such as verifying the circuits and devices that make up the hardware itself.

Consumers shopping for a computer rely principally on reputation mechanics to make choices: is the manufacturer a reputable brand? And how do my peers rate it?

Many of us can relate to how these mechanisms can fail. Ratings can be inflated by purchasing fake reviews, and stores can sell counterfeit brand goods. “Surely, this is just a problem of the retail market”, one might think. Once we’ve waded through the swamp of advertisements and on-line storefronts, the powers that be ought to ensure the device we ordered is the device we get! Otherwise it’s like, fraud or something bad like that, right?

Unfortunately, the level of sophistication for verification at every level of the supply chain is precisely as much as it needs to be for the vendor to get away with it, and not one iota more. For example, when it comes to logistics, we largely rely upon anti-tamper seals and tracking numbers to make sure our package arrives intact.

These measures are generally effective at deterring petty theft, but most anti-tamper tape can be purchased by the roll on gray markets, and tracking updates are too infrequent to rule out package diversion and intervention. Besides, that, consumers are conditioned to accept packages that have been mis-routed or experienced an “exception” during delivery – few will return an item that was delivered a day late under fears that the item could have spent a night in a facility where back doors were installed. Our ready acceptance of delivery exceptions is just one example of how supply chains are only as tight as they need to be for broad consumer acceptance, and not one iota more.

Once you’ve received the shipping box, most high-end consumer electronics have additional seals on their packaging. Unfortunately, most seals rely on easy-to-copy anti-tamper solutions such as holograms or fine printing; or at best contain serial numbers that are easy to copy yet have no easy way to check for authenticity.

Solutions such as glitter seals or other stochastic seals that rely upon the randomness inherent in paper fiber or glue to create a unique, unforgeable seal for every package have been proposed, but adoption is low and there is a lack of standardized, easy-to-use verification tools for such seals. Again, packaging seals are just as good as they need to be for broad consumer acceptance, no more, no less.

At the product level, there is a modicum of good news, at least in certain classes of products.

The traditional route of verification – observing the “fit and finish” of a product to detect counterfeits – is still the dominant method for most consumer products. However, in mobile phones and some laptops, manufacturers deploy electronic serial numbers and tamper detection techniques to deter would-be thieves of components or sub-assemblies. The effectiveness of these techniques depend intimately upon the implementation details; but in any case, they incur a cost in repairability and often times the end consumer can’t access the vendor’s databases to check that everything is in order. Instead, consumers are forced to delegate their trust to the vendors; yet regular consumers have no way to audit the vendors. The deferred trust boogeyman haunts everything from Apple’s iPhone ecosystem, to Intel’s SGX remote attestation mechanisms.

One would hope that this increase in verification sophistication is a trend that improves the deeper you go into the underlying technology. Unfortunately, it’s quite the opposite.

At the component level, the standard to this day for verifying the authenticity of a component is to look at the top marking (that is, the laser-etched numbers and logo) and the fit and finish of the package. Counterfeiters will often miss subtle details such as the font of the numbering, the location of the pin 1 marking, the composition of the overmold material, etc. Factories train staff to inspect and detect defects based on these irregularities.

What happens when a counterfeiter gets all these factors right? Well … the component goes into production, and we find out later about problems, either due to the assemblies failing test in the factory, or perhaps failing in peculiar ways in the field. For better or for worse, these problems are rare, generally affecting less than single-digit percentages of end users, and absent specific requirements or payments from customers to do more, equipment makers do exactly this and nothing more to protect the supply chain.

Even though most modern microcontrollers ship with an electronic serial number, few device manufacturers take advantage of them, and, perhaps somewhat surprisingly, there is usually no easy way to authenticate serial numbers with the component maker. Often times the purpose of the serial number is to serve as a unique ID for tracking products once manufactured; they are not structured to serve as a cryptographic method for determining provenance of the chip itself. Some security-forward microcontrollers feature things like PUFs (physically unclonable functions), but their implementation is usually directed at preventing people from tampering with or servicing their devices, rather than enabling users to verify the construction of the device itself.

And that’s about it – this is where any attempt to verify our electronics stops. To a first order, nobody even looks at the wires inside the chip.

This is because prior to IRIS, your options for inspecting silicon are either destructive, or experiments conducted in high energy physics labs. If you’ve ever seen the brilliant teardowns done by companies like TechInsights, the chips are generally imaged at the circuit level with a SEM or FIB, which requires the chip to be removed from its package and some of the metal layers to be permanently stripped off. Thus, one can obtain extremely high-quality imagery of a sample chip, but these techniques cannot be used to verify the very chip you want to use inside your computer, as the imaged chip must be destroyed.

An alternative, non-destructive technique known as X-ray ptychography can be thought of as a very high resolution 3D scanner for circuits. It’s very impressive, but to date it can only be done in a handful of high energy physics labs and it takes a long time (about three hours for 20 cubic microns) to image a full chip.

The technique I’m developing, IRIS, is a non-destructive technique to acquire micron-resolution images of the first metal layer of a chip at a rate of a seconds per square millimeter. To the best of my knowledge, this is the first practical technique that gives users a glimpse of the actual circuits they will use after it has been mounted on a circuit board.

There’s one final layer deeper into the technology stack beyond imaging of the circuits: electrically testing the circuits directly with a technique known as a “scan chain”. The good news is that scan chains are a mature technology; the bad news is that it is almost never done by users because the details of the scan chain are kept secret, and that a scan chain inspection alone can be easily defeated by a malicious adversary.

The purpose of a scan chain is to assist with the rapid detection of fabrication defects. It works by adding an extra path to a finished design that strings every register into one or more chains. Bit patterns are loaded into the chain, and the resulting logical operations performed by the gates connected between the registers is observed on the output of the chain. With enough patterns, you can build up an idea of what logic is between every register. Of course, the space of bits to explore grows exponentially with the number of bits in a chain, so it’s not practical to brute-force a large state space.

As a result, scan chains are good for detecting flaws in known circuits introduced by mother nature, but ineffective at deterring a malicious adversary. This is true even if one could brute-force the entire state-space due to the epistemic circularity of trusting a circuit to test itself. More colloquially, one may have heard of the “Hawthorne Effect” or the “observer effect”, which describes a subject under study temporarily altering their behavior because they know they are being studied, thus affecting the results of the study. In this case, a scan chain knows when it’s being queried, and thus, a malicious modification to a scan chain can add hidden states that alter its behavior for the duration of a scan, allowing it to pretend to be correctly constructed, but only when a check is actively running.

Although almost every chip goes through a scan chain test before it is shipped, the test vectors are proprietary, and often times the scan chains are deliberately and permanently sealed in a way to make it impossible for users to access. Scan chains may be scuttled after the factory test for ostensible security reasons, as an adversary can use them to read out the full state of a chip. However, depending on the threat model, it may be preferable to give users the option to exercise the scan chain and then permanently fuse out the scan chain after inspection.

Filling in the Verification Gap

As the summary chart below shows, supply chain verification techniques, although imperfect, enjoy broad adoption at the component level and above. However, there is a significant gap in user verification starting at the chip level and below.

My work, the Infra-Red, in situ (IRIS) inspection of silicon, is a step toward filling in this verification gap. In its simplest form, chips are deliberately constructed for optical inspection with infra-red light shined through the back side of the chip – that is, the side facing “up” that is not bonded to the circuit board.

The technique works because although silicon looks opaque at visible light, it is transparent starting at near-infrared wavelengths (roughly 1000 nm and longer). Today’s commodity optics and CMOS cameras are actually capable of working with lights at this wavelength; thus, IRIS is a low-cost and effective technique for confirming the construction of chips down to block level. For example, IRIS can readily help determine if a chip has the correct amount of RAM, number of CPU cores, peripherals, bond pads, etc. This level of verification would be sufficient to deter most counterfeits or substitutions.

However, due to the diffraction limit of infra-red light, it is insufficient for transistor-level imaging. Furthermore, it can only reliably infer the existence of the metal layers closest to the transistors (in technical jargon, it can infer the existence of “standard cell” library elements); it cannot reveal too much information about the higher-level metal wires that route between logic gates.

Thus, for threat models which require protection against adversaries capable of manipulating wires on an integrated circuit, IRIS should be combined with scan chain techniques to robustly verify a chip’s construction.

Hybrid Verification For the Win

As alluded to previously, scan chains alone are insufficient for detecting modifications to a circuit, because an adversary may modify the scan chain in such a manner that it responds with correct answers during the scan itself, but behaves maliciously otherwise.

However, such modifications require the introduction of additional logic gates to track the scan state and compute both correct and malicious responses. Because IRIS can “see” logic gates, it is able to put a firm upper bound on the potential amount of additional logic present in an integrated circuit. Thus, the combination of IRIS and scan chain techniques may be able to effectively verify that a circuit is correctly constructed.

The diagram above illustrates how IRIS and scan chain techniques compliment each other to obtain a high-confidence verification of a chip’s structure.

  • At the largest scales, IRIS can trivially quantify the number of IP blocks, pads, analog functions and memories; this is in contrast to scan-chain techniques which may struggle to characterize analog functions and other macro-scale properties, due to factors such as the analog limitations of scan chains, and the exponential growth of state-space to explore at the macro-level.
  • At intermediate scales, IRIS can quantify the number of bits of memory, or bound the number of standard cells in a region. This places constraints on how much malicious logic could be injected that could otherwise defeat a scan chain test.
  • At the smallest scales, IRIS cannot discern individual wires or gates. However, scan chain excels at exploring the topology and function of logic at the smallest and most local increments.

Thus, in combination, the two techniques may be used to achieve a high confidence verification of a chip’s function – at a cost and time scale suitable for point-of-use (e.g. end user) verification. I say “high confidence” because in the end, there is a probabilistic nature to both imaging and scan chain pattern coverage. As future work I’d like to explore the possibility of using formal methods to mathematically rule out any escapes, but absent formal proofs, it is important to understand that the technique is probabilistic in nature. That being said, it is still vastly better than the current state of the art, which is doing nothing at all.

Hybrid verification could be a viable path toward filling in the verification gap at the most fundamental levels of the supply chain, assuming chip vendors are willing to facilitate such verification by designing and packaging their products in a manner conducive to optical verification, and assuming chip vendors are willing to share scan chain test vectors with end users.

IRIS: Where We Are, and Where We are Headed

Because of the 1000 nm wavelength limit imposed by the transparency of silicon, IRIS has a limit on the features it can resolve. Below is an example IRIS imaging a small part of a RISC-V core on a 130nm chip fabricated using the SKY130A open PDK from Matt Venn’s MPW7 run; you can browse the entire chip image here.

Use the slider to compare the base image against an overlay derived from the design files. Each colored rectangle in the overlay corresponds to a “standard cell”: blue are flip flops, pink are filler/capacitor cells, aqua are varieties of and-or-invert gates, etc. Note that the native resolution of the image above is 1469 pixels wide; it has been scaled down to fit the width of this page.

One can see that in a 130nm process, IRIS has a reasonable chance of conducting a gate-count census of an entire chip. And yes, it’s not atypical for chips to be limited not by logic density, but by wiring density; hence, the majority of a chip’s active area contains filler cells (the pink rectangles).

Above is of an identically scaled region of a 22nm chip, again of a RISC-V core, but this time almost the entire core is within view because the logic gates are, unsurprisingly, much smaller: a single gate can be as small as a few pixels. At this node, IRIS can place an upper bound on gate count to within a couple dozen extra flip flops.

An important caveat when comparing images above: the backside finish of the 130nm chip is a mirror polish, but the 22nm chip only went through backgrinding; in other words, the 22nm chip’s image clarity is degraded due to a series of small surface ridges that refract light. If the 22nm chip had the same mirror-finish quality as the 130nm chip, the imaging resolution of the 22nm chip would be improved. Back side polishing of chips is not a difficult or uncommon process, but it’s not done unless explicitly required. Thus, chips intended for optical verification should specify a high quality mirror finish for the back side.

IRIS goes hand-in-hand with electrical scan chains to achieve full chip verification. Scan chains are able to quickly confirm the wiring between small clusters of standard cells, but Trojans can evade detection by including an honest copy of the affected logic. IRIS confirms that a given cluster of logic being tested by a scan chain is approximately the right size. The search perimeter for rouge cells is reduced by running the scan chain test at high speeds. Ideally, the total bounds are tight enough to rule out the existence of sufficient extra logic to evade detection in scan chain testing.

In other words, what IRIS can’t directly image, the scan chain has to make up for with complexity of test. Thus, at 130nm, a simple bit-shift scan chain may be sufficient, since individual gates are resolvable; at 22nm, a more complicated technique splitting the scan chain into multiple segments, capable of challenging the system with mutually unpredictable data patterns, may be required to drive up the lower bound on circuit complexity to the “dozens of logic gates” range required to bypass the test. And at the most advanced nodes, even more scan chain segments may be required along with supplementary design techniques to drive the lower confidence bound into the “hundreds of logic gates” range for reliable Trojan detection with IRIS.

In general, when I use the term “IRIS” alone in a context where high-confidence gate level verification is required, readers should infer that I’m referring to some kind of hybrid verification technique of both IRIS and scan chain.

That being said, I envision IRIS mainly being used to verify high-value circuitry, such as those found in a discrete cryptographic enclave intended to store secrets such as root keys. These enclaves would not require the performance or density of the latest process nodes. With careful design, a 22nm or 28nm process can offer GHz clock speeds, sufficient for storing and processing bulk data with root secrets. The “2x” nm node is particularly interesting because it is the best “value per transistor” sweet-spot, and likely to stay that way for the foreseeable future: it’s the smallest process node that still uses the easier-to-fabricate planar CMOS transistors while requiring only single-patterning DUV lithography techniques.

Thus, the bulk of my on-going research will focus on samples produced in 130nm and 180nm (because there are Open PDKs available for those nodes today), and 22nm (because of the ultimate economic importance of the node). I am also betting that while the 2x nm node is not open source today, it will become more open within the next decade if the world continues on a “business as usual” scenario where technology continues to race down the commodification curve, and fabs continue to compete on price and need more designs to keep them busy and profitable.

While the imaging system has met its initial goals, the software still has miles to go before I sleep (And miles to go before I sleep). In particular, I’m still working on training a computer to automatically recognize patterns of gates in IRIS images and to generate a gate count census. I’d also ideally like to find a way to use formal methods to place an upper bound on the amount of logic one can hide in a scan chain for a given testing methodology, so designers can have a formally proven tool to validate that their scan chains are sufficiently dense so that IRIS can pick up any attempts to bypass them. Unfortunately, I’m neither a software engineer nor a mathematician, but the problems are interesting enough that I’ll still give them a go. Worst case, I will learn something new along the path to failure.

Above is the IRIS machine that I’ve built. There’s a lot going on here – but basically, it’s an IR camera attached to a microscope, a nanometer-resolution focusing mechanism, and a pair of 1050nm light sources that have continuously adjustable azimuth and zenith. This microscope assembly is mounted in a Jubilee motion platform. The Jubilee is open source, and was designed by Sonya Vasquez of Prof. Nadya Peek‘s Machine Agency group. I got it as a kit from Filastruder. The base motion platform is capable of 10 micron steps, and features a kinematically coupled Z-bed with three independent actuators, allowing me to dynamically compensate for planarity issues in the sample undergoing imaging.

Above is a short video loop showing the core mechanics in action. The weird thing on the bottom with the red and black wires coming out of it is the kinematically coupled nanometer-resolution fine focus stage; its motions are too small to be picked up by the camera.

An explicit goal of this project is to open source all of IRIS, so that anyone can replicate the imaging system. Democratizing chip verification is important because a credible threat of being caught reduces the incentive of adversaries to deploy expensive Trojan-implantation capabilities.

With little fear of being caught, there’s a payoff even if an adversary has to plow tens of millions of dollars into a capability for planting chip-level hardware Trojans in high-value targets. However, if chip inspection equipment can purchased in the ballpark of hundreds to perhaps thousands of dollars, and more than a handful of users are known to routinely inspect chips, the path to payoff for an adversary before they are caught becomes murky. In this case, a rational adversary may be deterred from targeting an IRIS-enabled design, instead reserving their capabilities only for the chips that are difficult to inspect.

Aside from that, I’ll be straight with you – a big motivation for IRIS is simply because I am curious, and I want to look inside chips and see how they are built (and it’s kind of fun and strangely satisfying to build robots). I haven’t been disappointed with what I’ve seen so far – I have to stop myself from wasting evenings browsing through the construction of chips. I’ve done a bit of chip design in the past, and it’s fascinating to see the diversity of techniques and new trends in chip designs. I’m excited to share this sense of wonder with kindred spirits!

Given the volume of material to cover, I’m going to break the documentation up into a series of blog posts that go into the methodology used to build the machine, as well as details about every custom component, and the design decisions that went into them. I’ll also summarize the status of the analysis software that accompanies the system – so stay tuned for more posts!

However, if you’re impatient and don’t want to wait for the documentation, you can already browse the source files for the microscope, control software, stitching software, and layout extraction software.

Finally, a big shout-out to NLnet and the European Commission. NLnet’s NGI0 Entrust fund, established with support from the European Commission’s Next Generation Internet Program, are instrumental in facilitating my work on IRIS. Also a big shout-out to my Github Sponsors for their incredible generosity and monthly support. Thanks to all these donors, I’m able to keep IRIS 100% open source and free of conflicts of interest with commercial investors.

❤️ Sponsor me on Github! ❤️

Infra-Red, In Situ (IRIS) Inspection of Silicon

Wednesday, March 8th, 2023

Cryptography tells us how to make a chain of trust rooted in special-purpose chips known as secure elements. But how do we come to trust our secure elements? I have been searching for solutions to this thorny supply chain problem. Ideally, one can directly inspect the construction of a chip, but any viable inspection method must verify the construction of silicon chips after they have been integrated into finished products, without having to unmount or destroy the chips (“in situ“). The method should also ideally be cheap and simple enough for end users to access.

This post introduces a technique I call “Infra-Red, In Situ” (IRIS) inspection. It is founded on two insights: first, that silicon is transparent to infra-red light; second, that a digital camera can be modified to “see” in infra-red, thus effectively “seeing through” silicon chips. We can use these insights to inspect an increasingly popular family of chip packages known as Wafer Level Chip Scale Packages (WLCSPs) by shining infrared light through the back side of the package and detecting reflections from the lowest layers of metal using a digital camera. This technique works even after the chip has been assembled into a finished product. However, the resolution of the imaging method is limited to micron-scale features.

This post will start by briefly reviewing why silicon inspection is important, as well as some current methods for inspecting silicon. Then, I will go into the IRIS inspection method, giving background on the theory of operation while disclosing methods and initial results. Finally, I’ll contextualize the technique and discuss methods for closing the gap between micron-scale feature inspection and the nanometer-scale features found in today’s chip fabrication technology.

DOI: 10.48550/arXiv.2303.07406

Side Note on Trust Models

Many assume the point of trustable hardware is so that a third party can control what you do with your computer – like the secure enclave in an iPhone or a TPM in a PC. In this model, users delegate trust to vendors, and vendors do not trust users with key material: anti-tamper measures take priority over inspectability.

Readers who make this assumption would be confused by a trust method that involves open source and user inspections. To be clear, the threat model in this post assumes no third parties can be trusted, especially not the vendors. The IRIS method is for users who want to be empowered to manage their own key material. I acknowledge this is an increasingly minority position.

Why Inspect Chips?

The problem boils down to chips being literal black boxes with nothing but the label on the outside to identify them.

For example, above is a study I performed surveying the construction of microSD cards in an effort to trace down the root cause of a failed lot of products. Although every microSD card ostensibly advertised the same product and brand (Kingston 2GB), a decap study (where the exterior black epoxy is dissolved using a strong acid revealing the internal chips while destroying the card) revealed a great diversity in internal construction and suspected ghost runs. The take-away is that labels can’t be trusted; if you have a high-trust situation, something more is needed to establish a device’s internal construction than the exterior markings on a chip’s package.

What Are Some Existing Options for Inspecting Chips?

There are many options for inspecting the construction of chips; however, all of them suffer from a “Time Of Check versus Time Of Use” (TOCTOU) problem. In other words, none of these techniques are in situ. They must be performed either on samples of chips that are merely representative of the exact device in your possession, or they must be done at remote facilities such that the sample passes through many stranger’s hands before returning to your possession.

Scanning Electron Microscopy (SEM), exemplified above, is a popular method for inspecting chips (image credit: tmbinc). The technique can produce highly detailed images of even the latest nanometer-scale transistors. However, the technique is destructive: it can only probe the surface of a material. In order to image transistors one has to remove (through etching or polishing) the overlying layers of metal. Thus, the technique is not suitable for in situ inspection.

X-rays, exemplified in the above image of a MTK6260DA , are capable of non-destructive in situ inspection; anyone who has traveled by air is familiar with the applicability of X-rays to detect foreign objects inside locked suitcases. However, silicon is nearly transparent to the types of X-rays used in security checkpoints, making it less suitable for establishing the contents of a chip package. It can identify the size of a die and the position of bond wires, but it can’t establish much about the pattern of transistors on a die.

X-Ray Ptychography is a technique using high energy X-rays that can non-destructively establish the pattern of transistors on a chip. The image above is an example of a high-resolution 3D image generated by the technique, as disclosed in this Nature paper.

It is a very powerful technique, but unfortunately it requires a light source the size of a building, such as the Swiss Light Source (SLS) (donut-shaped building in the image above), of which there are few in the world. While it is a powerful method, it is impractical for inspecting every end user device. It also suffers from the TOCTOU problem in that your sample has to be mailed to the SLS and then mailed back to you. So, unless you hand-carried the sample to and from the SLS, your device is now additionally subject to “evil courier” attacks.

Optical microscopy – with a simple benchtop microscope, similar to those found in grade-school classrooms around the world – is also a noteworthy tool for inspecting chips that is easier to access than the SLS. Visible light can be a useful tool for checking the construction of a chip, if the chip itself has not been obscured with an opaque, over-molded plastic shell.

Fortunately, in the world of chip packaging, it has become increasingly popular to package chips with no overmolded plastic. The downside of exposing delicate silicon chips to possible mechanical abuse is offset by improved thermal performance, better electrical characteristics, smaller footprints, as well as typically lower costs when compared to overmolding. Because of its compelling advantages this style of packaging is ubiquitous in mobile devices. A common form of this package is known as the “Wafer Level Chip Scale Package” (WLCSP), and it can be optically inspected prior to assembly.

Above is an example of such a package viewed with an optical microscope, prior to attachment to a circuit board. In this image, the back side of the wafer is facing away from us, and the front side is dotted with 12 large silvery circles that are solder balls. The spacing of these solder balls is just 0.5mm – this chip would easily fit on your pinky nail.

The imaged chip is laying on its back, with the camera and light source reflecting light off of the top level routing features of the chip, as illustrated in the cross-section diagram above. Oftentimes these top level metal features take the form of a regular waffle-like grid. This grid of metal distributes power for the underlying logic, obscuring it from direct optical inspection.

Note that the terms “front” and “back” are taken from the perspective of the chip’s designer; thus, once the solder balls are attached to the circuit board, the “front side” with all the circuitry is obscured, and the plain silvery or sometimes paint-coated “back side” is what’s visible.

As a result, these chip packages look like opaque silvery squares, as demonstrated in the image above. Therefore front-side optical microscopy is not suitable for in situ inspection, as the chip must be removed from the board in order to see the interesting bits on the front side of the chip.

The IRIS Inspection Method

The Infra-Red, In Situ (IRIS) inspection method is capable of seeing through a chip already attached to a circuit board, and non-destructively imaging the construction of a chip’s logic.

Here’s a GIF that shows what it means in practice:

We start with an image of a WLCSP chip in visible light, assembled to a finished PCB (in this case, an iPhone motherboard). The scene is then flooded with 1070 nm infrared light, causing it to take on a purplish hue. I then turn off the visible light, leaving only the infrared light on. The internal structure of the chip comes into focus as we adjust the lens. Finally, the IR illuminator is moved around to show how the chip’s internal metal layers glint with light reflected through the body of the silicon.

Here is a still image of the above chip imaged in infra-red, at a higher resolution:

The chip is the BCM5976, a capacitive touchscreen driver for older models of iPhones. The image reveals the macro-scopic structure of the chip, with multiple channels of data converters on the top right and right edge, along with several arrays of non-volatile memory and RAM along the lower half. From the top left extending to the center is a sea of standard cell logic, which has a “texture” based on the routing density of the metal layers. Remember, we’re looking through the backside of the chip, so the metal layer we’re seeing is mostly M1 (the metal connecting directly to the transistors). The diagonal artifacts apparent through the standard cell region are due to a slight surface texture left over from wafer processing.

Below is the region in the pink rectangle at a higher magnification (click on the image to open a full-resolution version):

The magnified region demonstrates the imaging of meso-scopic structures, such as the row and structure column of memory macros and details of the data converters.

The larger image is 2330 pixels wide, while the chip is 3.9 mm wide: so each pixel corresponds to about 1.67 micron. To put that in perspective, if the chip were fabricated in 28 nm that would correspond to a “9-track” standard cell logic gate being 0.8 microns tall (based on data from Wikichip). Thus while these images cannot precisely resolve individual logic gates, the overall brightness of a region will bear a correlation to the type and density of logic gate used. Also please remember that IRIS is still at the “proof of concept” stage, and there are many things I’m working on to improve the image quality and fidelity.

Here’s another demo of the technique in action, on a different iPhone motherboard:

How Does It Work?

Silicon goes from opaque to transparent in the range of 1000 nm to 1100 nm (shaded band in the illustration below). Above 1100 nm, it’s as transparent as a pane of glass; below 1000 nm, it rapidly becomes more opaque than the darkest sunglasses.

Meanwhile, silicon-based image sensors retain some sensitivity in the near-to-short wave IR bands, as illustrated below.

Between these two curves, there is a “sweet spot” where standard CMOS sensors retain some sensitivity to short-wave infrared, yet silicon is transparent enough that sufficient light passes through the layer of bulk silicon that forms the back side of a WLCSP package to do reflected-light imaging. More concretely, at 1000 nm a CMOS sensor might have 0.1x its peak sensitivity, and a 0.3 mm thick piece of silicon may pass about 10% of the incident light – so overall we are talking about a ~100x reduction in signal intensity compared to visible light operations. While this reduction is non-trivial, it is surmountable with a combination of a more intense light source and a longer exposure time (on the order of several seconds).

Above is a cross-section schematic of the IRIS inspection setup. Here, the sample for inspection is already attached to a circuit board and we are shining light through the back side of the silicon chip. The light reflects off of the layers of metal closest to the transistors, and is imaged using a camera. Conceptually, it is fairly straightforward once aware of the “sweet spot” in infrared.

Two things need to be prepared for the IRIS imaging technique. First, the “IR cut-off filter” has to be removed from a digital camera. Normally, the additional infrared sensitivity of CMOS sensors is considered to be problematic, as it introduces color fidelity artifacts. Because of this excess sensitivity, all consumer digital cameras ship with a special filter installed that blocks any incoming IR light. Removing this filter can range from trivial to very complicated, depending on the make of the camera.

Second, we need a source of IR light. Incandescent bulbs and natural sunlight contain plenty of IR light, but the current demonstration setup uses a pair of 1070 nm, 100 mA IF LED emitters from Martech, connected to a simple variable current power supply (in practice any LED around 1050nm +/- 30nm seems to work fairly well).

To give credit where it’s due, the spark for IRIS came from a series of papers referred to me by Dmitry Nedospadov during a chance meeting at CCC. One published example is “Key Extraction Using Thermal Laser Stimulation” by Lohrke et al, published in IACR Transactions on Cryptographic Hardware and Embedded Systems (DOI:10.13154/tches.v2018.i3.573-595). In this paper, a Phemos-1000 system by Hamamatsu (a roughly million dollar tool) uses a scanning laser to do optical backside imaging of an FPGA in a flip-chip package. More recently, I discovered a photo feed by Fritzchens Fritz demonstrating a similar technique, but using a much cheaper off-the-shelf Sony NEX-5T. Since then, I have been copying these ideas and improving upon them for practical application in supply chain/chip verification.

How Can I Try It Out?

While “off the shelf” solutions like the Phemos-1000 from Hamamatsu can produce high-resolution backside images of chips, the six or seven-figure price tag puts it out of reach of most practical applications. I have been researching ways to scale this cost down to something more accessible to end-users.

In the video below, I demonstrate how to modify an entry-level digital inspection camera, purchasable for about $180, to perform IRIS inspections. The modification is fairly straightforward and takes just a few minutes. The result is an inspection system that is capable of performing, at the very least, block-level verification of a chip’s construction.

For those interested in trying this out, this is the $180 camera and lens combo from Hayear (link contains affiliate code) used in the video. If you don’t already have a stand for mounting and focusing the camera, this one is pricey, but solid. You’ll also need some IR LEDs like this one to illuminate the sample. I have found that most LEDs with a 1050-1070 nm center wavelength works fairly well. Shorter wavelength LEDs are cheaper, but the incidentally reflected light off the chip’s outer surface tends to swamp the light reflected by internal metal layers; longer than 1100 nm, and the camera efficiency drops off too much and the image is too faint and noisy.

Of course, you can get higher quality images if you spend more money on better optics and a better camera. Most of the images shown in this post were taken with a Sony A6000 camera that was pre-modified by Kolari Vision. If you have a spare camera body laying around it is possible to DIY the IR cut-off filter removal; YouTube has several videos showing how.

The modified camera was matched with either the optics of the previously-linked Hayear inspection scope, or directly attached to a compound microscope via a C-mount to E-mount adapter.

Another Sample Image

I’ve been using an old Armada610 chip I had laying around for testing the setup. It’s ideal for testing because I know the node it was fabbed in (55 nm) and the package is a bare flip-chip BGA. FCBGA is a reasonably common package type, but more importantly for IRIS, the silicon is pre-thinned and mirror-polished. This is done to improve thermal performance, but it also makes for very clean backside images.

Above is what the chip looks like in visible light.

And here’s the same chip, except in IR. The light source is shining from the top right, and already you can see some of the detail within the chip. Note: the die is 8mm wide.

Above is the lower part of the chip, taken at a higher magnification. Here we can start to clearly make out the shapes of memory macros, I/O drivers, and regions of differing routing density in the standard cell logic. The die is about 4290 pixels across in this image, or about 1.86 microns per pixel.

And finally, above is the boxed region in the previous image, but a higher magnification (you can click on any of the images for a full-resolution version). Here we can make out the individual transistors used in I/O pads, sense amps on the RAM macros, and the texture of the standard cell logic. The resolution of this photo is roughly 1.13 microns per pixel – around the limit of what could be resolved with the 1070 nm light source – and a hypothetical “9-track” standard cell logic gate might be a little over a pixel tall by a couple pixels wide, on average.

Discussion

IRIS inspection reveals the internal structure of a silicon chip. IRIS can do this in situ (after the chip has been assembled into a product), and in a non-destructive manner. However, the technique can only inspect chips that have been packaged with the back side of the silicon exposed. Fortunately, a fairly broad and popular range of packages such as WLCSP and FCBGA already expose the back side of chips.

Above: Various size scales found on a chip, in relationship to IRIS capabilities.

IRIS cannot inspect the smallest features of a chip. The diagram above illustrates the various size scales found on a chip and relates it to the capabilities of IRIS. The three general feature ranges are prefixed with micro-, meso-, and macro-. On the left hand side, “micro-scale” features such as individual logic gates will be smaller than a micron tall. These are not resolvable with infra-red wavelengths and as such not directly inspectable via IRIS, so the representative image was created using SEM. The imaged region contains about 8 individual logic gates.

In the middle, we can see that “meso-scale” features can be constrained in size and identity. The representative image, taken with IRIS, shows three RAM “hard macros” in a 55 nm process. Individual row sense amplifiers are resolvable in this image. Even in a more modern sub-10 nm process, we can constrain a RAM’s size to plus/minus a few rows or columns.

On the right, “macro-scale” features are clearly enumerable. The number and count of major functional blocks such as I/O pads, data converters, oscillators, RAM, FLASH, and ROM blocks are readily identified.

IRIS is a major improvement over simply reading the numbers printed on the outside of a chip’s package and taking them at face value. It’s comparable to being able to X-ray every suitcase for dangerous objects, versus accepting suitcases based solely on their exterior size and shape.

Even with this improvement, malicious changes to chips – referred to as “hardware trojans” – can in theory remain devilishly difficult to detect, as demonstrated in “Stealthy Dopant-Level Hardware Trojans” by Becker, et al (2013). This paper proposes hardware trojans that only modulate the doping of transistors. Doping modifications would be invisible to most forms of inspection, including SEM, X-Ray ptychography, and IRIS.

The good news is that the attacks discussed (Becker, 2013) are against targets that are entirely unhardened against hardware trojans. With a reasonable amount of design-level hardening, we may be able to up the logic footprint for a hardware trojan into something large enough to be detected with IRIS. Fortunately, there is an existing body of research on hardening chips against trojans, using a variety of techniques including logic locking, built in self test (BIST) scans, path delay fingerprinting, and self-authentication methods; for an overview, see “Integrated Circuit Authentication” by Tehranipoor.

IRIS is a necessary complement to logic-level hardening methods, because logic-only methods are vulnerable to bypasses and emulation. In this scenario, a hardware trojan includes extra circuitry to evade detection by spoofing self-tests with correct answers, like a wolf carrying around a sheep’s costume that it dons only when a shepherd is nearby. Since IRIS can constrain meso-scale to macro-scale structure, we can rule out medium-to-large scale circuit modifications, giving us more confidence in the results of the micro-scale verification as reported by logic-level hardening methods.

Above: Comparison of the detection-vs-protection trade offs of logic level hardening and IRIS inspection.

Thus, IRIS can be used in conjunction with logic-level trojan hardening to provide an overall high-confidence solution in a chip’s construction using non-destructive and in situ techniques, as illustrated above.

The primary requirement of the logic-level hardening method is that it must not be bypassable with a trivial amount of logic. For example, simple “logic locking” (a method of obfuscating logic which in its most basic form inserts X(N)ORs in logic paths, requiring a correct “key” to be applied to one input of the X(N)ORs to unlock proper operation) could be bypassed with just a few gates once the key is known, so this alone is not sufficient. However, a self-test mechanism that blends state from “normal runtime” mode and “self test” mode into a checksum of some sort could present a sufficiently high bar. In such a stateful verification mechanism, the amount of additional logic required to spoof a correct answer is proportional to the amount of state accumulated in the test. Thus, one can “scale up” the coverage of a logic-level test by including more state, until the point where any reliable bypass would be large enough to be detected by IRIS (thanks to jix for pointing me in the right direction!). The precise amount of state would depend on the process geometry: smaller process geometries would need more state.

Under the assumption that each extra bit would imply an additional flip flop plus a handful of gates, a back-of-the-envelope calculation indicates a 28 nm process would require just a few bits of state in the checksum. In this scenario, the additional trojan logic would modify several square microns of chip area, and materially change the scattering pattern of infra-red light off of the chip in the region of the modification. Additional techniques such as path delay fingerprinting may be necessary to force the trojan logic to be spatially clustered, so that the modification is confined to a single region, instead of diffused throughout the standard cell logic array.

Summary and Future Direction

IRIS is a promising technique for improving trust in hardware. With a bit of foresight and planning, designers can use IRIS in conjunction with logic hardening to gain comprehensive trust in a chip’s integrity from micro- to macro-scale. While the technique may not be suitable for every chip in a system, it fits comfortably within the parameters of chips requiring high assurance such as trust roots and secure enclaves.

Of course, IRIS is most effective when combined with open source chip design. In closed source chips, we don’t know what we’re looking at, or what we’re looking for; but with open source chips we can use the design source to augment the capabilities of IRIS to pinpoint features of interest.

That being said, I’m hoping that IR-capable microscopes become a staple on hardware hacker’s workbenches, so we can start to assemble databases of what chips should look like – be they open or closed source. Such a database can also find utility in everyday supply chain operations, helping to detect fake chips or silent die revisions prior to device assembly.

Over the coming year, I hope to improve the core IRIS technique. In addition to upgrading optics and adding image stitching to my toolbox, digitally controlling the angle and azimuth of incident light should play a significant role in enhancing the utility of IRIS. The sub-wavelength features on a chip interact with incident light like a hologram. By modifying the azimuth and angle of lighting, we can likely glean even more information about the structure of the underlying circuitry, even if they are smaller than the diffraction limit of the system.

A bit further down the road, I’d like to try combining IRIS with active laser probing techniques, where IRIS is used to precisely locate a spot that is then illuminated by an intense laser beam. While this has obvious applications in fault induction, it can also have applications in verification and chip readout. For example, the localized thermal stimulation of a laser can induce the Seeback effect, creating a data-dependent change in power consumption detectable with sensitive current monitors. I note here that if physical tamper-resistance is necessary, post-verification a chip can be sealed in opaque epoxy with bits of glitter sprinkled on top to shield it from direct optical manipulation attacks and evil-maid attacks. However, this is only necessary if these attacks are actually part of the threat model. Supply chain attacks happen, by definition, upstream of the end user’s location.

The other half of optical chip verification is an image processing problem. It’s one thing to have reference images of the chip, and it’s another thing to be able to take the image of a chip and compare it to the reference image and generate a confidence score in the construction of the chip. While I’m not an expert in image processing, I think it’s important to at least try and assemble a starter pipeline using well known image processing techniques. A turnkey feature extraction and comparison tool would go a long way toward making IRIS a practically useful tool.

Ultimately, the hope is to create a verification solution that grows in parallel with the open source chip design ecosystem, so that one day we can have chips we can trust. Not only will we know what chips are intended to do, we can rest assured knowing they were built as intended, too.

This research is partially funded by a NGI Zero Entrust grant from NLnet and the European Commission, as well as by the donations of Github Sponsors.