Non-Destructive Silicon Imaging (and Winner of Name that Ware December 2022)

The ware for December 2022 is an AMD Radeon RX540 chip, part number 216-0905018. Congrats to SAM for guessing the ware; email me for your prize. The image is from Fritzchen Fritz’s Flickr feed; I recommend checking out his photos (or you can follow him on Twitter). Even if you aren’t into photos of chips, he elevates it to an art. Even more amazingly, all of his work is public domain; hats off to him for contributing these photos to the commons with such a generous license, because it is not easy to prepare the material and take images of this quality. If any of my readers happen to know him and are willing to make an introduction, I’d appreciate that. I only discovered his work by chance while doing some background research.

First, here is the entire photo from which the ware was cropped:


Credit: Fritzchen Fritz

Interestingly, you can see the design of the chip in this photograph. This is not Photoshop; based on the notes accompanying the photo, this was taken in “NIR”, or near-infrared, using a Sony NEX-5T.

Silicon is transparent to IR, so photographs taken in infrared can be used to verify, at a coarse level, the construction of a chip!

I was pretty excited to see photos like this posted on the Internet, at full-resolution, because I have only read about this technique in journal articles. Silicon becomes very transparent in infrared:


Silicon’s absorption of light in the near infrared range. A lower value is more transparent. Generated using PV lighthouse.

This principle forms the foundation of my efforts to verify the construction of silicon in a non-destructive fashion.

The line between NIR and SWIR (near and shortwave infrared) depends on who you ask, but Edmund Optics places it at 1000nm. By this definition, I’m inferring that the above photograph was probably taken using a powerful 900nm illuminator positioned to the left of the chip near the horizon. A bright light at that wavelength would have sufficient power to penetrate the ~1mm thickness of silicon to image the circuits on the other side, and placing it near the horizon prevents swamping the sensor with reflected light except for the bits of metal that happen to catch the light and reflect it upwards.

It’s also possible to do this with a SWIR sensor, using a wavelength closer to 1300nm (where silicon is as transparent as glass is to visible light), but the resolution of the photographs is much higher than that of the best SWIR sensor that I’m aware of. Unfortunately, it seems all interesting technologies are regulated by the US government’s ITAR, and SWIR area-scan sensors are no exception. I’m guessing they are also a critical component of night vision gear, and thus it is hard to obtain such sensors without a license. Regardless, even the photos taken at 900nm are a powerful demonstration of the utility of IR for inspecting the construction of silicon.

Here’s another image taken using what looks like the same technique:


Credit: Fritzchen Fritz

This is of the Via Centaur CHA, which has an excellently detailed Wikichip page complete with floorplans, such as the one shown below.


Credit: Wikichip

Remember, the IR image is from the back side of the die, so you have to mirror-image (and rotate) the front-side floorplan in your head to line it up with the orientation of the photograph.
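If doing the flip in your head gets tiring, the same transform is easy to sketch in code. This is only an illustration in plain Python: the function names are my own, and the 2x3 grid of block labels is a made-up stand-in, not the actual CHA floorplan.

```python
def mirror_horizontal(grid):
    """Flip a 2D grid left-to-right, as when viewing a die from its back side."""
    return [list(reversed(row)) for row in grid]


def rotate_cw(grid, quarter_turns=1):
    """Rotate a 2D grid clockwise by a multiple of 90 degrees."""
    result = [list(row) for row in grid]
    for _ in range(quarter_turns % 4):
        # Reversing the row order and transposing gives a 90-degree clockwise turn.
        result = [list(row) for row in zip(*result[::-1])]
    return result


# Hypothetical front-side floorplan; the labels are placeholders only.
front_side = [
    ["A", "B", "C"],
    ["D", "E", "F"],
]

# Back-side view: mirror first, then rotate to match how the photo was framed.
back_side = rotate_cw(mirror_horizontal(front_side), quarter_turns=1)
for row in back_side:
    print(row)
```

How many quarter turns you need depends on how the package happened to be oriented under the camera, so treat `quarter_turns` as something to adjust by eye.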

According to Wikichip, this is a TSMC 16FFC (16nm) process, with a 194mm^2 die area. This means the die above is about 13.9 mm on a side. The image as-is (which is 90% package and 10% die) resolves at about 18um/pixel, so perhaps if it was a die-only shot we could resolve at something close to 5um/pixel in a single image.
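For anyone who wants to check the arithmetic, here is a minimal sketch. The die area and the 18um/pixel figure come from the paragraph above; the sensor height of 3264 pixels and the 85% fill fraction are my assumptions for a 16-megapixel camera framing only the die, not measured values.

```python
import math

DIE_AREA_MM2 = 194.0            # from Wikichip, as quoted above
PACKAGE_SHOT_UM_PER_PX = 18.0   # resolution of the package-level photo, from the text

# ASSUMPTIONS (not from the text): short edge of the sensor in pixels, and
# how much of that edge a die-only shot would realistically fill.
SENSOR_SHORT_EDGE_PX = 3264
FILL_FRACTION = 0.85


def die_side_mm(area_mm2):
    """Side length of a die, assuming it is square."""
    return math.sqrt(area_mm2)


def um_per_pixel(extent_mm, pixels):
    """Microns covered by one pixel when `extent_mm` spans `pixels` pixels."""
    return extent_mm * 1000.0 / pixels


side = die_side_mm(DIE_AREA_MM2)
pixels_across_die_now = side * 1000.0 / PACKAGE_SHOT_UM_PER_PX
die_only_resolution = um_per_pixel(side, SENSOR_SHORT_EDGE_PX * FILL_FRACTION)

print(f"die side: {side:.2f} mm")
print(f"pixels across the die in the package shot: {pixels_across_die_now:.0f}")
print(f"die-only shot: {die_only_resolution:.1f} um/pixel")
```

Under those assumptions the die-only estimate lands close to the 5um/pixel guessed above; a different sensor or looser framing would shift it accordingly.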

With image stitching, the resolution can be even higher:


Credit: Fritzchen Fritz


Credit: Fritzchen Fritz

In these two photos, it seems the light source was rotated 90 degrees with respect to the chip, so that different sets of components are highlighted, depending on the bias of the metal routes for that component. Note that I’m inferring this image is taken through the back side because of the presence of scratches that would be from the exposed surface of the silicon, and the orientation of the imaged die is consistent with a back-side shot.

The resolution of the above images boils down to about 3um/pixel — getting fairly close to the limit of what you can do with NIR light. To put this in perspective, TSMC 16FFC has minimum metal pitch of 64nm, so a 9-track standard cell would be 0.576um tall, and an SRAM bitcell has a size of 0.074um^2, so one pixel encompasses roughly 25 logic gates or 120 bits of SRAM. In these images, you can clearly make out variations in the density of standard cell logic, as well as the size and location of individual memory macros; the internal structure of the PCI-express drivers is also readily apparent.
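The per-pixel estimates above can be reproduced with a few lines of Python. All four input figures are taken from the paragraph; the one simplification, which is mine, is treating a logic gate as a square whose side equals the 9-track cell height.

```python
METAL_PITCH_UM = 0.064      # TSMC 16FFC minimum metal pitch, from the text
TRACKS = 9                  # 9-track standard cell, from the text
SRAM_BITCELL_UM2 = 0.074    # SRAM bitcell area, from the text
PIXEL_UM = 3.0              # image resolution, from the text


def cell_height_um(tracks, pitch_um):
    """Standard cell height: number of routing tracks times the metal pitch."""
    return tracks * pitch_um


def per_pixel(pixel_um, feature_area_um2):
    """How many features of the given area fit inside one square pixel."""
    return (pixel_um ** 2) / feature_area_um2


height = cell_height_um(TRACKS, METAL_PITCH_UM)
# ASSUMPTION: model one gate as a square cell, height x height.
gates = per_pixel(PIXEL_UM, height ** 2)
bits = per_pixel(PIXEL_UM, SRAM_BITCELL_UM2)

print(f"cell height: {height:.3f} um")
print(f"about {gates:.0f} gates or {bits:.0f} SRAM bits per pixel")
```

Real gates are rarely square and vary in width, so the gate figure is only an order-of-magnitude guide.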

I’ve been contemplating silicon supply chain attacks quite a bit, and I think that at this resolution, one can rule out the following forms of silicon supply chain attacks:

  • Replacement of the chip with an entirely different design that emulates the original
  • Insertion of a ROM larger than a few hundred bits containing alternate microcode or instruction codings
  • Insertion of a RAM macro for recording data — probably of any practical size for a RAM macro, due to the presence of line drivers/amplifiers creating a high-signal reflection
  • Insertion of extra I/O drivers
  • Potential detection of extra eFuse elements
  • Likely able to detect recompilation/resynthesis of standard cell blobs

This significantly constrains the types of attacks one has to worry about. Without backside imaging and just looking at the exterior package, it’s difficult to even know if a chip has been wholesale replaced with an inferior clone or an emulated version. The inability to add significant amounts of microcode ROM or RAM constrains the types of modifications one could make to a CPU and “get away with it”; with some additional design-level guard rails and open source RTL I suspect one could virtually eliminate effective CPU instruction-level modifications that don’t also introduce ISA-level flaws in every mode of operation that could be easily detected with a software-only test.

I have reasons to suspect that modifications to an eFuse box would be detectable, but because eFuses are carefully guarded black boxes such that even chip designers are not allowed to see their insides, it’s possible that a foundry could just build a back door into every eFuse box and we wouldn’t be able to tell the difference because it would be “normal”.

Finally, depending on the repeatability of the place/route tool, a modification to the RTL that triggers a re-synthesis and place/route could change the gross morphology of the standard cell blob. However, I’m not familiar enough with the latest industry-standard tools to know how big a difference that would create. I imagine there are ways to control the place and route seed so that results look very similar if only small changes are made to the RTL, such as inserting a patch wire on a single bit in a non-congested region of a design. However, a larger change, such as the insertion of a 64-bit sampling register in the datapath somewhere, would likely be detectable with this level of imaging.

There’s still a class of exploits that could be undetected with this level of imaging. This would include:

  • Small changes to test access paths; for example, patching existing ATPG scan chain logic to an existing but unused point on an I/O mux hard macro. This could facilitate unrestricted access to internal state with some additional off-chip circuitry.
  • Spare cell-only modifications that are manually patched using higher metal levels. These patches would be obscured from the back side due to masking by lower metal layers, and by definition no additional transistors are involved.
  • Dopant-level attacks, where transistor flavor or threshold voltages are modified, perhaps to bias a random number generator or to modify the function of a single gate.
  • Other careful modifications that disturb fewer than ~100 logic gates or ~100 bits of SRAM.

However, the attack surface of concern is far smaller with this level of imaging than with the current state-of-practice, which consists of squinting at the top markings on a chip package.

My hope for supply chain verification is that end users can establish a practical amount of trust in silicon chips through a combination of imaging and design analysis, without requiring a fully-open PDK (although it certainly is easier and better if the PDK is open). The missing link is an automated imager that can produce results similar to the ones demonstrated by Fritzchens Fritz. These images can then be compared against die shots released by the designer. These die shots would be low enough resolution to not violate foundry NDA, but still have enough detail to constrain the intended positions of blocks. The remaining verification gap (on the order of hundreds of gates / hundreds of bits of SRAM) could be filled in with design techniques that harden against small exploit vectors, perhaps by the use of redundant/byzantine fault tolerant logic, or by some combination of inducing faults and scan chain analysis to confirm correct construction at the gate level. And finally, open source RTL is required to help establish a link between what is visible, and what was intended by the designer (and of course also to help discover any bugs/backdoors introduced by the designer).
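To make the comparison step a little more concrete, here is a toy sketch of one way it might work: average the brightness over coarse tiles in both the measured image and the designer's reference shot, then flag tiles that disagree. This is purely illustrative and is not a method anyone has validated; the function names, tile size, and threshold are all hypothetical, and a real tool would first need to align, scale, and normalize the two images.

```python
def tile_means(image, tile):
    """Mean brightness of each tile x tile block in a 2D list of numbers."""
    rows, cols = len(image), len(image[0])
    means = []
    for r in range(0, rows - tile + 1, tile):
        row_means = []
        for c in range(0, cols - tile + 1, tile):
            block = [image[rr][cc]
                     for rr in range(r, r + tile)
                     for cc in range(c, c + tile)]
            row_means.append(sum(block) / len(block))
        means.append(row_means)
    return means


def suspicious_tiles(measured, reference, tile=2, threshold=0.2):
    """Return (tile_row, tile_col) pairs whose mean brightness differs by more
    than `threshold` between the measured image and the reference die shot."""
    m = tile_means(measured, tile)
    ref = tile_means(reference, tile)
    return [(i, j)
            for i, row in enumerate(m)
            for j, value in enumerate(row)
            if abs(value - ref[i][j]) > threshold]


# Tiny made-up example: a 4x4 reference, and a measurement with one altered corner.
reference = [[0.0] * 4 for _ in range(4)]
measured = [row[:] for row in reference]
for r in (2, 3):
    for c in (2, 3):
        measured[r][c] = 1.0

print(suspicious_tiles(measured, reference))
```

Coarse tiles fit the goal stated above: the reference only needs enough detail to constrain block positions, so the comparison does not depend on anything finer than the foundry NDA allows.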

And now back to the Name that Ware competition. Confusingly, one of the first answers in the comments points to a tweet that also claims to have taken the photo. I did a bit of poking about and the image appears to be identical to the one on Fritzchen Fritz’s feed, down to the position of solder particulates and lint. There are a number of possible explanations for this; I won’t speculate as to what is going on, but I will comment that the chip is not typically referred to as an “AMD M74AP”. M74AP is the lot code, so I couldn’t declare Taylan the winner, unfortunately (so close, though!). 216-0905018 is the canonical part number; if you search around for the part number, you will see several examples of chips that have the same part number, but a different lot code. This one, for example, has a lot code of M62K8.00.

Postscript

When going through Fritzchen Fritz’s photos, I was also considering using this image as the Name that Ware:


Credit: Fritzchen Fritz

It’s a tiny portion (1/400th the area) of an Intel i3-8121U (187MiB full-res mirror link), fabbed in a 10nm process. The region is cropped from a section centered in the top right quadrant of the image.

In terms of actual dimensions, the region is about 485um x 375um if I’ve done my math right – about the area covered by a medium sand particle. According to Wikichip, a 9-track standard cell would be 0.324um high, so if the area were covered with nothing but square 9-track standard cells, it would hold 1500 x 1150 cells (1.7M cells, or about one gate per pixel in the photo), or 700kiB of the densest SRAM cells (without sense amps etc.)
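Here is a quick check of that math. The region size and cell height are from the paragraph above; the SRAM bitcell area of 0.0312um^2 is my assumption for the densest cell on this process (the text does not state it), chosen because it reproduces the roughly 700kiB figure.

```python
REGION_W_UM, REGION_H_UM = 485.0, 375.0   # cropped region size, from the text
CELL_HEIGHT_UM = 0.324                    # 9-track cell height, from the text

# ASSUMPTION: high-density SRAM bitcell area; not given in the text.
SRAM_BITCELL_UM2 = 0.0312


def square_cells(w_um, h_um, cell_um):
    """Columns and rows of square cells of side `cell_um` that fit the region."""
    return int(w_um // cell_um), int(h_um // cell_um)


def sram_kib(area_um2, bitcell_um2):
    """SRAM capacity in KiB if the whole area were bitcells with no overhead."""
    bits = area_um2 / bitcell_um2
    return bits / 8 / 1024


cols, rows = square_cells(REGION_W_UM, REGION_H_UM, CELL_HEIGHT_UM)
total_cells = cols * rows
capacity = sram_kib(REGION_W_UM * REGION_H_UM, SRAM_BITCELL_UM2)

print(f"{cols} x {rows} cells, about {total_cells / 1e6:.1f}M total")
print(f"about {capacity:.0f} KiB of SRAM with no sense amps or drivers")
```

Both numbers are ceilings: they assume the region is tiled edge to edge with a single kind of cell.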

However, the area is not homogeneously covered with one or the other, and in fact has lots of unused silicon. The darker purplish regions are unused silicon: for one reason or another (oftentimes routing/floorplanning constraints, and sometimes schedule constraints), there are no logic transistors there. I think only the solid tan regions in the lower left hand corners contain high density SRAM cells; the smaller rectangles above them could contain SRAM, but could also be some other type of memory more optimized for performance or port count.

Each SRAM region is divided by sense amps and other driver logic. One solid, SRAM-cell-only region is about 48.7×28.7um, which is about 5.4kiB, so the overall region of larger rectangles holds about 22kiB of memory, including an overhead of about 35% for the drivers and amps. Likewise, the cauliflower-like structure in the center is about 750 gates wide by 900 gates high (if the gates were square — which they aren’t, so this is an upper bound), or about 600k gates (again, this image is at a resolution of about 1 pixel/gate). That would fit about a dozen VexRiscv cores, or a few 80486’s, so it’s not a small chunk of logic.
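The same kind of back-of-the-envelope check works for these figures. The macro dimensions and gate counts are from the paragraph above; the bitcell area (0.0312um^2) and the count of four SRAM-only regions are my assumptions, the latter inferred from 22kiB divided by 5.4kiB.

```python
MACRO_W_UM, MACRO_H_UM = 48.7, 28.7   # one SRAM-cell-only region, from the text
GATES_WIDE, GATES_HIGH = 750, 900     # cauliflower structure, from the text

# ASSUMPTIONS (not stated in the text):
SRAM_BITCELL_UM2 = 0.0312   # high-density bitcell area
MACRO_COUNT = 4             # inferred from the 22kiB and 5.4kiB figures


def macro_kib(w_um, h_um, bitcell_um2):
    """Capacity in KiB of a rectangle filled entirely with SRAM bitcells."""
    return (w_um * h_um) / bitcell_um2 / 8 / 1024


per_macro = macro_kib(MACRO_W_UM, MACRO_H_UM, SRAM_BITCELL_UM2)
total = per_macro * MACRO_COUNT

# Upper bound on logic gates if every gate were a square cell.
gate_upper_bound = GATES_WIDE * GATES_HIGH

print(f"per region: {per_macro:.1f} KiB, total: {total:.1f} KiB")
print(f"gate upper bound: {gate_upper_bound}")
```

The raw product for the logic block is 675,000; since gates are not square and this is an upper bound, calling it about 600k is a reasonable rounding.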

Finally, I think (but am not sure) that the rectangular cut-out regions within the cauliflower-region are clock drivers or repeaters. No transistors are placed in the trench around them probably to meet thermal flux constraints, and I also wouldn’t be surprised if they packed some local decoupling capacitors around the drivers using dummy transistors and/or MIM capacitors to reduce power droop and induced jitter in that region.

What I love about this image is how clouds of standard cells take on organic shapes when viewed at this resolution. To me it looks more like mold or bacteria growing in a petri dish than the pinnacle of precision manufactured goods. But perhaps this is just convergent evolution in action, driven by the laws of physics: signals diffuse through on-chip wires, much like nutrients in a medium.

11 Responses to “Non-Destructive Silicon Imaging (and Winner of Name that Ware December 2022)”

  1. Bryce says:

    Is this the fastest/shortest time between a NTW and the announcement of its winner?

    The fact the internals can be imaged at all using near-visible light is really quite mind-blowing. Thank you for the in-depth details!

    • bunnie says:

      Actually, this one took a relatively long time to be guessed. However, I usually hold off until the end of the month before announcing the winner. I time it simultaneously with the release of the next ware so that the top post is a picture of a Ware, and not a block of text announcing a winner.

      But this time I actually had a lot to say about the Ware, so the moment it was guessed I announced the winner so I could share commentary. The next Ware won’t go up until the end of this month, per usual.

  2. Taylan Ayken says:

    Ahh, so close. But at least I got the date code right. The tweet referred to it as an M74AP, so I just went with it. I saw the 216-0905018 and thought it was the serial number or some sort of ID number.

    As for the similarity of the pictures: I’m not sure what you refer to as “lint”, but some solder balls and bumps on the carrier seem to be identical. I tried to do some metadata analysis as well but that didn’t pan out anything useful.

    • bunnie says:

      The “lint” are small bits of fibrous material (probably left over from a tissue or cotton swab) on the package, mostly found near/around the decoupling capacitors.

      On Fritz’s larger image, there is a fairly apparent example of a fiber toward the lower-right hand corner. However, the image in the tweet is cropped, so you can only see the more subtle fibers that are in between a pair of decoupling capacitors of the original’s top-left corner (lower-left on the tweet). You may have to adjust the contrast of the photo to see it clearly.

      It’s really tough to tell the difference between lot codes and part numbers, especially for parts that don’t publish easily obtainable datasheets. I make the mistake all the time myself, so when I saw your answer I had to triple-check myself. I was really impressed that you found an image so quickly though!

      • Taylan Ayken says:

        Ah, OK, now I see what you’re referring to. Yeah, the first piece of lint is cropped out. The fibers in the second location are really hard to see, but they appear to be the same. Good eyes!

        Another interesting area is the top of the die in the original and the left side of the die in the tweet. I’m not sure if those are scratches or something else. But the pattern there seems to be the same as well.

  3. Paul Campbell says:

    I’m not sure that the dark purple areas on the image you’re showing are unused, what I suspect we’re seeing is a process where maybe layers 3/4 are power and ground and pretty regular and when the underlying logic gets really congested the router is pushing some routing up to the top layers (above the power/ground nets) which is what’s showing as grey.

    Each of those blocks (you can see where stuff suddenly changes) was likely routed separately before being hooked up on the die.

    Back when we just had 2-3 layers of metal we could see the silicon in die shots, these days with 5-6 layers all we really get to see are metal – and on some processes with essentially ground/power planes on the upper layers there’s not much to see at all, other than the outlines of individual blocks

    • bunnie says:

      You’re right, the areas aren’t entirely unused! Some of the empty spaces are almost certainly used for routing, which is often a limiting factor.

      What I meant to say, more precisely, is that the areas are devoid of logic transistors. It’s also possible that instead of logic transistors, some form of MOS decoupling capacitor structure is put there; but that could also have a negative impact on yield. In which case it might still not be entirely blank, it’s likely a fill pattern of some sort to improve the uniformity of critical structures.

      • Paul Campbell says:

        I was more suggesting that there likely is logic under those purple areas, it’s just not dense enough to push routing into those upper layers (upper layer wires tend to be slower, higher cap, than lower wires so they only get used if there’s no other space)

  4. Adam M. Joseph says:

    Wow, awesome posting! More of these, please.

    > it’s possible that a foundry could just build a back door into every eFuse box and we wouldn’t be able to tell the difference because it would be “normal”.

    That is truly nightmare-inducing, but how would the foundry be sure it could access the backdoor? Most chips have a state machine that copies the fuse values into SRAM at startup. If the “already written” bit causes the (designer-controlled) JTAG chain to ignore OTP write commands, how does the badguy(tm) tickle the fuse? Anything complex enough to decode an “open sesame” password modulated onto the OTP Write Voltage pin is the sort of thing that ChipWorks would notice right away. All the other eFuse pins are wrapped with designer-controlled logic.

    > Finally, depending on the repeatability of the place/route tool

    /me giggles when hearing “repeatability” and “place+route” in the same sentence :)

    While we’ve got our tinfoil hats on, what do you think about those ~Xum*~Xum metrology holes you’re required to leave in the GDS every ~Xmm*~Xmm? Someone told me that a certain popular fab calls these “In Chip Overlays” and requires them on GDS layer 165. This isn’t dummy/CMP fill; the “holes” are almost a thousand hole-widths away from each other; at that spacing they can’t be used to fix density issues.

    • bunnie says:

      Of course, if someone did a full de-layer and traced out every transistor, it’s quite possible a backdoor would not escape such analysis. However, I could imagine a few factors that complicate detection.

      eFuse blocks aren’t 100% reliable (and they have a propensity to “heal” over time) so they require some form of ECC. The ECC algorithm is not documented, it’s often non-trivial, and it can be part of the eFuse black box. The ECC logic could be place/routed from RTL by the foundry for every chip. At the very least, the logic would be re-generated for each process variant, and is likely different for every width/bank configuration of eFuse. There is even precedent for it to be different for every customer, since well-known “hard macros” such as RAM macros are, for example, generated to-spec for each design, and can include options like spare rows and trimming bits for improved yield, that are sometimes controlled by commands issued by the foundry at wafer test. Presumably the eFuse logic would also need some sort of ATPG scan chain for wafer test. So, some combination of custom ECC + scan chain + slightly different logic per design would make a bigger metaphorical haystack to bury a needle in.

      I’ve heard of the metrology holes…those are also totally opaque and subject to change by the foundry, even post-tapeout. I suppose you could maybe bury something that’s parasitically powered in the metrology hole that could gather information from the chip through capacitively coupled wires, so even if there were no obvious wires between the metrology holes and the rest of the chip, maybe it could still gather information. I think it’s a bit of a stretch tho…I’d like to imagine the stuff in the metrology hole would probably have a pretty regular structure of things that are, well, easy to measure, since that’s what it’s ostensibly for. I could believe a ring oscillator, or even a bank of proto-SRAM; but if I saw significant random logic and wires going to the chip from the metrology hole, I’d be suspicious…

  5. Adam M. Joseph says:

    > eFuse blocks aren’t 100% reliable (and they have a propensity to “heal” over time)

    That sounds like a floating-gate cell (like flash) that’s being passed off as a fuse. I wish they wouldn’t do that… it would be more honest to at least call it “OTP”. On the older technologies where I’ve seen the eFuse cells (130nm) you got individual one-bit macros.

    But yeah if the fuse isn’t really a fuse… I can see why they would be so secretive about that.

    > I suppose you could maybe bury something that’s parasitically powered in the metrology hole that could gather information from the chip through capacitively coupled wires

    I don’t think it has to be that tricky. If they want to monkey with *every* wafer they’ll just change the mask (foundries stopped letting you buy your masks elsewhere a long time ago). If they want to monkey with just one lot of wafers they’ll expose those wafers twice.

    The concern is that the big metrology hole leaves them a place where they can drop something the size of a microcontroller. Then the only thing they need to snake through the fill areas of your design is the probe wires.

    > if I saw significant random logic and wires going to the chip from the metrology hole,

    I think the idea is to insert it *after* showing you the signoff geometry. Sort of like how you never get to see the post-OPC geometry, since how OPC is done is one of the most closely guarded secrets. There are a bunch of steps between the last thing you get to see and the mask being cut.
