Today, I gave a talk on an implementation of a man in the middle (MITM) attack on HDCP-secured video links. Here is a full copy of the slides that I presented (with explanatory diagrams), as well as the text-only of the paper which accompanies the slides, below.
Also, please note that the hardware disclosed in this talk is now available for purchase from the good folks at Adafruit. You can find more technical documentation about the NeTV at the kosagi.com wiki, and you can discuss at the kosagi.com forum.
A man-in-the-middle attack on HDCP-secured video links is demonstrated. The attack is implemented on an embedded Linux platform, with the help of a Spartan-6 FPGA, and is capable of operating real-time on HD video links. It utilizes the HDCP master key to derive the corresponding private keys of the video source and sink through observation and computation upon the exchanged public keys. The man-in-the-middle then genlocks its raster and cipher state to the incoming video stream, enabling it to do pixel by pixel swapping of encrypted data. Since the link does no CRC or hash verification of the data, one is able to forge video using this method.
Significantly, the attack enables forging of video data without decrypting original video data, so executing the attack does not constitute copyright circumvention. Therefore, this novel and commercially useful application of the HDCP master key impairs equating, in a legal sense, the master key with circumvention. Finally, the embodiment of the exploit is entirely open-source, including the hardware and the Verilog implementation of the FPGA.
BACKGROUND & CONTEXT
In September 2010, the HDCP master key was circulated via Pastebin. Speculation ensued around the application of the master key to create HDCP strippers, which would enable the circumvention of certain copyright control mechanisms put in place around video links. Unfortunately, this is a legally risky application, for a number of reasons, including potential conflicts with DMCA legislation that criminalizes the circumvention of copyright control mechanisms.
This talk discloses a new use for the HDCP master key that side-steps some of the potential legal issues. This hack never decrypts video; without decryption, there is no circumvention, and as a result the DMCA cannot apply to this hack. Significantly, by demonstrating a bona-fide commercially significant purpose for the HDCP master key that does not circumvent an access control measure, this hack impairs the equating of trafficking or possession of the HDCP master key to circumvention and/or circumvention-related crimes.
The main purpose of this hack is to enable the overlay of video content onto an HDCP encrypted stream. The simple fact that a trivial video overlay becomes an interesting topic is illustrative of the distortion of traditional rights and freedoms brought about by the DMCA. While the creation of derivative works of video through dynamic compositing and overlay (such as picture in picture) seems intuitively legal and natural in a pre-HDCP world, the introduction of HDCP made it difficult to build such in-line equipment. The putative purpose role of HDCP in the digital video ecosystem is to patch the plaintext-hole in the transmission of otherwise encrypted video from shiny disks (DVDs, BDs) to the glass (LCD, CRT). Since the implementation of video overlay would typically require manipulation of plaintext by intermediate processing elements, or at least the buffering of a plaintext frame where it can be vulnerable to readout, the creation of such devices has generally been very difficult to get past the body that controls the granting of HDCP keys, for fear that they can be hacked and/or repurposed to build an HDCP stripper. Also, while a manufacturer could implement such a feature without the controlling body’s blessing, they would have to live in constant fear that their device keys would be revoked.
While the applications of video overlay are numerous, the basic scenario is that while you may be enjoying content X, you would also like to be aware of content Y. To combine the two together would require a video overlay mechanism. Since video overlay mechanisms are effectively banned by the HDCP controlling organization, consumers are slaves to the video producers and distribution networks, because consumers have not been empowered to remix video at the consumption point.
The specific implementation of this hack enables the overlay of a WebKit browser over any video feed; a concrete example of the capability enabled by this technology is the overlay of twitter feeds as “news crawlers” across a TV program, so that one may watch community commentary in real-time on the same screen. While some TV programs have attempted to incorporate twitter feeds into the show, the incorporation has always been on the source side, and as such users are unable to pick their hashtags. Now, with this hack, the same broadcast program (say, a political debate) can have a very different viewing experience based on which hashtag is keyed into the viewer’s twitter crawler.
A Spartan-6 FPGA was used to implement a TMDS-compatible source and sink. TMDS is the signaling standard used by HDMI and DVI. The basic pipeline within the FPGA deserializes incoming video and reserializes it to the output. In this trivial mode, it is simply a signal amplifier for the video.
In order to enable the overlay of a WebKit browser, an 800 MHz ARM-based Linux computer is connected to the FPGA. The Linux computer is based upon the PXA168 by Marvell, and it features 128 MB of DDR2 and a microSD card for firmware. The distribution is based upon Angstrom and it is built using OpenEmbedded with the help of buildbot. The entire build system for the Linux computer is available through a public EC2 cloud image that anyone can copy and rent from Amazon.
From the Linux computer’s standpoint, the FPGA emulates a parallel RGB LCD, and thus from the programming standpoint looks simply like a framebuffer at /dev/fb0. There is also a device management interface revealed through I2C that is managed using the standard Linux I2C driver. The I2C management interface handles routine status requests, such as reading the video timing and PLL state, and also handles reading out sections of snooping buffers, the significance of which will be discussed later. The FPGA also has a chroma-key feature where a magic color (240,0,240) is remapped to “transparent”.
The FPGA itself is bootstrapped through a programming interface where the device’s compiled bitstream is sent to the FPGA by writing to /dev/fpga. There are also IOCTLs available on /dev/fpga that enable other meta-level functions such as resetting the FPGA or querying its configuration state.
In addition to passing through the TMDS signal, the FPGA also has the ability to listen to *and* manipulate the DDC. The DDC is an I2C link found on HDMI cables that enables the reporting of monitor capability records (EDIDs) and also is the medium upon which the key exchange happens. Therefore, being able to listen to this passively is of great importance to the hack. The FPGA implements a “shadow-RAM” which records all reads and writes to specific addresses that fall within the expected address ranges for EDID and HDCP transactions.
The FPGA also implements a “squash-RAM” which is used to override bits on the I2C bus. Since I2C is an open collector standard, overriding a 1 to a 0 is trivial; but, overriding a 0 to a 1 requires an active pull-up. The hardware implements a beefy FET on the DDC to enable overriding 0’s to 1’s. The DDC implementation uses a highly oversampled I2C state machine. I2C itself only runs at 100 kHz, but the state machine implementation runs at 26 MHz. This allows the state machine to determine the next state of the I2C bus and decide to override or allow the transaction on-the-fly. The “squash-RAM” feature is used to override the EDID negotiation such that the video source is only informed of modes that the FPGA implementation can handle. For example, this implementation cannot handle 3D TV resolutions, so the reporting of such capabilities from the TV is squashed before it can get to the video source. This causes the source to automatically limit its content to be within the hardware capabilities of the FPGA, and to be within the resolutions that are supported by the WebKit UI.
The key exchange on HDCP consists of three pieces of data being passed back and forth: the source public key (Aksv), the sink public key (Bksv), and a piece of shared state (An). The order in which these are written is well-defined. The completion of the transfer of the final byte of Aksv serves as a trigger to initialize the cipher states of the source and the sink. During this time period, each device computes the dot-product of the other device’s KSV with their internal private key (which is a table of forty 56-bit numbers) and derives a shared secret, known as Km. This is basically an implementation of Blom’s Scheme.
In order to implement the man-in-the-middle attack, the three pieces of data are recorded, and the authentication trigger is passed from the FPGA to the Linux computer through an udev event. udev triggers a program that reads the KSVs from the snoop memory, and performs a computation upon the HDCP master key and the KSVs to derive the private keys that mirrors those found in each of the source and sink devices. In a nutshell, the computation loops through the 40×40 matrix of the HDCP master key, and based upon the KSV having a 1 at a particular bit position it sums in the corresponding 40-entry row or column of the master key to the 40-entry private key vector. The use of a row or columns depends upon if the KSV belongs to a source or a sink.
Once the private keys vectors have been derived, they can be multiplied in exactly the same fashion as would be found in the source or sink to derive the shared secret, Km.
This shared secret, Km, is then written into the FPGA’s HDCP engine, and the cipher state is ready to go. In practice, the entire computation can happen in real-time, but some devices go faster or slower than others, so it is hard to guarantee it always completes in time, particularly with the variable interrupt latency of the udev handler. As a result, the actual link negotiation caches the value of Km from previous authentications, and the udev event primarily verifies that Km hasn’t changed (note that for each given source and sink pair, Km is static and never changes, so unless users are pulling cables out and swapping them between devices, Km is essentially static). If the Km has changed, it updates the Km in the FPGA and forces a 150ms hot plug event, which re-initiates the authentication, thereby making the transaction fairly reliable yet effectively real-time.
Significantly, this system as implemented is incapable of operating without having the public keys provided by both the source and the sink. This means that it cannot “create” an HDCP link: this implementation is not an operational HDCP engine on its own. Rather, it requires the user of this overlay hack to “prove” it has previously purchased a full HDCP link through evidence of valid public keys.
Once the FPGA’s HDCP cipher state is matched to the video source’s cipher state, one can now selectively encrypt different pixels to replace original pixels, and the receiver will decrypt all without any error condition. This is because encryption is done on a pixel by pixel basis and the receiver does little in the way of verification. The lack of link verification is in fact quite intentional and necessary. The natural bit error rate of HD video links is atrocious; but this is acceptable, because the human eye probably won’t detect bit errors even on the level of 1 in every 10,000 bits (at high error rates, users see a “sparkle” or “snow” on the screen, but largely the image is intact). Therefore, this latitude in allowing pixel-level corruption is necessary to keep consumer costs low; otherwise, much higher quality cables would be required along with FEC techniques to achieve a bit error rate that is compatible with strict cryptographic verification techniques such as full-frame hashing.
The selection of which pixel to swap is done by observing the color of the overlay’s video. The overlay video is not encrypted and is generated by the user, so there is no legal violation to look at the color of the overlay video. Note that other pixel-combining methods, such as alpha blending, would necessitate the decryption of video. If the overlay video matches a certain chroma key color, the incoming video is selected; otherwise, the overlay video is selected. This allows for the creation of transparent “holes” in the UI. Since the UI is rendered by a WebKit browser, chroma-key is implemented by simply setting the background color in the CSS of the UI pages to magic-pink. This makes the default state of a web page transparent, with all items rendered on top of it opaque.
Note that pixel-by-pixel manipulation of the incoming video feed is done without any real buffering of the video. A TMDS pixel “lives” inside the FPGA for less than a couple dozen clock cycles: the lifetime of a pixel is simply the latency of the pipelines and the elastic buffers required to deskew wire length differences between differential pairs. This means that the overlay video from the Linux computer must be strictly available at exactly the right time, or else the user will see the overlay jitter and shake. In order to avoid such artifacts, the time resolution requirement of the pixel synchronization is stricter than the width of a pixclock period, which can be as short as dozen nanoseconds.
In order to accomplish this fine-grain synchronization, a genlock mechanism was implemented where vertical retrace signals (which are unencrypted) trigger an interrupt that initiates the readout of /dev/fb0 to the FPGA. However, the interrupt jitter of a non-realtime Linux is much larger than a single pixel time, so in order to absorb this uncertainty, a dynamic genlock engine was implemented in the FPGA. An 8-line overlay video FIFO is used to provide the timing elasticity between the Linux computer and the primary video feed; and the vertical sync interrupt-to-pixel-out latency of the Linux computer is dynamically measured by the FPGA and pre-compensated. In effect, the FPGA measures how slow the Linux box’s reflexes are, and requests for the frame to start coming in advance of when the data is needed. These measures, along with a few lines of FIFO, ensure pixel availability at the precise time when the pixel is needed.
A system has been described that enables a man-in-the-middle attack upon HDCP secured links. The attack enables the overlay of video upon existing streams; an example of an application of the attack is the overlay of a personalized twitter feed over video programs. The attack relies upon the HDCP master key and a snooping mechanism implemented using an FPGA. The implementation of the attack never decrypts previously encrypted video, and it is incapable of operating without an existing, valid HDCP link. It is thus an embodiment of a bona-fide, non-infringing and commercially useful application of the HDCP master key. This embodiment impairs the equating of the HDCP master key with copyright circumvention purposes.