ReRISC Reconfigurable Reduced Instruction Set Computer

"Performance where you need it, Convenience when you want it"

Abstract

The Reconfigurable RISC (ReRISC) processor gives users the opportunity to create application-specific instructions for enhanced performance while providing the programming convenience of a conventional RISC processor. The core of the ReRISC consists of an array of 38x8 computational elements, each with 8 configuration contexts that are selectable on a cycle-by-cycle basis. The computational elements default to the MIT Beta ISA upon soft reset, which reduces redundant reconfiguration cycles. In conjunction with a reconfigurable NOR plane, the core can be wired to perform a wide variety of operations, including vector-style packed word operations, multiply-accumulates, random permutations, tag field verification, and bit field packing and unpacking. This last feature makes the ReRISC well suited for the interpretation of non-native binaries. The datapath of the 1.8 million transistor ReRISC processor was conceived, designed, implemented and verified in this design project.

Documentation

ReRISC Project Report for MIT's 6.371 Introduction to VLSI Systems

On-line documentation for the first generation ReRISC prototype.

ReRISC slide presentation

Brain Candy

With the ReRISC, compilers can now analyze programs and determine the optimal instruction set architecture (ISA) for that particular program. The code can then be compiled into a binary for that ISA, and executed on the ReRISC. For example, the code for a JPEG decompressor would run best on an ISA which supports vector-style operations (MMX), while the code for an encryption algorithm could take advantage of powerful bit-manipulation instructions. The ReRISC can do both.

The ReRISC is well suited for executing non-native binaries. Its powerful full-crossbar, 1/2 PLA NOR plane combined with a programmable masking unit lets the ReRISC extract bitfields out of non-native instructions in a single cycle.
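As a rough software model of what a crossbar followed by a masking unit does, the sketch below routes arbitrary source bits to destination positions and then masks the result. The function names and the choice of a 32-bit MIPS I-type instruction as the "non-native" example are assumptions made for illustration; they are not taken from the ReRISC design, and the model says nothing about actual cycle counts.

```python
def crossbar(word, routing):
    """Model a bit-level crossbar: routing[dst] is the source bit feeding output bit dst."""
    out = 0
    for dst, src in routing.items():
        out |= ((word >> src) & 1) << dst
    return out


def extract_field(word, lsb, width):
    """Route bits lsb..lsb+width-1 down to bits 0..width-1, then mask the rest off."""
    routing = {i: lsb + i for i in range(width)}
    mask = (1 << width) - 1
    return crossbar(word, routing) & mask


# Assumed example input: MIPS "addi $t0, $t1, 100"
# I-type layout: opcode[31:26] rs[25:21] rt[20:16] imm[15:0]
instr = (8 << 26) | (9 << 21) | (8 << 16) | 100

fields = {
    "opcode": extract_field(instr, 26, 6),
    "rs": extract_field(instr, 21, 5),
    "rt": extract_field(instr, 16, 5),
    "imm": extract_field(instr, 0, 16),
}
```

In hardware all four routings could be separate configuration contexts; here each is just a dictionary built on the fly.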

The full-crossbar NOR plane also makes the ReRISC uniquely suited for implementing cryptographic algorithms. One can perform the DES P-box in four cycles, as well as RC-5 data dependent rotations in a single cycle.
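For readers unfamiliar with it, an RC5 data-dependent rotation rotates one word left by an amount taken from the low bits of another word. The sketch below is a plain software model, assuming the common 32-bit word size; the function names are illustrative, and nothing here models how the ReRISC NOR plane is actually configured.

```python
W = 32                 # assumed RC5 word size in bits
MASK = (1 << W) - 1


def rotl(x, r):
    """Rotate the W-bit word x left by r bits (r is reduced modulo W)."""
    r %= W
    return ((x << r) | (x >> (W - r))) & MASK


def rc5_half_round(a, b, s):
    """One RC5 half-round update: A = ((A xor B) <<< B) + S, modulo 2**W."""
    return (rotl(a ^ b, b) + s) & MASK
```

On a conventional processor without a rotate instruction, `rotl` costs two shifts and an OR; the claim in the text is that the crossbar can do the variable routing directly.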

The ReRISC architecture may offer better performance scaling than conventional processors with decreasing line geometries. Current processors run faster at smaller geometries primarily because the transistors get faster. However, they are unable to efficiently utilize the huge number of transistors available in cutting edge processes because of the complexity involved in superscalar and other parallel architectures; instead, designers are starting to just throw really large caches on-chip for only a few percent gain in performance. Because of the ReRISC's array structure, the increased areal density gained by finer lithography can translate directly into higher performance. For example, a multiply operation on the first-generation ReRISC processor takes four clock cycles because it is only capable of computing 8 partial products simultaneously (the processor is a 38x8 array). Scaling the array to twice its size allows one to complete the operation in half the time. This is in addition to the speedup afforded by the faster transistors.
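The multiply example can be checked with a small model of shift-and-add multiplication that sums a fixed number of partial products per cycle. This is only a sketch: the 32-bit multiplier width is an assumption (it is the width consistent with four cycles of 8 partial products each), and the function and parameter names are made up for illustration.

```python
def multiply(a, b, operand_bits=32, rows=8):
    """Shift-and-add multiply that sums `rows` partial products per modeled cycle.

    Returns (product, cycles). `rows` stands in for the array depth (8 in the
    first-generation 38x8 array); `operand_bits` is an assumed multiplier width.
    """
    product = 0
    cycles = 0
    for base in range(0, operand_bits, rows):
        # One cycle: accumulate up to `rows` partial products in parallel.
        for i in range(base, min(base + rows, operand_bits)):
            if (b >> i) & 1:
                product += a << i
        cycles += 1
    return product, cycles


first_gen = multiply(1234, 5678, rows=8)    # 8 partial products per cycle
doubled = multiply(1234, 5678, rows=16)     # array scaled to twice the depth
```

Under these assumptions the cycle count is the operand width divided by the number of rows, rounded up, so doubling the rows halves the cycles.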

The ReRISC architecture may provide a good solution for the hardware support of tagged datatypes. Data tags can assist the implementation of a number of important software abstractions, including pointer validation, safe datatype management, secure memory management, garbage collection, atomic semaphores, virtual memory, and hash tables. Hardware support for tags can significantly boost the performance of systems which utilize tags, but until now, a change in the software spec for tags meant buying a new processor. ReRISC gives programmers the convenience of being able to arbitrarily change tag definitions without losing the power of hardware support for tags.

The ReRISC architecture allows for the reuse and scaling of instruction set configurations. The mapping of instruction definitions into the computational array is independent of many array parameters, such as the size of the array. Thus, one can upgrade the ReRISC hardware by adding more computational elements in the array while maintaining binary-level backward compatibility. One can also trivially convert scalar instruction definitions into vector operations by simply replicating the scalar definition across the width of the vector datapath. This reusability and level of hardware independence helps encourage the development of instruction set libraries which people can conveniently share. This enables those of us who aren't ReRISC architecture wizards to still write zippy applications.
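The scalar-to-vector replication idea can be sketched in software as a function that takes a scalar definition and stamps it across independent lanes of a wider word. The names `replicate`, `add8`, and `padd8x4`, and the 8-bit lane width, are assumptions for illustration; the real mapping happens in the array configuration, not in code like this.

```python
def replicate(scalar_op, lane_bits, lanes):
    """Build a packed (vector) operation by applying scalar_op to each lane independently."""
    lane_mask = (1 << lane_bits) - 1

    def vector_op(x, y):
        out = 0
        for lane in range(lanes):
            shift = lane * lane_bits
            a = (x >> shift) & lane_mask
            b = (y >> shift) & lane_mask
            # Mask the result so a carry never leaks into the neighboring lane.
            out |= (scalar_op(a, b) & lane_mask) << shift
        return out

    return vector_op


def add8(a, b):
    """Scalar definition: an 8-bit add (wraparound is applied by the lane mask)."""
    return a + b


padd8x4 = replicate(add8, 8, 4)   # four 8-bit lanes in a 32-bit word
padd8x8 = replicate(add8, 8, 8)   # same scalar definition on a wider datapath

result = padd8x4(0x01FF7F10, 0x01010101)
```

The same `add8` definition serves both widths unchanged, which is the software analogue of reusing one instruction definition on a larger array.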

Acknowledgements

Thanks to Ed Kim, my 6.371 class project partner, for all his hard work on the register file physical design. Also, hats off to Andre DeHon for his awesome PhD thesis on reconfigurable computing. Last but not least, thanks to TK, my advisor for 6.961 and the smax group, for his guidance and encouragement.

First generation ReRISC prototype:

Block diagram of the ReRISC datapath.

Block diagram of the ReRISC computational cell.

Layout shot of a single computational cell.

Berkeley Magic physical design of the first generation ReRISC datapath elements. -- email bunnie@mit.edu for access to files

Ideas for the next generation ReRISC:

The first generation ReRISC prototype was a heavily memory-dominated design. Future revisions of the ReRISC could do the following to help utilize silicon area more efficiently:

Combined register file and crossbar (1/2 NOR plane) units. The dimensions of the register file and crossbar arrays are very similar, and many of the wires share common functions between the two. Combining the RF and the XB would also yield a faster design.
Set the ratio of configuration memory blocks to bits of datapath controlled to 2:1. Currently, the ratio is 1:1, i.e., every bit of the datapath is independently configurable. A ratio of 2:1 in the computational array would imply that every 2 bits would have to perform an identical computation, but for most applications, that is not a great loss, since the granularity is often as coarse as 8:1. By setting the ratio to 2:1 we can reduce the memory area by 50%, and hence reduce the overall area of the computational array by close to that amount.
Use a smaller RAM cell for storing the configuration information. Currently, an 8T SRAM cell design is employed; perhaps by moving to DRAM, one can significantly reduce the area of the design. The only catch is designing a low area overhead refresh circuit that never stalls processor operation in the case that refresh and processing overlap. One solution might be to include an extra bit in each DRAM configuration context set which mirrors the information in the bit being refreshed.

The second generation ReRISC computational array should also include the computational hardware and connectivity necessary to efficiently implement floating point operations (especially multiplies and adds).

Another idea for the second generation ReRISC is to consider coupling the processor and the memory subsystems more tightly, so as to ensure that the processor has sufficient bandwidth to memory. At the very least, the issue of balancing the processor and memory subsystem should be investigated seriously.

The next gen ReRISC should have a cleaner exception handling spec to facilitate OS development.

A suggestion for the physical design of the next gen device: lay out the computational array first, and then pitch-match the combined register file/crossbar unit to the computational array.

bunnie@mit.edu

Last modified by bunnie@mit.edu Mon May 18 23:56:10 1998
