We have a guest contribution from Zvi Or-Bach, the President and CEO of MonolithIC 3D Inc. For over 4 decades the gap between computer processing speed and memory access has grown at about 50% per year, to more than 1,000x today. This provides an excellent opportunity to enhance the single-core system performance. An innovative 3D integration technology combined with re-architecting the integrated memory device is proposed to bridge the gap and enable a 1,000 x improvement in computer systems. |
For over 4 decades the gap between computer processing speed and memory access has grown at about 50% per year, to more than 1,000x today. This provides an excellent opportunity to enhance the single-core system performance. An innovative 3D integration technology combined with re-architecting the integrated memory device is proposed to bridge the gap and enable a 1,000 x improvement in computer systems. The proposed technology utilizes processes that are widely available and could be integrated in products within a very short time.
Keywords—processor-memory gap; 3D memory;
The Fig. 1 illustrates the growing gap between processing and memory access [1,2]
A joint research by teams from Stanford University, Berkeley and Carnegie Mellon was summarized in their paper “Energy-Efficient Abundant-Data Computing: The N3XT 1,000×” illustrated in Fig. 6 [7].
3D integration using TSV has been considered as another attractive option to bridge the gap. A paper by D. H. Woo concluded “On average, for single-threaded memory intensive applications, the speedups range from 1.53 to 2.14 compared to a conventional 2D architecture” [8]. This is far less than the monolithic 3D work done by Stanford. In Fig. 12 Micron provides a chart comparing TSV vs. DDR3. Micron has formed an industry consortium named Hybrid Memory Cube – “HMC,” to leverage TSV for bridging the memory gap.
While the various approaches offer clear performance improvements, they are far from what is suggested by the N3XT factor of 1,000x. It would seem that the inherent limitations (density, delay) of the TSV technology are the limiting factors. We therefore propose a novel 3D integration that offers more than 1,000x better vertical connectivity to enable comprehensive bridging of the processor memory gap, while still utilizing widely available commercial processes that make it attractive for fast industry adoption.
In TSV technology the layer thickness of each layer in the stack is tens of microns (~50 μm) and the via through each stacked layer is about 5 microns in diameter. Monolithic 3D integration enables stacked layers of tens of nanometers (~50 nm) with vias that are of similar size of regular vias between metal layers. All currently known 3D integration techniques that provide such vertical connectivity require major process and equipment changes at the wafer fab. The following approach is a monolithic 3D innovation that overcomes this challenge.
Or-Bach in “Modified ELTRAN® - A Game Changer for Monolithic 3D” presented a 3D integration that does not require process change but would require a special porous base substrate [9]. Here we propose an alternative substrate could be far more readily available (Fig. 8).
This fine grained 3D integration could be used for reengineering memory product into two strata, one for arrays of bit cells and one for memory control circuitry, as is illustrated in Fig. 10.
Having the memory peripheral circuitry not in the periphery of the device but rather on top of it, allows cost effective breaking up of the memory array into an array of small memory units, such as 200 μm x 200 μm units, each with its own word-lines and bit-lines.
Additionally, multiple memory strata could be vertically integrated to offer much larger memory capacity for the same footprint. Fig. 11 illustrates a region of such memory strata having two adjacent memory units with word-lines or bit-lines traveling in-between, and a per-layer selector, and nano-pads for vertical connectivity.
Fig. 13 illustrates a vertical cross-section view of 4 strata distributing the top select signal to each stratum in the stack.
This connectivity structure opens up many usage options, including redundancy to overcome defects. The memory is constructed by stacks of strata, each constructed as array of units connected in parallel with a select signal per stratum. One of these strata could serve as a redundancy stratum with per unit select allowing repair at the unit level. In addition, multiple memory access options could be enabled from high speed local access to a global—albeit somewhat slower—access to large array of units. An additional advantage of this architecture is having one (or two) memory control strata to service multiple memory strata.
We have presented a 3D integration technology combined with memory architecture that could utilize existing processes and infrastructure to bridge the processor memory gap. Using these concepts would enable the current connectivity of about 100 wires at average 20 mm length to be replaced with 100,000 wires with average 20 μm length, with the corresponding 1,000x improvements in computation speed, power, and cost. Additional advantages would be reduction of overall system costs, establishing a generic memory fabric with build in full repair capability at factory and field.
[1] Hennessy, John L. and David A. Patterson. Computer Architecture: A Quantitative Approach. 4th ed., p. 289. Elsevier, 2007.
[2] McCalpin, John D. “Memory Bandwidth and System Balance in HPC Systems”, Invited Talk, International Conference for High Performance Computing, Networking, Storage, and Analysis, 2016.
[3] Wu, Banqiu, and Ajay Kumar. "Extreme ultraviolet lithography and three dimensional integrated circuit—A review." Applied Physics Reviews 1.1, 2014.
[4] Yeap, Geoffrey. "Smart mobile SoCs driving the semiconductor industry: Technology trend, challenges and opportunities." In Electron Devices Meeting (IEDM), 2013.
[5] Sun, Jack Y-C. "System scaling and collaborative open innovation." VLSI Technology (VLSIT), 2013 Symposium on. IEEE, 2013.
[6] Simon, Horst. "Why we need Exascale and why we won’t get there by 2020." Optical Interconnects Conference, Santa Fe, New Mexico. 2013.
[7] Aly, Mohamed M. Sabry, et al. "Energy-efficient abundant-data computing: The n3xt 1,000 x." Computer 48.12 (2015): 24-33.
[8] Woo, Dong Hyuk, et al. "An optimized 3D-stacked memory architecture by exploiting excessive, high-density TSV bandwidth." High Performance Computer Architecture (HPCA), 2010 IEEE 16th International Symposium on. IEEE, 2010.
[9] Or-Bach, Zvi, et al. "Modified ELTRAN®—A game changer for Monolithic 3D." SOI-3D-Subthreshold Microelectronics Technology Unified Conference (S3S), 2015 IEEE. IEEE, 2015.
[10] Chen, C. K., et al. "3D-enabled heterogeneous integrated circuits." SOI-3D-Subthreshold Microelectronics Technology Unified Conference (S3S), 2013 IEEE. IEEE, 2013.
[11] Chen, C. K., et al. "SOI-enabled three-dimensional integrated-circuit technology." SOI Conference (SOI), 2010 IEEE International. IEEE, 2010.
[12] Orlowski, Marius, et al. "(Invited) Si, SiGe, Ge, and III-V Semiconductor Nanomembranes and Nanowires Enabled by SiGe Epitaxy." ECS Transactions 33.6 (2010): 777-789.
[13] Borenstein, J. T., et al. "Silicon germanium epitaxy: a new material for MEMS." MRS Proceedings. Vol. 657. Cambridge University Press, 2000.
[14] Taraschi, Gianni, et al. "Ultrathin strained Si-on-insulator and SiGe-on-insulator created using low temperature wafer bonding and metastable stop layers." Journal of The Electrochemical Society 151.1 (2004): G47-G56.
[15] Uhrmann, T. "Monolithic IC Integration-Key alignment specifications for high process yield." SOI-3D-Subthreshold Microelectronics Technology Unified Conference (S3S), IEEE. 2014.