Brand-new fabrication nodes are expected to boost performance, cut power consumption, and increase transistor density. But while logic circuits have been scaling well with recent process technologies, SRAM cells have been lagging behind and have apparently almost stopped scaling at TSMC’s 3nm-class production nodes. This is a major problem for future CPUs, GPUs, and SoCs, which will likely get more expensive because of slow SRAM cell area scaling.
SRAM Scaling Slows
When TSMC formally introduced its N3 fabrication technologies earlier this year, it said that the new nodes would provide 1.6x and 1.7x improvements in logic density compared to its N5 (5nm-class) process. What it did not reveal is that the SRAM cells of the new technologies barely scale compared to N5, according to WikiChip, which obtained the information from a TSMC paper published at the International Electron Devices Meeting (IEDM).
TSMC’s N3 features an SRAM bitcell size of 0.0199 µm², which is only ~5% smaller than N5’s 0.021 µm² SRAM bitcell. It gets worse with the revamped N3E, which comes with a 0.021 µm² SRAM bitcell (roughly translating to 31.8 Mib/mm²), meaning no scaling compared to N5 at all.
Meanwhile, Intel’s Intel 4 (originally called 7nm EUV) reduces the SRAM bitcell size to 0.024 µm² from 0.0312 µm² in the case of Intel 7 (formerly known as 10nm Enhanced SuperFin), but that still works out to something like 27.8 Mib/mm², a bit behind TSMC’s HD SRAM density.
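As a sanity check, here is a quick Python sketch of how such Mib/mm² figures can be derived from bitcell area. The ~70% array efficiency factor (overhead for sense amplifiers, decoders, and wiring) is our own assumption, chosen because it reproduces the densities quoted above; the sources do not state the exact factor they used.

```python
# Back-of-the-envelope conversion from SRAM bitcell area to effective density.
UM2_PER_MM2 = 1_000_000      # 1 mm^2 = 10^6 um^2
BITS_PER_MIB = 2 ** 20       # Mib here means mebibit
ARRAY_EFFICIENCY = 0.70      # assumed share of macro area spent on bitcells

def density_mib_per_mm2(bitcell_um2: float) -> float:
    """Estimate effective SRAM density (Mib/mm^2) from bitcell area (um^2)."""
    return (UM2_PER_MM2 / bitcell_um2) * ARRAY_EFFICIENCY / BITS_PER_MIB

for node, bitcell in [("N5 / N3E", 0.021), ("N3", 0.0199), ("Intel 4", 0.024)]:
    print(f"{node}: {density_mib_per_mm2(bitcell):.1f} Mib/mm^2")
# N5 / N3E: 31.8 Mib/mm^2
# N3:       33.5 Mib/mm^2
# Intel 4:  27.8 Mib/mm^2
```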
Furthermore, WikiChip recalls an Imec presentation that showed SRAM densities of around 60 Mib/mm² on a ‘beyond 2nm node’ with forksheet transistors. Such a process technology is years away, and between now and then chip designers will have to develop processors with the SRAM densities advertised by Intel and TSMC (though Intel 4 is unlikely to be used by anyone except Intel anyway).
Loads of SRAM in Modern Chips
Modern CPUs, GPUs, and SoCs use loads of SRAM for various caches because they process massive amounts of data and it is extremely inefficient to fetch that data from external memory, especially for artificial intelligence (AI) and machine learning (ML) workloads. But even general-purpose processors, graphics chips, and application processors for smartphones carry huge caches these days: AMD’s Ryzen 9 7950X carries 81MB of cache in total, whereas Nvidia’s AD102 uses at least 123MB of SRAM for the various caches Nvidia has publicly disclosed.
Going forward, the need for caches and SRAM will only increase, but with N3 (which is set to be used for only a few products) and N3E there will be no way to reduce the die area occupied by SRAM and thus mitigate the higher costs of the new node compared to N5. Essentially, this means that die sizes of high-performance processors will increase, and so will their costs; and since SRAM cells, just like logic cells, are prone to defects, larger dies also mean lower yields. To some degree chip designers will be able to offset larger SRAM cells with N3’s FinFlex innovations (mixing and matching different kinds of FinFETs in a block to optimize it for performance, power, or area), but at this point we can only guess what benefits this will bring.
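To put very rough numbers on the die-area point, the sketch below estimates how much silicon such caches occupy at N5/N3E-class SRAM density. The plain-6T assumption and the MiB reading of “MB” are ours, and real designs add tags, ECC, and control logic, so treat the results as illustrative only.

```python
# Rough illustration (our own estimate, not vendor data): die area taken up
# by the cache totals above at N5/N3E-class effective SRAM density.
DENSITY_MIB_PER_MM2 = 31.8   # effective N5 / N3E density (see above)

def cache_area_mm2(size_mib: float) -> float:
    """Estimate die area (mm^2) for a cache of the given size in MiB."""
    return size_mib * 8 / DENSITY_MIB_PER_MM2   # bytes -> bits, then area

print(f"Ryzen 9 7950X, 81MB:  ~{cache_area_mm2(81):.0f} mm^2")   # ~20 mm^2
print(f"Nvidia AD102, 123MB: ~{cache_area_mm2(123):.0f} mm^2")   # ~31 mm^2
# Because N3E's SRAM density matches N5's, none of this area shrinks on the
# new node, even as the surrounding logic does.
```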
TSMC plans to introduce its density-optimized N3S process technology, which promises to shrink the SRAM bitcell size compared to N5, but this is set to happen circa 2024, and we wonder whether it will provide enough logic performance for chips designed by AMD, Apple, Nvidia, and Qualcomm.
Mitigations?
One way to mitigate the cost impact of slowing SRAM area scaling is to go with a multi-chiplet design and disaggregate larger caches into separate dies made on a cheaper node. This is something that AMD already does with its 3D V-Cache, albeit for a slightly different reason (for now). Another way is to use alternative memory technologies like eDRAM or FeRAM for caches, though these have their own peculiarities.
In any case, the slowdown of SRAM scaling on FinFET-based nodes at 3nm and beyond looks set to be a major challenge for chip designers in the coming years.