AMD Radeon RX 7900 XTX and XT Review: Shooting for the Top

We’ve tested the AMD Radeon RX 7900 XTX and Radeon RX 7900 XT, and we have the full results ready for your enjoyment — one day before the official launch party. AMD’s latest and greatest RDNA 3 architecture and the RX 7900-Series graphics cards are all set to party with Nvidia’s Ada Lovelace architecture and the GeForce RTX 4080 as they vie for a spot on our list of the best graphics cards.

One thing AMD won’t do: take down the GeForce RTX 4090 that sits at the top of our GPU benchmarks hierarchy. AMD has said that it doesn’t feel the need to compete directly against a $1,600 (or more!) graphics card, but several senior people at AMD have also indicated that Nvidia’s AD102 chip was “bigger than expected” and basically out of reach.

What may also be out of reach for many of our readers are the new AMD RX 7900-series cards. While they cost less than Nvidia’s RTX 4090 and 4080, with prices starting at $899 for the slower (and frankly less desirable) RX 7900 XT, these are clearly not aimed at mainstream gamers. That task will likely fall to the future Navi 32 / RX 7700-series cards (or maybe 7800-series). For now, if you want AMD’s fastest-ever consumer graphics card, be prepared to fork over a wad of cash.

We’ve already done an in-depth look at the RDNA 3 architecture and a preview of the cards, so start with those articles if you want to get up to speed. With actual hardware in hand and a bevy of benchmarks under our belts, the test results are the main event today. But we do have some additional thoughts, and we’ll start as always with the specifications of AMD’s latest cards, alongside Nvidia’s current cards and some previous-generation GPUs for comparison.

AMD and Nvidia Ada GPU Specifications
| Graphics Card | RX 7900 XTX | RX 7900 XT | RX 6950 XT | RTX 4090 | RTX 4080 | RTX 3090 Ti | RTX 3080 Ti |
| --- | --- | --- | --- | --- | --- | --- | --- |
| Architecture | Navi 31 | Navi 31 | Navi 21 | AD102 | AD103 | GA102 | GA102 |
| Process Technology | TSMC N5 + N6 | TSMC N5 + N6 | TSMC N7 | TSMC 4N | TSMC 4N | Samsung 8N | Samsung 8N |
| Transistors (Billion) | 45.6 + 6x 2.05 | 45.6 + 5x 2.05 | 26.8 | 76.3 | 45.9 | 28.3 | 28.3 |
| Die size (mm²) | 300 + 222 | 300 + 185 | 519 | 608.4 | 378.6 | 628.4 | 628.4 |
| CUs / SMs | 96 | 84 | 80 | 128 | 76 | 84 | 80 |
| GPU Shaders | 12288 | 10752 | 5120 | 16384 | 9728 | 10752 | 10240 |
| AI / Tensor Cores | 192 | 168 | 80 | 512 | 304 | 336 | 320 |
| Ray Tracing Units | 96 | 84 | 80 | 128 | 76 | 84 | 80 |
| Boost Clock (MHz) | 2500 | 2400 | 2310 | 2520 | 2505 | 1860 | 1665 |
| VRAM Speed (Gbps) | 20 | 20 | 18 | 21 | 22.4 | 21 | 19 |
| VRAM (GB) | 24 | 20 | 16 | 24 | 16 | 24 | 12 |
| VRAM Bus Width (bits) | 384 | 320 | 256 | 384 | 256 | 384 | 384 |
| L2 / Infinity Cache (MB) | 96 | 80 | 128 | 72 | 64 | 6 | 6 |
| ROPs | 192 | 192 | 128 | 176 | 112 | 112 | 112 |
| TMUs | 384 | 336 | 320 | 512 | 304 | 336 | 320 |
| TFLOPS FP32 | 61.4 | 51.6 | 23.7 | 82.6 | 48.7 | 40 | 34.1 |
| TFLOPS FP16 (FP8/INT8) | 123 (123) | 103 (103) | 47.4 | 661 (1321) | 390 (780) | 160 (320) | 136 (273) |
| Bandwidth (GBps) | 960 | 800 | 576 | 1008 | 717 | 1008 | 912 |
| TBP (watts) | 355 | 315 | 335 | 450 | 320 | 450 | 350 |
| Launch Date | Dec-22 | Dec-22 | May-22 | Oct-22 | Nov-22 | Mar-22 | Jun-21 |
| Launch Price | $999 | $899 | $1,099 | $1,599 | $1,199 | $1,999 | $1,199 |
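The bandwidth row isn't an independent spec; it follows from two other rows in the table. Take the bus width in bits, divide by eight to get bytes per transfer, and multiply by the per-pin VRAM data rate. Here's a minimal Python sketch of that arithmetic using only the table's numbers (the function name is our own, purely for illustration):

```python
# Peak memory bandwidth from two rows of the spec table:
# bus width (bits) / 8 bits per byte * per-pin data rate (Gbps) = GB/s.
SPECS = {
    # card: (VRAM bus width in bits, VRAM speed in Gbps), copied from the table
    "RX 7900 XTX": (384, 20.0),
    "RX 7900 XT": (320, 20.0),
    "RX 6950 XT": (256, 18.0),
    "RTX 4090": (384, 21.0),
    "RTX 4080": (256, 22.4),
    "RTX 3090 Ti": (384, 21.0),
    "RTX 3080 Ti": (384, 19.0),
}


def bandwidth_gbps(bus_bits: int, gbps_per_pin: float) -> float:
    """Peak bandwidth in GB/s: bytes moved per transfer times the data rate."""
    return bus_bits / 8 * gbps_per_pin


for card, (bus, speed) in SPECS.items():
    # Round to whole GB/s, the way the table presents it
    print(f"{card}: {bandwidth_gbps(bus, speed):.0f} GB/s")
```

The RTX 4080 works out to 716.8 GB/s, which the table rounds to 717.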

There’s a lot to unpack in the specs, but we’ll mostly focus on AMD’s new chips. The RX 7900 XTX has the fully enabled Navi 31 GCD (Graphics Compute Die) along with six MCDs (Memory Cache Dies), while the 7900 XT disables a dozen compute units (CUs) in the GCD and has only five active MCDs. Technically, all six MCD chips are still present to ensure even mounting pressure from the heatsink, but one of them is fused off (it could be a non-functional MCD).
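Because the memory interface and Infinity Cache sit on the MCDs, losing one active MCD is what takes the 7900 XT from a 384-bit bus and 96MB of cache down to 320-bit and 80MB. The sketch below assumes every MCD contributes an equal share and derives the per-MCD figures by dividing the 7900 XTX's table values by six; the class and constant names are hypothetical.

```python
from dataclasses import dataclass

# Per-MCD figures derived by dividing the RX 7900 XTX table values by six
# (an assumption: every MCD contributes an equal share).
MCD_TRANSISTORS_B = 2.05  # billion transistors per MCD, from the table
MCD_AREA_MM2 = 222 / 6    # 37 mm^2 per MCD
MCD_CACHE_MB = 96 / 6     # 16 MB of Infinity Cache per MCD
MCD_BUS_BITS = 384 // 6   # 64-bit slice of the memory interface per MCD
GCD_TRANSISTORS_B = 45.6  # the GCD is the same on both cards
GCD_AREA_MM2 = 300


@dataclass
class Navi31Config:
    """Hypothetical model of a Navi 31 card: one GCD plus N active MCDs."""
    name: str
    active_mcds: int

    def transistors_b(self) -> float:
        return GCD_TRANSISTORS_B + self.active_mcds * MCD_TRANSISTORS_B

    def silicon_mm2(self) -> float:
        return GCD_AREA_MM2 + self.active_mcds * MCD_AREA_MM2

    def cache_mb(self) -> float:
        return self.active_mcds * MCD_CACHE_MB

    def bus_bits(self) -> int:
        return self.active_mcds * MCD_BUS_BITS


for cfg in (Navi31Config("RX 7900 XTX", 6), Navi31Config("RX 7900 XT", 5)):
    print(f"{cfg.name}: {cfg.transistors_b():.2f}B transistors, "
          f"{cfg.silicon_mm2():.0f} mm^2 of active silicon, "
          f"{cfg.cache_mb():.0f} MB cache, {cfg.bus_bits()}-bit bus")
```

Only active MCDs are counted here, matching how the table lists the 7900 XT, even though a sixth (fused-off) chip is physically on the package.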

The GPU shader counts are where things start to get a bit different from other architectures. AMD says there are still 64 Streaming Processors (SPs) per CU, but each CU now also has four SIMD32 vector units, two of which can only process FP32 or matrix operations, not INT32. We’re counting each lane of those SIMD32 units as a GPU shader (128 per CU), which lines up with AMD’s peak throughput figure of 61.4 teraflops FP32 on the 7900 XTX. This is similar to what Nvidia did with Ampere (and now Ada), so just know that the official SP count no longer matches the potential GPU shader count.
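To make the shader math concrete, here’s a small sketch assuming the four-SIMD32-per-CU layout described above and the usual convention that a fused multiply-add counts as two floating-point operations per shader per clock. It reproduces the GPU Shaders and TFLOPS FP32 rows for both Navi 31 cards from their CU counts and boost clocks; the function names are illustrative.

```python
SIMD32_PER_CU = 4       # per the RDNA 3 description above
LANES_PER_SIMD = 32     # a SIMD32 unit has 32 lanes
OFFICIAL_SP_PER_CU = 64 # AMD's official Streaming Processor count per CU
FLOPS_PER_FMA = 2       # convention: a fused multiply-add counts as two ops


def gpu_shaders(cus: int) -> int:
    """Count every SIMD32 lane as a shader: 4 * 32 = 128 per CU."""
    return cus * SIMD32_PER_CU * LANES_PER_SIMD


def fp32_tflops(shaders: int, boost_mhz: int) -> float:
    """Peak FP32: shaders * 2 FLOPs per clock * clock in Hz, in teraflops."""
    return shaders * FLOPS_PER_FMA * boost_mhz * 1e6 / 1e12


for name, cus, boost in (("RX 7900 XTX", 96, 2500), ("RX 7900 XT", 84, 2400)):
    shaders = gpu_shaders(cus)
    print(f"{name}: {cus * OFFICIAL_SP_PER_CU} official SPs, "
          f"{shaders} GPU shaders, {fp32_tflops(shaders, boost):.1f} TFLOPS FP32")
```

Using the official 64 SPs per CU instead would halve both the shader count and the FP32 figure, which is why we count lanes rather than SPs.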

We’ve also received some clarification on the “AI Accelerators” that are part of the RDNA 3 architecture. The short summary is that they repurpose the SIMD32 units to do matrix operations instead of FP32 (or FP16). They support BF16 (16-bit brain-float) and INT8 formats alongside FP16, and all three (FP16/BF16/INT8) share the same peak throughput, double the FP32 single-precision floating-point rate.
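Since FP16, BF16, and INT8 all run at twice the FP32 rate in matrix mode, their peaks can be read straight off the FP32 row of the table. A minimal sketch using the table’s FP32 figures; the names are illustrative only, and INT8 is labeled in TOPS since it’s an integer format.

```python
# Peak FP32 rates from the spec table, in TFLOPS
FP32_TFLOPS = {"RX 7900 XTX": 61.4, "RX 7900 XT": 51.6}
MATRIX_SPEEDUP = 2  # FP16, BF16, and INT8 all run at twice the FP32 rate


def matrix_peak(fp32: float) -> float:
    """Peak matrix throughput for any of the three 16/8-bit formats."""
    return fp32 * MATRIX_SPEEDUP


for card, fp32 in FP32_TFLOPS.items():
    peak = matrix_peak(fp32)
    for fmt in ("FP16", "BF16", "INT8"):
        unit = "TOPS" if fmt == "INT8" else "TFLOPS"
        print(f"{card} {fmt}: {peak:.0f} {unit}")
```

Rounded, that gives the 123 and 103 figures in the table’s FP16 row, with the same value repeated in parentheses for INT8.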

What’s the difference between the previous half-precision FP16 shader support and the AI Accelerator FP16 support? Basically, it comes down to optimizing throughput and reducing power consumption, with some new instructions that are supported in matrix mode. Obviously, the peak FP16/BF16 rates are significantly lower than what the RTX 4080 and 4090 can deliver. Finding software that specifically uses the AI Accelerator on AMD’s RDNA 2 / RDNA 3 GPUs is also proving difficult right now, so we may need to revisit the subject at a later date.

