AMD's Ryzen 9 9950X3D2 Dual Edition crams 208MB of cache into a single chip

(arstechnica.com)

70 points | by zdw 2 hours ago

9 comments

magicalhippo 41 minutes ago
Probably fun for those who already bought DDR5 memory... still kicking myself for not just pulling the trigger on that 128GB dual stick kit I looked at for $600 back in September. Now it's listed at $4k...
Meanwhile I hope my AM4 will chug along a few more years.
chao- 1 hour ago
Crazy to think that my first personal computer's entire storage (was 160MB IIRC?) could fit into the L3 of a single consumer CPU!
It's probably not possible architecturally, but it would be amusing to see an entire early 90's OS running entirely in the CPU's cache.
[-]
- shric 1 minute ago
  You had ~160,000 times more storage than I did for my first personal computer.
- cwzwarich 1 hour ago
  https://github.com/coreboot/coreboot/blob/main/src/soc/intel...
  [-]
  - wmf 45 minutes ago
    Context: Early in the firmware boot process the memory controller isn't configured yet so the firmware uses the cache as RAM. In this mode cache lines are never evicted since there's no memory to evict them to.
- pwg 1 hour ago
  In my case it began with 16K (yes, 161024 bytes) and 90K (yes, 901024 bytes) 5.25" floppy disks (although the floppies were a few months after the computer). Eventually upgraded to 48K RAM and 180K double density floppy disks. The computer: Atari 800.
  [-]
  - MegaDeKay 56 minutes ago
    I'll see your Atari 800 and raise you my Atari 2600 with its whopping 128 bytes of RAM. Bytes with a B. I can kinda sorta call it a computer because you could buy a BASIC cartridge for it (I didn't and stand by that decision - it was pretty bad).
- compounding_it 20 minutes ago
  Maybe in 50 years the cache of CPUs and GPUs will be 1TB. Enough to run multiple LLMs (a model entirely run for each task). Having robots like in the movies would need LLMs much much faster than what we see today.
- basilikum 1 hour ago
  KolibriOS would fit in there, even with the data in memory. You cannot load it into the cache directly, but when the cache capacity is larger than all the data you read there should be no cache eviction and the OS and all data should end up in the cache more or less entirely. In other words it should be really, really fast, which KolibriOS already is to begin with.
  [-]
  - vlovich123 1 hour ago
    Unless you lay everything out continuously in memory, you’ll still get cache eviction due to associativty and depending on the eviction strategy of the CPU. But certainly DOS or even early Windows 95 could conceivably just run out of the cache
    [-]
    - chao- 58 minutes ago
      Yeah, cache eviction is the reason I was assuming it is "probably not possible architecturally", but I also figured there could be features beyond my knowledge that might make it possible.
      Edit: Also this 192MB of L3 is spread across two Zen CCDs, so it's not as simple as "throw it all in L3" either, because any given core would only have access to half of that.
    - basilikum 59 minutes ago
      Well, yeah, reality strikes again. All you need is an exploit in the microcode to gain access to AMD's equivalent to the ME and now you can just map the cache as memory directly. Maybe. Can microcode do this or is there still hardware that cannot be overcome by the black magic of CPU microcode?
- bombcar 1 hour ago
  IIRC some relatively strange CPUs could run with unbacked cache.
  [-]
  - twbarr 1 hour ago
    Intel's platform, at the very least, use cache-as-ram during the boot phase before the DDR interface can be trained and started up. https://github.com/coreboot/coreboot/blob/main/src/soc/intel...
- m463 1 hour ago
  I wonder how much faster dos would boot, especially with floppy seek times...
  [-]
  - userbinator 1 hour ago
    Instantly.
    If you run a VM on a CPU like this, using a baremetal hypervisor, you can get very close to "everything in cache".
monster_truck 14 minutes ago
The extra cache doesn't do a damn thing (maybe +2%)
The lower leakage currents at lower voltages allowed them to implement a far more aggressive clock curve from the factory. That's where the higher allcore clock comes from (+30W TDP)
I'm not complaining at all, I think this is an excellent way to leverage binning to sell leftover cache.
Though if I may complain, Ars used to actually write about such things in their articles instead of speculate in a way that suspiciously resembles what an AI would write.
erulabs 13 minutes ago
9950X3D2? AMD, who is making you name your products like this? At some point just give up and name the chip a UUID already.
fc417fc802 34 minutes ago
Given that the dies still have L3 on them does this count as L4 or does the hardware treat it as a single pool of L3?
Would be neat to have an additional cache layer of ~1 GB of HBM on the package but I guess there's no way that happens in the consumer space any time soon.
[-]
- trynumber9 13 minutes ago
  Per compute die it's one 96M L3 with uniform latency. It is two cycles more latency than the configuration with smaller 32M L3. But there are two compute dies, each with their own L3. And like the 9950X coherency between these two L3 is maintained by global memory interconnect to the third (IO) die.
nexle 58 minutes ago
Breakdown of the (semi-clickbait) 208MB cache: 16MB L2 (8MB per die?) + 32MB L3 * 2 dies + 64MB L3 Stacked 3D V-cache * 2
For comparison, 9950X3D have a total cache of 144MB.
[-]
- trynumber9 50 minutes ago
  > 16MB L2 (8MB per die?)
  It is indeed 8MB per compute die but really 1MB per core. Not shared among the entire CCD.
Readerium 1 hour ago
Can someone explain if the 3D Vcache are stacked on top of each other or side by side.
If they are stacked then why not 9800X3D2?
[-]
- zdw 1 hour ago
  The 99xx chips have two CPU dies, and one cache die is on each CPU die.
  [-]
  - modeswitch 1 hour ago
    The 3D V-Cache sits underneath only one of the CCDs. See https://en.wikipedia.org/wiki/Ryzen#Ryzen_9000.
    [-]
    - anonymars 28 minutes ago
      That's what's different about this one. "Enter the Ryzen 9 9950X3D2 Dual Edition, a mouthful of a chip that includes 64MB of 3D V-Cache on both processor dies, without the hybrid arrangement that has defined the other chips up until now."
    - Tostino 51 minutes ago
      Did you forget which thread we are on?
renewiltord 1 hour ago
I have a gigabyte of cache on my 9684x at home!
qmr 1 hour ago
[flagged]