Nobody really expected Nvidia to release something like the GB10. After all, why would a tech company that transformed itself into the most valuable firm ever by selling parts costing hundreds of thousands of dollars suddenly decide to sell an entire system for a fraction of that price?
I believe Nvidia wants to revolutionize computing the way IBM did almost 45 years ago with the original IBM PC.
Project DIGITS, as a reminder, is a fully formed, off-the-shelf supercomputer built into something the size of a mini PC. It is essentially a smaller version of the DGX-1, the first of its kind, launched almost a decade ago in April 2016. Back then, it sold for $129,000 with a 16-core Intel Xeon CPU and eight P100 GPGPU cards; DIGITS costs $3,000.
Nvidia has confirmed AI performance of 1,000 teraflops at FP4 precision (whether that is dense or sparse is unclear). Although there’s no direct comparison, one can estimate that the diminutive supercomputer has roughly half the processing power of a fully loaded eight-card Pascal-based DGX-1.
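For a rough sense of where that "roughly half" comes from, here is the back-of-the-envelope sum, using my own estimated figures from the comparison table further down rather than official specs:

```python
# Back-of-the-envelope only: both figures are estimates taken from the
# comparison table later in this article, not official Nvidia specs.
digits_fp16_dense_tf = 282    # estimated FP16 dense throughput of one DIGITS unit
dgx1_fp16_dense_tf = 680      # estimated figure used here for the 8-GPU Pascal DGX-1

ratio = digits_fp16_dense_tf / dgx1_fp16_dense_tf
print(f"DIGITS vs fully loaded DGX-1 (FP16 dense): ~{ratio:.0%}")  # ~41%, i.e. roughly half
```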
At the heart of DIGITS is the GB10 SoC, which has 20 Arm cores (10 Cortex-X925 and 10 Cortex-A725). Other than the confirmed presence of a Blackwell GPU (a cut-down version of the B100), one can only infer the power consumption (100W) and the memory bandwidth (825GB/s, according to The Register).
You should be able to connect two of these devices (but no more) via Nvidia’s proprietary ConnectX technology to tackle larger LLMs such as Meta’s Llama 3.1 405B. Shoving these tiny mini PCs into a 42U rack looks like a non-starter for now, as it would encroach on Nvidia’s far more lucrative DGX GB200 systems.
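To see why the pairing matters for a model that size, here is a minimal sketch of the memory arithmetic; the figures are rough assumptions rather than official sizing guidance:

```python
# Rough memory arithmetic for Llama 3.1 405B on one vs two DIGITS units.
# Back-of-the-envelope estimates only, not official sizing guidance.
params = 405e9                      # parameter count of Llama 3.1 405B
bytes_per_param_fp4 = 0.5           # 4-bit weights

weights_gb = params * bytes_per_param_fp4 / 1e9   # ~202.5 GB for the weights alone

one_unit_gb = 128                   # memory of a single DIGITS unit
two_units_gb = 2 * one_unit_gb      # two units linked over ConnectX

print(f"4-bit weights: ~{weights_gb:.0f} GB")
print(f"Fits in one unit ({one_unit_gb} GB)?  {weights_gb < one_unit_gb}")
print(f"Fits in two units ({two_units_gb} GB)? {weights_gb < two_units_gb}")
# KV cache and activations add more on top, but the weight budget alone
# already rules out a single 128GB box at 4-bit precision.
```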
All about the moat
Why did Nvidia embark on Project DIGITS? I think it is all about reinforcing its moat. Making your products so sticky that moving to the competition becomes nearly impossible has worked very well for others: Microsoft and Windows, Google and Gmail, Apple and the iPhone.
The same happened with Nvidia and CUDA – being in the driving seat allowed Nvidia to do things such as moving the goalposts and wrong-footing the competition.
The move to FP4 for inference allowed Nvidia to deliver impressive benchmark claims such as “Blackwell delivers 2.5x its predecessor’s performance in FP8 for training, per chip, and 5x with FP4 for inference”. Of course, AMD doesn’t offer FP4 computation in the MI300X/325X series, and we will have to wait until later this year for it to arrive in the Instinct MI350X/355X.
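A quick illustration of how the precision switch flatters the comparison (my own arithmetic, assuming the usual rule of thumb that halving precision roughly doubles peak throughput on the same silicon):

```python
# Illustrative only: how quoting FP4 against a predecessor's FP8 inflates the headline.
# Assumes halving precision roughly doubles peak throughput on the same silicon.
claimed_fp4_vs_predecessor_fp8 = 5.0   # "5x with FP4 for inference"
gain_from_fp8_to_fp4_alone = 2.0       # rule-of-thumb boost from the precision drop itself

matched_precision_gain = claimed_fp4_vs_predecessor_fp8 / gain_from_fp8_to_fp4_alone
print(f"Implied gain at matched precision: ~{matched_precision_gain:.1f}x")
# ...which lines up with the separately quoted 2.5x FP8 training figure.
```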
Nvidia is therefore preparing the ground against future incursions, for lack of a better word, from existing and emerging competitors, including its own customers (think Microsoft and Google). Nvidia CEO Jensen Huang’s ambition is clear: he wants to expand the company’s dominance beyond the realm of the hyperscalers.
“AI will be mainstream in every application for every industry. With Project DIGITS, the Grace Blackwell Superchip comes to millions of developers. Placing an AI supercomputer on the desks of every data scientist, AI researcher and student empowers them to engage and shape the age of AI,” Huang recently commented.
Short of renaming Nvidia as Nvid-ai, this is as close as it gets to Huang acknowledging his ambition to make his company’s name synonymous with AI, just as Tarmac and Hoover did before it (albeit in more niche verticals).
Like many, I was also perplexed by the Mediatek link; the rationale for the tie-up can be found in Mediatek's press release. The Taiwanese company “brings its design expertise in Arm-based SoC performance and power efficiency to [a] groundbreaking device for AI researchers and developers,” it noted.
The partnership, I believe, benefits Mediatek more than Nvidia, and in the short run I can see Nvidia quietly going solo. Reuters reported that Huang dismissed the idea of Nvidia going after AMD and Intel, saying, “Now they [Mediatek] could provide that to us, and they could keep that for themselves and serve the market. And so it was a great win-win”.
This doesn’t mean Nvidia won’t deliver more mainstream products, though; it’s just that they will be aimed at businesses and professionals rather than consumers, where cutthroat competition makes things more challenging (and margins wafer-thin).
The Reuters article quotes Huang as saying, “We’re going to make that a mainstream product, we’ll support it with all the things that we do to support professional and high-quality software, and the PC (manufacturers) will make it available to end users.”
| Specification | DIGITS | DIGITS 2.4X | DGX-1 v1 | Variance (DGX-1 vs DIGITS 2.4X) |
|---|---|---|---|---|
| Depth (est.) in mm | 89 | 89 | 866 | 9.73x |
| Width (est.) in mm | 135 | 324 | 444 | 1.37x |
| Height (est.) in mm | 40 | 40 | 131 | 3.28x |
| Weight in kg | ~1 | ~2.4 | 60.8 | 25.35x |
| Price in USD (adjusted to Nov 2024) | 3,000 | 7,200 | 170,100 | 23.63x |
| GPU performance, FP16 (TF) | — | — | 170 | — |
| GPU performance, FP16 dense (TF) | ~282 | 676.8 | 680 | 1.00x |
| GPU performance, FP4 dense (TF) | 1,000 | — | — | — |
| GPU memory (GB) | 128 | 307.2 | 128 | 0.42x |
| Max power consumption (W) | ~150 | ~300 | 3,200 | 10.67x |
| Storage (TB) | 4 | 9.6 | 7.68 | 0.80x |
| GPU family | Blackwell | Blackwell | Pascal | — |
| GPU power consumption (W; x8 GPUs for DGX-1) | ~100 | ~240 | 2,400 | 10x |
| GPU transistor count (bn; x8 GPUs for DGX-1) | ~30 | ~72 | 120 | 1.67x |
| Memory bandwidth (GB/sec) | ~850 | ~850 | 720 | 0.85x |
Gazing into my crystal ball
One theory I came across while researching this feature is that more data scientists are embracing Apple’s Mac platform because it offers a balanced approach: good-enough performance – thanks to its unified memory architecture – at a ‘reasonable’ price. The Mac Studio with 128GB of unified memory and a 4TB SSD currently retails for $5,799.
So where does Nvidia go from here? An obvious move would be to bring the memory onto the SoC package, similar to what Apple has done with its M-series chips (and AMD with its HBM-equipped Instinct MI300A, which pairs Epyc-class cores with on-package memory). This would not only save on costs but also improve performance, something its bigger sibling, the GB200, already does.
Much will then depend on whether Nvidia wants to offer more at the same price, or the same performance at a lower price point (or a bit of both). Nvidia could also follow Intel’s lead and use the GB10 as a reference design to encourage key partners (PNY, Gigabyte, Asus) to launch similar products, as Intel did with the Next Unit of Computing (NUC).
I am also particularly interested to know what will happen to the Jetson Orin family; the NX 16GB version was upgraded just a few weeks ago to offer 157 TOPS of INT8 performance. That platform is destined for DIY/edge use cases rather than pure training/inference tasks, but I can’t help thinking about “what if” scenarios.
Nvidia is clearly disrupting itself before others attempt to do so; the question is how far it will go.