Apple embraces Nvidia GPUs to accelerate LLM inference via its open source ReDrafter tech

  • ReDrafter delivers 2.7x more tokens per second than traditional auto-regressive decoding
  • ReDrafter could reduce latency for users while using fewer GPUs
  • Apple hasn’t said when ReDrafter will be deployed on rival AI GPUs from AMD and Intel

Apple has announced a collaboration with Nvidia to accelerate large language model inference using its open source technology, Recurrent Drafter (or ReDrafter for short).

The partnership targets the computational cost of auto-regressive token generation; reducing it is key to improving efficiency and lowering latency in real-time LLM applications.
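
ReDrafter is Apple's take on speculative decoding: a small recurrent draft model proposes a short run of candidate tokens, and the main LLM then verifies them in a single pass instead of generating each token one at a time. The sketch below illustrates that generic draft-and-verify loop in plain Python; it is not Apple's implementation, and `toy_model`, `target_step`, and `draft_step` are stand-ins for real models.

```python
# Illustrative sketch of draft-and-verify (speculative) decoding.
# NOT Apple's ReDrafter implementation: the "models" below are toy
# functions that pick token ids deterministically from a tiny vocabulary.

from typing import Callable, List

VOCAB_SIZE = 100  # toy vocabulary of integer token ids


def toy_model(context: List[int], offset: int) -> int:
    """Deterministic toy 'model': next token depends on the context sum."""
    return (sum(context) + offset) % VOCAB_SIZE


def speculative_decode(
    target_step: Callable[[List[int]], int],
    draft_step: Callable[[List[int]], int],
    prompt: List[int],
    max_new_tokens: int = 16,
    draft_len: int = 4,
) -> List[int]:
    """Generate tokens by letting a cheap draft model propose `draft_len`
    candidates per round, then verifying them with the target model.

    Accepted draft tokens avoid a full auto-regressive step of the large
    model, which is where the speed-up comes from (a real system scores
    the whole draft in a single parallel pass of the target model).
    """
    tokens = list(prompt)
    while len(tokens) - len(prompt) < max_new_tokens:
        # 1. Draft model proposes a short continuation auto-regressively (cheap).
        draft = []
        ctx = list(tokens)
        for _ in range(draft_len):
            t = draft_step(ctx)
            draft.append(t)
            ctx.append(t)

        # 2. Target model verifies the draft; keep the longest matching prefix.
        accepted = 0
        for i, t in enumerate(draft):
            if target_step(tokens + draft[:i]) == t:
                accepted += 1
            else:
                break
        tokens.extend(draft[:accepted])

        # 3. Always emit one token from the target model so progress is guaranteed.
        tokens.append(target_step(tokens))
    return tokens[: len(prompt) + max_new_tokens]


if __name__ == "__main__":
    target = lambda ctx: toy_model(ctx, offset=1)
    draft = lambda ctx: toy_model(ctx, offset=1)  # aligned draft: every proposal is accepted
    print(speculative_decode(target, draft, prompt=[1, 2, 3]))
```

The acceptance check in step 2 is what preserves output quality: a drafted token is kept only if the target model would have produced the same token, so with greedy decoding (as sketched here) the result matches plain auto-regressive generation while requiring far fewer passes of the large model.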


