- ReDrafter delivers 2.7x more tokens per second than traditional autoregression
- ReDrafter has the potential to reduce latency for users while using fewer GPUs
- No word on when it will come to competing AI GPUs
Apple has announced that it is working with Nvidia to accelerate large language model (LLM) inference using Apple's open-source technique, Recurrent Drafter (ReDrafter for short).
The partnership aims to address the computational cost of autoregressive token generation, a key bottleneck for efficiency and latency in real-time LLM applications.
Introduced by Apple in November 2024, ReDrafter takes a speculative decoding approach, combining a recurrent neural network (RNN) draft model with beam search and dynamic tree attention. According to Apple's benchmarks, the method generates 2.7x more tokens per second than traditional autoregression.
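To illustrate the idea, here is a minimal, hypothetical sketch of the draft-and-verify loop that speculative decoding is built on. This is not Apple's implementation: the RNN draft head, beam search, and dynamic tree attention that distinguish ReDrafter are omitted, and `target` and `draft` are stand-ins for any pair of compatible models.

```python
# Conceptual sketch of speculative decoding (greedy variant, batch size 1).
# Not ReDrafter itself: the RNN draft head, beam search, and dynamic tree
# attention are omitted; `target` and `draft` are hypothetical models that
# map a token sequence [1, T] to logits [1, T, vocab_size].
import torch

@torch.no_grad()
def speculative_step(target, draft, tokens, k=4):
    """Propose k tokens with the cheap draft model, then verify them all
    with a single forward pass of the expensive target model."""
    proposal = tokens
    for _ in range(k):  # autoregressive drafting (cheap per step)
        logits = draft(proposal)
        next_tok = logits[:, -1].argmax(dim=-1, keepdim=True)
        proposal = torch.cat([proposal, next_tok], dim=-1)

    # One target pass scores every drafted position at once.
    target_logits = target(proposal)
    accepted = tokens
    for i in range(tokens.shape[1] - 1, proposal.shape[1] - 1):
        target_tok = target_logits[:, i].argmax(dim=-1, keepdim=True)
        drafted_tok = proposal[:, i + 1 : i + 2]
        if not torch.equal(target_tok, drafted_tok):
            # First mismatch: keep the target's own token and stop.
            accepted = torch.cat([accepted, target_tok], dim=-1)
            break
        accepted = torch.cat([accepted, drafted_tok], dim=-1)
    # At least one new token per target pass, often several.
    return accepted
```

ReDrafter's refinement, per the paper's description, is to replace this single greedy draft with an RNN head that proposes a tree of candidate continuations via beam search, so more drafted tokens survive each verification pass.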
Can it scale beyond Nvidia?
Through its integration into Nvidia's TensorRT-LLM framework, ReDrafter now brings faster LLM inference to Nvidia GPUs, which are widely used in production environments.
To accommodate ReDrafter's algorithms, Nvidia has introduced new operators and fine-tuned existing ones within TensorRT-LLM, making the technique available to developers who want to optimize the performance of large models.
In addition to the speed improvements, Apple says ReDrafter has the potential to reduce latency for users while requiring fewer GPUs. This efficiency not only lowers computational costs but also cuts power consumption, a critical factor for organizations managing large-scale AI deployments.
For now, the collaboration is focused on Nvidia's infrastructure, but similar performance benefits could eventually be extended to competing GPUs from AMD or Intel.
Such breakthroughs can help improve the efficiency of machine learning. As Nvidia states, “This collaboration has made TensorRT-LLM more powerful and more flexible, enabling the LLM community to innovate more sophisticated models and easily deploy them with TensorRT-LLM to achieve unparalleled performance on Nvidia GPUs. These new features open up exciting possibilities, and we eagerly anticipate the next generation of advanced models from the community that leverage TensorRT-LLM capabilities, driving further improvements to LLM workloads.”
For more information about the collaboration, visit the Nvidia Developer Technical Blog.