The term Private AI has been around for over seven years, but it has often been narrowly defined around niche use cases. We believe the opportunity and market reach of Private AI are far greater, and over the past year we have been working to bring clarity, context, and innovation to this rapidly growing market segment.
When we first shared our thoughts about Private AI at last year’s VMware Explore conference, we said it was a new way to deploy AI models on customer data. We didn’t talk about Private AI as a product, but as a powerful architectural approach that allows us to deliver the benefits of AI to our customers without compromising control over their data, privacy, or compliance.
Why did it resonate?
A year ago, our customers told us that AI felt out of reach because it seemed to require hundreds or thousands of GPUs to get started. Because they couldn’t procure that much processing power, their only option appeared to be running all their AI services on public cloud providers. Looking back, our first “eureka” moment came when we fine-tuned Hugging Face’s StarCoder model on a single NVIDIA A100 GPU. The start-up cost of AI was much lower than we had thought, and as we moved our own services into production, we found that the cost of running AI inference services in our data centers was much lower as well. This, in turn, had a direct impact on our AI product strategy and roadmap.
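For context on why a single GPU can suffice: parameter-efficient techniques such as LoRA train only small adapter matrices on top of a frozen base model. Below is a minimal sketch of that approach using the Hugging Face transformers and peft libraries; the checkpoint name, dataset path, and hyperparameters are illustrative assumptions, not the recipe we actually used.

```python
# Minimal sketch: LoRA fine-tuning of StarCoder on a single GPU.
# Checkpoint, dataset, and hyperparameters are illustrative assumptions.
import torch
from datasets import load_dataset
from transformers import (AutoModelForCausalLM, AutoTokenizer, Trainer,
                          TrainingArguments, DataCollatorForLanguageModeling)
from peft import LoraConfig, get_peft_model

model_id = "bigcode/starcoderbase"  # assumed checkpoint (gated on Hugging Face)
tokenizer = AutoTokenizer.from_pretrained(model_id)
tokenizer.pad_token = tokenizer.eos_token  # StarCoder has no pad token by default

model = AutoModelForCausalLM.from_pretrained(
    model_id, torch_dtype=torch.float16, device_map="auto")

# LoRA freezes the base weights and trains small adapter matrices,
# which is what makes a single A100 sufficient for fine-tuning.
model = get_peft_model(model, LoraConfig(
    r=16, lora_alpha=32, lora_dropout=0.05,
    target_modules=["c_attn"], task_type="CAUSAL_LM"))

# Assumed training corpus: one JSON object per line with a "text" field.
ds = load_dataset("json", data_files="train.jsonl")["train"]
ds = ds.map(lambda ex: tokenizer(ex["text"], truncation=True, max_length=1024),
            remove_columns=ds.column_names)

Trainer(
    model=model,
    args=TrainingArguments(output_dir="starcoder-ft",
                           per_device_train_batch_size=1,
                           gradient_accumulation_steps=16,
                           num_train_epochs=1, learning_rate=2e-4, fp16=True),
    train_dataset=ds,
    data_collator=DataCollatorForLanguageModeling(tokenizer, mlm=False),
).train()
```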
Private AI: One year later
In the year since then, our approach has gained widespread acceptance: Private AI is now covered as an industry market category by leading analyst firms, commercial Private AI solutions are on the market, and customers frequently request conversations about Private AI.
Conversations with AI leaders across nearly 200 end-user organizations have made it clear that organizations are leveraging both the public cloud and private data centers (owned or leased capacity) to meet their needs. While SaaS AI services have demonstrated their value in use cases such as marketing content and demand generation, many other use cases require a different approach because of privacy, control, and compliance requirements. We have seen customers start their AI applications in the public cloud and then move them to private data centers for several reasons:
Cost – Customers with mature AI environments tell us that Private AI can deliver 3-5x cost savings compared to comparable public cloud AI services. Using an open source model and managing your own AI infrastructure also gives you a predictable cost model, rather than the token-based billing of public AI services, which can swing costs unpredictably from month to month (a toy break-even sketch follows this list).
Privacy and control – Organizations want to maintain physical control of their data and run AI models adjacent to existing data sources. They don’t want to risk data leakage, whether the risk is real or perceived.
Flexibility – The AI field is changing so quickly that no single vertical stack can cover all your AI needs. A platform built on a common, shared pool of AI infrastructure gives you the flexibility to add new AI services, run A/B tests, and swap out AI models as the market evolves.
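To make the cost comparison concrete, here is a toy break-even calculation. Every figure is a hypothetical placeholder, not VMware or vendor pricing; the point is the shape of the math, not the numbers.

```python
# Toy break-even model: token-billed public AI service vs. amortized
# private infrastructure. All figures are hypothetical placeholders.

tokens_per_month = 20_000_000_000        # assumed production workload: 20B tokens/month
public_price_per_1k_tokens = 0.002       # assumed public-service rate (USD)

private_capex = 250_000                  # assumed GPU servers + licenses (USD)
amortization_months = 36                 # assumed 3-year depreciation
private_opex_per_month = 4_000           # assumed power, cooling, ops (USD)

public_monthly = tokens_per_month / 1_000 * public_price_per_1k_tokens
private_monthly = private_capex / amortization_months + private_opex_per_month

print(f"public cloud : ${public_monthly:,.0f}/month")   # $40,000/month
print(f"private AI   : ${private_monthly:,.0f}/month")  # ~$10,944/month
print(f"ratio        : {public_monthly / private_monthly:.1f}x")  # ~3.7x
```

At low token volumes the comparison flips in the public cloud’s favor, which matches the pattern described above: customers start in the public cloud and move workloads to private infrastructure as usage matures.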
Latest from VMware and NVIDIA
Announced with great fanfare at Explore 2023, VMware Private AI Foundation with NVIDIA became generally available in May of this year. Since then, we have seen strong demand for the platform across all major industry verticals, including the public sector. At this year’s show, we are announcing new capabilities as well as a glimpse of what’s to come when VMware Cloud Foundation (VCF) 9 becomes available.
Today, we are introducing a new model store that lets ML Ops teams and data scientists curate and deliver more secure LLMs. Integrated role-based access control provides governance and security for the environment and protects the privacy of enterprise data and IP. The model store is built on the open source Harbor container registry, so you can store and manage models as OCI-compliant containers, and it includes native NVIDIA NGC and Hugging Face integration (including Hugging Face CLI support) for a simplified data scientist and application developer experience. We are also adding guided deployment, which automates the workload domain creation workflow and other infrastructure setup for VMware Private AI Foundation with NVIDIA, reducing management tasks and accelerating time to value.
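As an illustration of the kind of workflow this enables, the sketch below downloads a model from Hugging Face and pushes it to an OCI registry such as Harbor, where project-level RBAC can then govern access. It uses the huggingface_hub library and shells out to the generic oras CLI; the registry URL, project, and model ID are hypothetical placeholders, and the model store’s own integrated tooling may expose this differently.

```python
# Sketch: stage a Hugging Face model into an OCI registry (e.g., Harbor).
# Registry URL, project, and model ID are hypothetical placeholders.
import subprocess
from huggingface_hub import snapshot_download

MODEL_ID = "mistralai/Mistral-7B-Instruct-v0.2"  # assumed model
REGISTRY = "harbor.example.internal/ai-models"   # assumed Harbor project

# 1. Pull the model snapshot from Hugging Face to local disk.
local_dir = snapshot_download(repo_id=MODEL_ID)

# 2. Push the files as an OCI artifact with the oras CLI, tagged so that
#    RBAC policies on the Harbor project govern who can pull the model.
tag = f"{REGISTRY}/{MODEL_ID.split('/')[-1]}:v1"
subprocess.run(["oras", "push", tag, "."], cwd=local_dir, check=True)
```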
Additionally, some exciting features planned for VCF 9 that will be showcased at Explore include:
Data Indexing and Retrieval Service – Chunks, indexes, and vectorizes data, and makes it available through an updateable knowledge base. Configurable update policies keep model outputs current. (A minimal sketch of this chunk-index-retrieve pattern follows this list.)
AI Agent Builder Service – Use natural language to rapidly build AI agents such as chatbots, accelerating time to value for new AI applications.
vGPU Profile Visibility – Centrally view and manage vGPU profiles across clusters for a holistic view of utilization and available capacity.
GPU Reservation – Reserve capacity to accommodate larger vGPU profiles, so that smaller vGPU workloads do not monopolize capacity and enough headroom remains for larger workloads.
GPU HA with Preemptible VMs – VM classes let you utilize 100% of your GPU capacity, taking snapshots of non-mission-critical VMs for graceful shutdown when capacity is needed elsewhere (e.g., prioritizing production over research).
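To show what “chunk, index, vectorize, retrieve” means in practice, here is a minimal, generic sketch of the pattern using the sentence-transformers and faiss libraries. It illustrates the concept only; it is not the implementation of the Data Indexing and Retrieval Service, and the data source and model name are assumptions.

```python
# Generic chunk -> vectorize -> index -> retrieve sketch.
# Illustrates the RAG pattern only; not the VCF service implementation.
import numpy as np
import faiss
from sentence_transformers import SentenceTransformer

def chunk(text: str, size: int = 400) -> list[str]:
    """Naive fixed-size chunking; real services use smarter splitting."""
    return [text[i:i + size] for i in range(0, len(text), size)]

docs = open("handbook.txt").read()  # assumed private data source
chunks = chunk(docs)

model = SentenceTransformer("all-MiniLM-L6-v2")
vectors = model.encode(chunks, normalize_embeddings=True)

# Inner-product search over normalized vectors = cosine similarity.
index = faiss.IndexFlatIP(vectors.shape[1])
index.add(np.asarray(vectors, dtype="float32"))

def retrieve(query: str, k: int = 3) -> list[str]:
    q = model.encode([query], normalize_embeddings=True)
    _, ids = index.search(np.asarray(q, dtype="float32"), k)
    return [chunks[i] for i in ids[0]]

# The retrieved chunks are then prepended to the LLM prompt.
print(retrieve("What is our GPU reservation policy?"))
```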
Why choose us?
Organizations are choosing VMware, a division of Broadcom, as their strategic AI partner for many reasons, including:
Lower TCO – AI applications are complex and require significant intelligence at the infrastructure layer to meet performance and availability requirements. That starts with simplifying and standardizing the infrastructure, which is why organizations are building their AI infrastructure on VMware Cloud Foundation, which delivers a significantly lower TCO than alternatives. As noted above, running AI services on a virtualized, shared infrastructure platform delivers much lower and more predictable costs than comparable public AI services. By virtualizing and sharing capacity across data scientists and AI applications, organizations capture those economic benefits themselves; when you consume public AI services, the provider’s ability to virtualize and share its capacity shows up in the provider’s profit margins, not yours. Best of all, you can virtualize your AI infrastructure without sacrificing performance, and in some cases achieve better performance than bare metal.
Resource sharing – Resource scheduling is one of the most complex aspects of AI operations, and VMware Distributed Resource Scheduler (DRS) has been evolving for nearly 20 years. Our technology leadership here enables organizations to virtualize and intelligently share GPUs, network, memory, and compute, automating provisioning and load balancing. That innovation leadership is the primary reason organizations looking to operate their own homegrown AI platforms have turned to VMware Private AI Foundation with NVIDIA.
Automation – Our ability to securely automate the delivery of AI app stacks within minutes, and to keep driving automation after deployment, is another key factor behind the excitement and adoption. This ranges from building new AI workstations to operating NVIDIA Inference Microservices (NIM).
Centralized operations – Organizations can use the same set of tools and processes for both AI and non-AI services, further reducing the TCO of AI applications. This includes centralized monitoring of GPU assets.
Trust – Organizations have relied on VMware technology for years to run some of their most critical applications. They expect, and trust, our Private AI roadmap to deliver.
Private AI: It’s all about the ecosystem
Over time, it has also become clear that there is no single solution for AI. It’s truly an ecosystem game, and we continue to push forward with partners of all sizes to build the best ecosystem for VMware Private AI. Today at Explore, we announced new and expanded engagements with the following partners:
Intel: VMware Private AI for Intel now supports Intel Gaudi 2 AI accelerators, expanding customer choice and use cases with high-performance acceleration for GenAI and LLMs.
Codeium: A powerful AI coding assistant that helps developers with code generation, debugging, testing, modernization, and more to accelerate delivery.
Tabnine: A robust AI code assistant that streamlines code generation and automates mundane tasks, freeing developers to spend more time on value-added work.
WWT: A leading technology solutions provider for full-stack AI solutions and a Broadcom partner. To date, WWT has developed and supported AI applications for over 75 organizations, and it works with us to help customers get value from Private AI faster, from infrastructure deployment and operations to AI applications and other services.
HCLTech: Offers Private Gen AI products designed to help enterprises accelerate their Gen AI journey through a structured approach. This turnkey solution, combined with a customized pricing model and HCLTech’s data and AI services, enables faster movement from Gen AI POC to production with a clearly defined TCO.
Future outlook
It’s clear that AI will become even more mainstream in the coming years as organizations harness its power to increase human productivity and innovation. But it will also place the onus on businesses to ensure their infrastructure is robust enough to keep up with the accelerating transition.
A year ago, we argued that the AI space is changing so quickly that customers shouldn’t bet on a single solution, and that investing in a platform flexible enough to adapt to new circumstances would prepare them for the future. We argued that a platform approach would make internal adoption easier as requirements change and better AI models emerge. We also knew there was growing demand to run AI models wherever organizations have data, and that privacy, control, and lower TCO would drive architecture and buying decisions.
Now, a year later, we’re even more convinced that we’re on the right path, and the best part is, there’s so much more to come.