Today, we are announcing the general availability of Amazon Elastic Compute Cloud (Amazon EC2) P5en instances, powered by NVIDIA H200 Tensor Core GPUs and custom 4th Generation Intel Xeon Scalable processors available only on AWS, with an all-core turbo frequency of 3.2 GHz (max core turbo frequency of 3.8 GHz). These processors offer 50% higher memory bandwidth and up to four times the throughput between CPU and GPU with PCIe Gen5, which helps improve performance for machine learning (ML) training and inference workloads.
With third-generation Elastic Fabric Adapter (EFAv3) at up to 3,200 Gbps and AWS Nitro v5, P5en instances improve latency by up to 35% compared to previous-generation P5 instances, which use EFA and Nitro. This helps improve collective communications performance for distributed training workloads such as deep learning, generative AI, real-time data processing, and high performance computing (HPC) applications.
The specifications for P5en instances are as follows:
| Instance size | vCPUs | Memory (GiB) | GPUs (H200) | Network bandwidth (Gbps) | GPU peer-to-peer (GB/s) | Instance storage (TB) | EBS bandwidth (Gbps) |
|---|---|---|---|---|---|---|---|
| p5en.48xlarge | 192 | 2048 | 8 | 3200 | 900 | 8 x 3.84 | 100 |
On September 9, we launched Amazon EC2 P5e instances, featuring eight NVIDIA H200 GPUs with 1128 GB of high-bandwidth GPU memory, third-generation AMD EPYC processors, 2 TiB of system memory, and 30 TB of local NVMe storage. These instances offer up to 3,200 Gbps of aggregate network bandwidth with EFAv2 and support GPUDirect RDMA, enabling lower latency and efficient scale-out performance by bypassing the CPU for internode communications.
P5en instances further reduce inference and network latency, improving overall efficiency across a wide range of GPU-accelerated applications. Compared to P5 instances, P5en instances improve local storage performance by up to two times and Amazon Elastic Block Store (Amazon EBS) bandwidth by up to 25%, which further improves inference latency for users who cache model weights on local storage.
Transferring data between CPU and GPU can be time consuming, especially for large datasets and workloads that require frequent data exchange. With up to four times the CPU-to-GPU bandwidth of P5 and P5e instances, PCIe Gen5 further reduces latency for model training, fine-tuning, and inference for complex large language models (LLMs) and multimodal foundation models (FMs), as well as for memory-intensive HPC applications such as simulation, drug discovery, weather forecasting, and financial modeling.
Get started with Amazon EC2 P5en instances
EC2 P5en instances are available in the US East (Ohio), US West (Oregon), and Asia Pacific (Tokyo) AWS Regions through EC2 Capacity Blocks for ML, On-Demand, and Savings Plan purchase options.
Let me show you how to use P5en instances with the Capacity Blocks option. To reserve an EC2 Capacity Block, choose Capacity Reservations on the Amazon EC2 console in the US East (Ohio) Region.
Select Purchase Capacity Blocks for ML, then choose your total capacity and specify how long you need the EC2 Capacity Block for p5en.48xlarge instances. You can reserve an EC2 Capacity Block for 1 to 14 days, 21 days, or 28 days, and you can purchase it up to 8 weeks in advance.
When you choose Find Capacity Blocks, AWS returns the lowest-priced offering available that meets your specifications in the date range you selected. After reviewing the EC2 Capacity Block details, tags, and total price information, choose Purchase.
Your EC2 Capacity Block has now been scheduled successfully. The total price of an EC2 Capacity Block is charged up front, and the price doesn't change after purchase. The payment will be billed to your account within 12 hours after you purchase the EC2 Capacity Block. To learn more, visit Capacity Blocks for ML in the Amazon EC2 User Guide.
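After the purchase, you can confirm the reservation from the AWS CLI as well. A minimal check, using the sample reservation ID cr-a1234567 (substitute your own):

```shell
# Check the state and time window of a purchased EC2 Capacity Block
# (cr-a1234567 is a sample reservation ID -- substitute your own)
aws ec2 describe-capacity-reservations \
    --capacity-reservation-ids cr-a1234567 \
    --query "CapacityReservations[0].[State,StartDate,EndDate]" \
    --output text
```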
To run instances within your purchased capacity blocks, you can use the AWS Management Console, AWS Command Line Interface (AWS CLI), or AWS SDKs.
The following sample AWS CLI command runs 16 P5en instances to take full advantage of EFAv3. This configuration provides up to 3,200 Gbps of EFA network bandwidth and up to 800 Gbps of IP network bandwidth across 8 private IP addresses.
```shell
$ aws ec2 run-instances --image-id ami-abc12345 \
    --instance-type p5en.48xlarge \
    --count 16 \
    --key-name MyKeyPair \
    --instance-market-options MarketType="capacity-block" \
    --capacity-reservation-specification CapacityReservationTarget={CapacityReservationId=cr-a1234567} \
    --network-interfaces "NetworkCardIndex=0,DeviceIndex=0,Groups=security_group_id,SubnetId=subnet_id,InterfaceType=efa" \
    "NetworkCardIndex=1,DeviceIndex=1,Groups=security_group_id,SubnetId=subnet_id,InterfaceType=efa-only" \
    "NetworkCardIndex=2,DeviceIndex=1,Groups=security_group_id,SubnetId=subnet_id,InterfaceType=efa-only" \
    "NetworkCardIndex=3,DeviceIndex=1,Groups=security_group_id,SubnetId=subnet_id,InterfaceType=efa-only" \
    "NetworkCardIndex=4,DeviceIndex=1,Groups=security_group_id,SubnetId=subnet_id,InterfaceType=efa" \
    "NetworkCardIndex=5,DeviceIndex=1,Groups=security_group_id,SubnetId=subnet_id,InterfaceType=efa-only" \
    "NetworkCardIndex=6,DeviceIndex=1,Groups=security_group_id,SubnetId=subnet_id,InterfaceType=efa-only" \
    "NetworkCardIndex=7,DeviceIndex=1,Groups=security_group_id,SubnetId=subnet_id,InterfaceType=efa-only" \
    "NetworkCardIndex=8,DeviceIndex=1,Groups=security_group_id,SubnetId=subnet_id,InterfaceType=efa" \
    "NetworkCardIndex=9,DeviceIndex=1,Groups=security_group_id,SubnetId=subnet_id,InterfaceType=efa-only" \
    "NetworkCardIndex=10,DeviceIndex=1,Groups=security_group_id,SubnetId=subnet_id,InterfaceType=efa-only" \
    "NetworkCardIndex=11,DeviceIndex=1,Groups=security_group_id,SubnetId=subnet_id,InterfaceType=efa-only" \
    "NetworkCardIndex=12,DeviceIndex=1,Groups=security_group_id,SubnetId=subnet_id,InterfaceType=efa" \
    "NetworkCardIndex=13,DeviceIndex=1,Groups=security_group_id,SubnetId=subnet_id,InterfaceType=efa-only" \
    "NetworkCardIndex=14,DeviceIndex=1,Groups=security_group_id,SubnetId=subnet_id,InterfaceType=efa-only" \
    "NetworkCardIndex=15,DeviceIndex=1,Groups=security_group_id,SubnetId=subnet_id,InterfaceType=efa-only" \
    "NetworkCardIndex=16,DeviceIndex=1,Groups=security_group_id,SubnetId=subnet_id,InterfaceType=efa" \
    "NetworkCardIndex=17,DeviceIndex=1,Groups=security_group_id,SubnetId=subnet_id,InterfaceType=efa-only" \
    "NetworkCardIndex=18,DeviceIndex=1,Groups=security_group_id,SubnetId=subnet_id,InterfaceType=efa-only" \
    "NetworkCardIndex=19,DeviceIndex=1,Groups=security_group_id,SubnetId=subnet_id,InterfaceType=efa-only" \
    "NetworkCardIndex=20,DeviceIndex=1,Groups=security_group_id,SubnetId=subnet_id,InterfaceType=efa" \
    "NetworkCardIndex=21,DeviceIndex=1,Groups=security_group_id,SubnetId=subnet_id,InterfaceType=efa-only" \
    "NetworkCardIndex=22,DeviceIndex=1,Groups=security_group_id,SubnetId=subnet_id,InterfaceType=efa-only" \
    "NetworkCardIndex=23,DeviceIndex=1,Groups=security_group_id,SubnetId=subnet_id,InterfaceType=efa-only" \
    "NetworkCardIndex=24,DeviceIndex=1,Groups=security_group_id,SubnetId=subnet_id,InterfaceType=efa" \
    "NetworkCardIndex=25,DeviceIndex=1,Groups=security_group_id,SubnetId=subnet_id,InterfaceType=efa-only" \
    "NetworkCardIndex=26,DeviceIndex=1,Groups=security_group_id,SubnetId=subnet_id,InterfaceType=efa-only" \
    "NetworkCardIndex=27,DeviceIndex=1,Groups=security_group_id,SubnetId=subnet_id,InterfaceType=efa-only" \
    "NetworkCardIndex=28,DeviceIndex=1,Groups=security_group_id,SubnetId=subnet_id,InterfaceType=efa" \
    "NetworkCardIndex=29,DeviceIndex=1,Groups=security_group_id,SubnetId=subnet_id,InterfaceType=efa-only" \
    "NetworkCardIndex=30,DeviceIndex=1,Groups=security_group_id,SubnetId=subnet_id,InterfaceType=efa-only" \
    "NetworkCardIndex=31,DeviceIndex=1,Groups=security_group_id,SubnetId=subnet_id,InterfaceType=efa-only"
```
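Once the instances are running, you can verify from inside an instance that the EFA interfaces are visible to libfabric. A quick sanity check, assuming the EFA software (included in the DLAMI, or installed with the EFA installer) is present:

```shell
# List libfabric providers and confirm the efa provider is available
fi_info -p efa

# Show the EFA network devices exposed to the operating system
ls /sys/class/infiniband/
```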
When you launch P5en instances, you can use the AWS Deep Learning AMIs (DLAMI) to support them. DLAMI provides ML practitioners and researchers with the infrastructure and tools to quickly build scalable, secure, distributed ML applications in preconfigured environments.
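For example, you can look up a recent DLAMI ID with the AWS CLI. This is a sketch: the name filter below is an assumption, so adjust it to the DLAMI flavor you need (the DLAMI release notes list the exact AMI names):

```shell
# Find the most recent AWS-owned Deep Learning AMI matching a name pattern
# (the "Deep Learning*" filter is an assumption -- adjust to the flavor you need)
aws ec2 describe-images \
    --owners amazon \
    --filters "Name=name,Values=Deep Learning*" "Name=state,Values=available" \
    --query "sort_by(Images, &CreationDate)[-1].[ImageId,Name]" \
    --output text
```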
You can run containerized ML applications on P5en instances using AWS Deep Learning Containers with libraries for Amazon Elastic Container Service (Amazon ECS) or Amazon Elastic Kubernetes Service (Amazon EKS).
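Deep Learning Containers images are hosted in Amazon ECR, so a typical pull looks like the following. The registry account ID and image tag here are illustrative; check the Deep Learning Containers documentation for the current image URIs:

```shell
# Authenticate to the Deep Learning Containers ECR registry and pull an image
# (account ID and image tag are illustrative -- see the DLC docs for current URIs)
REGION=us-east-2
DLC_ACCOUNT=763104351884
aws ecr get-login-password --region $REGION | \
    docker login --username AWS --password-stdin ${DLC_ACCOUNT}.dkr.ecr.${REGION}.amazonaws.com
docker pull ${DLC_ACCOUNT}.dkr.ecr.${REGION}.amazonaws.com/pytorch-training:latest
```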
For fast access to large datasets, you can use up to 30 TB of local NVMe SSD storage or virtually unlimited, cost-effective storage with Amazon Simple Storage Service (Amazon S3). You can also use Amazon FSx for Lustre file systems with P5en instances to access data at the hundreds of GB/s of throughput and millions of input/output operations per second (IOPS) required for large-scale deep learning and HPC workloads.
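On the instance, attaching an FSx for Lustre file system follows the standard Lustre client pattern. A sketch, assuming the Lustre client is installed, with placeholder DNS and mount names:

```shell
# Mount an FSx for Lustre file system (the DNS name and mount name are
# placeholders -- find yours on the file system's page in the Amazon FSx console)
sudo mkdir -p /fsx
sudo mount -t lustre \
    fs-0123456789abcdef0.fsx.us-east-2.amazonaws.com@tcp:/abcdefgh /fsx
```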
Now available
Amazon EC2 P5en instances are available today in the US East (Ohio), US West (Oregon), and Asia Pacific (Tokyo) AWS Regions, as well as the US East (Atlanta) Local Zone us-east-1-atl-2a, through EC2 Capacity Blocks for ML, On-Demand, and Savings Plan purchase options. For more information, visit the Amazon EC2 pricing page.
Give Amazon EC2 P5en instances a try in the Amazon EC2 console. To learn more, visit the Amazon EC2 P5 instances page and send feedback to AWS re:Post for EC2 or through your usual AWS Support contacts.
— Channy