Covering Scientific & Technical AI | Wednesday, December 11, 2024

AMD Expands ROCm 6.3 with Optimized Libraries for AI and HPC Workflows 

Nov. 26, 2024 -- AMD has announced the release of the ROCm 6.3 open-source platform, introducing advanced tools and optimizations to elevate AI, ML, and HPC workloads on AMD Instinct GPU accelerators.


ROCm 6.3 is engineered to empower a wide range of customers—from innovative AI startups to HPC-driven industries—by enhancing developer productivity. This release includes seamless SGLang integration for accelerated AI inferencing, a re-engineered FlashAttention-2 for optimized AI training and inference, the introduction of multi-node Fast Fourier Transform (FFT) support to advance HPC workflows, and more.

Explore these exciting updates and more as ROCm 6.3 continues to drive innovation across industries:

1. SGLang in ROCm 6.3: Super-Fast Inferencing of Generative AI (GenAI) Models
GenAI is transforming industries, but deploying large models often means grappling with latency, throughput, and resource utilization challenges. Enter SGLang, a new runtime supported by ROCm 6.3, purpose-built for optimizing inference of cutting-edge generative models such as LLMs and VLMs on AMD Instinct GPUs.

Why It Matters:

  • 6X Higher Throughput: Researchers have measured up to 6X higher throughput on LLM inference compared to existing systems, enabling your business to serve AI applications at scale.
  • Ease of Use: SGLang is Python-integrated and pre-configured in ROCm Docker containers, enabling developers to deploy interactive AI assistants, multimodal workflows, and scalable cloud backends with reduced setup time.

Whether you're building customer-facing AI solutions or scaling AI workloads in the cloud, SGLang delivers the performance and ease-of-use needed to meet enterprise demands. Discover the powerful features of SGLang and learn how to seamlessly set up and run models on AMD Instinct GPU accelerators here.
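In practice, a common way to use SGLang is to launch its OpenAI-compatible HTTP server (`python -m sglang.launch_server --model-path <model>`) and send it standard chat-completions requests. The sketch below builds such a request payload; the model path shown is purely illustrative, not a ROCm-specific requirement.

```python
import json

# SGLang exposes an OpenAI-compatible endpoint once its server is launched,
# e.g.: python -m sglang.launch_server --model-path meta-llama/Llama-3.1-8B-Instruct
# (the model path above is an illustrative assumption).

def build_chat_request(model: str, prompt: str, max_tokens: int = 128) -> dict:
    """Build an OpenAI-style chat-completions payload for an SGLang server."""
    return {
        "model": model,
        "messages": [{"role": "user", "content": prompt}],
        "max_tokens": max_tokens,
        "temperature": 0.7,
    }

payload = build_chat_request(
    "meta-llama/Llama-3.1-8B-Instruct",
    "Summarize ROCm 6.3 in one sentence.",
)
print(json.dumps(payload, indent=2))
```

The payload would then be POSTed to the server's `/v1/chat/completions` route with any HTTP client.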

2. Next-Level Transformer Optimization: Re-Engineered FlashAttention-2 on AMD Instinct
Transformer models are at the core of modern AI, but their high memory and compute demands have traditionally limited scalability. With FlashAttention-2 optimized for ROCm 6.3, AMD addresses these pain points, enabling faster, more efficient training and inference.

Why Developers Will Love It:

  • 3X Speedups: Achieve up to 3X speedups on the backward pass and a highly efficient forward pass compared to FlashAttention-1, accelerating model training and inference and reducing time-to-market for enterprise AI solutions.
  • Extended Sequence Lengths: Efficient memory utilization and reduced I/O overhead make handling longer sequences on AMD Instinct GPUs seamless.

Optimize AI pipelines with FlashAttention-2 on AMD Instinct GPU accelerators today, seamlessly integrated into existing workflows through ROCm’s PyTorch container with Composable Kernel (CK) as the backend.
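The memory savings behind FlashAttention come from tiling the attention computation and maintaining a running ("online") softmax, so the full N×N score matrix is never materialized. The NumPy sketch below illustrates that core idea on the CPU; it is a conceptual stand-in, not AMD's Composable Kernel implementation.

```python
import numpy as np

def online_softmax_attention(q, k, v, block=64):
    """Single-head attention computed over key/value tiles with a running
    (online) softmax -- the core trick behind FlashAttention.
    Shapes: q: (N, d), k: (M, d), v: (M, d)."""
    n, d = q.shape
    scale = 1.0 / np.sqrt(d)
    out = np.zeros_like(q, dtype=np.float64)  # unnormalized accumulator
    m = np.full(n, -np.inf)                   # running row-wise max
    l = np.zeros(n)                           # running softmax denominator
    for s in range(0, k.shape[0], block):
        ks, vs = k[s:s + block], v[s:s + block]
        scores = (q @ ks.T) * scale           # (N, block) tile of scores
        m_new = np.maximum(m, scores.max(axis=1))
        correction = np.exp(m - m_new)        # rescale earlier partial sums
        p = np.exp(scores - m_new[:, None])
        l = l * correction + p.sum(axis=1)
        out = out * correction[:, None] + p @ vs
        m = m_new
    return out / l[:, None]                   # normalize at the end
```

Because each tile is processed and discarded, memory grows with the block size rather than the full sequence length, which is what makes the extended sequence lengths mentioned above practical.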

3. AMD Fortran Compiler: Bridging Legacy Code to GPU Acceleration
Enterprises running legacy Fortran-based HPC applications can now unlock the power of modern GPU acceleration on AMD Instinct accelerators, thanks to the new AMD Fortran compiler introduced in ROCm 6.3.

Key Benefits:

  • Direct GPU Offloading: Leverage AMD Instinct GPUs with OpenMP offloading, accelerating key scientific applications.
  • Backward Compatibility: Build on existing Fortran code while taking advantage of AMD’s next-gen GPU capabilities.
  • Simplified Integrations: Seamlessly interface with HIP Kernels and ROCm Libraries, eliminating the need for complex code rewrites.

Enterprises in industries such as aerospace, pharmaceuticals, and weather modeling can now future-proof their legacy HPC applications, realizing the power of GPU acceleration without the extensive code overhauls previously required. Get started with the AMD Fortran Compiler on AMD Instinct GPUs through this detailed walkthrough.

4. New Multi-Node FFT in rocFFT: A Game Changer for HPC Workflows
Industries relying on HPC workloads—from oil and gas to climate modeling—require distributed computing solutions that scale efficiently. ROCm 6.3 introduces multi-node FFT support in rocFFT, enabling high-performance distributed FFT computations.

Why It Matters for HPC:

  • Built-in Message Passing Interface (MPI) Integration: Simplifies multi-node scaling, helping reduce complexity for developers and accelerating the enablement of distributed applications.
  • Leadership Scalability: Scale seamlessly across massive datasets, optimizing performance for critical workloads like seismic imaging and climate modeling.

Organizations in industries like oil and gas and scientific research can now process larger datasets with greater efficiency, driving faster and more accurate decision-making.
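A multi-node FFT typically follows a slab-decomposition pattern: each rank transforms one axis of its local slab, the ranks exchange data (an MPI all-to-all in a real deployment), and the remaining axis is transformed. The NumPy sketch below simulates that pattern in a single process; it illustrates the data-movement structure only and is not the rocFFT API.

```python
import numpy as np

def distributed_fft2(x, ranks=4):
    """2-D FFT via slab decomposition -- the pattern a multi-node FFT
    (e.g. rocFFT with MPI) follows: FFT the local axis on each rank,
    exchange data globally, then FFT the other axis. Here the 'ranks'
    are simulated in-process with array splits."""
    n = x.shape[0]
    assert n % ranks == 0, "rows must divide evenly across ranks"
    # Stage 1: each rank FFTs the rows of its horizontal slab.
    slabs = np.split(x, ranks, axis=0)
    stage1 = [np.fft.fft(s, axis=1) for s in slabs]
    y = np.concatenate(stage1, axis=0)   # stands in for the MPI all-to-all
    # Stage 2: redistribute by columns and FFT along the other axis.
    cols = np.split(y, ranks, axis=1)
    stage2 = [np.fft.fft(c, axis=0) for c in cols]
    return np.concatenate(stage2, axis=1)
```

The result matches a single-node 2-D FFT; in a real cluster the built-in MPI integration in rocFFT handles the exchange step that the `concatenate` calls stand in for here.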

5. Enhanced Computer Vision Libraries: AV1, rocJPEG, and Beyond
AI developers working with modern media and datasets require efficient tools for preprocessing and augmentation. ROCm 6.3 introduces enhancements to its computer vision libraries, rocDecode, rocJPEG, and rocAL, empowering enterprises to tackle diverse workloads from video analytics to dataset augmentation.

Why It Matters:

  • AV1 Codec Support: Cost-effective, royalty-free decoding for modern media processing via rocDecode and rocPyDecode.
  • GPU-Accelerated JPEG Decoding: Seamlessly handle image preprocessing at scale with the built-in fallback mechanisms of the rocJPEG library.
  • Better Audio Augmentation: Improved preprocessing with the rocAL library for robust model training in noisy environments.

From media and entertainment to autonomous systems, these features enable developers to build more advanced AI solutions for real-world applications.
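The "fallback mechanism" idea mentioned above can be pictured as a try-GPU-then-CPU wrapper around a decoder. The sketch below uses hypothetical decoder callables as stand-ins (these names are not the rocJPEG API) to show the pattern.

```python
def decode_jpeg_with_fallback(data: bytes, gpu_decode, cpu_decode):
    """Illustrative decode-with-fallback pattern. The decoder callables are
    hypothetical stand-ins, not the rocJPEG API: attempt the GPU path first,
    and fall back to a CPU decoder if it raises.

    Returns (decoded_image, path_used)."""
    try:
        return gpu_decode(data), "gpu"
    except Exception:
        return cpu_decode(data), "cpu"
```

In a real pipeline the CPU path might be a library such as Pillow, so preprocessing keeps running even for inputs the GPU decoder rejects.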

Beyond these standout features, it’s worth highlighting that Omnitrace and Omniperf, introduced in ROCm 6.2, have been rebranded as ROCm System Profiler and ROCm Compute Profiler. The rebranding brings enhanced usability, stability, and seamless integration into the current ROCm profiling ecosystem.

Why ROCm 6.3?

AMD ROCm has been making strides with every release, and version 6.3 is no exception. It delivers cutting-edge tools to simplify development while driving better performance and scalability for AI and HPC workloads. By embracing the open-source ethos and continuously evolving to meet developer needs, ROCm empowers businesses to innovate faster, scale smarter, and stay ahead in competitive industries.

Explore the full potential of ROCm and see how AMD Instinct accelerators can power your enterprise’s next big breakthrough. The ROCm Documentation Hub and other channels are being updated with the latest ROCm 6.3 content as this blog is published—details will be available very soon, so stay tuned!

About AMD

For more than 50 years AMD has driven innovation in high-performance computing, graphics, and visualization technologies. Billions of people, leading Fortune 500 businesses, and cutting-edge scientific research institutions around the world rely on AMD technology daily to improve how they live, work, and play. AMD employees are focused on building leadership high-performance and adaptive products that push the boundaries of what is possible.


Source: Ronak Shah, AMD

AIwire