
Vast Data Boosts AI Infrastructure with New Unified Data Platform 


Vast Data unveiled the VAST Data Platform at its Build Beyond event earlier this week. The new offering unifies storage, database, and virtualized compute engine services in a single system designed for the deep learning era.

The proliferation of large language models has thrust generative AI and deep learning into the spotlight. Vast says this new era of AI-driven discovery has the potential to accelerate solutions to humanity’s biggest challenges, such as fighting disease, addressing climate change, and uncovering new fields of science and math.

As enterprises build AI applications for these endeavors, data management has become an essential part of the stack. Deep learning applications require AI infrastructure that can deliver parallel file access, GPU-optimized performance for neural network training and inference on unstructured data, and global access to data from multiple sources, including hybrid multi-cloud and edge environments.

“This new data platform was designed for the deep learning era to scale up to levels natural data requires – pictures, genomes, video, sound – and to enable machines to understand and generate insight and discoveries from these vast datasets,” said Vast Data Founder and CEO, Renen Hallak, as he unveiled the new platform at Build Beyond.

In designing the new platform, Vast says it sought to resolve fundamental infrastructure tradeoffs that have previously prevented applications from computing on and understanding datasets spread across global infrastructure in real time. The company considered many types of structured and unstructured data in designing the platform, including video, imagery, free text, data streams, and instrument data.


To close the gap between event-driven and data-driven architectures, Vast says the VAST Data Platform can access and process data in any private or major public cloud, understand natural data by embedding a queryable semantic layer into the data itself, and continuously and recursively compute on data in real time.

When building AI applications, it is necessary to give structure to unstructured data. To address this, Vast has added a native semantic database layer, VAST DataBase. The company says it was designed for rapid data capture and fast queries at any scale and claims it is the first system to break the barriers of real-time analytics, from the event stream to the archive.
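The article gives no code, but the pattern the DataBase claims to address (capturing events quickly and querying them alongside historical records through a single SQL interface) can be sketched generically. The sketch below uses Python's built-in sqlite3 purely as a local stand-in; the table, columns, and queries are our own assumptions, not VAST DataBase APIs.

```python
import sqlite3
import time

# Generic stand-in for an "event stream to archive" store: fresh events and
# historical records live in one queryable table. sqlite3 is only a local
# placeholder here, not the semantic database layer the article describes.
db = sqlite3.connect(":memory:")
db.execute("""
    CREATE TABLE sensor_events (
        sensor_id TEXT,
        reading   REAL,
        ts        REAL  -- epoch seconds
    )
""")

# Rapid capture: append events as they arrive.
now = time.time()
db.executemany(
    "INSERT INTO sensor_events VALUES (?, ?, ?)",
    [("cam-01", 0.91, now - 5), ("cam-02", 0.42, now - 30 * 24 * 3600)],
)

# The same SQL interface answers both a "stream" query over the last minute
# and an "archive" query over everything ever captured.
recent = db.execute(
    "SELECT sensor_id, reading FROM sensor_events WHERE ts > ?", (now - 60,)
).fetchall()
overall = db.execute(
    "SELECT sensor_id, AVG(reading) FROM sensor_events GROUP BY sensor_id"
).fetchall()
print(recent, overall)
```

The point of the pattern is that nothing gets copied between a streaming system and a warehouse; one store answers both kinds of queries.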

The second element in the new platform is the VAST DataEngine, a global function execution engine consolidating datacenters and cloud regions into one computational framework. Vast says the engine supports popular programming languages like SQL and Python and introduces an event notification system along with reproducible model training for managing AI pipelines.
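The announcement does not describe the DataEngine's developer interface, but event-notification systems that trigger functions on data changes generally follow a recognizable shape. The minimal Python sketch below illustrates that shape; every name in it (on_event, publish, the payload fields) is hypothetical and not a DataEngine API.

```python
from collections import defaultdict
from typing import Callable

# Hypothetical event-notification pattern: functions register against an
# event type and run when matching events arrive. None of these names come
# from the VAST DataEngine; they only illustrate the idea.
_handlers: dict[str, list[Callable[[dict], None]]] = defaultdict(list)

def on_event(event_type: str):
    """Register the decorated function to run for this event type."""
    def register(fn: Callable[[dict], None]):
        _handlers[event_type].append(fn)
        return fn
    return register

def publish(event_type: str, payload: dict) -> None:
    """Deliver an event to every function registered for its type."""
    for fn in _handlers[event_type]:
        fn(payload)

@on_event("object_created")
def queue_for_training(payload: dict) -> None:
    # e.g., extract features or add the new object to a training manifest
    print(f"queuing {payload['path']} for the next training run")

# A new file landing in storage would surface as an event like this one.
publish("object_created", {"path": "/datasets/images/img_0001.jpg"})
```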

Finally, the third key element is the VAST DataSpace, a global namespace that the company says lets every location store, retrieve, and process data from any other location with high performance while enforcing strict consistency across every access point. The DataSpace allows the new platform to be deployed in on-prem and edge environments while also extending to leading public clouds, including Google Cloud, AWS, and Microsoft Azure.
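To make "strict consistency across every access point" concrete: under strict consistency, a write committed at one location is immediately visible to reads at every other location. The toy model below shows only those semantics; it says nothing about how VAST DataSpace actually achieves them, and every class and path name is invented for the example.

```python
# Toy model of a strictly consistent global namespace. This illustrates the
# semantics only, not how VAST DataSpace is implemented.
class Namespace:
    def __init__(self) -> None:
        self._store: dict[str, bytes] = {}

    def write(self, path: str, data: bytes) -> None:
        self._store[path] = data  # a committed write is globally visible

    def read(self, path: str) -> bytes:
        return self._store[path]

class Site:
    """An access point: a datacenter, an edge node, or a cloud region."""
    def __init__(self, name: str, ns: Namespace) -> None:
        self.name, self.ns = name, ns

ns = Namespace()
on_prem = Site("on-prem", ns)
cloud = Site("aws-us-east-1", ns)

on_prem.ns.write("/datasets/run-42/model.pt", b"weights")
# Strict consistency: the write is immediately visible at the other site,
# with no replication lag to reason about.
assert cloud.ns.read("/datasets/run-42/model.pt") == b"weights"
```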

In its "Worldwide AI Spending Guide," IDC predicts global investment in AI-centric systems will continue to grow at double-digit rates to reach a five-year CAGR of 27%, eventually exceeding $308 billion by 2026, according to Ritu Jyoti, group VP of AI and automation research practice at IDC.

“Data is foundational to AI systems, and the success of AI systems depends crucially on the quality of the data, not just their size. With a novel systems architecture that spans a multi-cloud infrastructure, Vast is laying the foundation for machines to collect, process and collaborate on data at a global scale in a unified computing environment - and opening the door to AI-automated discovery that can solve some of humanity's most complex challenges,” Jyoti noted.

Nvidia’s partnership with Vast was also highlighted at the event. The new VAST Data Platform is integrated with Nvidia’s DGX AI supercomputing infrastructure, making it accessible to enterprises building generative AI applications.


Manuvir Das, VP of enterprise computing at Nvidia, explained how the company has seen accelerated computing evolve, noting that it seems to have come full circle.

“If you think about the evolution of computing, it's been interesting the phases it has gone through. Back in the 2000s, there was a realization that the workloads require more and more data, and so we moved into a model of data-centric computing,” he said.

“And then we had the advent of the cloud,” Das continued. “It was this great place to find compute, but the storage buffers were basically empty, where people started filling up those storage repositories in the cloud. So we actually went back to a model where we were bringing data to the compute again. I think now we've gone full circle where there's enough data in these locations in the clouds that we can think about bringing compute to the data again.”

One user of the VAST Data Platform is the nonprofit research group Allen Institute, which uses it to process the large datasets needed to map neural circuits in its brain research.

David Feng, director of scientific computing at Allen Institute, said the organization collects enormous amounts of data, with new files growing to hundreds of terabytes within a few days.

“Everything changes about how you need to manage data when it’s that big and that fast,” he said in a statement. “We were excited to work with Vast because of the performance they could offer at this scale, and the system’s multiple protocol support is critical to our entire pipeline. Taking advantage of new advancements in AI will be pivotal to help us make sense of all of this data, and the VAST Data Platform allows us to collect massive amounts of data, so that we can ultimately map as many neural circuits as possible - and its mechanisms for collaboration enable us to rapidly share that data around the world.”
