GenAI Shows Us What’s Most Important, MinIO Creator Says: Our Data
Generative AI is a fundamental breakthrough that will have far-reaching implications for computing, according to MinIO CEO and co-founder Anand Babu “AB” Periasamy. But the biggest impact GenAI will have, he said, is reminding businesses of their most important asset: their data.
There’s no denying that GenAI has generated its share of hoopla over the past 14 months. From warnings of human extinction to predictions of a $7 trillion economic impact, GenAI has caught people’s attention, for better or worse.
While some of the fanfare is obviously unwarranted–no, GenAI is not going to replace all workers with digital robots–it is also capturing the imaginations of some of the world’s foremost technologists. You can count Periasamy, who co-founded the open source object storage company MinIO and created the distributed file system Gluster before that, among those who have been quite impressed with what GenAI has shown so far.
“GenAI is actually a real, fundamental breakthrough,” Periasamy told Datanami in a recent interview. “I would look at it as the most significant breakthrough in all of computing. It will take two to three years for us to see the major impact, but the impact will be huge.”
A lot of the startups that have popped up around GenAI are full of hot air. But just as the dot-com boom and subsequent flameout created the fertile soil from which advanced Web technologies eventually sprouted, today’s GenAI revolution will eventually yield paradigm-shifting changes in how we use technology, he said.
“The breakthrough is real,” Periasamy said. “There will be a lot of hype. There will be a bunch of startups going out of business in two to three years. But I think, just like the real dot-com effect we saw the benefit of it after the bubble burst, the same thing will happen here too.”
New Value from Data
Today’s hot GenAI applications are primarily chatbots and copilots. As ChatGPT showed, you can carry on a conversation with GenAI for hours or even days on end. And GenAI copilots, such as the popular one offered by GitHub that can write boilerplate code, are warming the cockles of developers tired of the same old routine.
But the biggest impact that GenAI will have is unlocking the value that has been trapped in data, Periasamy said.
“The proprietary data that every business has, they are starting to realize that, even without hiring any data science or engineering, they can now procure a software stack and then fine-tune a data store–a data store on MinIO” to mine it, he said. “All of the data you are now storing on object store, they’re able to put it to use very quickly. This was not possible before.”
Only the biggest companies with names like Anthropic and OpenAI will develop large language models (LLMs). A larger (but still relatively small) group of companies will take the next step and fine-tune those existing LLMs on their own data, Periasamy said.
The real sweet spot of GenAI, however, will be found by companies that use less sophisticated methods like prompt engineering and retrieval augmented generation (RAG) to connect their internal data to open source LLMs, he said.
“You can take these foundational models and play on them without ever training or fine tuning, or even hiring a single data scientist inside your organization,” the 2018 Datanami Person to Watch said. “Because once you vectorize [your data], you can now comprehend that knowledge and incorporate that on top of the foundational data. That is your organization’s expert.”
It takes just a modicum of technical skill to get started with GenAI. Anyone who can write a basic Python script can figure out how to connect data to an LLM using RAG techniques or prompt engineering, Periasamy said. The key step is vectorizing the business data to make it accessible to the LLM. The hardest part of that is creating the vector indexing, he said.
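To make the vectorize-then-retrieve idea concrete, here is a minimal sketch in Python using only the standard library. It is an illustration under loud assumptions, not MinIO’s or any vendor’s method: the bag-of-words `vectorize` function is a toy stand-in for a real embedding model, the in-memory list stands in for a vector database, the sample documents are invented, and the final prompt would still need to be sent to an LLM of your choosing.

```python
import math
import re
from collections import Counter


def vectorize(text):
    """Toy embedding: word counts. A real system would call an embedding model."""
    return Counter(re.findall(r"[a-z0-9]+", text.lower()))


def cosine(a, b):
    """Cosine similarity between two sparse word-count vectors."""
    dot = sum(a[t] * b[t] for t in a if t in b)
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(
        sum(v * v for v in b.values())
    )
    return dot / norm if norm else 0.0


def build_index(docs):
    """The 'vector index' here is just a list of (document, vector) pairs."""
    return [(doc, vectorize(doc)) for doc in docs]


def retrieve(index, question, k=1):
    """Return the k documents most similar to the question."""
    q = vectorize(question)
    ranked = sorted(index, key=lambda item: cosine(q, item[1]), reverse=True)
    return [doc for doc, _ in ranked[:k]]


def build_prompt(question, passages):
    """Prompt engineering step: place the retrieved passages ahead of the question."""
    context = "\n".join(f"- {p}" for p in passages)
    return f"Answer using only this context:\n{context}\n\nQuestion: {question}"


# Hypothetical internal documents, invented for this example.
docs = [
    "Refunds are processed within 14 days of a return request.",
    "The warehouse in Austin ships orders Monday through Friday.",
    "Support tickets are answered within one business day.",
]
index = build_index(docs)
question = "How long do refunds take?"
prompt = build_prompt(question, retrieve(index, question))
print(prompt)
```

In a production setup the linear scan in `retrieve` would be replaced by an approximate nearest-neighbor index, which is the vector indexing step Periasamy describes as the hardest part.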
Processing Blockages
The biggest hurdle to GenAI over the past year has arguably been getting one’s hands on GPUs. Production GenAI systems are processor-hungry, and high-end GPUs from Nvidia have been in high demand. Some of the bigger companies have even hoarded them, and it can be tough to find them in the cloud.
“The advantage of GPU is they have a huge graphics memory, and that is needed for holding large models,” Periasamy said. “With small models, you can even run on the CPUs. But the large models you need to have H100, A100 GPUs.”
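A rough back-of-the-envelope calculation shows why memory is the constraint. The sketch below estimates the memory needed for model weights alone as parameter count times bytes per parameter. The model sizes are hypothetical examples, 2 bytes per parameter assumes 16-bit weights, and the 80 GB figure assumes the 80 GB variant of an A100 or H100. Real deployments also need room for activations and caches, so treat these numbers as a lower bound.

```python
def weights_gb(num_params, bytes_per_param=2):
    """Approximate memory for model weights alone, in GB (10**9 bytes)."""
    return num_params * bytes_per_param / 1e9


def fits(num_params, gpu_memory_gb, bytes_per_param=2):
    """True if the weights alone fit in the given amount of GPU memory."""
    return weights_gb(num_params, bytes_per_param) <= gpu_memory_gb


GPU_MEMORY_GB = 80  # assumption: 80 GB variant of an A100 or H100

# Hypothetical model sizes chosen for illustration.
for name, params in [("7B model", 7e9), ("70B model", 70e9)]:
    need = weights_gb(params)
    ok = fits(params, GPU_MEMORY_GB)
    print(f"{name}: ~{need:.0f} GB of weights, fits on one GPU: {ok}")
```

Under these assumptions the smaller model fits comfortably on a single card, while the larger one has to be split across several GPUs or stored at lower precision.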
The good news is that the GPU bottleneck is starting to ease, Periasamy said. As Intel and AMD successfully roll out midrange GPUs in large numbers, it will put pressure on Nvidia to lower prices and ease the entire market, he said.
When that finally happens–Periasamy estimates the GPU squeeze will start to ease later this year–the race will be on to see which businesses can make the best use of all the unstructured data they have shoved into their object store over the years.
“The fight will be around who has the most valuable data and how to put them to use. This is where enterprises will see a big push,” Periasamy said. “All of the data they are now storing on object store, they’re able to put it to use very quickly.”
MinIO is already playing a central role in all this, at several levels. As an S3-compatible object storage system capable of storing hundreds of petabytes in the cloud or on-prem, MinIO already stores a lot of the unstructured data that will eventually be running through LLMs. It’s also being used to store vector embeddings for vector databases, such as Milvus.
Periasamy isn’t one to add new capabilities to MinIO for the sake of it, which is a direct reflection of the object store’s minimalist approach. “We’re an anti-roadmap company,” he said. “If you ask me to remove a feature I will gladly do it. For me to add a new feature, you have to convince me why MinIO is incomplete without it.”
Nevertheless, new features are in the works to accommodate GenAI. The details are still hazy, but it seems likely that MinIO will be gaining an add-on that enables the execution of functions to facilitate GenAI.
When Periasamy founded MinIO back in 2014, he stated it was his intention to “solve storage” for unstructured data. But solving storage was just the first step in his plan to tackle bigger problems and deliver bigger solutions, including enabling deep learning and AI on mass amounts of unstructured data. With the current breakthroughs we’re seeing in GenAI on unstructured data and MinIO’s embrace of it, it would seem that events are progressing in close accordance with Periasamy’s initial plan.
This article originally appeared in Datanami.
Alex Woodie has written about IT as a technology journalist for more than a decade. He brings extensive experience from the IBM midrange marketplace, including topics such as servers, ERP applications, programming, databases, security, high availability, storage, business intelligence, cloud, and mobile enablement. He resides in the San Diego area.