Covering Scientific & Technical AI | Monday, October 7, 2024

Cerebras and G42’s Inception Unveil Jais: A 13B Parameter Arabic LLM Trained on Condor Galaxy 

Condor Galaxy, an AI supercomputing system recently debuted by Cerebras Systems and Middle Eastern cloud provider G42, has already been put to work training Jais, a 13-billion-parameter Arabic large language model trained on a dataset of 395 billion Arabic and English tokens.

Named after Jebel Jais, the UAE’s highest mountain, the Jais LLM is a collaboration between G42’s Inception, Mohamed bin Zayed University of Artificial Intelligence (MBZUAI), and Cerebras. The open-source model was trained on a purpose-built dataset that includes 116 billion Arabic tokens, which the companies say was designed to capture the complexity, nuance, and richness of Arabic.

Inception says the release of Jais marks a significant milestone for AI in the Arabic-speaking world: the model was developed in Abu Dhabi and built to bring generative AI to the roughly 400 million Arabic speakers across the globe.

“By open sourcing Jais, Inception aims to engage the scientific, academic, and developer communities to accelerate the growth of a vibrant Arabic language AI ecosystem. This can serve as a model for other languages currently underrepresented in mainstream AI,” the company said in a statement.

Inception CEO Andrew Jackson says innovation thrives with collaboration, and this release sets a new standard for AI advancement in the Middle East, to ensure “the Arabic language, with its depth and heritage, finds its voice within the AI landscape. Jais is a testament to our commitment to excellence and our dedication to democratizing AI and promoting innovation.”

The company claims Jais outperforms existing Arabic models by a sizable margin. Jais’s training data also included 279 billion English tokens, aimed at improving the model’s performance through cross-language transfer. Inception also claims Jais is competitive with similarly sized English models, even though it was trained on much less English data than those models.
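As a quick sanity check on the figures above, the reported Arabic and English token counts can be added up and expressed as shares of the whole corpus. This is only an illustrative sketch: the two counts come from the article, while the function name `dataset_mix` and its return keys are made up for the example.

```python
# Token counts as reported in the article, in billions of tokens.
ARABIC_TOKENS_B = 116
ENGLISH_TOKENS_B = 279


def dataset_mix(arabic_b, english_b):
    """Return the total size and per-language share of a two-language corpus.

    Hypothetical helper for illustration; not part of any Jais tooling.
    """
    total = arabic_b + english_b
    return {
        "total_b": total,
        "arabic_share": arabic_b / total,
        "english_share": english_b / total,
    }


mix = dataset_mix(ARABIC_TOKENS_B, ENGLISH_TOKENS_B)
print(f"Total: {mix['total_b']}B tokens")
print(f"Arabic: {mix['arabic_share']:.1%}, English: {mix['english_share']:.1%}")
```

By these numbers, English makes up a little over two-thirds of the training tokens, which is the context for Inception's cross-language transfer claim.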

“This interesting result shows that the model’s English component learned from the Arabic data and vice versa, opening a new era in LLM’s development and training,” Inception said in a release.

Jais is now in use at organizations including the UAE Ministry of Foreign Affairs, the UAE Ministry of Industry and Advanced Technology, the Department of Health – Abu Dhabi, Abu Dhabi National Oil Company (ADNOC), and Etihad Airways, according to Inception.

Condor Galaxy is a planned network of nine interconnected supercomputers, with a combined capacity of 36 exaflops, that promises to reduce AI model training time. The first AI supercomputer on this network is Condor Galaxy 1 (CG-1), which delivers four exaflops of AI compute across 54 million cores. CG-1 links 64 Cerebras CS-2 systems into a single system that Cerebras and G42 offer as a cloud service.
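The Condor Galaxy figures hang together arithmetically, as the short sketch below shows. It rests on two assumptions that the article does not state: that each of the nine planned systems matches CG-1's four exaflops, and that each CS-2 contributes 850,000 cores (Cerebras' published figure for its WSE-2 chip).

```python
# Figures quoted in the article.
SYSTEMS_PLANNED = 9
EXAFLOPS_PER_SYSTEM = 4   # CG-1's figure; assumed to hold for all nine systems
CS2_PER_SYSTEM = 64

# Assumption, not from the article: cores per CS-2 (WSE-2 chip).
CORES_PER_CS2 = 850_000

total_exaflops = SYSTEMS_PLANNED * EXAFLOPS_PER_SYSTEM
cores_cg1 = CS2_PER_SYSTEM * CORES_PER_CS2
petaflops_per_cs2 = EXAFLOPS_PER_SYSTEM * 1000 / CS2_PER_SYSTEM

print(f"Planned network capacity: {total_exaflops} exaflops")
print(f"CG-1 cores: {cores_cg1 / 1e6:.1f} million")
print(f"Implied compute per CS-2: {petaflops_per_cs2} petaflops")
```

Under those assumptions, the totals line up with the article's 36 exaflops and its rounded figure of 54 million cores.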

The companies’ shared vision is to use Condor Galaxy to address pressing challenges across healthcare, energy, and climate action, said G42 Cloud CEO Talal Alkaissi. “Collaborating with Cerebras to rapidly deliver the world’s fastest AI training supercomputer and laying the foundation for interconnecting a constellation of these supercomputers across the world has been enormously exciting. This partnership brings together Cerebras’ extraordinary compute capabilities, together with G42’s multi-industry AI expertise.”

MBZUAI is a graduate research university dedicated to AI. MBZUAI President and University Professor Eric Xing said: “Developing such a high-caliber Arabic LLM demanded cutting-edge AI research in addition to an in-depth and nuanced understanding of the Arabic language, its diversity and heritage, and the growing importance of LLMs across all echelons of society. Thanks to our research and partnerships with Inception and other top regional and global organizations, MBZUAI will continue pioneering the development of LLMs that are efficient, effective, and accurate.”

Inception and MBZUAI say they will continue to expand and refine Jais as its user community grows. The model will be available for download on Hugging Face.

AIwire