GPU-Accelerated Deep Neural Nets Look for Cures that Already Exist
Discovering cures for cancer, for Alzheimer's, for multiple sclerosis, for Parkinson’s, for the halting and reversing of aging itself, may not require the development of new drugs. It may mean discovering properties and therapies in drugs already developed and used for other diseases.
That's the principle driving bioinformatics start-up Insilico Medicine, a Baltimore-based company that uses GPU-accelerated NVIDIA computing to power deep learning neural nets trained on massive datasets for drug repurposing research targeting aging and age-related diseases.
Drug re-targeting is not new. One of the best known cases is rapamycin, a drug originally thought to be an antifungal agent before it became widely used in organ transplantation and then as a cancer fighter. Other companies have pursued drug repurposing as a development strategy, but Dr. Alex Zhavoronkov, Insilico CEO, said his company is using big data analytics to scale the strategy to a level never previously attempted.
Insilico researchers not only generate their own data, they also “scavenge” existing datasets that pharmaceutical companies and research institutions have retired because they were too small, in themselves, to provide much research value. Aggregated and analyzed, the data is providing Insilico, its pharmaceutical partners and physicians with insights into how medications designed and approved for one ailment can be redirected to attack another.
“We’ve found a way to suture together our data with many other databases,” said Zhavoronkov, “and then it starts making sense.” Altogether, Insilico has 3 million gene expression samples amounting to hundreds of terabytes of data. “The breakthrough is combining so many pieces of the puzzle in one particular place,” he said, explaining that Hadoop has been instrumental in harmonizing large amounts of unstructured, weakly related data, and then running Insilico’s drug scoring algorithms against it.
Of course, drug discovery is an endeavor prone to high hopes and false starts. Many new drugs are found early in the development process to have toxicities that cause unacceptable side effects. But Insilico’s approach has the advantage of focusing on 20,000 medications worldwide already in use, drugs that have been approved (either in the U.S. or in other countries) and whose side effects are known. Although Insilico’s anti-aging agenda may trigger skepticism, its basic methodology has drawn positive attention from investors (Deep Knowledge Ventures, Hong Kong), industry analysts and pharmaceutical companies.
In February 2015 at the Personalized Medicine World Conference in Mountain View, CA, Insilico was recognized as the “Most Promising Company” in the fields of human genetics and personalized medicine. In March, Insilico was one of 12 finalists selected to present at the Early Stage Challenge at NVIDIA’s 2015 GPU Technology Conference. In partnership with Novartis last September, Insilico organized an international aging forum at Basel Life Science Week in Switzerland. The company also launched bioinformatics research partnerships with ATLAS Generation (stem cell research), Vision Genomics (ocular diseases), Pathway (cancer research) and Canada Cancer and Aging (personalized medicine and aging research). And the company said Insilico research papers have been published in 50 peer-reviewed journals over the past two years.
Insilico has configured four NVIDIA DevBox desktop supercomputers, using Tesla K80 GPU accelerators and four Titan X graphics cards, for a total of 28TF of processing power.
NVIDIA GPUs are the foundational technology driving deep learning techniques used by Insilico to compare healthy and diseased tissues, as well as aged and young tissues, and then to test – in digital formats – the impacts of drugs on those tissues to restore them to health and youth.
Zhavoronkov said Insilico is experimenting with many flavors of deep neural nets as well as deep learning combined with more traditional research and testing methods. These include deep feedforward neural nets using different data types as inputs, stacked autoencoders for cross-platform data harmonization, and deep belief nets for drug scoring and, ultimately, drug repurposing.
While deep neural net concepts have been around for decades, a revolution in their use started around 2010 when deep learning systems were trained on large image datasets, initially achieving – and then surpassing – the image recognition capabilities of humans. Application of deep learning to genomics and drug discovery has been slow because training algorithms to work with “multi-omics” and patient data requires databases on a massive scale. Insilico developed methods to augment its proprietary gene expression and proteomic data, using Hadoop and other tools to harmonize and compare data from different sources and turn it into pathway activation profiles that deep learning algorithms can use. In so doing, the company has created biomarkers for cancer, Alzheimer's and other diseases.
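To make the idea of a pathway activation profile more concrete, here is a minimal Python sketch. It is not Insilico's algorithm: it assumes a deliberately simplified scoring rule in which each gene in a pathway is labeled an activator (+1) or repressor (-1), and the pathway score is the sum of those signed roles times the log10 ratio of diseased-to-normal expression. The gene names, pathway names, expression values and the 1.5-fold noise threshold are all hypothetical.

```python
import math

def pathway_activation(case_expr, normal_expr, pathway_roles, min_fold=1.5):
    """Sum signed log10 case-to-normal ratios over the genes of one pathway.

    pathway_roles maps gene -> +1 (activator) or -1 (repressor).
    min_fold is a hypothetical noise threshold, not a published value.
    """
    score = 0.0
    for gene, role in pathway_roles.items():
        case = case_expr.get(gene)
        normal = normal_expr.get(gene)
        # Skip genes that are missing from either sample or not expressed.
        if case is None or normal is None or case <= 0 or normal <= 0:
            continue
        ratio = case / normal
        # Skip genes whose change falls inside the noise band.
        if 1.0 / min_fold < ratio < min_fold:
            continue
        score += role * math.log10(ratio)
    return score

def activation_profile(case_expr, normal_expr, pathways):
    """Build a {pathway name: score} profile for one sample."""
    return {
        name: pathway_activation(case_expr, normal_expr, roles)
        for name, roles in pathways.items()
    }

# Hypothetical expression levels (arbitrary units).
normal_sample = {"GENE_A": 10.0, "GENE_B": 10.0, "GENE_C": 10.0}
diseased_sample = {"GENE_A": 100.0, "GENE_B": 1.0, "GENE_C": 11.0}

# Hypothetical pathways; GENE_D is not measured in either sample.
pathways = {
    "growth": {"GENE_A": 1, "GENE_B": -1, "GENE_C": 1},
    "repair": {"GENE_B": 1, "GENE_D": 1},
}

profile = activation_profile(diseased_sample, normal_sample, pathways)
for name, score in profile.items():
    print(f"{name}: {score:+.2f}")
```

In this toy case the "growth" pathway scores as strongly up-regulated and "repair" as down-regulated. Reducing each sample to a handful of pathway scores, rather than thousands of raw gene readings, is one plausible way data from different sources could be made comparable before being fed to a neural net.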
The results include:
• DeepPharma, a GPU-based visual computing platform for creating virtual cells, tissues, bodies, and even virtual populations. These virtual laboratories are used to simulate and test tissue-specific pathway activation – also called “net signaling drift” – measuring the effects of millions of compounds on the molecules within diseased or aged cells.
• OncoFinder, a personalized medicine decision-support tool that has been used by physicians, mostly in Europe and Asia, to help identify drug treatments for more than 800 patients.
“When you’re using deep learning in bioinformatics, your only option today is GPU computing,” Zhavoronkov said. “Deep neural networks are evolving and revolutionizing many aspects of our daily lives – in pictures, in videos, in voice. GPU computing is becoming much more available and more databases, with millions of samples, also are becoming available. So success in deep learning is primarily centered around two factors: being able to utilize the full power of GPU computing, and access to huge databases.”
Development of the analytical algorithm used in OncoFinder took about six months, Zhavoronkov said, while another 18 months of virtual clinical trials were needed to test its predictive capabilities on retrospective data and validate its use in clinical settings.
Insilico is not required to undergo FDA or other regulatory approvals because OncoFinder is not used for diagnostics, Zhavoronkov said. Rather, it is a research service and decision support tool that helps doctors select medications that may be most effective in treating patients’ diseases.
Zhavoronkov said one of his greatest challenges has been assembling a staff combining expertise in machine learning, human genetics and pharmacology – particularly since deep learning is new to genomics research. “Finding talent that is qualified to experiment with deep learning applied to gene expression data is very difficult,” he said, “because you need people who are good with math and programming but also understand the biology. There are few people with this range of skills, so it’s a very precious resource.”
One of Insilico’s first aging-related projects is researching the process of skin aging. This involves “digitizing” the net signaling drift between young and old skin tissue and then virtually testing drugs against digitized forms of old skin to find those that correct the difference. The project also investigates the impact of ultraviolet irradiation (sunlight) on skin.
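The virtual testing described here can be pictured with a short Python sketch. It is an illustration only, under stated assumptions: tissue state is represented as a dictionary of pathway scores, the "drift" is the per-pathway difference between old and young tissue, a drug is modeled as a simple additive shift in pathway scores, and a drug's score is the fraction of the drift it removes. The pathway names, drug names and numbers are hypothetical, and nothing here is taken from Insilico's tools.

```python
def signaling_drift(young, old):
    """Per-pathway difference between old and young tissue profiles."""
    return {p: old[p] - young[p] for p in young if p in old}

def drift_magnitude(drift):
    """Total absolute drift across all pathways."""
    return sum(abs(v) for v in drift.values())

def score_drug(young, old, drug_effect):
    """Fraction of the young-vs-old drift removed by a drug.

    drug_effect maps pathway -> additive shift (a modeling assumption).
    1.0 means the old profile is fully restored; negative means worse.
    """
    before = drift_magnitude(signaling_drift(young, old))
    if before == 0:
        return 0.0  # nothing to correct
    treated = {p: old[p] + drug_effect.get(p, 0.0) for p in old}
    after = drift_magnitude(signaling_drift(young, treated))
    return (before - after) / before

# Hypothetical pathway activation profiles for skin tissue.
young_skin = {"pathway_1": 0.0, "pathway_2": 0.0, "pathway_3": 0.0}
old_skin = {"pathway_1": 2.0, "pathway_2": -1.0, "pathway_3": 1.0}

# Hypothetical drugs and their assumed effects on pathway scores.
drugs = {
    "drug_x": {"pathway_1": -2.0, "pathway_2": 1.0},  # reverses most drift
    "drug_y": {"pathway_1": 1.0},                     # pushes further away
    "drug_z": {},                                     # no effect
}

scores = {name: score_drug(young_skin, old_skin, fx) for name, fx in drugs.items()}
ranking = sorted(scores, key=scores.get, reverse=True)
for name in ranking:
    print(f"{name}: {scores[name]:+.2f}")
```

Ranking candidates by how much of the drift they reverse is the basic idea. A real system would need far richer drug-response models than the additive shift assumed in this sketch.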
Zhavoronkov said Insilico has predicted the first compounds that may ameliorate the skin aging process and will announce its findings next year. Insilico also plans to partner with a company that “measures” facial wrinkles; those measurements would then be used to assess the effectiveness of anti-aging medications identified by Insilico tools.
“Our first frontier is human skin,” Zhavoronkov said. “If you can successfully treat skin aging, you can basically apply the principle to other tissues.”