Dr. Hal Barron, chief scientific officer and president of R&D at GSK, talks about the future of health
Machine learning is poised to unlock the mysteries of the genome, Dr. Hal Barron told attendees at this year’s virtual AI Summit London.
The man in charge of R&D for pharma giant GlaxoSmithKline explained that the sheer scale of human DNA data means that conventional data analysis methods will struggle to cope.
And so, the next wave of healthcare innovation is set to be delivered through artificial intelligence.
We could do with a few medical breakthroughs
“We all have three billion base pairs in our DNA,” Barron said. “And just one base pair can change and lead to [there being] an increased risk of disease.
“We know about thousands of these associations where a base pair change leads to a disease,” he added. “But 90 plus percent of those associations, we don't know really why that is. We don't know what gene is responsible for that, what genes are, what proteins or, why.”
To try and answer these questions, a relatively new field of scientific study has emerged – functional genomics. It attempts to describe gene (and protein) functions and interactions using large genomic datasets.
GSK has one of the world’s largest datasets of human genomes, thanks to partnerships with the UK Biobank, Open Targets, FinnGen, and 23andMe – not to mention its in-house supply. But while more data means higher chances of a breakthrough, it also brings with it significant challenges.
“If you think about millions of people with thousands of genes, and three billion base pairs and all the different possibilities, the datasets are really large,” Barron said. “And when you combine that with functional genomics, where you're manipulating a gene, and the impact of other genes…”
Researchers create gene-gene interaction maps, setting 20,000 genes against another 20,000 genes to see how they interact. “Now you're doing 200 million combinations to see what impact it has at the cell level,” Barron explained. “You start to realize you're dealing with trillions and trillions of data points, even per experiment.
“And no human can interpret that. It's just too complicated. You can look at a tiny piece of that and maybe glean something, but to really fully appreciate the deep semantic representation behind these genes, that requires machine learning.”
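As a rough check on the scale Barron describes, the short Python sketch below counts gene pairings for a set of 20,000 genes. It assumes the 200 million figure refers to unordered pairs of distinct genes, which the talk does not spell out; the variable names are illustrative only.

```python
from math import comb

# Approximate gene count quoted in the talk (a round figure, not an exact one).
N_GENES = 20_000

# Assumption: each unordered pair of distinct genes is tested once,
# i.e. N * (N - 1) / 2 combinations.
unordered_pairs = comb(N_GENES, 2)

# For comparison: every gene set against every gene, order and self-pairs included.
ordered_pairs = N_GENES * N_GENES

print(f"Unordered gene pairs: {unordered_pairs:,}")  # 199,990,000
print(f"Ordered gene pairs:   {ordered_pairs:,}")    # 400,000,000
```

Under that assumption the pair count lands just under 200 million, in line with the number quoted. Each pair then yields many cell-level measurements, which is how a single experiment grows far beyond the pair count itself.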
Barron believes that three fields, working together, will make it possible to find the genes responsible for diseases: human genetics, functional genomics, and machine learning. It’s early days, but GSK claims the combination has allowed it to be twice as successful at developing medicines for certain gene-related diseases.
Barron has great hopes for the future: “I was talking to our head of AI and machine learning, Kim Branson, about what would it take for a drug discovery that is analogous to AlphaGo. And I think we can find new targets, where no one would have seen them, no one would have gotten to them anytime soon in terms of understanding, because the biology is just years and years away from being discovered.”
He believes it won’t be long before machine learning helps discover genes that cause certain diseases, helping us prevent them: “I think that would be phenomenal to actually make a discovery on a target that we believe is important and that has a very significant unmet medical need.
“It's a very, very high bar, but I think within the next year or two, we might find a target that can make a real difference.
“I like to think of this as almost like a new microscope – we can see things with machine learning, when the datasets are large enough that you just can't see [as a human]. They're right in front of you, but our existing microscopes are not sensitive enough.”
Really large datasets
An AI-led breakthrough in genomics would mark a huge step for machine-led pharmaceutical research, a field that several organizations are exploring. There’s the George Church-affiliated Dyno Therapeutics, which came out of stealth earlier this year; there’s MIT’s COVID-19 research; and, most notably, there’s Google.
In 2013, the search giant launched Calico, a subsidiary focused on “curing death,” hoping that its deep pockets and understanding of machine learning would give it a head start.
But the company suffered several setbacks, including the departure of its AI lead Daphne Koller in 2018. She left to found insitro, which hopes to use machine learning in drug development.
That same year, Calico had another high-profile departure – Hal Barron, its head of R&D at the time.
“One of my biggest lessons learned from spending time at Google through Calico was that machine learning really requires very, very large and highly dimensional datasets,” Barron said.
“And engineers are used to working with these massive, massive datasets, but not in the biological sciences. We would show machine learning people what we thought were these massive datasets, they'd chuckle and say ‘well, that's not even large.’ And yet, these are the largest datasets we've ever had.”
Now, with functional genomics working on the human genome, Barron thinks that life sciences have finally found a project with a large enough dataset. “It's getting close enough that machine learning people can say, ‘Okay, now you're talking.’”