Tony Kerlavage on Data Lakes, Data Commons, and Empowering the Research of the Future


At the National Cancer Institute, Tony Kerlavage knows quite a bit about managing very large pools of data. When NCI launched the Genomic Data Commons, it aimed to democratize access to the genomic data in The Cancer Genome Atlas and other sources. Kerlavage points out, though, that data types and volumes have only grown since then. Now NCI is taking a “Commons of Commons” approach to link pools of well-structured data. “The more data we can bring together in a well-structured way, the more value it has in the long run,” he says. He advocates for sharable Python notebooks and reusable R programming, believing that significant investments in data hygiene and interoperability deliver more value than simply mining data lakes with artificial intelligence tools, for now at least. The challenge for researchers, Kerlavage says, is to view their work with an eye to the future: How might someone else use this data going forward?

Tony Kerlavage, Director, Center for Biomedical Informatics & Information Technology, National Cancer Institute
Dr. Tony Kerlavage has served as the director of CBIIT since May 2019. He joined NCI as a program director in 2011 after more than 25 years in the public and private sectors as a leader in bioinformatics and genomics. He became chief of the Cancer Informatics Branch in 2012 and acting director of CBIIT in 2017. During his tenure, NCI’s efforts in advancing open data, open software, and open science have increased exponentially. Dr. Kerlavage has led groundbreaking efforts in these areas, including helping to establish the NCI Cancer Cloud Resources and the Cancer Research Data Commons.
