By Reinhard Laubenbacher, Center for Quantitative Medicine, University of Connecticut Health Center, and Jackson Laboratory for Genomic Medicine
Job opportunities for graduates with degrees in the mathematical sciences have never been better, as the world is being viewed through increasingly quantitative eyes. While standard statistical methods remain the workhorse of data analytics, new methods have appeared that help us look for all sorts of hidden patterns in data. Examples include statistical methods inspired by tools from abstract algebra, geometric data analysis based on methods from algebraic topology, and new machine learning methods, such as deep neural nets, combined with novel optimization methods. Most importantly, perhaps, an eye trained to discover patterns can go beyond standard analysis approaches through ad hoc data interrogation. Mathematics can be viewed as the science of (non-obvious) patterns, so it is not surprising that a solid mathematics education makes for excellent training in data analysis. It is now more widely recognized than ever that mathematics is the key enabling technology for solving the most difficult scientific problems facing humankind, and human health is arguably at the top of this list. I will focus here on data analytics in healthcare, a field growing by leaps and bounds, although similar statements can be made about the need for mathematical scientists in many other areas.
The holy grail in healthcare is the integration of three types of data: data from basic research and clinical models, electronic health records, and population health data such as health insurance claims. This integration would be an important step toward making personalized medicine a reality. For instance, genetic profiles of large populations, combined with their health records, lifestyle information, and insurance claims history, can help us develop predictive tools for the attributes of healthy aging. Across these application areas, there is a severe shortage of qualified data scientists who can go beyond applying software tools to understanding the underlying algorithms and their limitations, and who can develop and implement modified or entirely new algorithms.
In basic research and clinical practice, new next-generation sequencing technologies are providing insights into molecular events at the level of the genome as well as at the level of molecular networks, opening new approaches to the search for targeted drugs against a host of diseases. Mathematical and statistical models based on gene sequence and expression data, combined with measurements of proteins and metabolites, provide the tools to distinguish normal cells from cancer cells, for instance. Molecular profiles of patients suffering from schizophrenia, combined with behavioral and clinical data, can point to more targeted drug prescriptions. New data types are being developed at breakneck speed, and data analysis methods are struggling to keep pace. In genomics, for instance, new sequencing technologies such as ATAC-seq allow the detection of so-called epigenetic features that capture the status of chromatin, a “wrapper” of DNA that needs to be unpackaged before a gene can be transcribed. Other technologies generate data on gene-gene interactions that draw on the 3D structure of chromosomes rather than just on linear sequence information.
The use of electronic health records promises to revolutionize the delivery of health services. Here too, challenges arise from the quantity of data, their heterogeneity and, frequently, a lack of appropriate analytics methods, for instance for building predictive models of patient response to a particular diabetes drug in the presence of comorbidities such as hypertension or heart disease. Finally, private and public health insurance providers have large quantities of data to analyze for their policy decisions. A major bottleneck in all these areas is the lack of qualified data analysts.
In my experience, M.S.- and Ph.D.-level mathematicians have the perfect intellectual skill set to excel at this type of problem. (Not surprisingly, the National Security Agency and hedge funds recognized this some time ago.) The best background is solid training in fundamental mathematics (algebra, analysis, topology, etc.) combined with programming skills. Equipped with this skill set, one can readily acquire algorithms from statistics, bioinformatics, machine learning, and topological data analysis, to name a few, as well as the needed domain knowledge. The intellectual flexibility of someone with solid mathematics training frees them from a limitation common among practitioners with modest mathematical training: looking for nails that fit their particular hammer.
How, then, should mathematics graduates be trained to put them in a position to take advantage of this new “golden age”? As mentioned, I believe that a solid education in “pure” mathematics is best, together with a few other skills such as programming. This should be complemented by hands-on experience in data analysis, ideally as part of an ongoing analytics project. The specific skills needed can be learned through on-the-job training. This, of course, requires that mathematics graduate programs partner with organizations such as medical schools, research institutes, companies, or state agencies to provide access to data projects. While this may seem like a simple task, it can pose formidable obstacles. Nonetheless, existing M.S. and Ph.D. programs in mathematics need to make only relatively minor adjustments to their curricula to train graduates who will be highly sought after in a broad range of healthcare-related organizations. It is worth emphasizing that, even though this contribution focuses on graduate education, many of my comments also apply to undergraduate mathematics and statistics majors.
Of course, many mathematics departments already have new or established activities in data analytics, ranging from entire degree programs, such as a professional M.S. degree at Georgetown University in Washington, DC, complete with industrial internships, to formal course offerings, such as a 1-year course sequence on data analytics at SUNY Albany. (It is generally difficult to glean such information from departmental websites, and I would be grateful for any information about ongoing or contemplated efforts.) My main hypothesis in this contribution is that much can be done with relatively minor administrative effort or restructuring of the curriculum. Most departments have suitably generic courses on the books that can be used if formal credit is needed. Opportunities for hands-on training are plentiful and can be handled quite informally. The main requirement is probably one or two committed faculty members.
In my experience, healthcare and biomedical research organizations are very willing to provide initial training to mathematicians who might not know the first thing about electronic health records or next-generation sequencing but who come equipped with curiosity and some ability to communicate across fields. Almost all universities and colleges already offer relevant communication training that a department can leverage. Many students are eager to combine their love for mathematics with a desire to solve real-life problems but, in my experience, many of them do not know how to use their training for careers in “non-standard” settings. While biomedicine and healthcare typically do not offer the high salaries of the financial industry, they do offer a plethora of problems that someone with mathematical training can solve, and the clearly visible positive impact of those solutions on people’s lives provides strong motivation.
In response to questions about the usefulness of mathematics, students are sometimes told by their professors (including me, when I taught mathematics courses) that with mathematical training one can do “anything.” My experience in the life sciences and healthcare has taught me that there is a lot of truth to that assertion.