Data Science Across the Liberal Arts
Data science is the study of the generalizable extraction of knowledge from data, and the liberal arts are those subjects or skills considered essential for a citizen to know in order to take an active part in civic life. Today, liberal arts education requires training in data science, and data science is intimately tied to skills across the liberal arts landscape. UW-Madison needs to meet the challenge of data science across the liberal arts, and the Sloan Foundation can be a key player in this transformation. Put simply, we need to train a new generation that can tell meaningful stories with data.
A 2011 McKinsey report (https://www.mckinsey.com/capabilities/mckinsey-digital/our-insights/big-data-the-next-frontier-for-innovation) projected “a shortage of talent necessary for organizations to take advantage of big data. By 2018, the United States alone could face a shortage of 140,000 to 190,000 people with deep analytical skills as well as 1.5 million managers and analysts with the know-how to use the analysis of big data to make effective decisions.” That is, the US needs to train roughly 50% more individuals than are currently employed in such jobs. The President’s Council of Advisors on Science and Technology (PCAST) reported in 2013 that the US needs one million additional STEM college graduates by 2018, and encouraged transforming STEM education to engage students in active learning through evidence-based research experiences (https://eric.ed.gov/?id=ED541511). During a recent visit to UW, Susan Singer, NSF Director of the Division of Undergraduate Education, stressed that NSF and other funding agencies are creating new grant opportunities to prepare US institutions for this challenge.
Big data made a splash in 2012, but it is now recognized that “big” is relative, depending on context, amount of data and complexity of the problem.
The UW-commissioned EAB market study and the CoE Applied Computing market study both point to broad interest in the development of a non-traditional program in data science. We expect interest from employees of many companies based in Wisconsin and beyond. Here are some examples of data science at UW-Madison:
- Interest in data science is profoundly felt at UW-Madison, with the major emphasis in L&S departments. In particular, the past few years have witnessed dramatically increased undergraduate enrollment in courses and programs in mathematics, statistics and computer science. The statistics undergraduate major has grown 10-fold (now 160+) in just over five years without any marketing. This recent trend, particularly in statistics, is both national and global.
- Computer science, mathematics, and economics recently developed new master's or certificate programs that address demand for data science. Statistics just created a Data Science option in its MS program. Other programs or groups of individuals are discussing new programming in data science or data analytics, including a proposed MS in Biomedical Informatics, a proposal under development for Data Analytics & Applied Computing (an interdisciplinary effort led by Dan Negrut, CAE), and internal discussions in the School of Library Information Science, the School of Journalism and Mass Communication, and the Department of Sociology. Again, all of these efforts have strong roots in L&S.
- Statisticians, mathematicians and computer scientists are “data scientists”, but many other professionals are as well. The obvious ones at UW-Madison are in BMI, ECE, and CAE, but look further to actuarial science, marketing, library information science, and journalism & mass communication. Reach further to art history and geography, where curation and overlay of widely varying visual information and other “metadata” have exploded.
- Many other disciplines are “users” of data science, including biology, chemistry, physics, music and sociology, and more broadly the digital humanities. However, “applications” in these and other areas are actually generating new theory and methods, often through interdisciplinary collaborations with “core” data scientists. More fundamentally, discipline-based innovations in data science are driving the development of new data science theory and methods.
What has changed? As an example, the cost of sequencing an individual human genome is shrinking and will soon be under $1,000, but the cost to make (some) sense of that genome may be $20,000 and is growing. Today, inexpensive massive data measurements made possible by advances in data management and high-volume computing are meeting serious shortfalls in analytic and visualization methods. Beyond the pressing need for well-trained personnel with deep interdisciplinary skills, there are equally important needs for agile workflow systems that allow diverse teams to share and develop new methods to analyze, visualize and interpret data.
Data science has emerged as an inherently interdisciplinary field, or consortium of fields, requiring a combination of quantitative, computational and communication skills. We do not know if “data science” has staying power beyond the next ten years, but we do know the needs it represents cannot be ignored, and that training in data science is now vital for a liberal arts education.
At the same time, data science is diverse and highly context dependent, and must usually be complemented by deep discipline-based knowledge, whether that be English or music, computer science or biology. While some students may crave a degree in Data Science, it is unrealistic for anyone to be an expert in all aspects of this emerging field. Rather, it is important to be well trained in an established discipline, coupled with technical/quantitative and communication skills that enable adapting to the needs of complex projects and working with teams across multiple disciplines. Tomorrow’s data scientist will draw on deep knowledge and broad experience to tackle the wicked problems (https://en.wikipedia.org/wiki/Wicked_problem) so prevalent today.