Translating Data into Knowledge: The Data Science Initiative

By Ashley Serpa – Mining, organizing, interpreting, and analyzing data presents challenges for researchers of every stripe. Interdisciplinary and multifaceted, the UC Davis Data Science Initiative (DSI) is here to help.

Pamela Reynolds, academic coordinator at the DSI, describes data science as “an integrative, multi-disciplinary field translating data into knowledge.” Director Duncan Temple Lang sees it as a pipeline for all the activities involved in working with data, from “identifying, acquiring and accessing, transforming and manipulating, exploring, visualizing, modeling, summarizing and making inferences to inform decision-making.” The DSI, a “campus-wide activity,” supports and provides training, and seeks to increase data literacy while elevating the university’s research potential.

Data literacy is an essential tool for researchers, and the DSI aims to make the technologies and skills of data science accessible to all members of the university. Data science at its core is an interdisciplinary endeavor that “touches all academic disciplines ranging from the traditionally quantitative and computationally heavy to the social sciences and humanities,” says DSI Associate Director for Humanities and UC Davis Library Director of Digital Scholarship Carl Stahmer.

Indeed, the DSI not only encourages cross-disciplinary collaboration and discussion, but relies upon it. Stahmer is a digital humanist, and his research using text mining and natural language processing is an example of how data-enabled approaches can promote research in the humanities. One of Stahmer’s research projects looks at the flow of ideas in newsprint over time, overcoming challenges of text recognition and modeling from a corpus of hundreds of thousands of pieces of literature.

Seeking solutions to interdisciplinary challenges 

The DSI is helping spark innovation in research, too. For example, the DSI’s “un-Seminar” series, a novel forum for discussion that encapsulates the initiative’s ethos, brings together researchers from a variety of disciplines to brainstorm solutions to challenging research questions and apply data science techniques. With this interdisciplinary activity, the DSI hopes to “help the individual researcher not get bogged down in the limitations they think they have and instead empower them to reach for qualitatively novel research questions,” as Reynolds put it. In the flipped-format seminars, researchers in fields from plant sciences to healthcare, and from feminism to political science, give a brief overview of a project, with a focus on where they’re stuck. Then the audience leverages their collective backgrounds to offer potential solutions.

In one un-Seminar, English professor Gina Bloom presented on wrangling and analyzing data from a Shakespearean video game, and a biologist suggested an approach based on studying animal behavior and migration. These cross-disciplinary discussions help data scientists learn how to leverage and apply their skillsets to new data types and structures. They also help everyone figure out how to ask the right questions to solicit helpful feedback, an important skill in itself, and foster an environment where innovative questions and unique solutions are born. The un-Seminars are also topical. For example, the upcoming un-Seminar on November 21, ‘Predicting Conversation Length of Protest-Related Discussions on Twitter’, will likely be of interest to an interdisciplinary range of researchers given the current social and political climate.

Computers as collaborators

In addition to serving as a university resource for data science training, the DSI also helps create computing packages to amass, organize and interpret data. This is invaluable in all disciplines, from the deeply quantitative to the highly qualitative. For example, Jacob Hibel, associate professor of sociology, is collaborating with the DSI to study California school board reporting documents called LCAPs. With thousands of pages of PDF documents submitted by school boards each year, it’s incredibly difficult to read them all. To help this research project, the DSI is developing a software package in the statistical language R to programmatically read and extract relevant text. 

In a similar vein, the DSI also worked with Mairaj Syed, professor of religious studies, to “web scrape” and organize religious accounts, called hadiths, to enable dating and authentication. Support from the “DSI was absolutely instrumental in furthering research on the project,” says Syed, who received a DHI grant for the collaborative research.

Stahmer advises that we should “think of the computer as a collaborator.” For example, in research projects that require reading through a vast number of documents, “there are ways in which the computer is a better reader than you are and there are ways in which you are a better reader than the computer.” By treating the computer as a collaborator, researchers can glean more, and potentially novel, information. Researchers set the parameters for programmatic searches and still need to check the computer’s “work”—even computers make errors, after all—but ultimately it can make navigating material easier and more efficient.

New skills, new angles

Another project at the DSI is leveraging tools from the humanities to help preserve and analyze seismographic earthquake data. Using techniques common for processing old manuscripts, the DSI will help geologists separate and extract important handwritten notes that appear across the seismograph wave etchings. “This is why I love the DSI,” says Reynolds. “Where else on campus would you get geologists, computer scientists, and people who study ancient manuscripts all in the same room, solving the same problem?” 

Jared Joseph, a graduate student in sociology and DSI affiliate, says “the skills I’ve acquired helped me look at questions from new angles, especially in regards to scale.” Scale, be it temporal or in the sheer amount of available source material, often limits research questions and projects, but improved data literacy can overcome this obstacle. For example, “Using NLP [natural language processing], I can look through thousands of documents at once, find what topics are appearing most often and what topics are related to each other,” Joseph says. “These are the same things I would be interested in if I hand-read the same documents, just faster and more consistent.” 
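
The kind of scan Joseph describes can be illustrated with a minimal Python sketch. It is not his method or a DSI tool; it assumes that simple word counts can stand in for “topics” and that two terms are “related” when they appear in the same document. The sample documents, the stopword list, and the function names are all hypothetical.

```python
import re
from collections import Counter
from itertools import combinations

# Hypothetical stopword list; a real project would use a fuller one.
STOPWORDS = {"a", "an", "and", "the", "for", "about", "of", "to", "in"}

# Hypothetical sample documents standing in for a much larger corpus.
SAMPLE_DOCS = [
    "The school board approved the budget for teachers.",
    "Teachers asked the board about the budget.",
    "Parents discussed the school lunch program.",
]


def tokenize(text):
    """Lowercase the text and keep alphabetic words that are not stopwords."""
    return [w for w in re.findall(r"[a-z]+", text.lower()) if w not in STOPWORDS]


def top_terms(documents, n=5):
    """Count every term across all documents and return the n most frequent."""
    counts = Counter()
    for doc in documents:
        counts.update(tokenize(doc))
    return counts.most_common(n)


def related_terms(documents):
    """Count how many documents each pair of terms appears in together."""
    pairs = Counter()
    for doc in documents:
        # Sort the unique terms so each pair has one canonical (a, b) order.
        unique_terms = sorted(set(tokenize(doc)))
        pairs.update(combinations(unique_terms, 2))
    return pairs


if __name__ == "__main__":
    print(top_terms(SAMPLE_DOCS, n=4))
    print(related_terms(SAMPLE_DOCS).most_common(3))
```

On a real corpus, researchers would likely reach for topic models or other dedicated NLP tooling; the counting approach here only shows the idea of scanning every document in a single pass and applying the same rule to each one.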

Innovation and opportunity

In addition to holding un-Seminars and assisting researchers with their research needs, the DSI is developing new educational opportunities in data science at UC Davis. These include a Designated Emphasis and Graduate Academic Certificate, a Data Studies minor, and a new academic unit. Along with workshops, free consultations and office hours, mini-courses, networking opportunities, and a modern co-working/hacker lab space in Shields Library, these educational opportunities represent the ways in which the DSI is fostering a community of experts in data science across domains. Graduate students and postdocs who want to engage and strengthen their skillsets can apply to be DSI Affiliates, and faculty can submit collaborative projects and apply for membership.

The DSI represents an innovative interdisciplinary support center providing forums for discussion, research and training. “There is an incredible amount of talent distributed across various research labs, teams and centers at this university,” says Reynolds. The DSI is building bridges between those communities to serve as an umbrella for learning and using data science. If you want to incorporate data science into your research or advance your data literacy, or are simply curious about the resources the DSI has to offer, you can find them at Shields Library.

“You just need to come through the door,” Stahmer says.

The Data Science Initiative is located on the 3rd floor of Shields Library, room 360. A list of upcoming DSI events, workshops and information for joining their mailing list can be found on their website at dsi.ucdavis.edu. For direct questions, contact datascience@ucdavis.edu.
