Data Science: What’s in it for the New Librarian?

The news is full of headlines describing the “rise of big data” and the consequent need for data scientists and “big data” professionals. Yet, as stewards of vast troves of printed and electronic information for generations, haven’t librarians always dealt with big data? Could it be that Data Science is just a “hype” term for what librarians have been doing all along?

Moore’s “Law”

Before we follow up on that question, please indulge a bit of recent history. Many people are familiar with the idea of Moore’s Law: that the amount of computer processing power per chip doubles roughly every 18 months. A similar idea, promoted by Mark Kryder, the former chief technology officer of Seagate (a hard disk manufacturer), suggests that the amount of data storage one can fit on a given area of a magnetic medium also doubles roughly every 18 months. What is little understood about these “laws” is that doubling in a fixed amount of time creates an accelerating trend that starts off slow but eventually reaches a tipping point of massive growth. Between 2005 and the present we leaped from reasonably affordable disk drives that could comfortably hold all of your family photos to online “cloud” storage, available completely free to anyone with a computer and an Internet connection, that is sufficient to digitize a whole floor of library books. Today, for less than the price of a nice meal at a fancy restaurant, one can buy a hard disk drive that has sufficient storage to hold the entire printed collection of the Library of Congress.
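
To see why fixed-period doubling feels slow at first and then overwhelming, here is a minimal sketch in Python. It assumes an idealized 18-month doubling period that holds perfectly steady, which real hardware trends do not; the function name `growth_factor` is made up for this illustration.

```python
def growth_factor(years, doubling_period_years=1.5):
    """Return how many times capacity has multiplied after `years`,
    assuming it doubles once every `doubling_period_years` (idealized)."""
    return 2 ** (years / doubling_period_years)


# Print the cumulative multiplier at three-year intervals.
for years in (3, 6, 9, 12, 15):
    print(f"after {years:2d} years: about {growth_factor(years):,.0f}x")
```

Under that assumption, the first three years bring only a fourfold gain, while fifteen years bring a gain of roughly a thousandfold. The same rule produces both the slow start and the later explosion.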

Today’s Data Problems need Generalists and Specialists

Thanks to this accelerating trend, hospitals, schools, manufacturers, colleges, retailers, government agencies, and libraries have begun to collect and store truly enormous amounts of data. The goal in many cases is to make use of these data to provide valuable new services or to improve efficiency. The problem with reaching these goals is that as the amount of storage and processing has grown, the complexity of the data and the challenges of working with them have also accelerated. In the good old days a programmer would write a program, a user would use the program, a statistician would analyze the data that the user produced with the program, and a librarian would archive the report that the statistician created by analyzing the data. Those days are gone. The reason we now see lots of job advertisements for “data scientist” is that there is a pressing need for interdisciplinary bridge builders who understand all of the above: the Internet, databases, analytics, visualization, and data curation. These professionals have their specialties – some are good at working with numbers, others are database experts, still others have expertise in unstructured data (e.g., text) – but they also need generalist skills that let them blend the wide range of methods needed to manage today’s data problems.

Where does the New Librarian fit in?

Librarians have always been great at information management and organization. This is a core skill in data science; it manifests most strongly in the data curation component of the big data problem. Many librarians are also outstanding communicators and have been trained in the art and science of transforming user information needs into strategies and resources for investigation and learning. So librarians clearly have roles at the start and the finish of the big data problem. But what about the middle of the equation, where data transformation, analysis, and visualization are the heart of the data science endeavor? This brings us back to our original question of how library science and data science are connected.

The essential task of the data science professional is to transform raw, messy data into actionable knowledge that can be used by decision makers. To paraphrase my astute colleague R. David Lankes, ‘the mission of librarianship is to facilitate knowledge creation in communities.’ It is easy to see the overlap here. A librarian does not need to become a programmer, but every librarian interested in knowledge creation should have some essential familiarity with how various software tools can transform data. A librarian need not be a database engineer, but every librarian must understand the underpinnings of information retrieval tools. A librarian does not need to be a statistician, but every librarian should have a clear understanding of how descriptive summaries and basic tests of numeric data can be used and misused. Finally, a librarian does not need to be a graphic designer, but every librarian needs to recognize the features of effective data displays. In short, to fulfill their missions, librarians can exercise a range of sophisticated skills that squarely occupy the central ground between understanding information user needs on one end and data curation on the other.

When you consider some of the key values that drive librarianship, however, it becomes evident that librarians must take a leading role in working with big data lest this emerging specialty become the servant only of proprietary interests. Librarians stand for open access to information, for privacy rights, for serving the information needs of the community, for the importance of accurate information in a democratic society, and for the necessity of preserving the legacy of historical information for future generations. Public library users, students in school libraries, and faculty and students in university libraries all depend upon these bedrock values to support their missions of learning, exploration, and citizenship. We’ve known for quite a while that fulfilling these missions requires much more than choosing, shelving, and lending books. In the near future, the ability to fulfill the roles of citizenship will require finding, joining, examining, analyzing, and understanding diverse sources of data. For a citizen to become an effective advocate tomorrow, she might need to “mash-up” map data, census data, health data, and environmental data to develop a meaningful understanding of a challenge that the community faces. Who but a librarian will stand ready to give the assistance needed, to make the resources accessible, and to provide a venue for knowledge creation when the community advocate arrives seeking answers?

Information v. Data

We frequently hear the word “information” paired up with other words to describe our world – the information age, the information industry, the information society. In this light, data science almost seems like a step backwards from the place where most librarians get their professional education: in graduate programs of library and information science. The excitement and burgeoning interest in data science, however, arises from a recognition that data are the raw ingredients of knowledge, and that we urgently need more professionals who possess a deep understanding of how to transform, analyze, and present data to facilitate knowledge creation. Librarians are poised to become the core of future cadres of data scientists, but doing so will require filling in that middle ground in data education where too few librarians have gone so far. Doing so will require an additional educational commitment, and quite possibly less attention to certain traditional topics. The tradeoff will be worthwhile, as data science holds enormous potential as a focus area in the future of librarianship.

What are your thoughts on librarians as potential data scientists? Share in the comments.


Jeffrey M. Stanton

Jeffrey M. Stanton, Ph.D. is Interim Dean and Professor at Syracuse University’s School of Information Studies. Dr. Stanton’s research focuses on organizational behavior and technology. He is co-author of The Visible Employee: Using Workplace Monitoring and Surveillance to Protect Information Assets – Without Compromising Employee Privacy or Trust and Information Nation: Educating the Next Generation of Information Professionals. Dr. Stanton is also the author of an open source electronic textbook entitled Introduction to Data Science, which is published through the iTunes Bookstore.

  • Sally Goodenough

    Great post, will go and read the book now. But to the statement “we urgently need more professionals who possess a deep understanding of how to transform, analyze, and present data” I would also like to see added “design, define and collect”. Without designing the data structure and defining its components and data values (e.g. using agreed standard vocabularies and/or master/reference data) the transforming, analyzing and presenting is not going to work well.

  • Steve

    It seems the data science field is highly technical. Where does a librarian (even one with a relatively high degree of technical expertise compared with the rest of the profession) fit in? Sure, librarians deal with data, but data science is a different ballgame. I pose this question because I have an Information Science degree myself. I’m trying to expand my horizons and I see a lot of opportunities in this field. I don’t know if it is beyond my reach, however. (I won’t be paying for any more education.)

    • Guest

      @Steve: I am in agreement with you. I’m currently a second-year MLS graduate student in the *only* data science class offered in the MIM or MLS program. I have been reading so much about “Big Data” but I don’t think most library schools, including my own, are equipped to prepare students for the data job market since competition will be steep. It’s difficult to be an expert Jack-of-all-trades unless, I suppose, you’re a prodigy with a trust fund.

      • Jennifer Williams

        I also agree, and I’d love to hear what the author has to say about this. I am the “tech-savvy” librarian at my institution, but looking at the skills required for data science, I know I’d need additional schooling, which I cannot afford thanks to the extensive loans I had to take out to get the MLS in the first place. I’d love to transition into this field, but how?
