Susan Gibbons Interview

Bio: Gibbons is vice provost and dean of the River Campus Libraries at the University of Rochester in Rochester, New York. Under her guidance, the Libraries have studied the work practices of faculty, researchers and students in order to improve the libraries’ services, facilities and digital presence. In 2007 The Chronicle of Higher Education recognized Gibbons, who has been highly praised for her pioneering work in the field of library administration, as an “up and coming” librarian.

Have you had specific experiences with data sharing, and if so, when did you begin your data sharing effort?

Data sharing is not nearly as systematic at our University as would be ideal. Even the answer to the question “what is data?” varies quite significantly across the many disciplines that are represented at our University. In some cases, it is data that has been purchased (such as data sets used by the political science faculty), and those faculty are seeking a place to store the data and control access to it (e.g., it may only be licensed for use by University faculty). In other cases, we are talking about data sets that need to be “publicly available” per the requirements of a funding agency.

When the NIH added a data-sharing clause into their grants of over $500K in October of 2003, we, in the University Libraries, scrambled to prepare for what we thought was going to be a flood of data sets seeking a home in an institutional repository.  In reality, the data sharing strategies are much more disparate than we had imagined, so the flood of data sets is yet to come.

What type of data have you shared; what sort of work load and costs did the data sharing impose on you and your colleagues?

There is no systematic data sharing program; it is distributed across the institution with very limited interactions with the Libraries.  Consequently, it is not possible to put a cost onto it.

Was your data sharing effort successful? Please be as specific as possible about any benefits derived from data sharing.

Although our University was one of the earliest to provide an institutional repository for, among other things, the long-term storage and distribution of data, our repository has had very limited use as a data repository.  The repository is being used to store and distribute, for example, dissertations, white papers, technical reports, images, scans of public domain music scores from the Sibley Music Library’s collection, but very little on the data set sharing front.

What problems/hurdles have you encountered personally or what problems/hurdles have you observed generally in the scientific realm?

In talking to faculty and researchers about data sharing, it is clear that there are a lot of barriers. First, there is the concept of “this is my sandbox and I don’t want anyone else playing in it yet.” This reflects a natural desire to first make sure you have exhausted the potential uses for a data set before sharing it with others. A second barrier we heard was a wish to know who would be using a data set and how; some researchers much preferred a method wherein another researcher would have to ask and explain how he/she would use a data set before gaining access. A third barrier is a widely differing opinion about the expected duration of the useful life of the data. Are we trying to preserve and make available this data for 1 year? 5 years? Forever? The different answers require very different strategies.

How did you tackle those hurdles?

Personally, I think that clear guidelines and mandates from funding agencies are an effective way to ensure some level of compliance. For example, I personally think the NIH needs to be much more explicit about its expectations for data sharing. If the policy were much more specific about when data needed to be shared, for how long, and what rights, if any, the creators of the data retain, then more formalized data sharing strategies might start to emerge at the institutional, regional, or disciplinary levels.

From the researchers’ perspective, I think smaller repositories for specialized areas are much more compelling and appealing than institutional repositories. A data repository could be one locus for the community of researchers in a given subfield to gather and form around.  However, as the funding struggles for arXiv (the preprint archive for physics which is physically housed at Cornell) are demonstrating, there isn’t a clear funding model for small (relatively speaking) distributed repositories.  I think a clearer funding model can be created out of a national data sharing repository, but then I think the buy-in from the user community would be less.

What agency should run it/them and why? If you advocate a system of smaller repositories organized around specific disciplines, should there be an overview agency that coordinates/supervises interactions among databases? If so, describe the agency’s proposed function.

I honestly do not know the federal agencies well enough to be able to point to one or two that I think would do a good job at running data repositories. The best model I can conceive would have the main agencies of the government (NIH, NSF, DOE, DOD, Dept of Education, etc.) identify, “bless,” and help fund existing repositories for their fields. PubMed Central and the Department of Education’s ERIC (Education Resources Information Center) system are good examples of repositories that are firmly established within the given disciplines. Presently, those repositories are largely for the archiving of articles, white papers, and other finished products. However, it seems to me that the raw data that supported the conclusions and findings of an article should be bundled and stored with the article.

Ideally, we could get to a system wherein when you access an article you are also accessing the raw data, lab notes, images, and other artifacts that the research has spawned. In the absence of obvious existing subject repositories, the funding agencies should partner with the main professional societies to help create and fund repositories. Without the involvement of the professional societies, I think the repositories will struggle to establish legitimacy in the discipline.

What are your suggestions on the most powerful ways to combat researchers’ resistance to data sharing? How effective do you see the requirements now in place by the NIH and high-profile journals requiring data sharing as a condition of funding? Do you believe more teeth should be put into these requirements and, if so, how?

From my limited view, I think the open access mandate for the NIH is far more successful than the data sharing conditions. The NIH Public Access Policy mandates that scientists submit journal articles that arise from NIH funding into PubMed Central so that they can be made publicly accessible no later than 12 months after publication. When the NIH Public Access Policy only “strongly encouraged” deposits into PubMed Central, the deposit rate was lower than 5%. With the more recent NIH Public Access Policy mandate, submissions have increased dramatically (http://www.nihms.nih.gov/stats/index.shtml). I think a similarly specific and clear mandate is needed if data sharing is going to occur as well.

With stronger data sharing mandates and penalties (e.g., failure to comply puts a researcher’s entire institution in jeopardy for future funding from the agency), the institutions would likely have to put together structures to help their researchers with compliance. Most researchers neither have the time nor currently prioritize data sharing enough to make the deposits themselves, so you need to create incentives for the institutions to create the necessary support structures to make this happen for the researchers.