Editor’s Note: Although the article-access issues discussed by Harvard professor Stuart Shieber are not directly related to the sharing of raw medical data, the sort of campaign he describes has implications for data sharing in all realms of science.

What motivated the Harvard open-access policy that you won unanimous adoption of in February 2008?

Much of the work we have been doing concerns sharing the write-ups of research once that research is ready to be synthesized into a written document.

Typically, that information has been distributed through articles in scholarly journals. The vast majority of fields have used this avenue of publication for decades. But there is a systemic market failure in scholarly publishing, and I think it’s well understood why that failure occurred. The standard business model for journals is that a publisher receives exclusive rights, typically through a transfer of copyright. The publisher can then monetize those rights by selling access to subscribers. That is how it has worked for decades. The symptoms of market failure include inelasticity of demand and persistent hyperinflation of subscription prices. Another symptom is huge price disparities: the subscription price per page for commercial journals is about six times the price per page for non-profit journals.

Looking at the cost per citation of an article shows the difference even more starkly. The cost per citation for commercial journals is 16 times that of non-profit journals. Price disparities like these can persist only in a dysfunctional market.

There are two main causes of the underlying market failure. First, the goods being sold are access to articles, and access to a given article is a monopolistic good: it can be obtained only from the publisher that holds the rights, so publishers are able to extract monopoly rents. The second cause is what economists call a ‘moral hazard’: a situation in which a consumer is insulated from paying the true cost of a good they are consuming.

Why is there this moral hazard?

The consumers of the goods in this case are the researchers reading the articles. But it is the libraries of the world’s academic institutions that pay for the subscriptions, not the researchers. The consumers, the researchers, are completely oblivious to the costs being charged.

Journal subscription prices have been rising at several times the rate of inflation for decades. When prices rise like this year after year, they grow exponentially. But library budgets aren’t growing commensurately. In fact, in the last few years the economic downturn has had a terrible effect on institutional endowments, so library budgets are hurting. Libraries are cutting back substantially on purchases of the journals researchers want to read.

Something has to give. The first things to go may be monograph collecting and the print versions of journals. Eventually libraries have to cut journal subscriptions, which means the researchers at those institutions can’t read the articles.

So what did you do about it?

We worked behind the scenes for several years, and in February 2008 we won unanimous support for an open access policy from the Faculty of Arts and Sciences at Harvard. Five other faculties within Harvard have followed suit, as have some other universities, including MIT and Duke.

The Harvard policy was meant precisely to address a symptom of this problem: the reduced access that results when academic libraries cancel their subscriptions to journals that researchers want to read through their institutions. The whole point of writing articles in academia is so that others can read them. Researchers at Harvard and other institutions were writing articles, and fewer and fewer people could read them as academic libraries canceled their subscriptions.

What does the policy state?

The policy has three parts. First, the faculty member grants Harvard University permission to distribute his or her articles for free. Second, we wanted to preserve each faculty member’s free choice about whether rights are retained in this way. If for any reason a faculty member doesn’t want the university to have this license for a given article, he or she can direct that a waiver be granted, and the university then has no permission to distribute that article for free. There is no vetting of the reason: the decision is entirely up to the faculty member. Together, these two parts mean that by default the university is able to distribute an article unless the faculty member opts out and expressly says he or she doesn’t want the university to have this license.

Before the policy was adopted, faculty members who did nothing had no rights to distribute their articles beyond what the publisher permitted, unless they negotiated a special arrangement with the publisher. I often negotiated with publishers, but very few other faculty members did. Under the 2008 policy, the default is the other way around: the university retains the right to distribute an article unless the faculty member says not to. We have moved from an opt-in system to an opt-out system.

Experience in many areas has shown that moving from opt-in to opt-out increases participation dramatically. We found that here too. Before, the amount of opting in was approximately zero. Now it turns out that the opt-out option is exercised very rarely. In Harvard’s Faculty of Arts and Sciences, the waiver rate appears to be about five percent, so for 95 percent of articles Harvard is able to distribute the article wherever it chooses. At MIT, where the whole university adopted the policy at once, I hear the waiver rate is about 1.5 percent.

The retained rights are transferable, which means they can be, and are, transferred back to the author. So authors retain rights. They can distribute their articles, assign them as reading in classes, and so forth, as long as they don’t sell the articles, all without having to ask the publisher’s permission.

The third part of the policy is that faculty commit to providing copies of their articles to a repository called DASH (Digital Access to Scholarship at Harvard). The articles are distributed from this repository, and in general we have broad rights to distribute them on the web to anyone in the world. For articles that don’t fall under the open-access policy, because there is a waiver or because they predate the policy, we abide by the distribution rights the publisher provides. If the publisher requires an embargo of, say, six months or a year, we put the article in the repository, and when the embargo period runs out, the system automatically makes the article available.

What does the DASH repository hold now?

There are more than 4,000 articles in the DASH repository today, all authored by members of the Harvard community. We are distributing them to anyone who wants to read them. The repository is an effort to supplement the access that some people get through journal subscriptions and that many researchers lack.

Does the repository replace journals?

No, the repository provides supplemental access. It doesn’t replace journals and isn’t intended to. We still need journals. They provide important services: management of the peer review process, production services, imprimatur.

The open access policy has a positive effect for our authors whether other universities and colleges establish similar policies or not. More people can read articles Harvard authors are writing.

What is the point of these open access policies?

The open access policies are intended to address a symptom of the underlying problem: the reduction in access to scholarly articles. But they don’t address the underlying problem itself, which is the market failure in the prevailing business model for journals.

Trying to address this market failure is a much harder nut to crack, because now we are looking at the business models publishers use to underwrite their journals. We need to find some alternative business model that doesn’t have this kind of market failure. We need publishers to move to this new business model – but we recognize that it may not be possible to make this shift happen.

What can be done to promote this shift in business model?

We are trying to make viable a business model for open-access journals. Open-access journals are like traditional journals except they don’t charge for online access to the articles they publish.

Of course, it costs money to run a journal, and if you don’t charge for online access, how do you get revenue? One open-access model is to charge a publication fee on the author side of the equation rather than on the reader side. Under the subscription model, university libraries pay on behalf of readers. Under a publication-fee model, universities and funding agencies would pay on behalf of authors. Without that support, open-access journals face a serious problem of financial sustainability: they are at a disadvantage when competing with traditional journals.

If we are going to put the open access model on a level playing field with subscription journals, we have to underwrite publication fees for open access journals just as we now underwrite subscription fees for subscription journals. We established the Compact for Open-Access Publishing Equity (COPE) to codify this kind of commitment to support open-access journals that charge publication fees. Signatory universities commit to the timely establishment of durable mechanisms for underwriting reasonable publication fees for open access journals. (Check out the website at oacompact.org.)

The idea of the compact is not that each university benefits by acting on its own. It works only if more or less everyone participates. You need many universities and funding agencies to pay their fair share. If every institution does its part, publishers may be freed up to entertain this new revenue model. If few universities sign on, the compact won’t provide much impetus for publishers, but it won’t cost much either.

What are your thoughts on data sharing generally?

Data sharing is especially important because without access to the raw data, you can’t do the kind of verification and replication you’d want to do. That is a huge benefit of open data.

What makes data sharing harder than the article sharing I have been talking about is that, for articles, researchers have no incentive to limit access. In general, researchers want the widest possible access to their articles. But for data there are counterincentives. Researchers may want to exploit the data they’ve painstakingly collected, and it’s hard to tell them to make it public immediately and put themselves in competition with everyone else who wants to mine that data.

It is hard to make that decision for someone else, and I think a lot of the resistance to open data comes down to this problem: how do you make data as open as possible while remaining cognizant of the researchers’ wishes? A funding agency can say, “If you take our money, you publish your data according to our policy. If you don’t like that policy, you don’t have to take our money.” So funding agencies have a much larger role to play in open data than universities do. It seems to me that funding agencies need to take the lead on this issue. The NIH has done this for articles, saying that if you accept its money, you have to put your articles in PubMed Central within a year. Journals could do the same.

Stuart Shieber Bio:

Professor Stuart Shieber directs the Office for Scholarly Communication at Harvard University — a position he has held since the office was created in 2008. He is also a professor of computer science in Harvard’s School of Engineering and Applied Sciences. As a strong open-access advocate, he has led a multi-year effort to shape Harvard’s policies in this arena. The Harvard open-access policy, adopted unanimously by Harvard’s Faculty of Arts and Sciences in February 2008, states that “each faculty member grants to the President and Fellows of Harvard College permission to make available his or her scholarly articles and to exercise the copyright in those articles.” Since then, faculties from five other schools within Harvard have adopted open-access policies modeled on Shieber’s original policy, and other universities have done so as well.

Shieber received his B.A. in applied mathematics from Harvard in 1981 and his Ph.D. in computer science from Stanford University in 1989. The focus of his study is computational linguistics, the study of human languages from the perspective of computer science.

Working across many academic realms, from linguistics to computer systems and theoretical computer science, Shieber studies the way natural languages are structured to allow for efficient communication. He also studies problems presented by automated graphic design and is working on the development of a more graphically articulate computer.

In 1989 Shieber joined the Harvard Faculty of Arts and Sciences. He is also the founding director of the Center for Research on Computation and Society and a faculty co-director of the Berkman Center for Internet and Society at Harvard University.