<?xml version="1.0" encoding="UTF-8"?>
<rss version="2.0"
	xmlns:content="http://purl.org/rss/1.0/modules/content/"
	xmlns:wfw="http://wellformedweb.org/CommentAPI/"
	xmlns:dc="http://purl.org/dc/elements/1.1/"
	xmlns:atom="http://www.w3.org/2005/Atom"
	xmlns:sy="http://purl.org/rss/1.0/modules/syndication/"
	xmlns:slash="http://purl.org/rss/1.0/modules/slash/"
	>

<channel>
	<title>Scientific Data Sharing Project</title>
	<atom:link href="http://scientificdatasharing.com/feed/" rel="self" type="application/rss+xml" />
	<link>http://scientificdatasharing.com</link>
	<description>Sharing data to help humanity.</description>
	<lastBuildDate>Tue, 27 Sep 2011 00:06:19 +0000</lastBuildDate>
	<language>en</language>
	<sy:updatePeriod>hourly</sy:updatePeriod>
	<sy:updateFrequency>1</sy:updateFrequency>
	<generator>http://wordpress.org/?v=3.1</generator>
		<item>
		<title>DataCite Summer Meeting highlights issues and advancements within data sharing communities</title>
		<link>http://scientificdatasharing.com/general/datacite-summer-meeting-highlights-issues-and-advancements-within-data-sharing-communities/</link>
		<comments>http://scientificdatasharing.com/general/datacite-summer-meeting-highlights-issues-and-advancements-within-data-sharing-communities/#comments</comments>
		<pubDate>Tue, 20 Sep 2011 23:01:08 +0000</pubDate>
		<dc:creator>angela</dc:creator>
				<category><![CDATA[Events]]></category>
		<category><![CDATA[Featured]]></category>
		<category><![CDATA[General]]></category>
		<category><![CDATA[News]]></category>

		<guid isPermaLink="false">http://scientificdatasharing.com/?p=1206</guid>
		<description><![CDATA[Late this August, California became a haven for proponents of data-sharing as the California Digital Library played host to the annual DataCite meeting in Berkeley. DataCite is a non-profit organization which aims to promote the sharing and re-use of research data by helping to provide tools to support a global infrastructure for data archiving, access, and citation. DataCite is composed [...]]]></description>
			<content:encoded><![CDATA[<p><a href="http://scientificdatasharing.com/wp-content/uploads/2011/09/DataCite.png"><img src="http://scientificdatasharing.com/wp-content/uploads/2011/09/DataCite.png" alt="" title="DataCite" width="132" height="150" class="alignnone size-full wp-image-1209" /></a><br />
Late this August, California became a haven for proponents of data sharing as the California Digital Library hosted the annual DataCite meeting in Berkeley. DataCite is a non-profit organization that aims to promote the sharing and re-use of research data by helping to provide tools that support a global infrastructure for data archiving, access, and citation. DataCite is composed of an international consortium of libraries, data centers, and research institutions.</p>
<p>The keynote speaker, John Wilbanks (<a href="http://sciencecommons.org/about/">Science Commons</a>), kicked off the meeting by describing the new concept of data in the information age and highlighting the types of tools and infrastructure needed to advance this concept, which could lead to enormous growth in knowledge-building capabilities.</p>
<p><a href="http://www.researchremix.org/wordpress/">Heather Piwowar</a> described some of her research providing quantifiable measures of the benefits of sharing data, including <a href="http://www.ncbi.nlm.nih.gov/pubmed/17375194">increased visibility of the data generator&#8217;s work (via increased citation rate)</a> and <a href="http://www.ncbi.nlm.nih.gov/pubmed/21593852">favorable returns on investments in data sharing</a>.</p>
<p>Even representatives from publishing outlets are getting in on the data-sharing fervor. Hylke Koers of Elsevier spoke about the company&#8217;s effort to develop an &#8216;<a href="http://www.articleofthefuture.com/">Article of the Future</a>&#8217;, which would include links to associated shared applications and data repository materials alongside the familiar standard journal article.</p>
<p>Many other wonderful speakers described specific projects aimed at supporting a culture of data sharing, and you can learn more about this informative meeting by visiting the <a href="http://datacite.org">DataCite website</a>.</p>
]]></content:encoded>
			<wfw:commentRss>http://scientificdatasharing.com/general/datacite-summer-meeting-highlights-issues-and-advancements-within-data-sharing-communities/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
		<item>
		<title>Not by Metadata Alone: The Use of Diverse Forms of Knowledge to Locate Data for Reuse</title>
		<link>http://scientificdatasharing.com/general/not-by-metadata-alone-the-use-of-diverse-forms-of-knowledge-to-locate-data-for-reuse/</link>
		<comments>http://scientificdatasharing.com/general/not-by-metadata-alone-the-use-of-diverse-forms-of-knowledge-to-locate-data-for-reuse/#comments</comments>
		<pubDate>Sun, 13 Feb 2011 20:46:21 +0000</pubDate>
		<dc:creator>Editor</dc:creator>
				<category><![CDATA[General]]></category>

		<guid isPermaLink="false">http://scientificdatasharing.com/?p=1190</guid>
		<description><![CDATA[An important set of challenges for eScience initiatives and digital libraries concern the need to provide scientists with the ability to access data from multiple sources. This paper argues that an analysis of scientists' reuse of data prior to the advent of eScience can illuminate the requirements and design of digital libraries and cyberinfrastructure. ]]></description>
			<content:encoded><![CDATA[<p><a href="http://scientificdatasharing.com/wp-content/uploads/2011/02/zimmerman_ann.jpg"><img class="alignright size-full wp-image-1187" title="zimmerman_ann" src="http://scientificdatasharing.com/wp-content/uploads/2011/02/zimmerman_ann.jpg" alt="" width="140" height="140" /></a>Ann Zimmerman<br />
School of Information<br />
University of Michigan<br />
105 S. State Street<br />
3438 North Quad<br />
Ann Arbor, MI 48109-1285<br />
USA</p>
<p>email: asz@umich.edu</p>
<p>An important set of challenges for eScience initiatives and digital libraries concern the need to provide scientists with the ability to access data from multiple sources. This paper argues that an analysis of scientists&#8217; reuse of data prior to the advent of eScience can illuminate the requirements and design of digital libraries and cyberinfrastructure. As part of a larger study on data sharing and reuse, I investigated the processes by which ecologists locate data that were initially collected by others. Ecological data are unusually complex and present daunting problems of interpretation and analysis that must be considered in the design of cyberinfrastructure. The ecologists that I interviewed found ways to overcome many of these difficulties. One part of my results shows that ecologists use formal and informal knowledge that they have gained through disciplinary training and through their own data-gathering experiences to help them overcome hurdles related to finding, acquiring, and validating data collected by others. A second part of my findings reveals that ecologists rely on formal notions of scientific practice that emphasize objectivity to justify the methods they use to collect data for reuse. I discuss the implications of these findings for digital libraries and eScience initiatives.</p>
<p>Keywords: Data reuse · Data sharing · Ecology</p>
<p><a href="http://scientificdatasharing.com/wp-content/uploads/2011/02/Zimmerman_Not-by-metadata-alone_2007.pdf" target="_blank">full article</a></p>
]]></content:encoded>
			<wfw:commentRss>http://scientificdatasharing.com/general/not-by-metadata-alone-the-use-of-diverse-forms-of-knowledge-to-locate-data-for-reuse/feed/</wfw:commentRss>
		<slash:comments>1</slash:comments>
		</item>
		<item>
		<title>New Knowledge from Old Data: The Role of Standards in the Sharing and Reuse of Ecological Data</title>
		<link>http://scientificdatasharing.com/general/new-knowledge-from-old-data-the-role-of-standards-in-the-sharing-and-reuse-of-ecological-data/</link>
		<comments>http://scientificdatasharing.com/general/new-knowledge-from-old-data-the-role-of-standards-in-the-sharing-and-reuse-of-ecological-data/#comments</comments>
		<pubDate>Sun, 13 Feb 2011 19:49:56 +0000</pubDate>
		<dc:creator>Editor</dc:creator>
				<category><![CDATA[General]]></category>

		<guid isPermaLink="false">http://scientificdatasharing.com/?p=1176</guid>
		<description><![CDATA[In this paper, I analyze the experiences of ecologists who used data they did not collect themselves.  Specifically, I examine the processes by which ecologists understand and assess the quality of the data they reuse, and I investigate the role that standard methods of data collection play in these processes.]]></description>
			<content:encoded><![CDATA[<p><a href="http://scientificdatasharing.com/wp-content/uploads/2011/02/zimmerman_ann.jpg"><img class="alignright size-full wp-image-1187" title="zimmerman_ann" src="http://scientificdatasharing.com/wp-content/uploads/2011/02/zimmerman_ann.jpg" alt="" width="140" height="140" /></a><br />
Ann Zimmerman<br />
School of Information<br />
University of Michigan<br />
105 S. State Street<br />
3438 North Quad<br />
Ann Arbor, MI 48109-1285</p>
<p>email: asz@umich.edu</p>
<p>In this paper, I analyze the experiences of ecologists who used data they did not collect themselves. Specifically, I examine the processes by which ecologists understand and assess the quality of the data they reuse, and I investigate the role that standard methods of data collection play in these processes. Standardization is one means by which scientific knowledge is transported from local to public spheres. While standards can be helpful, my results show that knowledge of the local context is critical to ecologists&#8217; reuse of data. Yet, this information is often left behind as data move from the private to the public world. The knowledge that ecologists acquire through fieldwork enables them to recover the local details that are so critical to their comprehension of data collected by others. Social processes also play a role in ecologists&#8217; efforts to judge the quality of data they reuse.</p>
<p>Keywords: data sharing; data reuse; ecology; objectivity; standardization</p>
<p><a href="http://scientificdatasharing.com/wp-content/uploads/2011/02/Zimmerman_New-knowledge-from-old-data_2008.pdf" target="_blank">full article</a></p>
]]></content:encoded>
			<wfw:commentRss>http://scientificdatasharing.com/general/new-knowledge-from-old-data-the-role-of-standards-in-the-sharing-and-reuse-of-ecological-data/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
		<item>
		<title>Interview with Susanna-Assunta Sansone</title>
		<link>http://scientificdatasharing.com/general/interview-with-susanna-assunta-sansone/</link>
		<comments>http://scientificdatasharing.com/general/interview-with-susanna-assunta-sansone/#comments</comments>
		<pubDate>Fri, 04 Feb 2011 19:20:45 +0000</pubDate>
		<dc:creator>Editor</dc:creator>
				<category><![CDATA[General]]></category>
		<category><![CDATA[Interviews]]></category>

		<guid isPermaLink="false">http://scientificdatasharing.com/?p=1161</guid>
		<description><![CDATA[Susanna-Assunta Sansone is a Team Leader at the University of Oxford e-Research Centre, UK. There her work is focused on standards and software development to facilitate the data annotation, sharing and meta-analysis of biological, biomedical and environmental studies. She is the co-founder of MIBBI and the BioSharing initiatives.]]></description>
			<content:encoded><![CDATA[<p><strong> </strong></p>
<div id="_mcePaste">Susanna-Assunta Sansone is a team leader at the University of Oxford e-Research Centre. There her work is focused on ontology, standards and software development.</div>
<p>Before her work at Oxford, she worked as a coordinator of international collaborative projects at the European Bioinformatics Institute in Cambridge.</p>
<p>In addition, she is the co-founder of the Minimum Information for Biological and Biomedical Investigations (<a href="http://mibbi.org" target="_blank">MIBBI</a>) and the <a href="http://biosharing.org">BioSharing</a> initiatives. She received her PhD in Molecular Biology from Imperial College of Science, Technology and Medicine in London.</p>
<p><strong>Have you had specific experiences with data annotation and sharing, and if so, what is your experience?</strong></p>
<p>When I was a ‘wet bench experimentalist’, my data were low in volume and shared mainly by email or on disk, as text, images or in some machine-specific, proprietary format. With the rise of high-throughput experiments in the genetics, genomics and functional genomics domains, I moved into bioinformatics and developed significant experience in standardization for the purpose of enabling data reporting and sharing. An increasing variety of ‘standard’ minimal information checklists, terminologies and exchange formats are being developed by international grassroots communities, such as the <a href="http://gensc.org/">Genomic Standards Consortium (GSC)</a>, to enable the description of biological, biomedical and environmental studies in an unambiguous manner. If annotated in a standard manner, these studies will be comprehensible and (in principle) can be reproduced &#8212; a principle supported by the rising number of data sharing policies developed by funding agencies and large consortia.</p>
<p>With my team and international collaborators, I contribute to the development of some of these standards, and together we build software to empower researchers to take up these community-defined standards.</p>
<p><strong>What type of data have your collaborators shared; what sort of workload and costs did the data annotation and sharing impose on them?</strong></p>
<p>I collaborate with a variety of communities working in biological, biomedical and environmental domains. Their studies often run source material through several kinds of assays in parallel, such as genomic sequencing, protein-protein interaction assays, or the measurement of metabolite concentrations and fluxes. However, these studies are often shared only internally, or within a consortium or a set of close collaborators; in general, only a subset of the studies is released into the public domain, mainly upon publication.</p>
<p>When these studies are shared, the main workload is the annotation, or reporting, phase. Data must be shared together with enough contextual information (i.e., metadata: sample characteristics, technology and measurement types, instrument parameters and sample-to-data relationships) to make the resulting data comprehensible and reusable, and standards should be used to harmonize the description. Accomplishing this, however, takes time and expertise, which in many cases researchers lack or are not paid to provide. Standards are just ‘a means to an end’; we need to develop easy-to-use tools to educate and empower researchers to perform basic curation tasks, by enabling them to access the emerging portfolio of community-defined standards to annotate their data in a timely and effective manner.</p>
<p><strong>What problems/hurdles have you encountered personally in data annotation and sharing or what problems/hurdles have you observed generally in the scientific realm?</strong></p>
<p>In addition to ethical and security issues and the concern that others will exploit the data, the barriers to sharing remain significant for three more reasons. First, there is an increasing variety of standards, and the evolving landscape is still quite unstable. Second, there is a lack of easy-to-use tools that enable researchers to access the emerging portfolio of standards. Lastly, shared data are difficult to utilize, and this in turn can only further discourage the will to share. Shared data is of little value if it is not sufficiently well annotated in a standard manner.</p>
<p><strong>How did you tackle those hurdles?</strong></p>
<p>With my team and collaborators, we tackle both standards-related and tools-related hurdles in parallel.</p>
<p>Dr. Dawn Field and I founded BioSharing (<a href="http://biosharing.org/">http://biosharing.org</a>) to expedite communication and the production of an integrated, standards-based framework for the capture and sharing of high-throughput genomics and functional genomics bioscience data in particular. This project stems from i) the initial work published in Science in collaboration with a range of representatives from US, UK and European funding agencies (<a href="http://www.ncbi.nlm.nih.gov/pubmed/19815759" target="_blank">Field, Sansone et al. 2009</a>) and ii) the MIBBI project (<a href="http://www.nature.com/nbt/journal/v26/n8/pdf/nbt.1411.pdf">Taylor, Field, Sansone, 2008</a>), which we established with Chris Taylor in 2006. BioSharing works at the global level to build stable linkages, in particular between journals and funders implementing data sharing policies and well-constituted standardization efforts in the biosciences domain. This objective is achieved via the creation of web-based catalogues of policies and standards (minimal information checklists, terminologies and exchange formats) and a communication forum. In this first phase we are working on prototypes of the catalogues, which will be enriched and enhanced iteratively. As these become increasingly stable, we will move into the next phase: promoting and coordinating interactions among what might otherwise become an increasing variety of non-interoperable standards. The BioSharing catalogues aim at:</p>
<ul>
<li>Providing a “one-stop shop” for those seeking data sharing policy documents and information about the standards and technologies that support them;</li>
<li>Exposing core information on well-constituted, community-driven standardization efforts and linking to their reporting standards;</li>
<li>Linking to existing complementary portals, such as MIBBI (<a href="http://mibbi.org/">http://mibbi.org</a>) and <a href="http://bioportal.bioontology.org/">BioPortal</a>, but also to open access resources, such as <a href="http://www.biomedcentral.com/">BMC Research Notes</a> and <a href="http://precedings.nature.com/">Nature Precedings</a>, with documents or publications on standards, as well as to standards-compliant systems and research data.</li>
</ul>
<p>With my team, we also work on the ISA software suite (<a href="http://isa-tools.org/">http://isa-tools.org</a>; <a href="http://bioinformatics.oxfordjournals.org/content/26/18/2354.full.pdf+html">Rocca-Serra et al., 2010</a>), an open source effort, carried out in collaboration with many international groups, that helps researchers annotate and share their data. The tools are targeted at curators and experimentalists and:</p>
<ul>
<li>assist in the reporting and local management of experimental metadata (i.e. sample characteristics, technology and measurement types, sample-to-data relationships) from studies employing one or a combination of technologies;</li>
<li>empower users to take up community-defined minimum information checklists and terminologies, where required;</li>
<li>format studies for submission to a growing number of international public repositories.</li>
</ul>
<p><strong>Do you see a need for a national data sharing repository or smaller repositories for specialized arenas?</strong></p>
<p>In addition to the main institutes, such as NCBI (http://www.ncbi.nlm.nih.gov/), there are many groups that have strong expertise in a specific area of science and are also skilled at developing specialized systems. Our collaborators, for example, have successfully deployed the ISA software components to enable data reporting and sharing for stem cell data. The Harvard Stem Cell Discovery Engine (SCDE, <a href="http://discovery.hsci.harvard.edu/">http://discovery.hsci.harvard.edu</a>) brings together stem cell-based experimental systems and high-throughput data from the Harvard Stem Cell Institute and other research communities, including data from public repositories, in a common ‘standardized’ manner. Their re-annotation and harmonization work, using the community-defined standards served via ISA tools, is of pivotal importance to researchers working with stem cells in particular, but also to the wider scientific community working on the meta-analysis of related datasets.</p>
<p><strong>Do you see value in a centralized repository of data?</strong></p>
<p>The whole argument of centralized vs. federated databases has been discussed at length; I believe a central system cannot cater for everybody’s needs, and there is expertise in the community that should also be leveraged. So often the best solution is a mixed approach. Obtaining rolling funds to maintain each database is, of course, the main issue; the other is the adoption of widely accepted common standards. If the latter issue were solved, it would be easy to move information from one system to another.</p>
<p>Funding agencies could play a two-fold role here: support &#8211; and progressively enforce &#8211; the use of these community-defined standards in data management and in grant applications, and ensure that applicants evaluate the reuse of open source tools before developing a new system. However, only a few agencies actively monitor adherence to the proposed plans, and even in these cases, the execution of such plans is rarely scored. Unfortunately, developing something new is often a prerequisite of a grant proposal, and it is often easier for a developer to create something de novo to have full control over what can be done. The result is today’s problem: in many cases, an unnecessary duplication of efforts. We have to deal with a variety of (arbitrarily) different and incompatible standards, even in the same domain, which limits the development of interoperable tools to enable data sharing.</p>
<p>From a technical perspective, it will be necessary both to remove redundancies and to fill gaps between standards. These are difficult but not insurmountable tasks. By contrast, the sociological barriers involved in these kinds of large-scale collaborations can be far more challenging, and extensive liaison between communities is necessary. Managing this process of consensus-building from start to finish takes time, resources, and expertise, yet because resources are lacking, little time is often invested in these efforts to build commonalities and synergies among projects. The massively collaborative nature of this undertaking requires frequent face-to-face workshops to create the necessary conditions for building consensus.</p>
<p>Utilizing publicly funded data is a right, but sharing data produced through public funding should be a duty. Scientists will need a combination of incentives and enforcement, or ‘carrot and stick’ as it is often said, but also a lot of help from people like me and my collaborators who work in the service area of science. Data curation and the development and harmonization of standards must be recognized as indispensable means to data sharing and therefore properly funded.</p>
]]></content:encoded>
			<wfw:commentRss>http://scientificdatasharing.com/general/interview-with-susanna-assunta-sansone/feed/</wfw:commentRss>
		<slash:comments>1</slash:comments>
		</item>
		<item>
		<title>Interview with Stuart Shieber</title>
		<link>http://scientificdatasharing.com/general/interview-with-stuart-shieber/</link>
		<comments>http://scientificdatasharing.com/general/interview-with-stuart-shieber/#comments</comments>
		<pubDate>Mon, 31 Jan 2011 20:43:53 +0000</pubDate>
		<dc:creator>Editor</dc:creator>
				<category><![CDATA[Featured]]></category>
		<category><![CDATA[General]]></category>
		<category><![CDATA[Interviews]]></category>

		<guid isPermaLink="false">http://scientificdatasharing.com/?p=1144</guid>
		<description><![CDATA[Professor Stuart Shieber directs the Office for Scholarly Communication at Harvard University. He is also a professor of computer science in Harvard's School of Engineering and Applied Sciences. As a strong open-access advocate, he has led a multi-year effort to shape Harvard's policies in this arena.]]></description>
			<content:encoded><![CDATA[<p><a href="http://scientificdatasharing.com/wp-content/uploads/2011/01/shieber.jpg"><img class="alignright size-full wp-image-1145" title="shieber" src="http://scientificdatasharing.com/wp-content/uploads/2011/01/shieber.jpg" alt="" width="225" height="225" /></a>Editor&#8217;s Note: Though the article access issues discussed by Harvard Professor Shieber are not related directly to the sharing of raw medical data, the sort of campaign he describes has implications for the issue of data sharing in all realms of science.</p>
<p><strong>What was the motivation for the Harvard policy for which you won unanimous adoption in February 2008?</strong></p>
<p>A lot of work we have been doing has to do with sharing the write-ups of research once a piece of research is in a position to be synthesized into a written document.</p>
<p>Typically, that information has been distributed through articles in scholarly journals. The vast majority of fields use this avenue of publication and have done so for decades. But there is a systemic failure in the scholarly publishing market. I think it’s well understood why that failure occurred. The standard business model for journals is that a publisher receives exclusive rights, typically in the form of transfer of copyright. The publisher can then monetize those rights by selling access to subscribers. That is how it has worked for decades now. The symptoms of market failure include inelasticity of demand and a persistent hyperinflation of subscription prices. Another symptom of market failure is huge price disparities: the subscription price per page for commercial journals is about six times the price per page of non-profit journals.</p>
<p>Looking at cost per citation of an article shows the difference even more starkly. The commercial journal cost per citation is 16 times higher than that of non-profit journals. Such price disparities can only persist in a dysfunctional market.</p>
<p>There are two main causes of the underlying market failure: First, the goods that are being sold involve access, and access is a monopolistic good, so the publishers are able to extract monopoly rents. The second reason is what economists call a &#8216;moral hazard&#8217;: a situation in which a consumer is insulated from paying the true cost of a good they are consuming.</p>
<p><strong>Why is there this moral hazard?</strong></p>
<p>The consumers of the goods in this case are researchers accessing the articles. It is the libraries of academic institutions around the world that are paying the subscriptions, not the researchers. The consumers, the researchers, are completely oblivious to the costs being charged.</p>
<p>Subscription prices of journals have been rising at several times the rate of inflation for decades. Increases like these compound year after year, so prices grow exponentially. But libraries&#8217; budgets aren’t growing commensurately. In fact, in the last few years the economic downturn has had a terrible effect on institutional endowments, so library budgets are hurting and libraries are substantially reducing their purchases of journals that researchers want to read.</p>
<p>Something has to give. The first things to go may be purchases of monographs and print versions of journals. Eventually libraries have to cut journal subscriptions, which means the researchers at those institutions can&#8217;t read the articles.</p>
<p><strong>So what did you do about it?</strong></p>
<p>We worked behind the scenes for several years, and in February 2008 we won unanimous support for an open access policy from the Faculty of Arts and Sciences at Harvard. Five other faculties within Harvard have followed suit, as have some other universities, including MIT and Duke.</p>
<p>The point of the Harvard policy was exactly to address the symptom of this problem — the symptom of reduced access that occurs when academic libraries cancel their subscriptions to journals that researchers want access to through their institutions. The whole point of writing articles in academia is so others can read them.  Researchers were writing articles at Harvard and other institutions and fewer and fewer people could read them as academic libraries canceled their subscriptions.</p>
<p><strong>What does the policy state?</strong></p>
<p>The policy has three parts. First, the faculty member grants permission to Harvard University to distribute the articles for free. Second, we wanted to preserve faculty members&#8217; free choice as to whether or not rights are retained in this way: if for any reason a faculty member doesn&#8217;t want the university to have this license, he or she can direct that a waiver be granted for that article, so the university does not have permission to distribute it for free. There is no vetting of the reason — it is completely up to the faculty member to make this decision. These two parts of the policy mean that by default the university is able to distribute the article unless the faculty member opts out and expressly says he or she doesn&#8217;t want the university to have this license.</p>
<p>Before the policy was adopted, if faculty members did nothing, they would have no rights to distribute the article beyond what the publisher permitted (unless the faculty member negotiated a special arrangement with the publisher). I did negotiate with publishers often, but very few other faculty members negotiated such arrangements. Under this 2008 policy, the default is the other way around: by default the university retains the right to distribute an article unless a faculty member says not to do so. We have moved from an opt-in to an opt-out system.</p>
<p>Experience has shown in many areas that when there is a move from opt-in to opt-out, participation increases dramatically. We found that here too. Before, the amount of opting in was approximately zero; now it turns out that the opt-out option is exercised very rarely. In Harvard&#8217;s Faculty of Arts and Sciences, the waiver rate looks to be about five percent, so Harvard is able to distribute about 95 percent of articles wherever it chooses. At MIT the whole university adopted this policy at once, and there I hear the waiver rate is about 1.5 percent.</p>
<p>These rights that are retained are transferable, which means they can be and are transferred back to the author. So authors are retaining rights — they can distribute their articles, assign them as reading in classes, and so forth as long as they don’t sell the articles — all without having to ask permission of the publisher.</p>
<p>The third part of the policy is that faculty commit to providing copies of their articles to a repository called DASH — Digital Access to Scholarship at Harvard. From this repository the articles are distributed, and we have in general broad rights to distribute them to anyone in the world on the web. For articles that don’t fall under the open-access policy (because there is a waiver or they predate the policy) we abide by the distribution rights provided by the publisher. If the publisher requires an embargo of six months or a year, say, we put them in the repository and when the embargo period runs out, the system automatically allows the article to be shared.</p>
<p><strong>What does the DASH repository hold now?</strong></p>
<p>There are more than 4,000 articles in the DASH repository today. We are distributing them to anyone who wants to read them. All are authored by members of the Harvard community. This repository is an effort to supplement the access that some people get through journal subscriptions and that many researchers lack.</p>
<p><strong>Does the repository replace journals?</strong></p>
<p>No, the repository provides supplemental access. It doesn&#8217;t replace journals and isn&#8217;t intended to. We still need journals. They provide important services: management of the peer-review process, production services, and an imprimatur.</p>
<p>The open-access policy benefits our authors whether or not other universities and colleges establish similar policies: more people can read the articles Harvard authors write.</p>
<p><strong>What is the point of these open access policies?</strong></p>
<p>The open-access policies are intended to address a symptom of the underlying problem, namely the reduction in access to scholarly articles. But they don&#8217;t address the underlying problem itself, which is the market failure in the prevailing business model for journals.</p>
<p>Addressing this market failure is a much harder nut to crack, because it means looking at the business models publishers use to underwrite their journals. We need to find an alternative business model that doesn&#8217;t have this kind of market failure, and we need publishers to move to it, but we recognize that it may not be possible to make this shift happen.</p>
<p><strong>What can be done to promote this shift in business model?</strong></p>
<p>We are trying to make viable a business model for open-access journals. Open-access journals are like traditional journals except they don&#8217;t charge for online access to the articles they publish.</p>
<p>Of course, it costs money to run a journal, and if you don&#8217;t charge for online access, how do you get revenue? One open-access model is to charge a publication fee on the author side of the equation rather than on the reader side. Under the subscription model, university libraries pay on behalf of the readers; under a publication-fee model, universities and funding agencies would pay on behalf of the authors. Without that support, open-access journals face a serious problem of financial sustainability: they are at a disadvantage when competing with traditional journals.</p>
<p>If we are going to put the open-access model on a level playing field with subscription journals, we have to underwrite publication fees for open-access journals just as we now underwrite subscription fees for subscription journals. We established the Compact for Open-Access Publishing Equity (COPE) to codify this kind of commitment to supporting open-access journals that charge publication fees. Signatory universities commit to the timely establishment of durable mechanisms for underwriting reasonable publication fees for open-access journals. (Check out the website at <a href="http://www.oacompact.org/">oacompact.org</a>.)</p>
<p>The idea of the compact is not that each university benefits by acting on its own. It only works if more or less everyone participates: you need many universities and funding agencies to pay their fair share. If every institution does its part, publishers may be freed to entertain this new revenue model. If few universities adopt it, it won&#8217;t provide much impetus for publishers, but on the other hand, it won&#8217;t cost much either.</p>
<p><strong>What are your thoughts on data sharing generally?</strong></p>
<p>Data sharing is especially important because without access to the raw data, you can&#8217;t do the kind of verification and replication that you&#8217;d want to do. Open data is a huge boon in that respect.</p>
<p>What makes data sharing harder than the article sharing I have been talking about is that, for articles, researchers have no incentive to limit access. In general, researchers want the widest possible access to their articles. But for data there are counterincentives: researchers may want to exploit the data they&#8217;ve painstakingly collected, and it&#8217;s hard to tell them to make it public immediately and thereby put themselves in competition with everyone else who wants to mine that data.</p>
<p>It is hard to make that decision for someone else, and I think much of the resistance to open data comes from trying to deal with this problem: how do you make data as open as possible while respecting the desires of the researchers? A funding agency can say, &#8220;If you take our money, you publish data according to our policy. If you don&#8217;t like that policy, you don&#8217;t have to take our money.&#8221; So funding agencies have a much larger role to play in open data than universities do. It seems to me that funding agencies need to take the lead on this issue. The NIH has done this for articles, saying that if you accept its money, you have to put your articles in PubMed Central within a year. Journals could do the same.</p>
<p><strong>Stuart Shieber Bio:</strong></p>
<p>Professor Stuart Shieber directs the Office for Scholarly Communication at Harvard University, a position he has held since the office was created in 2008. He is also a professor of computer science in Harvard&#8217;s School of Engineering and Applied Sciences. A strong open-access advocate, he has led a multi-year effort to shape Harvard&#8217;s policies in this arena. The Harvard open-access policy, adopted unanimously by Harvard&#8217;s Faculty of Arts and Sciences in February 2008, states that &#8220;each faculty member grants to the President and Fellows of Harvard College permission to make available his or her scholarly articles and to exercise the copyright in those articles.&#8221; Since then, the faculties of five other Harvard schools have adopted open-access policies modeled on Shieber&#8217;s original policy, and other universities have done so as well.</p>
<p>Shieber received his B.A. in applied mathematics from Harvard in 1981 and his Ph.D. in computer science from Stanford University in 1989. His research focuses on computational linguistics: the study of human languages from the perspective of computer science.<br />
Working across many academic realms, from linguistics to computer systems and theoretical computer science, Shieber studies the way natural languages are formed to allow for efficient communication. He also studies problems posed by automated graphic design and is working on the development of a more graphically articulate computer.</p>
<p>In 1989 Shieber joined the Harvard Faculty of Arts and Sciences. He is also the founding director of the Center for Research on Computation and Society and a faculty co-director of the Berkman Center for Internet and Society at Harvard University.</p>
]]></content:encoded>
			<wfw:commentRss>http://scientificdatasharing.com/general/interview-with-stuart-shieber/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
		<item>
		<title>Dryad: an international repository of data</title>
		<link>http://scientificdatasharing.com/biology/dryad-an-international-repository-of-data/</link>
		<comments>http://scientificdatasharing.com/biology/dryad-an-international-repository-of-data/#comments</comments>
		<pubDate>Sun, 30 Jan 2011 19:37:37 +0000</pubDate>
		<dc:creator>Editor</dc:creator>
				<category><![CDATA[Biology]]></category>
		<category><![CDATA[General]]></category>

		<guid isPermaLink="false">http://scientificdatasharing.com/?p=1127</guid>
		<description><![CDATA[DRYAD is an international repository of data underlying peer-reviewed articles in the basic and applied biosciences. DRYAD is governed by a consortium of journals that collaboratively promote data archiving and ensure the sustainability of the repository.]]></description>
			<content:encoded><![CDATA[<div id="file_news_div_news">
<p>DRYAD is an international repository of data underlying peer-reviewed articles in the basic and applied biosciences. DRYAD enables scientists to validate published findings, explore new analysis methodologies, repurpose data for research questions unanticipated by the original authors, and perform synthetic studies. DRYAD is governed by a <a href="http://datadryad.org/partners">consortium of journals</a> that collaboratively promote data archiving and ensure the sustainability of the repository.</p>
</div>
<div id="org_datadryad_dspace_statistics_SiteOverview_div_front-page-stats">
<p>As of Jan 30, 2011, Dryad contains 440 data packages and 1,093 data files, published in 57 journals.</p>
<p><a href="http://datadryad.org/" target="_blank">DRYAD Home Page</a></p>
<p><a href="http://datadryad.org/partners" target="_blank">DRYAD Partners list</a></p>
<p><a href="http://www.youtube.com/user/PeggySchae#p/a/u/0/RP33cl8tL28" target="_blank">YouTube Video: How to Deposit Data in DRYAD</a></p>
</div>
]]></content:encoded>
			<wfw:commentRss>http://scientificdatasharing.com/biology/dryad-an-international-repository-of-data/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
		<item>
		<title>Beyond the Data Deluge: A Research Agenda for Large-Scale Data Sharing and Reuse</title>
		<link>http://scientificdatasharing.com/general/beyond-the-data-deluge-a-research-agenda-for-large-scale-data-sharing-and-reuse/</link>
		<comments>http://scientificdatasharing.com/general/beyond-the-data-deluge-a-research-agenda-for-large-scale-data-sharing-and-reuse/#comments</comments>
		<pubDate>Thu, 13 Jan 2011 18:59:46 +0000</pubDate>
		<dc:creator>Editor</dc:creator>
				<category><![CDATA[General]]></category>

		<guid isPermaLink="false">http://scientificdatasharing.com/?p=1113</guid>
		<description><![CDATA[The purpose of this paper is to develop a research agenda for scientific data sharing and reuse that considers these three areas: broader participation in data sharing and reuse, increases in the number and types of intermediaries, and more digital data products.
by Ixchel M. Faniel and Ann Zimmerman]]></description>
			<content:encoded><![CDATA[<p><a href="http://scientificdatasharing.com/wp-content/uploads/2011/01/faniel_ixchel.jpg"><img class="alignright size-medium wp-image-1121" title="faniel_ixchel" src="http://scientificdatasharing.com/wp-content/uploads/2011/01/faniel_ixchel-223x300.jpg" alt="" width="223" height="300" /></a>Ixchel M. Faniel and Ann Zimmerman,<br />
School of Information,<br />
University of Michigan</p>
<p>December 2010</p>
<p><strong>Abstract</strong><br />
There is almost universal agreement that scientific data should be shared for use beyond the purposes for which they were initially collected. Access to data enables system-level science, expands the instruments and products of research to new communities, and advances solutions to complex human problems. While demands for data are not new, the vision of open access to data is increasingly ambitious. The aim is to make data accessible and usable to anyone, anytime, anywhere, and for any purpose. Until recently, scholarly investigations related to data sharing and reuse were sparse. They have become more common as technology and instrumentation have advanced, policies that mandate sharing have been implemented, and research has become more interdisciplinary.  Each of these factors has contributed to what is commonly referred to as the &#8220;data deluge.&#8221; Most discussions about increases in the scale of sharing and reuse have focused on growing amounts of data.  There are other issues related to open access to data that also concern scale that have not been as widely discussed: broader participation in data sharing and reuse, increases in the number and types of intermediaries, and more digital data products. The purpose of this paper is to develop a research agenda for scientific data sharing and reuse that considers these three areas.</p>
<p><a href="http://scientificdatasharing.com/wp-content/uploads/2011/01/Faniel_Zimmerman.pdf">pdf of full paper</a></p>
]]></content:encoded>
			<wfw:commentRss>http://scientificdatasharing.com/general/beyond-the-data-deluge-a-research-agenda-for-large-scale-data-sharing-and-reuse/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
		<item>
		<title>Ontologies: Scientific Data Sharing Made Easy</title>
		<link>http://scientificdatasharing.com/general/ontologies-scientific-data-sharing-made-easy/</link>
		<comments>http://scientificdatasharing.com/general/ontologies-scientific-data-sharing-made-easy/#comments</comments>
		<pubDate>Fri, 07 Jan 2011 00:08:10 +0000</pubDate>
		<dc:creator>Editor</dc:creator>
				<category><![CDATA[General]]></category>

		<guid isPermaLink="false">http://scientificdatasharing.com/?p=1101</guid>
		<description><![CDATA[An ontology is a logic-based organizational structure for knowledge. Ontologies speed genetic discovery by allowing researchers to quickly find and compare data from multiple sources.
by Nicole Washington and Suzanna Lewis]]></description>
			<content:encoded><![CDATA[<div>
<div><a href="http://scientificdatasharing.com/wp-content/uploads/2011/01/nicole.jpg"><img class="alignright size-full wp-image-1106" title="nicole" src="http://scientificdatasharing.com/wp-content/uploads/2011/01/nicole.jpg" alt="" width="144" height="128" /></a>By: Nicole Washington &amp; Suzanna Lewis © 2008 Nature Education</div>
</div>
<div>Citation: Washington, N. &amp; Lewis, S. (2008) Ontologies: Scientific Data Sharing Made Easy. Nature Education 1(3)</div>
<p>An ontology is a logic-based organizational structure for knowledge. Ontologies speed genetic discovery by allowing researchers to quickly find and compare data from multiple sources.</p>
<div>
<div>
<div>
<p><a href="http://scientificdatasharing.com/wp-content/uploads/2011/01/suzanna.png"><img class="alignleft size-full wp-image-1109" title="suzanna" src="http://scientificdatasharing.com/wp-content/uploads/2011/01/suzanna.png" alt="" width="113" height="111" /></a>Imagine that you are investigating the genes involved in bud development. Where would you start? An online scientific library, such as <a title="PubMed" rel="nofollow" href="http://www.ncbi.nlm.nih.gov/pubmed/" target="_blank">PubMed</a>? Wikipedia? Google? There is a vast amount of biological data available online (journal articles and books, information on protein structures, <a title="genotype-phenotype associations" rel="nofollow" href="http://www.ncbi.nlm.nih.gov/sites/entrez?db=OMIM" target="_blank">genotype-phenotype associations</a>, genome maps, drug efficacy studies, and much more), so the problem is not a lack of data. Rather, the issue is that there is so much data that sifting through it all to find relevant information can be a complicated and lengthy process. For instance, with your &#8220;bud development&#8221; query, you might retrieve undesired results such as &#8220;rose bud development,&#8221; &#8220;yeast bud development,&#8221; or even generic descriptions of &#8220;genes,&#8221; when what you really want is &#8220;genes involved in limb bud development in animals.&#8221; You might also miss relevant results because the same process might be referred to as &#8220;bud formation&#8221; or &#8220;limb morphogenesis&#8221; in some sources. As you can imagine, filtering relevant results is time-consuming for humans and nearly impossible for computers, which slows the pace of scientific discovery.</p>
<p><a href="http://www.nature.com/scitable/topicpage/ontologies-scientific-data-sharing-made-easy-77972" target="_blank">see full article</a></p>
</div>
</div>
</div>
]]></content:encoded>
			<wfw:commentRss>http://scientificdatasharing.com/general/ontologies-scientific-data-sharing-made-easy/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
		<item>
		<title>Science Policy Forum: An International Framework to Promote Access to Data</title>
		<link>http://scientificdatasharing.com/general/science-policy-forum-an-international-framework-to-promote-access-to-data/</link>
		<comments>http://scientificdatasharing.com/general/science-policy-forum-an-international-framework-to-promote-access-to-data/#comments</comments>
		<pubDate>Wed, 29 Dec 2010 00:51:04 +0000</pubDate>
		<dc:creator>Editor</dc:creator>
				<category><![CDATA[General]]></category>

		<guid isPermaLink="false">http://scientificdatasharing.com/?p=1074</guid>
		<description><![CDATA[Open access to publicly funded data provides greater returns from the public investment in research, generates wealth through downstream commercialization of outputs, and provides decision-makers with facts needed to address problems. This article summarizes key findings of an international group that studied these issues on behalf of the Organisation for Economic Cooperation and Development.]]></description>
			<content:encoded><![CDATA[<p>Peter Arzberger* 1, Peter Schroeder 2, Anne Beaulieu 3, Geof Bowker 1, Kathleen Casey 1, Leif Laaksonen 4, David Moorman 5, Paul Uhlir 6 and Paul Wouters 3</p>
<p>Author Affiliations</p>
<p>1. University of California, San Diego, La Jolla, CA 92093, USA.<br />
2. Ministry of Education, Culture and Science, Zoetermeer, Netherlands.<br />
3. Networked Research and Digital Information, Royal Netherlands Academy of Arts and Sciences, Amsterdam, Netherlands.<br />
4. CSC-Scientific Computing Ltd., Espoo, Finland.<br />
5. Social Sciences and Humanities Research Council, Ottawa, Canada.<br />
6. National Research Council, Washington, DC 20418, USA.</p>
<p>*To whom correspondence should be addressed. E-mail: parzberg@ucsd.edu</p>
<p>Recent national and multinational investments (1) in networking and continued gains in information technological capability (2) have given rise to a complex cyberinfrastructure that is rapidly increasing our ability to produce, manage, and use data (3). As research becomes increasingly global (4), data-intensive, and multifaceted (5, 6), it is imperative to address national and international data access and sharing issues systematically in a policy arena that transcends national jurisdictions. Open access to publicly funded data provides greater returns from the public investment in research, generates wealth through downstream commercialization of outputs, and provides decision-makers with facts needed to address complex, often transnational, problems. This article summarizes key findings of an international group that studied these issues on behalf of the Organisation for Economic Cooperation and Development (OECD) (7), which resulted in a ministerial-level declaration (8).</p>
<p>Go to <a href="http://www.sciencemag.org/content/303/5665/1777.full?ijkey=sgWI1mlejCudY&amp;keytype=ref&amp;siteid=sci#aff-2" target="_blank">full article</a>.</p>
]]></content:encoded>
			<wfw:commentRss>http://scientificdatasharing.com/general/science-policy-forum-an-international-framework-to-promote-access-to-data/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
		<item>
		<title>Interview with John Towns</title>
		<link>http://scientificdatasharing.com/interviews/interview-with-john-towns/</link>
		<comments>http://scientificdatasharing.com/interviews/interview-with-john-towns/#comments</comments>
		<pubDate>Wed, 15 Dec 2010 20:29:12 +0000</pubDate>
		<dc:creator>Editor</dc:creator>
				<category><![CDATA[Featured]]></category>
		<category><![CDATA[Interviews]]></category>

		<guid isPermaLink="false">http://scientificdatasharing.com/?p=1046</guid>
		<description><![CDATA[We will continue to suffer from the simple inability to communicate our data if we do not both find means by which we can make legacy data accessible and move communities toward representations that are accessible to others.]]></description>
			<content:encoded><![CDATA[<p><strong>What have your experiences been in the data sharing world and what do they convey to you about that arena&#8217;s benefits and problems? Did your experiences prompt you to reach larger conclusions about how data sharing should be organized in the macro picture?</strong><a href="http://scientificdatasharing.com/wp-content/uploads/2010/12/jtowns.jpg"><img class="alignright size-full wp-image-1060" title="jtowns" src="http://scientificdatasharing.com/wp-content/uploads/2010/12/jtowns.jpg" alt="" width="100" height="150" /></a></p>
<p>Data sharing has come up in a variety of contexts in my experience. These contexts fall into the following general categories:</p>
<p>(a) Data collections: These are data sets, often represented as a database, made available to a disciplinary community or sub-community. They have primarily been collections used in some way in computational science, but they span many disciplines (e.g., protein structure databases, social science statistics, instrument and observational data).</p>
<p>(b) Sharing research results: In the world where I most often work, researchers produce copious amounts of simulation data that many of them would like to share, but they have not done so very effectively. Some teams are attempting to adapt data sharing services built for other (primarily observational) data relevant to their community in order to share their simulation results. One example is <a href="http://www.iris.edu/hq/">IRIS</a>, which holds seismological observational data; researchers from the Southern California Earthquake Center (SCEC) would like their simulation results incorporated as well. These efforts are non-trivial and meet with many hurdles.</p>
<p>(c) Projects producing data products: I think of this set slightly differently from the collections described in (a). These are projects like LSST, DES, and others with a stated goal of collecting observational data and generating data products for researchers to use. These are long-term projects with significant funding to support their work. Unfortunately, these activities occur largely in isolation from one another in terms of how data is managed and shared with their respective communities.</p>
<p>(d) Maintaining research results: This is data that must be preserved, either because publication of research work requires that the supporting data be maintained for some period of time, or because the funding agency supporting the research requires that research data be maintained and perhaps shared. The impending NSF requirement of what it calls a “data management plan” for all proposals is of particular concern to many of the researchers I work with (given that they are primarily funded by NSF).</p>
<p>My observation from these experiences is that there has been nothing close to a holistic approach to the broad set of research community needs with respect to data, data management, data preservation, curation, provenance, etc. In my view, most of these share similar underlying needs, with more specialized services layered on top of common primitives to support the range of capabilities required. There is interesting work at various levels of this stack (e.g., NSF&#8217;s DataNet projects), but there is no larger view and no national strategy that addresses the full picture, from base data hardware infrastructure, through low-level data services, to support for developing higher-level, specialized services targeted to particular communities.</p>
<p><strong>In your data sharing work to date and in the work of your colleagues, can you point to specific breakthroughs that occurred because raw data was freely and readily shared among researchers? Be as specific as possible in this answer.</strong></p>
<p>Here are some examples of the impact of making data more accessible:</p>
<p><em><a href="http://www.ncsa.illinois.edu/News/Stories/INDICATOR/">http://www.ncsa.illinois.edu/News/Stories/INDICATOR/</a></em></p>
<p><em><a href="http://www.ncsa.illinois.edu/News/Stories/18thConnect/">http://www.ncsa.illinois.edu/News/Stories/18thConnect/</a></em></p>
<p><em><a href="http://www.ncsa.illinois.edu/News/Video/2010/psp10_minsker.html">http://www.ncsa.illinois.edu/News/Video/2010/psp10_minsker.html</a></em></p>
<p><em><a href="http://www.ncsa.illinois.edu/News/Stories/big_data/">http://www.ncsa.illinois.edu/News/Stories/big_data/</a></em></p>
<p><em><a href="http://www.cct.lsu.edu/site.php?pageID=63&amp;newsID=1009">http://www.cct.lsu.edu/site.php?pageID=63&amp;newsID=1009</a></em></p>
<p><strong>How did you tackle any hurdles you encountered that made data sharing more difficult to accomplish?</strong></p>
<p>At NCSA and within the TeraGrid project, there are many examples of overcoming such hurdles. The bit that concerns me is that only in such larger infrastructure and support projects has there been much hope that the solutions found might be shared and applied in different contexts. So my concern is not so much how a solution was found; there are lots of smart people out there who can find innovative solutions to problems. My concern is leveraging those solutions more broadly.</p>
<p><strong>What is the basis for the widespread resistance to data sharing among researchers, and is any of their criticism of data sharing based on valid concerns?</strong></p>
<p>The concerns expressed by researchers that I am aware of include:</p>
<ol>
<li>Desire not to lose competitive advantage in publishing research based on the data: Many researchers have said that they wish to restrict access to data until they have published their own research first. In most cases this is valid and reasonable.</li>
<li>Research data is a competitive advantage in research: Some researchers view the data they have collected as proprietary and want an ongoing restriction on access in anticipation of possible future publications. While one can have some sympathy for this, data that languishes unused instead of furthering research wastes the resources originally expended to collect it.</li>
</ol>
<p><strong>Do you see a need for a national data sharing repository in the medical arena or smaller discipline-based repositories for specialized arenas?</strong></p>
<p>Some communities have created, or have begun to create, data repositories, and even though some are much more usable than others, in general they are a Good Thing. It is difficult for me to speak too specifically about the medical arena, since that term encompasses a rather broad set of possible data. I firmly believe there are specific areas within the medical arena where shared data could be highly beneficial in furthering various medical sciences, diagnosis, and treatment. It is not clear to me that we have sufficiently defined the services and developed the technologies to support a broad range of data types and use modalities. This is an area that should be actively supported as we develop more specific data resources.</p>
<p><strong>What agency should run it/them and why? If you advocate a system of smaller repositories organized around specific disciplines, should there be an overview agency that coordinates/supervises interactions among databases? If so, describe the agency’s proposed function.</strong></p>
<p>This is an interesting question, but in my mind perhaps too narrow. I see a range of repositories that should be developed. Within this range there will be subsets that are distributed or federated. This will likely be driven along disciplinary lines but also strongly influenced by interoperability (or lack thereof) among various resources. Bringing together existing data has a strongly “organic” nature that must be recognized and addressed.</p>
<p>As such, various agencies must be involved in pushing this forward. If this discussion is limited to the medical arena, clearly the NIH should be a major player. I tend to advocate establishing community bodies, supported by agencies, to facilitate coordination and interaction among data resource representatives. Each body needs a defined scope, which will indicate who should participate.</p>
<p>In any case, a number of efforts should be supported for several reasons.</p>
<ol>
<li>A variety of more focused areas could benefit immediately from the accessibility of data.  These should be moved forward and we should reap the significant direct benefits of these efforts.</li>
<li>Developing capabilities in a variety of areas engages a broader set of smart people in finding good solutions to the problems of making data more easily shared. While there will be some duplication of effort, we need to develop a much larger cadre of individuals who are knowledgeable and expert in this field.</li>
<li>There is a real need for a much better understanding of the broad set of needs of those interested in using these various types of data. On-the-ground efforts will help to illuminate this rather dark space. The larger cadre of data professionals can then use those needs to understand what solutions are required and, of critical importance, to begin defining the standards needed to construct the solution stacks that implement those capabilities.</li>
</ol>
<p>This in turn points to the need for the broader community to be able to develop and establish standards. There are some standards bodies that can be leveraged, but the process will require a sufficient community of folks to drive it.</p>
<p><strong>Some have advocated an international agency that would serve as a clearing house – knowledgeable about the work being done by data sharing projects worldwide. Do you see any value in such an agency, and if so, how hard would it be to win the international cooperation needed to create it?</strong></p>
<p>This seems quite dependent on the purpose and scope of such a thing. In the academic world, international boundaries are less relevant than elsewhere. My experience is that communities are very good at developing international organizations to support such things; we see this often in the standards space as well as in other areas. Again, these are things that agencies in the US and other countries need to support until such organizations can become self-supporting, if they can. I have not looked recently, but I suspect there are efforts already afoot in this space; they are not organized, however, and require some leadership to marshal them.</p>
<p><strong>What are your suggestions on the most powerful ways to combat researchers’ resistance to data sharing? How effective do you consider the requirements now in place at the NIH and high-profile journals that, in some instances, make data sharing a condition of funding? Do you believe more teeth should be put into these requirements and, if so, how?</strong></p>
<p>NIH and other agencies have been moving in this direction, and requirements are certainly an effective means of inducing data sharing. New requirements from NIH and NSF are Good Things, but enforcement is where it really matters. Historically, NSF has been effective at enforcing requirements when it chooses to do so. I do not have enough experience with NIH to know how effective it is, but the cases I do know of seem to indicate compliance. Unfortunately, these are all “stick” methods, and it would be good to develop “carrot” methods to encourage data sharing. On the other hand, there is at least a significant subset of researchers who see the derived or indirect benefits of sharing their data and are thus inclined to do so. In fiercely competitive fields, however, this is more difficult.</p>
<p><strong>Other incentives to enhance scientists’ acceptance of data sharing?</strong></p>
<p>As mentioned, I would like to see incentives for sharing as opposed to punishments for not sharing data. Sorry&#8230; no brilliant ideas here.</p>
<p><strong>If you could look down the road to a point maybe 20 years from now, how will the field of data sharing look? Will scientists be more willing to share their raw data quickly &#8212; as the Genome Project did, for example?</strong></p>
<p>I do believe that there will be a general trend toward the sharing of data.  Unfortunately, this will be working against the current of increasing value of data.  In the end, owners of data typically will need to find a reason to share data in order for them to take the action to do so.  Even those that are not opposed often do not share data since it requires them to do something in order to make the data accessible.  That something can be quite onerous in some cases.  People rarely do things that require effort without having a reason for doing it.</p>
<p>In some communities there is a general recognition that they all can benefit from the sharing of their data (e.g., seismologists and IRIS, the Incorporated Research Institutions for Seismology); there is a sense that it “raises all boats,” if you will. In other cases, established senior researchers share their data as a philanthropic act for their community, knowing that they themselves would never be able to extract the research results that others could. It has been my observation that such trends typically emerge in most communities, but some take longer than others to develop. In 20 years, we will still be discussing the need to share data, but by then much more data will be shared. The hard part is that there will be many newer sources of data and communities that are less mature with respect to this issue.</p>
<p><strong>Are there specific laws and/or technological realities/hurdles that will continue to hamper or stop rapid sharing of raw data until they are addressed?</strong></p>
<p>I think there are certainly the obvious legal limiters, such as HIPAA and others of that ilk. If we are to share data affected by those regulations, we must either find a way around them or have those laws changed. From a technological perspective, the further development of standards is the key to addressing limiters. We will continue to suffer from the simple inability to communicate our data unless we both find means to make legacy data accessible and move communities toward representations that are accessible to others. We should learn from the experiences of communities that have been collecting data for long periods of time without standards to support sharing of the data (e.g., ecologists) and avoid repeating them.</p>
<p><em>John Towns bio</em> – John Towns is director of the Persistent Infrastructure Directorate at the National Center for Supercomputing Applications (NCSA) at the University of Illinois. In addition, he is Chair of the TeraGrid Forum, the TeraGrid project’s leadership body. He also serves as principal investigator on the NCSA Resource Provider/HPCOPS award for the TeraGrid project and principal investigator for the eXtreme Digital (XD) Technology Insertion Service. He comes from a background in computational astrophysics with a focus on application performance analysis. At NCSA, he supports a wide range of projects across many science and engineering fields using advanced computing, data, and visualization resources.</p>
<p>He received a bachelor’s degree in physics from the University of Missouri-Rolla in 1987, and master’s degrees in physics (1990) and astronomy (1991) from the University of Illinois.</p>
]]></content:encoded>
			<wfw:commentRss>http://scientificdatasharing.com/interviews/interview-with-john-towns/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
	</channel>
</rss>

