Data Are Most Useful When Openly Shared

MARCH 16, 2001 THE CHRONICLE OF HIGHER EDUCATION B13

By Michael S. Gazzaniga and Daniel N. Rockmore

LIKE PEOPLE IN MOST FIELDS, scientists take great pride in what they do. It is hard work to design and perform a good experiment, collect and analyze the data, and scrutinize the results to see if they have implications beyond the scope of the experiment. After that long and demanding process, it is natural for researchers to feel almost as if they own their data. Those proprietary feelings can be especially intense when other scholars want to mine the data for publication; after all, publishing is the stuff of which academic careers are made.

Nevertheless, in recent years scholars in a variety of scientific fields have begun to realize that data ought to be shared. The benefits are many: History shows that shared databases speed the development of the disciplines that use them. Recent advances in informatics, or data mining, make it possible to use databases as primary research material. The resulting meta-analyses give researchers ideas for new experiments, cut down on duplication of effort, and allow researchers from other disciplines to work in the field. That last benefit is crucial, for we live at a time when scientific progress almost requires interdisciplinary effort, and giving researchers from many fields access to primary data is an essential component of interdisciplinary teaching and research.

The most famous collection of shared data belongs to the Human Genome Project. Although at first many researchers wanted nothing to do with a shared database, it quickly became apparent that the results of the project should be made available to everyone, because the new information about DNA would have so many ramifications for the practice of medicine. Now it is standard procedure for scientists to deposit their data in the database, known as GenBank. Molecular biologists have used its material to increase our knowledge enormously.

In fact, the success of GenBank led scientists, including some of those who initially objected to it, to call on the National Institutes of Health to develop a new public database. Called Gene Expression Omnibus, the new database makes available the increasing amount of data that researchers are now producing on the expression of genes, the process that translates genetic code at the molecular level into characteristics like eye and hair color or right-handedness. Scientists began adding material to GEO, as it is known, last fall.

Other fields with shared databases have also experienced the pattern of initial reluctance followed by acceptance and excitement, accompanied by significant increases in researchers' understanding of the subject.

The field of X-ray crystallography provides an example. Richard J. Roberts, a 1993 Nobel laureate, first urged the creation of a shared database for the discipline. The idea was eventually endorsed by the American Crystallographic Association, the N.I.H., and the Protein Society. Science magazine decided to require its contributors in the field to deposit their X-ray-crystallographic coordinates in a public database, but allowed the contributors to wait for a year after publication before meeting the requirement. Although companies interested in commercially exploiting the data supported the delay, academic scientists wanted quicker access to the information. Now, both Science and Nature require contributors in the field to deposit their material in the database on publication.

We have been involved in the creation of a shared database in the field of cognitive neuroscience, whose goal is to understand the nature of thought by examining images of brain activity under a range of circumstances and conditions. Together with other neuroscientists, computer scientists, and mathematicians, we decided to form the National Functional Magnetic Resonance Imaging Data Center.

Functional-brain imaging is a new technology that allows researchers to see the brain in action as it tackles various cognitive, perceptual, and attentional operations. Unlike older technologies, this imaging has the potential to reveal the neural landscapes generated by thought processes, and the biochemical mechanisms that underlie emotions. But each experiment using functional-brain imaging can produce up to 10 gigabytes of data, and until recently it was not possible to manage and manipulate such a massive amount of raw data from experiments relying on the images.

Like molecular geneticists and crystallographers, scientists using brain-imaging technology initially reacted with panic to the idea of sharing their data. Last summer, those of us involved in creating the database sent a letter to the authors who had published functional-imaging studies in the Journal of Cognitive Neuroscience the year before. The letter invited them to deposit their data in the database. It also stated that for subsequent articles, the journal would require authors to do so. At that point, a few other leading journals had also agreed to require their authors to contribute to the database.

Scientists who were opposed to the database asked the journals' editors not to impose the requirement. Many of the editors thereupon backed off from the policy that we had negotiated with them, even though most of them already required authors to make all of the data published in those journals available to the public. That is, in fact, the legal requirement if the government has supported the research.

WHEN the controversy erupted, we had already received funds, from the National Science Foundation and the W. M. Keck Foundation, to help set up the brain-imaging database. We had begun to hire key personnel, buy the requisite hardware and software, and develop a database that works. The database, accessible to the public, contains information from more than a dozen experiments that were reported in a special issue of the Journal of Cognitive Neuroscience, and that were conducted at leading laboratories in the United States and Europe. That special issue began the journal's new policy of requiring authors to deposit material in the database.

The dust now seems to be settling. We have tried to correct misunderstandings about the nature of the database. Indeed, many scientists recently have urged us to expand our services. Some feel that an author should make the data in a scientific paper available for inspection while the paper is still being considered for publication.

Our hope is that the database will lead to major scientific advances, including new insights into cognitive physiology. Researchers working on one of the brain's functions will be able to look at other scientists' experiments for clues about what regions of the brain might be involved. Universities that cannot afford an expensive brain-imaging center will nonetheless have access to data for teaching and research.

Grand dreams are behind the best of science: dreams of knowing the origins of the universe, say, or the secrets of human development. But dreams are not enough. Eventually, any scientific theory needs to be confirmed or revised by experimentation, and it is often the case that the grander the theory or dream, the more data the investigation produces. Recent history shows that making the totality of information available to anyone who is interested is the most efficient means to scientific ends.

Michael S. Gazzaniga is a professor of cognitive neuroscience, and Daniel N. Rockmore a professor of mathematics and computer science, at Dartmouth College.