Proteomics Data:
A Report from HUPO 2008
By ROBERT CHALKLEY
The Human Proteome Organization (HUPO) 7th
Annual World Congress recently took place in Amsterdam. During the main meeting and during several satellite meetings, there was extensive discussion on the issues
involved in exchanging and publishing proteomics data.
The International Proteomics Summit
A few days before the main meeting, the National Cancer
Institute (NCI) organized an international summit entitled
“Proteomics Data Release and Sharing Policy.” Speakers at
this summit included representatives from major proteomics data repositories, including the European Bioinformatics
Institute (EBI), the Global Proteome Machine (GPM), and
the NIH, which is in the process of launching its own proteomics data repository. There was also strong representation
from various journals and funding agencies.
The summit started with discussion of the different
challenges that are associated with sharing proteomics data
compared with other data types such as genomic data. It was
acknowledged that the greater variety of information that can
be extracted from proteomics data, from protein identification through modification identification to quantitative measurements, means that very different levels of information are
required depending on the goals of the study. The variety of
software tools that the community uses to acquire and analyze
mass spectrometry data, many of them performing equivalent tasks,
also makes it significantly harder to exchange data
between labs or repositories.
The next session discussed what journals and data repositories are currently doing to deal with the publishing and
exchange of data. The data repositories can be grouped into
two camps: those that report the submitter’s interpretation
of the data they upload to the repository (for example, the
PRIDE repository1), and those sites that choose to reinterpret
the submitted data themselves, so that a consistent
measure of reliability is attached to all data in the repository
(for example, the GPM database2, where all data are processed
using the X!Tandem software). Both approaches have their
advantages; having consistency in the quality of the presented
results is obviously important, but presenting the researcher’s
interpretation is clearly required if the data are linked to a
scientific publication in a journal.
There are mechanisms already in place to exchange data
between repositories through the ProteomExchange consortium3, so data submitted to one repository in the consortium will be distributed to the others. So,
does it make any difference to which repository a researcher
submits their data? PRIDE differs slightly from the other
repositories in that, as well as storing the results, it tries to
capture “metadata” about the experiment, such as the source
of the samples and methods of analysis. Capturing this
information could be important for journal
submissions, as discussed below.
Journals and Repositories
When a journal publishes the results of a study, it is implicitly stating that it believes the results are reliable. However, until
fairly recently there was no mechanism in place to ensure that
the reliability of proteomic data could be assessed. The production
of publication guidelines for proteomic data has been driven
by Molecular and Cellular Proteomics (MCP). A meeting sponsored
by the Journal in 2005 brought together key members of the
proteomics community, including researchers, search engine
developers, instrument manufacturers, and representatives
of most journals publishing proteomic data. The result of this
meeting was a set of rules referred to as the “Paris Guidelines” that document the minimal information required in a
proteomics manuscript to be able to assess the reliability of
results.4 MCP is currently the only journal enforcing these
requirements at the editorial level for all pertinent manuscripts, whereas other journals recommend the use of these
or similar guidelines but rely on reviewers to highlight missing information.