Data Sharing and Preservation

Investigators should choose an approach for sharing based on factors such as the sensitivity of the data, the size and complexity of the dataset, and the volume of requests anticipated. Sharing data upon request, or on personal or laboratory-associated websites, is NOT recommended.

Approaches to depositing data can include:

  • Depositing data in a domain-specific data repository. These repositories are often supported and managed by the communities that use the data, which improves discoverability and helps ensure that the metadata associated with a dataset is detailed enough to support reuse. Specialized repositories for protected data also help with secure long-term storage. The NIH encourages researchers to use disciplinary repositories for data sharing when possible. For clinical data, UCLA has an institutional membership with Vivli, a data repository built for sharing anonymized clinical research data. For UCLA researchers, the base cost of submitting a dataset to Vivli is covered by the membership. NIH provides a searchable list of recommended repositories and points to other sources if you are having trouble finding a repository that suits your data. Be sure to review NIH documentation on selecting a data repository.
  • Depositing data in a general-purpose data repository. These repositories are the most flexible for data storage and are easily citable in publications and grant applications. UCLA Dataverse is a local data repository for UCLA researchers. Dryad is a UC-managed data repository that can be used at no cost to UC researchers. NIH also provides a list of supported generalist repository options on its webpage.
  • Depositing data in a protected repository. Protected data, such as data containing personally identifiable information or data relating to vulnerable or protected populations, need to be accessed and used in a secure way to protect the privacy of study participants. Certain data repositories and data enclaves are built with the appropriate level of security and with environments for regulated access and secure data analysis. For example, Vivli can be used to share anonymized clinical research data.

The NIH has released guidance on selecting a data repository that follows widely recognized best practices in data sharing and storage. In its recommendations for choosing a repository, the NIH highlights a number of key characteristics:

  • The repository should issue unique persistent identifiers (PIDs), such as a digital object identifier (DOI) or accession number, to support data discovery, reporting, and citation. A PID should lead to a persistent landing page that remains available even if the dataset is deaccessioned or no longer available.
  • The repository should have a plan for long-term management of data, including maintaining its integrity, authenticity, and availability. It should be built on stable technical infrastructure and funding, with contingency plans to ensure data are maintained and available during and after unforeseen events.
  • The repository should collect sufficient metadata about the dataset so that it can be properly discovered, reused, and cited. Domain-specific repositories typically collect more detailed metadata than general repositories.
  • The repository should provide or have a mechanism to support data curation and quality assurance.
  • The repository should be free to access, and deposited data should be easily accessible in a timely manner after submission.
  • Other considerations include clear data use guidance, data security and integrity, confidentiality, formatting, and provenance.
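
As an informal aid when comparing candidate repositories, the characteristics above can be treated as a simple checklist. The Python sketch below is only an illustration: the criterion keys are paraphrases of the list above rather than official NIH terminology, and the repository and its answers are hypothetical placeholders, not an assessment of any real repository.

```python
# Criterion keys paraphrase the characteristics listed above; they are
# illustrative labels, not official NIH terminology.
CRITERIA = {
    "persistent_identifiers": "Issues unique persistent identifiers (e.g., DOIs)",
    "long_term_management": "Has a plan for long-term data management",
    "sufficient_metadata": "Collects enough metadata for discovery, reuse, and citation",
    "curation_support": "Provides or supports curation and quality assurance",
    "free_timely_access": "Is free to access soon after submission",
}


def unmet_criteria(answers):
    """Return descriptions of criteria that were not answered True.

    Criteria missing from `answers` count as unmet, so an incomplete
    review is never mistaken for a passing one.
    """
    return [
        description
        for key, description in CRITERIA.items()
        if not answers.get(key, False)
    ]


if __name__ == "__main__":
    # Hypothetical answers for a made-up repository.
    example_answers = {
        "persistent_identifiers": True,
        "long_term_management": True,
        "sufficient_metadata": True,
        "curation_support": False,
    }
    gaps = unmet_criteria(example_answers)
    print(f"Example repository: {len(gaps)} unmet criteria")
    for description in gaps:
        print(" -", description)
```

A checklist like this only records what a reviewer has already confirmed from a repository's own documentation; it does not replace reading the NIH guidance itself.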

References: Supplemental Policy Information “Selecting a Repository for Data Resulting from NIH-Supported Research”