RDMC UNC Dataverse Policies: Collection Development Policy

This article identifies and describes the types of collections within the UNC Dataverse along with our requirements when considering the addition or retention of collections in our holdings.

 


Introduction 

UNC Dataverse is managed and maintained by the Research Data Management Core (RDMC) at the University of North Carolina at Chapel Hill (UNC-CH). The mission of UNC Dataverse is to provide a trusted and standards-based platform and service for federation, preservation, access, and use of research data assets produced by the UNC-CH community. UNC Dataverse is intended to be responsive to UNC research community user needs and to support long-term data access, use, and compliance with data sharing policies. 

As an institutional repository for UNC-CH, the collecting mission of the UNC Dataverse remains focused on research data generated by UNC-CH researchers and their affiliates from all disciplines and research domains.   

Designated Community 

UNC Dataverse provides University of North Carolina at Chapel Hill (UNC-CH) researchers with a platform and service to centralize, preserve, and share research data assets for long-term access and use. Members of the UNC-CH research community include faculty, research staff, collaborators, affiliates, and graduate and undergraduate students conducting research that addresses some of the most critical challenges of our time.

Data in UNC Dataverse collections are freely available to anyone in the world with an internet connection. Users do not need a UNC Dataverse account to search, view, and download files from open records. A small percentage of the collection is under restricted access at the discretion of the author. Because UNC Dataverse provides free and open access to its collections, data are accessed by researchers, journalists, policymakers, citizen scientists, and others interested in the collections.

Selection and Appraisal  

RDMC identifies and solicits data that are representative of research and scholarship conducted at UNC-CH.  Data to be published in UNC Dataverse must meet the following criteria: 

  • The data provider is a member of UNC-CH or an official affiliate. 
  • The data being submitted do not contain any personally identifiable information or protected health information (PII/PHI).
  • The data being submitted comply with the UNC Dataverse Terms of Use and any relevant legal authorities and regulations.  

Re-Appraisal 

RDMC performs a re-appraisal of all data submissions based on the collecting mission of UNC Dataverse and established standards of professional archival practice. Data are re-appraised at the end of the 10-year RDMC preservation commitment period. Re-appraisal includes a data retention review in which RDMC staff assess the data on primary and secondary criteria including, but not limited to:

Primary criteria:  

  • The data have substantive value to research 
  • The data have influence on the body of knowledge 
  • The data have enduring value to the Designated Community 
  • The data are unique (i.e., the data are not available in another repository) 
  • The data cover a significant or useful timeframe or date span for study 
  • The data have not met or exceeded the funder’s data retention requirements
  • The costs are minimal for continuing to store, preserve, and maintain access to these data  

Secondary criteria:  

  • The data support or expand upon subject area concentrations 
  • The data address substantive gaps in existing holdings 
  • The data are in sound physical condition  
  • The data meet quality standards for accuracy and interpretability 
  • The data are accompanied by complete and readable documentation 

Accepted File Formats 

UNC Dataverse accepts data in a variety of formats but prefers that numeric data be submitted as SPSS (.sav), R (.RData), Stata (.dta), or Microsoft Excel (.xlsx) files that contain variable and value labels and are accompanied by complete and accurate documentation. Recommended text file formats include .txt, .pdf, and PDF/A.

For all other file types, the repository prefers formats that are: 

  • Widely adopted by the designated community 
  • Able to be converted or transferred to formats widely adopted by the designated community 
  • Non-proprietary or open source 
  • Free of external software dependencies 
  • Well-documented 
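
As an illustration only, a depositor could screen file names against the preferred formats named in this section before submitting. The Python sketch below is not part of UNC Dataverse: the function names `is_preferred_format` and `split_by_preference` and the extension set are assumptions drawn from the formats listed above (PDF/A files normally carry the .pdf extension).

```python
from pathlib import Path

# Extensions taken from the preferred formats named in this policy.
# Illustrative only; this is not an official UNC Dataverse allow-list.
PREFERRED_EXTENSIONS = {".sav", ".rdata", ".dta", ".xlsx", ".txt", ".pdf"}


def is_preferred_format(filename: str) -> bool:
    """Return True if the file's extension is among the preferred formats."""
    return Path(filename).suffix.lower() in PREFERRED_EXTENSIONS


def split_by_preference(filenames):
    """Partition file names into (preferred, other) lists, preserving order."""
    preferred = [name for name in filenames if is_preferred_format(name)]
    other = [name for name in filenames if not is_preferred_format(name)]
    return preferred, other
```

Files that land in the second list are not necessarily rejected; they are candidates for review against the general criteria above or for conversion to a preferred format.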

Levels of Curation 

RDMC employs three primary levels of curation for UNC Dataverse collections: basic curation, advanced curation, and self-archiving. These curation levels are assigned to data submissions based on the specific processing requirements of the data as well as the value of the data to the Designated Community as determined during the appraisal process. For data submissions that require curation beyond the advanced level or otherwise require specialized processing beyond the capabilities of the RDMC, RDMC will seek third-party service providers to deliver the necessary level of curation support for and stewardship of these data. 

Basic Curation 

Basic curation is assigned to data submissions for which a significant amount of the required processing has already been completed by the depositor. The following are basic curation tasks that are completed prior to data archiving and distribution:  

  • Submissions are reviewed for funder compliance 
  • Common file formats are normalized to preferred file formats 
  • Citation metadata and persistent identifiers are generated 
  • Archival backups and fixity checks are performed 
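
A fixity check confirms that a stored file is unchanged by recomputing its checksum and comparing it with a previously recorded value. The Python sketch below shows the general idea under stated assumptions: this policy does not say which algorithm or tooling RDMC uses, so SHA-256 and the function names `compute_checksum` and `verify_fixity` are hypothetical choices for illustration.

```python
import hashlib
from pathlib import Path


def compute_checksum(path, algorithm="sha256", chunk_size=1024 * 1024):
    """Return the hex digest of a file, reading it in chunks to limit memory use."""
    digest = hashlib.new(algorithm)
    with Path(path).open("rb") as handle:
        # Read fixed-size chunks until read() returns an empty bytes object.
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def verify_fixity(path, recorded_checksum, algorithm="sha256"):
    """Return True if the file's current checksum matches the recorded one."""
    return compute_checksum(path, algorithm) == recorded_checksum.lower()
```

In practice the recorded checksum would be captured at deposit time and stored apart from the file, so that a later mismatch signals corruption or unintended modification.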

Advanced Curation 

Advanced curation is assigned to data submissions that require intensive curation, including data considered to be of great potential value to the Designated Community. In addition to the tasks associated with basic curation, these data undergo advanced curation tasks according to their specific needs. Advanced curation may include the following tasks:

  • Data are reviewed for accuracy and interpretability 
  • Data are reviewed to detect the presence of direct identifiers 
  • Documentation is reviewed for the inclusion of complete data definitions  
  • Data are converted programmatically to preferred file formats  
  • Additional descriptive metadata are generated to facilitate discovery and reuse 
  • Data containing personally identifiable information or protected health information are de-identified to produce a public-use version of the data 
  • Provisions are put in place to resolve data confidentiality, privacy, and ethical concerns 
  • Missing descriptive information is recovered from available resources and assembled into a more complete document set 
  • Provisions are made for large-scale data ingest and archiving

The substantial cost of labor and expertise required to perform basic and advanced curation requires that the RDMC recover these costs through service contracts or grant funding.  

Self-Archiving  

UNC Dataverse offers its user community the option to self-archive their data. Using the UNC Dataverse archival platform, individuals may deposit their data in an open-access collection that they administer themselves. Self-archived submissions receive only the curation functions performed by the repository system (e.g., persistent identifier assignment, standardized metadata, derivative file format generation). RDMC staff do not perform any basic or advanced curation tasks on these data; however, the data are periodically audited for appropriateness, quality, and policy compliance. Self-depositors should consult UNC Dataverse Support to prepare materials for submission. Individuals who choose the self-archive option are required to review, agree to, and abide by the UNC Dataverse Terms of Use and all other RDMC UNC Dataverse policies and guidelines.

Policy Review 

The Research Data Management Core Collection Development Policy is subject to a three-year review. The current policy was approved and issued on February 1, 2025.