Creating and depositing a dataset

1. Log in to the repository

1) Log in with your credentials

2) Go to the page of the instance where you want to deposit the dataset (you will need to have dataset creation permissions on the instance).

3) Click "Add Data" and then "New Dataset".

2. Fill in the form fields

A dataset's deposit form contains required (highlighted with a red asterisk) and optional fields. Below are the main form fields:

REPOSITORY FIELD	MANDATORY	DEFINITION	APPLICATION NOTES	EXAMPLE
Host Dataverse	Mandatory	Name of the dataverse where the dataset is deposited.	Filled in by default according to the instance where the dataset is deposited.	Science Department
Dataset Template	Optional	Institutional template with metadata already established.	If your institution has a predefined template, select it to automatically fill in some fields. If the template is changed, the fields where data has already been entered are deleted.	template_UdG
Citation Metadata
Title *	Mandatory	Name by which the resource is known.	When the title of the dataset matches the title of the related publication, add the expression "Replication data for" in front of the title.	A Benchmark for End-User Structured Data User Interfaces | 23 Efficient ways of subsetting a Pandas DataFrame | Replication data for Perceived HRM (Human Resource Management) - Health centers
Author *	Mandatory	Principal investigators involved in the production of the data in order of priority. It can be a personal or corporate/institutional name.	Repeatable.
Name *	Mandatory	Full name of the creator.	Personal names must follow the form: Surname, Name. Non-Roman names can be transliterated according to the ALA-LC schemes. For institutions, give the full name of the institution. If a structural unit (department, research group, etc.) must be added, place it after the institution name, separated by a period.	Charpy, Antoine | Universitat de Barcelona. Departament de Química
Affiliation	Recommended	Organization with which the author is affiliated.	Use the full name of the institution. If a structural unit (department, research group, etc.) must be added, place it after the full name of the institution, separated by a period. When there is a double affiliation, separate the institutions with commas. Note: Many institutions have their own regulations for institutional affiliation.	Universitat de Girona | Institut Català d’Arqueologia Clàssica | Universitat Autònoma de Barcelona. Centre d'Estudis Sociològics sobre la Vida Quotidiana i el Treball | Centre de Ciència i Tecnologia Forestal de Catalunya, Universitat de Lleida
Identifier Scheme	Recommended	Name identifier scheme.	If the Identifier field is used, the Identifier Scheme field is mandatory. List of controlled values: ORCID, ISNI, LCNA, VIAF, GND, DAI, ResearcherID, ScopusID.	ORCID | ISNI | DAI
Identifier	Recommended	Unique identifiers of natural or legal persons, according to various schemes.	The format depends on the scheme. For ORCID, use the form XXXX-XXXX-XXXX-XXXX (no URL)	0000-0003-2207-9605
Contact *	Mandatory	Person(s) or institution responsible for the dataset, whom users can contact.	Repeatable.
Name	Recommended	Full name of the contact.	Personal names must follow the form: Surname, Name. Non-Roman names can be transliterated according to the ALA-LC schemes. For institutions, give the full name of the institution. If a structural unit (department, research group, etc.) must be added, place it after the institution name, separated by a period.	Charpy, Antoine | Universitat de Barcelona. Departament de Química
Affiliation	Recommended	Contact affiliation.	Use the full name of the institution. If a structural unit (department, research group, etc.) must be added, place it after the full name of the institution, separated by a period. When there is a double affiliation, separate the institutions with commas. Note: Many institutions have their own regulations for institutional affiliation.	Universitat de Girona | Institut Català d’Arqueologia Clàssica | Universitat Autònoma de Barcelona. Centre d'Estudis Sociològics sobre la Vida Quotidiana i el Treball | Centre de Ciència i Tecnologia Forestal de Catalunya, Universitat de Lleida
E-mail *	Mandatory	Email address of the contact. This address is not visible to users; it is used to put users in touch with the contact in an automated way.	This field is not exported in any schema.	NameSurname@csuc.cat
Description *	Mandatory	Abstract describing the purpose, nature and scope of the dataset.	Repeatable
Text *	Mandatory	Abstract that explains the contents of the dataset, as well as its purpose, nature and scope.	If there is a related publication, the description of the dataset must not be the same as the abstract of that publication. A good description identifies the content of the dataset and helps the user determine whether it can be used. HTML tags can be used.	Dataset of bone stable isotope values (δ34S, δ18O, δ13C and δ15N) for seven marine mammal species from Río de La Plata, Uruguay, and adjoining Atlantic waters (Arctocephalus australis, Otaria flavescens, Pontoporia blainvillei, Lagenodelphis hosei, Pseudorca crassidens, Phocoena spinipinnis, Tursiops truncatus). The δ18O, δ13C and δ15N values were compiled from Drago et al. (2020 and 2021). δ13Ccor: isotope values corrected for Suess effect shifts. | The dataset contains two files: a Word file including the transcripts of the communication rounds during the experiment and an Excel file with the results of the game and socioeconomic data.
Date	Optional	Date of description. In case your dataset contains two descriptions (for example, one prepared by the author of the data and one prepared by the repository where the data was published) the date field will serve to distinguish them.	It is recommended to follow the YYYY-MM-DD encoding.	2020-11-27
Subject *	Mandatory	Dataset knowledge area.	Repeatable. List of controlled values: Agricultural Sciences; Arts and Humanities; Astronomy and Astrophysics; Business and Management; Chemistry; Computer and Information Science; Earth and Environmental Sciences; Engineering; Law; Mathematical Sciences; Medicine, Health and Life Sciences; Physics; Social Sciences; Other.	Arts and Humanities | Astronomy and Astrophysics | Business and Management | Engineering | Law
Keyword *	Mandatory	Keywords that describe important aspects of the dataset.	Repeatable.
Term *	Mandatory	Keyword that is indexed and ranked for the purpose of retrieving the dataset.	It is recommended to capitalize the first letter of the word. For expressions with more than one word, capitalize only the first one. You should use controlled vocabularies of your discipline or general ones recommended by your institutions.	Abdominal pregnancy
Vocabulary	Recommended	Controlled vocabulary where the term is used.	Fill in this field when using keywords from controlled vocabularies.	LCSH | DBpedia
Vocabulary URL	Recommended	Link to general vocabulary.	Fill in this field when using keywords from controlled vocabularies.	https://id.loc.gov/authorities/subjects.html | https://dbpedia.org/sparql
Related publication	Recommended	Publications related to the dataset.	Repeatable.
Citation	Recommended	The full citation of the publication.	It is recommended to follow the citation style of each discipline. If the related article is not yet published, fill in the field with one of the following, according to the language in which the deposit is made: Article pendent d'acceptació. Artículo pendiente de aceptación. Article submitted for review.	García, Roberto; Gil, Rosa María; Bakke, Eirik; Karger, David R. (2020). A benchmark for end-user structured data exploration and search user interfaces. Journal of Web Semantics, 2020, vol. 65, p. 100610. https://doi.org/10.1016/j.websem.2020.100610 | Article pending acceptance.
ID Type	Recommended	Type of identifier used in the publication.	List of controlled values: ark, arXiv, bibcode, doi, ean13, eissn, handle, isbn, issn, istc, lissn, lsid, pmid, purl, upc, url, urn.	doi | handle
ID Number	Recommended	The identifier of the publication.	In the case of DOIs, indicate only the prefix and suffix.	10.1016/j.websem.2020.100610 | 10459.1/69484
URL	Recommended	Link to the publication.	Enter the full URL. This field must be filled in even if the ID Number has already been entered.	https://doi.org/10.1016/j.websem.2020.100610 | http://hdl.handle.net/10459.1/69484
Notes	Optional	Important additional information about the dataset that did not appear in the description.	Free text. HTML tags can be used.	This project has been partially supported by the research project InDAGuS (Spanish Government TIN2012-37826-C02), together with the Universitat de Lleida and the Massachusetts Institute of Technology.
Depositor	Optional	Person or organization that deposited the dataset in the repository.	Personal names must follow the form: Surname, Name. Non-Roman names can be transliterated according to the ALA-LC schemes. For institutions, give the full name of the institution. If a structural unit (department, research group, etc.) must be added, place it after the institution name, separated by a period. This field is filled in by default with the information of the user making the deposit.	Charpy, Antoine | Universitat de Barcelona. Departament de Química
Deposit Date	Optional	Dataset deposit date in the repository.	It is recommended to follow the YYYY-MM-DD encoding. This field will be automatically filled with the current date.	2020-11-27
Kind of Data *	Mandatory	Description of the resource type.	List of controlled values:
Administrative records data: Microdata records contained in files collected and maintained by administrative agencies (i.e., programs) and commercial entities. Government and commercial entities maintain these files for the purpose of administering programs and providing services.
Aggregate data: High-level data obtained by combining individual-level data. For example, the output of an industry is an aggregate of the output of individual firms within that industry. Aggregate data is applied in statistics, data warehousing, and economics. https://en.wikipedia.org/wiki/Aggregate_data
Census/enumeration data: A census is a survey carried out on the complete set of objects of observation belonging to a given population or universe. Context: A census is the complete enumeration of a population or groups at a given time in terms of well-defined characteristics: for example, population, production, traffic on particular roads.
Clinical data: Information ranging from health determinants, health measures and health status to user care documentation. This data is captured for various purposes and stored in numerous health system databases. https://www.ncbi.nlm.nih.gov/books/NBK54290/
Coded documents: Code documentation is a specific type of documentation (sometimes described as "internal software code") that includes the written text paired with the source code of computer software, which describes the functionality built into the source code, its structure data, algorithms, and application program interfaces, and explains how computer software performs its functions.
Coded textual: Coding is the process of identifying a fragment of text or other data (photograph, image), searching for and identifying concepts, and finding relationships between them. Coding is therefore not just labeling; it links data to the research idea and back to other data.
Compiled data: Data collected or assembled from multiple, often heterogeneous, sources that have one or more common reference points, where at least one of the sources was originally produced for other purposes. The data is incorporated into a new entity. For example, providing data on the number of universities over the last 150 years using a variety of available sources (e.g., financial documents, official statistics, university records), combining survey data with information on geographical areas from official statistics (e.g., population density, doctors per capita, etc.), or using RSS to collect blog posts or tweets, etc.
Encoded data: Qualitative data (textual, video, audio or still image) originally produced for other purposes and transformed into quantitative data (expressed in unit-by-variable matrices) using coding techniques according to predefined categorization schemes.
Event/transaction data: Events are all incidents or occurrences related to the business or that have an impact on the entity's business. Transactions are those events that have an immediate and measurable monetary impact on the entity's books of accounts. Event data is any data you want to measure about an event. Data comes from many sources and must be transformed into a format that IBM® Predictive Maintenance and Quality can use. Transaction data is data that describes an event and is usually described with verbs. Transaction data always has a time dimension, a numeric value, and refers to one or more objects.
Experimental data: Data resulting from the experimental research method, which involves the manipulation of some or all of the independent variables included in the hypotheses.
Genomic data: Data about an organism's genome and DNA. It is used in bioinformatics to collect, store and process the genomes of living things. Genomic data is a broader term than sequencing data, although genomic data mostly comes from sequencing techniques. It may include non-sequencing data, such as microarray data, real-time PCR panel data, and data from pharmacogenomics studies.
Geospatial data: Any type of data with spatial coordinates that allow it to be mapped onto the Earth's surface. It can represent physical objects, discrete areas or continuous surfaces. Discrete geospatial data is typically represented by vector data consisting of points, lines, and polygons, while continuous geospatial data is typically represented by raster data, which consists of a grid of cells that each have their own value. Any number of applications in a wide range of areas produce geospatial data, including GIS, remote sensing equipment, GPS units, archaeological total stations, manual mapping, and computer-aided design (CAD), in various formats, including images, vectors, text and tabular data. Vector-based geospatial data includes tables listing archaeological sites along with their coordinates, text-based files (e.g., XML) containing coordinates and topology for historic road networks, and political party voting figures by administrative area. Raster-based geospatial data includes satellite images, aerial photographs, scanned maps and digital maps of elevations, vegetation, land use, sea surface temperatures, air pollution, soil types, etc.
Laboratory notebook: A laboratory notebook (colloq. lab notebook or lab book) is a primary record of research. Researchers use a lab notebook to document their hypotheses, experiments, and the initial analysis or interpretation of those experiments. This value is used for both traditional and electronic lab notebooks.
Machine-readable text: Machine-readable data, or computer-readable data, is data in a format that can be processed by a computer. Machine-readable data must be structured data.
Measurement and test data: Data resulting from the evaluation of specific properties (or characteristics) of beings, things, phenomena (and/or processes) through the application of pre-established standards and/or specialized instruments or techniques.
Observation data/ratings: Data resulting from observational research, which consists of collecting observations as they occur (e.g., observing behaviors, events, development of a condition or disease, etc.), without attempting to manipulate any of the independent variables.
Process-produced: The collection and manipulation of data elements to produce meaningful information. In this sense it can be considered a subset of information processing, the change (processing) of information in any way detectable by an observer.
Program source code: Source code is human-readable text written in a specific programming language. The purpose of source code is to set exact rules and specifications for the computer that can be translated into machine language. As a result, source code is the basis of programs and websites.
Psychological test: Raw and scaled scores and client/patient responses to test questions or stimuli, as well as psychologists' notes and recordings of client/patient statements and behavior during an examination.
Recorded data: Data recorded by mechanical or electronic means, in a form that allows the information to be retrieved and/or reproduced. For example, images or sounds on disk or magnetic tape.
Simulation data: Data resulting from the modeling or imitative representation of real-world processes, events, or systems, often using computer programs. For example, a program that models the responses of household consumption to changes in indirect taxes; or a dataset on hypothetical patients and their drug exposure, background conditions, and known adverse events.
Survey data: The data that results from surveying a sample of people. It is comprehensive information, collected from a target audience, about a specific topic to conduct research. https://www.questionpro.com/blog/survey-data-collection/
Textual data: Systematically collected material consisting of written, printed or electronically published words, usually written on purpose or transcribed from speech.
Time budget diaries: Time diaries allow researchers to answer a series of questions about how people allocate their time, that is, how they use time. Surveys usually ask respondents to report on the previous day.
Other: Other kind of data.	Survey data | Program source code | Machine-readable text | Textual data | Coded textual | Observation
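
Several fields in the table above expect a specific format (ORCID in the form XXXX-XXXX-XXXX-XXXX without a URL, dates as YYYY-MM-DD, DOIs as prefix and suffix only in ID Number but as a full link in URL). The following is a minimal Python sketch for checking values before typing them into the form. It is not part of the repository software: the function names are hypothetical, the ORCID check only tests the shape of the identifier (it does not verify the check digit), and the list of DOI link prefixes is an assumption.

```python
import re
from datetime import datetime

# Shape of an ORCID iD as described in the Identifier field (no URL).
# The last character may be "X" in real ORCID iDs.
ORCID_PATTERN = re.compile(r"[0-9]{4}-[0-9]{4}-[0-9]{4}-[0-9]{3}[0-9X]")

# Strict YYYY-MM-DD shape, as recommended for the Date and Deposit Date fields.
DATE_PATTERN = re.compile(r"[0-9]{4}-[0-9]{2}-[0-9]{2}")

# Assumed set of common DOI link prefixes to strip for the ID Number field.
DOI_PREFIXES = (
    "https://doi.org/",
    "http://doi.org/",
    "https://dx.doi.org/",
    "http://dx.doi.org/",
    "doi:",
)


def is_orcid_form(value: str) -> bool:
    """Return True if value has the XXXX-XXXX-XXXX-XXXX shape (pattern only)."""
    return ORCID_PATTERN.fullmatch(value) is not None


def is_iso_date(value: str) -> bool:
    """Return True if value is a real calendar date written as YYYY-MM-DD."""
    if DATE_PATTERN.fullmatch(value) is None:
        return False
    try:
        datetime.strptime(value, "%Y-%m-%d")
    except ValueError:
        return False
    return True


def doi_id_number(value: str) -> str:
    """Reduce a DOI link to prefix/suffix only, as expected in ID Number."""
    value = value.strip()
    lowered = value.lower()
    for prefix in DOI_PREFIXES:
        if lowered.startswith(prefix):
            return value[len(prefix):]
    return value


def doi_url(id_number: str) -> str:
    """Build the full link expected in the URL field from a DOI prefix/suffix."""
    return "https://doi.org/" + doi_id_number(id_number)


if __name__ == "__main__":
    # Values taken from the EXAMPLE column of the table.
    print(is_orcid_form("0000-0003-2207-9605"))
    print(is_iso_date("2020-11-27"))
    print(doi_id_number("https://doi.org/10.1016/j.websem.2020.100610"))
    print(doi_url("10.1016/j.websem.2020.100610"))
```

A check like this only catches formatting slips; the repository form remains the authority on what is accepted.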

Attention!

You can add more metadata (including disciplinary metadata) or edit existing metadata by following the instructions below.

4. Accept Terms and Conditions

1) Click "I have read and accepted the Terms and conditions for the deposit, preservation, and dissemination of data in RDR".

5. Save dataset

1) Click "Save Dataset". At this point, the dataset remains a draft until it is edited and published by the institution administrator.
