What digital tools are available on the Lorraine university site to manage research data?
If you need help choosing a data management tool, please contact the ADOC Lorraine data workshop: donnees-recherche@univ-lorraine.fr
Drawing up a data management plan for a project or structure
All the institutions responsible for the Lorraine university site recommend the use of DMP OPIDoR.
Producing data
All the institutions recommend LimeSurvey (LimeSurvey UL, LimeSurvey Inria, LimeSurvey AgroParisTech, LimeSurvey INRAE) for surveys and EXPLOR for high-performance computing.
All institutions provide scientific platforms, which are presented here:
Managing data
All institutions recommend FileSender for sending large datasets and Gitlab for managing source code (Gitlab UL, Gitlab Inria, Gitlab INRAE).
Other thematic services are provided:
- Université de Lorraine: electronic laboratory notebook with eLabFTW / sample management with GRR / access management for external members of Université de Lorraine with Invités Numériques
- INRAE: hosting of PostgreSQL or MySQL databases (https://cat.opidor.fr/index.php/BD_PostgreSQL) by the INRAE IT Department / hosting of PostgreSQL databases by the SILVA research unit. Many other services are available on request (they require an INRAE account); they are described on the Ariane INRAE portal, particularly in the ‘Collective IT Infrastructures’ category. INRAE also supports an in-house tool for sample management, which is likewise described on the Ariane portal.
Storing data
- Université de Lorraine: B’UL for collaborative working and data under 20 GB (OTELo cloud for the OTELo cluster) / PETA for high-volume data.
- Inria: MyBox for collaborative working and data under 10 GB (volume can be extended on request).
- AgroParisTech: SeaFile for collaborative working and data under 100 GB.
- CNRS: sDrive for collaborative working and data under 100 GB / ShareDocs (IR* Huma-Num; for data storage and sharing, day-to-day work; data volumes of up to approx. 1 terabyte) / Huma-Num Box (IR* Huma-Num; for large volumes of cold or warm data).
- INRAE: high-performance file storage solutions, capacitive storage, SharePoint, Nextcloud and local NAS are offered by the INRAE IT Department. To request these services or read their description, go to the Ariane portal.
Publishing data
All institutions recommend that finalized data be deposited in a trusted disciplinary repository (ask the data workshop ADOC Lorraine for advice); failing that, we recommend depositing them in Recherche Data Gouv (institutional spaces Université de Lorraine, Inria, INRAE, CNRS).
Valorizing data
- Université de Lorraine: digital platform CENHTOR for humanities and social sciences (SHS) research projects. Scope: interdisciplinary projects, research data, exploitation (tool platform) and enhancement of corpora and databases, support services for data deposit and editorialization, data curation / thematic databases for other disciplines using the open-source software OMEKA S / virtual server hosting.
- CNRS: the NAKALA data repository embeds a NAKALA-PRESS publishing system, enabling the editorialization of a customizable website based on a collection of data deposited in the repository.
- INRAE: various services (web applications, R Server, etc.) based on the INRAE IT Department’s virtual machine supply service. One example is BILJOU, a forest water balance model application.
What are research data?
Research data are all the data (raw or processed) that constitute the primary material of a scientific research activity or project. The variety of materials (sounds, videos, lines of code, thermal surveys…) makes it difficult to give a single definition.
Nevertheless, the most commonly accepted definition is that of the OECD, which describes research data as “factual records (numbers, texts, images and sounds), which are used as primary sources for scientific research and are generally recognized by the scientific community as necessary to validate research results”.
Why be interested in research data?
Data are the foundation of scientific research and the basis for the development of new hypotheses.
Good data management therefore makes it possible to:
- reduce the risk of loss and secure data with recommendations, choice of tools and adapted hosting,
- guarantee the origin and traceability of the data handled by documenting these steps,
- optimize the organization of the mass of data produced through best practices.
The final objective is to give researchers a reasoned approach to their data, so that they can derive maximum benefit from them during the project and in the years to come.
Publishing data more widely enables:
- reuse of unique datasets, given the cost of producing them,
- reproducibility of the research presented in the associated publications,
- reuse by students as teaching material,
- emergence of new lines of research to boost innovation,
- and enrichment of the world’s scientific heritage.
How to manage research data?
Data management is often referred to as Research Data Management (RDM). It is the set of good practices concerning the planning, collection, storage, processing, sharing and archiving of data produced as part of a scientific project.
These activities are generally represented in the form of a life cycle during which the project and the data evolve in parallel.
- Create: design your research, plan data management (formats, storage, consent for sharing…), locate existing data, collect data, create associated metadata
- Process: enter, transcribe, translate data, verify and clean data, anonymize data if necessary, describe data
- Analyze: interpret, derive, produce research results, publish
- Preserve: migrate data to the best format and medium, backup and store, document, archive
- Provide access: share data, control access, establish copyright, promote data
- Reuse: evaluate research, monitor results, re-initiate research
Source: UK Data Archive
Data management is generally considered good when it complies as closely as possible with the FAIR principles. The “FAIRification” of data consists in making them Findable, Accessible, Interoperable and Reusable. This set of good practices encourages describing datasets with standards and norms, assigning them a unique identifier, depositing them in certified repositories, etc.
Moreover, good management makes it easier to open data internally (restricted to the laboratory) or externally (accessible to all). Even when some data cannot be shared, managing them remains essential.
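As an illustration of the FAIR practices mentioned above (standard description, unique identifier), here is a minimal Python sketch that checks whether a metadata record is complete before deposit. The list of required fields is an assumption, loosely modelled on the mandatory properties of the DataCite metadata schema; the field names, the helper `missing_fields` and all values in the example record are hypothetical placeholders, and the repository you deposit in may ask for different or additional fields.

```python
# Assumed set of required fields, loosely modelled on DataCite's
# mandatory properties; a given repository may require others.
REQUIRED_FIELDS = (
    "identifier",
    "creators",
    "title",
    "publisher",
    "publication_year",
    "resource_type",
)


def missing_fields(record: dict) -> list[str]:
    """Return the required fields that are absent or empty in a record."""
    return [field for field in REQUIRED_FIELDS if not record.get(field)]


# Illustrative record: every value is a placeholder, not a real dataset.
record = {
    "identifier": "10.1234/example-doi",
    "creators": ["Doe, Jane"],
    "title": "Example thermal survey dataset",
    "publisher": "Example Repository",
    "publication_year": 2024,
    "resource_type": "Dataset",
    "license": "CC-BY-4.0",  # optional here, but useful for reusability
}

print("Missing fields:", missing_fields(record))
```

A record with every required field filled in yields an empty list; any field name returned by `missing_fields` points to a description that still needs to be completed.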
Why open your data?
Opening your data brings many benefits:
- Get more citations on publications linked to data (source: The citation advantage of linking publications to research data)
- Associate your name with the data you have produced
- Meet the requirements of many journals and funders
- Expand your professional network
- Keep your data safe and secure
- Save time and money by reusing other researchers’ data; save time for other researchers
- Contribute to the reproducibility of research results
- Work for the transparency of the scientific process
- Anticipate the evolution of evaluation, which will also concern research data (see the Paris call for research evaluation 2022)
What are the national and international policies on research data?
The University of Lorraine takes part in various national and international groups on research data: the Open Science Data Working Group (GTSO), the Data College of the Committee for Open Science (CoSo), the European Open Science Cloud (EOSC) and the Research Data Alliance (RDA).
- GTSO Données: the group works on the support that documentation services can offer to researchers in the management of their data. This group has an operational function. In 2020, three priority areas of work were identified: support for the drafting of data management plans, awareness-raising and training for doctoral students and researchers, and the sharing of experiences between institutions.
- Data College of CoSo: launched in 2018, the Committee for Open Science has the mission of defining an open science policy, ensuring its development on a national and international scale, and coordinating its implementation at the level of institutions and scientific communities. The Data College brings together researchers and experts representing the diversity of disciplines and professions in higher education and research. It follows the data-related actions defined annually for the CoSo and can also address any issue within its scope (structuring, “FAIRification”, repositories, data openness, legal matters, etc.).
- The European Open Science Cloud, an initiative of the European Commission, aims to develop an infrastructure that provides its users with cloud computing services for open science practices.
- The RDA was launched in 2013 by the European Commission, the National Science Foundation and the National Institute of Standards and Technology of the United States government, as well as the Australian government’s Department of Innovation, with the goal of building the social and technical infrastructure for sharing and reusing research data. The RDA France national “node” is being developed by the CNRS as part of the European RDA Europe 4.0 project with the support of the Ministry of Higher Education, Research and Innovation (MESRI).
What do funders and publishers want?
In the context of open science, incentive initiatives are multiplying at the level of funding agencies, research organizations or publishers. These incentives are gradually becoming obligations.
Funders
In Europe, the H2020 funding program stipulates in its recommendations that data management and openness are mandatory for projects funded from January 2017:
- data and metadata needed for publication validation: mandatory;
- other data and metadata that the beneficiary has chosen to make available in open access: specified in the Data Management Plan (DMP).
If certain data cannot be made accessible, this must be justified in the DMP (risk of compromising the project, ethical reasons, regulations concerning personal data, intellectual property, security, etc.).
The Horizon Europe program has the same requirements.
In France, the ANR requires the drafting of a Data Management Plan for all projects funded from 2019 onwards, according to the methods specified here (in French). It must be provided within 6 months of the start of the project.
The second line of commitment of the National Plan for Open Science, launched in July 2018 by the Ministry of Higher Education, Research and Innovation, concerns the structuring and opening of research data. The recommended measures are: making open dissemination mandatory for projects that are half financed by public funds, creating a data administrator function, and creating the conditions for, and promoting, the adoption of a data policy associated with articles published by researchers.
The second National Open Science Plan of 2021 goes further, with an axis entitled “Structuring, sharing and opening research data”:
- Implement the obligation to disseminate publicly funded research data;
- Create Research Data Gouv, the federated national platform for research data;
- Promote the adoption of a data policy across the entire research data cycle, to make data findable, accessible, interoperable and reusable (FAIR).
Publishers
Some publishers such as Elsevier, Springer Nature or PLOS Journals have already added to their journal editorial policy a section describing the dissemination of datasets. They refer to recommended repositories. Thus, the data used to validate or support the publication are directly linked to the article and are immediately accessible.
Some examples of policies stated by publishers:
- PLOS ONE: Data Availability Policy
- Elsevier: Policies
- Springer Nature: 4 Policy Types
What is a data management plan?
The Data Management Plan (DMP) is a living document that specifies how data will be produced, processed, described, shared or protected, and preserved during and after the project. It helps to anticipate the management issues that arise during a research project, as well as the conditions for future dissemination and preservation of the data (is an embargo needed? what server capacity should be planned?). The DMP is a deliverable for more and more funders, particularly the ANR. Depending on the project, two or three versions can be produced.
Don’t hesitate to register for our Data Management Plan workshops via donnees-recherche@univ-lorraine.fr!
How to give access to your data during peer-reviewing?
Many journals request access to research data during the peer-review process. DOREL makes it possible to give access to deposited but not yet published data, via the private URL function. This way, reviewers can see the data without them being accessible to everyone, and files can be modified as much as necessary before publication (and thus before the final DOI is assigned). Need help? Write to donnees-recherche@univ-lorraine.fr
How to link your data to your publications?
DOREL lets you link data to the associated publications. Conversely, HAL lets you link publications to the data.
I have written scripts to generate or read my data. Where should I save them?
Research data are often accompanied by scripts (written in Stata, R, MATLAB, Python…). To ensure that the data remain readable, it is recommended to deposit the scripts with the data in a repository. The advice of the Dataverse guide can be followed for DOREL.
Note that full-fledged software can be deposited in HAL and in Software Heritage to ensure its long-term preservation. See the instructions here (in French).
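When scripts are deposited together with the data, as recommended above, a simple file manifest can help future users check that nothing is missing or altered. The Python sketch below is one possible approach, not a DOREL or Dataverse requirement: it lists every file in a deposit folder with its SHA-256 checksum. The function names and the manifest file name `MANIFEST.txt` are hypothetical choices made for this example.

```python
import hashlib
from pathlib import Path


def sha256_of(path: Path) -> str:
    """Compute the SHA-256 checksum of a file, reading it in chunks."""
    digest = hashlib.sha256()
    with path.open("rb") as handle:
        for chunk in iter(lambda: handle.read(65536), b""):
            digest.update(chunk)
    return digest.hexdigest()


def build_manifest(folder: Path) -> dict[str, str]:
    """Map each file path (relative to folder) to its SHA-256 checksum."""
    return {
        path.relative_to(folder).as_posix(): sha256_of(path)
        for path in sorted(folder.rglob("*"))
        if path.is_file()
    }


def write_manifest(folder: Path, name: str = "MANIFEST.txt") -> Path:
    """Write one 'checksum  relative/path' line per file in the folder."""
    manifest = build_manifest(folder)
    manifest.pop(name, None)  # skip a manifest left over from an earlier run
    target = folder / name
    lines = [f"{checksum}  {rel}" for rel, checksum in manifest.items()]
    target.write_text("\n".join(lines) + "\n", encoding="utf-8")
    return target
```

For example, calling `write_manifest(Path("my_deposit"))` on a folder containing both data files and the scripts that read them (the folder name is a placeholder) produces a text file that can be uploaded alongside them.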