Combining Data Lake and Data Wrangling for ...

Document type :
Other scientific communication (conference without proceedings - poster - seminar...): Conference paper with proceedings
Permalink :
http://hdl.handle.net/20.500.12210/75339
Title :
Combining Data Lake and Data Wrangling for Ensuring Data Quality in CRIS
Author(s) :
Azeroual, Otmane [Author]
Schopfel, Joachim [Author]
Groupe d'Études et de Recherche Interdisciplinaire en Information et COmmunication - ULR 4073 [GERIICO]
Ivanovic, Dragan [Author]
University of Novi Sad
Nikiforova, Anastasija [Author]
Conference title :
CRIS2022: 15th International Conference on Current Research Information Systems
City :
Dubrovnik
Country :
Croatia
Start date of the conference :
2022-05-12
English keyword(s) :
CRIS
research information
research information system
heterogeneous data sources
data quality
data wrangling
data lifecycle
data consolidation
data lake
data cleaning
data warehouse
data lakehouse
HAL domain(s) :
Humanities and Social Sciences/Information and communication sciences
Computer Science [cs]/Databases [cs.DB]
English abstract : [en]
Consolidating research information improves the quality of data integration, reduces duplicates between systems, and enables the flexibility and scalability required when processing various data sources. We assume that combining a data lake as a data repository with a data wrangling process should allow low-quality or "bad" data to be identified and eliminated, leaving only high-quality data, referred to as "research information" in the Research Information System (RIS) domain, and thereby allowing the most accurate insights to be gained from it. This, in turn, would increase the value of both the data themselves and the data-driven actions that contribute to more accurate and better-informed decision-making. The cleansed research information is then entered into the appropriate target Current Research Information System (CRIS) so that it can be used in further data processing steps. To minimize the effort required for the analysis, proliferation and enrichment of large amounts of data and metadata, and to achieve far-reaching added value in information retrieval for CRIS employees, developers and end users, this paper outlines the concept of a curated data lake with a data wrangling process and shows how it can be used in CRIS to clean up data from heterogeneous data sources during their collection and integration.
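
To make the idea in the abstract concrete, the following is a minimal Python sketch of a wrangling step that takes raw records from a data lake, normalizes them, drops low-quality entries, and removes duplicates before loading into a CRIS. It is not the authors' implementation: the `Record` schema, the source names, the quality rule (title plus four-digit year) and the sample data, including the DOIs, are all hypothetical and chosen only for illustration.

```python
from dataclasses import dataclass
from typing import Iterable, Optional


@dataclass(frozen=True)
class Record:
    """One raw publication record as it might sit in the data lake (hypothetical schema)."""
    source: str
    title: Optional[str]
    year: Optional[str]
    doi: Optional[str]


def normalize(rec: Record) -> Record:
    """Trim and collapse whitespace, lower-case the DOI, map empty strings to None."""
    title = " ".join(rec.title.split()) if rec.title else None
    year = rec.year.strip() if rec.year else None
    doi = rec.doi.strip().lower() if rec.doi else None
    return Record(rec.source, title or None, year or None, doi or None)


def is_valid(rec: Record) -> bool:
    """Illustrative quality rule (an assumption): a title and a four-digit year are required."""
    return bool(rec.title) and bool(rec.year) and len(rec.year) == 4 and rec.year.isdigit()


def wrangle(raw: Iterable[Record]) -> list[Record]:
    """Normalize, discard invalid records, and keep the first record per deduplication key."""
    seen: set[str] = set()
    clean: list[Record] = []
    for rec in map(normalize, raw):
        if not is_valid(rec):
            continue  # "bad" data never reaches the target system
        # Prefer the DOI as key; fall back to case-insensitive title plus year.
        key = rec.doi or f"{rec.title.casefold()}|{rec.year}"
        if key in seen:
            continue  # duplicate coming from another source
        seen.add(key)
        clean.append(rec)
    return clean


# Made-up sample data from three hypothetical sources.
raw = [
    Record("library_catalogue", "  Data Quality in  CRIS ", "2022", "10.0000/EXAMPLE"),
    Record("hr_system", "Data quality in CRIS", "2022", "10.0000/example "),  # same DOI
    Record("web_form", "", "2021", None),                # missing title
    Record("web_form", "Another Paper", "21", None),     # malformed year
    Record("hr_system", "Another Paper", "2021", None),
]
clean = wrangle(raw)
for rec in clean:
    print(rec)
```

In this sketch only two of the five raw records survive. In a real setting the validation and matching rules would depend on the target CRIS and its data model.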
Language :
English
Peer reviewed article :
Yes
Audience :
International
Popular science :
No
Collections :
  • Groupement d'Études et de Recherche Interdisciplinaire en Information et Communication (GERiiCO) - ULR 4073
Source :
Harvested from HAL
Submission date :
2022-06-18T05:16:59Z
Files
  • https://hal.archives-ouvertes.fr/hal-03694519v1/document
  • Open access
Université de Lille