• English
    • français
  • Aide
  •  | 
  • Contact
  •  | 
  • À Propos
  •  | 
  • Ouvrir une session
  • Portail HAL
  •  | 
  • Pages Pro Chercheurs
  • EN
  •  / 
  • FR
Voir le document 
  •   Accueil de LillOA
  • Liste des unités
  • Centre de Recherche en Informatique, Signal et Automatique de Lille (CRIStAL) - UMR 9189
  • Voir le document
  •   Accueil de LillOA
  • Liste des unités
  • Centre de Recherche en Informatique, Signal et Automatique de Lille (CRIStAL) - UMR 9189
  • Voir le document
JavaScript is disabled for your browser. Some features of this site may not work without it.

Constant-Delay Enumeration for Nondeterministic ...
  • BibTeX
  • CSV
  • Excel
  • RIS

Type de document :
Article dans une revue scientifique: Article original
DOI :
10.1145/3436487
Titre :
Constant-Delay Enumeration for Nondeterministic Document Spanners
Auteur(s) :
Mengel, Stefan [Auteur]
Centre de Recherche en Informatique de Lens [CRIL]
Amarilli, Antoine [Auteur]
Data, Intelligence and Graphs [DIG]
Département Informatique et Réseaux [INFRES]
Bourhis, Pierre [Auteur] refId
Centre de Recherche en Informatique, Signal et Automatique de Lille - UMR 9189 [CRIStAL]
Self-adaptation for distributed services and large software systems [SPIRALS]
Niewerth, Matthias [Auteur]
Universität Bayreuth [Deutschland] = University of Bayreuth [Germany] = Université de Bayreuth [Allemagne]
Titre de la revue :
ACM Transactions on Database Systems
Pagination :
1-30
Éditeur :
Association for Computing Machinery
Date de publication :
2021-04-14
ISSN :
0362-5915
Mot(s)-clé(s) en anglais :
• Information systems
Information extraction
Theory of computation
Regular languages
Design and analysis of algorithms
Discipline(s) HAL :
Informatique [cs]/Base de données [cs.DB]
Informatique [cs]/Algorithme et structure de données [cs.DS]
Informatique [cs]/Traitement du texte et du document
Résumé en anglais : [en]
We consider the information extraction framework known as document spanners and study the problem of efficiently computing the results of the extraction from an input document, where the extraction task is described as a ...
Lire la suite >
We consider the information extraction framework known as document spanners and study the problem of efficiently computing the results of the extraction from an input document, where the extraction task is described as a sequential variable-set automaton (VA). We pose this problem in the setting of enumeration algorithms, where we can first run a preprocessing phase and must then produce the results with a small delay between any two consecutive results. Our goal is to have an algorithm that is tractable in combined complexity, i.e., in the sizes of the input document and the VA, while ensuring the best possible data complexity bounds in the input document size, i.e., constant delay in the document size. Several recent works at PODS’18 proposed such algorithms but with linear delay in the document size or with an exponential dependency in size of the (generally nondeterministic) input VA. In particular, Florenzano et al. suggest that our desired runtime guarantees cannot be met for general sequential VAs. We refute this and show that, given a nondeterministic sequential VA and an input document, we can enumerate the mappings of the VA on the document with the following bounds: the preprocessing is linear in the document size and polynomial in the size of the VA, and the delay is independent of the document and polynomial in the size of the VA. The resulting algorithm thus achieves tractability in combined complexity and the best possible data complexity bounds. Moreover, it is rather easy to describe, particularly for the restricted case of so-called extended VAs. Finally, we evaluate our algorithm empirically using a prototype implementation.Lire moins >
Langue :
Anglais
Comité de lecture :
Oui
Audience :
Internationale
Vulgarisation :
Non
Projet ANR :
Réponse efficace aux requêtes sous mises à jour
Collections :
  • Centre de Recherche en Informatique, Signal et Automatique de Lille (CRIStAL) - UMR 9189
Source :
Harvested from HAL
Fichiers
Thumbnail
  • http://arxiv.org/pdf/2003.02576
  • Accès libre
  • Accéder au document
Thumbnail
  • 2003.02576
  • Accès libre
  • Accéder au document
Université de Lille

Mentions légales
Accessibilité : non conforme
Université de Lille © 2017