Document type:
Conference paper with proceedings
Title:
Reconstructing Genotypes in Private Genomic Databases from Genetic Risk Scores
Author(s):
Paige, Brooks [Author]
The Alan Turing Institute
Department Computer Science [London] [UCL-CS]
Bell, James [Author]
The Alan Turing Institute
Bellet, Aurelien [Author]
Machine Learning in Information Networks [MAGNET]
Gascón, Adrià [Author]
The Alan Turing Institute
University of Warwick [Coventry]
Ezer, Daphne [Author]
The Alan Turing Institute
Department of Biology [York]
Title of the scientific event:
24th International Conference On Research In Computational Molecular Biology (RECOMB 2020)
City:
Virtual
Country:
Italy
Start date of the scientific event:
2020
English keyword(s):
Genomic privacy
Genetic risk scores
GWAS
HAL discipline(s):
Computer Science [cs]/Machine Learning [cs.LG]
Statistics [stat]/Machine Learning [stat.ML]
English abstract: [en]
Some organisations, such as 23andMe and the UK Biobank, hold large genomic databases that they reuse for multiple genome-wide association studies (GWAS). Even research studies that compile smaller genomic databases often use them to investigate many related traits. It is common for a study to report a genetic risk score (GRS) model for each trait within the publication. Here we show that, under some circumstances, these GRS models can be used to recover the genetic variants of individuals in these genomic databases, which constitutes a reconstruction attack. In particular, if two GRS models are trained on largely overlapping sets of participants, then it is often possible to determine the genotype of each individual who was used to train one GRS model but not the other. We demonstrate this theoretically and experimentally by analysing the Cornell Dog Genome database. The accuracy of our reconstruction attack depends on how accurately we can estimate the rate of co-occurrence of pairs of SNPs within the private database, so if this aggregate information is ever released, it would drastically reduce the security of a private genomic database. Caution should be applied when using the same database for multiple analyses, especially when a small number of individuals are included in or excluded from one part of the study.
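The mechanism described in the abstract can be illustrated with a toy sketch. This is not the authors' implementation. It assumes (none of this is stated in the record) that each GRS model is an ordinary least-squares fit with an intercept, that the two models differ by exactly one participant, and that the attacker knows the exact co-occurrence matrix A = X^T X of the full database. Under those assumptions, A (beta_minus - beta_full) is a scalar multiple of the left-out individual's genotype vector, and the intercept entry fixes the scale. All names (`fit_grs`, `cooccurrence`, `reconstruct`) and the synthetic data are hypothetical.

```python
import random

def solve(A, b):
    """Solve A x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    M = [row[:] + [b[i]] for i, row in enumerate(A)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(M[r][col]))
        M[col], M[pivot] = M[pivot], M[col]
        for r in range(col + 1, n):
            f = M[r][col] / M[col][col]
            for c in range(col, n + 1):
                M[r][c] -= f * M[col][c]
    x = [0.0] * n
    for i in reversed(range(n)):
        s = sum(M[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (M[i][n] - s) / M[i][i]
    return x

def cooccurrence(X):
    """Pairwise co-occurrence matrix X^T X (assumed known to the attacker)."""
    p = len(X[0])
    return [[sum(r[i] * r[j] for r in X) for j in range(p)] for i in range(p)]

def fit_grs(X, y):
    """Toy GRS model: least-squares weights from the normal equations."""
    A = cooccurrence(X)
    b = [sum(r[i] * t for r, t in zip(X, y)) for i in range(len(X[0]))]
    return solve(A, b)

def reconstruct(A, beta_full, beta_minus):
    """Recover the left-out row x from A (beta_minus - beta_full) = c * x."""
    p = len(A)
    d = [beta_minus[j] - beta_full[j] for j in range(p)]
    v = [sum(A[i][j] * d[j] for j in range(p)) for i in range(p)]
    # The intercept entry of x is 1, so v[0] equals the unknown scalar c.
    return [round(vi / v[0]) for vi in v]

# Synthetic data: an intercept column plus SNP genotypes coded 0, 1 or 2.
random.seed(0)
n_people, n_snps = 60, 5
X = [[1] + [random.choice([0, 1, 2]) for _ in range(n_snps)]
     for _ in range(n_people)]
true_w = [random.gauss(0, 1) for _ in range(n_snps + 1)]
y = [sum(w * g for w, g in zip(true_w, row)) + random.gauss(0, 1) for row in X]

beta_full = fit_grs(X, y)             # model trained on everyone
beta_minus = fit_grs(X[:-1], y[:-1])  # model trained without the last person
recovered = reconstruct(cooccurrence(X), beta_full, beta_minus)
target = X[-1]
print(recovered == target)
```

In this sketch the attacker is given the exact co-occurrence matrix. With only an estimate of it, the recovered vector would be noisier, which is in line with the abstract's remark that attack accuracy depends on how well SNP co-occurrence can be estimated.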
Language:
English
Peer-reviewed:
Yes
Audience:
International
Popular science:
No
Collections:
Source:
Files
- https://hal.inria.fr/hal-03100032/document
- Open access
- https://eprints.whiterose.ac.uk/161818/1/2020.01.15.907808v1.full.pdf
- Open access