Individual-level Data: General Questions

The dbGaP archives and distributes the results of studies that have investigated the interaction of genotype and phenotype. Such studies include genome-wide association studies, medical sequencing, molecular diagnostic assays, and studies of association between genotype and non-clinical traits. The individual-level data hosted at the dbGaP is distributed through a controlled-access system. The types of data distributed through the dbGaP include phenotype data, association (GWAS) data, summary-level analysis data, SRA (Sequence Read Archive, formerly Short Read Archive) data, reference alignment (BAM) data, VCF (Variant Call Format) data, expression data, imputed genotype data, and image data, among others.

The individual-level data submitted to the dbGaP is required to be de-identified. No names or other identifying information are attached to the data. However, a genetic fingerprint is embedded in each individual's genotype data and cannot be de-identified. For that reason, to protect individuals' privacy, all individual-level data is distributed only through the Authorized Access System.

The phenotype tables are rectangular. In general, each row represents one study participant and each column represents a measured trait. Genotypes are available in several different formats:

  • The Matrix format. This is like the phenotype table layout described above, except that the rows represent SNPs and the columns represent samples.
  • The PLINK format.
  • The VCF format.
  • An individual format where there is one file for each sample and all the genotypes are listed.
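The two table orientations described above can be illustrated with a minimal Python sketch. The file contents, the column names (`SUBJECT_ID`, `SNP`), and the assumption that sample IDs are identical to subject IDs are all hypothetical and chosen only for illustration; actual dbGaP files differ from study to study and may require a separate sample-to-subject mapping.

```python
import csv
import io

# Hypothetical miniature tables; real column names and layouts vary by study.
PHENOTYPE_TSV = (
    "SUBJECT_ID\tAGE\tSEX\n"
    "S1\t54\tF\n"
    "S2\t61\tM\n"
)
GENOTYPE_MATRIX_TSV = (
    "SNP\tS1\tS2\n"
    "rs0001\tAA\tAG\n"
    "rs0002\tCT\tCC\n"
)


def read_phenotypes(text):
    """Read a table with one row per participant and one column per trait."""
    reader = csv.DictReader(io.StringIO(text), delimiter="\t")
    return {
        row["SUBJECT_ID"]: {k: v for k, v in row.items() if k != "SUBJECT_ID"}
        for row in reader
    }


def read_genotype_matrix(text):
    """Read a SNP-by-sample matrix and regroup the calls by sample."""
    rows = list(csv.reader(io.StringIO(text), delimiter="\t"))
    header, body = rows[0], rows[1:]
    sample_ids = header[1:]  # first header cell labels the SNP column
    by_sample = {sample: {} for sample in sample_ids}
    for row in body:
        snp, calls = row[0], row[1:]
        for sample, call in zip(sample_ids, calls):
            by_sample[sample][snp] = call
    return by_sample


def merge(phenotypes, genotypes):
    """Join on IDs present in both tables.

    Assumes sample IDs equal subject IDs, which is an illustrative
    simplification and may not hold for a real study.
    """
    shared = phenotypes.keys() & genotypes.keys()
    return {i: {**phenotypes[i], **genotypes[i]} for i in sorted(shared)}


if __name__ == "__main__":
    merged = merge(
        read_phenotypes(PHENOTYPE_TSV),
        read_genotype_matrix(GENOTYPE_MATRIX_TSV),
    )
    for subject, record in merged.items():
        print(subject, record)
```

Regrouping the matrix by sample turns it into the same one-row-per-individual orientation as the phenotype table, which is what makes the join straightforward.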

Since June 2016, all TCGA data, including the phenotype and sequencing data, have been hosted at the Genomic Data Commons website. The dbGaP continues to manage the controlled-access approval process through the Authorized Access System. A TCGA data access request should be made through the dbGaP system in the same way as for other dbGaP studies (look for study phs000178). After the request is approved, the approval information will be passed to the Genomic Data Commons system within 24 hours. The Genomic Data Commons website is operated completely independently of the dbGaP. All issues related to that system, such as system login and data download, should be addressed directly to its help desk.


Analytical Tool Availability

The dbGaP does not provide any online tools for remote analysis of individual-level data.

A tool named Phenotype-Genotype Integrator (PheGenI) may be useful to you. Here is a tutorial video about it. PheGenI merges the NHGRI genome-wide association study (GWAS) catalog with several databases housed at the NCBI. With the tool, users can search SNP-gene association results by chromosomal location, gene, SNP, or phenotype. The PheGenI search, however, is limited to the GWAS summary-level analysis data hosted in the dbGaP. It is not a tool for analyzing the individual-level data distributed through the dbGaP Authorized Access System. The tool can be accessed from the dbGaP home page through a link named “Phenotype-Genotype Integrator”.

Please note that PheGenI displays only the best p-values from each dbGaP-hosted analysis. Many p-values are excluded (essentially, those that the dbGaP has deemed not statistically significant). If you would like access to all the p-values, significant or not, there are a couple of alternatives:

  1. If you need only the p-values (and not, say, effect direction or minor allele frequency), you can either
    1. examine an analysis using the genome browser: for example, from here, click on "View association results in Genome Browser"; or
    2. download the p-values in tabular form from the dbGaP public FTP site: click on "Connect to public download site", then "Analyses", then the zip file.
  2. If you require more information than the above alternatives supply, you will have to submit a data access request through the Authorized Access System.

Another tool, the Association Results Browser, supports searches of the GWAS catalog and of the summary-level analysis data available at the dbGaP.
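For the tabular download option described above, a table of p-values can be filtered locally once the zip file has been extracted. The following is a minimal Python sketch under stated assumptions: the tab delimiter, the column names (`SNP`, `CHR`, `POS`, `P_VALUE`), and the sample rows are hypothetical, so the header of the actual file should be checked before adapting it.

```python
import csv
import io

# Hypothetical excerpt; the real file's column names and delimiter may differ.
ASSOCIATION_TSV = (
    "SNP\tCHR\tPOS\tP_VALUE\n"
    "rs0001\t1\t10500\t3.2e-9\n"
    "rs0002\t1\t20750\t0.41\n"
    "rs0003\t2\t5300\t4.0e-6\n"
)


def filter_by_p_value(text, threshold, p_column="P_VALUE"):
    """Return (SNP, p-value) pairs with p <= threshold, smallest p first."""
    reader = csv.DictReader(io.StringIO(text), delimiter="\t")
    kept = []
    for row in reader:
        try:
            p = float(row[p_column])
        except (ValueError, TypeError):
            # Skip rows whose p-value is missing or not numeric.
            continue
        if p <= threshold:
            kept.append((row["SNP"], p))
    return sorted(kept, key=lambda item: item[1])


if __name__ == "__main__":
    # 5e-8 is a commonly used genome-wide significance convention,
    # shown here only as an example threshold.
    for snp, p in filter_by_p_value(ASSOCIATION_TSV, 5e-8):
        print(snp, p)
```

Because the downloaded table contains all p-values rather than only the best ones, the threshold can be chosen freely; passing `1.0` keeps every row that has a numeric p-value.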