BioMedical Data Manifest

Aggregator


GDC Project ID: TCGA-LAML

GDC Summary page not available
General Information
LINKS
DATA MANIFEST AUTHORS
  • Data Manifest aggregated from GDC
VERSION INFORMATION
  • Current Version: Data Release 42.0
  • DOI: N/A
  • Release Date: January 30, 2025
  • Last Updated: N/A
KEYWORDS
  • Hematopoietic and reticuloendothelial systems
  • Myeloid Leukemias
EXTENSION MECHANISMS
  • Contact Dataset Owners/Publishers for ways to contribute
OWNER/PUBLISHER
  • N/A
CONTACT DETAILS
    Dataset Contacts:

Uses of Data
DATASET ORIGINAL USE
  • N/A
CONCERNS AND LIMITATIONS
  • General Research Use: Use of the data is limited only by the terms of the model Data Use Certification.
CONFOUNDING FACTORS
  • N/A
DATASET KNOWN PUBLICATIONS AND BENCHMARKS
  • N/A
CITATION GUIDELINES
When using this dataset please cite:
  • Use of data from dbGaP: Please cite/reference the use of dbGaP data by including the dbGaP accession phs000178
  • Acknowledgement Statement: The results published here are in whole or part based upon data generated by The Cancer Genome Atlas managed by the NCI and NHGRI. Information about TCGA can be found at http://cancergenome.nih.gov.

Dataset Composition
DATA SUBJECT(S)
  • Each experimental unit refers to a patient sample
  • Each patient (Case) can have multiple experimental units corresponding, for example, to different timepoints or tissue sources
DATASET SNAPSHOT

Overall statistics of the open-access dataset:
Open Access
Size of Dataset: 11.159 GB
Number of Cases: 200
Number of Files: 4,099
File Types: BCR Biotab, TSV, TXT, CEL, IDAT, BCR XML, MAF

Overall statistics of the controlled-access dataset:
Controlled Access
Size of Dataset: 43.119 TB
Number of Cases: 200
Number of Files: 4,740
File Types: VCF, TSV, MAF, BEDPE, BAM, CEL
DETAILS
  • Modalities:
    • Simple Nucleotide Variation
    • Sequencing Reads
    • Biospecimen
    • Clinical
    • Copy Number Variation
    • Transcriptome Profiling
    • DNA Methylation
    • Structural Variation
  • Missingness:
    • Clinical data may be missing due to lack of access to records or incomplete curation; missing values are typically represented by blank, NA, or Unknown entries
    • Not all genomic data types are captured for every patient sample
    • Additionally, any genomic assay may have regions below the limit of detection (LOD) or background level
  • Sampling:
    • N/A
  • Data anomalies/errors:
    • N/A
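Because missing clinical values can appear as blanks, NA, or Unknown, it can help to normalize them before counting how many cases have a given field populated. The following is a minimal Python sketch, assuming each clinical record has already been loaded as a dictionary of strings (for example, one row of a TSV file). The function names `normalize_value` and `count_present` and the exact set of missing-value markers are illustrative assumptions, not part of any GDC tool; real files may use additional markers.

```python
from typing import Iterable, Mapping, Optional

# Missing-value markers named in this manifest (compared case-insensitively).
# This set is an assumption; real clinical files may use other markers too.
MISSING_TOKENS = {"", "na", "unknown"}


def normalize_value(value: Optional[str]) -> Optional[str]:
    """Return None for absent or missing-marker values, else the stripped string."""
    if value is None:
        return None
    stripped = value.strip()
    if stripped.lower() in MISSING_TOKENS:
        return None
    return stripped


def count_present(rows: Iterable[Mapping[str, Optional[str]]], field: str) -> int:
    """Count records where `field` exists and holds a non-missing value."""
    return sum(1 for row in rows if normalize_value(row.get(field)) is not None)
```

For example, for the records `[{"days_to_death": "365"}, {"days_to_death": "NA"}, {"days_to_death": ""}]`, `count_present(rows, "days_to_death")` returns 1. Treating a field that is absent from a record the same as an explicit marker keeps per-variable counts comparable across files.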
SUMMARY OF INSTANCES AND DATATYPES

Number of instances per sample type and modality (open-access data):
Open Access Modalities
Instance Type                                    Simple Nucleotide Variation  Copy Number Variation  Transcriptome Profiling  DNA Methylation
Buccal Cell Normal                               1                            0                      0                        0
Primary Blood Derived Cancer - Peripheral Blood  149                          200                    203                      194
Solid Tissue Normal                              143                          200                    0                        0
Number of instances per sample type and modality (controlled-access data):
Controlled Access Modalities
Instance Type                                    Simple Nucleotide Variation  Copy Number Variation  Structural Variation  Transcriptome Profiling
Buccal Cell Normal                               1                            0                      0                     0
Primary Blood Derived Cancer - Peripheral Blood  270                          200                    151                   151
Solid Tissue Normal                              261                          200                    0                     0
  • Additional Notes: Not all of these instances are used in any given analysis. See the respective publications or other sections of this document for details.


Number of instances per sample type and diagnosis:
Instance Type Myeloid Leukemias
Buccal Cells 1
Peripheral Blood NOS 356
Solid Tissue 340

Number of cases per clinical variable and diagnosis:
Clinical Variable Myeloid Leukemias
race 200
gender 200
ethnicity 200
vital_status 200
age_at_index 200
days_to_birth 200
age_is_obfuscated 200
days_to_death 120
state 200
synchronous_malignancy 200
days_to_diagnosis 200
tissue_or_organ_of_origin 200
age_at_diagnosis 200
primary_diagnosis 200
prior_malignancy 200
year_of_diagnosis 200
state 200
prior_treatment 200
diagnosis_is_primary_disease 200
morphology 200
classification_of_tumor 200
fab_morphology_code 200
icd_10_code 200
site_of_resection_or_biopsy 200
calgb_risk_group 197
chemical_exposure_type 4
exposure_type 4
state 4

Number of instances per specific (primary) diagnosis and broad diagnosis:
Primary Diagnosis Myeloid Leukemias
Acute megakaryoblastic leukaemia 14
Acute monocytic leukemia 77
Acute myeloid leukemia with maturation 154
Acute myeloid leukemia without maturation 138
Acute myeloid leukemia, M6 type 20
Acute myeloid leukemia, NOS 4
Acute myeloid leukemia, minimal differentiation 65
Acute myelomonocytic leukemia 148
Acute promyelocytic leukaemia, t(15;17)(q22;q11-12) 77

Ethical, Legal and Social Issues (ELSI)
Inclusion: TCGA applies a strict set of inclusion criteria because of the rigorous and comprehensive nature of its work. Tumor samples and matched sources of germline DNA are curated and processed by the Biospecimen Core Resource (BCR), a centralized site that reviews sample data and processes all samples to ensure consistent pathology assessment and generation of molecular analytes (DNA and RNA). TCGA focuses on primary, untreated tumors that were snap-frozen upon collection. All tumors must have a matched normal sample from the same patient; in many cases, the matched normal is a sample of the patient's blood. Once at the BCR, all samples are subjected to a quality control protocol before they are accepted for full analysis in the TCGA pipeline. A pathologist reviews each sample to confirm the diagnosis and verify that the sample meets the inclusion criteria. Specifically, TCGA requires that samples contain at least 60% tumor nuclei and less than 20% necrotic tissue. Once a sample passes pathology review, nucleic acids are isolated and genotyping is performed so that each tumor sample is correctly associated with its matched normal sample. An important goal of this central resource is to ensure that the molecular analytes (i.e., DNA and RNA) extracted from tissue samples are of consistent, high quality. These analytes then undergo a molecular quality control process and are distributed to the TCGA Cancer Genome Characterization Centers and Genome Sequencing Centers for genomic analysis. All samples in TCGA were collected and used following strict policies and guidelines for the protection of human subjects, informed consent, and IRB review of protocols.
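The pathology thresholds in the inclusion statement can be expressed as a simple eligibility check. This is a minimal Python sketch under stated assumptions: the `SampleReview` record and the `passes_inclusion` function are hypothetical, percentages are assumed to be on a 0 to 100 scale, and the check covers only the three criteria named in the statement (a matched normal, at least 60% tumor nuclei, less than 20% necrosis), not the BCR's full quality control protocol.

```python
from dataclasses import dataclass


@dataclass
class SampleReview:
    """Hypothetical pathology-review record for one tumor sample."""
    tumor_nuclei_pct: float   # percentage of tumor nuclei (0 to 100)
    necrosis_pct: float       # percentage of necrotic tissue (0 to 100)
    has_matched_normal: bool  # matched normal (e.g. blood) from the same patient


# Thresholds as stated in the TCGA inclusion criteria.
MIN_TUMOR_NUCLEI_PCT = 60.0  # inclusive: at least 60%
MAX_NECROSIS_PCT = 20.0      # exclusive: necrosis must be strictly below 20%


def passes_inclusion(review: SampleReview) -> bool:
    """Return True if the sample meets the stated pathology criteria."""
    return (
        review.has_matched_normal
        and review.tumor_nuclei_pct >= MIN_TUMOR_NUCLEI_PCT
        and review.necrosis_pct < MAX_NECROSIS_PCT
    )
```

The comparison operators mirror the wording of the criteria: "at least" is inclusive (`>=`) while "less than" is strict (`<`), so a sample at exactly 20% necrosis would not pass.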
Confidentiality: Open-access data has been deemed safe for general use with minimal risk to patients. Controlled-access data could potentially be used to re-identify a patient and is therefore subject to additional access requirements.
Considerations for external resources utilized: N/A
Human Subjects
REGULATORY
OTHER CONSIDERATIONS
  • REB/IRB approval for data collection?: Generally, all patients in these studies provided informed consent; see dbGaP for more details
  • Samples were collected from patients in the following countries:
    • N/A
  • Known ELS issues introduced by preprocessing: N/A
  • Known problematic proxies: N/A
  • Was a data protection impact analysis performed?: N/A
Collected Sensitive Data Fields
Field Name Definition Distribution
gender Text designations that identify gender. Gender is described as the assemblage of properties that distinguish people on the basis of their societal roles. [Explanatory Comment 1: Identification of gender is based upon self-report and may come from a form, questionnaire, interview, etc.]
race An arbitrary classification of a taxonomic group that is a division of a species. It usually arises as a consequence of geographical isolation within a species and is characterized by shared heredity, physical attributes and behavior, and in the case of humans, by common history, nationality, or geographic distribution. The provided values are based on the categories defined by the U.S. Office of Management and Budget and used by the U.S. Census Bureau.
ethnicity An individual's self-described social and cultural grouping, specifically whether an individual describes themselves as Hispanic or Latino. The provided values are based on the categories defined by the U.S. Office of Management and Budget and used by the U.S. Census Bureau.
age_at_diagnosis Age at the time of diagnosis expressed in number of days since birth.
submitter_id All submitted IDs can potentially contain identifiable information
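Because age_at_diagnosis is recorded as a number of days since birth, converting it to years is a common first step before analysis. A minimal Python sketch, assuming a 365.25-day average year (a convention chosen here, not one stated by the GDC) and that missing values have already been normalized to None; the function name `age_at_diagnosis_years` is hypothetical.

```python
from typing import Optional

# Assumed average year length (accounts for leap years); not a GDC-defined constant.
DAYS_PER_YEAR = 365.25


def age_at_diagnosis_years(age_at_diagnosis_days: Optional[int]) -> Optional[float]:
    """Convert age_at_diagnosis (days since birth) to years; missing stays None."""
    if age_at_diagnosis_days is None:
        return None
    return age_at_diagnosis_days / DAYS_PER_YEAR


# 18262 days is just under 50 years.
print(round(age_at_diagnosis_years(18262), 2))  # prints 50.0
```

The clinical variables listed for this project also include age_is_obfuscated, which suggests that some recorded ages may have been altered for privacy, so converted values may not be exact for every case.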

Intentionality of sensitive human attribute collection: N/A
LICENSING
  • IP / Terms of Use:
  • Third party intellectual property considerations:
    • N/A
  • Export controls or other regulations impacting dataset access/download or storage:
    • N/A

Provenance and Lineage
Description of the data collection methodology: N/A
Specific data collection devices:
  • N/A
Dataset Curation: N/A
Dataset Validation:
  • N/A
Data collection timeframe: N/A
Preprocessing steps/workflow:
  • Pre-processing: Generation and processing of raw data is described in detail in existing publications
  • Post-processing and derived features: Generation of processed data by the Genomic Data Commons is described in the GDC documentation