BioMedical Data Manifest Aggregator


TARGET Acute Myeloid Leukemia (AML) (TARGET-AML)

The TARGET Acute Myeloid Leukemia projects employed comprehensive molecular characterization to determine the genetic changes that drive the initiation and progression of high-risk or hard-to-treat childhood cancers. Acute myeloid leukemia (AML) is a cancer that originates in the bone marrow from immature white blood cells known as myeloblasts. About 25% of all children with leukemia have AML. Although survival rates have increased since the 1970s, approximately half of all childhood AML cases relapse despite intensive treatment. Additional therapies following relapse are often unsuccessful and can be especially difficult and damaging for children. These patients would clearly benefit from targeted therapeutic approaches.

Through comprehensive genome-wide characterization, TARGET researchers are identifying the genetic and epigenetic alterations of relapsed disease. The ultimate goal is to translate their discoveries into novel treatments that will improve outcomes for children with AML. To learn more about pediatric AML and current treatment strategies, visit the NCI pediatric AML website.

TARGET AML molecular characterization analyses include gene expression arrays, copy number arrays, DNA methylation, whole genome sequencing, whole exome sequencing, RNA-seq, miRNA-seq, and targeted capture sequencing.


General Information
LINKS
DATA MANIFEST AUTHORS
  • Data Manifest aggregated from GDC
VERSION INFORMATION
  • Current Version: Data Release 42.0
  • DOI: N/A
  • Release Date: January 30, 2025
  • Last Updated: N/A
KEYWORDS
  • Hematopoietic and reticuloendothelial systems
  • Myeloid Leukemias
EXTENSION MECHANISMS
  • Contact Dataset Owners/Publishers for ways to contribute
OWNER/PUBLISHER
  • N/A
CONTACT DETAILS
    Dataset Contacts:
    • Soheil Meshinchi, MD, PhD. Fred Hutchinson Cancer Research Center, Seattle, WA, USA.
    • Robert Arceci, MD, PhD. Children's Hospital, Phoenix, AZ, USA.

Uses of Data
DATASET ORIGINAL USE
  • N/A
CONCERNS AND LIMITATIONS
  • Disease-Specific (Pediatric Cancer Research): Use of the data must be related to Pediatric Cancer Research.
  • Use of protected TARGET datasets should be for research projects that can only be conducted using pediatric data (i.e., the research objectives cannot be accomplished using data from adults) and that have likely relevance to developing more effective treatments, diagnostic tests, or prognostic markers for childhood cancers. Applications proposing methods, software, or other tool development would not be considered acceptable uses of the data.
CONFOUNDING FACTORS
  • N/A
DATASET KNOWN PUBLICATIONS AND BENCHMARKS
  • N/A
CITATION GUIDELINES
When using this dataset please cite:
  • Use of data from dbGaP: Please cite/reference the use of dbGaP data by including the dbGaP accession phs000465
  • Acknowledgement Statement: The results published here are in whole or part based upon data generated by the Therapeutically Applicable Research to Generate Effective Treatments (TARGET) initiative, phs000218, managed by the NCI. The data used for this analysis are available through the GDC [https://gdc.cancer.gov/about-data/publications#/?groups=&years=&programs=TARGET&order=desc]. Information about TARGET can be found at https://www.cancer.gov/ccg/research/genome-sequencing/target/about.

Dataset Composition
DATA SUBJECT(S)
  • Each experimental unit refers to a patient sample
  • Each patient (Case) can have multiple experimental units corresponding to, for example, different timepoints or tissue sources.
DATASET SNAPSHOT

Above: Overall statistics of the open dataset.
Open Access
Size of Dataset: 25.027 GB
Number of Cases: 2,492
Number of Files: 10,479
File Types: TSV, MAF, TXT, IDAT, XLSX

Above: Overall statistics of the controlled dataset.
Controlled Access
Size of Dataset: 105.216 TB
Number of Cases: 2,465
Number of Files: 41,424
File Types: BEDPE, BAM, TSV, VCF, MAF, CEL, TAR
DETAILS
  • Modalities:
    • Simple Nucleotide Variation
    • Sequencing Reads
    • Combined Nucleotide Variation
    • Biospecimen
    • Clinical
    • Copy Number Variation
    • Transcriptome Profiling
    • DNA Methylation
    • Somatic Structural Variation
    • Structural Variation
  • Missingness:
    • Clinical data may be missing due to lack of access to records or incomplete curation; missing values are typically represented by blank, NA, or Unknown entries, etc.
    • Not all genomics data is captured for each patient sample
    • Additionally, all genomic assays may have regions below the limit of detection (LOD)/background
  • Sampling:
    • N/A
  • Data anomalies/errors:
    • N/A
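
The missing-value conventions noted under Missingness can be handled with a small normalization step before analysis. The following is a minimal sketch, not part of the GDC tooling: the token set covers only the placeholders named in this manifest (blank, NA, Unknown), and the function names and the example record are hypothetical.

```python
from typing import Optional

# Assumed placeholder tokens. The manifest names blank, NA, and Unknown and
# ends the list with "etc.", so extend this set for the files you work with.
MISSING_TOKENS = {"", "na", "unknown"}


def normalize_missing(value: Optional[str]) -> Optional[str]:
    """Return None for a missing-value placeholder, else the stripped value."""
    if value is None:
        return None
    cleaned = value.strip()
    if cleaned.lower() in MISSING_TOKENS:
        return None
    return cleaned


def normalize_record(record: dict) -> dict:
    """Apply normalize_missing to every field of one clinical record."""
    return {key: normalize_missing(val) for key, val in record.items()}


# Hypothetical clinical row, for illustration only.
example = normalize_record(
    {"race": "Unknown", "gender": " female ", "days_to_death": ""}
)
```

Mapping every placeholder to a single sentinel (here `None`) keeps later completeness counts from treating "Unknown" as an observed category.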
SUMMARY OF INSTANCES AND DATATYPES

Above: Number of instances per sample type and modality for open access data
Open Access Modalities
Instance Type | Simple Nucleotide Variation | Transcriptome Profiling | DNA Methylation
Blood Derived Cancer - Bone Marrow, Post-treatment | 0 | 25 | 0
Blood Derived Cancer - Peripheral Blood, Post-treatment | 0 | 1 | 0
Blood Derived Normal | 28 | 31 | 15
Bone Marrow Normal | 634 | 394 | 303
Cell Lines | 0 | 18 | 0
Fibroblasts from Bone Marrow Normal | 0 | 0 | 27
Next Generation Cancer Model | 0 | 50 | 0
Primary Blood Derived Cancer - Bone Marrow | 518 | 1810 | 279
Primary Blood Derived Cancer - Peripheral Blood | 145 | 347 | 42
Recurrent Blood Derived Cancer - Bone Marrow | 125 | 389 | 101
Recurrent Blood Derived Cancer - Peripheral Blood | 7 | 29 | 6
Above: Number of instances per sample type and modality for controlled access data
Controlled Access Modalities
Instance Type | Sequencing Reads | Simple Nucleotide Variation | Somatic Structural Variation | Structural Variation | Transcriptome Profiling | Combined Nucleotide Variation
Blood Derived Cancer - Bone Marrow, Post-treatment | 27 | 5 | 2 | 25 | 25 | 0
Blood Derived Cancer - Peripheral Blood, Post-treatment | 1 | 0 | 0 | 1 | 1 | 0
Blood Derived Normal | 59 | 28 | 0 | 31 | 31 | 16
Bone Marrow Normal | 1035 | 636 | 0 | 394 | 394 | 181
Cell Lines | 18 | 0 | 0 | 18 | 18 | 0
Fibroblasts from Bone Marrow Normal | 29 | 8 | 2 | 1 | 0 | 5
Next Generation Cancer Model | 50 | 0 | 0 | 50 | 50 | 0
Primary Blood Derived Cancer - Bone Marrow | 1957 | 635 | 2 | 1787 | 1787 | 171
Primary Blood Derived Cancer - Peripheral Blood | 401 | 179 | 0 | 342 | 342 | 26
Recurrent Blood Derived Cancer - Bone Marrow | 428 | 138 | 0 | 387 | 387 | 90
Recurrent Blood Derived Cancer - Peripheral Blood | 34 | 9 | 0 | 29 | 29 | 5
  • Additional Notes: Not all of these instances are used in any given analysis. See the respective publications or other details in this document for more information.
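
Counts of instances per sample type and modality, like those tabulated here, can be reproduced from a list of per-sample records. The sketch below is illustrative only: the records are hypothetical, and the field names `case_id`, `sample_type`, and `modality` are assumptions rather than a GDC schema. It also shows that one case can contribute several experimental units.

```python
from collections import Counter, defaultdict

# Hypothetical records, one per experimental unit. Field names are
# illustrative; sample types and modalities reuse labels from this manifest.
RECORDS = [
    {"case_id": "case-A",
     "sample_type": "Primary Blood Derived Cancer - Bone Marrow",
     "modality": "Transcriptome Profiling"},
    {"case_id": "case-A",
     "sample_type": "Recurrent Blood Derived Cancer - Bone Marrow",
     "modality": "Transcriptome Profiling"},
    {"case_id": "case-A",
     "sample_type": "Bone Marrow Normal",
     "modality": "DNA Methylation"},
    {"case_id": "case-B",
     "sample_type": "Primary Blood Derived Cancer - Bone Marrow",
     "modality": "Transcriptome Profiling"},
]


def count_instances(records):
    """Count experimental units per (sample type, modality) pair."""
    return Counter((r["sample_type"], r["modality"]) for r in records)


def units_per_case(records):
    """Count how many experimental units each case contributes."""
    per_case = defaultdict(int)
    for r in records:
        per_case[r["case_id"]] += 1
    return dict(per_case)


instance_counts = count_instances(RECORDS)
case_counts = units_per_case(RECORDS)
```

Because a case may appear under several sample types, instance counts in these tables can exceed the number of cases.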


Above: Number of instances per sample type and diagnosis
Instance Type | Myeloid Leukemias | Not Applicable
Bone Marrow NOS | 3551 | 70
Derived Cell Line | 18 | 0
Fibroblasts from Bone Marrow | 36 | 0
Peripheral Blood NOS | 530 | 0
Unknown | 50 | 0

Above: Number of cases per clinical variable and diagnosis
Clinical Variable | Myeloid Leukemias | Not Applicable
race | 2189 | 0
gender | 2189 | 0
ethnicity | 2189 | 0
vital_status | 2189 | 0
age_is_obfuscated | 2181 | 0
age_at_index | 2151 | 0
state | 2189 | 0
days_to_birth | 2151 | 0
days_to_death | 742 | 0
tissue_or_organ_of_origin | 2189 | 0
age_at_diagnosis | 2158 | 0
state | 2189 | 0
morphology | 2189 | 0
classification_of_tumor | 2181 | 0
icd_10_code | 2181 | 0
days_to_diagnosis | 2181 | 0
last_known_disease_status | 8 | 0
days_to_last_follow_up | 7 | 0
primary_diagnosis | 2189 | 0
year_of_diagnosis | 2155 | 0
diagnosis_is_primary_disease | 2181 | 0
site_of_resection_or_biopsy | 2189 | 0
tumor_grade | 8 | 0
progression_or_recurrence | 8 | 0
sites_of_involvement | 2189 | 0
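
The per-variable case counts can be turned into a rough completeness fraction. This is a minimal sketch under one stated assumption: the denominator is the largest count in the table (2189), because the manifest does not state a case total for this table. Only a few rows are copied in for brevity.

```python
# A subset of the "Myeloid Leukemias" counts from the clinical variable table.
CASE_COUNTS = {
    "race": 2189,
    "age_at_index": 2151,
    "age_at_diagnosis": 2158,
    "days_to_death": 742,
    "tumor_grade": 8,
}


def completeness(counts, total=None):
    """Return the fraction of cases with a value for each variable.

    If total is not given, the largest observed count is used as the
    denominator. That choice is an assumption, not a figure from the manifest.
    """
    if total is None:
        total = max(counts.values())
    return {name: n / total for name, n in counts.items()}


fractions = completeness(CASE_COUNTS)
```

A low fraction does not always indicate missing data; `days_to_death`, for example, applies only to patients who have died.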

Above: Number of instances per specific diagnosis and diagnosis
Primary Diagnosis | Myeloid Leukemias
Acute myeloid leukemia, NOS | 3932

Ethical, Legal and Social Issues (ELSI)
Inclusion: AML patient samples were obtained from the Children's Oncology Group (most from study AAML0531) and chosen for inclusion in the TARGET project based on the following criteria: bone marrow and peripheral blood blast counts of > 50% and availability of DNA and RNA for comprehensive molecular characterization of the genome, transcriptome and epigenome. The majority of cases in the project had 3 or fewer cytogenetic abnormalities and the patients achieved a remission following the standard two rounds of induction therapy, which allowed the post-induction "normal" tissue to be used as tumor comparator. However, there are subsets of cases within the AML dataset that have a greater number of cytogenetic abnormalities and/or failed induction.
Confidentiality: Open Access data has been deemed safe for general use, with minimal risk to patients. Restricted Access data could potentially be used to re-identify a given patient, so access to it is subject to special requirements.
Considerations for external resources utilized: N/A
Human Subjects
REGULATORY
OTHER CONSIDERATIONS
  • REB/IRB approval for data collection?: Generally, for these studies all patients have provided informed consent but see dbGaP for more details
  • Samples were collected from patients in the following countries:
    • N/A
  • Known ELS issues introduced by preprocessing: N/A
  • Known problematic proxies: N/A
  • Was a data protection impact analysis performed?: N/A
Collected Sensitive Data Fields
Field Name | Definition
gender | Text designations that identify gender. Gender is described as the assemblage of properties that distinguish people on the basis of their societal roles. [Explanatory Comment 1: Identification of gender is based upon self-report and may come from a form, questionnaire, interview, etc.]
race | An arbitrary classification of a taxonomic group that is a division of a species. It usually arises as a consequence of geographical isolation within a species and is characterized by shared heredity, physical attributes and behavior, and in the case of humans, by common history, nationality, or geographic distribution. The provided values are based on the categories defined by the U.S. Office of Management and Budget and used by the U.S. Census Bureau.
ethnicity | An individual's self-described social and cultural grouping, specifically whether an individual describes themselves as Hispanic or Latino. The provided values are based on the categories defined by the U.S. Office of Management and Budget and used by the U.S. Census Bureau.
age_at_diagnosis | Age at the time of diagnosis expressed in number of days since birth.
submitter_id | All submitted IDs can potentially contain identifiable information
[Distribution: per-field distribution plots not reproduced]
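
Because `age_at_diagnosis` is expressed in days since birth, analyses usually convert it to years. A minimal sketch follows; the 365.25 divisor (an average year length including leap years) and the function name are assumptions, not a convention stated in this manifest.

```python
from typing import Optional

# Assumed average year length, including leap years.
DAYS_PER_YEAR = 365.25


def age_in_years(age_at_diagnosis_days: Optional[float]) -> Optional[float]:
    """Convert an age in days since birth to years; pass missing values through."""
    if age_at_diagnosis_days is None:
        return None
    return age_at_diagnosis_days / DAYS_PER_YEAR


# Hypothetical value: 3652.5 days corresponds to 10 years under this divisor.
example_age = age_in_years(3652.5)
```

Missing ages are returned as `None` rather than 0, so they are not mistaken for a diagnosis at birth.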

Intentionality of sensitive human attribute collection: N/A
LICENSING
  • IP / Terms of Use:
  • Third party intellectual property considerations:
    • N/A
  • Export controls or other regulations impacting dataset access/download or storage:
    • N/A

Provenance and Lineage
Description of the data collection methodology: N/A
Specific data collection devices:
  • N/A
Dataset Curation: N/A
Dataset Validation:
  • N/A
Data collection timeframe: N/A
Preprocessing steps/workflow:
  • Pre-processing: Generation and processing of raw data is described in detail in existing publications
  • Post-processing and derived features: Generation of processed data by the Genomic Data Commons is described here