BioMedical Data Manifest Aggregator


GDC Project ID: BEATAML1.0-COHORT

GDC Summary page not available
General Information
LINKS
DATA MANIFEST AUTHORS
  • Data Manifest aggregated from GDC
VERSION INFORMATION
  • Current Version: Data Release 42.0
  • DOI: N/A
  • Release Date: January 30, 2025
  • Last Updated: N/A
KEYWORDS
  • Hematopoietic and reticuloendothelial systems
  • Plasma Cell Tumors
  • Myelodysplastic Syndromes
  • Chronic Myeloproliferative Disorders
  • Myeloid Leukemias
  • Leukemias, NOS
EXTENSION MECHANISMS
  • Contact Dataset Owners/Publishers for ways to contribute
OWNER/PUBLISHER
  • N/A
CONTACT DETAILS
  • Dataset Contacts:
    • Jeffrey W. Tyner, PhD, Oregon Health and Science University, Knight Cancer Institute, Portland, OR, USA.

Uses of Data
DATASET ORIGINAL USE
  • N/A
CONCERNS AND LIMITATIONS
  • Disease-Specific (Leukemia): Use of the data must be related to Leukemia.
CONFOUNDING FACTORS
  • N/A
DATASET KNOWN PUBLICATIONS AND BENCHMARKS
  • N/A
CITATION GUIDELINES
When using this dataset please cite:
  • Use of data from dbGaP: Please cite/reference the use of dbGaP data by including the dbGaP accession phs001657
  • Acknowledgement Statement: We thank all of our patients at all sites for donating precious time and tissue. DNA and RNA quality assessments, library creation, and short read sequencing assays were performed by the OHSU Massively Parallel Sequencing Shared Resource. Sheenu Sheela, Catherine Lai, Katherine Lindblad and Kary Oetjen assisted in study coordination at NIH. Barry Sawicki and Christina Cline assisted in study coordination at the University of Florida. S. Ravencroft assisted with patient sample shipping and data entry and K. Schorno provided project management and support of activities at the University of Kansas Cancer Center. Jack Taw assisted with patient sample shipping and Shyam Patel assisted with data entry at Stanford University.

Dataset Composition
DATA SUBJECT(S)
DATASET SNAPSHOT
DETAILS
  • Each experimental unit refers to a patient sample
  • Each patient (Case) can have multiple experimental units, corresponding to, for example, different timepoints or tissue sources.

Above: Overall statistics of the open dataset.
Open Access
Size of Dataset | 6.104 GB
Number of Cases | 756
Number of Files | 1,276
File Types | TSV, MAF, MEX, HDF5

Above: Overall statistics of the controlled dataset.
Controlled Access
Size of Dataset | 42.349 TB
Number of Cases | 826
Number of Files | 15,518
File Types | BAM, TSV, BEDPE, VCF, MAF
  • Modalities:
    • Simple Nucleotide Variation
    • Sequencing Reads
    • Transcriptome Profiling
    • Structural Variation
  • Missingness:
    • Clinical data may be missing because of lack of access to records or incomplete curation; missing values are typically represented by blank, NA, or Unknown entries
    • Not all genomic data are captured for every patient sample
    • Additionally, any genomic assay may have regions below the limit of detection (LOD) or background
  • Sampling:
    • N/A
  • Data anomalies/errors:
    • N/A
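The missingness note above implies that blank, NA, and Unknown entries should be treated alike before any counting. The following is a minimal sketch of one way to do that for a tab-separated clinical table, using only the Python standard library. The token set is an assumption (the manifest names blank, NA, and Unknown, then ends with "etc"; `n/a` and `not reported` are added here as guesses), the helper names are hypothetical, and the sample rows are made up for illustration.

```python
import csv
import io

# Assumed set of missing-value markers (compared in lower case). The manifest
# names blank, NA and Unknown; "n/a" and "not reported" are guesses to verify.
MISSING_TOKENS = {"", "na", "n/a", "unknown", "not reported"}


def normalize_missing(value):
    """Return None for any missing-value marker, else the stripped value."""
    if value is None:
        return None
    cleaned = value.strip()
    return None if cleaned.lower() in MISSING_TOKENS else cleaned


def read_clinical_tsv(handle):
    """Read a tab-separated table into dicts with missing values set to None."""
    reader = csv.DictReader(handle, delimiter="\t")
    return [
        {key: normalize_missing(val) for key, val in row.items()}
        for row in reader
    ]


# Hypothetical three-row excerpt: the column names appear in this manifest,
# the values are invented for the example.
example = io.StringIO(
    "submitter_id\tgender\tvital_status\n"
    "case-1\tfemale\tAlive\n"
    "case-2\tUnknown\tNA\n"
    "case-3\t\tDead\n"
)
rows = read_clinical_tsv(example)
missing_gender = sum(row["gender"] is None for row in rows)
print(f"gender missing in {missing_gender} of {len(rows)} rows")
```

Mapping every marker to a single `None` value keeps later completeness counts from treating "Unknown" as a real category; the token set should be checked against the actual files before use.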
SUMMARY OF INSTANCES AND DATATYPES

Above: Number of instances per sample type and modality for open access data
Open Access Modalities
Instance Type | Simple Nucleotide Variation | Transcriptome Profiling
Blood Derived Cancer - Bone Marrow | 2 | 3
Blood Derived Cancer - Peripheral Blood | 1 | 12
Blood Derived Normal | 0 | 21
Control Analyte | 0 | 16
Primary Blood Derived Cancer - Bone Marrow | 188 | 217
Primary Blood Derived Cancer - Peripheral Blood | 72 | 131
Recurrent Blood Derived Cancer - Bone Marrow | 157 | 149
Recurrent Blood Derived Cancer - Peripheral Blood | 81 | 186
Solid Tissue Normal | 444 | 0
Above: Number of instances per sample type and modality for controlled access data
Controlled Access Modalities
Instance Type | Simple Nucleotide Variation | Structural Variation | Transcriptome Profiling
Blood Derived Cancer - Bone Marrow | 4 | 3 | 3
Blood Derived Cancer - Peripheral Blood | 13 | 12 | 12
Blood Derived Normal | 0 | 21 | 21
Control Analyte | 0 | 16 | 16
Primary Blood Derived Cancer - Bone Marrow | 251 | 217 | 217
Primary Blood Derived Cancer - Peripheral Blood | 153 | 131 | 131
Recurrent Blood Derived Cancer - Bone Marrow | 242 | 149 | 149
Recurrent Blood Derived Cancer - Peripheral Blood | 204 | 186 | 186
Solid Tissue Normal | 444 | 0 | 0
  • Additional Notes: Not all of these instances are used in any given analysis. See the respective publications or other details in this document for more information


Above: Number of instances per sample type and diagnosis
Instance Type | Chronic Myeloproliferative Disorders | Leukemias, NOS | Myelodysplastic Syndromes | Myeloid Leukemias | Plasma Cell Tumors | Unknown
Bone Marrow NOS | 14 | 5 | 19 | 841 | 0 | 0
Peripheral Blood NOS | 1 | 9 | 5 | 705 | 2 | 21
Solid Tissue | 6 | 3 | 12 | 430 | 0 | 0
Unknown | 0 | 0 | 0 | 0 | 0 | 16

Above: Number of cases per clinical variable and diagnosis
Clinical Variable | Chronic Myeloproliferative Disorders | Leukemias, NOS | Myelodysplastic Syndromes | Myeloid Leukemias | Plasma Cell Tumors | Unknown
race | 9 | 8 | 15 | 772 | 1 | 21
gender | 9 | 8 | 15 | 772 | 1 | 21
ethnicity | 9 | 8 | 15 | 772 | 1 | 21
vital_status | 9 | 8 | 15 | 772 | 1 | 21
state | 9 | 8 | 15 | 772 | 1 | 21
tissue_or_organ_of_origin | 9 | 8 | 15 | 772 | 1 | 21
age_at_diagnosis | 8 | 8 | 14 | 748 | 1 | 0
state | 9 | 8 | 15 | 772 | 1 | 21
morphology | 9 | 8 | 15 | 772 | 1 | 21
eln_risk_classification | 8 | 3 | 12 | 538 | 1 | 21
last_known_disease_status | 9 | 8 | 15 | 772 | 1 | 21
primary_diagnosis | 9 | 8 | 15 | 772 | 1 | 21
site_of_resection_or_biopsy | 9 | 8 | 15 | 772 | 1 | 21
tumor_grade | 9 | 8 | 15 | 772 | 1 | 21
progression_or_recurrence | 9 | 8 | 15 | 772 | 1 | 21
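
The per-variable counts in the table above can be turned into a rough completeness check. The sketch below copies three rows from the table and compares each row total with the `race` row, which sums to 826, the controlled-access case count reported earlier. Treating the `race` row as the full cohort, and reading each count as "cases with a value recorded", are assumptions rather than something the manifest states; the function name is hypothetical.

```python
# Diagnosis categories, in the column order of the table above.
CATEGORIES = [
    "Chronic Myeloproliferative Disorders",
    "Leukemias, NOS",
    "Myelodysplastic Syndromes",
    "Myeloid Leukemias",
    "Plasma Cell Tumors",
    "Unknown",
]

# Case counts copied from three rows of the table above.
COUNTS = {
    "race": [9, 8, 15, 772, 1, 21],
    "age_at_diagnosis": [8, 8, 14, 748, 1, 0],
    "eln_risk_classification": [8, 3, 12, 538, 1, 21],
}


def completeness(variable, reference="race"):
    """Fraction of reference-row cases that also appear in the variable's row."""
    return sum(COUNTS[variable]) / sum(COUNTS[reference])


total_cases = sum(COUNTS["race"])
for name in ("age_at_diagnosis", "eln_risk_classification"):
    print(f"{name}: {sum(COUNTS[name])} of {total_cases} cases "
          f"({completeness(name):.1%})")
```

Under these assumptions, `age_at_diagnosis` and `eln_risk_classification` are the only variables in the table whose totals fall short of the cohort size, so analyses that depend on them work with a smaller set of cases.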

Above: Number of instances per primary diagnosis and diagnosis category
Primary Diagnosis | Myeloid Leukemias | Chronic Myeloproliferative Disorders | Leukemias, NOS | Myelodysplastic Syndromes | Plasma Cell Tumors | Unknown
Acute erythroid leukaemia | 11 | 0 | 0 | 0 | 0 | 0
Acute megakaryoblastic leukaemia | 4 | 0 | 0 | 0 | 0 | 0
Acute monoblastic and monocytic leukemia | 45 | 0 | 0 | 0 | 0 | 0
Acute myeloid leukemia with inv(3)(q21q26.2) or t(3;3)(q21;q26.2); RPN1-EVI1 | 37 | 0 | 0 | 0 | 0 | 0
Acute myeloid leukemia with maturation | 19 | 0 | 0 | 0 | 0 | 0
Acute myeloid leukemia with mutated CEBPA | 114 | 0 | 0 | 0 | 0 | 0
Acute myeloid leukemia with mutated NPM1 | 511 | 0 | 0 | 0 | 0 | 0
Acute myeloid leukemia with myelodysplasia-related changes | 436 | 0 | 0 | 0 | 0 | 0
Acute myeloid leukemia with t(6;9)(p23;q34); DEK-NUP214 | 8 | 0 | 0 | 0 | 0 | 0
Acute myeloid leukemia with t(8;21)(q22;q22); RUNX1-RUNX1T1 | 38 | 0 | 0 | 0 | 0 | 0
Acute myeloid leukemia with t(9;11)(p22;q23); MLLT3-MLL | 32 | 0 | 0 | 0 | 0 | 0
Acute myeloid leukemia without maturation | 18 | 0 | 0 | 0 | 0 | 0
Acute myeloid leukemia, CBF-beta/MYH11 | 103 | 0 | 0 | 0 | 0 | 0
Acute myeloid leukemia, NOS | 269 | 0 | 0 | 0 | 0 | 0
Acute myeloid leukemia, minimal differentiation | 34 | 0 | 0 | 0 | 0 | 0
Acute myelomonocytic leukemia | 62 | 0 | 0 | 0 | 0 | 0
Acute promyelocytic leukaemia, PML-RAR-alpha | 52 | 0 | 0 | 0 | 0 | 0
Atypical chronic myeloid leukemia, BCR/ABL negative | 0 | 5 | 0 | 0 | 0 | 0
Blastic plasmacytoid dendritic cell neoplasm | 1 | 0 | 0 | 0 | 0 | 0
Chronic myelomonocytic leukemia, NOS | 0 | 8 | 0 | 0 | 0 | 0
Essential thrombocythemia | 0 | 2 | 0 | 0 | 0 | 0
Mixed phenotype acute leukemia, B/myeloid, NOS | 0 | 0 | 2 | 0 | 0 | 0
Mixed phenotype acute leukemia, T/myeloid, NOS | 0 | 0 | 11 | 0 | 0 | 0
Myelodysplastic syndrome with isolated del (5q) | 0 | 0 | 0 | 2 | 0 | 0
Myelodysplastic syndrome, unclassifiable | 0 | 0 | 0 | 16 | 0 | 0
Myelodysplastic/myeloproliferative neoplasm, unclassifiable | 0 | 1 | 0 | 0 | 0 | 0
Myeloid leukemia associated with Down Syndrome | 3 | 0 | 0 | 0 | 0 | 0
Myeloid sarcoma | 15 | 0 | 0 | 0 | 0 | 0
Plasma cell myeloma | 0 | 0 | 0 | 0 | 2 | 0
Primary myelofibrosis | 0 | 3 | 0 | 0 | 0 | 0
Refractory anemia with excess blasts | 0 | 0 | 0 | 11 | 0 | 0
Refractory cytopenia with multilineage dysplasia | 0 | 0 | 0 | 5 | 0 | 0
Therapy related myeloid neoplasm | 140 | 0 | 0 | 0 | 0 | 0
Undifferentiated leukaemia | 0 | 0 | 4 | 0 | 0 | 0
Unknown | 24 | 2 | 0 | 2 | 0 | 37

Ethical, Legal and Social Issues (ELSI)
Inclusion: All patients with a presumed diagnosis of AML were eligible for specimen collection in this study (listed at ClinicalTrials.gov as NCT01728402).
Confidentiality: Open Access data have been deemed safe for general use, with minimal risk to patients. Restricted Access data could potentially be used to re-identify a patient, so access is subject to special requirements.
Considerations for external resources utilized: N/A
Human Subjects
REGULATORY
OTHER CONSIDERATIONS
  • REB/IRB approval for data collection?: Generally, for these studies all patients have provided informed consent but see dbGaP for more details
  • Samples were collected from patients in the following countries:
    • N/A
  • Known ELS issues introduced by preprocessing: N/A
  • Known problematic proxies: N/A
  • Was a data protection impact analysis performed?: N/A
Collected Sensitive Data Fields
Field Name Definition Distribution
gender Text designations that identify gender. Gender is described as the assemblage of properties that distinguish people on the basis of their societal roles. [Explanatory Comment 1: Identification of gender is based upon self-report and may come from a form, questionnaire, interview, etc.]
race An arbitrary classification of a taxonomic group that is a division of a species. It usually arises as a consequence of geographical isolation within a species and is characterized by shared heredity, physical attributes and behavior, and in the case of humans, by common history, nationality, or geographic distribution. The provided values are based on the categories defined by the U.S. Office of Management and Budget and used by the U.S. Census Bureau.
ethnicity An individual's self-described social and cultural grouping, specifically whether an individual describes themselves as Hispanic or Latino. The provided values are based on the categories defined by the U.S. Office of Management and Budget and used by the U.S. Census Bureau.
age_at_diagnosis Age at the time of diagnosis expressed in number of days since birth.
submitter_id All submitted IDs can potentially contain identifiable information
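
Because `age_at_diagnosis` is expressed in days since birth, most analyses will want it in years. A minimal sketch of that conversion follows; the divisor of 365.25 days per year is an assumed convention (an average that accounts for leap years), not something the manifest specifies, and the function name is hypothetical.

```python
DAYS_PER_YEAR = 365.25  # assumed average year length, including leap years


def age_in_years(age_at_diagnosis_days):
    """Convert age_at_diagnosis from days since birth to fractional years.

    Returns None when the value is missing, so missing ages stay missing
    instead of turning into zero.
    """
    if age_at_diagnosis_days is None:
        return None
    return age_at_diagnosis_days / DAYS_PER_YEAR


# Example: 21,915 days is 60 years under the 365.25-day convention.
print(age_in_years(21915))
```

Keeping missing values as `None` matters here because the table of clinical variables shows that `age_at_diagnosis` is not recorded for every case.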

Intentionality of sensitive human attribute collection: N/A
LICENSING
  • IP / Terms of Use:
  • Third party intellectual property considerations:
    • N/A
  • Export controls or other regulations impacting dataset access/download or storage:
    • N/A

Provenance and Lineage
Description of the data collection methodology: N/A
Specific data collection devices:
  • N/A
Dataset Curation: N/A
Dataset Validation:
  • N/A
Data collection timeframe: N/A
Preprocessing steps/workflow:
  • Pre-processing: Generation and processing of raw data is described in detail in existing publications
  • Post-processing and derived features: Generation of processed data by the Genomic Data Commons is described here