Information on Data Formats
Large data sets can be examined for recurrent breakpoints, recurrent gains and losses, statistical dependences between rearrangements, the dependence of rearrangements on total karyotype complexity, and other kinds of data mining. For such an analysis, the data must be provided in a file whose format can be read by the Online Analysis programs. Three data formats are available: the Mitelman DB format ("Mitelman"), a simple format for banding analysis ("custom ISCN"), and a simple format for CGH analysis ("custom CGH").
The Mitelman Database Format
Data downloaded from the Mitelman database have the following format:
The file starts with a header line that names the fields below. Data fields are separated by a tab character.
The fields are:
- 1st field: Reference Number: identifier of a publication
- 2nd field: Case Number
- 3rd field: Investigation Number
- 4th field: Author, Year (publication data)
- 5th field: Journal Name (publication data)
- 6th field: Volume, Page (publication data)
- 7th field: Morphology
- 8th field: Topography
- 9th field: Short Karyotype (ISCN formula of all clones of the karyotype)
Example:
Reference Number	Case Number	Investigation Number	Author, Year	Journal Name	Volume, Page	Morphology	Topography	Short Karyotype
3409	1	1	Abdi et al 1990	J Pakistan Med Ass	40:9-11	Acute lymphoblastic leukemia, FAB type L1		48,XY,+2,+8,t(13;22)(q?;q?)
3409	3	1	Abdi et al 1990	J Pakistan Med Ass	40:9-11	Acute lymphoblastic leukemia, FAB type L1		48,XY,-11,+3mar
3409	6	1	Abdi et al 1990	J Pakistan Med Ass	40:9-11	Acute lymphoblastic leukemia, FAB type L2		47,XX,+18
1139	1	1	Abe & Sandberg 1984	Cancer Genet Cytogenet	13:121-127	Acute lymphoblastic leukemia, NOS		46,XY,t(4;11)(q21;q23)
606	1	1	Abe et al 1979	Am J Hematol	6:259-266	Acute lymphoblastic leukemia, NOS		46,XY,del(5)(q12q23),del(9)(p21)
410	1	1	Abe et al 1982	Cancer Genet Cytogenet	7:185-195	Acute lymphoblastic leukemia, FAB type L3		46,XY,t(8;22)(q24;q12)/46,idem,+del(1)(p22),-22/46,idem,add(1)(q?),+del(1),-5
838	1	1	Abe et al 1983	Cancer Genet Cytogenet	9:139-144	Acute lymphoblastic leukemia, NOS		46,XX,del(11)(q13q23),ins(19;11)(p13;q13q23)
1162	1	1	Abe et al 1985	Cancer Genet Cytogenet	14:45-59	Acute lymphoblastic leukemia, FAB type L2		48-52,XX,+7,+11,+12,+13,+14,i(17)(q10),+20,+22
1162	1	2	Abe et al 1985	Cancer Genet Cytogenet	14:45-59	Acute lymphoblastic leukemia, FAB type L2		103,XXXX,+2,-4,+7,+7,+11,+12,+12,+13,+14,+16,i(17)(q10)x2,+20,+20,+22/53,XX,+X,+7,+11,+12,+13,i(17)(q10),+20,+22
1303	1	1	Abe et al 1985	Cancer Genet Cytogenet	18:49-54	Acute lymphoblastic leukemia, FAB type L2		46,XX,t(9;22)(q34;q11)
2398	1	1	Abe et al 1988	Cancer Genet Cytogenet	31:279-283	Acute lymphoblastic leukemia, NOS		46,XY,del(9)(p13p22),add(10)(p11),del(11)(q21q23)
5513	1	1	Abeliovich et al 1994	Cancer Genet Cytogenet	76:70-71	Acute lymphoblastic leukemia, FAB type L2		45,X,-Y/46,XY,t(9;22)(q34;q11)
1059	1	1	Abromowitch et al 1984	Br J Haematol	56:409-416	Acute lymphoblastic leukemia, FAB type L1		46,XY,t(1;19)(q23;p13),add(13)(q?)
1059	1	2	Abromowitch et al 1984	Br J Haematol	56:409-416	Acute lymphoblastic leukemia, FAB type L1		85,XXYY,-1,t(1;19),-2,-3,-4,del(4)(q23),-5,del(5)(p13),del(6)(q15),-7,+8,+8,+del(8)(p21),-9,-10,-12,dup(14)(q13q32)x2,-16,-17,-18,der(19)t(1;19),+20,+21,+21,-22,-22,+mar
4455	1	1	Abshire et al 1992	Leukemia	6:357-362	Acute lymphoblastic leukemia, NOS		44,X,-X,-20,t(20;22)(p?;q?),-22
4455	1	2	Abshire et al 1992	Leukemia	6:357-362	Acute lymphoblastic leukemia, NOS		44,X,-X,del(2)(q?),t(6;9)(q?;q?),-20,+t(20;22),-22
879	8	1	Aide et al 1981	Acta Acad Med Wuhan	1:7-15	Acute lymphoblastic leukemia, NOS		46,XY,t(9;22)(q34;q11)
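Because the fields are tab-separated and the file starts with a header line, a file in this format can be read with any delimited-text reader. The following Python sketch shows one way to do this; it is not the code used by the Online Analysis programs. The function name `read_mitelman` and the sample string are illustrative, and the empty Topography value in the sample row is an assumption based on the example above, where no topography value appears.

```python
import csv
import io

# Field names as listed for the header line of a Mitelman export.
MITELMAN_FIELDS = [
    "Reference Number", "Case Number", "Investigation Number",
    "Author, Year", "Journal Name", "Volume, Page",
    "Morphology", "Topography", "Short Karyotype",
]


def read_mitelman(text):
    """Parse tab-separated Mitelman data into a list of dicts keyed by field name."""
    reader = csv.DictReader(
        io.StringIO(text), delimiter="\t", quoting=csv.QUOTE_NONE
    )
    if reader.fieldnames != MITELMAN_FIELDS:
        raise ValueError(f"unexpected header line: {reader.fieldnames}")
    return list(reader)


# One record from the example; the Topography field is assumed to be empty.
sample = (
    "\t".join(MITELMAN_FIELDS) + "\n"
    "1139\t1\t1\tAbe & Sandberg 1984\tCancer Genet Cytogenet\t13:121-127\t"
    "Acute lymphoblastic leukemia, NOS\t\t46,XY,t(4;11)(q21;q23)\n"
)

records = read_mitelman(sample)
print(records[0]["Short Karyotype"])  # prints 46,XY,t(4;11)(q21;q23)
```

Quoting is switched off in this sketch because the description of the Mitelman format mentions no quotation marks, so any such character is treated as ordinary data.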
The Custom ISCN Format
The custom ISCN format is meant for data from karyotype banding analysis. It is a very simple format: only an identifier for the case and the karyotype written in ISCN are required. For polyclonal karyotypes, clones are separated by a forward slash ("/"); "empty" clones are allowed.
A header line is optional. Data fields can be separated by a tab, a pipe ("|"), or a single blank (" "). Optionally, field values can be enclosed in quotation marks (").
Examples:
When a header line is present, and the separator is set to tabulator:
Identifier	Karyotype
Case_1	"46,XX, t(9;22)(q34;q11)"
Case_2	"45,XX,der(1)t(1;8)(p1?1;p?), der(8)t(8;15)(q1?1;q1?1),-15,der(17)(17pter->17q25::1p36->1p1?3::8q?21->8pter)"
Case_3	46,X,-Y, +15
Case_4	"46,X,-Y,+15/47,XY,+6/ 46,XY,del(9)(q13-21)"
When no header line is present, and the separator is set to pipe:
Case_1|"46,XX, t(9;22)(q34;q11)"
Case_2|"45,XX,der(1)t(1;8)(p1?1; p?),der(8)t(8;15)(q1?1;q1?1),-15,der(17)(17pter->17q25::1p36->1p1?3::8q?21->8pter)"
Case_3|46,X,-Y,+15
Case_4|"46,X,-Y,+15 / 47,XY,+6/46,XY,del(9)(q13-21)"
When a header line is present, and the separator is set to blank (note: neither the identifier nor the karyotype may contain any blanks):
Identifier Karyotype
Case_1 "46,XX,t(9;22)(q34;q11)"
Case_2 "45,XX,der(1)t(1;8)(p1?1;p?),der(8)t(8;15)(q1?1;q1?1),-15,der(17)(17pter->17q25::1p36->1p1?3::8q?21->8pter)"
Case_3 46,X,-Y,+15
Case_4 "46,X,-Y,+15/47,XY,+6/46,XY,del(9)(q13-21)"
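The rules for the custom ISCN format (optional header line, a choice of separators, optional quotation marks, clones separated by "/") can be sketched in a few lines of Python. This is a hypothetical reader, not the parser of the Online Analysis programs: the name `parse_custom_iscn` and its parameters are made up for illustration, and the sketch assumes that blanks inside a karyotype carry no meaning and can be removed, which the tab and pipe examples suggest but do not state.

```python
def parse_custom_iscn(text, separator="\t", has_header=True):
    """Return a list of (identifier, [clone, ...]) tuples.

    separator is "\t", "|" or " ", matching the three options described above.
    """
    lines = [line for line in text.splitlines() if line.strip()]
    if has_header:
        lines = lines[1:]  # skip the "Identifier Karyotype" line

    cases = []
    for line in lines:
        # Split only once: the first field is the identifier, the rest the karyotype.
        identifier, karyotype = line.split(separator, 1)
        identifier = identifier.strip().strip('"')
        karyotype = karyotype.strip().strip('"')
        # Clones are separated by "/"; empty clones are kept as empty strings.
        # Assumption: blanks inside a karyotype are insignificant and are dropped.
        clones = [clone.replace(" ", "") for clone in karyotype.split("/")]
        cases.append((identifier, clones))
    return cases


# Two lines from the pipe-separated example without a header line.
sample = (
    'Case_3|46,X,-Y,+15\n'
    'Case_4|"46,X,-Y,+15 / 47,XY,+6/46,XY,del(9)(q13-21)"\n'
)
cases = parse_custom_iscn(sample, separator="|", has_header=False)
for identifier, clones in cases:
    print(identifier, clones)
```

Splitting each line only at the first separator is what makes the blank separator workable: the identifier cannot contain a blank, so everything after the first blank is taken as the karyotype.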
The Custom CGH Format
The custom CGH format is meant for data from comparative genomic hybridisation (CGH) analysis. It is a very simple format: only an identifier for the case and the karyotype written in CGH style are required.
A header line is optional. Data fields can be separated by either a tab or a pipe ("|"). Optionally, field values can be enclosed in quotation marks (").
A single blank (" ") is not accepted as a separator in this format, because the description starts with "rev ish ...", which itself contains blanks.
Examples:
When a header line is present, and the separator is set to tabulator:
Identifier	CGH_Karyotype
Case_1	Rev ish enh(6p21p25),dim(2q35q37)
Case_2	"Rev ish enh(1q22q31; 1q41qter; 6p; 15q25qter;19p13.2pter; 19q; 20q13.1qter)"
Case_3	"Rev ish enh(2p12pter; 6p)"
Case_4	Rev ish enh(1q, 4q27q35, 6p, 14q22q32.3, 17q22q25,19),dim(16q12.2qter)
When no header line is present, and the separator is set to pipe:
Case_1|Rev ish enh(6p21p25),dim(2q35q37)
Case_2|"Rev ish enh(1q22q31; 1q41qter; 6p; 15q25qter; 19p13.2pter; 19q; 20q13.1qter)"
Case_3|"Rev ish enh(2p12pter; 6p)"
Case_4|Rev ish enh(1q, 4q27q35, 6p, 14q22q32.3, 17q22q25,19), dim(16q12.2qter)
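A CGH line can be taken apart in the same spirit. The Python sketch below is a hypothetical illustration, not the parser used by the Online Analysis programs. It assumes that only the `enh(...)` and `dim(...)` groups seen in the examples occur, and that regions inside a group are separated by either ";" or ",", as both appear in the examples. The function name `parse_custom_cgh` is made up.

```python
import re


def parse_custom_cgh(line, separator="\t"):
    """Split one data line into (identifier, {"enh": [...], "dim": [...]})."""
    if separator not in ("\t", "|"):
        # A blank cannot be a separator: "Rev ish ..." itself contains blanks.
        raise ValueError("separator must be a tab or a pipe")

    identifier, description = line.split(separator, 1)
    identifier = identifier.strip().strip('"')
    description = description.strip().strip('"')
    if not description.lower().startswith("rev ish"):
        raise ValueError(f"not a CGH description: {description!r}")

    regions = {"enh": [], "dim": []}
    # Each group looks like enh(...) or dim(...); inside, regions are
    # separated by ";" or "," (assumption drawn from the examples).
    for kind, body in re.findall(r"(enh|dim)\(([^)]*)\)", description):
        for part in re.split(r"[;,]", body):
            if part.strip():
                regions[kind].append(part.strip())
    return identifier, regions


# Case_4 from the pipe-separated example.
line = ("Case_4|Rev ish enh(1q, 4q27q35, 6p, 14q22q32.3, 17q22q25,19), "
        "dim(16q12.2qter)")
identifier, regions = parse_custom_cgh(line, separator="|")
print(identifier, regions["enh"], regions["dim"])
```

Here `regions["enh"]` holds the gained regions and `regions["dim"]` the lost ones, which is the input needed for counting recurrent gains and losses.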
Such data can be used for the analysis of:
- recurrent breakpoints, and recurrent gains and losses
- dependence between rearrangements
- dependence of rearrangements on karyotype complexity
Due to technical limitations (availability of a database, calculation time), these online programs provide only a preview of the full analysis options of the CyDAS desktop application, which you can download from the Download section.
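To give a feeling for what "recurrent gains and losses" means in practice, the following Python sketch counts in how many karyotypes each whole-chromosome gain or loss (items such as "+8" or "-Y") occurs. It is a deliberately naive illustration and not the algorithm of CyDAS or of the Online Analysis programs: it ignores structural rearrangements, marker chromosomes, and the meaning of "idem", and the function name `count_numerical_changes` is hypothetical. The karyotypes are taken from the examples above.

```python
import re
from collections import Counter


def count_numerical_changes(karyotypes):
    """Count in how many karyotypes each whole-chromosome gain or loss occurs."""
    counts = Counter()
    for karyotype in karyotypes:
        seen = set()  # count each change only once per karyotype
        for clone in karyotype.split("/"):
            for item in clone.replace(" ", "").split(","):
                # Whole-chromosome changes look like "+8", "-11", "+X" or "-Y".
                if re.fullmatch(r"[+-](\d{1,2}|X|Y)", item):
                    seen.add(item)
        counts.update(seen)
    return counts


karyotypes = [
    "48,XY,+2,+8,t(13;22)(q?;q?)",
    "45,X,-Y/46,XY,t(9;22)(q34;q11)",
    "46,X,-Y,+15/47,XY,+6/ 46,XY,del(9)(q13-21)",
]
counts = count_numerical_changes(karyotypes)
print(counts.most_common(1))  # prints [('-Y', 2)]
```

A real analysis would first have to interpret the full ISCN formula, which is considerably more involved than this item-by-item match.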