Information on Data Formats


Large data sets can be mined for recurrent breakpoints, recurrent gains and losses, statistical dependences between rearrangements, the dependence of rearrangements on total karyotype complexity, and similar questions. For such an analysis, the data must be supplied in a file whose format the Online Analysis programs can read. Three data formats are available: the Mitelman DB format ("Mitelman"), a simple format for banding analysis ("custom ISCN"), and a simple format for CGH analysis ("custom CGH").

The Mitelman Database Format

Data downloaded from the Mitelman database have the following format:

The file starts with a header line describing the columns below it. Data fields are separated by a tab character.

The fields are:

1st field: Reference Number: identifier of a publication
2nd field: Case Number
3rd field: Investigation Number
4th field: Author, Year (publication data)
5th field: Journal Name (publication data)
6th field: Volume, Page (publication data)
7th field: Morphology
8th field: Topography
9th field: Short Karyotype (ISCN formula of all clones of the karyotypes).

If no data are available for a field, the field must still be present, either left empty or set to 0 for number fields. It is nevertheless recommended to provide a case identifier if you want to use the karyotype error information shown in the error file to correct your data. Data must not be surrounded by quotes.

Example:

Reference Number    Case Number     Investigation Number    Author, Year    Journal Name    Volume, Page    Morphology      Topography      Short Karyotype
3409    1       1       Abdi et al 1990 J Pakistan Med Ass      40:9-11 Acute lymphoblastic leukemia, FAB type L1               48,XY,+2,+8,t(13;22)(q?;q?)
3409    3       1       Abdi et al 1990 J Pakistan Med Ass      40:9-11 Acute lymphoblastic leukemia, FAB type L1               48,XY,-11,+3mar
3409    6       1       Abdi et al 1990 J Pakistan Med Ass      40:9-11 Acute lymphoblastic leukemia, FAB type L2               47,XX,+18
1139    1       1       Abe & Sandberg 1984     Cancer Genet Cytogenet  13:121-127      Acute lymphoblastic leukemia, NOS               46,XY,t(4;11)(q21;q23)
606     1       1       Abe et al 1979  Am J Hematol    6:259-266       Acute lymphoblastic leukemia, NOS               46,XY,del(5)(q12q23),del(9)(p21)
410     1       1       Abe et al 1982  Cancer Genet Cytogenet  7:185-195       Acute lymphoblastic leukemia, FAB type L3               46,XY,t(8;22)(q24;q12)/46,idem,+del(1)(p22),-22/46,idem,add(1)(q?),+del(1),-5
838     1       1       Abe et al 1983  Cancer Genet Cytogenet  9:139-144       Acute lymphoblastic leukemia, NOS               46,XX,del(11)(q13q23),ins(19;11)(p13;q13q23)
1162    1       1       Abe et al 1985  Cancer Genet Cytogenet  14:45-59        Acute lymphoblastic leukemia, FAB type L2               48-52,XX,+7,+11,+12,+13,+14,i(17)(q10),+20,+22
1162    1       2       Abe et al 1985  Cancer Genet Cytogenet  14:45-59        Acute lymphoblastic leukemia, FAB type L2               103,XXXX,+2,-4,+7,+7,+11,+12,+12,+13,+14,+16,i(17)(q10)x2,+20,+20,+22/53,XX,+X,+7,+11,+12,+13,i(17)(q10),+20,+22
1303    1       1       Abe et al 1985  Cancer Genet Cytogenet  18:49-54        Acute lymphoblastic leukemia, FAB type L2               46,XX,t(9;22)(q34;q11)
2398    1       1       Abe et al 1988  Cancer Genet Cytogenet  31:279-283      Acute lymphoblastic leukemia, NOS               46,XY,del(9)(p13p22),add(10)(p11),del(11)(q21q23)
5513    1       1       Abeliovich et al 1994   Cancer Genet Cytogenet  76:70-71        Acute lymphoblastic leukemia, FAB type L2               45,X,-Y/46,XY,t(9;22)(q34;q11)
1059    1       1       Abromowitch et al 1984  Br J Haematol   56:409-416      Acute lymphoblastic leukemia, FAB type L1               46,XY,t(1;19)(q23;p13),add(13)(q?)
1059    1       2       Abromowitch et al 1984  Br J Haematol   56:409-416      Acute lymphoblastic leukemia, FAB type L1               85,XXYY,-1,t(1;19),-2,-3,-4,del(4)(q23),-5,del(5)(p13),del(6)(q15),-7,+8,+8,+del(8)(p21),-9,-10,-12,dup(14)(q13q32)x2,-16,-17,-18,der(19)t(1;19),+20,+21,+21,-22,-22,+mar
4455    1       1       Abshire et al 1992      Leukemia        6:357-362       Acute lymphoblastic leukemia, NOS               44,X,-X,-20,t(20;22)(p?;q?),-22
4455    1       2       Abshire et al 1992      Leukemia        6:357-362       Acute lymphoblastic leukemia, NOS               44,X,-X,del(2)(q?),t(6;9)(q?;q?),-20,+t(20;22),-22
879     8       1       Aide et al 1981 Acta Acad Med Wuhan     1:7-15  Acute lymphoblastic leukemia, NOS               46,XY,t(9;22)(q34;q11)
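
The layout described above can be read with a few lines of Python. The following is only a minimal sketch, not the code used by the online programs: the function name `read_mitelman` and the extra "Clones" key are assumptions made for illustration, and the sketch assumes real tab characters between the fields and a simple "/" between clones.

```python
import csv
import io
from typing import Dict, List

# Column names as listed in the field description above.
MITELMAN_FIELDS = [
    "Reference Number", "Case Number", "Investigation Number",
    "Author, Year", "Journal Name", "Volume, Page",
    "Morphology", "Topography", "Short Karyotype",
]


def read_mitelman(text: str) -> List[Dict[str, object]]:
    """Parse tab-separated Mitelman-style text that starts with a header line.

    Hypothetical helper: returns one dict per data line, with an added
    "Clones" entry holding the karyotype split at "/".
    """
    # QUOTE_NONE: the format does not allow quoted data, so quotes are
    # treated as ordinary characters.
    reader = csv.reader(io.StringIO(text), delimiter="\t", quoting=csv.QUOTE_NONE)
    next(reader, None)  # skip the header line
    records: List[Dict[str, object]] = []
    for line_no, row in enumerate(reader, start=2):
        if not row:
            continue  # ignore blank lines
        if len(row) != len(MITELMAN_FIELDS):
            raise ValueError(
                f"line {line_no}: expected {len(MITELMAN_FIELDS)} fields, got {len(row)}"
            )
        record: Dict[str, object] = dict(
            zip(MITELMAN_FIELDS, (value.strip() for value in row))
        )
        # Assumption: clones are separated by a single forward slash.
        record["Clones"] = str(record["Short Karyotype"]).split("/")
        records.append(record)
    return records


if __name__ == "__main__":
    sample = "\t".join(MITELMAN_FIELDS) + "\n" + "\t".join([
        "5513", "1", "1", "Abeliovich et al 1994", "Cancer Genet Cytogenet",
        "76:70-71", "Acute lymphoblastic leukemia, FAB type L2", "",
        "45,X,-Y/46,XY,t(9;22)(q34;q11)",
    ]) + "\n"
    for rec in read_mitelman(sample):
        print(rec["Reference Number"], rec["Clones"])
```

An empty field, such as Topography in the sample line, is kept as an empty string, so every record always has all nine columns.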

The Custom ISCN Format

The custom ISCN format is intended for data from karyotype banding analysis. It is a very simple format: only an identifier for the case and the karyotype written in ISCN are required. For polyclonal karyotypes, clones are separated by a forward slash ("/"); "empty" clones are allowed.

A header line is optional. Data fields can be separated by a tab, a pipe ("|"), or a single blank (" "). Optionally, data can be surrounded by quotation marks (").

Examples:
When a header line is present, and the separator is set to tabulator:

Identifier    Karyotype
Case_1    "46,XX, t(9;22)(q34;q11)"
Case_2    "45,XX,der(1)t(1;8)(p1?1;p?), der(8)t(8;15)(q1?1;q1?1),-15,der(17)(17pter->17q25::1p36->1p1?3::8q?21->8pter)"
Case_3    46,X,-Y, +15
Case_4    "46,X,-Y,+15/47,XY,+6/ 46,XY,del(9)(q13-21)"

When no header line is present, and the separator is set to pipe:

Case_1|"46,XX, t(9;22)(q34;q11)"
Case_2|"45,XX,der(1)t(1;8)(p1?1; p?),der(8)t(8;15)(q1?1;q1?1),-15,der(17)(17pter->17q25::1p36->1p1?3::8q?21->8pter)"
Case_3|46,X,-Y,+15
Case_4|"46,X,-Y,+15 / 47,XY,+6/46,XY,del(9)(q13-21)"

When a header line is present, and the separator is set to blank
(note: the identifier and the karyotype must not contain any blanks):

Identifier Karyotype
Case_1 "46,XX,t(9;22)(q34;q11)"
Case_2 "45,XX,der(1)t(1;8)(p1?1;p?),der(8)t(8;15)(q1?1;q1?1),-15,der(17)(17pter->17q25::1p36->1p1?3::8q?21->8pter)"
Case_3 46,X,-Y,+15
Case_4 "46,X,-Y,+15/47,XY,+6/46,XY,del(9)(q13-21)"
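
The three variants shown above differ only in the separator and in whether a header line is present, so one small reader can cover them all. The sketch below is an illustration under stated assumptions, not the parser of the online programs: the name `read_custom_iscn` is hypothetical, and stripping stray blanks from the karyotype is an assumption based on the tab and pipe examples, which contain such blanks.

```python
from typing import List, Tuple

ALLOWED_SEPARATORS = ("\t", "|", " ")


def read_custom_iscn(
    text: str, separator: str = "\t", has_header: bool = True
) -> List[Tuple[str, List[str]]]:
    """Return (identifier, list of clones) for each case in the text."""
    if separator not in ALLOWED_SEPARATORS:
        raise ValueError("separator must be a tab, a pipe, or a single blank")
    lines = [line for line in text.splitlines() if line.strip()]
    if has_header:
        lines = lines[1:]  # drop the header line
    cases: List[Tuple[str, List[str]]] = []
    for line in lines:
        # Split at the first separator only; the identifier never contains it.
        identifier, _, karyotype = line.partition(separator)
        identifier = identifier.strip().strip('"')
        karyotype = karyotype.strip().strip('"')
        # Assumption: blanks inside the karyotype carry no meaning.
        karyotype = karyotype.replace(" ", "")
        # Clones are separated by "/"; empty clones are kept as empty strings.
        cases.append((identifier, karyotype.split("/")))
    return cases


if __name__ == "__main__":
    sample = (
        'Case_1|"46,XX, t(9;22)(q34;q11)"\n'
        'Case_4|"46,X,-Y,+15 / 47,XY,+6/46,XY,del(9)(q13-21)"\n'
    )
    for case_id, clones in read_custom_iscn(sample, separator="|", has_header=False):
        print(case_id, clones)
```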

The Custom CGH Format

The custom CGH format is intended for data from comparative genomic hybridisation (CGH) analysis. It is a very simple format: only an identifier for the case and the karyotype written in CGH style are required.

A header line is optional. Data fields can be separated by either a tab or a pipe ("|"). Optionally, data can be surrounded by quotation marks (").

A single blank (" ") cannot be used as a separator here because the description starts with "rev ish ...", which itself contains blanks.

Examples:
When a header line is present, and the separator is set to tabulator:

Identifier    CGH_Karyotype
Case_1    Rev ish enh(6p21p25),dim(2q35q37)
Case_2    "Rev ish enh(1q22q31; 1q41qter; 6p; 15q25qter;19p13.2pter; 19q; 20q13.1qter)"
Case_3    "Rev ish enh(2p12pter; 6p)"
Case_4    Rev ish enh(1q, 4q27q35, 6p, 14q22q32.3, 17q22q25,19),dim(16q12.2qter)

When no header line is present, and the separator is set to pipe:

Case_1|Rev ish enh(6p21p25),dim(2q35q37)
Case_2|"Rev ish enh(1q22q31; 1q41qter; 6p; 15q25qter; 19p13.2pter; 19q; 20q13.1qter)"
Case_3|"Rev ish enh(2p12pter; 6p)"
Case_4|Rev ish enh(1q, 4q27q35, 6p, 14q22q32.3, 17q22q25,19), dim(16q12.2qter)
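
To illustrate how such lines break down into gained and lost regions, here is a minimal Python sketch. It is not the parser of the online programs: the names `parse_cgh` and `read_custom_cgh` are hypothetical, only the `enh(...)` and `dim(...)` groups that appear in the examples above are handled, and treating both ";" and "," as region separators inside a group is an assumption drawn from those examples.

```python
import re
from typing import Dict, List

# Matches "enh(...)" or "dim(...)" and captures the kind and the region list.
_GROUP = re.compile(r"(enh|dim)\s*\(([^)]*)\)", re.IGNORECASE)


def parse_cgh(karyotype: str) -> Dict[str, List[str]]:
    """Split a "rev ish ..." description into enhanced and diminished regions."""
    text = karyotype.strip().strip('"').strip()
    if not text.lower().startswith("rev ish"):
        raise ValueError('CGH karyotype must start with "rev ish"')
    result: Dict[str, List[str]] = {"enh": [], "dim": []}
    for kind, body in _GROUP.findall(text):
        # Assumption: regions inside a group are separated by ";" or ",".
        regions = [part.strip() for part in re.split(r"[;,]", body) if part.strip()]
        result[kind.lower()].extend(regions)
    return result


def read_custom_cgh(
    text: str, separator: str = "\t", has_header: bool = True
) -> Dict[str, Dict[str, List[str]]]:
    """Map each case identifier to its parsed CGH description."""
    if separator not in ("\t", "|"):
        # A blank is not allowed because "rev ish" itself contains blanks.
        raise ValueError("separator must be a tab or a pipe")
    lines = [line for line in text.splitlines() if line.strip()]
    if has_header:
        lines = lines[1:]  # drop the header line
    cases: Dict[str, Dict[str, List[str]]] = {}
    for line in lines:
        identifier, _, karyotype = line.partition(separator)
        cases[identifier.strip().strip('"')] = parse_cgh(karyotype)
    return cases


if __name__ == "__main__":
    sample = (
        "Case_1|Rev ish enh(6p21p25),dim(2q35q37)\n"
        'Case_3|"Rev ish enh(2p12pter; 6p)"\n'
    )
    for case_id, regions in read_custom_cgh(sample, separator="|", has_header=False).items():
        print(case_id, regions)
```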

Such data can be used for the analysis of:

Due to technical limitations (availability of a database, computation time), these online programs offer only a preview of the full analysis options of the CyDAS desktop application, which you can download from the Download section.