Data Mining

Data mining looks for patterns in the data that are "hidden" or not immediately obvious.

Events (rearrangements) may occur together more often, or less often, than expected by chance (dependence between events). Some rearrangements occur preferentially alone (early events), while others are encountered only in complex karyotypes ("Aberration Count Distribution"). Complex karyotypes also arise along pathways of karyotype evolution. Finally, similar karyotypes can be clustered into groups (not yet implemented).
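
To illustrate what "more often than expected" can mean, the following Python sketch compares how often two events are observed together with the count expected if they occurred independently. It is only an illustration under stated assumptions, not the program's own algorithm: the function name cooccurrence_ratio, the representation of a case as a set of event strings, and the example events are all made up for this sketch.

```python
from collections import Counter
from itertools import combinations


def cooccurrence_ratio(cases):
    """Return {(a, b): observed / expected} for every pair of events
    that occurs together in at least one case.

    `cases` is a list of sets of event strings (hypothetical format).
    The expected count assumes that the two events are independent.
    """
    n = len(cases)
    single = Counter()  # number of cases containing each event
    pair = Counter()    # number of cases containing both events of a pair
    for events in cases:
        unique = sorted(set(events))
        single.update(unique)
        pair.update(combinations(unique, 2))

    ratios = {}
    for (a, b), observed in pair.items():
        expected = single[a] * single[b] / n
        ratios[(a, b)] = observed / expected
    return ratios


# Made-up example data: four cases, each a set of events.
cases = [
    {"+8", "t(9;22)"},
    {"+8", "t(9;22)"},
    {"+8"},
    {"-7"},
]
ratios = cooccurrence_ratio(cases)
print(ratios[("+8", "t(9;22)")])  # a value above 1 means "more often than expected"
```

A ratio above 1 suggests that the two events occur together more often than independence would predict, a ratio below 1 less often. Pairs that never occur together are not listed by this sketch, and a real analysis would also need a significance test.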

In data mining, every karyotype is regarded as a case of its own. For example, a patient with two investigations and five karyotypes is counted as five cases. This point of view is especially useful when calculating the evolution pathway.
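
The counting rule above can be sketched in Python as follows. The nested dictionary layout (keys "investigations" and "karyotypes") and the function name karyotype_cases are assumptions made for this illustration; they do not describe how the program stores its data.

```python
def karyotype_cases(patients):
    """Flatten patients into a list of cases, one case per karyotype.

    Each patient is a dict with a list of investigations, and each
    investigation is a dict with a list of karyotypes (hypothetical layout).
    """
    cases = []
    for patient in patients:
        for investigation in patient["investigations"]:
            for karyotype in investigation["karyotypes"]:
                cases.append(karyotype)
    return cases


# One patient with two investigations and five karyotypes in total.
patient = {
    "id": "P1",
    "investigations": [
        {"karyotypes": ["karyotype 1", "karyotype 2"]},
        {"karyotypes": ["karyotype 3", "karyotype 4", "karyotype 5"]},
    ],
}
cases = karyotype_cases([patient])
print(len(cases))  # 5
```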

Mining Parameters

The Mining Parameters can be edited from the Main Window (menu Edit - Mining Parameters); the settings then apply to all subsequent mining tasks. To change the parameters for a single mining window only, use the local Edit menu of that window: "Edit - General Parameters" opens the edit form in the same way as from the main window, while "Edit - XY Parameters" opens the form at the corresponding tab. The other tabs remain accessible in both cases.

General Parameters

Input

Mining can use either ISCN (banding) data or CGH data as input. If cases contain both types of data, only the selected type is used. Cases that do not have the selected type of input data are ignored.
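
The selection rule can be pictured with a short Python sketch. The case layout (a dict with optional "ISCN" and "CGH" entries) and the function name select_input are assumptions for this illustration only.

```python
def select_input(cases, input_type):
    """Return the data of the selected type for every case that has it.

    Cases without data of the selected type are skipped; data of the
    other type is never used. `cases` is a list of dicts with optional
    "ISCN" and "CGH" entries (hypothetical layout).
    """
    if input_type not in ("ISCN", "CGH"):
        raise ValueError("input_type must be 'ISCN' or 'CGH'")
    selected = []
    for case in cases:
        data = case.get(input_type)
        if data:  # missing or empty data: the case is ignored
            selected.append(data)
    return selected


# Placeholder strings stand in for real karyotype descriptions.
cases = [
    {"ISCN": "banding data A", "CGH": "CGH data A"},  # both types present
    {"ISCN": "banding data B"},                       # ISCN only
    {"CGH": "CGH data C"},                            # CGH only
]
iscn_input = select_input(cases, "ISCN")
cgh_input = select_input(cases, "CGH")
```

With "ISCN" selected, the third case is ignored; with "CGH" selected, the second case is ignored.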

Method

Standard elements are the textual descriptions of rearrangements as written in ISCN or CGH notation. Alternatively, the data can be transformed into SCCN or Cytoband notation before the actual mining.

Resolution

If data are transformed, a banding resolution has to be selected. Since a resolution of 400 bands per haploid set (bphs) is rarely achieved in tumour cytogenetics, a lower resolution is advisable; the resolution of chromosomal arms is recommended.
Standard elements cannot be standardized to a resolution, so this setting applies only to transformed data.
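
As an illustration of what the resolution of chromosomal arms means, the following Python sketch reduces a band designation such as 9q34.1 to its chromosome arm. It is not the program's transformation into SCCN or Cytoband notation; the function name band_to_arm and the simple band format it accepts (chromosome, arm p or q, optional band number) are assumptions for this sketch.

```python
import re

# Chromosome (1-2 digits, X or Y), arm (p or q), optional band number.
_BAND = re.compile(r"^([0-9]{1,2}|X|Y)([pq])([0-9.]*)$")


def band_to_arm(band):
    """Reduce a band designation to chromosome-arm resolution."""
    match = _BAND.match(band)
    if match is None:
        raise ValueError("not a recognised band designation: " + repr(band))
    return match.group(1) + match.group(2)


print(band_to_arm("9q34.1"))   # 9q
print(band_to_arm("22q11.2"))  # 22q
print(band_to_arm("Xp11"))     # Xp
```

At this resolution, breakpoints in different bands of the same arm are treated as the same location, which makes findings from karyotypes of differing banding quality comparable.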

Other Parameters

The meaning of the other parameters is explained in the "edit" sections of the respective analyses, e.g. evolution tree, dependence network, event distribution.