Analysis of ISCN

An ISCN formula contains several items of cytogenetic information of distinct types. Some of them are directly accessible; others require some calculation.

Start

An ISCN formula is passed as a string to the constructor of the karyotype class. Because analysis is time-consuming and not every use of a karyotype object requires it, analysis starts only when a value that depends on it is queried.

Analysis is performed in a series of steps and is aborted as soon as an error is encountered. In that case, a description of the error is placed in the ErrorDescription property (variable: mstrError) of the object.
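The lazy, step-wise scheme can be sketched in Python. The sketch shows control flow only. Nothing is analysed at construction time, the first query runs the steps, and the first step that reports an error aborts the rest. All names here are hypothetical: `Karyotype`, `is_valid`, `error_description` (standing in for mstrError) and the two placeholder steps.

```python
from typing import Callable, List, Optional


class Karyotype:
    """Hypothetical sketch: analysis is deferred until a dependent value is queried."""

    def __init__(self, iscn: str) -> None:
        self.iscn = iscn
        self.elements: List[str] = []
        self.error_description: Optional[str] = None  # plays the role of mstrError
        self._analysed = False
        self._valid = False

    @property
    def is_valid(self) -> bool:
        self._ensure_analysed()  # the first query triggers the analysis
        return self._valid

    def _ensure_analysed(self) -> None:
        if self._analysed:
            return
        self._analysed = True
        steps: List[Callable[[], Optional[str]]] = [self._split, self._check_first_element]
        for step in steps:
            error = step()
            if error is not None:  # abort: the remaining steps are skipped
                self.error_description = error
                return
        self._valid = True

    def _split(self) -> Optional[str]:
        # Placeholder step: split at commas and strip surrounding spaces.
        self.elements = [e.strip() for e in self.iscn.split(",")]
        return None

    def _check_first_element(self) -> Optional[str]:
        # Placeholder step standing in for the real chromosome count analysis.
        return "empty chromosome count element" if not self.elements[0] else None
```

For example, `Karyotype("46, XY")` does no work until `is_valid` is read. After that, its `elements` are `["46", "XY"]`.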

This parser is not a strict parser. It will accept some deviations from the standard ISCN.

Since the individual items in the formula are separated by commas, the first step of analysis splits the ISCN formula into an array of its elements. Leading and trailing spaces are removed from each element.

Elements are then analysed in a defined order. First, some information is taken from the last elements. Then the chromosome count and sex chromosomes are taken from the first elements. The remaining elements are aberrations and are analysed by the Aberration class. Afterwards, the karyotype object sums up the information from its aberration objects.

Last Elements

Some final information ("[cp]" or "inc") is placed at the end of the formula without having to be separated from the previous item by a comma. Hence, analysis starts with the last element of the array.

The last possible element is "[cp]", denoting that the karyotype is composite; the number of metaphases may be included. The results are stored in the variables mbooCompositeKaryotype and mnCloneSize, respectively.

An "inc" statement (meaning incomplete information; "[inc]" is also accepted) may precede the [cp] statement. The result is stored in the variable mbooIncompleteKaryotype.

Analysis is performed by comparing the last non-empty element of the array with an appropriate pattern (a Perl regular expression). In case of a match, the matched region is removed from the element.
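The tail handling can be sketched as follows. The two regular expressions are assumptions written in Python's `re` syntax, not the parser's actual Perl patterns. `parse_tail` is a hypothetical helper that returns the values that would go into mbooCompositeKaryotype, mnCloneSize and mbooIncompleteKaryotype, together with the trimmed elements.

```python
import re
from typing import List, Optional, Tuple

# Assumed patterns: "[cp]" with an optional metaphase count, and "inc" or "[inc]".
CP = re.compile(r"\s*\[cp(?P<size>\d*)\]\s*$")
INC = re.compile(r"\s*\[?inc\]?\s*$")


def last_index(elements: List[str]) -> int:
    """Index of the last non-empty element, or -1 if there is none."""
    for i in range(len(elements) - 1, -1, -1):
        if elements[i]:
            return i
    return -1


def parse_tail(elements: List[str]) -> Tuple[bool, Optional[int], bool, List[str]]:
    elements = list(elements)
    composite, clone_size, incomplete = False, None, False

    i = last_index(elements)
    m = CP.search(elements[i]) if i >= 0 else None
    if m:
        composite = True
        clone_size = int(m["size"]) if m["size"] else None
        elements[i] = elements[i][: m.start()]  # remove the matched region

    i = last_index(elements)  # "inc" may precede "[cp]"
    m = INC.search(elements[i]) if i >= 0 else None
    if m:
        incomplete = True
        elements[i] = elements[i][: m.start()]

    return composite, clone_size, incomplete, elements
```

Applied to `["42-48", "XY inc [cp15]"]`, the sketch yields a composite karyotype of 15 metaphases marked incomplete, and the remaining elements are `["42-48", "XY"]`.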

Double Minutes

After the above analysis, the last non-empty element may describe double minutes. If it ends with "dmin", a double minutes object is instantiated with this element and analysed by the DoubleMinute class (see also: Analysis of Double Minutes).

The object is stored in moDMin. The number of double minutes is added to the "unclassified" section of the CKAS variable moCKAS.

Marker Chromosomes

Marker chromosomes may occupy several elements preceding the double minutes. If an element contains the substring "mar" anywhere, it is assumed to describe a marker chromosome, and a Marker object is instantiated with the element.

The Marker object performs the analysis of this marker chromosome (see also: Analysis of Marker Chromosomes). It is then added to the collection of marker chromosomes, mcMarkers.

As soon as the last remaining element no longer matches the "mar" pattern, the total number of marker chromosomes is added to the "unclassified" section of the CKAS variable moCKAS.
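The backward scan over marker elements might look like this. `collect_markers` is hypothetical, and so is its counting rule: a leading number such as "+2mar" counts that many markers, otherwise the element counts as one. The actual analysis of each marker is done by the Marker class.

```python
import re
from typing import List, Tuple

# Hypothetical counting rule: optional sign/question marks, then an optional count.
MAR_COUNT = re.compile(r"[+\-?]*(?P<n>\d+)?")


def collect_markers(elements: List[str]) -> Tuple[List[str], int, List[str]]:
    i = len(elements) - 1
    while i >= 0 and not elements[i]:
        i -= 1  # skip elements emptied by earlier steps
    markers = []
    while i >= 0 and "mar" in elements[i]:
        markers.append(elements[i])
        i -= 1
    markers.reverse()  # restore formula order
    total = 0
    for element in markers:
        n = MAR_COUNT.match(element)["n"]
        total += int(n) if n else 1
    # Markers, their total (for the "unclassified" CKAS section), remaining elements.
    return markers, total, elements[: i + 1]
```

For `["47", "XY", "+8", "+mar1", "+2mar"]` the sketch collects two marker elements totalling three markers and leaves `["47", "XY", "+8"]`.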

Ring Shaped Marker Chromosomes

Ring-shaped marker chromosomes may occupy several elements preceding the normal marker chromosomes.

An element is taken to describe such a ring chromosome if it contains an "r" that may be preceded by any combination of the characters "+-~x?", digits and spaces, or followed by any combination of the characters "c-~x?", digits and spaces. The element must not contain any other characters.
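The rule can be written as a single regular expression. This is a hypothetical Python translation. Because the text leaves open whether a prefix and a suffix may occur together, the sketch allows both at once.

```python
import re

# Optional prefix from "+-~x?", digits and spaces; one "r";
# optional suffix from "c-~x?", digits and spaces. Nothing else is allowed.
RING = re.compile(r"[+\-~x?\d ]*r[c\-~x?\d ]*")


def is_ring_marker(element: str) -> bool:
    return RING.fullmatch(element) is not None
```

Elements such as "+r", "+2r" or "+rx2" qualify. "+r(1)", "+mar" and "der(1)" do not, because they contain characters outside the permitted sets.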

A Ring object is instantiated with the element. It performs the analysis of this ring-shaped marker chromosome (see also: Analysis of Ring Shaped Marker Chromosomes) and is then added to the collection of ring-shaped marker chromosomes, mcRings.

As soon as the last remaining element no longer matches the above pattern, the total number of ring-shaped marker chromosomes is added to the "unclassified" section of the CKAS variable moCKAS.

First Elements

The first elements should be analysed only after the "last elements". A karyotype like "42-48,XY inc [cp15]" is possible, and there the statements of incompleteness and composition are found in the element with the sex chromosomes.

Chromosome Count

The chromosome count element is the first element of the array. It must not contain characters other than "-~<>?", digits and spaces. After spaces are removed, the element is matched against the pattern "^\?*(?<num>\d+)?([\-\~]\??(?<max>\d+))?(\<\??(?<ploidy>\d+)n\>)?$".

The minimum and maximum numbers of chromosomes are stored in mnChromosomeCount and mnChromosomeCountMax, and the ploidy level in mnPloidyLevel. If no ploidy level is given in the chromosome count field, ploidy is calculated by dividing the arithmetic mean of the chromosome count by 23.
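The count field can be parsed with a direct Python translation of the pattern above. Python writes named groups as `(?P<name>...)` rather than .NET-style `(?<name>...)`. `parse_count` is a hypothetical helper. The character pre-check also admits the letter "n", since the pattern's ploidy group ("<2n>") requires it; this is an assumption. Because rounding of the computed ploidy is not described, the sketch returns the plain quotient.

```python
import re
from typing import Optional, Tuple

# The quoted pattern in Python syntax (\- \~ \< \> written as literal characters).
COUNT = re.compile(r"^\?*(?P<num>\d+)?([-~]\??(?P<max>\d+))?(<\??(?P<ploidy>\d+)n>)?$")
ALLOWED = set("-~<>?n 0123456789")  # assumption: "n" admitted for "<2n>"


def parse_count(element: str) -> Optional[Tuple[Optional[int], Optional[int], Optional[float]]]:
    if not set(element) <= ALLOWED:
        return None  # not a chromosome count field
    m = COUNT.match(element.replace(" ", ""))
    if m is None:
        return None
    lo = int(m["num"]) if m["num"] else None
    hi = int(m["max"]) if m["max"] else lo
    if m["ploidy"]:
        ploidy = float(m["ploidy"])  # given explicitly, e.g. "<2n>"
    elif lo is not None:
        ploidy = (lo + hi) / 2 / 23  # arithmetic mean of the count divided by 23
    else:
        ploidy = None
    return lo, hi, ploidy
```

For example, "45~48<2n>" gives (45, 48, 2.0). "42-48" gives a computed ploidy of 45/23, roughly 1.96.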

Sex chromosomes

Non-aberrant sex chromosomes are listed in the second field. However, since any sex chromosome can be involved in structural aberrations, a sex chromosome field need not be present.

The sex chromosome field may consist of the characters "XYc?" only. If the element contains any other character, it is treated as an aberration (see below).
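The test for the sex chromosome field reduces to a whole-string character-class match. This is a minimal Python sketch; the function name is hypothetical.

```python
import re

SEX_FIELD = re.compile(r"[XYc?]+")


def is_sex_chromosome_field(element: str) -> bool:
    # Any other character makes the element an aberration instead.
    return SEX_FIELD.fullmatch(element) is not None
```

"XY", "XXYc" and "X?" are accepted. "-Y" and "t(X;1)(p11;q21)" are left for the aberration analysis.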

Aberrations

After the analysis above, all remaining elements ought to be aberrations.

An element may contain two aberrations linked with "or". If the sequence "or" is found, the element is split into its two aberrations, both of which are marked questionable (i.e. prefixed with a question mark).
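The split can be sketched as follows, under two assumptions not stated in the text: the element is split at the first "or", and a part that already starts with "?" is not prefixed twice. `split_or` is a hypothetical name.

```python
from typing import List


def split_or(element: str) -> List[str]:
    if "or" not in element:
        return [element]
    left, right = element.split("or", 1)  # split at the first "or" only
    parts = [left.strip(), right.strip()]
    # Mark both alternatives as questionable.
    return [p if p.startswith("?") else "?" + p for p in parts]
```

Thus "del(5)(q13) or del(5)(q15)" becomes the two aberrations "?del(5)(q13)" and "?del(5)(q15)".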

For an aberration, only digits, "normal" characters, and "+-x~();" are acceptable.

Each aberration is then passed to the constructor of the Aberration class, and the aberration object analyses it (see Analysis of Aberrations). For derivative chromosomes (aberrations starting with "der" or "ider"), the fragment composition, similar to the ISCN detailed notation, must be calculated (see Calculation of Derivative Chromosomes).

If the aberration is found to be valid, its aberration type is queried. An "idem" aberration is stored in moIdemAberration; any other aberration is added to the collection of aberrations, mcAberrations.

In contrast to the ISCN manual, the aberrations in the aberration section of the formula are accepted in any order.

Missing break points

In some cases, an aberration must receive its break points from another aberration. For example, in the karyotype "47,XY,t(9;22)(q34;q11),+der(22)t(9;22)" the aberration "+der(22)t(9;22)" lacks the break points but can receive them from "t(9;22)(q34;q11)". In "48,XY,der(9)add(9)(p22)t(9;22)(q34;q11),der(22)t(9;22),+der(22)x2", the aberration "+der(22)x2" does not conform to the ISCN standard, but can receive the missing aberration from "der(22)t(9;22)".

The collection of aberrations is searched for non-expanded aberrations ("isExpanded" property). Such aberrations receive their break points from other aberrations of the karyotype by means of string comparison. The expanded aberration then replaces the original aberration.
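The text says only that expansion works by string comparison. The Python sketch below makes that concrete with two assumed rules and a heuristic stand-in for `isExpanded`, none of which is taken from the parser itself. The first rule is prefix completion: a bare "der(22)" borrows the whole text of "der(22)t(9;22)(q34;q11)". The second is band-group borrowing: a bare "t(9;22)" copies the band group that follows it in another aberration. Passes repeat so that a newly expanded aberration can serve as a donor, which reproduces both examples above.

```python
import re
from typing import List, Tuple

BAND = re.compile(r"\([pq]\d")  # "(q34", "(p22", ...
PARTS = re.compile(r"(?P<sign>[+\-]?)(?P<core>.*?)(?P<mult>x\d+)?")
BARE = re.compile(r"[a-z]+\([^()]*\)(?!\()")  # e.g. "t(9;22)" not followed by bands


def is_expanded(ab: str) -> bool:
    # Heuristic stand-in for the isExpanded property.
    return BAND.search(ab) is not None


def needs_expansion(ab: str) -> bool:
    # Assumption: only structural aberrations (with parentheses) can lack break points.
    return "(" in ab and not is_expanded(ab)


def split_parts(ab: str) -> Tuple[str, str, str]:
    m = PARTS.fullmatch(ab)
    return m["sign"], m["core"], m["mult"] or ""


def try_expand(ab: str, donors: List[str]) -> str:
    sign, core, mult = split_parts(ab)
    # Rule 1: prefix completion from a longer, expanded aberration.
    for donor in donors:
        donor_core = split_parts(donor)[1]
        if donor_core != core and donor_core.startswith(core):
            return sign + donor_core + mult

    # Rule 2: copy the band group that follows the same token in a donor.
    def fill(m: re.Match) -> str:
        token = m.group(0)
        for donor in donors:
            found = re.search(re.escape(token) + r"(\([^()]*\))", donor)
            if found:
                return token + found.group(1)
        return token

    return sign + BARE.sub(fill, core) + mult


def expand_all(aberrations: List[str]) -> List[str]:
    result = list(aberrations)
    changed = True
    while changed:  # newly expanded aberrations may act as donors in the next pass
        changed = False
        donors = [a for a in result if is_expanded(a)]
        for i, ab in enumerate(result):
            if needs_expansion(ab):
                new = try_expand(ab, donors)
                if new != ab:
                    result[i] = new  # the expanded aberration replaces the original
                    changed = True
    return result
```

For the second karyotype above, "der(22)t(9;22)" is completed in the first pass. "+der(22)x2" is completed in the second pass and becomes "+der(22)t(9;22)(q34;q11)x2".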

Summing Up

Break points, structural aberrations, quantitative aberrations, etc. are determined for each (ISCN) aberration in the Aberration class. To calculate them for the whole karyotype, the aberrations are placed in three further collections: mcDerAberrations, containing all aberrations of "der" or "ider" type; mcNonDerAberrations, containing all other aberrations; and cCKASAberrations, used for calculating a complex karyotype aberration score.
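The split into derivative and non-derivative aberrations can be sketched as below. The helper names are hypothetical. Stripping a leading "+", "-" or "?" before checking the "der"/"ider" prefix is an assumption, so that "+der(22)..." also counts as a derivative.

```python
from typing import List, Tuple


def is_derivative(aberration: str) -> bool:
    core = aberration.lstrip("+-?")  # assumption: ignore sign and questionable mark
    return core.startswith(("der", "ider"))


def partition(aberrations: List[str]) -> Tuple[List[str], List[str]]:
    der = [a for a in aberrations if is_derivative(a)]          # like mcDerAberrations
    non_der = [a for a in aberrations if not is_derivative(a)]  # like mcNonDerAberrations
    return der, non_der
```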

For CKAS, descriptions of derivative chromosomes have to reveal their constituent aberrations, while duplicates of those aberrations must still be recognised. The aberrations are therefore added to the collection with the CombineMultiplicators parameter set to true.

The break points collection (type Bands) is filled from mcAberrations. At this point, every break point can be checked for its existence on the (human) chromosomes: non-existent break points, which are likely caused by typos, are a common problem.
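A minimal Python sketch of the existence check follows. The band table is an illustrative excerpt, not real reference data; a real check needs the complete band list of the human ideogram. Only the simple "name(chromosomes)(bands)" form is parsed, and sub-bands are truncated to the band, both as simplifying assumptions. All names are hypothetical.

```python
import re
from typing import Dict, List, Set, Tuple

# Illustrative excerpt only, not a complete band list.
KNOWN_BANDS: Dict[str, Set[str]] = {
    "4": {"p14"},
    "9": {"p21", "q34"},
    "22": {"q11"},
}

# One aberration with a chromosome group and a band group, e.g. "t(9;22)(q34;q11)".
SIMPLE = re.compile(r"[a-z]+\((?P<chroms>[^()]+)\)\((?P<bands>[^()]+)\)")


def breakpoints(aberration: str) -> List[Tuple[str, str]]:
    points = []
    for m in SIMPLE.finditer(aberration):
        for chrom, bands in zip(m["chroms"].split(";"), m["bands"].split(";")):
            for band in re.findall(r"[pq]\d+", bands):  # "q11.2" is truncated to "q11"
                points.append((chrom, band))
    return points


def unknown_breakpoints(aberration: str, known: Dict[str, Set[str]] = KNOWN_BANDS) -> List[Tuple[str, str]]:
    return [(c, b) for c, b in breakpoints(aberration) if b not in known.get(c, set())]
```

A transposed-digit typo such as "t(9;22)(q43;q11)" is then reported as the unknown break point ("9", "q43").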

Fusions (called "junctions" on the NCBI website) and structural aberrations are also calculated from mcAberrations.

While quantitative aberrations can easily be summed up for non-der aberrations, der aberrations need balancing. For example, in "46,XY,der(4)t(4;9)(p14;p21),der(9)t(4;9)(p14;p21)t(9;22)(q34;q11),der(22)t(9;22)(q34;q11)" both translocations are balanced, so the karyotype as a whole is balanced, although each individual derivative chromosome shows gains and losses.
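The balancing can be illustrated by netting segment gains (+1) and losses (-1) across the three derivatives of the example. The segment dictionaries are written out by hand for this sketch and are not produced by the parser; "pter"/"qter" denote the arm ends. For instance, der(4) loses 4p14→4pter and gains 9p21→9pter in its place.

```python
from collections import defaultdict
from typing import Dict

# Hand-derived gains/losses per derivative chromosome of the example karyotype.
DER4 = {"4p14->4pter": -1, "9p21->9pter": +1}
DER9 = {"9p21->9pter": -1, "4p14->4pter": +1, "9q34->9qter": -1, "22q11->22qter": +1}
DER22 = {"22q11->22qter": -1, "9q34->9qter": +1}


def net_imbalance(*derivatives: Dict[str, int]) -> Dict[str, int]:
    total: Dict[str, int] = defaultdict(int)
    for der in derivatives:
        for segment, delta in der.items():
            total[segment] += delta
    # Keep only segments that remain gained or lost after balancing.
    return {s: d for s, d in total.items() if d != 0}
```

Each derivative alone is unbalanced; for example, `net_imbalance(DER22)` reports a loss of 22q11→22qter and a gain of 9q34→9qter. All three together cancel out to an empty result.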

If no error was encountered, the ISCN formula is rebuilt from the elements and the karyotype is marked valid.