Notes for the Developer

The present version of the CyDAS software package was developed between November 2003 and August 2005. Due to a lack of further funding, there will be no future development by the CyDAS team.

Please feel free to download all materials available and continue the work for the benefit of cytogenetics.

Below you will find some hints on general problems and some ideas for future development.

General Problems

Running CyDAS from Visual Studio may sometimes cause a few minor problems:

Registry

During compilation of the package, an error stating that registering the ISCNAnalyser.dll (or the CyDASGeneralControls.dll) failed is very common. It is just a tedious error message which generally can be ignored.
You can try to remove it by closing Visual Studio, executing RegClean.exe, and thereafter restarting your computer; this procedure is not guaranteed to work.

Resources

Another problem arises because CyDAS.exe needs access to the non-executable file CyDAS.ini (and to the import filter definition files for the import filters listed there), and ISCNAnalyser.dll needs access to Chromosomes.xml. These files are meant to be stored in the same directory as the executables.

But when running CyDAS from Visual Studio, CyDAS.exe and ISCNAnalyser.dll - but not the above mentioned non-executable files - are copied into the bin directory of the CyDAS project directory, which resides in the CyDAS project folder directory, which resides in the Visual Studio Projects directory, ... On a web server with aspx pages, the binaries - but not the non-executable files - are copied into a fresh temporary directory for each session.

Hence, querying the location of the executing assembly (System.Reflection.Assembly.GetExecutingAssembly.Location) does not work in such cases.

The registry is not useful either: whenever the dll is compiled, the old information is removed from the registry and the newly built dll is registered - with the paths of the build directory. Also, on a web server, web site owners are generally not allowed to register a dll.

Embedded resources cannot be used: the .ini file must be editable by the end user. And if the end user wants to analyse chromosomes of a different species, he or she must be able to replace the Chromosomes.xml file without recompiling the application.

To circumvent those problems, two variables were introduced into the ChromosomeData class of the ISCNAnalyser: gbUseDefaultLocation and gbWebServer (around line 50). The private function LoadResources() queries these variables and determines the CyDAS installation directory accordingly (the paths are hard-coded and merely switched depending on the case); if you decide to use a different location for those files, adjust the paths there.
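The switching logic can be sketched as follows. This is a Python illustration only, not the VB.NET code of LoadResources(): the two directory constants, the function name and the precedence between the two flags are assumptions made for the example.

```python
from pathlib import PureWindowsPath

# Hypothetical locations; the real paths are hard-coded in LoadResources().
DEFAULT_INSTALL_DIR = PureWindowsPath("C:/Program Files/CyDAS")
WEB_SERVER_DIR = PureWindowsPath("C:/Inetpub/wwwroot/CyDAS/bin")


def resolve_resource_dir(use_default_location, web_server, assembly_location):
    """Return the directory expected to hold CyDAS.ini and Chromosomes.xml."""
    if web_server:
        # aspx binaries run from a fresh temporary directory per session,
        # so the assembly location is useless and a fixed path is used.
        return WEB_SERVER_DIR
    if use_default_location:
        # Started from Visual Studio: the bin directory lacks the data files.
        return DEFAULT_INSTALL_DIR
    # Regular installation: the data files sit next to the executables.
    return PureWindowsPath(assembly_location).parent


print(resolve_resource_dir(False, False, "D:/Apps/CyDAS/CyDAS.exe"))
```

Only the last branch relies on the location of the executing assembly; the other two correspond to the hard-coded cases described above.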

Internationalization

The CyDAS package was originally intended for scientific use only, and we expected scientists to have some command of the English language. Hence, all the program output was in English only.

Later, we decided to add internationalisation to the project. Up to now, localised code has been integrated only for the most important messages of the CyDAS package, i.e. the error messages from the ISCNAnalyser, and some other features used with the internet application of CyDAS. The CyDAS desktop application is still in English only.

The other language package we added is German. Please feel free to add Spanish, French, Chinese, and many other versions, and to expand the localisation of many more parts of CyDAS.

Data Mining

The interface preparing the data for data mining is quite simple (see interface IMinable). The result is essentially a two-dimensional table of boolean values (plus arrays with identifiers of cases and events), which can be used for further analysis.
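To illustrate the shape of such a table, here is a minimal Python sketch. It does not reproduce the members of IMinable; the function name and the input format (a mapping from case identifier to the set of its events) are assumptions for the example.

```python
def build_mining_table(cases):
    """Build a boolean case-by-event table.

    cases: dict mapping a case identifier to the set of event names found
    in that case (hypothetical input format).
    Returns (case_ids, event_names, table), where table[i][j] is True when
    case i shows event j.
    """
    case_ids = sorted(cases)
    event_names = sorted({event for events in cases.values() for event in events})
    table = [[event in cases[case_id] for event in event_names]
             for case_id in case_ids]
    return case_ids, event_names, table


case_ids, event_names, table = build_mining_table({
    "case1": {"t(9;22)", "+8"},
    "case2": {"+8"},
})
for case_id, row in zip(case_ids, table):
    print(case_id, row)
```

The two identifier arrays carry the meaning of the rows and columns, so the table itself can stay a plain matrix of booleans.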

Further Aspects

A few aspects of data mining were integrated into the CyDAS package - but the most important features are actually missing: what do particular aberrations or combinations of aberrations mean for the outcome (life expectancy, survival, treatment aspects, costs of treatment, ...)? This gap is due to a lack of data: we do not have such data. But if you do, add that functionality!

Coping with Preselection

Often, patients are preselected for certain cytogenetic findings, e.g. a Philadelphia positive ALL. Consequently, the mutation(s) present in every patient (but not necessarily in all of his/her clones) may distort the findings in data mining. It could be helpful to add a facility which allows the user to "ignore" user-specified aberrations during data mining.

Some functionality for removing a selected event from the Miner is already present in the private function Miner.calculateDependentNodes(). The respective lines of code could be moved into a function of their own, with an interface that is easier to use from outside the class (e.g. event name instead of internal event number; array of events to be removed).
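Such an outward-facing function might look like the following Python sketch. It is not derived from Miner.calculateDependentNodes(); the function name and the table layout (one row per case, one column per event) are assumptions.

```python
def remove_events(event_names, table, ignored):
    """Return copies of event_names and table without the ignored events.

    event_names: list of event names, one per table column.
    table: list of rows of booleans (one row per case).
    ignored: iterable of event names to drop, given by name rather than
    by internal event number.
    """
    ignored = set(ignored)
    unknown = ignored - set(event_names)
    if unknown:
        raise ValueError(f"unknown events: {sorted(unknown)}")
    keep = [j for j, name in enumerate(event_names) if name not in ignored]
    new_names = [event_names[j] for j in keep]
    new_table = [[row[j] for j in keep] for row in table]
    return new_names, new_table


names, table = remove_events(
    ["+8", "t(9;22)"],
    [[True, True], [False, True]],
    ignored=["t(9;22)"],
)
print(names, table)
```

Returning copies keeps the original table intact, so the user can switch the "ignore" setting on and off without reloading the data.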

Sex Chromosomes

Rearrangements in sex chromosomes are dealt with in the same way as mutations in autosomes. But there is only one X chromosome in males while there are two X chromosomes in females, and furthermore the Y chromosome is present in males only. That is, the number of copies of a sex chromosome which can be affected by a rearrangement can be lower than that of an autosome, of which two copies are present in a healthy subject. No measures were taken to adjust for these differing numbers.

As with the preselection issue (see above), ignoring rearrangements in sex chromosomes could be a simple way to avoid distortions.

Clinical Classification

Especially in leukaemias, cytogenetic findings are very important for treatment and estimation of outcome. A small module for classifying patients according to their findings was integrated into CyDAS.

The user interface of that facility is not very nice. Describing a classification involves generating an SQL query. A visual editor can handle only the simpler queries; in complicated cases the SQL query must be generated manually. Here, some development would be desirable.

Furthermore, the above classification is just the input of a classification system which was generated somewhere else. With the availability of patient data (instead of cytogenetic data only), CyDAS should be extended to generate such classification systems.

Web Service

CyDAS was programmed with Visual Basic .NET and thus requires a Windows platform. Most computers used in the cytogenetic institutes run Windows, so this does not seem to be a big problem. But major databases and programs run on other platforms, especially UNIX variants, on which CyDAS cannot be used.

A solution for this incompatibility is the creation of a Windows web service. Programs running on other platforms could then send a request to the web service and work with the answer received. SOAP over HTTP with XML is a fairly standard way to implement such web services and to communicate with them. We wanted to create such a web service, but funding was cut off before we could start.
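Since the web service was never built, the following Python sketch only shows what a client request could look like. The SOAP 1.1 envelope namespace is the standard one; the service namespace, the operation name AnalyseKaryotype and the element name Karyotype are purely hypothetical.

```python
import xml.etree.ElementTree as ET

SOAP_NS = "http://schemas.xmlsoap.org/soap/envelope/"  # SOAP 1.1 envelope namespace
CYDAS_NS = "http://example.org/cydas"                   # hypothetical service namespace


def build_request(karyotype):
    """Serialise a SOAP 1.1 envelope asking the (hypothetical) service
    to analyse one ISCN karyotype formula."""
    ET.register_namespace("soap", SOAP_NS)
    envelope = ET.Element(f"{{{SOAP_NS}}}Envelope")
    body = ET.SubElement(envelope, f"{{{SOAP_NS}}}Body")
    # Operation and parameter names are assumptions for illustration.
    operation = ET.SubElement(body, f"{{{CYDAS_NS}}}AnalyseKaryotype")
    ET.SubElement(operation, f"{{{CYDAS_NS}}}Karyotype").text = karyotype
    return ET.tostring(envelope, encoding="unicode")


print(build_request("46,XX,t(9;22)(q34;q11)"))
```

A client on any platform would send this text in an HTTP POST to the service and parse the XML answer in the same way.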

Such a web service would be very helpful in larger projects such as the Cancer Genome Project or the Cancer Genome Anatomy Project. Hence, this should be a high priority task.

Other Aspects

Remaining Types of Aberrations

Almost all types of aberrations described in the ISCN manual were integrated into the ISCN analyser dll.

Chromosomal fission was not integrated at all, and quadruplication is missing in derivative chromosomes. These are minor items to be added to the Aberration and Chromosome classes.

Also, the "normal variable chromosome features" chapter of the ISCN manual (chapter 7, pp. 44-45) was omitted entirely. These features include descriptions like "16qh+" or "14pstkstk". As can be seen from the examples, these features use a slightly different grammar than rearrangements do, hence integrating them into the Aberration class would mean vast additions to the upper section of the analyse function. Alternatively, the Karyotype class could try to distinguish between such features and proper aberrations, and then use a specialised class for the features. In the Mitelman database, they were not encountered at all, hence this is deemed a very low priority issue.
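The second approach (separating features from aberrations before the analysis) could start with a simple pattern test, sketched here in Python. The pattern was written to cover the two examples quoted above and a few obvious relatives; it is an assumption, not the full grammar of chapter 7, and the function name is hypothetical.

```python
import re

# Assumed shape: chromosome, then arm or "cen", then one or more of the
# symbols h, s, stk, then an optional + or - sign.
VARIANT_FEATURE = re.compile(
    r"(?:[1-9]|1\d|2[0-2]|X|Y)"   # chromosome 1-22, X or Y
    r"(?:[pq]|cen)"               # arm or centromere
    r"(?:h|stk|s)+"               # feature symbols, possibly repeated
    r"[+-]?"                      # optional size change
)


def is_variant_feature(item):
    """True if a karyotype item looks like a normal variable feature."""
    return VARIANT_FEATURE.fullmatch(item) is not None


for item in ["16qh+", "14pstkstk", "t(9;22)(q34;q11)", "+8"]:
    print(item, is_variant_feature(item))
```

Items recognised this way could be routed to a specialised class, while everything else continues to the existing Aberration code.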

Recombination

Rearranged chromosomes in the germ line give rise to so-called "recombinant chromosomes". Recombinant chromosomes are described in chapter 4.5 (pp. 36-38) of the ISCN manual, but only very poorly. With so little information available on their specialised nomenclature, no attempt was made to integrate the "rec" symbol into the Aberration class.

But the relevance of recombinant chromosomes should not be underestimated. A user interface allowing a user to start with a (typically balanced) rearrangement, display its derivative chromosomes, and then look for the possible different outcomes of meiotic crossing-over could be a very useful tool in non-tumour cytogenetics. When describing the outcomes in the ISCN detailed nomenclature as derivative chromosomes, the "rec" symbol would not be required.

Banding Resolution

Originally, data regarding the chromosomal bands were manually extracted from the ISCN manual (pp. 14-21) - using a ruler and lots of patience. The length of bands was translated with 1mm in the original ideogram corresponding to 10 units (pixels) in the data file.

Because that was lengthy, tedious work close to the limits of the author's endurance, all data (name, length, shading) were entered for the 400 and 550 bphs levels only, and the name only for the 800 bphs level. Thus all data could be analysed for break points, gains, losses, etc., but ideograms could be drawn at low resolution only.

Much later, a file was found on the NCBI web server which contains ideogram data for the 800 bphs level. It is very well hidden at ftp://ftp.ncbi.nih.gov/genomes/H_sapiens/maps/mapview/BUILD.35.1/ideogram.gz. 10 length units of that data were translated into 1 pixel, which is quite consistent with the previous data (chromosome 1 is about 1600 pixels long in both systems).
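The two scalings can be written down as a short Python sketch. The constants come from the text above; the function names are hypothetical, and the input values in the example are simply what the stated 1600 pixels for chromosome 1 imply under each scaling.

```python
PIXELS_PER_MM = 10          # manual extraction: 1 mm in the ISCN ideogram = 10 units (pixels)
NCBI_UNITS_PER_PIXEL = 10   # NCBI ideogram file: 10 length units = 1 pixel


def iscn_mm_to_pixels(millimetres):
    """Convert a length measured with the ruler to data-file pixels."""
    return millimetres * PIXELS_PER_MM


def ncbi_units_to_pixels(units):
    """Convert a length from the NCBI ideogram file to data-file pixels."""
    return units / NCBI_UNITS_PER_PIXEL


# About 1600 pixels for chromosome 1 corresponds to about 160 mm on paper
# and about 16000 NCBI length units.
print(iscn_mm_to_pixels(160), ncbi_units_to_pixels(16000))
```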

But the new data had to be integrated into the old structure: there is only one field for the length of a band, and one field for its shading. Generally, this does not cause major problems. In the ideograms of the ISCN manual, a band at the 400 bphs level which splits into three bands at the 550 bphs level is as long (in pixels) as the sum of the latter three bands; this does not hold for the 800 bphs level. Hence, if a band at the 400 or 550 bphs level splits into three bands at 800 bphs, the summed length of the latter three bands need not equal the previous size. More importantly, there are bands which do not split into further bands at the 800 bphs level, e.g. 1p22.2 at 550 bphs remains 1p22.2 at 800 bphs, and 1p33 at 400 bphs remains 1p33 at 800 bphs. Here, the value for the 800 bphs resolution was entered, and thus some distortion was introduced into the proportions of the chromosomal bands. Also, the shading parameter of the 800 bphs resolution was entered, giving rise to a few bands in the ideograms of 400 or 550 bphs which are shown with their 800 bphs shading.

The styles for drawing satellites and heterochromatic regions were taken from the 400 and 550 bphs ideograms - also when drawing an 800 bphs ideogram.

To resolve these issues, a different data structure would be required, along with a different way of accessing the drawing values in the CyDASGraphics class, especially in CyDASGraphics.drawChromosomeBandsAsMap(). Since a Band object does not provide information on its actual resolution (e.g. 1p33 could be at 800 bphs, 550 bphs, 400 bphs, 2 digits), some further changes could be required to obtain the relevant values.
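One possible shape for such a data structure is sketched below in Python: drawing values are stored per resolution instead of once per band. All class, field and method names are assumptions, and the lengths and shading keys in the example are invented placeholders, not values from the data file.

```python
from dataclasses import dataclass, field


@dataclass
class BandDrawing:
    length: int    # length in pixels at one resolution
    shading: str   # hypothetical shading key


@dataclass
class Band:
    name: str
    # Maps a resolution in bphs (400, 550, 800) to its drawing values.
    drawing: dict = field(default_factory=dict)

    def drawing_at(self, resolution):
        """Return the drawing values for exactly the requested resolution."""
        try:
            return self.drawing[resolution]
        except KeyError:
            raise LookupError(
                f"band {self.name} has no drawing data at {resolution} bphs"
            ) from None


# Placeholder values: 1p33 exists at 400 and at 800 bphs with different sizes.
band = Band("1p33", {400: BandDrawing(30, "light"), 800: BandDrawing(24, "light")})
print(band.drawing_at(400).length, band.drawing_at(800).length)
```

With values keyed by resolution, a band such as 1p33 can keep its 400 bphs proportions in a 400 bphs ideogram even though it also exists unchanged in name at 800 bphs.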

Data on base pair positions of the bands were taken from the data file specified above. There are no functions yet to retrieve them from the data file and make them usable for further purposes (e.g. more modern analysis techniques yielding high resolution data, as mentioned in the ISCN discussion page).

Ideograms for Ring Chromosomes

In ideograms, ring chromosomes are shown linearised. Actually, it is not a big issue to draw them as rings. The total length of a derivative chromosome can easily be calculated (with an extra function in the Chromosome class, e.g. Chromosome.getPaintHeight as Integer, internally similar to (but simpler than) Chromosome.getCentromerePosition()). Next, the band positions would have to be transformed into circular coordinates, and arcs used instead of straight lines.
A little more complicated is the transformation of the image map data, since there are no arc elements for an image map. Hence, the regions could be translated into polygons with vertices every few degrees (maybe write a class for such functions).
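The polygon approximation can be sketched in Python as follows. The function name, its parameters and the choice of starting at the top of the ring and proceeding clockwise are assumptions; the idea is only to map a band's linear start and end positions to angles and to sample the two arcs every few degrees.

```python
import math


def band_to_ring_polygon(start, end, total_length, inner_r, outer_r,
                         cx=0.0, cy=0.0, step_deg=5.0):
    """Approximate the ring segment of one band by a polygon.

    start, end: linear band positions (pixels) along the chromosome.
    total_length: total length of the ring chromosome (pixels).
    Returns a list of (x, y) vertices: outer arc first, then the inner
    arc in reverse, so the outline is closed.
    """
    a0 = 360.0 * start / total_length
    a1 = 360.0 * end / total_length
    steps = max(1, math.ceil((a1 - a0) / step_deg))
    angles = [a0 + (a1 - a0) * i / steps for i in range(steps + 1)]

    def point(radius, degrees):
        rad = math.radians(degrees)
        # 0 degrees is the top of the ring; angles increase clockwise
        # in screen coordinates (y grows downwards).
        return (cx + radius * math.sin(rad), cy - radius * math.cos(rad))

    outer = [point(outer_r, a) for a in angles]
    inner = [point(inner_r, a) for a in reversed(angles)]
    return outer + inner


polygon = band_to_ring_polygon(0, 400, 1600, inner_r=80, outer_r=100)
print(len(polygon), polygon[0])
```

The resulting vertex list can be flattened into the coords attribute of a polygon area in the image map, and the same angles can drive the arc drawing.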

Calculating Derivative Chromosomes

Duplicated Centromeres

Dealing with the duplication (triplication,...) of centromeric regions is a terrible issue because of bugs in the ISCN (see also several chapters in "ISCN Discussion", especially "Differentiating Homologous Chromosomes").

The Chromosome class internally uses the variable "mbHasDuplicatedCentromere" to indicate that a centromere was duplicated, either by a duplication with the duplicate() function or by a triplication with the triplicate() function. When the ISCN detailed formula is rebuilt, each chromosome number will be accepted once only (contradicting the ISCN manual).

But when a new aberration is initialised with the ISCN detailed description of such a chromosome, that information is lost. Here, it would be advisable for CyDAS to work fully with the style of representing homologous chromosomes described in the ISCN Discussion chapter mentioned above. This would require changes throughout the ISCNAnalyser dll.

Band Identification in Derivative Chromosomes

In a derivative chromosome, there can be several bands with the same band designation, e.g. after a duplication. If only the band name is available, there is no way of indicating whether the first, second, third, ... occurrence of the band was meant.

In the karyograms, the user can select where to introduce an aberration, and by clicking on a specific band he or she could identify it more precisely. But up to now, the interface does not differentiate between the occurrences; it just yields a band object.

A solution would be to add something like a "BelongsToFragmentNumber" property to the band object which simply denotes in which fragment the band was found (there cannot be two bands with the same name in one fragment). The rearrangement functions of the Chromosome class could then be adjusted to look for that property first when determining the position of the band.
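A lookup that checks the fragment number first could look like the Python sketch below. Apart from the property name suggested above, everything here (the class layout, the function name, the example band list) is an assumption for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Band:
    name: str
    belongs_to_fragment_number: int


def find_band_index(bands, name, fragment_number=None):
    """Return the position of a band in a derivative chromosome.

    If fragment_number is given, only the band in that fragment matches;
    otherwise the name alone must be unambiguous.
    """
    matches = [
        i for i, band in enumerate(bands)
        if band.name == name
        and (fragment_number is None
             or band.belongs_to_fragment_number == fragment_number)
    ]
    if not matches:
        raise LookupError(f"band {name} not found")
    if len(matches) > 1:
        raise LookupError(f"band {name} is ambiguous; give a fragment number")
    return matches[0]


# Invented example: q21 occurs twice after a duplication.
bands = [Band("q11", 1), Band("q21", 1), Band("q21", 2), Band("q22", 2)]
print(find_band_index(bands, "q21", fragment_number=2))
```

Because a band name can occur only once per fragment, the pair of name and fragment number identifies the clicked band unambiguously.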