Class MiningInputData

The MiningInputData class defines a flat data structure that serves as the input for data mining analyses. It consists of three items: a list of case names, a list of event names, and a boolean matrix indicating which case contains which events. The class contains almost no business logic.

Overview

MiningInputData
New()
New(NumberOfCases: Integer, NumberOfEvents: Integer, Description: String)
CaseCount() As Integer {read-only}
CasesEvents() As Boolean(,)
CaseNames() As String()
Description() As String
EventCount() As Integer {read-only}
EventNames() As String()
addCase(EventList: String, CaseName: String): void
addCase(EventsInCase: String(), CaseName: String): void
checkConsistence(): void

Programming Language

  • Microsoft Visual Basic .Net.

Availability

Constructors

Public Sub New()

Instantiates a new MiningInputData object.

Its internal arrays are empty. Cases are then added with the addCase functions. Because the arrays must then be enlarged for every added case, which is very time-consuming, the other constructor should be preferred.

Alternatively, the lists with case and event names and the case-event-matrix can be set using the respective properties.
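
A minimal sketch of the incremental approach, assuming the assembly that contains MiningInputData is referenced and its namespace imported. The module name, basket names, and article names are invented for illustration:

```vbnet
Imports System

Module DefaultConstructorSketch
    Sub Main()
        ' Assumption: MiningInputData is available through a project reference.
        Dim oMiningData As MiningInputData = New MiningInputData()
        oMiningData.Description = "WebShop"

        ' Each call may enlarge the internal arrays, so this style
        ' is best kept for small data sets.
        oMiningData.addCase("Bread,Butter,Milk", "Basket 1")
        oMiningData.addCase("Bread,Milk", "Basket 2")
        oMiningData.addCase("Butter", "Basket 3")
    End Sub
End Module
```

The event lists are written without spaces after the commas, since this documentation does not state whether surrounding whitespace is trimmed.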

Public Sub New(ByVal NumberOfCases As Integer, ByVal NumberOfEvents As Integer, ByVal Description As String)

Instantiates a new MiningInputData object for the given number of cases and events with the given description of the mining data.

Parameters

  • NumberOfCases: the expected number of cases.
  • NumberOfEvents: the expected number of events.
  • Description: a free text field for the description of the mining data.

Remarks

  • The internal arrays are sized to hold the given number of cases and events.
  • Cases can be added by the addCase functions or by directly setting the lists with case and event names and the case-event-matrix using the respective properties.
  • Cases without events and events without a case are removed later by the miner.

Examples

  • To prepare the MiningInputData for your WebShop's 1234 shopping baskets and 200 articles, do:

  • Dim nShoppingBaskets As Integer = 1234
    Dim nArticles As Integer = 200
    Dim strDescription As String = "WebShop"
    Dim oMiningData As MiningInputData = New MiningInputData(nShoppingBaskets, nArticles, strDescription)

Interfaces

This class does not implement an interface.

Enumerations

The class does not provide enumerations.

Properties

Public ReadOnly Property CaseCount() As Integer

Returns the number of cases.

Property Value

  • the number of cases.

Remarks

  • This property looks at the case names array and returns its capacity. It does not check whether the cases actually exist (they may still be empty placeholders for cases to be added later).
  • The value cannot be set directly.

Public Property CasesEvents() As Boolean(,)

Gets or sets the case-event-matrix.

Property Value

  • the case-event-matrix (a two-dimensional boolean array).

Remarks

  • The case-event-matrix stores which cases contain which events. The first dimension defines the case, the second dimension the event, i.e. "CasesEvents(CaseNumber, EventNumber)" stores whether case #CaseNumber contains event #EventNumber. The index is 1-based.
  • Only the presence or absence of an event is recorded, not its quantity. If the quantity matters, the events themselves have to be redefined: one occurrence of event A is treated as an event distinct from two occurrences of event A, and so on. This increases the number of events, but each of them can then be represented in the boolean matrix.
  • Nothing enforces that the case names and event names match this matrix. Differing amounts of data may lead to unexpected results. Consistency should therefore be checked with the checkConsistence function.
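
The redefinition of events by quantity can be sketched as follows. This is only an illustration: the helper ToEventNames, the " x" naming scheme, and the sample articles are assumptions and not part of the Mining dll.

```vbnet
Imports System
Imports System.Collections.Generic

Module QuantityEncodingSketch
    ' Hypothetical helper: turns (article, quantity) pairs into event names,
    ' so that "Apple x1" and "Apple x2" become two distinct events.
    Function ToEventNames(ByVal quantities As Dictionary(Of String, Integer)) As String()
        Dim events As New List(Of String)
        For Each entry As KeyValuePair(Of String, Integer) In quantities
            ' Articles with a quantity of zero are absent from the case.
            If entry.Value > 0 Then
                events.Add(entry.Key & " x" & entry.Value.ToString())
            End If
        Next
        Return events.ToArray()
    End Function

    Sub Main()
        Dim basket As New Dictionary(Of String, Integer)
        basket.Add("Apple", 2)
        basket.Add("Bread", 1)

        ' eventsInCase now holds the names "Apple x2" and "Bread x1".
        Dim eventsInCase As String() = ToEventNames(basket)
        Console.WriteLine(String.Join(", ", eventsInCase))
    End Sub
End Module
```

The resulting array could then be handed to the array version of addCase. The generated names contain no commas, so they would also survive being joined into a comma separated list.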

Public Property CaseNames() As String()

Gets or sets the case names.

Property Value

  • the case names (a one-dimensional string array).

Remarks

  • The CaseNames array contains the names of the cases. Case names should be unique, but no safeguards are taken to enforce unique names. Non-unique names may lead to unexpected results.
  • The index is 1-based.
  • Nothing enforces that the case names match the case-event-matrix. Differing amounts of data may lead to unexpected results. Consistency should therefore be checked with the checkConsistence function.

Public Property Description() As String

Gets or sets a short description of the data.

Property Value

  • a description of the data.

Remarks

  • This property is not used by the mining functions. It can be used to provide information to the end user, for example on reports.

Public ReadOnly Property EventCount() As Integer

Returns the number of events.

Property Value

  • the number of events.

Remarks

  • This property looks at the event names array and returns its capacity. It does not check whether the events actually exist (they may still be empty placeholders for events to be added later).
  • The value cannot be set directly.

Public Property EventNames() As String()

Gets or sets the event names.

Property Value

  • the event names (a one-dimensional string array).

Remarks

  • The EventNames array contains the names of the events. Event names should be unique, but no safeguards are taken to enforce unique names. Non-unique names may lead to unexpected results.
  • The index is 1-based.
  • Nothing enforces that the event names match the case-event-matrix. Differing amounts of data may lead to unexpected results. Consistency should therefore be checked with the checkConsistence function.

Methods

Public Sub addCase(ByVal EventList As String, ByVal CaseName As String)

Adds a new case to the mining data.

Parameters

  • EventList: a comma-separated list of the events which were encountered in this case.
  • CaseName: the name (identifier) of the case.

Remarks

  • The function transforms the EventList parameter into an array and passes it on to the other addCase function.

Public Sub addCase(ByRef EventsInCase As String(), ByVal CaseName As String)

Adds a new case to the mining data.

Parameters

  • EventsInCase: a reference to an array with the names of the events which were encountered in this case.
  • CaseName: the name (identifier) of the case.

Remarks

  • For each event name in EventsInCase, the respective position in the EventNames array is determined, and the value at the corresponding position in the case-event-matrix is then set to true.
  • The capacity of the arrays may be re-adjusted if more cases than expected are added or when new events are encountered. This is a very time-consuming process and should therefore be avoided.
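
A minimal sketch of adding cases from arrays to a pre-sized object, assuming the assembly that contains MiningInputData is referenced and its namespace imported. The sizes and names are invented for illustration; whether this sizing avoids every re-adjustment depends on the implementation and is assumed here, not confirmed.

```vbnet
Imports System

Module AddCaseArraySketch
    Sub Main()
        ' Sized for the 3 cases and 3 distinct events used below.
        Dim oMiningData As MiningInputData = New MiningInputData(3, 3, "WebShop")

        ' The parameter is declared ByRef, so the arrays are held in variables.
        Dim basket1 As String() = {"Bread", "Butter", "Milk"}
        Dim basket2 As String() = {"Bread", "Milk"}
        Dim basket3 As String() = {"Butter"}

        oMiningData.addCase(basket1, "Basket 1")
        oMiningData.addCase(basket2, "Basket 2")
        oMiningData.addCase(basket3, "Basket 3")
    End Sub
End Module
```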

Public Sub checkConsistence()

Checks whether the sizes of the EventNames array, the CaseNames array, and the case-event-matrix are consistent with each other.

Exception

  • Exception, if an inconsistency in the amount of data was encountered.

Remarks

  • There is no return value. Instead, in case of problems, the function throws an exception, since such data should not be used for further analysis.
  • No check for duplicate case names or duplicate event names exists yet, but may be introduced later into this function.
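
Because problems are reported by an exception and not by a return value, a caller would typically wrap the check in a Try block. The following is a sketch under the assumption that the assembly containing MiningInputData is referenced; the helper IsUsable is hypothetical and not part of the class.

```vbnet
Imports System

Module ConsistencyCheckSketch
    ' Hypothetical helper: returns True if the data passed the check,
    ' False if checkConsistence threw an exception.
    Function IsUsable(ByVal oMiningData As MiningInputData) As Boolean
        Try
            oMiningData.checkConsistence()
            Return True
        Catch ex As Exception
            ' Inconsistent data should not be used for further analysis.
            Console.WriteLine("Mining data rejected: " & ex.Message)
            Return False
        End Try
    End Function
End Module
```

Catching the general Exception type follows the description above, which does not name a more specific exception class.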

Interaction with other classes

Classes using MiningInputData

The MiningInputData class is the primary entry point into the Mining dll. It is passed as a parameter into the constructor of the Miner object.

A MiningInputData object should be the output of some appropriate functions of the application into which the Mining class is embedded.

Classes used by MiningInputData

The MiningInputData class uses only basic data types, to ensure wide usability.