Frequently Asked Questions
1.) What kind of data mining "engines" are part of your approach? Do you allow the installation of alternative approaches?

We currently have two data mining tools available from NCR: Knowledge Discovery Workbench (KDW) and Management Discovery Tool (MDT). KDW supports the complete process of developing analytic models and has algorithms for data pre-processing, data transformation, data visualization, model creation, model testing and output generation. The "core" algorithms for model creation in KDW 2.0 are as follows: C5.0 for decision-tree induction, neural networks with backpropagation for classification, Kohonen neural networks for clustering, two algorithms for rule induction, Apriori for affinity analysis, and linear regression. The architecture of KDW makes it very easy to add additional algorithms. MDT helps business users understand the "state of their business". MDT has intelligent agents that automate the discovery process for the business user. The discovery tasks automated in MDT are Change Analysis, Trend Analysis, Comparison Analysis and Summarization Analysis. In addition to MDT and KDW, NCR has a strong partnership with SAS, and we can bring to bear all of the algorithms available in SAS in a data warehouse environment.

2.) Can the data mining approaches be directly compared with more traditional statistical approaches?

Yes. The comparison can happen at two levels: at the theoretical level, where the mathematics in the "new" algorithms can be compared to the mathematics in the "traditional statistical" algorithms, and at the practical level, by comparing the performance of the models generated by the two kinds of algorithm. It should be noted that there are a large number of tasks for which the traditional statistical approaches are better than the "new" algorithms, and that most "new" algorithms rely at their core on one or more statistical approaches.

3.) Can the number of dependent variables be chosen? Can I choose categorical as well as (or mixed with) numerical variables? Any limitations on the number of variables?

Yes to the first three questions, and theoretically there is no limit to the number of variables you choose. However, a large number of variables can make the model harder to understand and disseminate. All choices should also be determined by a disciplined analysis of the business problem. The choices you make will determine which algorithms you can use in the model creation step.

4.) Does your system include the handling of periodicity?

KDW includes functionality for transforming time-series features so that they can be used for model creation. KDW also includes functionality for dealing with missing values in a time series. The Trend Analysis in MDT deals specifically with time-series features.

5.) Do you deal with, or report, "confidence values" on results?

KDW provides confidence values. These confidence values should be interpreted with the algorithm used during the model creation step in mind.

6.) What kinds of outlier management and data cleansing capabilities do you provide?

KDW has extensive capabilities here. KDW identifies missing values, provides reports on these missing values, and lets you set up business rules to deal with them. Some of the "new" algorithms (for example, decision-tree induction) are also much better at dealing with outliers and missing values without any data pre-processing. It should also be noted that a large portion of the data cleansing tasks should be done while building the data warehouse. In our experience the data cleansing task is never eliminated, but it is reduced significantly when developing models in a data warehouse environment.

7.) How do you deal with auto-collinearity?

Collinearity is an issue when one is using regression and, to some extent, when one is using neural networks. It is usually not an issue when using tree or rule induction, at least from a standard-error perspective. In KDW you can use visualizations to detect variables that are linearly dependent. You can then derive a new variable, or choose one of the linearly dependent variables, as input to the model creation step. In MDT you can represent linear dependencies and correlations as Measure Relationships. These are then used by all of the analyses.

8.) How do I deal with "prior knowledge" using your approach?

MDT provides you with a framework for explicitly representing your "prior knowledge". This framework consists of Dimensions, Attributes, Segments, Measures (including formulae to compute measures) and Measure Relationships (directional relationships between measures). This prior knowledge is used by MDT to produce meaningful analyses for the business user. You can iteratively refine the knowledge represented in MDT. In KDW you can represent "prior knowledge" as derived features.

9.) Do you provide ways to visualize very complex results?

Yes. KDW provides a suite of "active" visualizations, including the traditional histograms, scatter plots, etc. These visualizations are "active" because you can interact with them to select data directly from the visualization. MDT provides a visualization metaphor called an InfoFrame. An InfoFrame combines natural language text, graphs and tables to provide meaningful analysis for a business user. MDT has the intelligence to select the most appropriate graph for an analysis.
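
As a rough illustration of the approach described in the answer to question 7 (detect linearly dependent variables, then keep one of them as model input), the following is a minimal sketch in plain Python. It is not KDW or MDT code: the function names, the 0.95 threshold and the sample columns are all assumptions made for illustration, and it swaps a numeric check (pairwise Pearson correlation) in for the visual inspection mentioned in the answer. Pairwise correlation only catches dependence between two variables at a time.

```python
from itertools import combinations
from math import sqrt


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length numeric lists."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        return 0.0  # correlation is undefined for a constant column; treat as 0
    return cov / sqrt(var_x * var_y)


def correlated_pairs(columns, threshold=0.95):
    """List (name_a, name_b, r) for every pair with |r| at or above threshold."""
    pairs = []
    for a, b in combinations(columns, 2):
        r = pearson(columns[a], columns[b])
        if abs(r) >= threshold:
            pairs.append((a, b, r))
    return pairs


def drop_redundant(columns, threshold=0.95):
    """Keep a column only if it is not strongly correlated with one already kept."""
    kept = []
    for name in columns:
        if all(abs(pearson(columns[name], columns[k])) < threshold for k in kept):
            kept.append(name)
    return kept


# Hypothetical sample data: price_cents is an exact multiple of price_usd.
data = {
    "price_usd": [10.0, 12.0, 15.0, 11.0, 18.0],
    "price_cents": [1000.0, 1200.0, 1500.0, 1100.0, 1800.0],
    "units_sold": [30.0, 22.0, 25.0, 40.0, 21.0],
}

flagged = correlated_pairs(data)
selected = drop_redundant(data)
```

Here `flagged` reports the `price_usd` / `price_cents` pair and `selected` keeps `price_usd` and `units_sold`. Which member of a correlated pair to keep (or whether to derive a new combined variable instead) is a modelling decision; this sketch simply keeps whichever column appears first.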