Exploratory Data Analysis

Graphical excellence is that which gives to the viewer the greatest number of ideas in the shortest time with the least ink in the smallest space.

-- Edward R. Tufte

Developing and using Know-How

Developing “Know-How” is a process of accumulating experience and insight around a given subject matter. It involves trial and error, experimentation, exploration, and some form of evaluation or review. Over time, the accumulated knowledge becomes available to the expert in a form that can be shared, written down, and conveyed to someone else.

When people convey ideas, they often do so through diagrams, flowcharts, and other visualizations, which are a natural way to describe relationships quickly and efficiently.

Anomalies, tangents, and random thought association

New insights often begin as puzzle pieces that don’t fit. Capturing and pursuing random thoughts, associations, and anomalies can help redefine problems and reorganize ideas in ways that may lead to paradigm shifts.

Exploring patterns

Summary statistics are often mistaken for a real understanding of the data. Let's look at one extreme example.

Pathological pattern-discovery problems are easy to invent: create a dataset as a mixture of points drawn at random from a small number of distributions. Many of these synthetic sets produce striking visualizations when they are either generated directly in two or three dimensions or projected onto two or three dimensions. Such datasets serve as benchmarks for how robust pattern recognition algorithms are to outliers and to different assumptions about the number of clusters and their shape. For real-world data with many dimensions, it is important to visualize several different two- or three-dimensional projections of the data to check whether any such pathologies exist.

These four data sets have identical mean μ and covariance Σ. Clearly, the ability to create informative clusters depends on the choice of algorithm and parameters. From: Richard O. Duda, Peter E. Hart, and David G. Stork, Pattern Classification. Copyright © 2001 by John Wiley & Sons, Inc.
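The point of the caption can be reproduced in a few lines. The following is a minimal Python sketch, not the procedure used to build the Duda, Hart and Stork figure: it draws four hypothetical 2-D data sets (one Gaussian blob, two clusters, four clusters and a noisy ring; every shape, size and parameter here is an illustrative assumption) and then applies an affine whitening-and-recoloring transform so that every set ends up with exactly the same sample mean and covariance. The helper names (`mean_cov`, `match_moments`, `near_center_share` and so on) are hypothetical, and only the standard library is used.

```python
import math
import random

# Hypothetical demo: four 2-D point sets with very different shapes are
# transformed so that all of them share exactly the same sample mean and covariance.

def mean_cov(points):
    """Sample mean (mx, my) and 2x2 covariance (1/n normalization) of 2-D points."""
    n = len(points)
    mx = sum(x for x, _ in points) / n
    my = sum(y for _, y in points) / n
    sxx = sum((x - mx) ** 2 for x, _ in points) / n
    syy = sum((y - my) ** 2 for _, y in points) / n
    sxy = sum((x - mx) * (y - my) for x, y in points) / n
    return (mx, my), ((sxx, sxy), (sxy, syy))

def cholesky2(c):
    """Lower-triangular L with L @ L^T == c, for a 2x2 symmetric positive-definite c."""
    (a, b), (_, d) = c
    l11 = math.sqrt(a)
    l21 = b / l11
    return ((l11, 0.0), (l21, math.sqrt(d - l21 ** 2)))

def match_moments(points, target_mean, target_cov):
    """Affine map that makes the sample mean and covariance equal the targets exactly."""
    mu, cov = mean_cov(points)
    L, M = cholesky2(cov), cholesky2(target_cov)
    out = []
    for x, y in points:
        # Whiten: solve L z = p - mu by forward substitution (z has identity covariance).
        z1 = (x - mu[0]) / L[0][0]
        z2 = (y - mu[1] - L[1][0] * z1) / L[1][1]
        # Re-color: p' = M z + target_mean.
        out.append((M[0][0] * z1 + target_mean[0],
                    M[1][0] * z1 + M[1][1] * z2 + target_mean[1]))
    return out

def gaussian_mixture(components, n, rng):
    """n points from an equal-weight mixture of isotropic Gaussians given as (cx, cy, sigma)."""
    pts = []
    for _ in range(n):
        cx, cy, s = rng.choice(components)
        pts.append((rng.gauss(cx, s), rng.gauss(cy, s)))
    return pts

def ring(n, radius, noise, rng):
    """n points scattered around a circle of the given radius."""
    pts = []
    for _ in range(n):
        t = rng.uniform(0.0, 2.0 * math.pi)
        r = rng.gauss(radius, noise)
        pts.append((r * math.cos(t), r * math.sin(t)))
    return pts

def near_center_share(points, center, r=0.5):
    """Fraction of points within distance r of center (a statistic beyond mean/covariance)."""
    cx, cy = center
    return sum(math.hypot(x - cx, y - cy) < r for x, y in points) / len(points)

rng = random.Random(0)
target_mean = (0.0, 0.0)
target_cov = ((1.0, 0.3), (0.3, 1.0))
datasets = {
    "one blob": gaussian_mixture([(0, 0, 1.0)], 500, rng),
    "two clusters": gaussian_mixture([(-3, 0, 0.5), (3, 0, 0.5)], 500, rng),
    "four clusters": gaussian_mixture(
        [(-3, -3, 0.4), (-3, 3, 0.4), (3, -3, 0.4), (3, 3, 0.4)], 500, rng),
    "ring": ring(500, 4.0, 0.2, rng),
}
matched = {name: match_moments(p, target_mean, target_cov) for name, p in datasets.items()}

for name, pts in matched.items():
    (mx, my), ((sxx, sxy), (_, syy)) = mean_cov(pts)
    print(f"{name:13s} mean=({mx:.3f}, {my:.3f}) "
          f"cov=[[{sxx:.3f}, {sxy:.3f}], [{sxy:.3f}, {syy:.3f}]] "
          f"near-center share={near_center_share(pts, target_mean):.2f}")
```

All four lines should report the same mean and covariance up to rounding, while the share of points near the common mean differs sharply (for the ring and the four-cluster set it is essentially zero). Whitening with a Cholesky factor is just one convenient way to match the first two moments; any affine map that sends the sample covariance to the target would serve equally well.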