Listen to what veterans, experts, and gurus have to say...
"...
because the interface was designed to address problems where you know
the target and you want the database to quickly retrieve the result. If
you don't have an exact description of the target, you're lost with a
database today. This is why data mining is seeing a lot of demand."
(on the problem of data mining tools)
"There are many tools available from companies such as SAS or IBM, but in order to use them properly, you had better be an expert, preferably a Ph.D. in the area of data mining or statistics. If you're not, you just bought a bunch of shelfware. ..."
"For most users, data mining tools offer the wrong interface. You need data mining solutions. If you have a large staff of experts who know data mining very well, data mining tools will do the job. However, this department of experts is now acting as the interface between the tools and the ultimate user."
This position paper takes the point of view of two groups of data
mining users: business persons (BP) and scientific persons (SP). The
author hypothesizes that the two groups differ in focus and style
throughout the data mining process and should therefore be analyzed
separately. BP focus on model evaluation and deployment, while SP focus
on exploring, visualizing, and modeling. BP are keen on using the model
to make predictions that guide business decisions, whereas SP seek to
"look inside the box" to discover new knowledge. As a result, BP often
use predictive models, while SP prefer generative models.
General-purpose data mining software environment. Next generation of
interactive, user-centered data exploration tools.
We think of data mining as the process of identifying valid, novel,
potentially useful, and ultimately understandable
patterns or models in data to make crucial business decisions. “Valid”
means that the patterns hold in general, “novel” that we did not know
the pattern beforehand, and “understandable” means that we can
interpret and comprehend the patterns. Hence, like statistics, data
mining is not only modelling and prediction, nor a product that can be
bought, but a whole problem-solving cycle/process that must be mastered
through team effort.
Heikki Mannila comments on Theoretical Frameworks for Data Mining
(full text: Theoretical Frameworks for Data Mining)
In this article, the author tries to lay out possible theoretical
frameworks for data mining. Simple approaches that adapt or borrow
directly from statistics or machine learning are rejected, mostly
because their theoretical frameworks do not apply directly to data
mining practice. In particular, the author points out several special
aspects of data mining: database integration, simplicity of use, and
understandability of the results.
Four further approaches are proposed:
(1) Probabilistic approach: however, it lacks a way to take the iterative and interactive nature of the data mining process into account.
(2) Data compression approach: view data mining as knowledge discovery, i.e., finding a representation of the information that uses fewer bits.
(3) Microeconomic view: find the decision x that maximizes a utility function f(x).
(4) Inductive databases: "THERE IS NO SUCH THING AS KNOWLEDGE DISCOVERY, IT'S ALL IN THE POWER OF QUERY LANGUAGE"
The author's favorite approach is to combine (3) and (4).
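
To make approach (3) concrete, here is a minimal Python sketch of choosing the decision x that maximizes a utility function f(x). Everything specific in it is an assumption made for illustration and does not come from the article: the helper names best_decision and revenue, the toy customer records, and the choice to define f(x) as a sum of per-customer utilities over a small finite set of candidate decisions.

```python
from typing import Callable, Iterable, Sequence, Tuple

# Hypothetical customer record: (customer_id, highest price the customer accepts).
Customer = Tuple[str, float]


def best_decision(
    decisions: Iterable[float],
    customers: Sequence[Customer],
    utility: Callable[[float, Customer], float],
) -> Tuple[float, float]:
    """Return (x, f(x)) for the candidate decision x with the largest f(x).

    Assumption of this sketch: f(x) is the sum of per-customer utilities.
    """
    scored = [(sum(utility(x, c) for c in customers), x) for x in decisions]
    total, x = max(scored)  # tuples compare by total utility first
    return x, total


def revenue(price: float, customer: Customer) -> float:
    """Toy utility: a customer pays `price` only if it is within their limit."""
    _, willing_to_pay = customer
    return price if price <= willing_to_pay else 0.0


# Illustrative data only; not taken from any real data set.
customers = [("a", 10.0), ("b", 20.0), ("c", 20.0), ("d", 35.0)]
candidate_prices = [10.0, 20.0, 35.0]

price, total = best_decision(candidate_prices, customers, revenue)
print(price, total)  # prints: 20.0 60.0
```

In this toy setting the data matter only through their effect on f(x): a pattern such as "most customers accept a price of 20" is useful exactly because it changes which decision is best.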