UBC Theses and Dissertations
Supporting user interaction for the exploratory mining of constrained frequent sets
Mah, Teresa
Data mining is known to some as "knowledge discovery from large databases". It is the technology of searching through large quantities of data for previously unknown patterns of information. This is useful to entrepreneurs and researchers of all kinds. Retailers can apply data mining to find customer shopping patterns. On a grander scale, meteorologists can use the technology to identify telltale signs of extreme weather conditions, such as tornadoes or hurricanes. Unfortunately, despite its usefulness, data mining has not yet broken out of its shell. There are two main reasons for this. The first is that the mining process is still slow, even with all the research done to optimize the algorithms. The second is that little work has been done on improving the user interaction aspect of the technology. Most of the systems created so far have resembled a black box: input is entered at one end, and output is received at the other. There is no concept of human-centred exploration or control of the process, and no mechanism to specify focus in the database. The work described here provides a glimpse of a new exploratory mining framework that encourages exploration and control. In addition, this new framework is incorporated into the first fully functional prototype capable of constrained frequent set mining. A user of the prototype can specify focus by providing constraints on the data to be mined, and can view the frequent sets satisfying these constraints before relationships are found. The prototype also allows users to sort or format frequent set output, and to choose only the interesting sets on which to find relationships. Furthermore, frequent sets in our system can be mined between sets with different or similar domains, and users can choose other notions of relationship besides confidence.
Combining this new exploratory mining paradigm with the faster, more efficient CAP algorithm, we have what we believe is the first in a new generation of fast and human-centred data mining systems.
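The two-phase workflow described above (first view the frequent sets that satisfy a user constraint, then compute relationships on the sets of interest) can be illustrated with a minimal sketch. This is not the prototype or the CAP algorithm: it enumerates candidates by brute force and checks the constraint afterwards, whereas a constraint-pushing algorithm would prune the search. The transactions, the `price` table, the `cheap_only` constraint, and all function names are hypothetical and exist only for illustration.

```python
from itertools import combinations

# Hypothetical toy data: each transaction is a set of item names.
transactions = [
    {"bread", "milk"},
    {"bread", "butter", "milk"},
    {"bread", "butter"},
    {"milk", "butter"},
    {"bread", "milk", "butter"},
]
# Hypothetical item prices, used only by the example constraint below.
price = {"bread": 2, "milk": 3, "butter": 4}


def support(itemset, transactions):
    """Fraction of transactions that contain every item in itemset."""
    return sum(itemset <= t for t in transactions) / len(transactions)


def constrained_frequent_sets(transactions, min_support, constraint):
    """Phase 1: return {itemset: support} for frequent sets meeting the constraint.

    Brute-force enumeration for clarity; the constraint is only checked,
    not pushed into the search as a dedicated algorithm would do.
    """
    items = sorted(set().union(*transactions))
    result = {}
    for size in range(1, len(items) + 1):
        for combo in combinations(items, size):
            candidate = frozenset(combo)
            if not constraint(candidate):
                continue
            s = support(candidate, transactions)
            if s >= min_support:
                result[candidate] = s
    return result


def confidence(antecedent, consequent, transactions):
    """Phase 2: confidence of the relationship antecedent => consequent."""
    both = support(antecedent | consequent, transactions)
    return both / support(antecedent, transactions)


def cheap_only(itemset):
    """Example constraint (an assumption): every item costs at most 3."""
    return max(price[item] for item in itemset) <= 3


# Phase 1: the user inspects the constrained frequent sets first.
frequent = constrained_frequent_sets(transactions, 0.5, cheap_only)
for itemset, s in sorted(frequent.items(), key=lambda kv: -kv[1]):
    print(sorted(itemset), s)

# Phase 2: the user picks sets of interest and asks for a relationship measure.
print(confidence(frozenset({"bread"}), frozenset({"milk"}), transactions))
```

Separating the two phases is what lets a user sort, filter, and select frequent sets before any relationship is computed; a measure other than confidence could be substituted for the `confidence` function without touching the first phase.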