@(Pattern Discovery in Data Mining)
The intuition behind setting a hierarchical min_sup: Level-reduced min-support (items at lower levels are expected to have lower support)
Efficient mining: Shared multi-level mining (Use the lowest min-support to pass down the set of candidates)
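A minimal sketch of the two ideas above, restricted to single items to stay short. The taxonomy, the thresholds, and the transactions are all hypothetical and chosen only for illustration. Each transaction is extended with its items' ancestors, one shared counting pass prunes with the lowest threshold, and each surviving item is then checked against the threshold of its own level.

```python
from collections import Counter

# Hypothetical two-level taxonomy: child item -> parent item.
PARENT = {"2% milk": "milk", "skim milk": "milk", "wheat bread": "bread"}

# Hypothetical level-reduced thresholds (relative support): lower level, lower min-support.
MIN_SUP_BY_LEVEL = {1: 0.6, 2: 0.3}


def level(item):
    """Level 2 if the item has a parent in the taxonomy, else level 1."""
    return 2 if item in PARENT else 1


def extend(transaction):
    """Add each item's ancestor so that higher-level items are counted too."""
    items = set(transaction)
    items |= {PARENT[i] for i in transaction if i in PARENT}
    return items


def frequent_items(transactions):
    n = len(transactions)
    counts = Counter(i for t in transactions for i in extend(t))
    lowest = min(MIN_SUP_BY_LEVEL.values())
    # Shared pass: prune once with the lowest min-support of any level.
    candidates = {i: c / n for i, c in counts.items() if c / n >= lowest}
    # Then filter each candidate by the min-support of its own level.
    return {i: s for i, s in candidates.items() if s >= MIN_SUP_BY_LEVEL[level(i)]}


transactions = [
    ["2% milk", "wheat bread"],
    ["2% milk"],
    ["skim milk"],
    ["wheat bread"],
]
result = frequent_items(transactions)
```

In this toy data, "bread" fails the level-1 threshold while "wheat bread" passes the lower level-2 threshold, which is the situation level-reduced min-support is meant to allow.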
Redundancy Filtering at Mining Multi-Level Associations:
* Multi-level association mining may generate many redundant rules
* Redundancy filtering: Some rules may be redundant due to “ancestor” relationships between items
* (Suppose the 2% milk sold is about 1/4 of milk sold in gallons)
1. milk
2. 2% milk
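One common way to make this check concrete is to compare a descendant rule against the value "expected" from its ancestor: if 2% milk is about 1/4 of all milk, a rule on 2% milk is expected to have about 1/4 of the ancestor rule's support and a similar confidence. The sketch below assumes that criterion. The function name, the tolerance `tol`, and the support and confidence figures are hypothetical and are not taken from the notes.

```python
def is_redundant(rule, ancestor_rule, share, tol=0.1):
    """Return True if `rule` is close to what `ancestor_rule` already implies.

    `share` is the fraction of the ancestor item's sales made up by the
    descendant item (e.g. 0.25 for 2% milk vs. milk). `tol` is a hypothetical
    relative tolerance for "close to the expected value".
    """
    expected_sup = ancestor_rule["support"] * share
    sup_close = abs(rule["support"] - expected_sup) <= tol * expected_sup
    conf_close = (
        abs(rule["confidence"] - ancestor_rule["confidence"])
        <= tol * ancestor_rule["confidence"]
    )
    return sup_close and conf_close


# Illustrative numbers only (assumed, not from the notes).
milk_rule = {"support": 0.08, "confidence": 0.70}
two_pct_rule = {"support": 0.02, "confidence": 0.72}
surprising_rule = {"support": 0.05, "confidence": 0.72}

redundant = is_redundant(two_pct_rule, milk_rule, share=0.25)
not_redundant = is_redundant(surprising_rule, milk_rule, share=0.25)
```

With these figures the 2% milk rule is flagged as redundant, while a rule whose support is well above the expected 2% is kept because it carries new information.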
Customized Min-Supports for Different Kinds of Items
* So far, we have used the same min-support threshold for all items or itemsets in each association mining task
* In reality, some items (e.g., diamond, watch, …) are valuable but less frequent
* It is necessary to have customized min-support settings for different kinds of items
* One Method: Use group-based “individualized” min-support
* E.g., {diamond, watch}: 0.05%; {bread, milk}: 5%; …
* How to mine such rules efficiently?
* Existing scalable mining algorithms can be easily extended to cover such cases
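The group-based thresholds above can be sketched as a simple lookup. The group names and the item-to-group mapping are hypothetical. Taking the minimum threshold for an itemset that mixes groups is one possible convention and is an assumption here, since the notes do not say how mixed itemsets are handled.

```python
# Group-level thresholds from the example above: 0.05% and 5%.
GROUP_MIN_SUP = {"luxury": 0.0005, "grocery": 0.05}

# Hypothetical assignment of items to groups.
ITEM_GROUP = {
    "diamond": "luxury",
    "watch": "luxury",
    "bread": "grocery",
    "milk": "grocery",
}


def min_sup_for(itemset):
    """Threshold for an itemset: the lowest threshold among its items' groups.

    Using the minimum for mixed itemsets is an assumption of this sketch.
    """
    return min(GROUP_MIN_SUP[ITEM_GROUP[item]] for item in itemset)


def is_frequent(itemset, support):
    """Check a relative support value against the itemset's own threshold."""
    return support >= min_sup_for(itemset)


rare_ok = is_frequent({"diamond", "watch"}, 0.001)   # 0.1% vs. 0.05% threshold
common_low = is_frequent({"bread", "milk"}, 0.01)    # 1% vs. 5% threshold
```

The same support value can therefore be frequent for one group and infrequent for another, which is the point of customized thresholds.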
Single-dimensional rules (e.g., items are all in “product” dimension)
Multi-dimensional rules (i.e., items in
Rare Pattern vs. Negative Pattern
Defining Negative Correlated Patterns
Support-based definition
Kulczynski measure-based definition
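As these two definitions are commonly stated, the support-based one calls frequent itemsets A and B negatively correlated when sup(A ∪ B) is much smaller than sup(A) × sup(B), and the Kulczynski-based one does so when (P(A|B) + P(B|A)) / 2 falls below a threshold ε. The sketch below assumes those forms. The `ratio` used to approximate "much smaller", the default `epsilon`, and the counts are hypothetical, and the check that A and B are themselves frequent is omitted.

```python
def support_based_negative(n_a, n_b, n_ab, n_total, ratio=0.1):
    """sup(A ∪ B) << sup(A) * sup(B), with '<<' approximated by `ratio` (assumption)."""
    s_a, s_b, s_ab = n_a / n_total, n_b / n_total, n_ab / n_total
    return s_ab < ratio * s_a * s_b


def kulczynski(n_a, n_b, n_ab):
    """Kulc(A, B) = (P(A|B) + P(B|A)) / 2, computed from raw counts."""
    return 0.5 * (n_ab / n_a + n_ab / n_b)


def kulc_based_negative(n_a, n_b, n_ab, epsilon=0.05):
    """Negatively correlated if the Kulczynski measure is below `epsilon`."""
    return kulczynski(n_a, n_b, n_ab) < epsilon


# A and B each occur in 100 transactions and co-occur in only 1 (illustrative counts).
small_db = support_based_negative(100, 100, 1, n_total=200)
large_db = support_based_negative(100, 100, 1, n_total=100_000)
kulc = kulczynski(100, 100, 1)
kulc_negative = kulc_based_negative(100, 100, 1)
```

The support-based verdict flips when only the total number of transactions changes, because transactions containing neither A nor B shrink every support. The Kulczynski measure does not use the total count, so its verdict stays the same in both databases.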
Exercise
Given a table of patterns and their supports:
Why mine compressed patterns? Because mining often produces too many scattered patterns, many of which are not very meaningful.
We can see that P1 and P2 are similar in both itemsets and support, and that P1 and P5 also have similar itemsets. But how can we compress such similar patterns?
We can compare the candidate outputs:
* Closed patterns
* P1, P2, P3, P4, P5 (no two patterns have identical support)
* Places too much emphasis on support
* There is no compression
* Max-patterns
* P3: information loss
* Desired output (a good balance):
* P2, P3, P4
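The comparison above can be reproduced on a small example. The itemsets and supports below are hypothetical stand-ins, not the exercise's table, and the pattern list is assumed to be the complete set of frequent patterns. A pattern is closed if no proper superset has the same support, and it is a max-pattern if no proper superset is frequent at all.

```python
# Hypothetical patterns: name -> (itemset, support count).
PATTERNS = {
    "P1": (frozenset("abcd"), 205),
    "P2": (frozenset("abcde"), 204),
    "P3": (frozenset("abcdef"), 101),
    "P4": (frozenset("bcdef"), 161),
    "P5": (frozenset("bcdf"), 162),
}


def closed_patterns(patterns):
    """Keep patterns that have no proper superset with the same support."""
    return [
        name
        for name, (items, sup) in patterns.items()
        if not any(
            items < other and sup == other_sup
            for other, other_sup in patterns.values()
        )
    ]


def max_patterns(patterns):
    """Keep patterns that have no proper superset in the frequent set."""
    return [
        name
        for name, (items, _) in patterns.items()
        if not any(items < other for other, _ in patterns.values())
    ]


closed = closed_patterns(PATTERNS)
maximal = max_patterns(PATTERNS)
```

Because every support differs slightly, all five patterns stay closed, so nothing is compressed. Only the largest itemset survives as a max-pattern, which discards the supports of all the others.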
So we can define a compression method based on a pattern distance measure.
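One widely used pattern distance compares the sets of transactions that support each pattern: Dist(P1, P2) = 1 − |T(P1) ∩ T(P2)| / |T(P1) ∪ T(P2)|, where T(P) is the set of transaction IDs containing P. The sketch below assumes that definition. The function name and the transaction IDs are hypothetical.

```python
def pattern_distance(tids_p1, tids_p2):
    """Jaccard-style distance between the supporting transaction sets of two patterns.

    Returns 0.0 when both patterns occur in exactly the same transactions
    and 1.0 when they share none.
    """
    t1, t2 = set(tids_p1), set(tids_p2)
    union = t1 | t2
    if not union:
        # Neither pattern occurs anywhere; treat them as identical (a convention).
        return 0.0
    return 1 - len(t1 & t2) / len(union)


# Hypothetical transaction IDs supporting two patterns.
d_close = pattern_distance({1, 2, 3, 4}, {1, 2, 3})
d_far = pattern_distance({1, 2}, {3, 4})
```

Patterns whose distance falls below a chosen threshold can then be grouped, with one representative pattern kept for each group.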
Redundancy-Aware Top-k Patterns
Original post: http://blog.csdn.net/rk2900/article/details/43878111