CSCE 566/CMPS 499 Syllabus by Chapter

Spring 2018

Chapter 1: Top 10 Algorithms in Data Mining

Chapter 2: Decision Tree Construction

Chapter 3: Association Analysis

Chapter 4: Clustering

Chapter 5: Rule Induction, kNN and GA (3 Lectures)

Chapter 6: Bayesian Methods

March 15: Mid-Term Exam

Chapter 7: Dealing with Noise and Real-Valued Attributes (3 Lectures)

Chapter 8: Data Mining from Very Large Databases

Chapter 9: Clustering with EM

CSCE 566 Students: Paper Presentations and Essay Writing

Chapter 10: Web Search with PageRank

Chapter 11: Support Vector Machines

  1. Linear classifiers: the optimal separating hyperplane
  2. Nonlinear decision boundaries: map data vectors x_i into a higher-dimensional (possibly infinite-dimensional) feature space
  3. Kernels: compute feature-space inner products directly from the original attributes, without constructing the mapping explicitly
  4. SVMs natively handle only binary classification; multi-class problems are decomposed into binary ones, e.g., one-vs-rest or one-vs-one (see the sketch after this list)
  5. Run SVM with Weka: www.stat.nctu.edu.tw/misg/WekaInC.ppt
  6. SVMs: pros and cons
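
A minimal kernel-SVM sketch in Python, assuming scikit-learn is available; the toy dataset and variable names are illustrative and not part of the course materials, which use Weka for the hands-on demo (link above).

    # Minimal kernel-SVM sketch (assumes scikit-learn; illustrative only).
    from sklearn.datasets import make_moons
    from sklearn.model_selection import train_test_split
    from sklearn.multiclass import OneVsRestClassifier
    from sklearn.svm import SVC

    # Toy nonlinear binary problem: two interleaving half-moons.
    X, y = make_moons(n_samples=200, noise=0.2, random_state=0)
    X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

    # The RBF kernel implicitly maps each x_i into a high-dimensional feature space,
    # so a linear separating hyperplane there yields a nonlinear boundary here.
    clf = SVC(kernel="rbf", C=1.0, gamma="scale")
    clf.fit(X_train, y_train)
    print("test accuracy:", clf.score(X_test, y_test))

    # SVMs are natively binary; multi-class data is handled by decomposition,
    # for example one-vs-rest:
    multiclass_clf = OneVsRestClassifier(SVC(kernel="rbf"))

In Weka (linked above), the corresponding classifier is SMO.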

Chapter 12: Mining Frequent Patterns with FP-Tree and P-Tree

  1. Frequent Pattern Tree with an example (see the construction sketch after these steps)
    1. 1st scan:
      • identify the set of frequent items using the minimum support threshold (minsup)
      • sort them in descending order of frequency
    2. 2nd scan: store the set of frequent items of each transaction in a tree
      1. Scan the DB a second time
      2. Insert each transaction's ordered frequent items as a path
      3. Share the common prefix path until a differing item appears
      4. Branch there and create a new sub-path
      Then,
      1. Build an item header table (to facilitate tree traversal)
      2. Link all nodes with the same item name
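
A minimal Python sketch of the two-scan construction above. The class name FPNode, the function build_fp_tree, and the toy transactions are illustrative choices, not taken from the course materials; the header table here keeps a plain list of nodes per item rather than chained node links.

    # Two-scan FP-tree construction sketch (FPNode and build_fp_tree are illustrative names).
    from collections import Counter, defaultdict

    class FPNode:
        def __init__(self, item, parent=None):
            self.item, self.parent = item, parent
            self.count = 0
            self.children = {}              # item -> child FPNode

    def build_fp_tree(transactions, minsup):
        # 1st scan: count items, keep those meeting minsup,
        # and fix a descending-frequency order.
        counts = Counter(item for t in transactions for item in t)
        frequent = {i: c for i, c in counts.items() if c >= minsup}
        rank = {i: r for r, i in enumerate(sorted(frequent, key=frequent.get, reverse=True))}

        root = FPNode(None)
        header = defaultdict(list)          # item header table: item -> its nodes
        # 2nd scan: insert each transaction's ordered frequent items as a path,
        # sharing the common prefix and branching where the items differ.
        for t in transactions:
            node = root
            for item in sorted((i for i in t if i in frequent), key=rank.get):
                child = node.children.get(item)
                if child is None:
                    child = FPNode(item, parent=node)
                    node.children[item] = child
                    header[item].append(child)   # link nodes sharing an item name
                child.count += 1
                node = child
        return root, header

    transactions = [["f", "a", "c", "d", "m", "p"], ["a", "b", "c", "f", "m"],
                    ["b", "f"], ["b", "c", "p"], ["a", "f", "c", "m", "p"]]
    root, header = build_fp_tree(transactions, minsup=3)
    print({item: sum(n.count for n in nodes) for item, nodes in header.items()})

FP-Growth (item 3 below) then mines this tree recursively, using the header table to collect each item's conditional pattern base.
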
  2. FP-Tree Properties
  3. FP-Growth: Mining Frequent Patterns Using FP-Tree
  4. Association Analysis with One Scan of DB (see the P-tree sketch after item 6 below)
    1. Problems with FP-tree
      • Still requires scanning the database twice
      • If the support threshold is reduced, the whole algorithm must be rerun
      • Cannot handle newly arriving data incrementally
    2. Step 1: Construct a P-tree
      1. Generate a P-tree by inserting transactions sorted by any preferred fixed order (e.g., lexicographic)
      2. Record the actual frequency of every item into the item frequency list L
      3. Sort L according to item frequency
    3. Step 2: Restructure the P-tree
      1. Generate a path for each leaf
      2. Sort each path according to the updated item frequency list L and insert it into a new P-tree
  5. FP-tree Generation from P-tree
    1. Get the minimum support threshold (minsup)
    2. Compute the frequent item list
    3. Traverse the P-tree and remove infrequent items together with their subtrees
  6. Update P-tree With New Data
    1. Insert new transactions into the P-tree according to the item frequency list, updating the list at the same time
    2. A new P-tree can be restructured according to the updated item frequency list
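
Items 4-6 can be sketched along the same lines. The following is a minimal, self-contained Python illustration, not the published P-tree algorithm: PNode, build_p_tree, restructure, and prune are made-up names, lexicographic order stands in for the fixed insertion preference, each node records how many transactions end at it so restructuring can replay them, and minsup is assumed to be an absolute count.

    # One-scan P-tree sketch (PNode, build_p_tree, restructure, and prune are illustrative names).
    from collections import Counter

    class PNode:
        def __init__(self, item, parent=None):
            self.item, self.parent = item, parent
            self.count = 0                  # transactions passing through this node
            self.end = 0                    # transactions ending exactly here
            self.children = {}

    def insert(root, items, n=1):
        node = root
        for item in items:
            node = node.children.setdefault(item, PNode(item, node))
            node.count += n
        node.end += n

    def build_p_tree(transactions):
        # Step 1: single scan; insert transactions in a fixed (lexicographic) order
        # and record actual item frequencies in the list L.
        root, L = PNode(None), Counter()
        for t in transactions:
            L.update(t)
            insert(root, sorted(t))
        return root, L

    def paths(node, prefix=()):
        # Replay every stored transaction (as an item tuple) with its multiplicity.
        for child in node.children.values():
            p = prefix + (child.item,)
            if child.end:
                yield p, child.end
            yield from paths(child, p)

    def restructure(root, L):
        # Step 2: re-insert every stored transaction sorted by descending frequency in L.
        new_root = PNode(None)
        for p, n in paths(root):
            insert(new_root, sorted(p, key=lambda i: (-L[i], i)), n)
        return new_root

    def prune(node, L, minsup):
        # FP-tree generation (item 5): remove infrequent items with their subtrees;
        # safe after restructuring, since a child is never more frequent than its parent.
        node.children = {i: c for i, c in node.children.items() if L[i] >= minsup}
        for child in node.children.values():
            prune(child, L, minsup)
        return node

    transactions = [["f", "a", "c", "d", "m", "p"], ["a", "b", "c", "f", "m"],
                    ["b", "f"], ["b", "c", "p"], ["a", "f", "c", "m", "p"]]
    ptree, L = build_p_tree(transactions)
    fptree = prune(restructure(ptree, L), L, minsup=3)   # drops the infrequent item "d"

    # New data (item 6): insert in the same fixed order while updating L;
    # restructure again with the updated L whenever a frequency-ordered tree is needed.
    for t in [["a", "c", "f"], ["b", "p"]]:
        L.update(t)
        insert(ptree, sorted(t))

Because the P-tree stores transactions in a support-independent order, lowering minsup or adding new transactions only requires pruning or restructuring again, not another database scan, which addresses the problems listed under item 4.
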
  • Course Requirements for Final Exam
    Please e-mail queries and comments to xwu@louisiana.edu.