DATA MINING
CSE5DMI
2018
Credit points: 15
Subject outline
Data mining refers to a range of techniques for uncovering hidden information in databases. The data to be mined may be complex, including big data, multimedia, spatial and temporal data, and biological and health data. Data mining has evolved from several fields, including databases, artificial intelligence, algorithms, information retrieval and statistics. This subject is designed to give graduate students a solid understanding of data mining concepts and tools. It covers data preprocessing, data classification, association rule mining, and clustering algorithms and techniques, and it also addresses domain applications of these techniques.
School: School of Engineering and Mathematical Sciences
Credit points: 15
Subject Co-ordinator: Phoebe Chen
Available to Study Abroad Students: Yes
Subject year level: Year Level 5 - Masters
Exchange Students: Yes
Subject particulars
Subject rules
Prerequisites: CSE1OOF or CSE4OOF or CSE5CES, or equivalent (discuss with the subject coordinator)
Co-requisites: N/A
Incompatible subjects: CSE4DMI
Equivalent subjects: N/A
Special conditions: N/A
Learning resources
Readings
Resource Type | Title | Resource Requirement | Author and Year | Publisher |
---|---|---|---|---|
Readings | Introduction to Data Mining | Recommended | Tan, P-N, Steinbach, M & Kumar, V; 2006 | Pearson Addison-Wesley |
Readings | Data Mining: Concepts and Techniques | Recommended | Han, J, Kamber, M & Pei, J; 2011 | Morgan Kaufmann |
Graduate capabilities & intended learning outcomes
01. Explain the technologies and applications of data mining (DM) techniques.
- Activities:
- Lecture 1 introduces data mining techniques and their applications.
02. Identify various data types and perform critical and effective data preprocessing tasks.
- Activities:
- Students will learn about different types of data and their related issues, such as sampling, similarity metrics, feature selection and dimensionality. They will also learn and practise effective data preprocessing techniques in Lecture 2, Tutorial 3 and the assignments.
03. Explain and evaluate major classification methods.
- Activities:
- Lectures 3, 4, 5 and 6 cover a wide range of classification approaches, such as decision trees, rule-based classification, nearest-neighbour classification, Bayesian classification, artificial neural networks (ANNs) and support vector machines (SVMs). Related issues, including underfitting and overfitting, will also be discussed. Students will apply various classification approaches to different datasets in Tutorials 4, 5 and 6 and in Assignment 1.
04. Explain and evaluate association rule mining approaches.
- Activities:
- Lectures 7 and 8 introduce association analysis for transaction data, including frequent itemsets, association rule mining, rule generation and evaluation, and the Apriori algorithm. Students will practise association rule mining in Tutorials 7 and 8 and in Assignment 2.
05. Explain and evaluate data clustering techniques.
- Activities:
- Lectures 9, 10 and 11 introduce major data clustering techniques, such as K-means clustering, hierarchical clustering and DBSCAN, for pattern extraction and knowledge discovery from unlabelled data. Students will learn how to apply these approaches to real datasets in Tutorials 9 and 10 and in Assignment 2.
06. Discuss advanced issues in data mining algorithms, such as classification systems, important issues in clustering algorithms, and the links between data mining and knowledge engineering.
- Activities:
- Lecture 12 is a revision lecture in which all relevant data mining algorithms are evaluated for deeper understanding and application. Students will also analyse advanced issues, such as classification systems and the links between data mining and knowledge engineering, in preparation for further study.
07. Implement advanced data mining techniques for pattern discovery from selected datasets.
- Activities:
- In the tutorials, students will learn the basics of MATLAB programming and WEKA, covering data preprocessing, decision trees, ANNs, SVMs and association rule mining. Students will apply k-means and hierarchical clustering to data in MATLAB. Assignment consultations will be held in certain tutorials, where students can get help and technical support with any difficulties they encounter in their assessments.
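Outcome 07 has students apply k-means clustering to data in MATLAB. Purely as an illustration of the idea, and not as the subject's tutorial material, the following is a minimal Python sketch of the standard k-means iteration (Lloyd's algorithm). It assumes numeric points given as tuples and initial centroids chosen by the caller; the helper names `assign` and `kmeans` are hypothetical.

```python
from math import dist

Point = tuple[float, ...]


def assign(points: list[Point], centroids: list[Point]) -> list[int]:
    """Assignment step: index of the nearest centroid (Euclidean distance) for each point."""
    return [min(range(len(centroids)), key=lambda j: dist(p, centroids[j])) for p in points]


def kmeans(points: list[Point], initial_centroids: list[Point], max_iter: int = 100):
    """Hypothetical helper: Lloyd's k-means starting from caller-chosen centroids."""
    centroids = [tuple(c) for c in initial_centroids]
    for _ in range(max_iter):
        labels = assign(points, centroids)
        # Update step: move each centroid to the mean of the points assigned to it.
        new_centroids = []
        for j, old in enumerate(centroids):
            members = [p for p, lab in zip(points, labels) if lab == j]
            if members:
                new_centroids.append(tuple(sum(axis) / len(members) for axis in zip(*members)))
            else:
                new_centroids.append(old)  # an empty cluster keeps its previous centroid
        if new_centroids == centroids:
            break  # converged: the centroids no longer move
        centroids = new_centroids
    # Recompute labels so they always correspond to the returned centroids.
    return assign(points, centroids), centroids


# Usage: two well-separated groups, seeded with one point from each group.
points = [(1.0, 1.0), (1.5, 2.0), (1.0, 1.5), (8.0, 8.0), (9.0, 8.5), (8.5, 9.0)]
labels, centroids = kmeans(points, initial_centroids=[points[0], points[3]])
print(labels)  # [0, 0, 0, 1, 1, 1]
```

The result depends on the starting centroids, which is why k-means is commonly run from several different initialisations and the best clustering kept.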
Subject options
Melbourne, 2018, Semester 2, Blended
Overview
Online enrolment: Yes
Maximum enrolment size: N/A
Enrolment information
Subject Instance Co-ordinator: Phoebe Chen
Class requirements
Lecture: Weeks 31 to 43
One 2-hour lecture per week, on weekdays during the day, from week 31 to week 43, delivered face-to-face.
Computer Laboratory: Weeks 32 to 43
One 2-hour computer laboratory per week, on weekdays during the day, from week 32 to week 43, delivered face-to-face.
Assessments
Assessment element | Comments | % | ILOs |
---|---|---|---|
Assignment 1, equivalent to 1,200 words | | 20 | 02, 03, 07 |
Assignment 2, equivalent to 1,200 words | | 20 | 02, 04, 05, 07 |
One 3-hour examination | Hurdle requirement: to pass the subject, a pass in the examination is mandatory. | 50 | 01, 02, 03, 04, 05, 06 |
Tutorial participation and contribution to tutorial tasks, equivalent to 100 words per tutorial | | 10 | 07 |