
Data Mining Practice

Course Name (Chinese): 数据挖掘实践

Course Name (English): Data Mining Practice


Course Code: S2298053

Semester: 4

Credit: 2

Program: Computer Science

Course Module: Specialized Optional

Responsible: Yu Mei

E-mail: yumei@tju.edu.cn

Department: College of Intelligence and Computing, Tianjin University

Time Allocation (1 credit hour = 45 minutes)

Exercise: 0

Lecture: 4

Lab-study: 28

Project: 0

Internship (days): 0

Personal Work: 12

Course Description

As data continues to accumulate in the information age, data analysis and mining techniques have become increasingly important. Related disciplines such as computer science and artificial intelligence have developed rapidly and merged with other fields, making data analysis and data mining a necessary skill across many disciplines. This course is built around the methods and models commonly used in data analysis and mining, and covers the whole process of data representation, storage, preprocessing, and analysis and mining. Through a large number of models and application examples, the course enables students to quickly master the basic processes and algorithms of data analysis and data mining, and lays a solid foundation for their subsequent study and research.

Prerequisite

• Mathematical statistics and analysis: concepts and methods

• Data structures: data storage and query methods

Course Objectives

This course discusses the basic concepts of data mining to help students discover potential knowledge in data. After this course, students should be able to:

• Understand what data mining is and how to address practical problems with data mining methods.

• Master the algorithms related to on-line analytical processing (OLAP), classification, clustering, prediction, and so on.

• Identify several data mining strategies and the application environment of each strategy.

• Comprehensively understand how to build a model with data mining techniques to solve a practical problem.

Course Syllabus

• Data mining overview: definition, task, mining object.

• Data: attributes, basic statistical description, similarity and dissimilarity.

• Data preprocessing: data, data quality issues, data preprocessing.

• Data warehousing and OLAP: design, implementation, OLAP, metadata model.

• Regression analysis: basic concepts, univariate linear regression, multiple linear regression, polynomial regression.

• Association analysis: definition, task, Apriori algorithm, FP-tree algorithm.

• Clustering: definition, main methods.

• Classification: definition; methods such as decision tree, naive Bayes, and neural network.

• Exception (outlier) mining: definition, applications, causes of exceptional data, solutions.
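As a small illustration of the univariate linear regression topic listed above, the following is a minimal sketch of an ordinary least-squares fit in plain Python. The function name `fit_line` and the toy data are hypothetical and chosen only for this example; they are not taken from the course materials.

```python
def fit_line(xs, ys):
    """Least-squares fit of y = intercept + slope * x; returns (intercept, slope)."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need at least two paired observations")
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Sum of squared deviations of x, and the x/y cross-deviation sum.
    sxx = sum((x - mean_x) ** 2 for x in xs)
    if sxx == 0:
        raise ValueError("x values must not all be equal")
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return intercept, slope


# Hypothetical toy data lying exactly on the line y = 1 + 2x.
xs = [0.0, 1.0, 2.0, 3.0]
ys = [1.0, 3.0, 5.0, 7.0]
intercept, slope = fit_line(xs, ys)
print(intercept, slope)
```

Multiple and polynomial regression follow the same least-squares idea with more than one predictor column; in practice a numerical library would normally be used instead of hand-written sums.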

Textbooks & References

• Yu Mei, Yu Jian. Data Analysis and Data Mining (2nd Edition). Tsinghua University Press, 2020.

• Jiawei Han and Micheline Kamber. Data Mining: Concepts and Techniques. China Machine Press, 2006.

• D. Hand, H. Mannila and P. Smyth. Principles of Data Mining. Springer, 2004.

• Pang-Ning Tan, Michael Steinbach and Vipin Kumar. Introduction to Data Mining. Addison Wesley, 2005.

• Te-Ming Huang, Vojislav Kecman and Ivica Kopriva. Kernel Based Algorithms for Mining Huge Data Sets: Supervised, Semi-supervised and Unsupervised Learning. Springer, 2006.

Capability Tasks

CT1: Understand the concepts of data analysis and data mining, master the data types subject to analysis and mining, and be able to apply data analysis and data mining methods.

CT2: Be able to explain the properties of data, master the basic statistical description of data, and explain the similarity and dissimilarity of data.

CT3: Understand the problems that exist in data, master the concepts and methods of data cleaning, and be able to implement data integration and data reduction.

CS1: Master the basic concepts of data warehouses, carry out data warehouse design, master the implementation methods of data warehouses, understand online analytical processing, and be able to describe metadata models.

CS2: Be able to describe the concept of frequent patterns, master the Apriori algorithm and the FP-growth algorithm, and understand compressed frequent itemsets and association pattern evaluation.
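To make the frequent-pattern task in CS2 concrete, the following is a minimal sketch of Apriori-style frequent itemset mining in plain Python. The function name `apriori`, the use of an absolute support count as the threshold, and the basket data are assumptions made for this illustration, not the course's reference implementation.

```python
from itertools import combinations


def apriori(transactions, min_support):
    """Return {itemset (frozenset): support count} for all frequent itemsets.

    min_support is an absolute count (an assumption of this sketch).
    """
    transactions = [frozenset(t) for t in transactions]

    # Level 1: count single items and keep the frequent ones.
    counts = {}
    for t in transactions:
        for item in t:
            key = frozenset([item])
            counts[key] = counts.get(key, 0) + 1
    current = {s: c for s, c in counts.items() if c >= min_support}
    frequent = dict(current)

    k = 2
    while current:
        # Join step: merge pairs of frequent (k-1)-itemsets into k-itemsets.
        candidates = set()
        for a, b in combinations(list(current), 2):
            union = a | b
            if len(union) != k:
                continue
            # Prune step: every (k-1)-subset must itself be frequent.
            if all(frozenset(sub) in current
                   for sub in combinations(union, k - 1)):
                candidates.add(union)
        # Count step: scan the transactions for each surviving candidate.
        counts = {c: sum(1 for t in transactions if c <= t)
                  for c in candidates}
        current = {s: c for s, c in counts.items() if c >= min_support}
        frequent.update(current)
        k += 1
    return frequent


# Hypothetical market-basket data for illustration only.
baskets = [
    {"bread", "milk"},
    {"bread", "diapers", "beer"},
    {"milk", "diapers", "beer"},
    {"bread", "milk", "diapers"},
    {"bread", "milk", "diapers", "beer"},
]
result = apriori(baskets, min_support=3)
for itemset, count in sorted(result.items(),
                             key=lambda kv: (len(kv[0]), sorted(kv[0]))):
    print(sorted(itemset), count)
```

The prune step relies on the Apriori property that every subset of a frequent itemset must also be frequent, which is what keeps the candidate set small. FP-growth avoids candidate generation altogether by compressing the transactions into a prefix tree, and is not shown here.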

Achievements

• To understand the application scenarios of data analysis and mining. - Level: N

• To master the properties of data and how to describe them, the design and implementation of a data warehouse, and three types of regression analysis methods. - Level: M

• To apply data mining technology to solve practical problems. - Level: M

• To master the methods of various types of data classification, data clustering, and exception mining. - Level: M

Students: Computer Science, Year 1, Year 2, Year 3; Smart Medicine, Year 2

Assessment:

Exam

Assignment

Report

Term Paper

Presentation

Others




Language of assessment: Chinese

Attendance: 0%

Homework: 30%

Mid-term report/test: 0%

Final report/test: 70%