Introduction to Big Data Analytics, Spring 2019

This course offers an introduction to the principles and concepts of big data analytics (BDA), a field that has been gaining popularity in recent years.
Major topics covered in this course include data mining, distributed computing, and parallel programming.
The course is offered at the undergraduate level.

Course Information

Latest News

(Tentative) Schedule

* All slides can be downloaded from the iSchool platform at NTUT.
Week | Date | Content | Reading | Note
1 | Feb. 20 & 21, 2019 | Course Overview; Introduction to Big Data Analytics | DM3, Ch.1 |
2 | Feb. 27 & 28, 2019 | Ch.2, Getting to Know Your Data (2/28: 228 Peace Memorial Day) | DM3, Ch.2 |
3 | Mar. 6 & 7, 2019 | Ch.3, Data Preprocessing | DM3, Ch.3 (selected) | 3/7: HW#1
4 | Mar. 13 & 14, 2019 | Ch.6, Frequent Pattern Mining | DM3, Ch.6 |
5 | Mar. 20 & 21, 2019 | Ch.6, Frequent Pattern Mining (3/21: Leave for MOE Project; (TA) Term Project Proposal & Team Registration) | DM3, Ch.6 | Term Project Proposal; 3/21 Due: HW#1
6 | Mar. 27 & 28, 2019 | Ch.8, Classification: Basic Concepts | DM3, Ch.8 | HW#2
7 | Apr. 3 & 4, 2019 | (4/3: Compensation Leave for Sports Day; 4/4: Children's Day) | |
8 | Apr. 10 & 11, 2019 | Ch.8, Classification: Basic Concepts | | HW#3; Due: HW#2
9 | Apr. 17 & 18, 2019 | Ch.9, Classification: Advanced Methods; Ch.10, Cluster Analysis: Basic Concepts and Methods | DM3, Ch.9 (selected sections); DM3, Ch.10 |
10 | Apr. 24 & 25, 2019 | Ch.10, Cluster Analysis | | Due: HW#3
11 | May 1 & 2, 2019 | Distributed Platforms: Hadoop, Spark (5/2: Midterm Exam) | Ref: Notes on installation, configuration, and management of Hadoop & Spark clusters | Due: Proposal
12 | May 8 & 9, 2019 | Parallel Programming Paradigms & Concepts | | HW#4: Hadoop & Spark
13 | May 15 & 16, 2019 | MapReduce Programming | |
14 | May 22 & 23, 2019 | Spark Programming; Advanced Topics | | Due: HW#4
15 | May 29 & 30, 2019 | Term Project Presentation (Week 1) | |
16 | Jun. 5 & 6, 2019 | Term Project Presentation (Week 2) | |
17 | Jun. 12 & 13, 2019 | Term Project Presentation (Week 3) | |
18 | Jun. 19 & 20, 2019 | (Leave for UC) | |

Programming Assignments and Projects

Please submit your assignments before the deadline according to the following instructions.

Submission Instructions

NOTE: Programs and projects must be submitted electronically, directly to the TA.

If you cannot submit your work successfully, please contact the TA or the instructor.

Homework

There will be several written assignments and programming exercises that target different data analysis tasks.
  1. HW#1 : Ch.2-3 Data Preprocessing
    Due: Mar. 21, 2019
  2. HW#2 : Ch.6 Frequent Pattern Mining
    Due: Apr. 10, 2019
  3. HW#3 : Ch.8-9 Classification
    Due: Apr. 24, 2019
  4. HW#4 : MapReduce Programming
    Due: May 22, 2019

Projects

  1. Term Project: paper presentation or system demonstration
    Proposal (May 2, 2019, Thu.): You are required to submit a term project proposal one week after the midterm exam.
    Topics: For paper presentations, the quality of the paper you choose will *greatly* affect your term project score. Please select papers *carefully*.
    Schedule (May 29, 30, Jun. 5, 6, 12, 13, 2019): Please check the current term project presentation schedule (updated: Jun. 4, 2019).
    Due to time limits, term project presentations have to start as early as May 29, 2019 (Wed.).

    * [NOTE] All presentations *must* be finished within the scheduled time slots, which fall in the last three weeks of the semester (excluding the final week). No other time slots will be available.
    Report (Jun. 21, 2019, Fri.): Each team is *required* to upload a final report after finishing its presentation.
    The final report should contain at least the following:
    1. presentation slides (for all teams), and
    2. source code, installation/execution instructions, team members, and task responsibilities (for system projects).

Exams

  1. Midterm Exam: Apr. 15-19, 2019
  2. Final Exam: Jun. 17-21, 2019

Scores

Please check the homework submission site for more details.
E-mail: jhwang AT csie . ntut . edu . tw
Created: Feb. 19, 2019.
Last Updated: Jun. 23, 2019.