Introduction to Big Data Analytics, Fall 2020

This course offers an introduction to the principles and concepts of big data analytics (BDA), which has been gaining popularity in recent years.
The major topics covered include data mining, distributed computing, and parallel programming.
The course is offered at the undergraduate level.

Course Information

Latest News

(Tentative) Schedule

* All slides can be downloaded from the iSchool platform at NTUT.
| Week | Date | Content | Reading | Note |
|---|---|---|---|---|
| 1 | Sep. 14 & 16, 2020 | Course Overview; Introduction to Big Data Analytics | DM3, Ch.1 | |
| 2 | Sep. 21 & 23, 2020 | Ch.2, Getting to Know Your Data | DM3, Ch.2 | |
| 3 | Sep. 28 & 30, 2020 | Ch.3, Data Preprocessing | DM3, Ch.3 (selected) | 9/30: HW#1 |
| 4 | Oct. 5 & 7, 2020 | Ch.6, Frequent Pattern Mining | DM3, Ch.6 | |
| 5 | Oct. 12 & 14, 2020 | Ch.8, Classification: Basic Concepts | DM3, Ch.8 | Term Project Proposal; 10/14 Due: HW#1 |
| 6 | Oct. 19 & 21, 2020 | Ch.8, Classification: Basic Concepts | DM3, Ch.8 | 10/21: HW#2; Team Registration |
| 7 | Oct. 26 & 28, 2020 | Ch.9, Classification: Advanced Methods | DM3, Ch.9 (selected sections) | Due: Team Registration |
| 8 | Nov. 2 & 4, 2020 | Ch.10, Cluster Analysis: Basic Concepts and Methods | DM3, Ch.10 | HW#3; 11/4 Due: HW#2 |
| 9 | Nov. 9 & 11, 2020 | Ch.10, Cluster Analysis: Basic Concepts and Methods | DM3, Ch.10 | |
| 10 | Nov. 16 & 18, 2020 | (11/16: Midterm Exam) | | Due: HW#3 |
| 11 | Nov. 23 & 25, 2020 | Distributed Platforms: Hadoop, Spark; Ref: Notes on installation, configuration, and management of Hadoop & Spark clusters; (11/25: Leave for MLN 2020) | | Due: Proposal |
| 12 | Nov. 30 & Dec. 2, 2020 | Parallel Programming Paradigms & Concepts | | HW#4 |
| 13 | Dec. 7 & 9, 2020 | MapReduce Programming (Lab: Spark cluster demo) | | |
| 14 | Dec. 14 & 16, 2020 | Spark Programming (Lab: classification using Spark) | | Due: HW#4 |
| 15 | Dec. 21 & 23, 2020 | Term Project Presentation (Week 1) - 6 teams completed. | | |
| 16 | Dec. 28 & 30, 2020 | Term Project Presentation (Week 2) - 7 teams completed. | | |
| 17 | Jan. 4 & 6, 2021 | Term Project Presentation (Week 3) - 6 teams completed. | | |
| 18 | Jan. 11 & 13, 2021 | Term Project Presentation (Week 4) - 4 teams completed. | | |

Programming Assignments and Projects

Please hand in your assignments before the deadline, following the instructions below.

Submission Instructions

NOTE: Programs or projects in electronic files must be submitted directly to i-School+.

If you cannot submit your work successfully, please contact the TA or the instructor.

Homeworks

There will be several written assignments and programming exercises that target different data analysis tasks.
  1. HW#1 : Ch.2-3 Data Preprocessing
    Due: Oct. 14, 2020
  2. HW#2 : Ch.6 Frequent Pattern Mining
    Due: Nov. 4, 2020

    [NOTE] For the programming projects in HW#2, the DBLP dataset can be downloaded in XML format at: https://dblp.org/xml/release/
    However, since the full DBLP dataset is very large, it may be difficult to analyze. You can instead download the partial datasets collected by different sources. Details are given in the Notes for HW#2.

  3. HW#3 : Ch.8-9 Classification
    Due: Nov. 16, 2020
  4. HW#4 : MapReduce Programming
    Due: Dec. 14, 2020
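
Regarding the HW#2 note on the size of the DBLP dump: one way to keep memory use bounded is to stream the XML rather than load it whole. The following is a minimal sketch, not part of the official assignment notes. It assumes Python's standard `xml.etree.ElementTree.iterparse`, a tiny made-up in-memory sample in place of the real file, and a hypothetical choice of treating each record's author list as one transaction for frequent pattern mining. The real dump declares character entities in a DTD, which the standard-library parser may fail to resolve, so extra handling would likely be needed there.

```python
import io
import xml.etree.ElementTree as ET

# Tiny made-up sample standing in for the real DBLP XML file (assumption).
SAMPLE = b"""<dblp>
<article key="a1"><author>A. One</author><author>B. Two</author><title>T1</title></article>
<inproceedings key="p1"><author>A. One</author><author>C. Three</author><title>T2</title></inproceedings>
</dblp>"""

# Record types to keep; this subset is an illustrative choice.
RECORD_TAGS = {"article", "inproceedings"}


def iter_author_sets(source):
    """Yield one list of author names per publication record, streaming."""
    # "end" events fire once an element and all its children are fully parsed.
    for _event, elem in ET.iterparse(source, events=("end",)):
        if elem.tag in RECORD_TAGS:
            yield [a.text for a in elem.findall("author")]
            # Release the record's children and text so memory stays small.
            elem.clear()


# Each author list can serve as one transaction for frequent pattern mining.
transactions = list(iter_author_sets(io.BytesIO(SAMPLE)))
for t in transactions:
    print(t)
```

For a file on disk, a path or an open binary file object can be passed to `iter_author_sets` in place of the `io.BytesIO` wrapper.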

Projects

  1. Term Project: paper presentation or system demonstration
    | Item | Description | Time |
    |---|---|---|
    | Proposal | You are required to submit a proposal for the term project one week after the midterm exam. | Nov. 23, 2020 (Mon.) |
    | Topics | For paper presentations, the paper quality will *greatly* affect your term project score. Please *carefully* select good papers to read. | |
    | Schedule | Due to time limits, term project presentations start as early as Dec. 21, 2020 (Mon.). Please check the current presentation schedule (as of Jan. 11, 2021). [NOTE] All presentations *must* be finished within the scheduled time slots, which fall in the last 4 weeks of the semester. No other time slots will be available. | Dec. 21, 23, 28, 30, 2020 & Jan. 4, 6, 11, 13, 2021 |
    | Report | Each team is *required* to upload the final report after finishing its presentation. The final report should contain at least the following: (1) presentation slides (for all teams), and (2) source code, installation/execution instructions, team members and task responsibility (for system projects). | Jan. 15, 2021 (Fri.) |

Exams

  1. Midterm Exam: Nov. 9-13, 2020
  2. Final Exam: Jan. 11-15, 2021

Scores

Please check the homework submission site for more details.
E-mail: jhwang AT csie . ntut . edu . tw
Created: Sep. 11, 2020.
Last Updated: Jan. 21, 2021.