Big Data Analytics (IMFI), Fall 2020

This course offers an introduction to the principles and concepts of big data analytics (BDA), which has been gaining popularity in recent years.
The course is offered for the International Master Program in Financial Technology and Innovation Entrepreneur (IMFI). It is taught by two instructors: Chun-Hao Chen and Jenq-Haur Wang.
This page is for the second part, whose major topics are data mining and distributed processing.

Course Information (for the second part)

Latest News

(Tentative) Schedule (for the second part)

Week | Date | Content | Reading | Note
1 | Sep. 17, 2020 | (The first part) | |
2 | Sep. 24, 2020 | (The first part); (Leave for ROCLING 2020) | |
3 | Oct. 1, 2020 | (The first part); (Leave for Mid-Autumn Festival) | |
4 | Oct. 8, 2020 | (The first part) | |
5 | Oct. 15, 2020 | (The first part); (National Day) | |
6 | Oct. 22, 2020 | (The first part) | |
7 | Oct. 29, 2020 | Course Overview; Introduction to Big Data Analytics; Ch.2, Getting to Know Your Data | DM3, Ch.1; DM3, Ch.2 |
8 | Nov. 5, 2020 | Ch.3, Data Preprocessing | DM3, Ch.3 | HW#1
9 | Nov. 12, 2020 | Ch.6, Frequent Pattern Mining | DM3, Ch.6 |
10 | Nov. 19, 2020 | Ch.8, Classification: Basic Concepts | DM3, Ch.8 | Due: HW#1; HW#2
11 | Nov. 26, 2020 | Ch.8, Classification | | Proposal
12 | Dec. 3, 2020 | (Midterm Exam); (Leave for TAAI 2020) | | Due: HW#2
13 | Dec. 10, 2020 | Ch.10, Cluster Analysis: Basic Concepts and Methods; Introduction to Big Data Analytics Platforms: Hadoop, Spark, TensorFlow; (Ref: Notes on installation, configuration, and management of Hadoop & Spark clusters); (Lab: Python Programming in Google Colab) | DM3, Ch.10 | HW#3; Due: Term Project Proposal
14 | Dec. 17, 2020 | (Leave for ICS 2020); TA: Introduction to Python programming, Google Colab, distributed platform, membership registration for Term Project, Introduction to Open Data, dataset acquisition; TA: Examples in Spark programming; Programming in Spark & Colaboratory | |
15 | Dec. 24, 2020 | The MapReduce Programming Paradigm; Spark Programming; (Lab: Classification Example using Spark) | | Due: HW#3
16 | Dec. 31, 2020 | (Term Project Presentation) | |
17 | Jan. 7, 2021 | (Term Project Presentation) | |
18 | Jan. 14, 2021 | (Term Project Presentation) | |

Homework Assignments, Labs, and Term Project (for the second part)

Over the course, there will be several written homework assignments and some hands-on labs in class.

Homework

There will be about three written assignments on topics such as pattern mining, classification, and clustering.
  1. HW#1: Ch.2-3
    Due: Nov. 19, 2020
  2. HW#2: Ch.6
    Due: Dec. 3, 2020
  3. HW#3: Ch.8
    Due: Dec. 24, 2020
Please hand in your assignments before the deadline according to the following instructions.

Submission Instructions

NOTE: Programs or projects in electronic files must be submitted directly to the website.

If you cannot successfully submit your work, please contact the instructor.

Labs

Due to the background of students this semester, it will be difficult to give hands-on labs or programming exercises on different platforms such as Spark and Jupyter Notebook.
Reference:

Term Project

Instead of labs, you are required to complete a term project in which you analyze open datasets using open-source tools.
  1. Proposal: one week after our midterm (Dec. 10, 2020)
  2. Presentations: *required* in the last three weeks (Dec. 31, 2020; Jan. 7 and 14, 2021)
  3. Final report: *required* before the end of the semester (Jan. 15, 2021)

Exams

  1. Midterm Exam: Nov. 9-13, 2020
  2. Final Exam: Jan. 11-15, 2021

Scores

Please check the homework submission site for more details.
Created: Oct. 16, 2020.
Last Updated: Jan. 11, 2021.