Big Data Analytics (IMFI), Fall 2019

This course offers an introduction to the principles and concepts of big data analytics (BDA), which has been gaining popularity in recent years.
The course is offered for the International Master Program in Financial Technology and Innovation Entrepreneurship (IMFI). It is taught by two instructors: Chen-Shu Wang and Jenq-Haur Wang.
This page is for the second part. The major topics covered in the second part are data mining and distributed processing.

Course Information (for the second part)

Latest News

(Tentative) Schedule (for the second part)

Week | Date | Content | Reading | Note
1 | Sep. 12, 2019 | Course Overview | |
2 | Sep. 19, 2019 | (The first part) | |
3 | Sep. 26, 2019 | (The first part); (Leave for TANET 2019) | |
4 | Oct. 3, 2019 | (The first part); (Leave for ROCLING 2019) | |
5 | Oct. 10, 2019 | (The first part); (National Day) | |
6 | Oct. 17, 2019 | (The first part) | |
7 | Oct. 24, 2019 | Introduction to Big Data Analytics; Ch.2, Getting to Know Your Data | DM3, Ch.1; DM3, Ch.2 |
8 | Oct. 31, 2019 | Ch.3, Data Preprocessing | DM3, Ch.3 | HW#1
9 | Nov. 7, 2019 | Ch.6, Frequent Pattern Mining | DM3, Ch.6 |
10 | Nov. 14, 2019 | (Leave for NCS 2019); TA: Introduction to distributed platform; Introduction to Big Data Analytics Platforms: Hadoop, Spark, TensorFlow; (Ref: Notes on installation, configuration, and management of Hadoop & Spark clusters) | | Due: HW#1
11 | Nov. 21, 2019 | (Leave for TAAI 2019); TA: Membership registration for Term Project, Introduction to Open Data, dataset acquisition; TA: Examples in Spark programming; Programming in Spark & Colaboratory; (Lab: Python Programming in Google Colab) | |
12 | Nov. 28, 2019 | Ch.8, Classification: Basic Concepts | DM3, Ch.8 | HW#2; Proposal
13 | Dec. 5, 2019 | (Midterm Exam) | | Due: Term Project Proposal
14 | Dec. 12, 2019 | Ch.10, Cluster Analysis: Basic Concepts and Methods; (Lab: Classification Example using Spark) | DM3, Ch.10 | Due: HW#2; HW#3
15 | Dec. 19, 2019 | (Lab: Clustering Example using Spark); The MapReduce Programming Paradigm; Spark Programming | |
16 | Dec. 26, 2019 | (Term Project Presentation) | | Due: HW#3
17 | Jan. 2, 2020 | (Term Project Presentation) | |
18 | Jan. 9, 2020 | (Term Project Presentation) | |

Homework Assignments, Labs, and Term Project (for the second part)

During the course, there will be several written homework assignments and some hands-on labs in class.

Homework

There will be about three written assignments on topics such as pattern mining, classification, and clustering.
  1. HW#1: Ch.2-3
    Due: Nov. 14, 2019
  2. HW#2: Ch.6
    Due: Dec. 12, 2019
  3. HW#3: Ch.8
    Due: Dec. 26, 2019
Please hand in your assignments before the deadline according to the following instructions.

Submission Instructions

NOTE: Programs or projects in electronic files must be submitted directly to the website.

If you cannot successfully submit your work, please contact the instructor.

Labs

Due to the background of students this semester, it will be difficult to give hands-on labs or programming exercises on different platforms such as Spark and Jupyter Notebook.
Reference:

Term Project

Instead of programming labs, you are required to complete a term project in which you analyze open datasets using open-source tools.
[NOTE] If you haven't submitted your proposal, please do so as soon as possible since we are arranging the presentation schedule.
  1. The current schedule for term project presentations (as of Dec. 26, 2019).
    Since we have fewer teams this semester, each team can have more time for its presentation. Presentations that cannot be finished in time will be postponed to the following week.

Exams

  1. Midterm Exam: Dec. 5, 2019
  2. Final Exam:

Scores

Please check the homework submission site for more details.
Created: Sep. 11, 2019.
Last Updated: Jan. 15, 2020.