Introduction to Big Data Analytics, Spring 2017

This course offers an introduction to the principles and concepts of big data analytics (BDA), which has been gaining popularity in recent years.
The major topics covered in this course include data mining, distributed computing, and parallel programming.
The course is offered at the junior level.

Course Information

Latest News

(Tentative) Schedule

Week | Date | Content | Reading | Note
1 | Feb. 21 & 24, 2017 | Course Overview | |
2 | Feb. 28 & Mar. 3, 2017 | (2/28: Peace Memorial Day) Introduction to Big Data Analytics | DM3, Ch.1 |
3 | Mar. 7 & 10, 2017 | Ch.2, Getting to Know Your Data; Ch.3, Data Preprocessing | DM3, Ch.2 & Ch.3 (selected) |
4 | Mar. 14 & 17, 2017 | Ch.6, Frequent Pattern Mining | DM3, Ch.6 | 3/17: HW#1
5 | Mar. 21 & 24, 2017 | Ch.8, Classification: Basic Concepts | DM3, Ch.8 |
6 | Mar. 28 & 31, 2017 | Ch.8, Classification; Ch.9, Classification: Advanced Methods | DM3, Ch.9 (selected sections) | HW#2; Term Project Proposal; Due: HW#1
7 | Apr. 4 & 7, 2017 | (4/4: Tomb Sweeping Day & Children's Day) Ch.10, Cluster Analysis: Basic Concepts and Methods | DM3, Ch.10 | Proposal
8 | Apr. 11 & 14, 2017 | Data Clustering | DM3, Ch.10 | Due: HW#2
9 | Apr. 18 & 21, 2017 | (4/18: Midterm Exam) | |
10 | Apr. 25 & 28, 2017 | Distributed Platforms: Hadoop, Spark | See the documentation on installing, configuring, and managing Hadoop & Spark clusters. | Due: Proposal
11 | May 2 & 5, 2017 | Parallel Programming Paradigms & Concepts (5/5: No class) | |
12 | May 9 & 12, 2017 | MapReduce Programming | |
13 | May 16 & 19, 2017 | Spark Programming | | HW#3
14 | May 23 & 26, 2017 | MapReduce Programming; Advanced Topics | |
15 | May 30 & Jun. 2, 2017 | (Dragon Boat Festival) Term Project Presentation (Week 1): 6 teams completed. | |
16 | Jun. 6 & 9, 2017 | Term Project Presentation (Week 2): 12 teams completed (6/6), 5 teams completed (6/9). | | Due: HW#3
17 | Jun. 13 & 16, 2017 | Term Project Presentation (Week 3): 10 teams completed (6/13), 4 teams completed (6/16). | |
18 | Jun. 20 & 23, 2017 | (Leave for JCDL 2017) Term Project Presentation (Week 4): 8 teams remaining (6/20). | |

Useful Links

Here are some useful links to information retrieval related resources and further reading.
  • More Books

    Programming Assignments and Projects

    Please hand in your assignments before the deadline, following the instructions below.

    Submission Instructions

    NOTE: Programs or projects in electronic files must be submitted directly to the TA online.

    If you cannot submit your work successfully, please contact the TA or the instructor.

    Homeworks

    There will be several written assignments and programming exercises, each targeting a different data analysis task.
    1. HW#1: Ch.2-3 (DM3)
      Due: Mar. 31, 2017
    2. HW#2: Ch.4 (DM3)
      Due: Apr. 14, 2017
    3. HW#3: MapReduce
      Due: Jun. 6, 2017
      [NOTE] The datasets for HW#3 can be downloaded from the following link: Foursquare data.

    Projects

    1. Term Project: paper presentation or system demonstration
      Item | Description | Time
      Proposal | You are required to submit a proposal for the term project one week after the midterm exam. | May 5, 2017 (Fri.)
      Topics | For paper presentations, the paper quality will *greatly* affect your term project score. Please *carefully* select good papers to read. |
      Schedule | Due to time constraints, we might have to start the term project presentations as early as Jun. 2, 2017 (Fri.). [NOTE] All presentations *must* be finished within the scheduled time slots, which will be in the last *four* weeks of this semester. No other time slots will be available. The current schedule for term project presentations has been posted (as of Jun. 16, 2017). | Jun. 2, 6, 9, 13, 16, 20, 2017
      Report | Each team is *required* to upload a final report after finishing its presentation. The final report should contain at least the following: (1) presentation slides (for all teams), and (2) source code, installation/execution instructions, team members and task responsibilities (for system projects). | Jun. 26, 2017 (Mon.)

    Exams

    1. Midterm Exam: Apr. 17-21, 2017
      • Date: Apr. 18, 2017 (Tue.)
      • Time: 14:10-16:00
      • Location: R227, 6th Teaching Building
      • Range: DM3, Ch.1-2, 3 (part), 6, 8, 9 (part), 10
    2. Final Exam: Jun. 19-23, 2017
      • Note: There will be no final exam in this course. Instead, you are required to complete a term project, either a system development project or a paper presentation.

    Scores

    Please check the homework submission site for more details.
    E-mail: jhwang AT csie . ntut . edu . tw
    Created: Feb. 18, 2017.
    Last Updated: Jun. 16, 2017.