Introduction to Big Data Analytics, Fall 2022

This course offers an introduction to the principles and concepts of big data analytics (BDA), which has been gaining popularity in recent years.
Major topics include data mining, distributed computing, and parallel programming.
The course is offered at the undergraduate level.

Notes on Online Courses

All students enrolled in this course have been added to the team in Microsoft Teams for the corresponding course number.
You can attend the online course in Microsoft Teams at the following channel: Team created for Big Data Analytics Course [Course Number: 308849]

Course Information

Latest News

(Tentative) Schedule

* All slides can be downloaded from the iSchool+ platform at NTUT.
| Week | Date | Content | Reading | Note |
|---|---|---|---|---|
| 1 | Sep. 12 & 14, 2022 | Course Overview | | |
| 2 | Sep. 19 & 21, 2022 | Introduction to Big Data Analytics; Ch.2, Getting to Know Your Data | DM3, Ch.1; DM3, Ch.2 | |
| 3 | Sep. 26 & 28, 2022 | Ch.3, Data Preprocessing | DM3, Ch.3 (selected) | |
| 4 | Oct. 3 & 5, 2022 | Ch.3 | | HW#1 |
| 5 | Oct. 10 & 12, 2022 | (Compensation Leave for National Day); Ch.6, Frequent Pattern Mining | DM3, Ch.6 | Term Project Proposal |
| 6 | Oct. 17 & 19, 2022 | Ch.8, Classification: Basic Concepts | DM3, Ch.8 | Due: HW#1; Team Registration |
| 7 | Oct. 24 & 26, 2022 | Ch.8 | | HW#2; Due: Team Registration |
| 8 | Oct. 31 & Nov. 2, 2022 | Ch.9, Classification: Advanced Methods | DM3, Ch.9 (selected sections) | |
| 9 | Nov. 7 & 9, 2022 | Ch.10, Cluster Analysis: Basic Concepts and Methods | DM3, Ch.10 | HW#3; Due: HW#2 |
| 10 | Nov. 14 & 16, 2022 | (Midterm Exam) | | |
| 11 | Nov. 21 & 23, 2022 | Distributed Platforms: Hadoop, Spark | Ref: Notes on installation, configuration, and management of Hadoop & Spark clusters | Due: HW#3; Due: Proposal |
| 12 | Nov. 28 & 30, 2022 | Parallel Programming Paradigms & Concepts | | |
| 13 | Dec. 5 & 7, 2022 | MapReduce Programming; (Lab: Spark cluster demo) | | HW#4 |
| 14 | Dec. 12 & 14, 2022 | Spark Programming; (Lab: classification using Spark) | | |
| 15 | Dec. 19 & 21, 2022 | (Leave for IEEE BigData 2022); Term Project Presentation (Week 1) | | Due: HW#4 |
| 16 | Dec. 26 & 28, 2022 | (12/26: Leave for IEEE BigData 2022); Term Project Presentation (Week 2) | | |
| 17 | Jan. 2 & 4, 2023 | Term Project Presentation (Week 3) | | |
| 18 | Jan. 9 & 11, 2023 | Term Project Presentation (Week 4) | | |

Programming Assignments and Projects

Please hand in your assignments before the deadline, following the instructions below.

Submission Instructions

NOTE: Programs or projects in electronic files must be submitted directly to i-School+.

If you cannot successfully submit your work, please contact the TA or the instructor.

Homeworks

There will be several written assignments and programming exercises that target different data analysis tasks.
  1. HW#1 : Ch.2-3 Data Preprocessing
  2. HW#2 : Ch.6 Frequent Pattern Mining
  3. HW#3 : Ch.8-9 Classification
  4. HW#4 : MapReduce Programming

Projects

  1. Term Project
    | Item | Description | Time |
    |---|---|---|
    | Proposal | You are required to submit a term project proposal one week after the midterm exam. | Nov. 21, 2022 (Mon.) |
    | Topics | Two options: (1) a project for data analysis or related system development; (2) joining a competition as your term project. You can check the details on recent competitions as potential topics for the term project. | |
    | Schedule | Due to time limits, we have to start the term project presentations as early as Dec. 19, 2022 (Mon.). [NOTE] All presentations *must* be finished within the scheduled time slots, which will be the last 4 weeks of this semester. No other time slots will be available. | Dec. 19, 21, 26, 28, 2022 & Jan. 2, 4, 9, 11, 2023 |
    | Report | Each team is *required* to upload the final report after finishing its presentation. The final report should contain at least the following: (1) presentation slides; (2) source code, and documents containing installation/execution instructions, team members, and task responsibilities. | Jan. 13, 2023 (Fri.) |

Exams

  1. Midterm Exam: Nov. 7-11, 2022
  2. Final Exam: Jan. 9-13, 2023

Scores

Please check the homework submission site for more details.
E-mail: jhwang AT ntut . edu . tw
Created: Sep. 11, 2022.
Last Updated: Nov. 23, 2022.