[*** NOTE ***]: This course is NOT an introductory course for undergraduates.
You are required to have taken a first course such as "Introduction to Big Data Analytics" or "Data Mining" at the undergraduate level as a prerequisite.
It involves a heavy load of parallel programming, which is a fundamentally different paradigm from sequential programming.
Please think twice before taking this course!
Please read the course notes very carefully regarding the requirements for the programming homework and quiz.
Week | Date | Content | Reading | Note |
---|---|---|---|---|
1 | Sep. 21, 2021 | (Leave for Mid-Autumn Festival) | ||
2 | Sep. 28, 2021 | Course Overview; Ch.1, Introduction to distributed platforms & MapReduce: Hadoop, Spark | [MMDS3] Ch.1 & 2; [MR] Ch.1-3 | |
3 | Oct. 5, 2021 | MapReduce programming: the basics | [MMDS3] Ch. 2; [MR] Ch. 3 | |
4 | Oct. 12, 2021 | MapReduce Algorithm Design: design patterns (pairs & stripes), language models | [MMDS3] Ch. 2; [MR] Ch. 3 | HW#1 |
5 | Oct. 19, 2021 | (Leave for taking COVID-19 vaccines); TA: Multi-node Hadoop/Spark installation & configuration, platform usage demo; TA: Homework Q&A, term project proposal, and team member registration | | |
6 | Oct. 26, 2021 | Ch.3, Finding similar items | [MMDS3] Ch.3 | |
7 | Nov. 2, 2021 | Ch.7, Clustering | [MMDS3] Ch.7 | HW#2; Due: HW#1 |
8 | Nov. 9, 2021 | Ch.11, Dimension reduction | skim through [MMDS3] Ch.11 | Term Project Proposal |
9 | Nov. 16, 2021 | Ch.9, Recommender systems | [MMDS3] Ch.9 | HW#3 |
10 | Nov. 23, 2021 | (Midterm Exam) | | Due: HW#2 |
11 | Nov. 30, 2021 | (Leave for taking COVID-19 vaccines) | [MMDS3] Ch.5 | Due: Proposal |
12 | Dec. 7, 2021 | Ch.5, Link analysis - PageRank & HITS; Part II of link analysis | [MMDS3] Ch.5 | HW#4; Due: HW#3 |
13 | Dec. 14, 2021 | Ch.10, Mining social network graphs: Community Detection | [MMDS3] Ch.10 | |
14 | Dec. 21, 2021 | Part II: Overlapping Communities | [MMDS3] Ch.10 | HW#5 |
15 | Dec. 28, 2021 | Ch.12, Large-scale machine learning: kNN, Perceptron; Ch.12, Large-scale machine learning: SVM | [MMDS3] Ch.12 | Due: HW#4 |
16 | Jan. 4, 2022 | Term Project Presentation (Week 1) | ||
17 | Jan. 11, 2022 | Term Project Presentation (Week 2) | | Due: HW#5 |
18 | Jan. 18, 2022 | Term Project Presentation (Week 3) | | |
If you cannot submit your work successfully, please contact the TA or the instructor.
For very large datasets, you still need to analyze every data object in the whole dataset. Design your program to partition the data into several batches, process each batch, and merge the partial results into the final result.
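The batch-and-merge requirement can be read roughly as in the following minimal sketch. It assumes the analysis is a simple word count, which is only a stand-in and not an actual homework task. The helper names `batches`, `count_batch`, and `count_all` are hypothetical and chosen for illustration.

```python
from collections import Counter
from itertools import islice
from typing import Iterable, Iterator, List


def batches(items: Iterable[str], batch_size: int) -> Iterator[List[str]]:
    """Yield consecutive lists of at most batch_size items."""
    it = iter(items)
    while True:
        batch = list(islice(it, batch_size))
        if not batch:
            return
        yield batch


def count_batch(lines: List[str]) -> Counter:
    """Partial result for one batch: word frequencies (stand-in analysis)."""
    counts = Counter()
    for line in lines:
        counts.update(line.split())
    return counts


def count_all(lines: Iterable[str], batch_size: int = 2) -> Counter:
    """Process every batch and merge the partial results into one total."""
    total = Counter()
    for batch in batches(lines, batch_size):
        total.update(count_batch(batch))
    return total


# Toy input; in practice this would be a lazily read file or data stream.
data = ["a b a", "b c", "a c c"]
result = count_all(data, batch_size=2)
```

Because the partial counts are merged by addition, the final result does not depend on the batch size. This independence is the property to check for whatever analysis replaces the word count.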
Item | Description | Time |
---|---|---|
Proposal | You are required to submit a proposal for the term project one week after the midterm exam. | Nov. 30, 2021 (Tue.) |
Schedule | Due to time limits, we might have to start the term project presentations as early as Dec. 28, 2021 (Tue.). [NOTE] All presentations *must* be finished within the scheduled time slots, which will be the last *three* weeks of this semester. No other time slots will be available. | (Dec. 28, 2021,) Jan. 4, 11, 18, 2022 |
Report | Each team is *required* to upload the final report after finishing your presentation. The final report should contain at least the following: | Jan. 22, 2022 (Fri.) |