[NOTE] Students who cannot attend class in person for unforeseeable reasons may join the corresponding online session in Teams. During each session, remember to sign in with your name and student ID in Teams when the TA reminds you.
Please read the course notes very carefully for the requirements on the programming homework and quiz.
[*** NOTE ***]: This course is NOT an introductory course for undergraduates.
You are strongly advised to first take "Introduction to Big Data Analytics" or "Data Mining" at the undergraduate level as a prerequisite.
This course involves a heavy load of parallel programming, which is a very different concept from sequential programming.
Please think twice before taking this course!
Week | Date | Content | Reading | Note |
---|---|---|---|---|
1 | Sep. 11, 2024 | Course Overview | ||
2 | Sep. 18, 2024 | Ch.1, Introduction to distributed platforms & MapReduce: Hadoop, Spark | [MMDS3] Ch.1 & 2<br>[MR] Ch.1-3 | |
3 | Sep. 25, 2024 | MapReduce programming: the basics<br>MapReduce Algorithm Design: design patterns (pairs & stripes), language models | [MMDS3] Ch. 2<br>[MR] Ch. 3 | |
4 | Oct. 2, 2024 | (Leave for Typhoon Day) | [MMDS3] Ch. 2<br>[MR] Ch. 3 | |
5 | Oct. 9, 2024 | Ch.3, Finding similar items<br>TA: Spark Cluster: Installation & configuration, platform usage demo | [MMDS3] Ch.3 | HW#0 |
6 | Oct. 16, 2024 | Ch.3, Finding similar items<br>TA: Homework Q&A, term project proposal, and team member registration | [MMDS3] Ch.3 | Due: HW#0<br>HW#1 |
7 | Oct. 23, 2024 | Ch.7, Clustering | [MMDS3] Ch.7 | |
8 | Oct. 30, 2024 | Ch.11, Dimension reduction | skim through [MMDS3] Ch.11 | Term Project Proposal<br>Due: HW#1 HW#2 |
9 | Nov. 6, 2024 | Ch.9, Recommender systems - Part I<br>Ch.9, Recommender systems - Part II | [MMDS3] Ch.9 | |
10 | Nov. 13, 2024 | (11/13: Midterm Exam) | ||
11 | Nov. 20, 2024 | Ch.5, Link analysis - PageRank & HITS (Part I) | [MMDS3] Ch.5 | Due: HW#2<br>Due: Proposal HW#3 |
12 | Nov. 27, 2024 | Link analysis: TrustRank, WebSpam (Part II) | [MMDS3] Ch.5 | |
13 | Dec. 4, 2024 | Ch.10, Mining social network graphs: Community Detection<br>(Part II: Overlapping Communities) (Ch.12, Large-scale machine learning: kNN, Perceptron; Ch.12, Large-scale machine learning: SVM) | [MMDS3] Ch.10<br>([MMDS3] Ch.12) | Due: HW#3<br>HW#4 HW#5 |
14 | Dec. 11, 2024 | Term Project Presentation (Week 1) | ||
15 | Dec. 18, 2024 | (Leave for IEEE BigData 2024)<br>(TA: Questions about homework and term projects) | ||
16 | Dec. 25, 2024 | Term Project Presentation (Week 2)<br>(Leave for IEEE BigData 2024) (TA: Helps record the presentations) | ||
17 | Jan. 1, 2025 | (1/1: Leave for New Year's Day) | Due: HW#4<br>Due: HW#5 | |
18 | Jan. 8, 2025 | Term Project Presentation (Week 3) |
If you cannot successfully submit your work, please contact the TA or the instructor.
For very large datasets, you still need to analyze every data object in the whole dataset. Design your program to first partition the data into several batches, process each batch, and then merge the partial results into the final result.
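The partition-then-merge requirement can be illustrated with a minimal sketch in plain Python. It is not a required design: token counting stands in for whatever analysis an assignment asks for, and the helper names `batches`, `count_batch`, and `count_all`, as well as the batch size, are assumptions made for illustration only.

```python
from collections import Counter
from itertools import islice
from typing import Iterable, Iterator, List


def batches(records: Iterable[str], batch_size: int) -> Iterator[List[str]]:
    """Yield consecutive lists of at most batch_size records."""
    it = iter(records)
    while True:
        batch = list(islice(it, batch_size))
        if not batch:
            return
        yield batch


def count_batch(batch: List[str]) -> Counter:
    """Partial result for one batch: token frequencies (illustrative task)."""
    counts: Counter = Counter()
    for line in batch:
        counts.update(line.split())
    return counts


def count_all(records: Iterable[str], batch_size: int = 1000) -> Counter:
    """Process every record batch by batch, then merge the partial results."""
    total: Counter = Counter()
    for batch in batches(records, batch_size):
        # Counter.update adds counts, so merging is an element-wise sum.
        total.update(count_batch(batch))
    return total
```

Because `batches` consumes any iterable lazily, `records` can be an open file handle, so only one batch is held in memory at a time. The merged result does not depend on the batch size here because summing counts is associative; an analysis whose partial results cannot simply be added would need its own merge step.
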
Item | Description | Time |
---|---|---|
Proposal | You are required to submit a proposal for the term project one week after the midterm exam. | Nov. 20, 2024 (Wed.) |
Schedule | Because of time constraints, we might have to start the term project presentations as early as Dec. 11, 2024 (Wed.). [NOTE] All presentations *must* be finished within the scheduled time slots, which will be the last *four* weeks in this semester. No other time slots will be available. | Dec. 11, 25, 2024, Jan. 8, 2025 |
Report | Each team is *required* to upload the final report after finishing its presentation. The final report should contain at least the following: | Jan. 10, 2025 (Fri.) |