[*** NOTE ***]: This course is NOT an introductory course for undergraduates.
You are required to have taken an undergraduate-level first course in "Introduction to Big Data Analytics" or "Data Mining" as a prerequisite.
It involves a heavy load of parallel programming, which is fundamentally different from sequential programming.
Please think twice before taking this course!
Please read the course notes very carefully for the requirements of the programming homework and quizzes.
Week | Date | Content | Reading | Note |
---|---|---|---|---|
1 | Sep. 13, 2019 | (Leave for Mid-Autumn Festival) | | |
2 | Sep. 20, 2019 | Course Overview; Ch.1, Introduction to distributed platforms & MapReduce: Hadoop, Spark | [MMDS2] Ch.1 & 2; [MR] Ch.1-3 | |
3 | Sep. 27, 2019 | (Leave for TANET 2019); TA: Package installation, platform usage demo | | HW#1 |
4 | Oct. 4, 2019 | (Leave for ROCLING 2019); TA: Homework Q&A, term project proposal, and team member registration | | |
5 | Oct. 11, 2019 | (Compensation Leave for National Day) | | |
6 | Oct. 18, 2019 | MapReduce programming: the basics | [MMDS2] Ch.2; [MR] Ch.3 | Term Project Proposal |
7 | Oct. 25, 2019 | MapReduce Algorithm Design: design patterns (pairs & stripes), language models | | Due: HW#1; HW#2 |
8 | Nov. 1, 2019 | Ch.3, Finding similar items | [MMDS2] Ch.3 | |
9 | Nov. 8, 2019 | Ch.7, Clustering; (Ch.11, Dimension reduction) | [MMDS2] Ch.7; (skim through [MMDS2] Ch.11) | HW#3 |
10 | Nov. 15, 2019 | Ch.9, Recommender systems | [MMDS2] Ch.9 | Due: HW#2 |
11 | Nov. 22, 2019 | (Leave for TAAI 2019); Invited talk about FinTech (in Chinese) | | |
12 | Nov. 29, 2019 | (Midterm Exam) | | Due: Proposal; HW#4 |
13 | Dec. 6, 2019 | Ch.5, Link analysis: PageRank & HITS; More about PageRank | [MMDS2] Ch.5 | Due: HW#3 |
14 | Dec. 13, 2019 | Ch.10, Mining social network graphs; Community detection in graphs | [MMDS2] Ch.10 | HW#5 |
15 | Dec. 20, 2019 | Ch.12, Large-scale machine learning: SVM | [MMDS2] Ch.12 | |
16 | Dec. 27, 2019 | Term Project Presentation (Week 1) | | Due: HW#4 |
17 | Jan. 3, 2020 | Term Project Presentation (Week 2) | | |
18 | Jan. 10, 2020 | Term Project Presentation (Week 3) | | Due: HW#5 |
If you cannot submit your work successfully, please contact the TA or the instructor.
For very large datasets, you still need to analyze every data object in the whole dataset. Design your program to partition the data into several batches, process each batch, and merge the partial results into the final result.
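A minimal sketch of this batch-and-merge pattern, assuming the task is a simple word count over lines of text. The function names `batches`, `count_batch`, and `count_all` and the default batch size are hypothetical and not part of any homework specification. Each batch yields a partial result, and the partial results are merged so that every record still contributes to the final answer.

```python
from collections import Counter
from itertools import islice
from typing import Iterable, Iterator, List


def batches(records: Iterable, batch_size: int) -> Iterator[List]:
    """Yield consecutive batches of at most batch_size records (hypothetical helper)."""
    if batch_size <= 0:
        raise ValueError("batch_size must be positive")
    it = iter(records)
    while True:
        batch = list(islice(it, batch_size))
        if not batch:  # input exhausted: no records are skipped
            return
        yield batch


def count_batch(batch: List[str]) -> Counter:
    """Partial result: word counts computed from one batch only."""
    counts = Counter()
    for line in batch:
        counts.update(line.split())
    return counts


def count_all(records: Iterable[str], batch_size: int = 1000) -> Counter:
    """Process the whole dataset batch by batch, then merge the partial counts."""
    total = Counter()
    for batch in batches(records, batch_size):
        # Merge step: Counter.update adds counts rather than replacing them.
        total.update(count_batch(batch))
    return total


if __name__ == "__main__":
    lines = ["a b a", "b c", "a c c"]
    print(count_all(lines, batch_size=2))
```

This works because counts can be combined by addition in any order; the same structure applies to other aggregations whose partial results combine associatively and commutatively (sums, counts, min/max), which is also the property a MapReduce combiner relies on.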
Item | Description | Time |
---|---|---|
Proposal | You are required to submit a term project proposal one week after the midterm exam. | Nov. 22, 2019 (Fri.) |
Schedule | Due to time limits, term project presentations may start as early as Dec. 27, 2019 (Fri.). Current presentation schedule (as of Dec. 27, 2019). [NOTE] Since we have many more teams than expected, each presentation is limited to 15 minutes. Please plan accordingly. [NOTE] All presentations *must* be finished within the scheduled time slots, which are the last *three* weeks of this semester. No other time slots will be available. | Dec. 27, 2019; Jan. 3 and 10, 2020 |
Report | Each team is *required* to upload its final report after finishing its presentation. The final report should contain at least the following: | Jan. 10, 2020 (Fri.) |