Information Retrieval and Applications, Spring 2013

This course offers an introduction to the principles and concepts in information retrieval (IR), which is fundamental to modern Web search engines.
In addition to Web search, other applications of information retrieval systems will also be described.
This year, the course is offered at the graduate level and in the International Graduate Program of the College of Electrical Engineering and Computer Science (EECS). It is taught in English.

Course Information

Latest News

(Tentative) Schedule

The slides are adapted, with slight modifications, from the Stanford CS276 class.
Note: IIR - Introduction to Information Retrieval, MIR - Modern Information Retrieval, Salton - Automatic Text Processing
Week | Date | Content | Reading | Note
1 | Feb. 20, 2013 | Course Overview | |
2 | Feb. 27, 2013 | Chap. 1, Boolean retrieval | IIR Ch.1, MIR Ch.1, MIR 8.1-8.2, Salton 8.1-8.3 |
3 | Mar. 6, 2013 | Chap. 2, The term vocabulary and postings lists; Chap. 3, Dictionaries and tolerant retrieval | IIR Ch.2, MIR 8.2, 7.1-7.2, Salton 8.6; IIR Ch.3, MIR 4.2, Salton Ch.9 |
4 | Mar. 13, 2013 | Chap. 4, Index construction | IIR Ch.4, MIR Ch.8 | HW#1
5 | Mar. 20, 2013 | Sec. 5.1, Statistical properties of terms in information retrieval | IIR 5.1, MIR 6.1-6.3 | Team Registration
6 | Mar. 27, 2013 | Chap. 6, Scoring, term weighting, and the vector space model; Chap. 7, Computing scores in a complete search system | IIR Ch.6, MIR 2.5; IIR Ch.7, MIR 2.5 |
7 | Apr. 3, 2013 | (Compensation Leave for Sports Day) | | Due: HW#1
8 | Apr. 10, 2013 | Chap. 8, Evaluation in information retrieval | IIR Ch.8, MIR Ch.3 | HW#2; term project proposal
9 | Apr. 17, 2013 | Chap. 9, Relevance feedback and query expansion | IIR Ch.9, MIR Ch.5 |
10 | Apr. 24, 2013 | (Midterm Exam) | | Due: HW#2; Due: Proposal
11 | May 1, 2013 | Chap. 13, Text classification and Naive Bayes | IIR Ch.13 | Note: Only selected topics in Ch.13 will be covered.
12 | May 8, 2013 | Chap. 14, Vector space classification; Sec. 15.1, Support vector machines; Chap. 16, Flat clustering & Chap. 17, Hierarchical clustering | IIR 14.1-14.3; IIR Sec.15.1; IIR Ch.16-17, MIR 5.3 | HW#3; Note: Only selected topics in Ch.14, Sec.15.1, Ch.16, and Ch.17 will be covered.
13 | May 15, 2013 | Chap. 19, Web search basics; Chap. 20, Web crawling and indexes | IIR Ch.19, MIR Ch.13; IIR Ch.20, MIR Ch.13 |
14 | May 22, 2013 | Chap. 21, Link analysis; Advanced topics and applications of IR: CLIR, Multimedia IR, and Semantic Search | IIR Ch.21, MIR 2.7 | Note: Only selected parts of Ch.21 will be introduced.
15 | May 29, 2013 | Final Presentation: Week 1 (10 teams scheduled, 8 teams completed) | | Due: HW#3
16 | Jun. 5, 2013 | Final Presentation: Week 2 (12 teams scheduled, 11 teams completed) | |
17 | Jun. 12, 2013 | (Leave for Dragon Boat Festival) | |
18 | Jun. 19, 2013 | Final Presentation: Week 3 (12 teams scheduled, 11 teams completed) | |

Useful Links

Here are some useful links to information retrieval resources and further reading.

Programming Assignments and Projects

Please hand in your assignments before the deadline, following the instructions below.

Submission Instructions

NOTE: Programs or projects in electronic files must be submitted directly to the TA online. The URL for homework submission is: http://140.124.183.39/IR/.

If you cannot submit your work successfully, please contact the TA or the instructor.

Homeworks

There will be two to three programming homework assignments, each targeting a different IR task.
  1. HW#1: Indexing
    Due: extended to Apr. 8, 2013 (Mon.).

    [NOTE] Each team must obtain the data files from the TA. You can also use smaller sample files for testing purposes.

  2. HW#2: Query processing and Searching
    Due: extended to Apr. 28, 2013 (Sun.).
  3. HW#3: Text Classification
    Due: extended to May 29, 2013 (Wed.).
    [NOTE] Please use the ModApte split of the Reuters-21578 dataset, which contains 9,603 training documents and 3,299 test documents.
    You *only* need to classify documents into the 135 categories in Topics. (Ignore documents without Topics tags.)
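
As a rough illustration of the HW#3 data selection, the sketch below picks out the ModApte training and test documents from Reuters-21578 SGML text. It assumes the standard attributes of the distribution files (LEWISSPLIT and TOPICS on each REUTERS element, with labels in D elements inside TOPICS). The function name modapte_split, the require_topics flag, and the inline SAMPLE are hypothetical and only for illustration; skipping documents with an empty TOPICS element is one possible reading of the note above, not a confirmed requirement.

```python
import re

# Patterns for the SGML-like markup used in the Reuters-21578 files.
DOC_RE = re.compile(r"<REUTERS\s+([^>]*)>(.*?)</REUTERS>", re.DOTALL)
ATTR_RE = re.compile(r'(\w+)="([^"]*)"')
TOPICS_RE = re.compile(r"<TOPICS>(.*?)</TOPICS>", re.DOTALL)
LABEL_RE = re.compile(r"<D>(.*?)</D>")


def modapte_split(sgml_text, require_topics=True):
    """Return (train, test) lists of (NEWID, topic labels) pairs.

    ModApte keeps documents with TOPICS="YES" and assigns them to the
    training or test set according to the LEWISSPLIT attribute.
    """
    train, test = [], []
    for attr_text, body in DOC_RE.findall(sgml_text):
        attrs = dict(ATTR_RE.findall(attr_text))
        if attrs.get("TOPICS") != "YES":
            continue
        match = TOPICS_RE.search(body)
        topics = LABEL_RE.findall(match.group(1)) if match else []
        # Assumption: "documents without Topics tags" means no <D> labels.
        if require_topics and not topics:
            continue
        record = (attrs.get("NEWID"), topics)
        split = attrs.get("LEWISSPLIT")
        if split == "TRAIN":
            train.append(record)
        elif split == "TEST":
            test.append(record)
        # Any other value (e.g. NOT-USED) is outside the ModApte split.
    return train, test


# Tiny made-up sample in the same shape as the real files.
SAMPLE = """
<REUTERS TOPICS="YES" LEWISSPLIT="TRAIN" NEWID="1"><TOPICS><D>cocoa</D></TOPICS><BODY>a</BODY></REUTERS>
<REUTERS TOPICS="YES" LEWISSPLIT="TEST" NEWID="2"><TOPICS><D>grain</D><D>wheat</D></TOPICS><BODY>b</BODY></REUTERS>
<REUTERS TOPICS="NO" LEWISSPLIT="TRAIN" NEWID="3"><TOPICS></TOPICS><BODY>c</BODY></REUTERS>
<REUTERS TOPICS="YES" LEWISSPLIT="TRAIN" NEWID="4"><TOPICS></TOPICS><BODY>d</BODY></REUTERS>
<REUTERS TOPICS="YES" LEWISSPLIT="NOT-USED" NEWID="5"><TOPICS><D>earn</D></TOPICS><BODY>e</BODY></REUTERS>
"""

train_docs, test_docs = modapte_split(SAMPLE)
print(len(train_docs), len(test_docs))
```

Calling modapte_split(text, require_topics=False) keeps the unlabeled documents as well, which may be useful when checking the counts against the 9,603 and 3,299 figures quoted above.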

Projects

  1. Term Project: paper presentation or system demonstration
    Proposal: You are required to submit a term project proposal around the midterm exam.
    Time: Apr. 28, 2013 (Sunday)

    Topics: The current list of topics in term project proposals has been announced. (updated: May 22, 2013) Please check it and let us know if you have any questions.
    For paper presentations, the quality of the paper will *greatly* affect your term project score. Please select good papers *carefully*.

    Schedule: The current schedule of term project presentations has been arranged. (updated: Jun. 5, 2013)
    As of May 22, 2013, there are 54 students in 30 teams in total. Please check it and let us know if you have any questions.
    Because of time limits, the term project presentations will start on May 29, 2013.
    Each team is allocated 15-20 minutes for its presentation (and system demo).
    * [NOTE] All presentations *must* be finished within the scheduled time slots, which fall in the last *three* weeks of this semester. No other time slots will be available.
    If you have preferred time slots, please request them in your proposal as early as possible.
    Time: May 29, Jun. 5, and Jun. 19, 2013

    Report: Please upload your final report after finishing your presentation.
    The final report should contain at least the following:
    1. presentation slides (for all teams), and
    2. source code, installation/execution instructions, team members and task responsibility (for system projects)
    Time: Jun. 21, 2013 (Friday)

Exams

  1. Midterm Exam: Apr. 15-19, 2013
  2. Final Exam: Jun. 17-21, 2013

Scores

Please check the homework submission site for more details.
E-mail: jhwang AT csie . ntut . edu . tw
Created: Feb. 18, 2013.
Last Updated: Jun. 26, 2013.