F033583 Introduction to Web Search and Mining
The World Wide Web (WWW) is the largest source of open-domain information today. The popularization of the web has revolutionized the way people search for and retrieve information. This course presents the fundamental theory and practice behind web search engines and introduces some basic techniques for extracting information and mining knowledge from the web, with an emphasis on text documents. After completing this course, you should understand the basic internals of a web search engine and perhaps be able to build a small search engine of your own. You should also gain enough hands-on experience to write a crawler that extracts data from the web and to run various analytics on the acquired data.
- Feb 21, 2017: Web page opens.
- Feb 22, 2017: Assignment 1 released.
- Mar 01, 2017: Assignment 2 released.
- Mar 02, 2017: Assignment 2 updated. Please re-download.
- Mar 7, 2017: Project description released. Please see below! The peer review form is also available.
- Mar 7, 2017: Assignment 3 released.
- Mar 15, 2017: Assignment 4 released.
- Mar 21, 2017: There's no assignment this week. Yay!! Regarding the questions about adjacency list compression and HITS convergence, I added some comments and references to the slides.
- Mar 29, 2017: Assignment 5 released. There will be no class next week. Enjoy your holiday!
- May 2, 2017: Tutorial 2 released below, which answers some of the questions you had about the assignments and quizzes. If you have any questions or doubts about these answers, please contact Yuchen by midnight, May 9, 2017. After this deadline, no changes to your scores will be allowed.
Lectures: Tue 12:55-16:10, Chen Rui Qiu Building, Room 219
Instructor: Kenny Zhu - SEIEE 03-541, Phone: 3420-4592
Office hours: by appointment via email, or ask after class
Yuchen Sha - SEIEE 03-341
Office hours: Thursday 4 PM
- Introduction to Information Retrieval, Jul 7, 2008, by Christopher D. Manning, Prabhakar Raghavan, and Hinrich Schütze
- Modern Information Retrieval: The Concepts and Technology behind Search (2nd Edition), by Ricardo Baeza-Yates, Berthier Ribeiro-Neto
- Mining the Web: Discovering Knowledge from Hypertext Data, Oct 23, 2002, by Soumen Chakrabarti
- Web Information Retrieval (Data-Centric Systems and Applications), Aug 30, 2013, by Stefano Ceri and Alessandro Bozzon
- Web Data Mining: Exploring Hyperlinks, Contents, and Usage Data (Data-Centric Systems and Applications), Aug 6, 2013, by Bing Liu
- Quizzes: 30%
- Assignments: 30%
- Group Project: 40% (Project description, Peer Review Form)
||Topic||Slides||Reading||Assignment||
||Introduction, Boolean Retrieval, Vocabulary||[pdf]||IIR Ch. 1-2||Assignment 1 [tex] [pdf]||
||Tolerant Retrieval, Index Construction||[pdf]||IIR Ch. 3-4||Assignment 2 [tex] [pdf]||
||Scoring and Complete Search System||[pdf]||IIR Ch. 6-7||Assignment 3 [tex]||
||Web Basics and Crawling||[pdf]||IIR Ch. 19, 20||Assignment 4 [tex] [pdf]||
||Link Analysis and Tutorial 1|| ||IIR Ch. 21|| ||
||Text Classification, Probabilistic Retrieval Model, Language Model||[pdf]||IIR Ch. 11, 12, 21||Assignment 5 [tex]||
||Evaluation, Summary, Query Expansion|| ||IIR Ch. 8, 9|| ||
Copyright (c) Kenny Q. Zhu, 2016-2017.