F033583 Introduction to Web Search and Mining

Course Summary

The World Wide Web (WWW) is the largest source of open-domain information today, and its popularization has revolutionized the way people search for and retrieve information. This course presents the fundamental theory and practice behind web search engines and introduces some basic techniques for extracting information and mining knowledge from the web, with an emphasis on text documents. After taking this course, you should understand the basic internals of a web search engine and perhaps be able to build a small search engine of your own. You should also gain enough hands-on experience to write a crawler that extracts data from the web and to perform various analyses on the acquired data.

Administrative Information

Lectures: Fri 14:00-16:30, DZY 1-107

Instructor: Kenny Zhu - SEIEE 03-407 Phone: 3420-4592 Email: kzhu@cs.sjtu.edu.cn
Office hours: by appointment via email or ask after class

Teaching Assistants: Flora Huang and Kelsey Huang - SEIEE 03-341, Email: florahuangss@163.com and shangleihuang@sjtu.edu.cn
Office hours: Thursday 4-5 PM

Reference Textbooks:

  1. Introduction to Information Retrieval, Jul 7, 2008, by Christopher D. Manning, Prabhakar Raghavan, and Hinrich Schütze
  2. Modern Information Retrieval: The Concepts and Technology behind Search (2nd Edition), by Ricardo Baeza-Yates, Berthier Ribeiro-Neto
  3. Mining the Web: Discovering Knowledge from Hypertext Data, Oct 23, 2002, by Soumen Chakrabarti
  4. Web Information Retrieval (Data-Centric Systems and Applications), Aug 30, 2013, by Stefano Ceri and Alessandro Bozzon
  5. Web Data Mining: Exploring Hyperlinks, Contents, and Usage Data (Data-Centric Systems and Applications), Aug 6, 2013, by Bing Liu

Grading:


  1. Quizzes: 30%
  2. Assignments: 30%
  3. Group Project: 40% (Project Description, Peer Review Form)


Lecture | Date | Topic | Slides | Resources | Homework
--- | --- | --- | --- | --- | ---
1 | 03/01/2019 | Introduction, Boolean Retrieval, Vocabulary | [pdf] | IIR Ch. 1-2 | Assignment 1 [tex] [pdf]
2 | 03/10/2019 | Tolerant Retrieval | [pdf] | IIR Ch. 3 | Assignment 2 [tex] [pdf]
3 | 03/15/2019 | Index Construction and Index Compression (I) | [pdf] | IIR Ch. 4-5 | Assignment 3 [tex] [pdf]
4 | 03/23/2019 | Index Compression (II), Scoring, Term Weighting and Vector Space Model | [pdf] | IIR Ch. 5-6 | Assignment 4 [tex] [pdf]
5 | 03/30/2019 | Scoring and Complete Search System, Web Basics (I) | [pdf] | IIR Ch. 19, 20 | Assignment 5 [tex] [pdf]
6 | 04/12/2019 | Web Basics (II) | [pdf] | IIR Ch. 20 | Assignment 6 [tex] [pdf]
7 | 04/18/2019 | Link Analysis | [pdf] | IIR Ch. 21 | Assignment 7 [tex] [fig] [pdf]
8 | 04/20/2019 | Evaluation | [pdf] | IIR Ch. 8 | Assignment 8 [tex] [pdf]
10 | 05/05/2019 | Text Classification, Probabilistic Retrieval Model, Language Model | [pdf] | IIR Ch. 10, 11, 12 | Assignment 9 [tex] [pdf]

Copyright (c) Kenny Q. Zhu, 2016-2019.