CS7330 Introduction to Web Search and Mining
The World Wide Web (WWW) is the largest source of open-domain information today. The popularization of the web has revolutionized the way people search for and retrieve information. This course presents the fundamental theory and practice behind web search engines and introduces some basic techniques for extracting information and mining knowledge from the web, with an emphasis on text documents. After taking this course, you should understand the basic internals of a web search engine and perhaps be able to build a small search engine of your own. You should also gain enough hands-on experience to write a crawler that extracts data from the web and to run various data analytics on the acquired data.
- 2023/01/08: Course website reopens.
- 2023/02/18: Assignment 1 released.
- 2023/02/26: Assignment 2 released.
- 2023/03/05: Assignment 3 released.
- 2023/03/11: Assignment 4 released.
- 2023/03/16: Starting this week, I no longer accept quiz submissions on Canvas. Instead, all submissions must be handed to me in person immediately after class.
- 2023/03/17: Assignment 5 released.
- 2023/03/25: Assignment 6 released.
- 2023/04/03: Assignment 7 released.
- 2023/04/09: Project description released below. Please start bidding for your group project topic as soon as possible.
At the end of the semester, by the project report deadline, each group member also needs to submit a peer-review form individually to the TAs. Please download the peer-review form below.
- 2023/04/09: Assignment 8 released.
- 2023/04/10: Project B of the project description was updated. Please download again.
- 2023/04/16: Assignment 9 released.
- 2023/04/29: Assignment 10 released.
Lectures: Fri 12:55-15:20, DZY 2-106
Instructor: Kenny Zhu
- SEIEE 03-407 Phone: 3420-4592
Office hours: by appointment via email or ask after class
Apple Chen and Xukai Wang - SEIEE 03-329
Office hours: Thursday 4-5 PM
- Introduction to Information Retrieval, Jul 7, 2008, by Christopher D. Manning, Prabhakar Raghavan, and Hinrich Schütze
- Modern Information Retrieval: The Concepts and Technology behind Search (2nd Edition), by Ricardo Baeza-Yates, Berthier Ribeiro-Neto
- Mining the Web: Discovering Knowledge from Hypertext Data, Oct 23, 2002, by Soumen Chakrabarti
- Web Information Retrieval (Data-Centric Systems and Applications), Aug 30, 2013, by Stefano Ceri and Alessandro Bozzon
- Web Data Mining: Exploring Hyperlinks, Contents, and Usage Data (Data-Centric Systems and Applications), Aug 6, 2013, by Bing Liu
- Other Research Papers
- Quizzes: 30%
- Assignments: 30%
- Group Project: 40%
Peer Review Form
||Introduction, Boolean Retrieval, Vocabulary||[pdf]
||IIR Ch. 1-2||
||IIR Ch. 3||Assignment 2 [tex] [pdf]
||Index Construction and Index Compression (I)||[pdf]
||IIR Ch. 4-5||Assignment 3 [tex]
||Index Compression (II), Scoring, Term Weighting and Vector Space Model||[pdf]
||IIR Ch. 5-6||Assignment 4 [tex] [pdf]
||Scoring and Complete Search System, Web Basics (I)||
||IIR Ch. 19, 20||Assignment 5 [tex]
||Web Basics (II)||[pdf]
||IIR Ch. 20||Assignment 6 [tex] [pdf]
||IIR Ch. 21||Assignment 7 [tex]
||Crawling and Evaluation||
||IIR Ch. 8||Assignment 8 [tex]
||Text Classification, Probabilistic Retrieval Model,
||IIR Ch. 10, 11, 12||Assignment 10 [tex] [pdf]
||Next Generation Information Retrieval||[pdf]
||Refer to slides||Assignment 10 [tex] [pdf]
Copyright (c) Kenny Q. Zhu, 2016-2023.