Internet-based Information Extraction Technologies

 

Teacher:  Fang Li

Office:   SEIEE Building No. 3, Room 533; Tel: 34205423

Office Hours: Thursday afternoon (14:00~16:00)

 

Teaching Assistant: Zhe Ye

 

Lecture Time & Venue: 

Every Thursday (10:00~11:40 AM) from September 2017 to December 2017.

  Place:

Dong Xia Yuan 213 (东下院213): week 1 and weeks 8 to 15

  Dong Shang Yuan 115 (东上院115): weeks 2 to 7

 

Textbook: 

Information Extraction: Algorithms and Prospects in a Retrieval Context by Marie-Francine Moens. Published by Springer (P.O. Box 17, 3300 AA Dordrecht, The Netherlands). ISBN-13 978-1-4020-4993-4 (e-book)

 

References:

1)    Sunita Sarawagi, Information Extraction, Foundations and Trends in Databases, vol. 1, no. 3 (2007), 261-377.

2)    Jerry R. Hobbs, Ellen Riloff, Information Extraction, chapter 21 of Handbook of Natural Language Processing (2010).

3)    Ralph Grishman, Information Extraction: Capabilities and Challenges (2012).

 

Introduction:

Internet-based information extraction (IE) is the task of deriving structured information from unstructured text and semi-structured web pages. More succinctly, information extraction means finding named entities, relations, and events in content on the Internet.

The course gives an overview of the history and technologies of information extraction. It presents state-of-the-art research methods and focuses on real-world applications.

Readings will be based on the textbook and references. Grades will be based on class participation and a project. There is no final examination for this course. Students are encouraged to form groups to complete a project and write a report. There are three tasks. Each group chooses one of the tasks and presents its project in the class workshop held at the end of the semester.
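To make the idea of turning text into structured records concrete, here is a minimal sketch of a pattern-based extractor. It is not taken from the course materials: the sentence template, the Relation record, and the example names are all assumptions chosen for illustration, and practical systems replace the regular expression with trained entity and relation models.

```python
import re
from typing import List, NamedTuple


class Relation(NamedTuple):
    """One structured record extracted from free text (hypothetical schema)."""
    person: str
    role: str
    organization: str


# Assumed template: "<Person> is the <role> of <Organization>".
# Names are approximated as runs of capitalized words; this is a toy
# stand-in for a real named-entity recognizer.
NAME = r"[A-Z][a-z]+(?: [A-Z][a-z]+)*"
PATTERN = re.compile(
    rf"(?P<person>{NAME}) is the (?P<role>[a-z ]+?) of (?P<org>{NAME})"
)


def extract_relations(text: str) -> List[Relation]:
    """Return every (person, role, organization) triple matched in text."""
    return [
        Relation(m.group("person"), m.group("role"), m.group("org"))
        for m in PATTERN.finditer(text)
    ]


if __name__ == "__main__":
    sample = "Alice Wang is the chief engineer of Acme. Bob Lee is the director of Globex."
    for relation in extract_relations(sample):
        print(relation)
```

Run on the made-up sample sentence, the sketch yields two records, one per sentence, each holding a person, a role, and an organization. A fixed template like this misses any sentence phrased differently, which is the usual motivation for moving from hand-written patterns to learned extractors.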

 

Course Topics and Readings

Week | Topics | Slides | Readings
1st | Motivation & Course Introduction | Lecture 1 | Video; NELL system
2nd | Basic Knowledge for IE | Lecture 2 | POSforChinese; WordVector1; WordVector2
3rd | IE Concepts | Lecture 3 | Chapter 1 of textbook; Chapter 2 of textbook; Chapter 8 of textbook
4th | Holiday | |
5th | Named Entity Extraction (rule-based) | Lecture 4 | Chapter 4 of textbook; Reference
6th | Named Entity Extraction (machine learning) | Lecture 5 | Chapter 5 of textbook; CRFmodelforORG
7th | Relation Extraction (pattern-based, supervised) | Lecture 6 | SVMguide
8th | Relation Extraction (semi-supervised & distant-supervised) | Lecture 7 | DistantSupervisionMethod
9th | Event Extraction | Lecture 8 | Chapter 9 of textbook
10th | Opinion Mining | Lecture 9 | Sentiment Analysis; SA2016competitionRef
11th | Opinion Words Mining | Lecture 10 | Turney Algorithm; Inducing Domain-specific Sentiment Lexicons from Unlabeled Corpora
12th | Webpage IE | Lecture 11 | surveyofwebIE
13th | IE System (1) | Lecture 12 | Lixto, RoadRunner
14th | IE System (2) | Lecture 13 | KnowItAll
15th | IE System (3) | Lecture 14 | TextRunner
16th | Student Workshop | Schedule | Each group presents their work

Note:

The content of each lecture may change. The slides listed above give only a general outline of each lecture; classroom exercises and discussions are not included in them.

References from Industry:  

1)    Knowledge-based Information Extraction, taught by an expert from Alibaba in 2013.

2)    Information Extraction in E-commerce, taught by an expert from Alibaba in 2013.

Prerequisites

Data Structures, a Programming Language, Natural Language Processing

Grading:

1.       Attendance & Classroom Discussions (40%)  (weeks 1 to 15)

2.       Workshop Presentation (20%)  (in the 16th week)

3.       Evaluation of the Algorithm or System (40%)  (in the 17th week)

Project tasks:

1)    Specific Relation Extraction. Please see the training data (for employment, chief of, and location extraction), another training data set (for four kinds of employment relationship extraction), and student work1 and student work2 presented in previous years for reference. Or

2)    Positive and negative Sentiment Analysis. Please see the training data for an example. Or

3)    Exploring a new extraction task for a particular application, such as news extraction.

About the evaluation: (the time and place will be announced later)     

1)    Task1 (employment relation extraction): input file format and output file format.

2)    Task2 (positive and negative sentiment analysis): input file format and output file format.

3)    Specification for evaluation.