Internet-based Information Extraction Technologies

 

Teacher:  Fang Li

Office: SEIEE Building No. 3, Room 533

Office Hours: Tuesday, 10-11 AM

 

References: 

1. Sonit Singh. Natural Language Processing for Information Extraction (2018).

2. Marie-Francine Moens. Information Extraction: Algorithms and Prospects in a Retrieval Context. Springer (P.O. Box 17, 3300 AA Dordrecht, The Netherlands). ISBN-13 978-1-4020-4993-4 (e-book).

3. Jerry R. Hobbs, Ellen Riloff. Information Extraction. Chapter 21 of Handbook of Natural Language Processing (2010).

4. Ralph Grishman. Information Extraction: Capabilities and Challenges (2012).

 

Introduction:

Internet-based information extraction (IE) derives structured information from unstructured text and semi-structured web pages. More succinctly, information extraction means finding the names of entities, relations, and events in Internet content and free text.
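
To make the definition concrete, the following is a minimal sketch of pattern-based extraction of one relation type from free text. The sentence, the names in it, the `employed_by` label, and the hand-written pattern are all hypothetical illustrations, not course data or a method prescribed by the course.

```python
import re
from typing import List, NamedTuple


class Relation(NamedTuple):
    """A structured record extracted from unstructured text."""
    subject: str
    predicate: str
    obj: str


# One or more capitalised words, e.g. "Alice Smith" (a crude stand-in for
# real named entity recognition).
NAME = r"[A-Z][a-z]+(?: [A-Z][a-z]+)*"

# Hypothetical hand-written pattern for a single relation type.
EMPLOYMENT = re.compile(rf"(?P<person>{NAME}) works for (?P<org>{NAME})")


def extract_employment(text: str) -> List[Relation]:
    """Return one (person, employed_by, organisation) record per pattern match."""
    return [
        Relation(m.group("person"), "employed_by", m.group("org"))
        for m in EMPLOYMENT.finditer(text)
    ]


# Made-up example text.
text = "Alice Smith works for Acme Labs. Bob Jones works for Globex Research in Paris."
relations = extract_employment(text)
for r in relations:
    print(r)
```

Each printed record is a structured triple recovered from the sentence. A single regular expression like this is brittle; it only illustrates the input and output of the task.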

The course gives an overview of the history and technologies of information extraction. It presents state-of-the-art research methods and focuses on real-world applications.

Readings are drawn from conference articles. Grades are based on class participation and projects; there is no final examination. Students are encouraged to form groups to complete the projects and write reports. There are three tasks; each group chooses two of them and presents its project in the class workshop at the end of the semester.

 

Course Topics and Readings

Weeks | Topics | Slides | Readings & References
1st | Motivation & Course Introduction | Lecture 1 | NELL system
2nd | Basic Knowledge for IE | Lecture 2; WordVectorTutorial | POSforChinese; WordVector1; WordVector2
3rd | IE Concepts | Lecture 3; PPT(TA) | Chapter 1 of textbook; Chapter 2 of textbook; Chapter 8 of textbook
4th | Holiday | |
5th | Named Entity Extraction | Lecture 4-5 | Chapter 4 of textbook; CRFmodelforORG
6th | Named Entity Extraction (Discussion & group presentation) | Article Reading and NotesSample | Chapter 5 of textbook; Reference1; Reference2; Reference3
7th | Relation Extraction (pattern-based, supervised, semi-supervised) | Lecture 6 | Reference, SVMguide
8th | Relation Extraction (distant-supervised, deep learning) & group discussion | Lecture 7 | DistantSupervisionMethod; TransE(deepLearning)
9th | Event Extraction | Lecture 8 | Chapter 9 of textbook; Template-based Event extraction without template; ALanguage-IndependentNeuralNetworkforEventDetection
10th | Opinion Mining | Lecture 9 | Sentiment Analysis; PolarityEmbeddingFusionforRobustSentimentAnalysis; SAwithEnsembleofConvolutionalNeuralNetworkswithDistantSupervision
11th | Opinion Mining | Lecture 10 | Turney Algorithm; Inducing Domain-specific Sentiment Lexicons from Unlabeled Corpora
12th | Webpage IE | Lecture 11 | surveyofwebIE
13th | IE System | Lecture 12; Lecture 13, 14 | Lixto, RoadRunner; KnowItAll, TextRunner, Open IE
14th | Knowledge Graph (new) | |
15th~16th | Student Workshop | |

Each group presents its work in the following order: the task, its problems, and analysis (2 minutes); general approach (3 minutes); results (3 minutes); open questions and challenges (2 minutes); Q&A (5 minutes).

Note:

The content of each lecture may change from year to year. The slides above give only general information about each lecture; classroom exercises and discussions are not included. New teaching materials are available on Canvas.

Prerequisites

Data Structure, Programming Language, Natural Language Processing

Grading:

1.      Attendance & Classroom Discussions (40%)

2.      Reading & Writing (20%)

3.      Algorithm Design (40%)

Project tasks:

1)    Specific Relation Extraction. Please see the training data (for employment, chief of, and location extraction), another set of training data (for extraction of four kinds of employment relationships), and student work1 and student work2, presented in the last few years, for reference.

2)    Positive and Negative Sentiment Analysis. Please see the training data for an example.

3)    Web Page Extraction

About the evaluation:    

1)    Task 1 (employment relation extraction): input file format and output file format.

2)    Task 2 (positive and negative sentiment analysis): input file format and output file format.

3)    Specification for evaluation and evaluation tools.
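
The evaluation specification itself is linked rather than reproduced on this page. As an illustration only, the sketch below assumes that extracted items are scored against a gold standard with precision, recall, and F1. These are common measures for extraction tasks, but this page does not confirm that the course tools use them. The tuple format and the sample data are hypothetical and do not reflect the actual input or output file formats.

```python
from typing import Hashable, Iterable, Tuple


def precision_recall_f1(
    predicted: Iterable[Hashable], gold: Iterable[Hashable]
) -> Tuple[float, float, float]:
    """Score a set of predicted items against a set of gold items.

    An item counts as correct only if it matches a gold item exactly.
    """
    predicted_set, gold_set = set(predicted), set(gold)
    true_positives = len(predicted_set & gold_set)
    precision = true_positives / len(predicted_set) if predicted_set else 0.0
    recall = true_positives / len(gold_set) if gold_set else 0.0
    if precision + recall == 0:
        return precision, recall, 0.0
    f1 = 2 * precision * recall / (precision + recall)
    return precision, recall, f1


# Hypothetical (person, relation, organisation) tuples.
gold = [("Alice Smith", "employed_by", "Acme Labs"),
        ("Bob Jones", "employed_by", "Globex Research")]
predicted = [("Alice Smith", "employed_by", "Acme Labs"),
             ("Bob Jones", "employed_by", "Acme Labs")]

p, r, f = precision_recall_f1(predicted, gold)
print(f"precision={p:.2f} recall={r:.2f} f1={f:.2f}")
```

Here one of the two predictions is correct and one of the two gold items is found, so all three scores come out to 0.5.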