Speaker: Zhe Wang
Time: 10:00-11:30, 23rd April
Venue: SEIEE 3-404
Abstract: Keyword search engines have been the state-of-the-art
information retrieval tool over large text corpora for two decades. To date,
most search engines have little understanding that keywords and documents refer
to entities and relations in real life. Better search results and experience
can be achieved by understanding entities and relations in documents as well as
in queries. A knowledge base (KB) containing relevant entities and relations
should be the backbone of any application that is fueled by text. Given a large
amount of text data, a system is needed that can automatically construct a
knowledge base using statistical machine learning (SML) methods, manage the
uncertainty inherent in the extracted knowledge, and maintain it over time.
In this talk, I first summarize the major results from BayesStore, a probabilistic
database system that natively supports SML models and various inference
algorithms to perform query-driven knowledge extraction from text and
probabilistic query processing over uncertain extractions. Results show that
BayesStore can significantly improve performance and answer quality for queries
over unstructured text.
With BayesStore as a foundation, I propose to build a probabilistic knowledge base
(ProbKB) system that deeply integrates SML methods with scalable data
processing frameworks. A ProbKB system should be designed to support various
aspects in the life of a knowledge base (KB) including KB extraction,
expansion, evolution, and integration. I will discuss in detail the challenges
and our current progress in the following three research directions: (1)
scalable statistical information extraction; (2) probabilistic deductive
inference and incremental maintenance over large uncertain KBs; and (3)
probabilistic knowledge integration from both SML and crowd-sourcing.
Bio: Daisy Zhe Wang is an Assistant Professor in the CISE
department at the University of Florida. She obtained her Ph.D. degree from the
EECS Department at the University of California, Berkeley in 2011 and her
Bachelor’s degree from the ECE Department at the University of Toronto in 2005.
At Berkeley, she was a member of the Database Group and the AMP/RAD Lab. She is
particularly interested in bridging scalable data management and processing
systems with probabilistic models and statistical methods. She currently
pursues research topics such as probabilistic databases, probabilistic knowledge
bases, large-scale inference engines, query-driven interactive machine
learning, and crowd-assisted machine learning. Her research is currently funded
by DARPA, Google, Pivotal, Greenplum/EMC, Survey Monkey, and the Law School at UF.