Over the past several years, my research has been funded by multiple Army and DARPA projects. The main topics are listed below.
Liberal Information Extraction
Traditional IE approaches tend to work top-down: they learn effective features for each predefined type from human annotations, and then discover facts specific to those predefined types. This paradigm has at least two limitations that make it difficult to apply directly to new domains or languages: (1) it is not fully automatic, because it requires a human in the loop to define the type schema and to annotate data, both of which are very expensive; (2) a predefined schema can only cover a limited number of types and relations.
We take a fresh look at the IE problem and design a cold-start liberal information extraction paradigm. We hypothesize that facts, such as entities or events, usually share similar types when they occur in similar contexts and scenarios. Based on this hypothesis, we have developed liberal entity and event extraction frameworks, as well as a zero-shot event extraction approach. This work has been published in ACL'2016, BigData'2017, and ACL'2018.
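As a toy illustration of the hypothesis above (a minimal sketch, not the published framework), the snippet below groups event triggers whose context vectors are similar, so that each group can be read as an induced type without any predefined schema. The vectors, the threshold, and the function name `group_by_context` are all made up for the example.

```python
from math import sqrt


def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = sqrt(sum(a * a for a in u))
    norm_v = sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v) if norm_u and norm_v else 0.0


def group_by_context(mentions, threshold=0.8):
    """Greedy single-pass grouping (hypothetical helper).

    Each mention joins the first cluster whose seed vector is at least
    `threshold` similar; otherwise it starts a new cluster.
    """
    clusters = []  # list of (seed_vector, [mention names])
    for name, vec in mentions:
        for seed, members in clusters:
            if cosine(seed, vec) >= threshold:
                members.append(name)
                break
        else:
            clusters.append((vec, [name]))
    return [members for _, members in clusters]


# Hand-made context vectors, assumed for illustration only.
mentions = [
    ("attacked", [0.9, 0.1, 0.0]),
    ("bombed", [0.8, 0.2, 0.1]),
    ("elected", [0.1, 0.9, 0.2]),
    ("voted", [0.0, 0.8, 0.3]),
]
clusters = group_by_context(mentions)
print(clusters)
```

In this toy setting the conflict-related and election-related triggers fall into separate groups; a real system would derive the vectors from context rather than assigning them by hand.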
Low Resource Languages for Emergent Incidents
In many emergent situations, such as disease outbreaks and natural disasters, there is great demand to rapidly develop a Natural Language Processing (NLP) system, such as a name tagger, for a surprise incident language (IL) with very few resources. Traditional supervised learning methods that rely on large-scale manual annotation would be too costly. A promising direction is therefore to transfer available resources and annotations from high-resource languages to low-resource languages. To achieve this goal, we first develop a common semantic space framework that serves as a bridge across thousands of languages, and then efficiently transfer knowledge across languages. Related work has been published in EMNLP'2018 and NAACL'2019.
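The idea of transferring annotations through a shared space can be sketched in a few lines. This is a minimal sketch under strong assumptions, not the published method: word vectors for both languages are assumed to already live in one common space, and each unlabeled low-resource word simply takes the label of its nearest labeled high-resource neighbor. The vectors, the tokens `il_word_1` and `il_word_2`, and the function `transfer_labels` are hypothetical.

```python
from math import sqrt


def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = sqrt(sum(a * a for a in u))
    norm_v = sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v) if norm_u and norm_v else 0.0


def transfer_labels(labeled_high, unlabeled_low):
    """Give each low-resource word the label of its nearest high-resource neighbor."""
    result = {}
    for word, vec in unlabeled_low:
        # Pick the labeled entry whose vector is most similar in the shared space.
        _, _, best_label = max(labeled_high, key=lambda item: cosine(item[1], vec))
        result[word] = best_label
    return result


# Toy vectors, assumed to be already aligned in one common semantic space.
labeled_high = [
    ("London", [0.9, 0.1], "LOC"),
    ("Obama", [0.1, 0.9], "PER"),
]
unlabeled_low = [
    ("il_word_1", [0.8, 0.2]),
    ("il_word_2", [0.2, 0.7]),
]
tags = transfer_labels(labeled_high, unlabeled_low)
print(tags)
```

Nearest-neighbor transfer is used here only because it is the simplest way to show annotations crossing languages through a shared representation.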
Knowledge Network Construction and Population