Web-Scale Information Extraction
Dr. Alexander Yates
Information Science and Technology Center, Temple University
DATE: Wednesday, March 26, 2008
COFFEE: 2:00 – 2:30 PM
TIME: 2:30 – 3:30 PM
LOCATION: GITC Building
The Web is a gold mine for those seeking information. In many ways, however, the information can be difficult to find in that vast mine, even with commercial search engine technology. Information extraction is a research area that endeavors to find all of the nuggets of information in a body of text. Recent advances in domain-independent information extraction have led to new systems capable of extracting enormous quantities of detailed information from the Web and other large corpora. This talk will discuss ongoing research into some of these new systems, highlighting the challenges of scaling up to the size of the Web and the novel techniques developed to meet them. It will then present some extensions to information extraction, especially synonym resolution, that make it a promising area of research, and discuss the larger goal of language-independent information extraction. It will conclude with a discussion of the potential applications of such an advance.
Alexander Yates is an Assistant Professor in the Information Science and Technology Center, part of Temple University's Computer and Information Sciences Department. He received his Ph.D. in Computer Science and Engineering from the University of Washington in 2007 and his B.A. in Applied Mathematics from Harvard University in 2001. His research interests include text mining and natural language understanding, as well as machine learning and probabilistic methods.