Using Machine Learning Approaches for Displaying Query Results in Search Engines
Boğaziçi University, Computer Engineering Department, Istanbul, Turkey
Visiting Professor at TALP Research Center, UPC
A search engine is a type of web information retrieval system that is
frequently used by end users. By looking at the information displayed
by the search engine in response to a query, the users try to locate
the relevant pages and load them to find answers to their information
needs. Current search engines usually extract a few lines from the
contents of a web page that include the query terms and display these
as a representation of the document to the user. Such extracts pose
two difficulties for a user judging the relevance of a page: they are
too short to convey much information, and they focus only on the
query words. In this talk, we present a novel
approach that displays the query results in the form of summaries of
the web pages. We propose a system that performs document layout
analysis and learns a summarization model by using a number of machine
learning techniques. The summarization framework makes use of new
heuristics that take the output of the layout analysis into account.
Experiments on two standard datasets showed that the proposed
methodology significantly outperforms traditional search engines.
Tunga Güngör is an associate professor in the Department of Computer
Engineering at Boğaziçi University, Istanbul, Turkey. He received his
PhD degree from the same department. His research interests include
natural language processing, machine translation, machine learning,
pattern recognition, and automated theorem proving. He has published
about 60 scientific articles, supervised three PhD students, and
participated in several research projects and in the organization of
conferences. He is currently a visiting professor at the TALP Center
(Center for Language and Speech Technologies and Applications) at
Universitat Politècnica de Catalunya.