An Investigation of the Latent Semantic Analysis Technique for Document Retrieval

Abstract

The application of latent semantic analysis (LSA) to information retrieval promises better performance by overcoming some of the limitations that plague traditional term-matching techniques. These techniques retrieve documents by matching the terms in a query against the terms in each document. However, they have not adequately served users’ needs. While users want to search for information based on conceptual content, natural language limits how precisely such concepts can be expressed. It presents the problems of synonymy (several words may have the same meaning) and polysemy (a single word may have several meanings). Because of these problems, the individual words in a user’s query may not explicitly specify the intended concept, which may result in the retrieval of irrelevant documents. LSA appears to be a promising technique for overcoming these natural language problems, especially synonymy. It exploits the global relationships between terms and documents to map both into a proximity space in which closely related terms and documents lie close to each other. Queries are then mapped into this space, and documents are retrieved based on similarity measures. In this report, the performance of LSA in document retrieval is investigated and compared with that of traditional term-matching techniques.
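
As an illustration of the contrast drawn above, the following is a minimal sketch of LSA retrieval next to plain term matching. It is not the implementation evaluated in this report. The three-document toy corpus, the use of raw term counts, the choice of k = 2 dimensions, the fold-in rule for queries, and cosine similarity as the similarity measure are all assumptions made for the example.

```python
import numpy as np

# Hypothetical toy corpus: "car" and "automobile" never co-occur,
# but both occur together with "engine".
terms = ["car", "automobile", "engine", "flower", "petal"]
docs = ["car engine", "automobile engine", "flower petal"]

# Term-document count matrix: rows are terms, columns are documents.
A = np.array([[doc.split().count(t) for doc in docs] for t in terms],
             dtype=float)


def lsa_fit(A, k):
    """Truncated SVD of A, keeping the k largest singular values."""
    U, s, Vt = np.linalg.svd(A, full_matrices=False)
    # Rows of the last return value are document coordinates in the k-dim space.
    return U[:, :k], s[:k], Vt[:k, :].T


def fold_in(q, U_k, s_k):
    """Map a query term vector into the latent space (assumed rule: q^T U_k S_k^-1)."""
    return (q @ U_k) / s_k


def cosine(a, b):
    """Cosine similarity, defined as 0 when either vector is all zeros."""
    denom = np.linalg.norm(a) * np.linalg.norm(b)
    return float(a @ b / denom) if denom > 0 else 0.0


# Query containing the single term "car".
q = np.array([1.0 if t == "car" else 0.0 for t in terms])

# Term matching: count of query terms shared with each document.
term_match_scores = q @ A

# LSA: compare the folded-in query with each document in the latent space.
U_k, s_k, D_k = lsa_fit(A, k=2)
q_hat = fold_in(q, U_k, s_k)
lsa_scores = [cosine(q_hat, d) for d in D_k]

print("term matching:", term_match_scores)
print("LSA cosine:   ", np.round(lsa_scores, 3))
```

In this toy corpus, term matching gives the "automobile engine" document a score of 0 for the query "car", because the two share no terms. In the two-dimensional latent space the same document has a cosine similarity close to 1 with the query, since "car" and "automobile" both co-occur with "engine". This is the synonymy effect described above, shown on a deliberately small example.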