The question answering system will perform two tasks: document retrieval and passage retrieval. The system will have access to a corpus of text documents. When presented with a query, document retrieval will first identify which document(s) are most relevant to the query. The top document(s) will then be subdivided into passages (in this case, sentences) so that the passage most relevant to the question can be determined.
To find the most relevant documents, we use tf-idf, which combines the term frequency of each query word in a document with that word's inverse document frequency across the corpus. Once the most relevant documents are found, we score passages using a combination of inverse document frequency and a query term density measure.
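
The two scoring steps can be sketched roughly as follows. This is a minimal illustration under several assumptions, not a definitive implementation: the function names (`compute_idfs`, `top_documents`, `top_passages`) are hypothetical, documents and passages are assumed to be already tokenized into lists of words, IDF is taken as the natural log of the number of documents divided by the number of documents containing the word, and query term density is used only to break ties between passages with equal IDF scores, which is one plausible way to combine the two measures.

```python
import math
from collections import Counter


def compute_idfs(documents):
    """Map each word to ln(N / number of documents containing it).

    `documents` maps a name to a list of words (assumed pre-tokenized).
    """
    total = len(documents)
    doc_counts = Counter()
    for words in documents.values():
        doc_counts.update(set(words))  # count each word once per document
    return {word: math.log(total / count) for word, count in doc_counts.items()}


def top_documents(query, documents, idfs, n=1):
    """Rank documents by the sum of tf * idf over the query words."""
    def score(name):
        tf = Counter(documents[name])
        return sum(tf[word] * idfs.get(word, 0.0) for word in query)

    return sorted(documents, key=score, reverse=True)[:n]


def top_passages(query, passages, idfs, n=1):
    """Rank passages by summed IDF of matched query words.

    Ties are broken by query term density: the fraction of words in the
    passage that also appear in the query (an assumed combination rule).
    """
    def score(passage):
        words = passages[passage]
        matched_idf = sum(idfs.get(word, 0.0) for word in query if word in words)
        density = sum(1 for word in words if word in query) / max(len(words), 1)
        return (matched_idf, density)

    return sorted(passages, key=score, reverse=True)[:n]
```

Here `query` is a set of words, and `passages` maps each sentence to its list of words. Returning a tuple from `score` lets Python's sort compare the IDF sum first and fall back to density only when the sums are equal.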