LSA Website Executive Summary
This website is organized into three content areas: Information, Applications, and Demonstrations. Detailed instructions on how to best use the website are available on the 1st Time User/Help File page. We recommend you view that page after reading this summary.
Latent Semantic Analysis (LSA) captures the essential relationships between text documents and word meaning, or semantics: the knowledge base that must be accessed to evaluate the quality of content. Several educational applications that employ LSA have been developed: (1) selecting the most appropriate text for learners with variable levels of background knowledge, (2) automatically scoring the content of an essay, and (3) helping students effectively summarize material.
An LSA Primer
Latent Semantic Analysis (LSA) is a mathematical/statistical technique for extracting and representing the similarity of meaning of words and passages by analysis of large bodies of text. It uses singular value decomposition, a general form of factor analysis, to condense a very large matrix of word-by-context data into a much smaller, but still large (typically 100-500 dimensional), representation (Deerwester, Dumais, Furnas, Landauer & Harshman, 1990). The right number of dimensions appears to be crucial; the best values yield simulations of human judgments up to four times as accurate as those from ordinary co-occurrence measures.
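The dimension reduction described above can be sketched on a toy matrix. This is only an illustration under stated assumptions: the word list, the counts, the function name `reduce_dimensions`, and the choice of k=2 are all made up for the example, and real LSA spaces are built from far larger matrices that are usually weighted before decomposition.

```python
import numpy as np

# Toy word-by-context count matrix (hypothetical data): rows are words,
# columns are contexts such as passages.
words = ["doctor", "nurse", "hospital", "guitar", "piano"]
counts = np.array([
    [2, 1, 0, 0],
    [1, 2, 0, 0],
    [1, 1, 0, 0],
    [0, 0, 2, 1],
    [0, 0, 1, 2],
], dtype=float)

def reduce_dimensions(matrix, k):
    """Return k-dimensional word and context vectors from a truncated SVD."""
    # Compact decomposition: matrix = u @ diag(s) @ vt, singular values
    # sorted from largest to smallest.
    u, s, vt = np.linalg.svd(matrix, full_matrices=False)
    word_vectors = u[:, :k] * s[:k]         # one row per word
    context_vectors = vt[:k, :].T * s[:k]   # one row per context
    return word_vectors, context_vectors

# Keep only the two largest dimensions of the toy space.
word_vectors, context_vectors = reduce_dimensions(counts, k=2)
```

In this toy space the medical words share one dimension and the musical words the other, which is the kind of condensation the paragraph describes, only at a much smaller scale.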
The similarity between resulting vectors for words and contexts, as measured by the cosine of their contained angle, has been shown to closely mimic human judgments of meaning similarity and human performance based on such similarity in a variety of ways. For example, after training on about 2,000 pages of English text, LSA scored as well as average test-takers on the synonym portion of TOEFL, the ETS Test of English as a Foreign Language (Landauer & Dumais, 1997). After training on an introductory psychology textbook it achieved a passing score on a multiple-choice exam (Landauer, Foltz & Laham, in prep). LSA significantly improves automatic information retrieval by allowing user requests to find relevant text on a desired topic even when the text contains none of the words used in the query (Dumais, 1991, 1994).
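The cosine measure mentioned above, and its use for ranking documents against a query, might look like the following minimal sketch. The three-dimensional vectors, document names, and the helper `rank_by_similarity` are hypothetical; in practice the vectors would come from a trained LSA space with hundreds of dimensions.

```python
import math

def cosine(a, b):
    """Cosine of the angle between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0  # convention chosen for this sketch: zero vectors match nothing
    return dot / (norm_a * norm_b)

def rank_by_similarity(query_vector, document_vectors):
    """Return (name, cosine) pairs sorted from most to least similar."""
    scored = [(name, cosine(query_vector, vec))
              for name, vec in document_vectors.items()]
    return sorted(scored, key=lambda pair: pair[1], reverse=True)

# Hypothetical vectors standing in for documents in a reduced LSA space.
documents = {
    "cardiology notes": [0.9, 0.1, 0.0],
    "jazz history": [0.0, 0.2, 0.9],
    "nursing handbook": [0.7, 0.3, 0.1],
}
query = [0.8, 0.2, 0.0]
ranking = rank_by_similarity(query, documents)
```

Because the comparison is made between vectors rather than between literal words, a document can rank highly even if it shares no terms with the query.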
About the Demonstrations of Educational Applications
The Educational Text Selection demonstration grew out of an empirical examination of a theoretical relationship proposed by Walter Kintsch: that a reader's ability to learn from a text depends on the match between the reader's background knowledge and the difficulty of the text information. LSA is used as a means of automatically predicting how much readers will learn from texts based on the estimated conceptual match between their knowledge of the topic and the information in the text they read.
A demonstration of using LSA in Essay Scoring is also available. To assess the quality of essays, LSA is first trained on domain-representative text. Then student essays are characterized by LSA vectors of their contained words and compared with essays of known quality on degree of conceptual relevance and amount of relevant content. Over many diverse topics, LSA scores have agreed with human experts as closely as the experts' scores agreed with each other.
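One way to compare a new essay with essays of known quality is sketched below. It is not necessarily the method behind the demonstration: the similarity-weighted average over the k nearest graded essays, the function name `score_essay`, the default k=3, and the toy vectors and grades are all assumptions made for illustration.

```python
import math

def cosine(a, b):
    """Cosine of the angle between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)

def score_essay(essay_vector, graded_essays, k=3):
    """Estimate a grade from the k most similar pre-graded essays.

    graded_essays is a list of (vector, grade) pairs. The result is the
    similarity-weighted mean of the neighbours' grades, or None when the
    essay resembles none of them.
    """
    neighbours = sorted(
        ((cosine(essay_vector, vec), grade) for vec, grade in graded_essays),
        key=lambda pair: pair[0],
        reverse=True,
    )[:k]
    # Neighbours with zero or negative similarity carry no weight.
    weights = [max(sim, 0.0) for sim, _ in neighbours]
    total = sum(weights)
    if total == 0.0:
        return None
    return sum(w * grade for w, (_, grade) in zip(weights, neighbours)) / total

# Hypothetical essay vectors with grades assigned by human readers.
graded = [([1.0, 0.0], 5.0), ([0.0, 1.0], 1.0)]
estimate = score_essay([1.0, 1.0], graded, k=2)
```

Here the new essay is equally similar to the strong and the weak example, so the estimate falls midway between their grades.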
At an abstract level one can distinguish three properties of a student essay that are desirable to assess: the correctness and completeness of its contained conceptual knowledge, the soundness of arguments that it presents in discussion of issues, and the fluency, elegance, and comprehensibility of its writing. One might also want to score for grammatical and stylistic variables or for mechanical features such as spelling and punctuation. Evaluation of superficial mechanical and syntactical features is fairly easy to separate from the other factors, but the rest (content, argument, comprehensibility, and aesthetic style) are likely to be difficult to pull apart because each influences the other, if only because each depends on the choice of words.
Previous attempts to develop computational techniques for scoring essays have focused primarily on measures of style. Indices of content have remained secondary, indirect, and superficial. In contrast to earlier approaches, LSA methods concentrate on the conceptual content, the knowledge conveyed in an essay, rather than its style. A number of experiments have used LSA-derived measures of text in a variety of ways, calibrating them against several different types of standards to arrive at quality scores.
In the Summary Scoring & Revision demonstration, you will see how LSA can provide formative evaluations of the quality of a student summary for a given text. These evaluations provide feedback during the course of the student's writing to help guide the student (in subsequent revisions) toward the content that experts consider most important from the text.
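Feedback of this kind could be approximated by comparing the summary's vector with a vector for each section of the source text and flagging sections the summary barely touches. The sketch below rests on assumptions: the function name `coverage_feedback`, the 0.5 threshold, and the toy section vectors are hypothetical and not taken from the demonstration.

```python
import math

def cosine(a, b):
    """Cosine of the angle between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)

def coverage_feedback(summary_vector, section_vectors, threshold=0.5):
    """Return the names of sections whose similarity to the summary is below threshold."""
    return [name for name, vec in section_vectors.items()
            if cosine(summary_vector, vec) < threshold]

# Hypothetical vectors for three sections of a source text.
sections = {
    "causes": [1.0, 0.0, 0.0],
    "effects": [0.0, 1.0, 0.0],
    "remedies": [0.0, 0.0, 1.0],
}
summary = [0.7, 0.7, 0.1]
missing = coverage_feedback(summary, sections)
```

A student whose summary produced this result could be prompted to revisit the flagged section before revising.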
Summarizing is a strategy that can be used to determine whether students understand what they have read and whether or not they have learned from it. Summarization not only reveals the existence of comprehension breakdowns during reading, but also helps to pinpoint the location and cause of the breakdown.
Writing a summary also involves active meaning construction, despite its focus on the textbase. To a much greater degree than notetaking or outlining it requires the construction of a mental representation that joins elements of text information with each other and with elements of prior knowledge. To the degree that students do this, they will acquire new knowledge that is useful and long lasting; a well elaborated knowledge representation will support knowledge application as well as recall. Finally, summaries are a communication tool, a means of sharing one's knowledge with others.
For more information please click on the Information link in the upper left Main Menu frame, which will provide a list of options in the lower left Sub Menu frame. We suggest clicking on the 1st Time User/Help File link to find out how to best use this site and on the What is LSA? link to get more information about the LSA method.
All references are available for downloading on the Group