AI system solves SAT geometry questions as well as average American 11th-grade student

September 23, 2015

Examples of questions (left column) and interpretations (right column) derived by GeoS (credit: Minjoon Seo et al./Proceedings of EMNLP)

An AI system that can solve SAT geometry questions as well as the average American 11th-grade student has been developed by researchers at the Allen Institute for Artificial Intelligence (AI2) and University of Washington.

The system, called GeoS, combines computer vision to interpret diagrams, natural language processing to read and understand text, and a geometric solver, achieving 49 percent accuracy on official SAT test questions.

Extrapolated to the entire math SAT, these results would correspond to a score of roughly 500 (out of 800), the average test score for 2015.

These results, presented at the 2015 Conference on Empirical Methods in Natural Language Processing (EMNLP) in Lisbon, Portugal, were achieved by GeoS solving unaltered SAT questions that it had never seen before and that required an understanding of implicit relationships, ambiguous references, and the relationships between diagrams and natural-language text.

The best-known current test of an AI’s intelligence is the Turing test, which involves fooling a human in a blind conversation. “Unlike the Turing Test, standardized tests such as the SAT provide us today with a way to measure a machine’s ability to reason and to compare its abilities with that of a human,” said Oren Etzioni, CEO of AI2. “Much of what we understand from text and graphics is not explicitly stated, and requires far more knowledge than we appreciate.”

How GeoS Works

GeoS is the first end-to-end system that solves SAT plane geometry problems. It first interprets a geometry question by using the diagram and text in concert to generate the best possible logical expressions of the problem, then submits those expressions to a geometric solver. Finally, it compares the solver's answer with the question's multiple-choice options.
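The three-stage pipeline can be sketched in miniature. Everything below is illustrative, not the AI2 implementation: the literal format, confidence merging, and the toy circle-area problem are all invented for the example (GeoS actually selects its logical interpretation via submodular optimization, and its solver handles full plane geometry).

```python
import math

def merge_interpretations(text_literals, diagram_literals):
    """Merge candidate logical literals from the text and diagram channels,
    keeping those whose summed confidence is high (a stand-in for GeoS's
    submodular selection of the best interpretation)."""
    scores = {}
    for literal, confidence in text_literals + diagram_literals:
        scores[literal] = scores.get(literal, 0.0) + confidence
    return {literal for literal, s in scores.items() if s >= 0.8}

def solve(literals, choices):
    """Toy 'geometric solver' for one literal type: given circle_radius(r),
    derive the area and return the closest multiple-choice option."""
    radius = next(value for (name, value) in literals if name == "circle_radius")
    target = math.pi * radius ** 2
    return min(choices, key=lambda c: abs(c - target))

# Both channels agree on the radius; the text alone states what is asked.
text_lits = [(("circle_radius", 3.0), 0.6), (("asks", "area"), 0.9)]
diag_lits = [(("circle_radius", 3.0), 0.7)]

literals = merge_interpretations(text_lits, diag_lits)
answer = solve(literals, [6 * math.pi, 9 * math.pi, 12 * math.pi, 18 * math.pi])
# answer is 9π, the option matching the derived area
```

The key structural point this mirrors is that neither channel alone is trusted: a literal's score combines evidence from both the text parse and the diagram parse before the solver ever runs.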

This process is complicated by the fact that SAT questions contain many unstated assumptions. For example, the top SAT problem shown above leaves several facts implicit, such as the fact that lines BD and AC intersect at E.
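Recovering such implicit facts is the diagram parser's job: from recovered coordinates it must derive, for instance, that two drawn lines actually meet at a labeled point. A minimal sketch of that derivation (the coordinates below are made up, and this is standard line-intersection math, not GeoS's code):

```python
def line_intersection(p1, p2, p3, p4):
    """Intersection of the line through p1-p2 with the line through p3-p4,
    or None if the lines are parallel."""
    (x1, y1), (x2, y2), (x3, y3), (x4, y4) = p1, p2, p3, p4
    denom = (x1 - x2) * (y3 - y4) - (y1 - y2) * (x3 - x4)
    if abs(denom) < 1e-12:
        return None  # parallel: the assumed intersection point does not exist
    t = ((x1 - x3) * (y3 - y4) - (y1 - y3) * (x3 - x4)) / denom
    return (x1 + t * (x2 - x1), y1 + t * (y2 - y1))

# Hypothetical coordinates for B, D and A, C recovered from a diagram;
# the point E is derived rather than stated anywhere in the question text.
E = line_intersection((0, 0), (4, 4), (0, 4), (4, 0))
# E comes out to (2.0, 2.0)
```

Literals derived this way (e.g. "BD and AC intersect at E") can then be merged with literals parsed from the question text.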

GeoS had a 96 percent accuracy rate on questions it was confident enough to answer. AI2 researchers said they aim to tackle the full set of SAT math questions within the next three years.

An open-access paper outlining the research, “Solving Geometry Problems: Combining Text and Diagram Interpretation,” and a demonstration of the system’s problem-solving are available. All data sets and software are also available for other researchers to use.

The researchers say they are also building systems that can tackle science tests, which require a knowledge base that includes elements of the unstated, common-sense knowledge humans accumulate over their lives. This Aristo project is described here.

Abstract of Solving geometry problems: Combining text and diagram interpretation

This paper introduces GeoS, the first automated system to solve unaltered SAT geometry questions by combining text understanding and diagram interpretation. We model the problem of understanding geometry questions as submodular optimization, and identify a formal problem description likely to be compatible with both the question text and diagram. GeoS then feeds the description to a geometric solver that attempts to determine the correct answer. In our experiments, GeoS achieves a 49% score on official SAT questions, and a score of 61% on practice questions. Finally, we show that by integrating textual and visual information, GeoS boosts the accuracy of dependency and semantic parsing of the question text.
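The abstract's mention of submodular optimization refers to selecting a subset of candidate literals that best explains both the text and the diagram. A standard way to optimize a monotone submodular objective is greedy selection, which carries a (1 − 1/e) approximation guarantee; the sketch below uses a toy weighted-coverage objective, not the paper's actual scoring function, and all names are illustrative.

```python
def greedy_select(candidates, objective, budget):
    """Greedily pick up to `budget` candidates, each time adding the one
    with the largest marginal gain under the (submodular) objective."""
    chosen = []
    for _ in range(budget):
        remaining = [c for c in candidates if c not in chosen]
        if not remaining:
            break
        best = max(remaining,
                   key=lambda c: objective(chosen + [c]) - objective(chosen))
        if objective(chosen + [best]) - objective(chosen) <= 0:
            break  # no candidate adds value; stop early
        chosen.append(best)
    return chosen

# Toy objective: weighted coverage of evidence items (monotone submodular).
weights = {"e1": 3.0, "e2": 2.0, "e3": 1.0}
covers = {"litA": {"e1"}, "litB": {"e1", "e2"}, "litC": {"e3"}}

def coverage(chosen):
    covered = set().union(*(covers[c] for c in chosen)) if chosen else set()
    return sum(weights[e] for e in covered)

picked = greedy_select(list(covers), coverage, budget=2)
# litB covers the most weight first; litA then adds nothing new, so litC is next
```

Diminishing returns is what makes coverage submodular: once litB covers "e1", adding litA contributes zero marginal gain, so the greedy step skips it.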