
Yeah, yeah, of course a calculator won at a math competition. That's not the point. This story, which concerns a rather amazing program called GeoS from the Allen Institute for Artificial Intelligence (AI2), is about the ability of AI to usefully engage with the world. To a computer, with a brain literally structured for these sorts of operations, the math SAT is not a test of calculation, but of reading comprehension. That's why this story is so interesting: GeoS isn't as good as the average American at geometry, it's as good as the average American at the SAT itself.

Specifically, this AI program was able to score 49% accuracy on official SAT geometry questions, and 61% on practice questions. The 49% figure is basically identical to the average for real human test-takers. The program was not given digitized or specially labeled versions of the test, but looked at the exact same question layout as real students. It read the writing. It interpreted the diagrams. It figured out what the question was asking, and then it solved the problem. It only got the answer about half the time, which makes it roughly as fallible as a human being.

Of course, GeoS makes errors for different reasons than high-schoolers. A human being might correctly interpret the question, then apply the wrong formula, or muck up the calculation. GeoS, being a computer, will virtually always get the right answer so long as it truly understands the question. It might not be able to read a word correctly, or the grammar of a question might be too alien for the computer to parse. Regardless, what we're really measuring here is the computer's ability to understand human communication in a form that's deliberately (pardon the pun) obtuse.

To do this, the researchers had to smash together a whole array of different software technologies. GeoS uses optical character recognition (OCR) algorithms to read the text, and custom language processing to try to understand what it reads. Geometry questions are structured to be difficult to parse, hiding important information as inferences and implications.

The other side of the coin is that though geometry questions are dense and hard to tease apart, they're also extremely uniform in structure and subject matter. The AI's programmers can plan for the strict design principles that go into writing the questions. It couldn't take this same programming and directly apply it to calculus problems, for instance, because they use somewhat different language and mathematical symbols to describe the problem. But a good GeometryBot would also be relatively easy to adapt to those few distinguishing rules. Each successive new area of competence would make the next one easier to acquire.

One intriguing implication of this research is that someday, we might have algorithms quality-checking SAT questions. We could have different AI programs intended to achieve different levels of success on average questions, perhaps even for different reasons. Run proposed new questions through them, and their relative performance could not only weed out bad questions but point to the source of the problem. Say BadAtReadingAI and BadAtLogicAI did as expected on the question, but BadAtDiagramsAI did terribly: maybe the drawing just needs to be a little clearer.
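To make that diagnostic idea concrete, here is a minimal sketch in Python. It is purely illustrative: the three solver names come from the paragraph above, but the scores, the `flag_weak_spots` helper, and the 0.2 tolerance are all made up for the example and are not part of GeoS or any AI2 tool.

```python
# Hypothetical sketch: none of these numbers or names come from GeoS or AI2.
# Each "solver" is deliberately weak at one skill. We compare how often each
# one answers a proposed question correctly with how often it is expected to.

def flag_weak_spots(expected, observed, tolerance=0.2):
    """Return the solvers whose observed accuracy falls short of their
    expected accuracy by more than `tolerance` (an assumed threshold)."""
    flagged = []
    for solver, target in expected.items():
        shortfall = target - observed.get(solver, 0.0)
        if shortfall > tolerance:
            flagged.append(solver)
    return sorted(flagged)

# Made-up expected accuracy for each solver on an average question.
expected = {"BadAtReadingAI": 0.5, "BadAtLogicAI": 0.5, "BadAtDiagramsAI": 0.5}
# Made-up accuracy over repeated attempts at one proposed question.
observed = {"BadAtReadingAI": 0.5, "BadAtLogicAI": 0.5, "BadAtDiagramsAI": 0.125}

suspects = flag_weak_spots(expected, observed)
print(suspects)
```

With these invented numbers, only the diagram-impaired solver lands well below its expected score, which would hint that the figure, rather than the wording or the logic, is what makes the question unfair.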

This isn't a sign of the coming AI-pocalypse, or at least not a particularly immediate sign; as dense as geometry questions might be, they're homogeneous and nowhere near as complex as something like conversational speech. But this study shows how the individual tools available to AI researchers can be assembled to create rather full-featured artificial intelligences. When things will really take off is when those same researchers start snapping together those amalgamations into something far more versatile and full-featured, something not entirely unlike a real biological mind.