A computational algorithm for fact-checking

Yet another “computers can’t…” myth busted
June 19, 2015

These charts map the “truth scores” of statements related to geography, history and entertainment. Correct statements appear along the diagonal, with color intensity indicating the strength of the score (credit: Giovanni Ciampaglia)

Computers can now do fact-checking for any body of knowledge, according to Indiana University network scientists, writing in an open-access paper published June 17 in PLoS ONE.

Using factual information from the summary infoboxes of Wikipedia* as a source, they built a “knowledge graph” with 3 million concepts and 23 million links between them. A link between two concepts in the graph can be read as a simple factual statement, such as “Socrates is a person” or “Paris is the capital of France.”
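
To make the structure concrete, here is a minimal sketch of such a knowledge graph in Python, assuming the networkx library; the triples and relation names are illustrative stand-ins, not the actual Wikipedia-derived data set used in the study.

    # A toy infobox-style knowledge graph; triples are illustrative.
    import networkx as nx

    # Each factual statement is a (subject, predicate, object) triple:
    # subjects and objects become nodes, predicates become edge labels.
    triples = [
        ("Socrates", "is_a", "Person"),
        ("Paris", "capital_of", "France"),
        ("France", "is_a", "Country"),
    ]

    G = nx.Graph()  # undirected, so paths can be traversed either way
    for subj, pred, obj in triples:
        G.add_edge(subj, obj, predicate=pred)

    print(G.has_edge("Paris", "France"))  # True: a directly stated fact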

In the first use of this method, IU scientists created a simple computational fact-checker that assigns “truth scores” to statements concerning history, geography and entertainment, as well as to random statements drawn from the text of Wikipedia. In multiple experiments, the automated system’s truth scores consistently matched the assessments of human fact-checkers about the accuracy of these statements.
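
How might such a truth score be computed? The paper frames it as “semantic proximity” on the graph: a statement linking two concepts scores 1.0 if the graph states it directly, and otherwise is scored by the best connecting path, discounted by how general (high-degree) the intermediate concepts are. The following Python sketch is one way to express that idea with networkx; it follows the spirit of the published metric but should be read as an illustration, not the authors’ implementation.

    import math
    import networkx as nx

    def truth_score(G, subject, obj, max_hops=4):
        # Directly stated facts get the maximum score.
        if subject not in G or obj not in G:
            return 0.0
        if G.has_edge(subject, obj):
            return 1.0
        best = 0.0
        # Enumerating simple paths is fine for a toy graph; the real
        # system relies on an efficient shortest-path computation.
        for path in nx.all_simple_paths(G, subject, obj, cutoff=max_hops):
            # Penalize paths that run through very general, high-degree
            # nodes (e.g., "Person"), which connect almost everything.
            penalty = sum(math.log(G.degree(v)) for v in path[1:-1])
            best = max(best, 1.0 / (1.0 + penalty))
        return best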

Dealing with misinformation and disinformation

In what the IU scientists describe as an “automatic game of trivia,” the team applied their algorithm to answer simple questions related to geography, history, and entertainment, including statements that matched states or nations with their capitals, presidents with their spouses, and Oscar-winning film directors with the movie for which they won the Best Picture award. The majority of tests returned highly accurate truth scores.
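
A trivia question of this kind reduces to scoring every candidate answer and keeping the highest-scoring one. Reusing the truth_score sketch above (the triples are again invented for illustration):

    import networkx as nx

    # Hypothetical capital-matching question: which candidate city is
    # the capital of Indiana?
    G = nx.Graph()
    for s, p, o in [
        ("Indianapolis", "capital_of", "Indiana"),
        ("Chicago", "located_in", "Illinois"),
        ("Illinois", "borders", "Indiana"),
    ]:
        G.add_edge(s, o, predicate=p)

    candidates = ["Indianapolis", "Chicago"]
    scores = {c: truth_score(G, c, "Indiana") for c in candidates}
    print(max(scores, key=scores.get))  # Indianapolis (direct edge: 1.0)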

Lastly, the scientists used the algorithm to fact-check excerpts from the main text of Wikipedia, which were previously labeled by human fact-checkers as true or false, and found a positive correlation between the truth scores produced by the algorithm and the answers provided by the fact-checkers.

Significantly, the IU team found their computational method could even assess the truthfulness of statements about information not directly contained in the infoboxes. For example, it could verify that Steve Tesich, the Serbian-American screenwriter of the classic Hoosier film “Breaking Away,” graduated from IU, even though that fact is not specifically stated in the infobox about him.

Using multiple sources to improve accuracy and richness of data

“The measurement of the truthfulness of statements appears to rely strongly on indirect connections, or ‘paths,’ between concepts,” said Giovanni Luca Ciampaglia, a postdoctoral fellow at the Center for Complex Networks and Systems Research in the IU Bloomington School of Informatics and Computing, who led the study.

“If we prevented our fact-checker from traversing multiple nodes on the graph, it performed poorly since it could not discover relevant indirect connections,” said Ciampaglia. “But because it’s free to explore beyond the information provided in one infobox, our method leverages the power of the full knowledge graph.

“These results are encouraging and exciting. We live in an age of information overload, including abundant misinformation, unsubstantiated rumors and conspiracy theories whose volume threatens to overwhelm journalists and the public. Our experiments point to methods to abstract the vital and complex human task of fact-checking into a network analysis problem, which is easy to solve computationally.”
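
The Steve Tesich example above can be sketched the same way: no direct edge records the fact, but a short indirect path still supports it, and forbidding multi-node traversal (as in the experiment Ciampaglia describes) makes the connection invisible. Again with the truth_score sketch and invented triples:

    import networkx as nx

    G = nx.Graph()
    for s, p, o in [
        ("Steve Tesich", "wrote", "Breaking Away"),
        ("Breaking Away", "set_at", "Indiana University"),
    ]:
        G.add_edge(s, o, predicate=p)

    print(truth_score(G, "Steve Tesich", "Indiana University"))
    # ~0.59: supported via the indirect path through "Breaking Away"
    print(truth_score(G, "Steve Tesich", "Indiana University", max_hops=1))
    # 0.0: with multi-hop traversal disabled, the fact cannot be verified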

Expanding the knowledge base

Although the experiments were conducted using Wikipedia, the IU team’s method does not assume any particular source of knowledge. The scientists aim to conduct additional experiments using knowledge graphs built from other sources of human knowledge, such as Freebase, the open knowledge base acquired by Google, and note that multiple information sources could be combined to account for different belief systems.

The team has also incorporated a significant amount of natural language processing research, but notes that additional work remains before these methods could be made available to the public as a software tool.

The work was supported in part by the Swiss National Science Foundation, the Lilly Endowment, the James S. McDonnell Foundation, the National Science Foundation, and the Department of Defense.

* The team selected Wikipedia as the information source for their experiment because of its breadth and open nature. Although Wikipedia is not 100 percent accurate, previous studies estimate that the online encyclopedia is nearly as reliable as traditional encyclopedias while covering many more subjects, the researchers note.


Abstract of Computational Fact Checking from Knowledge Networks

Traditional fact checking by expert journalists cannot keep up with the enormous volume of information that is now generated online. Computational fact checking may significantly enhance our ability to evaluate the veracity of dubious information. Here we show that the complexities of human fact checking can be approximated quite well by finding the shortest path between concept nodes under properly defined semantic proximity metrics on knowledge graphs. Framed as a network problem, this approach is feasible with efficient computational techniques. We evaluate this approach by examining tens of thousands of claims related to history, entertainment, geography, and biographical information using a public knowledge graph extracted from Wikipedia. Statements independently known to be true consistently receive higher support via our method than do false ones. These findings represent a significant step toward scalable computational fact-checking methods that may one day mitigate the spread of harmful misinformation.