Accuracy is critical for the world’s largest online encyclopedia
AI researchers from Meta have developed a model that can verify the accuracy of citations in Wikipedia, the world’s largest online encyclopedia.
Many people get their information from Wikipedia. But with 17,000 new articles added to Wikipedia every month, human editors simply cannot keep up with verifying that the facts on those pages are backed up by their footnotes.
Enter AI. While automated tools can spot gibberish or missing citations, Meta’s AI model goes a step further: It can automatically scan hundreds of thousands of citations at once to check whether they actually support the facts on a page. The model was built on a dataset comprising 134 million public web pages.
“It calls attention to questionable citations, allowing human editors to evaluate the cases most likely to be flawed without having to sift through thousands of properly cited statements,” the company said in a blog post. “If a citation seems irrelevant, our model will suggest a more applicable source, even pointing to the specific passage that supports the claim.”
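The workflow Meta describes – score how well a cited page supports a claim, flag weak citations, and point editors to the most relevant passage – can be illustrated with a minimal sketch. This is not Meta’s actual pipeline: the `score_support` function below is a toy lexical-overlap stand-in for what is, in the real system, a trained neural verification model, and all names and thresholds here are illustrative assumptions.

```python
def score_support(claim: str, passage: str) -> float:
    """Toy stand-in for a learned verification score:
    fraction of the claim's words that appear in the passage."""
    claim_words = set(claim.lower().split())
    passage_words = set(passage.lower().split())
    if not claim_words:
        return 0.0
    return len(claim_words & passage_words) / len(claim_words)

def verify_citation(claim: str, cited_page_passages: list[str],
                    threshold: float = 0.5):
    """Score every passage of the cited page against the claim.

    Returns (supported, best_passage, best_score) so a human editor
    can jump straight to the most relevant passage -- or see that
    nothing on the cited page backs up the claim.
    """
    best_passage, best_score = None, 0.0
    for passage in cited_page_passages:
        s = score_support(claim, passage)
        if s > best_score:
            best_passage, best_score = passage, s
    return best_score >= threshold, best_passage, best_score

# Illustrative claim and passages (invented for this sketch).
claim = "The orchestra appointed a new chief executive in 2018."
passages = [
    "Ticket sales for the summer season open in May.",
    "In 2018 the orchestra appointed a new chief executive officer.",
]
supported, best_passage, best_score = verify_citation(claim, passages)
```

A real verifier would replace the overlap heuristic with a model trained on claim/evidence pairs, but the surrounding logic – rank passages, keep the best, and only flag citations whose best score falls below a threshold – matches the triage behavior the blog post describes.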
Given the model’s ultimate purpose, Meta has opted to go down the open source route, making it available via GitHub under an MIT license, along with the accompanying research paper.
Models to ‘teach tech’
According to Meta, automated tools can already identify statements that lack citations, but it is far harder for AI to judge whether a cited source actually backs up a claim. Solving that problem, the company says, would benefit the wider research community.
The citation model was trained on Wikipedia statements, each paired with websites that may or may not support the claims made in the entry.
By using full web pages rather than a handful of sentences, Meta claims its model achieved “a leap in performance in terms of detecting the accuracy of citations.”
For example, the model flagged an inaccurate reference attached to a claim about the Los Angeles Philharmonic’s 2018 CEO appointment. It deduced that the cited press release was not relevant to the claim and suggested an alternative source: a blog post on the L.A. Philharmonic’s website.
“Once they are ready to be deployed, our models will bolster the quality of knowledge on Wikipedia, helping to preserve the accuracy of a resource that virtually everyone uses,” the researchers said.
Meta’s AI researchers plan to refine the model further, enabling it to suggest auto-complete text and offer proofreading corrections.
“Ideally, the models would understand multiple languages and be able to process several types of media, including video, images, and data tables,” the blog post concluded. “These capabilities are among Meta AI’s new targets as we help teach technology to understand our world.”
Other AI models from Meta
The citation model is the latest in a slew of AI models the company’s researchers have unveiled.
A week prior to its citation model reveal, Meta showcased NLLB-200, a machine translation model capable of translating between 200 languages, designed to improve machine translation capabilities for low-resource languages.
In early June, Meta exhibited LegoNN, a model designed to let developers reuse modules when building machine learning architectures.
June also saw its AI team release OPT-66B – an open source, 66-billion-parameter version of its OPT language model. The differing-sized models let researchers study the effects of language model scaling, the company said.
And that same month, Meta, along with researchers from the University of Texas, published three open source AI models for the audio-visual understanding of human speech and sounds in videos. The models are designed to improve acoustics for augmented reality experiences.