M+E Connections

Wikidata Founder, Google Ontologist: Data Doesn’t Equal Truth

MARINA DEL REY, Calif. — Wikidata founder and current Google ontologist Denny Vrandečić knows the power of information better than most. He also knows that you need to tread carefully with the value placed on information.

“Every entry in Wikidata can make a statement,” Vrandečić said Feb. 19, speaking during digital agency Diamond’s monthly CTO Mixer event. “[But] Wikidata is not about truth. We look at what sources say and share it. It’s up to the user to decide what to believe. Wikidata is not about the truth, it’s about what sources say.”

Wikidata, the collaboratively edited knowledge base that serves as a common source of open data for Wikipedia and other projects, describes a stunning 60 million items, supports more than 400 languages and has 20,700 monthly contributors. As of the end of 2018, Wikidata information is used in nearly 60% of all English Wikipedia articles.

The goal with Wikidata is to capture as much collective knowledge online as possible, Vrandečić said, and to store that information in a form that lets computers make connections between entries. Because online content is written by humans in so-called "natural languages," the information can be hard to find when someone is looking for it, he said.

“The problem is, if you have content in natural language text, it can be hidden there,” Vrandečić said. “As good as [natural language] is to representing knowledge, it’s bad at pulling out knowledge in a strong way.” By putting all this information into Wikidata, “we have less inconsistencies, the data quality is better, and nobody has to maintain it,” he said.

Vrandečić offered a light-hearted example: Winterthur, Switzerland, and Ontario, Calif., established a sister-city relationship decades ago, but the partnership had been forgotten on Winterthur's side … at least until Wikidata made the connection last year.

“The information was there, but nobody recognized the discrepancy until we put natural language to it,” Vrandečić said. “It shouldn’t be that hard. It’s the same story happening hundreds, thousands, millions of times, all over the web.

“We have information out there, but it’s not available the moment you actually need it. We want to turn Wikidata into the Rosetta Stone of the web. And it’s catching on.”