Using ORES and Machine Learning to Flag Risky Wikipedia Edits

Wikipedia has over 60 million articles and millions of edits every day. Not all of them are helpful. Some are vandalism: nonsense, spam, or outright lies. For years, human editors spent hours rolling back bad changes, but as the site grew, that became impossible. That's where ORES comes in: a machine learning system built by the Wikimedia Foundation to automatically score the quality and intent of Wikipedia edits. ORES doesn't replace humans. It helps them. It flags the risky edits so editors can focus on what matters.

What is ORES and how does it work?

ORES stands for Objective Revision Evaluation Service. It’s not a single tool. It’s a collection of machine learning models trained on decades of Wikipedia edit history. Each model looks at a different kind of edit. One model predicts if an edit is vandalism. Another guesses if it’s a good-faith improvement. Another checks if it’s likely to be reverted. These models don’t guess randomly. They learn from patterns.

Here’s how it works in practice. When someone edits a Wikipedia page, ORES scans the edit in real time. It looks at things like:

  • How many words were added or deleted
  • Whether the edit includes known spam phrases
  • Whether the editor has a history of reverts
  • Whether the edit matches patterns from past vandalism
  • Whether the edit was made from a blocked IP address

Each factor gets a score. Then ORES combines them into one overall probability. For example, an edit might have a 92% chance of being vandalism. That's high. Editors using the WikiDashboard or the recent changes feed see these scores as color-coded indicators: red for risky, green for safe. It's like a traffic light for edits.
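To make the idea concrete, here is a minimal sketch of turning weighted feature signals into one probability with a logistic (sigmoid) link. The feature names, weights, and bias are illustrative assumptions, not ORES's actual model, which is a trained classifier with far more features:

```python
import math

# Illustrative weights: positive values push the score toward "vandalism".
# ORES's real models learn these from labeled edit history.
WEIGHTS = {
    "chars_deleted": 0.0004,           # large deletions raise risk
    "spam_phrase_hits": 1.5,           # known spam phrases raise risk
    "editor_revert_history": 0.8,      # prior reverts raise risk
    "matches_vandalism_pattern": 2.0,  # resembles past vandalism
    "from_blocked_ip": 2.5,            # edit came from a blocked IP
}
BIAS = -3.0  # most edits are fine, so the baseline probability is low

def vandalism_probability(features: dict) -> float:
    """Combine weighted feature values into a single probability."""
    z = BIAS + sum(WEIGHTS[name] * value for name, value in features.items())
    return 1.0 / (1.0 + math.exp(-z))  # sigmoid squashes z into (0, 1)

edit = {
    "chars_deleted": 5000,
    "spam_phrase_hits": 2,
    "editor_revert_history": 1,
    "matches_vandalism_pattern": 1,
    "from_blocked_ip": 0,
}
print(f"vandalism probability: {vandalism_probability(edit):.2f}")
```

An edit that trips several signals at once lands near 1.0; a clean edit with all features at zero stays near the low baseline.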

Why machine learning? Why not just rules?

Before ORES, Wikipedia used rule-based filters. If an edit contained the word "sex" or "fuck," it got flagged. But that caught a lot of false positives. A medical article about sexual health? Blocked. A history paper quoting a swear word? Blocked. Rules are blunt. They don’t understand context.

Machine learning is smarter. It doesn't look for words. It looks for behavior. A new user who edits one page and adds "Obama is a lizard"? High risk. A veteran editor who fixes a typo in a 10,000-word article? Low risk. ORES learns from thousands of past cases where humans manually reviewed edits. It sees that vandalism often comes from new accounts, uses extreme language, and deletes large chunks of text. It doesn't need to know the meaning of "lizard"; it just knows that edit pattern is dangerous.

Studies from the Wikimedia Research team show ORES catches 80% of obvious vandalism within seconds. Human reviewers alone would have taken hours to find even half of those. And ORES gets better over time. Every time an editor confirms or corrects a prediction, the system learns. It’s self-improving.

Real-world impact: How editors use ORES

Volunteer editors don’t sit there staring at every single edit. They use tools built around ORES scores. The Recent Changes Patrol interface now shows ORES predictions right next to each edit. Some editors filter out edits with low risk scores, so they only see the ones that need attention. Others use automated bots that revert edits with over 90% vandalism probability before any human even sees them.
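The patrol workflow above can be sketched as a simple routing function. The 0.90 auto-revert cutoff comes from the text; the 0.50 review cutoff and the action names are illustrative assumptions:

```python
def route_edit(vandalism_prob: float,
               auto_revert_cutoff: float = 0.90,
               review_cutoff: float = 0.50) -> str:
    """Decide what happens to an edit based on its vandalism score.

    - at or above the auto-revert cutoff: a bot reverts it immediately
    - at or above the review cutoff: it is queued for a human patroller
    - otherwise: it is filtered out of the patrol feed
    """
    if vandalism_prob >= auto_revert_cutoff:
        return "auto-revert"
    if vandalism_prob >= review_cutoff:
        return "human-review"
    return "ignore"

for p in (0.95, 0.62, 0.12):
    print(p, "->", route_edit(p))
```

The key design choice is that only the top band is automated; the middle band always reaches a human, which is the division of labor the rest of this section describes.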

One study from 2024 analyzed 2.3 million edits across 12 language editions of Wikipedia. It found that when ORES was active, vandalism was reverted 40% faster than when it wasn't. On the English Wikipedia, the average time to revert vandalism dropped from 22 minutes to 13 minutes. That's huge. It means harmful content stays up for less time. Readers see fewer lies.

But ORES isn't perfect. Sometimes it misses clever vandalism. A user might change "The capital of France is Paris" to "The capital of France is Berlin": a simple swap that doesn't trigger the usual red flags. ORES might give it a 30% risk score. That's not high enough to auto-revert. But it's still wrong. That's why humans still matter. ORES points them to the suspicious edits. Humans decide whether they're legitimate.

[Image: An experienced editor reviewing a suspicious Wikipedia edit, with ORES risk scores displayed beside a new user's change.]

What about bias? Can AI be unfair?

Yes. Machine learning models can inherit bias. ORES was trained mostly on English Wikipedia edits. That means it's better at spotting vandalism in English than in, say, Swahili or Hindi. Edits from new users in developing countries might get higher risk scores just because their editing style is different, not because they're vandalizing.

Wikimedia’s team knows this. They’ve built separate models for different languages. They also audit the system regularly. In 2025, they released a public dashboard showing ORES performance across languages. If a model is flagging too many edits from a certain region, they retrain it with more data from that area. It’s not perfect, but it’s being actively fixed.

Another concern: ORES might discourage new contributors. If a well-meaning editor gets flagged as "high risk" just because they’re new, they might quit. To fix this, ORES includes a "good faith" score. If an edit has high vandalism probability but also high good-faith probability, it’s treated differently. New editors get a warning, not an automatic revert. The system tries to help, not scare.
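The interaction between the vandalism and good-faith scores described above might look like this. The thresholds and action names are illustrative assumptions; ORES itself only emits the scores, and the tools built on top of it decide what to do:

```python
def moderation_action(damaging: float, goodfaith: float,
                      damaging_cutoff: float = 0.90,
                      goodfaith_cutoff: float = 0.50) -> str:
    """Treat probably-damaging edits differently when they also look good-faith."""
    if damaging < damaging_cutoff:
        return "accept"
    if goodfaith >= goodfaith_cutoff:
        # Likely a confused newcomer, not a vandal: warn instead of revert.
        return "warn-editor"
    return "revert"

print(moderation_action(damaging=0.93, goodfaith=0.70))  # clumsy newcomer edit
print(moderation_action(damaging=0.93, goodfaith=0.10))  # likely vandalism
```

The same high damage score leads to two different outcomes depending on the good-faith signal, which is exactly the newcomer-friendly behavior the paragraph describes.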

The future: What’s next for AI moderation?

ORES is just the beginning. The Wikimedia Foundation is now testing models that predict not just whether an edit is bad, but whether it's likely to cause long-term damage. A single vandalism edit might get reverted. But what if someone slowly adds misleading citations across 50 articles? That's harder to catch. New models are being trained to spot coordinated manipulation, not just random spam.

They're also testing real-time feedback. Imagine a new editor types in a false claim. Before they hit "save," a pop-up says: "This edit contradicts 12 reliable sources. Are you sure?" That's not ORES yet, but it's the direction things are going. AI won't replace editors. It'll become a co-pilot.

Some worry that too much automation will make Wikipedia feel cold. But the opposite is true. By handling the dull, repetitive work, ORES frees up human editors to do what they do best: write, debate, and build knowledge together. The real magic of Wikipedia isn’t in the software. It’s in the people. ORES just helps them do it faster.

How accurate is ORES really?

Accuracy varies by model. The vandalism detection model hits about 88% precision and 85% recall across major Wikipedias. That means:

  • Of all edits flagged as vandalism, 88% were actually vandalism
  • Of all actual vandalism edits, 85% were caught
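The two bullet points map directly onto the standard precision and recall formulas. A quick check with hypothetical counts chosen to match the quoted figures:

```python
def precision(true_pos: int, false_pos: int) -> float:
    """Of everything flagged as vandalism, what fraction actually was?"""
    return true_pos / (true_pos + false_pos)

def recall(true_pos: int, false_neg: int) -> float:
    """Of all actual vandalism, what fraction was caught?"""
    return true_pos / (true_pos + false_neg)

# Hypothetical tallies consistent with ~88% precision and ~85% recall:
tp, fp, fn = 880, 120, 155
print(f"precision = {precision(tp, fp):.2f}")  # 880 / (880 + 120) = 0.88
print(f"recall    = {recall(tp, fn):.2f}")     # 880 / (880 + 155) ≈ 0.85
```

Note the trade-off the two numbers encode: raising the flagging threshold tends to improve precision (fewer false alarms) at the cost of recall (more missed vandalism), and vice versa.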

That’s better than most human reviewers. Studies show even experienced editors miss about 30% of vandalism on their first pass. ORES doesn’t get tired. It doesn’t miss edits because it’s on break. It works 24/7.

The most reliable models are trained on high-quality data: edits reviewed by trusted editors, confirmed reverts, and long-term edit histories. The less data a language has, the less accurate the model. That’s why smaller Wikipedias still rely more on humans. But even there, ORES is making a difference.

[Image: An abstract neural network visualizing Wikipedia edit patterns, with data streams flowing and correcting misclassified edits.]

What tools use ORES?

ORES isn’t a standalone app. It’s a backend service. Many tools plug into it:

  • WikiDashboard - Shows ORES scores for every edit in real time
  • ClueBot NG - An automated bot that reverts vandalism with over 90% confidence
  • RCFilter - Lets editors filter recent changes by ORES risk level
  • Revision Scoring - Used by admins to prioritize which edits to review first

These tools are open source. Anyone can see how they work. That transparency is key. Unlike corporate platforms that hide their algorithms, Wikipedia’s system is public. You can read the code, check the training data, and even suggest improvements.
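Since the service is public, you can query it yourself. Below is a hedged sketch using only the Python standard library. The v3 `scores` endpoint and the `damaging`/`goodfaith` model names reflect ORES's documented REST API, but the response shape shown in the comment is an assumption, and availability may change as Wikimedia migrates scoring to newer infrastructure:

```python
import json
import urllib.request

ORES_HOST = "https://ores.wikimedia.org"  # public endpoint at time of writing

def ores_url(rev_id: int, wiki: str = "enwiki") -> str:
    """Build the v3 scores URL for one revision and two common models."""
    return f"{ORES_HOST}/v3/scores/{wiki}/?models=damaging|goodfaith&revids={rev_id}"

def ores_scores(rev_id: int, wiki: str = "enwiki") -> dict:
    """Fetch the probability that an edit is damaging / made in good faith.

    Assumes the documented response shape:
    {wiki: {"scores": {rev_id: {model: {"score": {"probability": {...}}}}}}}
    """
    with urllib.request.urlopen(ores_url(rev_id, wiki), timeout=10) as resp:
        data = json.load(resp)
    scores = data[wiki]["scores"][str(rev_id)]
    return {model: scores[model]["score"]["probability"]["true"]
            for model in ("damaging", "goodfaith")}

# Requires network access:
# print(ores_scores(123456789))
```

A returned value like `{"damaging": 0.93, "goodfaith": 0.12}` is what tools such as RCFilter surface as a red indicator in the patrol feed.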

ORES Model Performance Across Major Wikipedias (2025)
  Language    Vandalism Precision   Vandalism Recall   Good-Faith Accuracy
  English     91%                   87%                89%
  German      89%                   85%                87%
  French      88%                   83%                86%
  Japanese    85%                   81%                84%
  Swahili     76%                   72%                78%

Frequently Asked Questions

Is ORES used on all Wikipedia languages?

Yes, but not equally. ORES is active on all major Wikipedias, but models are trained separately for each language. Smaller Wikipedias have less data, so their models are less accurate. The Wikimedia Foundation is working to improve coverage, especially for underrepresented languages.

Can ORES automatically delete content?

No. ORES only assigns risk scores. It doesn’t delete, block, or revert anything on its own. Automated bots like ClueBot NG use ORES scores to make decisions, but even those bots require human oversight. Only human editors can permanently delete content or block users.

Do editors trust ORES?

Most do. A 2025 survey of 1,200 active Wikipedia editors found that 74% said ORES improved their workflow. Only 8% said they ignored it. The rest used it selectively, sometimes trusting it and sometimes overriding it. Like any tool, it's only as good as how it's used.

How often is ORES updated?

ORES models are retrained every 2-4 weeks using new edit data. The system is designed to adapt quickly. If a new vandalism tactic emerges (say, a bot that adds fake citations), editors can flag examples, and the model learns within days. Updates are public and documented.

Can I see how ORES scored my edit?

Yes. ORES scores are public, and any revision can be looked up by its revision ID through the ORES API. For example: https://ores.wikimedia.org/v3/scores/enwiki/?revids=12345678. The response includes probability scores for damaging, good faith, and other models. This transparency helps editors learn how to edit better.

What’s next for Wikipedia’s moderation?

Wikipedia’s future isn’t about more AI. It’s about better collaboration. The goal isn’t to automate moderation. It’s to empower editors. ORES is one tool in a growing toolkit. Others include AI-generated summaries of edit histories, tools that suggest sources for disputed claims, and systems that match new editors with mentors.

As long as humans are in the loop, Wikipedia stays human. ORES doesn’t make decisions. It makes them easier. And that’s the real win.