ORES Scores and Quality Prediction on Wikipedia: What They Mean

23 May 2026

You’ve probably seen it before. You’re reading a Wikipedia page about a niche topic, maybe a small town or an obscure video game character. The information looks solid, but you have that nagging feeling in the back of your head: *Is this actually true? Who wrote this?*

Unlike traditional encyclopedias written by hired experts, Wikipedia is built by volunteers. This open model creates a massive resource, but it also introduces chaos. How do we know which articles are reliable and which ones are full of errors or vandalism? Enter ORES, the Objective Revision Evaluation Service, a tool used by the Wikimedia Foundation to predict the quality of Wikipedia edits using machine learning models.

ORES doesn’t just guess; it calculates. It assigns scores to edits and articles based on patterns learned from millions of past changes. If you want to understand what makes an article trustworthy, or why a certain edit was reverted, you need to understand these scores. They are the hidden heartbeat of Wikipedia’s quality control system.

What Exactly Is ORES?

To get the gist of ORES, you have to look at the problem it solves. Every day, thousands of people make changes to Wikipedia. Some are fixing typos. Others are adding well-sourced facts. But some are adding nonsense, promoting personal agendas, or vandalizing pages. Human editors can’t review every single change instantly. There are simply too many.

ORES acts as a first line of defense. It is a statistical engine that looks at an edit and asks: *"Based on historical data, does this change look like it improves the article, or does it look like trouble?"* It uses features like the length of the text added, whether new references were included, if the user has a history of good edits, and even the time of day the edit was made.

Think of it like a spam filter for email, but for knowledge. When you send an email, the filter checks for keywords, sender reputation, and formatting. ORES does something similar for Wikipedia edits. It assigns probabilities (numbers between 0 and 1) that tell us how likely an edit is to be "good faith" or "damaging."

The service was developed at the Wikimedia Foundation, the non-profit organization that hosts and develops Wikipedia and other wiki projects, by its Scoring Platform team. It relies on machine learning models trained on data labeled by human editors. When humans tag an edit as "vandalism" or "constructive," ORES learns from those examples to predict future cases.

Decoding the Score: Probability vs. Quality

This is where things get tricky for most readers. ORES produces two main types of scores, and they mean very different things. Confusing them leads to misunderstandings about Wikipedia’s reliability.

Bad Faith / Good Faith Scores: These predict the *intent* behind an edit. A high "bad faith" score means the edit looks like vandalism, harassment, or spam. A high "good faith" score suggests the editor tried to help, even if they made a mistake. In ORES itself, the underlying model is named `goodfaith`, so a "bad faith" score corresponds to a low `goodfaith` probability. This helps bots flag obvious vandalism quickly.
Article Quality Scores: These predict the *state* of the article itself. They mirror Wikipedia’s article assessment process, in which articles are tagged with quality grades such as Stub, Start, C-class, B-class, GA (Good Article), and FA (Featured Article). The model predicts whether an article meets the criteria for a specific class, from "Stub" (very short) to "Featured Article" (high quality).
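
To make the article quality idea concrete, here is a minimal sketch of how a set of per-class probabilities can be read. The probabilities in `example` are invented for illustration, and the two helper functions are hypothetical names, not part of any ORES client. The weighted sum is one common way to collapse the classes into a single number; it is an assumption here, not something the service reports on its own.

```python
# Quality grades ordered from lowest to highest, as listed above.
QUALITY_CLASSES = ["Stub", "Start", "C", "B", "GA", "FA"]


def most_likely_class(probabilities):
    """Return the quality class with the highest predicted probability."""
    return max(QUALITY_CLASSES, key=lambda name: probabilities[name])


def weighted_quality(probabilities):
    """Collapse the distribution into one number from 0 (Stub) to 5 (FA)."""
    return sum(
        index * probabilities[name]
        for index, name in enumerate(QUALITY_CLASSES)
    )


# Hypothetical model output for one article revision (values sum to 1).
example = {"Stub": 0.05, "Start": 0.15, "C": 0.40, "B": 0.30, "GA": 0.08, "FA": 0.02}

print(most_likely_class(example))
print(round(weighted_quality(example), 2))
```

The most likely class tells you the single best guess, while the weighted number shows whether the article is leaning toward the next grade up or down.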

When you see a score like `0.85` for "damaging," it doesn’t mean the article is 85% wrong. It means there is an 85% probability that this specific *change* was damaging, based on the model’s training data. It’s a measure of confidence, not a grade on a test paper.
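
If you like to see this in code, the sketch below reads a "damaging" probability out of a response and compares it to a cutoff. The JSON layout is modeled on the shape of ORES score responses, but treat it as an assumption; the revision ID, the probabilities, the `needs_review` helper, and the 0.8 threshold are all made up for this example.

```python
import json

# Illustrative payload; the revision ID and probabilities are invented.
SAMPLE_RESPONSE = json.loads("""
{
  "enwiki": {
    "scores": {
      "123456": {
        "damaging": {
          "score": {
            "prediction": true,
            "probability": {"false": 0.15, "true": 0.85}
          }
        }
      }
    }
  }
}
""")


def damaging_probability(response, wiki, rev_id):
    """Pull the probability that one revision is damaging out of the payload."""
    score = response[wiki]["scores"][str(rev_id)]["damaging"]["score"]
    return score["probability"]["true"]


def needs_review(probability, threshold=0.8):
    """Hypothetical rule: send the edit to a human if the model is confident."""
    return probability >= threshold


p = damaging_probability(SAMPLE_RESPONSE, "enwiki", 123456)
print(p, needs_review(p))
```

Notice that the number describes one revision, not the article. Moving the threshold up or down trades missed vandalism against false alarms; it does not change what the probability means.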

For example, if someone adds a paragraph of text without any sources, ORES might assign a lower quality score because unsourced material is often removed later. But if that same person adds a sourced fact, the score goes up. The model looks at structural elements: Are there citations? Is the tone neutral? Does the section structure follow standard guidelines?

Common ORES Metrics Explained
Metric Name	What It Measures	High Score Means...	Low Score Means...
Bad Faith	Intent of the editor	Likely vandalism or spam	Likely constructive editing
Damaging	Negative impact on content	Removes valid info or adds nonsense	Adds or preserves valid info
WikiScore	Overall article quality	High-quality, well-structured article	Stub or poorly organized draft
New Page Score	Quality of newly created pages	Meets creation guidelines	Likely speedy deletion candidate

How the Prediction Model Works Under the Hood

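You don’t need to be a data scientist to grasp the basics, but understanding the mechanics helps build trust in the system. ORES models are built with the open-source revscoring library on top of scikit-learn. Depending on the wiki and the model, the classifier may be a gradient-boosted tree ensemble, a random forest, or a logistic regression. These are standard tools in machine learning for classification tasks.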

The model looks at hundreds of "features" for every edit. Here are a few concrete examples of what it analyzes:

Text Length Changes: Did the edit add 5 words or 5,000? Massive additions are often flagged for review.
Reference Count: Did the edit add URLs or citation templates? Articles with more citations generally score higher on quality metrics.
User History: Has this account been active for years? New accounts making major changes are scrutinized more heavily.
Revert Rate: Was this type of change reverted by other editors in the past? If similar edits were undone frequently, the score drops.
Time of Day: Vandalism spikes at certain times (like late night). The model adjusts expectations based on when the edit happened.
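
As a rough illustration of the list above, the following sketch turns one edit into a handful of numeric features. It is a simplification under stated assumptions: the `EditFeatures` fields and `extract_features` are hypothetical names, references are counted only by looking for opening `<ref` tags, and the revert-rate feature is left out because it needs historical data.

```python
import re
from dataclasses import dataclass


@dataclass
class EditFeatures:
    chars_changed: int      # positive for additions, negative for removals
    refs_added: int         # change in the number of <ref> tags
    account_age_days: int   # how long the editing account has existed
    hour_utc: int           # hour of day (0-23) when the edit was saved


# Matches opening <ref> and <ref name="..."> tags, but not closing </ref> tags.
REF_PATTERN = re.compile(r"<ref[\s>]", re.IGNORECASE)


def count_refs(wikitext):
    """Count opening reference tags in a piece of wikitext."""
    return len(REF_PATTERN.findall(wikitext))


def extract_features(old_text, new_text, account_age_days, hour_utc):
    """Compare the text before and after an edit and summarize the change."""
    return EditFeatures(
        chars_changed=len(new_text) - len(old_text),
        refs_added=count_refs(new_text) - count_refs(old_text),
        account_age_days=account_age_days,
        hour_utc=hour_utc,
    )


before = "Springfield is a town."
after = before + "<ref>{{cite web|url=https://example.org}}</ref>"
print(extract_features(before, after, account_age_days=3, hour_utc=2))
```

A production system computes far more features than this, but the principle is the same: every edit becomes a row of numbers before any prediction happens.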

These features are weighted differently depending on the context. For instance, adding a reference is a strong positive signal for quality. Removing a large chunk of text without explanation is a strong negative signal for damage. The algorithm combines these signals into a final probability score.
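
Here is a toy version of that combination step, written as a logistic regression, the simplest of the classifier families. The feature names, weights, and bias are invented for illustration; a real model learns its weights from labeled edits and uses many more inputs.

```python
import math

# Hypothetical weights. Positive values push toward "damaging",
# negative values push away from it.
WEIGHTS = {
    "refs_added": -1.2,     # adding references lowers the damage estimate
    "large_removal": 2.0,   # big unexplained removals raise it
    "new_account": 0.8,     # brand-new accounts are treated more cautiously
}
BIAS = -1.5


def damaging_probability(features):
    """Weighted sum of the features, squashed into the range 0..1."""
    z = BIAS + sum(WEIGHTS[name] * value for name, value in features.items())
    return 1.0 / (1.0 + math.exp(-z))


sourced_addition = {"refs_added": 1, "large_removal": 0, "new_account": 0}
blanking_by_newcomer = {"refs_added": 0, "large_removal": 1, "new_account": 1}

print(round(damaging_probability(sourced_addition), 3))
print(round(damaging_probability(blanking_by_newcomer), 3))
```

The sourced addition comes out with a lower probability than the unexplained removal, which is exactly the pattern described above: each signal nudges the final score up or down by its weight.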

It’s important to note that ORES is constantly updated. As Wikipedia’s community norms change, the models are retrained. What was considered acceptable editing ten years ago might look different today. The system adapts to keep pace with the evolving culture of the encyclopedia.

Why ORES Scores Matter to Regular Readers

If you never edit Wikipedia, why should you care about ORES? Because it directly impacts the information you consume. ORES powers several tools that keep the site clean and accurate.

First, it supports FlaggedRevs, a MediaWiki extension that lets editors mark revisions as stable or pending so that unregistered users see only reviewed content. On some Wikipedias, when you view an article, you might be seeing a "stable" version that has been reviewed by experienced editors. ORES helps prioritize which revisions need human review. High-risk edits are pushed to the top of the queue for human moderators.
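
The prioritization idea is simple enough to sketch in a few lines. This is not how any particular review tool is implemented; `review_queue` is a hypothetical helper, and the revision IDs and probabilities are invented.

```python
def review_queue(scored_revisions, limit=None):
    """Order revision IDs so the riskiest edits come first.

    scored_revisions is a list of (revision_id, damaging_probability) pairs.
    """
    ordered = sorted(scored_revisions, key=lambda item: item[1], reverse=True)
    return [rev_id for rev_id, _ in ordered[:limit]]


pending = [(101, 0.12), (102, 0.91), (103, 0.47)]
print(review_queue(pending))
print(review_queue(pending, limit=1))
```

A human still makes the final call on every revision; the score only decides which one they look at first.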

Second, it helps combat misinformation. In an era where fake news spreads rapidly, Wikipedia’s ability to quickly identify and revert damaging edits is crucial. ORES allows the community to scale their efforts. Instead of one editor watching one page, automated systems watch millions of pages simultaneously, flagging anomalies for human attention.

Third, it provides transparency. If you click on the "View History" tab of any Wikipedia article, you can often see ORES scores next to recent edits. This gives you a quick visual cue. If an edit has a high "bad faith" score, you might want to read that change with extra skepticism. It empowers you to judge the reliability of the content yourself.

Limitations and Misconceptions

No system is perfect, and ORES has its blind spots. Understanding these limitations is key to using Wikipedia wisely.

1. It Can Be Biased: Machine learning models inherit biases from their training data. If past editors were harsher on newcomers or certain topics, ORES might reflect that. For example, edits by new users are sometimes scored lower than identical edits by veteran users, simply because new users have less history. The Wikimedia Foundation actively works to detect and reduce these disparities.

2. Context Matters: ORES looks at patterns, not meaning. It might flag a legitimate correction as "damaging" if it removes a lot of text, even if the removed text was incorrect. Conversely, it might miss subtle misinformation that follows proper formatting and includes fabricated but plausible-looking references.

3. It’s Not a Final Verdict: A low quality score doesn’t mean an article is useless. Many "Start-class" articles contain perfectly accurate information; they just lack depth or polish. A high score doesn’t guarantee truth; it just means the article follows Wikipedia’s style guidelines and has good structure.

4. Topic Specificity: ORES models are often trained globally, but some topics require specialized knowledge. An edit to a medical article needs different scrutiny than an edit to a sports article. While there are topic-specific models, they aren’t always applied uniformly across all subjects.
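
The bias concern in point 1 can be checked with a simple measurement: among edits that humans judged to be fine, how often does the model flag each group of editors anyway? The sketch below computes that false positive rate per group. The records, group labels, and 0.5 threshold are invented for illustration, and `false_positive_rates` is a hypothetical helper rather than part of any Wikimedia tool.

```python
from collections import defaultdict


def false_positive_rates(records, threshold=0.5):
    """Share of good edits wrongly flagged as damaging, per editor group.

    records is a list of (group, damaging_probability, was_damaging) tuples,
    where was_damaging is the human label.
    """
    flagged = defaultdict(int)
    total = defaultdict(int)
    for group, probability, was_damaging in records:
        if was_damaging:
            continue  # only good edits can be false positives
        total[group] += 1
        if probability >= threshold:
            flagged[group] += 1
    return {group: flagged[group] / total[group] for group in total}


labeled = [
    ("newcomer", 0.7, False),
    ("newcomer", 0.2, False),
    ("veteran", 0.3, False),
    ("veteran", 0.1, False),
    ("veteran", 0.9, True),
]
print(false_positive_rates(labeled))
```

A large gap between groups on edits that were all judged acceptable is a sign that the model is reacting to who made the edit rather than to what the edit did.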

How to Use ORES Data for Better Research

So, how can you leverage this knowledge? You don’t need to install special software. Just use your browser and a bit of critical thinking.

When researching a controversial topic, check the history. Look for clusters of edits with high "bad faith" or "damaging" scores. These indicate areas where editors disagree or where vandalism attempts occurred. If a section has been edited back and forth dozens of times, treat the current version with caution. It might still be in flux.
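
You can do this by eye, but if you ever collect the scores for a page's history into a list, spotting clusters is easy to script. The sketch below slides a window over a sequence of "damaging" probabilities and reports where several high scores bunch together. The window size, threshold, and sample history are arbitrary assumptions, and `contested_windows` is a hypothetical helper.

```python
def contested_windows(probabilities, window=5, threshold=0.7, min_hits=3):
    """Return start positions of windows with at least min_hits high scores.

    probabilities lists the damaging probability of each revision in order.
    """
    starts = []
    for start in range(len(probabilities) - window + 1):
        chunk = probabilities[start:start + window]
        hits = sum(1 for p in chunk if p >= threshold)
        if hits >= min_hits:
            starts.append(start)
    return starts


history = [0.1, 0.2, 0.8, 0.9, 0.1, 0.75, 0.2, 0.1, 0.05, 0.1]
print(contested_windows(history))
```

Windows that light up point to stretches of the history worth reading closely, for example by comparing the revisions just before and after the cluster.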

Look for the "Talk" page. ORES scores are discussed there. Experienced editors often leave notes explaining why an edit was reverted or why an article’s quality rating changed. This conversation provides context that the raw numbers can’t.

Finally, cross-reference. Even if an article has a high WikiScore, verify key claims with primary sources. Wikipedia is a secondary source. Use it as a starting point, not the final destination. ORES helps you navigate the landscape, but you still need to walk the path yourself.

The Future of Automated Quality Control

We are moving toward a more integrated future. The Wikimedia Foundation is exploring ways to make ORES scores more visible and actionable for everyday users. Imagine hovering over a sentence and seeing a tiny tooltip that says, "This claim was added in 2023 and has a high stability score." That kind of interface would transform how we trust digital information.

Advances in natural language processing (NLP) will also improve accuracy. Current models rely heavily on structural cues. Future models might better understand semantic meaning, detecting logical fallacies or biased language rather than just missing citations. This could significantly reduce false positives and make the system fairer to new contributors.

As AI becomes more prevalent in content creation, tools like ORES will become even more vital. They provide a baseline of objective measurement in a world increasingly flooded with generated text. By understanding what these scores mean, you become a more informed consumer of information, capable of distinguishing between polished noise and genuine knowledge.

Can I see ORES scores on my own Wikipedia edits?

Yes. After you save an edit, go to the "View History" tab. Next to your revision, you will often see a link or icon indicating the ORES score. You can also use the "ORES" tool in the toolbar if you enable it in your preferences. This allows you to see how the system perceives your contributions.

Does a high ORES score guarantee an article is accurate?

No. ORES measures adherence to Wikipedia’s style guidelines, structure, and editorial norms, not factual truth. An article can be perfectly formatted and cited but still contain outdated or subtly incorrect information. Always verify critical facts with primary sources.

Why do new users often get lower ORES scores?

New users lack a track record. The model uses user history as a feature. Without a history of constructive edits, the system defaults to a more cautious assessment. This isn't necessarily unfair; it's a risk mitigation strategy. However, the Wikimedia Foundation is working to reduce bias against newcomers.

How often are ORES models updated?

Models are retrained regularly, often monthly or quarterly, depending on the amount of new labeled data available. As Wikipedia’s community standards evolve, the models are adjusted to reflect current best practices and emerging types of vandalism or misinformation.

Is ORES used on other wikis besides English Wikipedia?

Yes. ORES is deployed on many language editions of Wikipedia and other Wikimedia projects like Wikivoyage and Wikisource. Each edition may have its own tuned models based on local editing patterns and community norms, though the core technology remains the same.

CATEGORY: Online Encyclopedias

Leona Whitcombe