Citing Preprints and arXiv on Wikipedia: A Guide to Reliable Sources
Imagine spending hours polishing a Wikipedia entry, only to have an editor slap a 'citation needed' tag on your best point because you used a paper from arXiv. It's frustrating, right? You know the data is solid, and the author is a heavyweight in their field, but the rules around preprints can feel like a minefield. The core problem is that Wikipedia isn't a playground for the latest theories; it's a record of established knowledge. This means the gap between a draft on a server and a peer-reviewed journal is a huge deal for the community.

Quick Takeaways

  • Preprints are not peer-reviewed and are generally discouraged as primary sources.
  • arXiv is the standard preprint server for physics and math, but its papers still require caution.
  • Use preprints only when no other reliable source exists and the claim is non-controversial.
  • Always check if a peer-reviewed version of the preprint has been published.

What Exactly Are Preprints?

Before we get into the rules, let's get the terminology straight. A preprint is a version of a scholarly or scientific paper that precedes formal peer review and publication in a peer-reviewed journal. Essentially, it's a "rough draft" that researchers upload to a public server to get their findings out quickly. Instead of waiting six months or a year for a journal to approve the work, they put it online today.

Now, arXiv (pronounced "archive") is a free distribution service and an open-access archive for scholarly articles in the fields of physics, mathematics, computer science, quantitative biology, quantitative finance, statistics, electrical engineering and systems science, and economics. It is managed by Cornell University and is the most famous hub for these kinds of documents. While arXiv has a basic screening process to stop total nonsense from getting posted, it does not perform the deep, rigorous peer review that a journal like Nature or Science does.

Why does this matter for Wikipedia? Because Wikipedia's core policy on Reliable Sources requires that information be verifiable through high-quality, independent sources. A preprint, by definition, hasn't been vetted by independent experts yet. Citing one is like quoting a student's first draft of a thesis instead of the final, defended version.

The Gold Standard: Peer Review vs. Preprints

To understand why editors get picky, you have to look at how the academic world works. When a scientist submits a paper to a journal, it goes through a process where other experts in that specific niche tear the paper apart, find errors in the math, and demand more evidence. This is the "filter" that keeps Wikipedia from becoming a collection of unproven guesses.

Comparison between Peer-Reviewed Journals and Preprints (like arXiv)

| Feature          | Peer-Reviewed Journal            | Preprint / arXiv                |
|------------------|----------------------------------|---------------------------------|
| Vetting Process  | Rigorous blind review by experts | Basic moderation / screening    |
| Speed to Public  | Slow (Months to Years)           | Instant (Days)                  |
| Wikipedia Status | Highly Reliable                  | Generally Unreliable/Discouraged |
| Stability        | Version of Record (Fixed)        | Can be updated/changed (Fluid)  |

If you find a groundbreaking claim on arXiv, your first move shouldn't be to add it to Wikipedia. Your first move should be to search for that author's name in Google Scholar or PubMed. Most papers on arXiv are eventually published in a journal. If a peer-reviewed version exists, use that. Period. Using the arXiv version when a journal version is available is a fast way to get your edits reverted.

When is it Actually Okay to Use arXiv?

Despite the strict rules, there are a few edge cases where arXiv is acceptable. These exceptions are real, but they should be applied sparingly. First, consider the field. In theoretical physics and mathematics, the culture is different. Because the peer-review process in these fields can take years, the community treats arXiv as a primary record. If you're writing about a highly technical math proof, an arXiv link might be tolerated, provided the author is a recognized expert.

Second, look at the "fringe" factor. If the claim is a minor detail, like the specific version of a software library used in a study, it's less risky than using a preprint to claim that we've found a cure for a disease. The more controversial or "big" the claim, the more you need a peer-reviewed source. If a preprint claims a discovery that contradicts 20 years of established science, it stays off Wikipedia until a journal signs off on it.

Third, consider the author's reputation. While not a perfect rule, a preprint by a Nobel laureate is generally more credible than a preprint by an anonymous account. However, even legends make mistakes. The rule of thumb is: use preprints for context or minor details, never for the main "truth" of an article.
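The three checks above, plus the earlier rule that a journal version always wins, can be written down as a small checklist. This is only a sketch of this guide's rule of thumb, not an official Wikipedia procedure: the class name, field names, and returned phrases are all made up for illustration, and real cases still call for editorial judgment.

```python
from dataclasses import dataclass


@dataclass
class PreprintCheck:
    """Hypothetical checklist; every field name here is illustrative."""
    peer_reviewed_version_exists: bool
    claim_is_controversial: bool      # a "big" or field-contradicting claim
    author_is_recognized_expert: bool


def recommendation(check: PreprintCheck) -> str:
    # A published version always takes priority over the preprint.
    if check.peer_reviewed_version_exists:
        return "cite the journal version"
    # Big or controversial claims wait for peer review.
    if check.claim_is_controversial:
        return "leave it out until a journal publishes it"
    # Minor claim from a recognized expert: usable, with in-text attribution.
    if check.author_is_recognized_expert:
        return "cite the preprint with in-text attribution"
    return "leave it out until a journal publishes it"


print(recommendation(PreprintCheck(False, False, True)))
```

The order of the checks mirrors the priorities in the text: first look for a published version, then weigh the size of the claim, and only then consider the author.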

Step-by-Step: How to Cite a Preprint Correctly

If you've determined that a preprint is truly the only source and it meets the criteria above, don't just drop a raw URL. You need to be transparent about what the source is so other editors (and readers) know the risk.

  1. Verify the latest version: Check the arXiv page to see if there is a "Journal reference" listed. If there is, use the journal, not the preprint.
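  2. Use the correct template: Wikipedia has a dedicated {{cite arXiv}} template for arXiv papers and a general {{cite preprint}} template for other repositories. If you fall back on {{cite web}} or {{cite journal}}, explicitly state in the citation that the source is a preprint.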
  3. Be honest in the text: Don't write "Research shows that..." Instead, write "A preprint paper uploaded to arXiv suggests that..." This signals to the reader that the information hasn't been peer-reviewed.
  4. Provide the arXiv ID: Always include the specific identifier (e.g., arXiv:2304.12345). This makes it much easier for other editors to track the paper's progress toward publication.

For example, if you're documenting a new AI architecture, don't just say "The model achieves 90% accuracy." Say, "According to a preprint on arXiv, the model achieves 90% accuracy, though the results have not yet undergone formal peer review." This honesty protects your reputation as an editor and maintains the site's integrity.
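To make the steps concrete, here is a minimal sketch that assembles a citation using Wikipedia's {{cite arXiv}} template and wraps a claim in the attributed wording shown above. The helper names, the author "Jane Doe", and the paper title are hypothetical; the identifier is the example from step 4. The sketch handles a single author and new-style identifiers only, and it does not escape pipe characters inside a title.

```python
import re

# New-style arXiv identifiers look like 2304.12345, optionally followed by a
# version suffix such as v2.
ARXIV_ID = re.compile(r"^\d{4}\.\d{4,5}(v\d+)?$")


def cite_arxiv(last, first, title, eprint, date, primary_class=None):
    """Return a {{cite arXiv}} template string for a single-author preprint."""
    if not ARXIV_ID.match(eprint):
        raise ValueError(f"not a new-style arXiv identifier: {eprint!r}")
    fields = [
        ("last", last),
        ("first", first),
        ("title", title),
        ("eprint", eprint),
        ("class", primary_class),  # optional arXiv category, e.g. cs.LG
        ("date", str(date)),
    ]
    # Skip any field that was left empty.
    body = " |".join(f"{key}={value}" for key, value in fields if value)
    return "{{cite arXiv |" + body + "}}"


def attributed_claim(claim):
    """Wrap a claim so the reader can see it comes from an unreviewed preprint."""
    return (
        f"According to a preprint on arXiv, {claim}, "
        "though the results have not yet undergone formal peer review."
    )


# Hypothetical author and title, used only for illustration.
print(cite_arxiv("Doe", "Jane", "An Example Preprint", "2304.12345", 2023, "cs.LG"))
print(attributed_claim("the model achieves 90% accuracy"))
```

Validating the identifier up front catches typos before they reach the article, which matters because other editors will use that ID to track the paper.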

Common Pitfalls to Avoid

One of the biggest mistakes is confusing a "repository" with a "publisher." bioRxiv and medRxiv are similar to arXiv but for biology and health sciences. In the medical world, the stakes are higher. Wikipedia's guideline on medical sourcing (WP:MEDRS) sets a stricter bar than the general reliable-sources policy, so a medRxiv preprint cited for a health claim will likely be removed quickly. Health claims require the highest level of verification.

Another trap is the "consensus" fallacy. Just because a paper has 500 downloads and a lot of Twitter hype doesn't mean it's a reliable source. Popularity is not a substitute for peer review. You'll often see "viral" preprints that are later debunked or completely retracted once the experts actually look at the data. If you rely on the hype, you're just importing that instability into the encyclopedia.

Lastly, avoid "citation stacking." This is when an editor cites three different preprints to make a point seem well-supported. Three unvetted papers are not the same as one peer-reviewed paper. If the only evidence for a claim is a cluster of preprints, the claim is not yet an established fact and probably doesn't belong on Wikipedia at all.

The Long-Term Strategy for Editors

If you're passionate about a topic that's currently only available in preprint form, the best thing you can do is keep a "watchlist." Monitor the authors and the arXiv IDs. The moment that paper hits a journal, go back to Wikipedia and swap the preprint link for the formal publication. This is how the encyclopedia evolves from a collection of "cutting edge" (and risky) ideas into a stable body of knowledge.
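One way to start such a watchlist is to pull every arXiv identifier out of an article's wikitext. The sketch below assumes you have already copied the wikitext into a string; the function name and the sample text are hypothetical, and the pattern only covers new-style identifiers written as `arXiv:…`, as an arxiv.org/abs/ URL, or in an `eprint=` or `arxiv=` template parameter.

```python
import re

# Matches new-style IDs after "arXiv:", an arxiv.org/abs/ URL, or an
# eprint=/arxiv= template parameter. Any version suffix (v2, v3) is dropped
# so that different versions of one paper count once.
ARXIV_IN_TEXT = re.compile(
    r"(?:arxiv:|arxiv\.org/abs/|(?:eprint|arxiv)\s*=\s*)(\d{4}\.\d{4,5})(?:v\d+)?",
    re.IGNORECASE,
)


def preprint_watchlist(wikitext):
    """Return the distinct arXiv IDs in wikitext, in order of first appearance."""
    seen = {}
    for match in ARXIV_IN_TEXT.finditer(wikitext):
        seen.setdefault(match.group(1), None)
    return list(seen)


# Hypothetical wikitext for illustration.
sample = (
    "A claim.<ref>{{cite arXiv |eprint=2304.12345 |title=A}}</ref> "
    "Another.<ref>https://arxiv.org/abs/2101.00001v3</ref> "
    "See also arXiv:2304.12345."
)
print(preprint_watchlist(sample))
```

Each ID on the list is a citation to revisit once the paper appears in a journal.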

Remember, Wikipedia is a marathon, not a sprint. It's better to wait six months and add a bulletproof source than to add a "hot" take today and spend the next three years arguing with other editors about why your source isn't reliable. Respect the process of peer review, and your contributions will stick.

Can I use arXiv for a biography of a scientist?

Yes, but for a different reason. If you are citing a paper to show what a scientist has worked on or to list their publications, arXiv is a perfectly fine record of their activity. However, if you are using that paper to prove a scientific fact about the world, the same peer-review rules apply.

What if the paper is on arXiv and the author says it was "accepted" but not yet published?

This is a grey area. While "accepted for publication" is a strong signal, the final version might still change during the proofing stage. It's safer to wait for the DOI (Digital Object Identifier) to be issued. If you must cite it, clearly label it as "accepted for publication" in your citation notes.

Are all preprints considered unreliable?

Not "unreliable," but "unverified." Many preprints are eventually published without any major changes. The problem isn't that they are wrong; it's that there is no formal guarantee they are right. Wikipedia prefers the guarantee over the gamble.

How do I find if a preprint has been published in a journal?

The easiest way is to copy the title of the preprint into Google Scholar. If a peer-reviewed version exists, it will usually appear as the top result. You can also check the "Comments" or "Journal reference" field on the arXiv page itself, as authors often update this once the paper is published.
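The "Journal reference" check can also be done programmatically. arXiv's public API returns an Atom feed in which a published paper may carry `arxiv:journal_ref` and `arxiv:doi` elements. The sketch below parses such a feed with Python's standard library; it assumes the XML has already been fetched (for example from export.arxiv.org/api/query with an `id_list` parameter), and the sample feed, journal name, and DOI are made up for illustration.

```python
import xml.etree.ElementTree as ET

NS = {
    "atom": "http://www.w3.org/2005/Atom",
    "arxiv": "http://arxiv.org/schemas/atom",
}

# Hypothetical, trimmed-down feed; a real response has many more fields.
SAMPLE = """<feed xmlns="http://www.w3.org/2005/Atom"
      xmlns:arxiv="http://arxiv.org/schemas/atom">
  <entry>
    <id>http://arxiv.org/abs/2304.12345v2</id>
    <title>An Example Preprint</title>
    <arxiv:journal_ref>Example J. 12 (2024) 345</arxiv:journal_ref>
    <arxiv:doi>10.0000/example.12345</arxiv:doi>
  </entry>
</feed>
"""


def publication_status(atom_xml):
    """Report the journal reference and DOI recorded for the first entry."""
    root = ET.fromstring(atom_xml)
    entry = root.find("atom:entry", NS)
    if entry is None:
        raise ValueError("no <entry> element found")
    journal_ref = entry.findtext("arxiv:journal_ref", default=None, namespaces=NS)
    doi = entry.findtext("arxiv:doi", default=None, namespaces=NS)
    return {
        "journal_ref": journal_ref,
        "doi": doi,
        # Either field suggests that a published version exists.
        "published": bool(journal_ref or doi),
    }


print(publication_status(SAMPLE))
```

Authors fill in these fields themselves, so an empty result does not prove the paper is unpublished; a title search is still worth doing.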

Is it okay to use preprints for computer science topics?

In Computer Science and AI, preprints are incredibly common because the field moves faster than journals can keep up. While Wikipedia still prefers peer-reviewed sources, there is slightly more leniency for highly technical AI papers from reputable labs (like DeepMind or OpenAI). Still, use them only if a journal version isn't available and the claim is narrow.