Toolforge Kubernetes: Deploying Scalable Wikipedia Tools

27 Nov 2025

Running a bot on Wikipedia isn’t like running a script on your laptop. If your tool gets popular - say, it starts fixing grammar errors across 50,000 articles - your old Python script will crash under the load. That’s where Toolforge comes in. It’s the official infrastructure behind hundreds of automated tools used by Wikipedia editors, and it runs on Kubernetes. No more guessing if your bot will survive a spike in traffic. No more manual restarts after a server crash. Just deploy, scale, and forget.

What is Toolforge?

Toolforge is part of Wikimedia Cloud Services, hosted by the Wikimedia Foundation. It’s not a shared web host. It’s not a cloud VM you rent. It’s a purpose-built platform for running tools that interact with Wikipedia and its sister projects. Think of it as a sandbox where bots, data visualizations, and automated editors can run safely - without breaking the wiki.

Before Toolforge, volunteers ran bots on personal servers or rented VPSes. Those setups were unreliable. They crashed during edit storms. They got banned for violating rate limits. They didn’t have access to Wikimedia’s APIs in a clean, supported way. Toolforge fixes all that. It gives you authenticated API access, built-in rate limiting, automatic logging, and a shared environment that respects Wikipedia’s rules.

Toolforge runs on Kubernetes - the open-source container orchestrator originally developed at Google. That means your tool gets failover, resource isolation, and the option to scale. If your tool gets 100x more requests, you can run more instances, or set up autoscaling so Kubernetes adds them for you. If one container dies, another starts right away. You don’t need to be a sysadmin to make that happen.

Why Kubernetes for Wikipedia tools?

Kubernetes isn’t just a buzzword here. It’s the reason Toolforge works at scale. Wikipedia gets over 500 million visits a month. Tools that edit, monitor, or analyze content can’t afford downtime. A bot that fixes broken links on 10,000 pages an hour needs to run 24/7. If it crashes, those links stay broken.

Traditional hosting fails here. A single server can’t handle traffic spikes. Cloud VMs are expensive to run continuously. Docker containers are great, but managing 20 of them manually? No thanks.

Kubernetes solves this by treating your tool like a living thing. You tell it: “Run three copies of my bot, each using 512MB RAM and 0.5 CPU.” Kubernetes watches over them. If one dies, it restarts it. If traffic grows and you have configured autoscaling, it adds more. If a server goes down, it moves your containers to a healthy one. All without you lifting a finger.
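The "watch and repair" behaviour described above is often called a reconciliation loop. Here is a toy sketch of the idea in Python. The Cluster class and the pod names are hypothetical stand-ins for illustration, not Kubernetes APIs, and a real controller does far more than this.

```python
from dataclasses import dataclass, field
from itertools import count


@dataclass
class Cluster:
    """Toy stand-in for a cluster: just a set of running pod names."""
    pods: set = field(default_factory=set)
    _ids: count = field(default_factory=count, repr=False)

    def start_pod(self) -> str:
        name = f"bot-{next(self._ids)}"
        self.pods.add(name)
        return name

    def stop_pod(self, name: str) -> None:
        self.pods.discard(name)


def reconcile(cluster: Cluster, desired_replicas: int) -> None:
    """Start or stop pods until the running count matches the desired count."""
    while len(cluster.pods) < desired_replicas:
        cluster.start_pod()
    while len(cluster.pods) > desired_replicas:
        cluster.stop_pod(next(iter(cluster.pods)))


cluster = Cluster()
reconcile(cluster, desired_replicas=3)  # initial rollout: three pods start
cluster.stop_pod("bot-1")               # simulate one pod crashing
reconcile(cluster, desired_replicas=3)  # the next pass starts a replacement
print(sorted(cluster.pods))
```

The point of the sketch is that you never tell the system "restart pod 1". You state the desired count, and the loop works out what to do.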

And it’s not just reliability. Kubernetes gives you precise control over resources. You can set memory limits so your tool doesn’t eat up all the RAM on the shared node. You can define startup delays so your bot doesn’t flood the API the moment it boots. These are small things - but they make the difference between a tool that’s useful and one that gets blocked.

Getting started with Toolforge

First, you need a Wikimedia developer account, which is separate from the account you use to edit Wikipedia. Create one, then request Toolforge membership through Toolsadmin (toolsadmin.wikimedia.org). Older guides describe the Toolforge grid, which was the previous job system and has since been retired. You don’t want that. You want Kubernetes.

Once approved, log in via SSH:

ssh yourusername@login.toolforge.org

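Then, switch to your tool account, which is where kubectl is set up for your tool’s namespace:

become my-wiki-bot

Replace my-wiki-bot with the name of the tool you created in Toolsadmin. This drops you into a shell running as the tool rather than as yourself, and the prompt changes to show it. Now you’re ready.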

You need a Docker image. If you’re using Python, build a simple image with your code and requirements. Here’s a basic Dockerfile:

FROM python:3.12-slim
WORKDIR /app
COPY requirements.txt .
RUN pip install --no-cache-dir -r requirements.txt
COPY . .
CMD ["python", "bot.py"]
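The Dockerfile above starts bot.py as the container’s main process, so that script is what Kubernetes signals when it stops or replaces a pod: it sends SIGTERM first and force-kills the process only after a grace period. A minimal sketch of a bot.py skeleton that exits cleanly might look like this. The function process_next_item is a hypothetical placeholder for your own work, and the polling interval is an assumption.

```python
import signal
import threading

stop_requested = threading.Event()


def handle_sigterm(signum, frame):
    """Ask the main loop to finish its current step and exit."""
    stop_requested.set()


def process_next_item() -> None:
    """Hypothetical placeholder for one unit of bot work (one edit, one check)."""


def run(poll_seconds: float = 5.0) -> int:
    """Work until a stop is requested; return the number of items handled."""
    handled = 0
    while not stop_requested.is_set():
        process_next_item()
        handled += 1
        # wait() returns early once the event is set, so shutdown is prompt.
        stop_requested.wait(poll_seconds)
    return handled


def main() -> None:
    # Register the handler before starting work so no signal is missed.
    signal.signal(signal.SIGTERM, handle_sigterm)
    run()
```

Calling main() at the bottom of bot.py starts the loop. Finishing the current item before exiting avoids half-completed edits when a pod is rescheduled.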

Build it locally:

docker build -t yourusername/my-wiki-bot .

Push it to the Toolforge registry:

docker tag yourusername/my-wiki-bot toolforge-registry.wikimedia.org/yourusername/my-wiki-bot
docker push toolforge-registry.wikimedia.org/yourusername/my-wiki-bot

Now you need a deployment file - a YAML file that tells Kubernetes how to run your tool. Save this as deployment.yaml:

apiVersion: apps/v1
kind: Deployment
metadata:
  name: my-wiki-bot
spec:
  replicas: 2
  selector:
    matchLabels:
      app: my-wiki-bot
  template:
    metadata:
      labels:
        app: my-wiki-bot
    spec:
      containers:
      - name: bot
        image: toolforge-registry.wikimedia.org/yourusername/my-wiki-bot:latest
        resources:
          limits:
            memory: "512Mi"
            cpu: "500m"
        env:
        - name: WIKI_USERNAME
          valueFrom:
            secretKeyRef:
              name: wiki-creds
              key: username
        - name: WIKI_PASSWORD
          valueFrom:
            secretKeyRef:
              name: wiki-creds
              key: password

You also need a secret for your bot’s credentials:

kubectl create secret generic wiki-creds --from-literal=username=YourBotName --from-literal=password=YourBotPassword
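Inside the container, the secret shows up as the ordinary environment variables WIKI_USERNAME and WIKI_PASSWORD named in the deployment file. A small sketch of how bot.py could read them and fail early with a clear message follows. The function and exception names are illustrative, not part of any library.

```python
import os


class MissingCredential(RuntimeError):
    """Raised when an expected environment variable is absent or empty."""


def load_credentials(env=os.environ) -> tuple[str, str]:
    """Return (username, password) taken from the environment."""
    username = env.get("WIKI_USERNAME", "")
    password = env.get("WIKI_PASSWORD", "")
    missing = [
        name
        for name, value in (("WIKI_USERNAME", username), ("WIKI_PASSWORD", password))
        if not value
    ]
    if missing:
        raise MissingCredential("missing environment variables: " + ", ".join(missing))
    return username, password
```

Failing at startup means a misnamed secret shows up immediately in kubectl get pods as a crashing pod, instead of as a bot that runs but cannot log in.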

Then apply the deployment:

kubectl apply -f deployment.yaml

Check if it’s running:

kubectl get pods

You should see two pods listed. That’s it. Your bot is now running on Kubernetes. It’s scalable, reliable, and monitored.


Scaling and monitoring

Toolforge gives you a dashboard at toolforge.org/dashboard. You can see your tool’s CPU and memory usage over time. If your bot is using 80% CPU during peak hours, you can increase the limit in your YAML file and redeploy.

To scale up manually:

kubectl scale deployment my-wiki-bot --replicas=5

To let Kubernetes auto-scale based on CPU usage:

kubectl autoscale deployment my-wiki-bot --cpu-percent=70 --min=2 --max=10

Now, if average CPU utilization across your bot’s pods rises above 70% of the CPU they requested, Kubernetes adds more replicas - up to 10. When traffic drops, it scales back down, but never below 2. You pay nothing extra. You just get reliability.
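The arithmetic behind that autoscale command is worth seeing once. The Kubernetes documentation gives the core rule as: desired replicas = ceil(current replicas × current metric / target metric). The sketch below applies that rule with the same 70%, 2, and 10 values. It is a simplification: the real autoscaler also applies a tolerance band and stabilization delays, which are left out here.

```python
import math


def desired_replicas(
    current_replicas: int,
    current_cpu_percent: float,
    target_cpu_percent: float = 70,
    min_replicas: int = 2,
    max_replicas: int = 10,
) -> int:
    """Core autoscaling rule, clamped to the configured bounds."""
    raw = math.ceil(current_replicas * current_cpu_percent / target_cpu_percent)
    return max(min_replicas, min(max_replicas, raw))


# Two pods averaging 140% of their CPU request: 2 * 140 / 70 = 4 pods.
print(desired_replicas(2, 140))
# Load falls to 35%: the rule says 1, but the minimum keeps it at 2.
print(desired_replicas(2, 35))
```

Because the metric is a percentage of what each pod requested, the CPU values in your deployment file directly affect when scaling kicks in.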

Logs are available via:

kubectl logs -l app=my-wiki-bot --tail=100
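kubectl logs only shows what the container writes to standard output and standard error, so the bot should log there and not to a file inside the container. A minimal sketch using Python’s logging module follows. The logger name and format string are arbitrary choices for illustration.

```python
import logging
import sys


def configure_logging(level=logging.INFO, stream=sys.stdout) -> logging.Logger:
    """Send timestamped log lines to stdout so the cluster can collect them."""
    handler = logging.StreamHandler(stream)
    handler.setFormatter(
        logging.Formatter("%(asctime)s %(levelname)s %(name)s: %(message)s")
    )
    logger = logging.getLogger("my-wiki-bot")
    logger.handlers.clear()   # avoid duplicate lines if called twice
    logger.addHandler(handler)
    logger.setLevel(level)
    logger.propagate = False  # keep the root logger from repeating each record
    return logger


log = configure_logging()
log.info("bot started")
```

StreamHandler flushes after every record, which matters in a container: plain print output can sit in a buffer when stdout is not a terminal and appear in the logs late or not at all after a crash.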

You can also set up alerts if your bot crashes too often. Toolforge integrates with Prometheus and Grafana for advanced monitoring - but you don’t need them to get started.

Common pitfalls and how to avoid them

Most people fail at Toolforge not because Kubernetes is hard - but because they ignore Wikipedia’s rules.

Don’t hit the API too fast. Toolforge has a default limit of 10 requests per second per bot. If you exceed it, your tool gets blocked. Use time.sleep(0.1) between API calls in Python.
Don’t use your personal account. Always create a dedicated bot account. Prefer OAuth or a bot password (created at Special:BotPasswords) over the account’s main password. The pywikibot library supports both.
Don’t store secrets in code. Never hardcode passwords. Use Kubernetes secrets like in the example above.
Don’t run heavy tasks on the main container. If your tool does image processing or large data dumps, offload it to a separate job. Use Kubernetes CronJobs for scheduled tasks.
Don’t forget to clean up. Unused tools consume resources. If you stop using a tool, delete it. Run kubectl delete deployment my-wiki-bot and kubectl delete secret wiki-creds.
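For the first pitfall, a fixed time.sleep(0.1) works, but it ignores the time the request itself takes. A small throttle that tracks when the next call is allowed is a common alternative. This is a sketch under the 10 requests per second figure quoted above. The Throttle class is illustrative, not part of pywikibot or Toolforge.

```python
import time


class Throttle:
    """Space out calls so they never exceed `max_per_second`."""

    def __init__(self, max_per_second: float, clock=time.monotonic, sleep=time.sleep):
        self.min_interval = 1.0 / max_per_second
        self._clock = clock
        self._sleep = sleep
        self._next_allowed = 0.0

    def wait(self) -> float:
        """Block until the next call is allowed; return the seconds slept."""
        now = self._clock()
        delay = max(0.0, self._next_allowed - now)
        if delay:
            self._sleep(delay)
        # Schedule the following slot one interval after this one.
        self._next_allowed = max(now, self._next_allowed) + self.min_interval
        return delay
```

Call throttle.wait() before each API request. Note that each replica holds its own Throttle, so with several pods running, the per-pod rate should be the overall budget divided by the number of replicas.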

One real example: a volunteer built a tool that flagged articles with outdated citations. It started with one pod. Within two weeks, editors were using it daily. Traffic jumped 300%. The bot didn’t crash. It scaled automatically. No one had to do anything.


What you can build with Toolforge

Toolforge isn’t just for bots. People use it for:

Automated vandalism detection
Article quality scoring systems
Language translation helpers
Statistical dashboards for editors
IRC bots that notify of new edits
Image metadata fixers
Wikidata query tools

Toolforge-hosted tools sit alongside other community software you may already know. Twinkle is a JavaScript gadget that runs in the editor’s browser and speeds up common maintenance tasks. STiki is an anti-vandalism tool that uses machine learning to surface suspicious edits. Both are open source, and both show the kind of problem a Toolforge tool can take on.

You don’t need to be a developer to start. Many tools are built with Python and the pywikibot library. Others use Node.js, Go, or even Rust. As long as you can package it in a Docker container, Toolforge can run it.

What’s next?

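Once your tool is running, join the Wikimedia Cloud Services mailing list. (Toolserver, which older guides mention, was Toolforge’s predecessor and no longer exists.) Ask questions. Share your work. The community is small but helpful. They’ve built most of the tools you use every day.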

Don’t wait for perfection. Start small. A bot that fixes one type of broken link. A script that checks for dead references. A dashboard that shows which articles need images. Run it on Toolforge. See how it scales. Then improve it.

Wikipedia’s tools are built by volunteers. And they’re only as strong as the infrastructure behind them. Kubernetes gives you the power to build something that lasts - not just today, but for the next decade.

Do I need to know Kubernetes to use Toolforge?

No. You only need to understand Docker and basic YAML. Toolforge handles the cluster setup, networking, and security. You just define your container and how many copies to run. Most users learn the basics in a day.

Can I run a web app on Toolforge?

Yes. Many tools are web apps - like dashboards or APIs. Just expose port 8080 in your Dockerfile and use a web server like Flask or Express. Toolforge automatically routes traffic to your app via a public URL.
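As a starting point, here is a minimal web app that listens on port 8080 using only the Python standard library. Flask would follow the same pattern. Reading the port from a PORT environment variable is an assumption made for flexibility here, not something the platform is confirmed to set.

```python
import os
from http.server import BaseHTTPRequestHandler, ThreadingHTTPServer


class Handler(BaseHTTPRequestHandler):
    def do_GET(self):
        body = b"my-wiki-bot is running\n"
        self.send_response(200)
        self.send_header("Content-Type", "text/plain; charset=utf-8")
        self.send_header("Content-Length", str(len(body)))
        self.end_headers()
        self.wfile.write(body)


def make_server(port=None) -> ThreadingHTTPServer:
    """Build a server on the given port, defaulting to $PORT or 8080."""
    if port is None:
        port = int(os.environ.get("PORT", "8080"))
    # Bind to all interfaces so traffic from outside the container can reach it.
    return ThreadingHTTPServer(("0.0.0.0", port), Handler)
```

Running make_server().serve_forever() as the container’s command starts the app. Binding to 0.0.0.0 instead of 127.0.0.1 is the detail people most often miss: a server bound to localhost inside a container is unreachable from outside it.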

Is Toolforge free?

Yes. Toolforge is funded by the Wikimedia Foundation and available to all registered editors. There are resource limits, but they’re generous enough for most tools. You won’t be charged.

What happens if my tool violates Wikipedia policies?

Your tool will be suspended. Toolforge doesn’t police content - but it enforces technical rules. If your bot edits too fast, makes bad changes, or uses unauthorized credentials, admins will disable it. You’ll get a notice and a chance to fix it.

Can I use Toolforge for non-Wikipedia projects?

Only if they interact with Wikimedia projects. You can’t use Toolforge to host your personal blog or a commercial app. It’s strictly for tools that help edit, analyze, or improve Wikipedia, Wikidata, Commons, and other Wikimedia sites.
