Running a bot on Wikipedia isn’t like running a script on your laptop. If your tool gets popular - say, it starts fixing grammar errors across 50,000 articles - your old Python script will crash under the load. That’s where Toolforge comes in. It’s the official infrastructure behind hundreds of automated tools used by Wikipedia editors, and it runs on Kubernetes. No more guessing if your bot will survive a spike in traffic. No more manual restarts after a server crash. Just deploy, scale, and forget.
What is Toolforge?
Toolforge is part of Wikimedia Cloud Services, hosted by the Wikimedia Foundation. It’s not a shared web host. It’s not a cloud VM you rent. It’s a purpose-built platform for running tools that interact with Wikipedia and its sister projects. Think of it as a sandbox where bots, data visualizations, and automated editors can run safely - without breaking the wiki.
Before Toolforge, volunteers ran bots on personal servers or rented VPSes. Those setups were unreliable. They crashed during edit storms. They got banned for violating rate limits. They didn’t have access to Wikimedia’s APIs in a clean, supported way. Toolforge fixes all that. It gives you authenticated API access, built-in rate limiting, automatic logging, and a shared environment that respects Wikipedia’s rules.
Toolforge runs on Kubernetes - the open-source container orchestrator that originated at Google. That means your tool gets failover, resource isolation, and the option of automatic scaling. If your tool gets 100x more requests and you have configured autoscaling, Kubernetes can spin up more instances. If one container dies, another starts right away. You don’t need to be a sysadmin to make that happen.
Why Kubernetes for Wikipedia tools?
Kubernetes isn’t just a buzzword here. It’s the reason Toolforge works at scale. Wikipedia gets over 500 million visits a month. Tools that edit, monitor, or analyze content can’t afford downtime. A bot that fixes broken links on 10,000 pages an hour needs to run 24/7. If it crashes, those links stay broken.
Traditional hosting fails here. A single server can’t handle traffic spikes. Cloud VMs are expensive to run continuously. Docker containers are great, but managing 20 of them manually? No thanks.
Kubernetes solves this by treating your tool like a living thing. You tell it: “Run three copies of my bot, each using 512MB RAM and 0.5 CPU.” Kubernetes watches over them. If one dies, it restarts it. If traffic grows, it adds more. If a server goes down, it moves your containers to a healthy one. All without you lifting a finger.
And it’s not just reliability. Kubernetes gives you precise control over resources. You can set memory limits so your tool doesn’t eat up all the RAM on the shared node. You can define startup delays so your bot doesn’t flood the API the moment it boots. These are small things - but they make the difference between a tool that’s useful and one that gets blocked.
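The startup delay mentioned above can also be handled inside the bot itself. The following is a minimal sketch, not a Toolforge feature: it assumes a hypothetical STARTUP_JITTER_SECONDS environment variable and made-up function names, and simply sleeps for a random interval before the first API request so that several replicas do not all start hitting the API at the same moment.

```python
import os
import random
import time


def startup_delay(max_seconds, rng=random.random):
    """Return a random delay in [0, max_seconds) so replicas do not start in lockstep."""
    if max_seconds <= 0:
        return 0.0
    return rng() * max_seconds


def wait_before_first_request(sleep=time.sleep):
    """Sleep for a random interval and return how long the bot waited."""
    # STARTUP_JITTER_SECONDS is a hypothetical variable name chosen for this example.
    max_seconds = float(os.environ.get("STARTUP_JITTER_SECONDS", "30"))
    delay = startup_delay(max_seconds)
    sleep(delay)
    return delay
```

Calling wait_before_first_request() once at the top of bot.py would be enough; the sleep function is injectable only so the behavior can be checked without actually waiting.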
Getting started with Toolforge
First, you need a Wikimedia account. If you’re editing Wikipedia, you already have one. Then, go to toolforge.org and click “Join Toolforge.” You’ll get an account on the Toolforge grid, which is the old system. But you don’t want that. You want Kubernetes.
Once approved, log in via SSH:
ssh [email protected]
Then, switch to the Kubernetes cluster:
k8s-shell
This drops you into a shell where you can manage your containers. You’ll see a prompt like yourusername@k8s:~$. Now you’re ready.
You need a Docker image. If you’re using Python, build a simple image with your code and requirements. Here’s a basic Dockerfile:
FROM python:3.12-slim
WORKDIR /app
COPY requirements.txt .
RUN pip install --no-cache-dir -r requirements.txt
COPY . .
CMD ["python", "bot.py"]
Build it locally:
docker build -t yourusername/my-wiki-bot .
Push it to the Toolforge registry:
docker tag yourusername/my-wiki-bot toolforge-registry.wikimedia.org/yourusername/my-wiki-bot
docker push toolforge-registry.wikimedia.org/yourusername/my-wiki-bot
Now you need a deployment file - a YAML file that tells Kubernetes how to run your tool. Save this as deployment.yaml:
apiVersion: apps/v1
kind: Deployment
metadata:
  name: my-wiki-bot
spec:
  replicas: 2
  selector:
    matchLabels:
      app: my-wiki-bot
  template:
    metadata:
      labels:
        app: my-wiki-bot
    spec:
      containers:
        - name: bot
          image: toolforge-registry.wikimedia.org/yourusername/my-wiki-bot:latest
          resources:
            limits:
              memory: "512Mi"
              cpu: "500m"
          env:
            - name: WIKI_USERNAME
              valueFrom:
                secretKeyRef:
                  name: wiki-creds
                  key: username
            - name: WIKI_PASSWORD
              valueFrom:
                secretKeyRef:
                  name: wiki-creds
                  key: password
You also need a secret for your bot’s credentials:
kubectl create secret generic wiki-creds --from-literal=username=YourBotName --from-literal=password=YourBotPassword
Then apply the deployment:
kubectl apply -f deployment.yaml
Check if it’s running:
kubectl get pods
You should see two pods listed. That’s it. Your bot is now running on Kubernetes. It’s scalable, reliable, and monitored.
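On the Python side, bot.py has to pick up the two variables that the deployment injects from the secret. Here is a minimal sketch, assuming the variable names used in deployment.yaml above; load_credentials and MissingCredentials are hypothetical names for this example, not part of any library.

```python
import os


class MissingCredentials(RuntimeError):
    """Raised when the deployment did not inject the expected variables."""


def load_credentials(env=os.environ):
    """Return (username, password) as injected by the secretKeyRef entries."""
    try:
        return env["WIKI_USERNAME"], env["WIKI_PASSWORD"]
    except KeyError as missing:
        # Fail at startup with a clear message instead of on the first login attempt.
        name = missing.args[0]
        raise MissingCredentials(f"environment variable {name} is not set") from None
```

Failing early like this makes a misnamed secret or key show up immediately in the pod's logs rather than as a confusing login error later.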
Scaling and monitoring
Toolforge gives you a dashboard at toolforge.org/dashboard. You can see your tool’s CPU and memory usage over time. If your bot is using 80% CPU during peak hours, you can increase the limit in your YAML file and redeploy.
To scale up manually:
kubectl scale deployment my-wiki-bot --replicas=5
To let Kubernetes auto-scale based on CPU usage:
kubectl autoscale deployment my-wiki-bot --cpu-percent=70 --min=2 --max=10
Now, if your bot’s CPU usage hits 70%, Kubernetes adds more replicas - up to 10. When traffic drops, it scales back down. You pay nothing extra. You just get reliability.
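The scaling decision follows the formula given in the Kubernetes Horizontal Pod Autoscaler documentation: desired replicas = ceil(current replicas × current metric / target metric). The sketch below reproduces that arithmetic with the values from the command above. It is a simplification: the function name is made up, the 10% tolerance is the documented default, and the real controller also accounts for pods that are not ready, stabilization windows, and missing metrics.

```python
import math


def desired_replicas(current_replicas, current_cpu_percent, target_cpu_percent=70,
                     min_replicas=2, max_replicas=10, tolerance=0.1):
    """Approximate the replica count the autoscaler would ask for."""
    ratio = current_cpu_percent / target_cpu_percent
    # Within the tolerance band the autoscaler leaves the deployment alone.
    if abs(ratio - 1.0) <= tolerance:
        return current_replicas
    wanted = math.ceil(current_replicas * ratio)
    # Clamp to the --min and --max values given to kubectl autoscale.
    return max(min_replicas, min(max_replicas, wanted))


# Two replicas averaging 140% of the target basis double to four.
print(desired_replicas(2, 140))
```

Note that the percentage is measured against each pod's CPU request. The deployment above sets only limits, and Kubernetes uses a limit as the request when no request is specified.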
Logs are available via:
kubectl logs -l app=my-wiki-bot --tail=100
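kubectl logs shows whatever the container writes to standard output and standard error, so the bot should log there rather than to a file inside the container. A minimal sketch using Python's standard logging module; the logger name and the format string are arbitrary choices for this example.

```python
import logging
import sys


def configure_logging(level=logging.INFO, stream=None):
    """Send the bot's log records to stdout, where kubectl logs can read them."""
    handler = logging.StreamHandler(stream if stream is not None else sys.stdout)
    handler.setFormatter(
        logging.Formatter("%(asctime)s %(levelname)s %(name)s: %(message)s")
    )
    logger = logging.getLogger("my-wiki-bot")
    logger.handlers.clear()   # avoid duplicate lines if called more than once
    logger.addHandler(handler)
    logger.setLevel(level)
    logger.propagate = False  # keep records from being printed again by the root logger
    return logger


log = configure_logging()
log.info("bot started")
```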
You can also set up alerts if your bot crashes too often. Toolforge integrates with Prometheus and Grafana for advanced monitoring - but you don’t need them to get started.
Common pitfalls and how to avoid them
Most people fail at Toolforge not because Kubernetes is hard - but because they ignore Wikipedia’s rules.
- Don’t hit the API too fast. Toolforge has a default limit of 10 requests per second per bot. If you exceed it, your tool gets blocked. Use time.sleep(0.1) between API calls in Python.
- Don’t use your personal account. Always create a dedicated bot account. Use OAuth for authentication, not passwords. Toolforge supports OAuth tokens via the pywikibot library.
- Don’t store secrets in code. Never hardcode passwords. Use Kubernetes secrets like in the example above.
- Don’t run heavy tasks on the main container. If your tool does image processing or large data dumps, offload it to a separate job. Use Kubernetes CronJobs for scheduled tasks.
- Don’t forget to clean up. Unused tools consume resources. If you stop using a tool, delete it. Run kubectl delete deployment my-wiki-bot and kubectl delete secret wiki-creds.
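For the first point in the list above, a fixed time.sleep(0.1) works, but it adds the pause on top of however long each request took. One possible refinement is a small throttle that only sleeps for the time still remaining in the interval. This is a sketch under the article's figure of 10 requests per second; the Throttle class is hypothetical, not part of pywikibot or any Toolforge library.

```python
import time


class Throttle:
    """Space out calls so they never exceed max_per_second."""

    def __init__(self, max_per_second=10, clock=time.monotonic, sleep=time.sleep):
        self.min_interval = 1.0 / max_per_second
        self._clock = clock
        self._sleep = sleep
        self._next_allowed = None  # no call has been made yet

    def wait(self):
        """Block until the next call is allowed, then reserve the following slot."""
        now = self._clock()
        if self._next_allowed is not None and now < self._next_allowed:
            self._sleep(self._next_allowed - now)
            now = self._next_allowed
        self._next_allowed = now + self.min_interval


# Usage: call wait() immediately before every API request.
#
#     throttle = Throttle(max_per_second=10)
#     for title in titles:
#         throttle.wait()
#         fetch_page(title)   # hypothetical function that makes one API call
```

Keep in mind that each replica has its own throttle, so with several pods the per-pod rate would need to be lowered to stay under a shared limit.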
One real example: a volunteer built a tool that flagged articles with outdated citations. It started with one pod. Within two weeks, editors were using it daily. Traffic jumped 300%. The bot didn’t crash. It scaled automatically. No one had to do anything.
What you can build with Toolforge
Toolforge isn’t just for bots. People use it for:
- Automated vandalism detection
- Article quality scoring systems
- Language translation helpers
- Statistical dashboards for editors
- IRC bots that notify of new edits
- Image metadata fixers
- Wikidata query tools
One popular tool, Twinkle, helps new editors fix common mistakes. It runs on Toolforge and handles over 2 million actions a year. Another, Stiki, automatically reverts vandalism using machine learning. Both are open source. Both run on Kubernetes.
You don’t need to be a developer to start. Many tools are built with Python and the pywikibot library. Others use Node.js, Go, or even Rust. As long as you can package it in a Docker container, Toolforge can run it.
What’s next?
Once your tool is running, join the Wikimedia Cloud Services mailing list. Ask questions. Share your work. The community is small but helpful. They’ve built most of the tools you use every day.
Don’t wait for perfection. Start small. A bot that fixes one type of broken link. A script that checks for dead references. A dashboard that shows which articles need images. Run it on Toolforge. See how it scales. Then improve it.
Wikipedia’s tools are built by volunteers. And they’re only as strong as the infrastructure behind them. Kubernetes gives you the power to build something that lasts - not just today, but for the next decade.
Do I need to know Kubernetes to use Toolforge?
No. You only need to understand Docker and basic YAML. Toolforge handles the cluster setup, networking, and security. You just define your container and how many copies to run. Most users learn the basics in a day.
Can I run a web app on Toolforge?
Yes. Many tools are web apps - like dashboards or APIs. Just expose port 8080 in your Dockerfile and use a web server like Flask or Express. Toolforge automatically routes traffic to your app via a public URL.
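As an illustration of the listening side, here is a minimal sketch that uses only Python's standard library as a stand-in for Flask, so it runs without extra dependencies. The /healthz route and the names ToolHandler and build_server are assumptions made for this example; nothing here is specific to Toolforge beyond the port number mentioned above.

```python
import json
from http.server import BaseHTTPRequestHandler, ThreadingHTTPServer


class ToolHandler(BaseHTTPRequestHandler):
    """Answer GET /healthz with a small JSON document and everything else with 404."""

    def do_GET(self):
        if self.path == "/healthz":
            body = json.dumps({"status": "ok"}).encode("utf-8")
            self.send_response(200)
            self.send_header("Content-Type", "application/json")
        else:
            body = b"not found\n"
            self.send_response(404)
            self.send_header("Content-Type", "text/plain; charset=utf-8")
        self.send_header("Content-Length", str(len(body)))
        self.end_headers()
        self.wfile.write(body)


def build_server(port=8080):
    """Create the server, bound to all interfaces so traffic from outside the container reaches it."""
    return ThreadingHTTPServer(("0.0.0.0", port), ToolHandler)


# In the container, start serving with:
#
#     build_server(8080).serve_forever()
```

A real tool would more likely use Flask behind a production WSGI server; the point of the sketch is only that the process must listen on the expected port on all interfaces, not just on localhost.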
Is Toolforge free?
Yes. Toolforge is funded by the Wikimedia Foundation and available to all registered editors. There are resource limits, but they’re generous enough for most tools. You won’t be charged.
What happens if my tool violates Wikipedia policies?
Your tool will be suspended. Toolforge doesn’t police content - but it enforces technical rules. If your bot edits too fast, makes bad changes, or uses unauthorized credentials, admins will disable it. You’ll get a notice and a chance to fix it.
Can I use Toolforge for non-Wikipedia projects?
Only if they interact with Wikimedia projects. You can’t use Toolforge to host your personal blog or a commercial app. It’s strictly for tools that help edit, analyze, or improve Wikipedia, Wikidata, Commons, and other Wikimedia sites.