Table of contents
- What is sharding
- Sharding in Python libraries
- What is clustering
- How to cluster your bot
Note: This content relates directly to the experiences of the author. You should tailor your solution to your bot's needs, as at this scale everyone has different requirements.
What is sharding
Sharding is the process by which Discord alleviates load by requiring your bot to create multiple connections (shards) to Discord and splitting the work between them. When sharding, each shard only handles a subset of all the guilds your bot is in.
Each shard can handle a maximum of 2500 guilds, and bots in more than 2500 servers must use sharding. Luckily for us, discord.py and its forks automatically handle sharding using a process generally referred to as internal sharding.
Internal sharding refers to when all shards are handled by a single Python process, and this is what the libraries provide by default. This method of sharding gives bot developers access to the same information as an un-sharded bot, since all the information is still maintained by a single process. As your bot grows, however, internal sharding will eventually give way to a more traditional approach to sharding that we refer to as clustering, which is discussed later in this post.
Sharding in Python libraries
Note: This only applies to discord.py and forks of discord.py
If your bot is currently created like this:
bot = commands.Bot()
This code can be changed to the following:
bot = commands.AutoShardedBot()
By making this code modification, the library you are using will automatically assign the correct number of shards and process them internally. While using discord.py or a fork of it, the library handles all of the sharding without changing any user-facing code. You can continue to rely on things like get_ methods, etc.
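Should you want to pin the shard count yourself rather than rely on the automatically recommended value, the constructor also accepts it directly. A minimal sketch, where the prefix, intents and shard count are placeholder values:
import disnake
from disnake.ext import commands

# Placeholder values; when shard_count is omitted the library asks
# Discord for the recommended count itself
bot = commands.AutoShardedBot(
    command_prefix="!",
    intents=disnake.Intents.default(),
    shard_count=2,
)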
What is clustering
As your bot scales, simply using commands.AutoShardedBot to handle your sharding becomes unfeasible as the requirements for your bot outgrow what a single process can support (you can think of one process as calling python3 main.py once). You may begin to notice your bot's responses slowing down, among other symptoms. This is often a sign that you need to start clustering your bot; however, the simplest way to figure out whether you need clustering or not is to simply look at your shard count.
Clustering is a simple way to distribute the load of your bot across multiple processes. Essentially, this means a single process will only handle commands for a certain number of shards. For the purposes of this guide, the terms “process” and “cluster” are interchangeable.
A good starting point would be around 5 shards per cluster. After you begin clustering, keep an eye on your bot and how each cluster is faring in order to determine the true number of shards per cluster your bot requires.
Some factors which are generally good indicators include:
- Noticing in your logs that the websocket is falling behind regularly
- CPU cores maxing out under regular load
- Noticeable slowness on Discord
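To make the shards-per-cluster split concrete, here is a small sketch of the arithmetic involved. The numbers are examples only, and the full main.py later in this guide performs the same calculation:
# Example only: 16 shards split into clusters of 5 shards each
total_shards = 16
shards_per_cluster = 5

for cluster_id in range(1, 5):  # clusters are numbered from 1
    offset = cluster_id - 1
    shard_ids = [
        i
        for i in range(offset * shards_per_cluster, (offset + 1) * shards_per_cluster)
        if i < total_shards
    ]
    print(cluster_id, shard_ids)
# 1 [0, 1, 2, 3, 4]
# 2 [5, 6, 7, 8, 9]
# 3 [10, 11, 12, 13, 14]
# 4 [15]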
Note: This tutorial sets you up with a bot where each process only maintains a cache of the data for the shards it is running. Attempting to use get_ methods on items from other shards may not always work.
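If you need to check whether a given guild is handled by the current cluster at all, Discord's documented sharding formula tells you which shard a guild lives on. A small sketch, assuming shard_ids and total_shards are the values computed for this cluster and the guild id is a placeholder:
guild_id = 123456789012345678  # placeholder
shard_for_guild = (guild_id >> 22) % total_shards  # Discord's sharding formula
handled_by_this_cluster = shard_for_guild in shard_ids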
How to cluster your bot
Files Required
For the purposes of this guide, we will be deploying our bot using Docker and docker-compose. Further to this, you will also need some form of Docker container registry. For the purposes of this guide we will be using GitHub Actions to deploy our images to the GitHub Docker Registry (this is outdated; ghcr.io is now the recommended Docker registry).
Within this guide we are going to work with a fictional bot called Dave. Dave has a repository under the GitHub user Skelmis and will be designed to have 2 clusters serving 16 shards.
Dockerfile
This Dockerfile will run the main.py file with Python 3.10 using the requirements defined in requirements.txt. The file should just be called Dockerfile.
FROM python:3.10
RUN mkdir -p /bot
WORKDIR /bot
# We do this as a separate step
# to reduce rebuild times
COPY ./requirements.txt /bot/requirements.txt
RUN pip3 install -r requirements.txt
COPY . /bot
CMD python3 main.py
Note: This is a simplistic example. For more fine-grained control you may consider setting environment variables such as ENV PIP_NO_CACHE_DIR=false, or using a smaller Python Docker image.
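As a sketch, a slimmer variant might look like the following. It assumes your dependencies install cleanly against the slim image and that you are happy to skip pip's download cache:
FROM python:3.10-slim
WORKDIR /bot
# Install dependencies first so this layer is only rebuilt
# when requirements.txt changes
COPY ./requirements.txt /bot/requirements.txt
RUN pip3 install --no-cache-dir -r requirements.txt
COPY . /bot
CMD python3 main.py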
docker-compose
This file should be called docker-compose.yml.
version: '2'
services:
  dave_cluster_1:
    container_name: 'dave_cluster_1'
    image: docker.pkg.github.com/skelmis/dave/dave
    environment:
      CLUSTER: 1
      IS_PRODUCTION: 1
      TOKEN: ...
  dave_cluster_2:
    container_name: 'dave_cluster_2'
    environment:
      CLUSTER: 2
    extends:
      service: dave_cluster_1
Explanation:
- CLUSTER is a number representing which cluster this Docker container is. This starts from 1 in our code.
- IS_PRODUCTION should be present in any production instances to tell the bot to launch the required clusters.
- TOKEN should be your bot token.
When using the extends directive, our new service (dave_cluster_2 in this case) inherits from dave_cluster_1. This reduces the number of duplicated environment variables, as the only difference between clusters is the id, which is provided by the CLUSTER environment variable.
For further security with your TOKEN and other environment variables, look into using a docker-compose.override.yml file to avoid accidentally pushing these variables to your source control. Alternatively, the usage of a proper secrets manager should be considered.
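As a minimal sketch, a docker-compose.override.yml kept out of source control could hold just the secret values; docker-compose merges it over docker-compose.yml automatically (the token below is a placeholder):
version: '2'
services:
  dave_cluster_1:
    environment:
      TOKEN: 'your-bot-token-here'
  dave_cluster_2:
    environment:
      TOKEN: 'your-bot-token-here'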
GitHub Action
This file should be placed in the directory .github/workflows and can be called anything. For this tutorial we will call it publish.yml.
name: Publish Docker Image
on:
  push:
    branches: [master]
jobs:
  push_to_registry:
    name: Push Docker image to GitHub Packages
    runs-on: ubuntu-latest
    steps:
      - name: Check out the repo
        uses: actions/checkout@v2
      - name: Push to GitHub Packages
        uses: docker/build-push-action@v1
        with:
          username: ${{ github.actor }}
          password: ${{ github.token }}
          registry: docker.pkg.github.com
          repository: <GitHub Username>/<Repo>/<Name>
          tags: latest
          build_args: BRANCH=${{ github.ref }},COMMIT=${{ github.sha }}
You should replace the following with their lowercase versions:
- <GitHub Username> with your username.
- <Repo> with your bot's repository.
- <Name> with a name of your choice; I like to just re-use <Repo>.
For example, if my bot was in a repository called Dave under the GitHub account Skelmis, then it would look like this: repository: skelmis/dave/dave
You may also like to tag your image; for the purposes of this guide we will use latest.
Note: Make sure the branch in branches: [ master ] is also correct for your repository.
Bot Files
This bot file is extremely simple; however, the concepts and classes used are transferable to your own bots. It should be called main.py.
import os
import asyncio
import logging

import disnake
from disnake.ext import commands

log = logging.getLogger(__name__)
logging.basicConfig(level=logging.INFO)


async def main():
    # Values set here clearly indicate we are not in production
    cluster_id = 0
    total_shards = 1
    cluster_kwargs = {}

    if os.environ.get("IS_PRODUCTION"):
        total_shards = 16
        cluster_id = int(os.environ["CLUSTER"])
        offset = cluster_id - 1  # As we start at 1
        number_of_shards_per_cluster = 10
        # Calculate the shard ids this cluster should handle
        # For example on cluster 1 this would be equal to
        # [0, 1, 2, 3, 4, 5, 6, 7, 8, 9]
        # The `if i < total_shards` check means the final cluster
        # simply picks up whatever shards remain (here 10-15)
        shard_ids = [
            i
            for i in range(
                offset * number_of_shards_per_cluster,
                (offset * number_of_shards_per_cluster) + number_of_shards_per_cluster,
            )
            if i < total_shards
        ]
        cluster_kwargs = {
            "shard_ids": shard_ids,
            "shard_count": total_shards,
        }

    bot = commands.AutoShardedInteractionBot(
        intents=disnake.Intents.default(), **cluster_kwargs
    )
    bot.cluster_id = cluster_id
    bot.total_shards = total_shards

    @bot.event
    async def on_ready():
        log.info("Cluster %s is now ready.", bot.cluster_id)

    @bot.slash_command()
    async def ping(interaction: disnake.CommandInteraction):
        """Pong!"""
        await interaction.send("Pong!")

    await bot.start(os.environ["TOKEN"])


if __name__ == "__main__":
    asyncio.run(main())
Note: It is not included here; however, a file called requirements.txt should exist with the requirements relevant to running your bot.
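For the example bot above it could be as small as a single line, though you may want to pin versions:
disnake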
Publishing your Docker image
Assuming you have followed the guide correctly up until this point, you should now have everything you need to start deploying your Docker image and by extension your bot.
Every time you push code to your GitHub repo, the GitHub Action we defined earlier should run and publish a new version of your bot's Docker image for the world to see. It's as simple as that.
Deploying your bot
To complete this step, at least one successful run of your GitHub Action must have occurred.
On your server, simply copy the docker-compose.yml file from earlier. If you are unsure how to do this, I recommend a tool such as FileZilla.
Note: All these commands should be run in the same directory as your docker-compose.yml file.
Pulling the latest Docker image
In order to pull the latest Docker image from your Docker container registry, simply run the following.
docker-compose pull
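Note that if your image is private, you will most likely need to authenticate against the registry on the server first, along these lines (the username is an example; you would typically use a personal access token as the password when prompted):
docker login docker.pkg.github.com -u skelmis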
Starting your bot
If you wish to run the bot and view the logs in the current terminal instance, run:
docker-compose up
Note: Your bot will stop when you close this terminal.
For more general purpose usage I recommend:
docker-compose up -d
This command will start all of your clusters in “detached mode”, which essentially means in the background. That means you can safely exit your current terminal and the bot will continue running.
Inspecting your bot's logs
If you're running in detached mode, you will not see your bot's output. While not always needed, there are still times you will want to see your bot's output.
docker-compose logs
Will allow you to view your bot's logs up until the time you ran the command.
If you wish to have a “live view” of your logs, try the following command.
docker-compose logs -f
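Both log commands also accept a service name if you only care about a single cluster, for example:
docker-compose logs -f dave_cluster_1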
Shutting down your bot
More advanced bots may feature commands to shut down individual clusters, etc.; however, that is outside the scope of this guide.
docker-compose down
Will shut down all clusters. If you wish to handle this shutdown gracefully, you will likely need to do so in your code, as this simply shuts the Docker containers themselves down.
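As a starting point, a minimal sketch of what graceful handling could look like is below. It assumes the Dockerfile's CMD is switched to the exec form (CMD ["python3", "main.py"]) so that Docker's SIGTERM actually reaches the Python process, and that this snippet is placed inside main() after the bot has been created:
import signal

# Sketch: close the bot cleanly when Docker sends SIGTERM,
# for example when `docker-compose down` is run
loop = asyncio.get_running_loop()

def _request_shutdown() -> None:
    log.info("Cluster %s received SIGTERM, shutting down.", bot.cluster_id)
    asyncio.create_task(bot.close())

loop.add_signal_handler(signal.SIGTERM, _request_shutdown)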