Table of contents
What is sharding
Sharding in Python libraries
What is clustering
How to cluster your bot

Note: This content relates directly to the experiences of the author. You should tailor your solution to your bot’s needs, as at this scale everyone has different requirements.

What is sharding Link to heading

Sharding is the process by which Discord helps to alleviate load by having your bot create multiple connections to the gateway, with each shard handling events for only a subset of the guilds your bot is in.
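
For intuition, Discord decides which shard receives a given guild’s events using a documented formula over the guild id. A quick Python sketch of that routing:

def shard_for_guild(guild_id: int, shard_count: int) -> int:
    # Discord routes a guild to shard (guild_id >> 22) % shard_count,
    # so a guild always lands on the same shard for a given shard count
    return (guild_id >> 22) % shard_count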

Each shard can handle a maximum of 2500 guilds, and bots in more than 2500 servers must use sharding. Luckily for us, discord.py and its forks handle sharding automatically using a process generally referred to as internal sharding.

Internal sharding refers to all shards being handled by a single Python process, and it is what the libraries provide by default. This method of sharding gives bot developers access to the same information as an un-sharded bot, as all the information is still maintained by one process. As your bot grows, however, internal sharding will eventually give way to a more traditional approach that we refer to as clustering, which is discussed later in this post.

Sharding in Python libraries Link to heading

Note: This only applies to discord.py and forks of discord.py

Take a typical bot definition:

bot = commands.Bot()

This code can be changed to the following:

bot = commands.AutoShardedBot()

By making this code modification, the library will automatically request the correct number of shards and process them internally. With discord.py or one of its forks, sharding is handled without any changes to your user-facing code; you can continue to rely on things like get_ methods, etc.
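
As a small illustration, the shard attributes the libraries expose (bot.shard_count and guild.shard_id are real discord.py/disnake attributes) let you see how your guilds were distributed, assuming the bot object from above:

@bot.event
async def on_ready():
    # shard_count is filled in automatically by AutoShardedBot
    print(f"Running with {bot.shard_count} shards")
    for guild in bot.guilds:
        # Every cached guild knows which shard services it
        print(f"Guild {guild.id} lives on shard {guild.shard_id}")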

What is clustering Link to heading

As your bot scales, simply using commands.AutoShardedBot to handle your sharding becomes unfeasible, because your bot’s requirements outgrow what a single process can support (you can think of one process as calling python3 main.py once). You may begin to notice your bot’s responses slowing down, among other symptoms. This is often a sign that you need to start clustering your bot; however, the simplest way to figure out whether you need clustering is to look at your shard count.

Clustering is a simple way to distribute the load of your bot across multiple processes. Essentially, this means a single process will only handle commands for a fixed number of shards. For the purposes of this guide, the terms “process” and “cluster” are interchangeable.

A good starting point is around 5 shards per cluster. After beginning to cluster, keep an eye on your bot and how each cluster is faring in order to determine the true number of shards per cluster your bot requires.
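
To make the arithmetic concrete, here is a small helper (my own illustration; the same calculation appears in main.py later in this guide):

def shard_ids_for_cluster(cluster_id: int, shards_per_cluster: int, total_shards: int) -> list[int]:
    # Clusters are numbered from 1, shards from 0
    start = (cluster_id - 1) * shards_per_cluster
    return [i for i in range(start, start + shards_per_cluster) if i < total_shards]

# 16 shards at 5 per cluster needs 4 clusters, with the last one smaller:
# shard_ids_for_cluster(1, 5, 16) -> [0, 1, 2, 3, 4]
# shard_ids_for_cluster(4, 5, 16) -> [15]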

Some factors which are generally good indicators that a cluster is handling too many shards include:

  • Noticing in your logs that the websocket is falling behind regularly
  • CPU cores maxing out under regular load
  • Noticeable slowness on Discord

Note: This tutorial sets you up with a bot where each process only maintains a cache of the data for the shards it is running. Attempting to use get_ methods on items from other shards may not always work.
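
For example, a get_guild call for a guild that lives on another cluster’s shards will simply return None (the guild id here is hypothetical):

guild = bot.get_guild(123456789012345678)  # hypothetical id
if guild is None:
    # Either the guild is unknown, or it is cached by a different cluster
    ...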

How to cluster your bot Link to heading

Files Required Link to heading

For the purposes of this guide, we will be deploying our bot using Docker and docker-compose. Further to this, you will also need some form of Docker container registry; we will be using GitHub Actions to deploy our images to the GitHub Docker Registry. (This registry is outdated; ghcr.io is now the recommended Docker registry.)

Within this guide we are going to work with a fictional bot called Dave. Dave has a repository under the GitHub user Skelmis and will be designed to have 2 clusters serving 16 shards.

Dockerfile

This Dockerfile will run the main.py file with Python 3.10 using the requirements defined in requirements.txt. The file should just be called Dockerfile.

FROM python:3.10

RUN mkdir -p /bot
WORKDIR /bot

# We copy the requirements as a separate step
# to reduce rebuild times
COPY ./requirements.txt /bot/requirements.txt
RUN pip3 install -r requirements.txt

COPY . /bot

# Exec form, so signals such as SIGTERM reach Python directly
CMD ["python3", "main.py"]

Note: This is a simplistic example. For more fine-grained control you may consider setting environment variables such as ENV PIP_NO_CACHE_DIR=false or using a smaller Python docker image.
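
As one sketch of that, a leaner variant of the same Dockerfile might look like the following, assuming your requirements need no system-level build dependencies:

FROM python:3.10-slim

ENV PIP_NO_CACHE_DIR=false
RUN mkdir -p /bot
WORKDIR /bot

COPY ./requirements.txt /bot/requirements.txt
RUN pip3 install -r requirements.txt

COPY . /bot

CMD ["python3", "main.py"]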

docker-compose

This file should be called docker-compose.yml.

version: '2'
services:
    dave_cluster_1:
        container_name: 'dave_cluster_1'
        image: docker.pkg.github.com/skelmis/dave/dave
        environment:
            CLUSTER: 1
            IS_PRODUCTION: 1
            TOKEN: ...
    dave_cluster_2:
        container_name: 'dave_cluster_2'
        environment:
            CLUSTER: 2
        extends:
            service: dave_cluster_1

Explanation:

  • CLUSTER is a number representing which cluster this container is. This starts from 1 in our code.
  • IS_PRODUCTION should be present in any production instances to tell the bot to launch the required clusters.
  • TOKEN should be your bot token.

When using the extends directive, our new service (dave_cluster_2 in this case) inherits from dave_cluster_1. This reduces duplicated environment variables, as the only difference between clusters is the id provided by the CLUSTER environment variable.

For further security with your TOKEN and other environment variables, look into using a docker-compose.override.yml file to avoid accidentally pushing these variables to your source control. Alternatively, the usage of proper secrets managers should be considered.
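
A minimal sketch of such an override (docker-compose automatically merges docker-compose.override.yml over docker-compose.yml when you run docker-compose up):

version: '2'
services:
    dave_cluster_1:
        environment:
            TOKEN: your-real-token-here

Remove TOKEN from the main docker-compose.yml and add the override file to .gitignore. Depending on how your compose version resolves extends, you may need to repeat the variable for dave_cluster_2 as well.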

GitHub Action

This file should be placed in the directory .github/workflows and can be called anything. For this tutorial we will call it publish.yml.

name: Publish Docker Image
on:
    push:
        branches: [master]
jobs:
    push_to_registry:
        name: Push Docker image to GitHub Packages
        runs-on: ubuntu-latest
        steps:
            - name: Check out the repo
              uses: actions/checkout@v2
              
            - name: Push to GitHub Packages
              uses: docker/build-push-action@v1
              with:
                  username: ${{ github.actor }}
                  password: ${{ github.token }}
                  registry: docker.pkg.github.com
                  repository: <GitHub Username>/<Repo>/<Name>
                  tags: latest
                  build_args: BRANCH=${{ github.ref }},COMMIT=${{ github.sha }}

You should replace the following with their lowercase version:

  • <GitHub Username> with your username.
  • <Repo> with your bot’s repository.
  • <Name> with a name of your choice; I like to just re-use <Repo>.

For example, if my bot was in a repository called Dave under the GitHub account Skelmis, then it would look like this: repository: skelmis/dave/dave

You may also like to tag your image; for the purposes of this guide we will use latest.

Note: Make sure the branch in branches: [ master ] is also correct for your repository.

Bot Files

This bot file is extremely simple; however, the concepts and classes used are transferable to your own bots. It should be called main.py.

import os
import asyncio
import logging

import disnake
from disnake.ext import commands

log = logging.getLogger(__name__)
logging.basicConfig(level=logging.INFO)

async def main():
    # Values set here clearly indicate we are not in production
    cluster_id = 0
    total_shards = 1
    cluster_kwargs = {}
    if os.environ.get("IS_PRODUCTION"):
        total_shards = 16
        cluster_id = int(os.environ["CLUSTER"])
        offset = cluster_id - 1  # As we start at 1
        number_of_shards_per_cluster = 10
        # Calculate the shard ids this cluster should handle.
        # For example, on cluster 1 this would be equal to
        # [0, 1, 2, 3, 4, 5, 6, 7, 8, 9] and on cluster 2 to [10, 11, 12, 13, 14, 15]
        shard_ids = [
            i
            for i in range(
                offset * number_of_shards_per_cluster,
                (offset * number_of_shards_per_cluster) + number_of_shards_per_cluster,
            )
            if i < total_shards
        ]
        cluster_kwargs = {
            "shard_ids": shard_ids,
            "shard_count": total_shards,
        }
        
    bot = commands.AutoShardedInteractionBot(intents=disnake.Intents.default(), **cluster_kwargs)
    bot.cluster_id = cluster_id
    bot.total_shards = total_shards
    
    @bot.event
    async def on_ready():
        log.info("Cluster %s is now ready.", bot.cluster_id)
        
    @bot.slash_command()
    async def ping(interaction: disnake.CommandInteraction):
        """Pong!"""
        await interaction.send("Pong!")
        
    await bot.start(os.environ["TOKEN"])

if __name__ == "__main__":
    asyncio.run(main())

Note: It is not included here, however, a file called requirements.txt should exist with the requirements relevant to running your bot.
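
For the code above, a minimal requirements.txt could be as small as this (pin whichever version you actually develop against):

# The only hard requirement for main.py in this guide
disnake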

Publishing your Docker image Link to heading

Assuming you have followed the guide correctly up until this point, you should now have everything you need to start deploying your Docker image and by extension your bot.

Every time you push code to your GitHub repo, the GitHub Action we defined earlier should run and publish a new version of your bot’s Docker image for the world to see. It’s as simple as that.

Deploying your bot Link to heading

To complete this step, at least one successful run of your GitHub Action must have occurred.

On your server, simply copy over the docker-compose.yml file from earlier. If you are unsure how to do this, I recommend a tool such as FileZilla.

Note: All these commands should be run in the same directory as your docker-compose.yml file.

Pulling the latest Docker image

In order to pull the latest Docker image from your Docker container registry, simply run the following.

docker-compose pull
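
Note that docker.pkg.github.com requires authentication even for public images, so if the pull fails you will likely need to log in first with a personal access token that has the read:packages scope:

docker login docker.pkg.github.com -u <GitHub Username>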

Starting your bot

For when you wish to run the bot and view its logs in the current terminal instance:

docker-compose up

Note: Your bot will stop when you close this terminal.

For more general purpose usage I recommend:

docker-compose up -d

This command will start all of your clusters in “detached mode”, which essentially means in the background. You can safely exit your current terminal and the bot will continue running.

Inspecting your bots logs

If you’re running in detached mode, you will not see your bot’s output. While not always needed, there are still times you will want to see it.

docker-compose logs

Will allow you to view your bot’s logs up until the time you ran the command.

If you wish to have a “live view” of your logs, try the following command.

docker-compose logs -f
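
Both log commands also accept a service name if you only care about a single cluster, for example:

docker-compose logs -f dave_cluster_1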

Shutting down your bot

More advanced bots may feature commands to shut down individual clusters, etc, however that is outside the scope of this guide.

docker-compose down

Will shut down all clusters. If you wish to handle this shutdown gracefully, you will likely need to do so in your code, as this simply shuts the Docker containers themselves down.
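
As a starting point, here is a sketch of what graceful shutdown could look like in main.py. It assumes the exec-form CMD from the Dockerfile above (so the signal reaches Python rather than a wrapping shell); bot.close() is the library’s real shutdown coroutine, and asyncio signal handlers are Unix-only, which is fine inside these Linux containers:

import signal

# Inside main(), before `await bot.start(...)`
loop = asyncio.get_running_loop()
for sig in (signal.SIGTERM, signal.SIGINT):
    # docker-compose down sends SIGTERM, then SIGKILL after a timeout,
    # so finish any cleanup quickly and let the gateway connection close
    loop.add_signal_handler(sig, lambda: asyncio.ensure_future(bot.close()))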