Deploy to Kubernetes

This guide describes how to deploy a websockets server to Kubernetes. It assumes familiarity with Docker and Kubernetes.

We’re going to deploy a simple app to a local Kubernetes cluster and make sure that it scales as expected.

In a more realistic context, you would follow your organization’s practices for deploying to Kubernetes, but you would apply the same principles as far as websockets is concerned.

Containerize application

Here’s the app we’re going to deploy. Save it in a file called app.py:

#!/usr/bin/env python

import asyncio
import http
import signal
import sys
import time

from websockets.asyncio.server import serve


async def slow_echo(websocket):
    async for message in websocket:
        # Block the event loop! This allows saturating a single asyncio
        # process without opening an impractical number of connections.
        time.sleep(0.1)  # 100ms
        await websocket.send(message)


def health_check(connection, request):
    # Serve plain HTTP responses for known paths; return None for any
    # other path so the WebSocket handshake proceeds normally.
    if request.path == "/healthz":
        return connection.respond(http.HTTPStatus.OK, "OK\n")
    if request.path == "/inemuri":
        # Block the event loop for 10 seconds, one second from now,
        # making the app unresponsive without terminating it.
        loop = asyncio.get_running_loop()
        loop.call_later(1, time.sleep, 10)
        return connection.respond(http.HTTPStatus.OK, "Sleeping for 10s\n")
    if request.path == "/seppuku":
        # Exit the process one second from now.
        loop = asyncio.get_running_loop()
        loop.call_later(1, sys.exit, 69)
        return connection.respond(http.HTTPStatus.OK, "Terminating\n")


async def main():
    # Set the stop condition when receiving SIGTERM.
    loop = asyncio.get_running_loop()
    stop = loop.create_future()
    loop.add_signal_handler(signal.SIGTERM, stop.set_result, None)

    async with serve(
        slow_echo,
        host="",
        port=80,
        process_request=health_check,
    ):
        await stop


if __name__ == "__main__":
    asyncio.run(main())

This is an echo server with one twist: every message blocks the server for 100ms, which creates artificial starvation of CPU time. This makes it easier to saturate the server for load testing.
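
To see why time.sleep() matters here: it blocks the event loop, so all connections handled by the process are serialized behind it, while asyncio.sleep() would yield and let other connections proceed. Here’s a standalone sketch, independent of the app, that contrasts the two:

#!/usr/bin/env python

import asyncio
import time


async def blocking():
    time.sleep(0.1)  # blocks the event loop; no other task can run


async def cooperative():
    await asyncio.sleep(0.1)  # yields; other tasks run in the meantime


async def main():
    start = time.monotonic()
    await asyncio.gather(*(blocking() for _ in range(10)))
    print(f"blocking: {time.monotonic() - start:.2f}s")  # about 1s

    start = time.monotonic()
    await asyncio.gather(*(cooperative() for _ in range(10)))
    print(f"cooperative: {time.monotonic() - start:.2f}s")  # about 0.1s


if __name__ == "__main__":
    asyncio.run(main())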

The app exposes a health check on /healthz. It also provides two other endpoints for testing purposes: /inemuri will make the app unresponsive for 10 seconds and /seppuku will terminate it.

The quest for the perfect Python container image is beyond the scope of this guide, so we’ll go for the simplest possible configuration instead. (In a real build, you’d at least pin the websockets version; app.py needs websockets 13.0 or later, which is when websockets.asyncio was introduced.)

FROM python:3.9-alpine

RUN pip install websockets

COPY app.py .

CMD ["python", "app.py"]

After saving this Dockerfile, build the image:

$ docker build -t websockets-test:1.0 .

Test your image by running:

$ docker run --name run-websockets-test --publish 32080:80 --rm \
    websockets-test:1.0

Then, in another shell, in a virtualenv where websockets is installed, connect to the app and check that it echoes anything you send:

$ python -m websockets ws://localhost:32080/
Connected to ws://localhost:32080/.
> Hey there!
< Hey there!
>
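
While the container is running, you can also exercise the health check that the liveness probe will rely on later:

$ curl http://localhost:32080/healthz
OK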

Now, in yet another shell, stop the app with:

$ docker kill -s TERM run-websockets-test

Going to the shell where you connected to the app, you can confirm that it shut down gracefully:

$ python -m websockets ws://localhost:32080/
Connected to ws://localhost:32080/.
> Hey there!
< Hey there!
Connection closed: 1001 (going away).

If it didn’t, you’d get code 1006 (abnormal closure).
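
If you’d rather check the close code programmatically than read the interactive client’s output, a small script along these lines works; wait_closed() and close_code are part of the websockets asyncio API, and the port matches the docker run command above:

#!/usr/bin/env python

import asyncio

from websockets.asyncio.client import connect


async def main():
    async with connect("ws://localhost:32080/") as websocket:
        await websocket.send("Hey there!")
        print(await websocket.recv())
        # Run `docker kill -s TERM run-websockets-test` in another shell,
        # then wait for the server to close the connection.
        await websocket.wait_closed()
    # 1001 (going away) means the shutdown was graceful;
    # 1006 (abnormal closure) means it wasn't.
    print(websocket.close_code)


if __name__ == "__main__":
    asyncio.run(main())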

Deploy application

Configuring Kubernetes is even further beyond the scope of this guide, so we’ll use a basic configuration for testing, with just one Service and one Deployment:

apiVersion: v1
kind: Service
metadata:
  name: websockets-test
spec:
  type: NodePort
  ports:
    - port: 80
      nodePort: 32080
  selector:
    app: websockets-test
---
apiVersion: apps/v1
kind: Deployment
metadata:
  name: websockets-test
spec:
  selector:
    matchLabels:
      app: websockets-test
  template:
    metadata:
      labels:
        app: websockets-test
    spec:
      containers:
      - name: websockets-test
        image: websockets-test:1.0
        livenessProbe:
          httpGet:
            path: /healthz
            port: 80
          periodSeconds: 1
        ports:
        - containerPort: 80

For local testing, a service of type NodePort is good enough. For deploying to production, you would configure an Ingress.

After saving this to a file called deployment.yaml, you can deploy:

$ kubectl apply -f deployment.yaml
service/websockets-test created
deployment.apps/websockets-test created

Now you have a deployment with one pod running:

$ kubectl get deployment websockets-test
NAME              READY   UP-TO-DATE   AVAILABLE   AGE
websockets-test   1/1     1            1           10s
$ kubectl get pods -l app=websockets-test
NAME                               READY   STATUS    RESTARTS   AGE
websockets-test-86b48f4bb7-nltfh   1/1     Running   0          10s

You can connect to the service — press Ctrl-D to exit:

$ python -m websockets ws://localhost:32080/
Connected to ws://localhost:32080/.
Connection closed: 1000 (OK).

Validate deployment

First, let’s ensure the liveness probe works by making the app unresponsive:

$ curl http://localhost:32080/inemuri
Sleeping for 10s

Since we have only one pod, we know that this pod will go to sleep.

The liveness probe is configured to run every second. By default, liveness probes time out after one second and have a failure threshold of three. Therefore Kubernetes should restart the pod within about five seconds of it becoming unresponsive: up to one second until the next probe fires, then three failing probes spaced one second apart.

Indeed, after a few seconds, the pod reports a restart:

$ kubectl get pods -l app=websockets-test
NAME                               READY   STATUS    RESTARTS   AGE
websockets-test-86b48f4bb7-nltfh   1/1     Running   1          42s

Next, let’s take it one step further and crash the app:

$ curl http://localhost:32080/seppuku
Terminating

The pod reports a second restart:

$ kubectl get pods -l app=websockets-test
NAME                               READY   STATUS    RESTARTS   AGE
websockets-test-86b48f4bb7-nltfh   1/1     Running   2          72s

All good — Kubernetes delivers on its promise to keep our app alive!

Scale deployment

Of course, Kubernetes is for scaling. Let’s scale — modestly — to 10 pods:

$ kubectl scale deployment.apps/websockets-test --replicas=10
deployment.apps/websockets-test scaled

After a few seconds, we have 10 pods running:

$ kubectl get deployment websockets-test
NAME              READY   UP-TO-DATE   AVAILABLE   AGE
websockets-test   10/10   10           10          10m

Now let’s generate load. We’ll use this script:

#!/usr/bin/env python

import asyncio
import sys

from websockets.asyncio.client import connect


URI = "ws://localhost:32080"


async def run(client_id, messages):
    # Each client sends its messages in sequence, waiting for the
    # echo before sending the next one.
    async with connect(URI) as websocket:
        for message_id in range(messages):
            await websocket.send(f"{client_id}:{message_id}")
            await websocket.recv()


async def benchmark(clients, messages):
    # Run all clients concurrently and wait until the last one finishes.
    # asyncio.wait() doesn't propagate task exceptions, which is
    # acceptable for a quick benchmark.
    await asyncio.wait([
        asyncio.create_task(run(client_id, messages))
        for client_id in range(clients)
    ])


if __name__ == "__main__":
    clients, messages = int(sys.argv[1]), int(sys.argv[2])
    asyncio.run(benchmark(clients, messages))

We’ll connect 500 clients in parallel, meaning 50 clients per pod, and have each client send 6 messages. Since the app blocks for 100ms before responding, if connections are perfectly distributed, we expect a total run time slightly over 50 * 6 * 0.1 = 30 seconds.

Let’s try it. The ulimit -n 512 command adjusts the limit on open file descriptors, which must exceed the 500 simultaneous connections:

$ ulimit -n 512
$ time python benchmark.py 500 6
python benchmark.py 500 6  2.40s user 0.51s system 7% cpu 36.471 total

A total runtime of 36 seconds is in the right ballpark. Repeating this experiment with other parameters shows roughly consistent results, with the high variability you’d expect from a quick benchmark without any effort to stabilize the test setup.
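
For instance, python benchmark.py 250 6 halves the number of clients per pod; if the load spreads evenly, the expected floor drops to 25 * 6 * 0.1 = 15 seconds.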

Finally, we can scale back to one pod:

$ kubectl scale deployment.apps/websockets-test --replicas=1
deployment.apps/websockets-test scaled
$ kubectl get deployment websockets-test
NAME              READY   UP-TO-DATE   AVAILABLE   AGE
websockets-test   1/1     1            1           15m