Application server

The author of websockets isn’t aware of best practices for deploying network services based on asyncio, let alone application servers.

You can run a script similar to the server example inside a supervisor if you deem that useful.

You can also add a wrapper to daemonize the process. Third-party libraries provide solutions for that.

If you can share knowledge on this topic, please file an issue. Thanks!

Graceful shutdown

You may want to close connections gracefully when shutting down the server, perhaps after executing some cleanup logic. There are two ways to achieve this with the object returned by serve():

  • using it as an asynchronous context manager, or

  • calling its close() method, then waiting for its wait_closed() method to complete.
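The second option can be sketched as follows. This is a minimal illustration, not a pattern taken from the library: serve_until and start_server are hypothetical names, and start_server is assumed to be any zero-argument callable whose awaited result exposes close() and wait_closed(), for instance lambda: websockets.serve(handler, "localhost", 8765).

```python
import asyncio


async def serve_until(start_server, stop):
    """Run a server until `stop` completes, then shut it down explicitly.

    `start_server` is a hypothetical zero-argument callable returning an
    awaitable that yields an object with close() and wait_closed().
    """
    server = await start_server()
    try:
        # Block here until the stop condition is set.
        await stop
        # Application-specific cleanup logic could run at this point.
    finally:
        server.close()              # start shutting the server down
        await server.wait_closed()  # wait until shutdown has completed
    return server
```

Compared with the context manager form, this spelling leaves room for cleanup code between the stop condition and the actual shutdown.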

On Unix systems, shutdown is usually triggered by sending a signal.

Here’s a full example for handling SIGTERM on Unix:

#!/usr/bin/env python

import asyncio
import signal
import websockets

async def echo(websocket, path):
    async for message in websocket:
        await websocket.send(message)

async def echo_server(stop):
    async with websockets.serve(echo, "localhost", 8765):
        await stop

loop = asyncio.get_event_loop()

# The stop condition is set when receiving SIGTERM.
stop = loop.create_future()
loop.add_signal_handler(signal.SIGTERM, stop.set_result, None)

# Run the server until the stop condition is met.
loop.run_until_complete(echo_server(stop))

This example is easily adapted to handle other signals. If you override the default handler for SIGINT, which raises KeyboardInterrupt, be aware that you won’t be able to interrupt a program with Ctrl-C anymore when it’s stuck in a loop.
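One possible adaptation to several signals is sketched below. The helper names are hypothetical. The callback checks whether the future is already done because calling set_result() twice on the same future raises InvalidStateError, which could happen if two signals arrive in a row.

```python
import asyncio
import signal


def make_stop_callback(stop):
    """Return a callback that resolves `stop` once and ignores repeats."""

    def request_stop():
        if not stop.done():
            stop.set_result(None)

    return request_stop


def install_stop_handlers(loop, stop, *signals):
    """Resolve `stop` when any of `signals` arrives (Unix, main thread only)."""
    callback = make_stop_callback(stop)
    for sig in signals:
        loop.add_signal_handler(sig, callback)
```

In the example above, the single add_signal_handler() call could then become install_stop_handlers(loop, stop, signal.SIGTERM, signal.SIGHUP).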

It’s more difficult to achieve the same effect on Windows. Some third-party projects try to help with this problem.

If your server doesn’t run in the main thread, look at call_soon_threadsafe().
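As an illustration, the sketch below runs an event loop in a worker thread and stops it from another thread. BackgroundServer is a hypothetical class, and _serve() is a stand-in for a coroutine such as echo_server(stop); only the hand-off through call_soon_threadsafe() is the point here. Futures are not thread-safe, so the stop condition is set from inside the loop's own thread.

```python
import asyncio
import threading


class BackgroundServer:
    """Run a coroutine in a worker thread until stop() is called."""

    def __init__(self):
        self._loop = asyncio.new_event_loop()
        self._stop = self._loop.create_future()
        self._thread = threading.Thread(target=self._run, daemon=True)
        self.stopped_cleanly = False

    async def _serve(self):
        # Stand-in for: async with websockets.serve(...): await stop
        await self._stop
        self.stopped_cleanly = True

    def _run(self):
        asyncio.set_event_loop(self._loop)
        try:
            self._loop.run_until_complete(self._serve())
        finally:
            self._loop.close()

    def _set_stop(self):
        # Runs inside the worker thread's event loop.
        if not self._stop.done():
            self._stop.set_result(None)

    def start(self):
        self._thread.start()

    def stop(self, timeout=5):
        # Schedule the callback on the worker loop from the calling thread.
        self._loop.call_soon_threadsafe(self._set_stop)
        self._thread.join(timeout)
```

Calling start() and later stop() from the main thread shuts the worker down without touching the future directly from the wrong thread.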

Memory usage

In most cases, memory usage of a WebSocket server is proportional to the number of open connections. When a server handles thousands of connections, memory usage can become a bottleneck.

Memory usage of a single connection is the sum of:

  1. the baseline amount of memory websockets requires for each connection,

  2. the amount of data held in buffers before the application processes it,

  3. any additional memory allocated by the application itself.


Compression settings are the main factor affecting the baseline amount of memory used by each connection.

By default websockets maximizes compression rate at the expense of memory usage. If memory usage is an issue, lowering compression settings can help:

  • Context Takeover is necessary to get good performance for almost all applications. It should remain enabled.

  • Window Bits is a trade-off between memory usage and compression rate. It defaults to 15 and can be lowered. The default value isn’t optimal for small, repetitive messages which are typical of WebSocket servers.

  • Memory Level is a trade-off between memory usage and compression speed. It defaults to 8 and can be lowered. A lower memory level can actually increase speed thanks to memory locality, even if the CPU does more work!

See this example for how to configure compression settings.

Here’s how various compression settings affect memory usage of a single connection on a 64-bit system, as well as a benchmark of compressed size and compression time for a corpus of small JSON documents.


Window Bits   Memory Level   Memory usage   Size vs. default   Time vs. default
                             325 KiB
                             181 KiB
                             110 KiB
                             73 KiB
                             55 KiB
                             22 KiB

Don’t assume this example is representative! Compressed size and compression time depend heavily on the kind of messages exchanged by the application!

You can run the same benchmark for your application by creating a list of typical messages and passing it to the _benchmark function.
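A comparison along these lines can be approximated with the standard library alone. The sketch below is not the library's _benchmark function. It is a stand-in which assumes that per-message compression behaves like a single raw zlib stream shared across messages (context takeover enabled) and flushed after each message. The function names and the sample corpus are hypothetical.

```python
import json
import time
import zlib


def benchmark(messages, window_bits, mem_level):
    """Return (compressed_bytes, seconds) for a list of bytes messages."""
    # Negative wbits selects a raw deflate stream without zlib headers.
    encoder = zlib.compressobj(wbits=-window_bits, memLevel=mem_level)
    compressed = 0
    start = time.perf_counter()
    for message in messages:
        data = encoder.compress(message) + encoder.flush(zlib.Z_SYNC_FLUSH)
        # A sync flush ends with 00 00 ff ff, which permessage-deflate omits.
        compressed += len(data) - 4
    return compressed, time.perf_counter() - start


# Hypothetical corpus; replace it with messages typical of your application.
MESSAGES = [
    json.dumps({"id": i, "type": "update", "value": i % 7}).encode()
    for i in range(1000)
]


def compare(messages=MESSAGES, settings=((15, 8), (13, 6), (11, 4))):
    """Return one (window_bits, mem_level, size, seconds) row per setting."""
    rows = []
    for window_bits, mem_level in settings:
        size, seconds = benchmark(messages, window_bits, mem_level)
        rows.append((window_bits, mem_level, size, seconds))
    return rows
```

Timings from a run like this are noisy, so repeating the measurement several times and comparing medians is advisable before drawing conclusions.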

This blog post by Ilya Grigorik provides more details about how compression settings affect memory usage and how to optimize them.

This experiment by Peter Thorson suggests Window Bits = 11, Memory Level = 4 as a sweet spot for optimizing memory usage.
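Applying these values might look like the sketch below. It relies on ServerPerMessageDeflateFactory from websockets.extensions.permessage_deflate and on the extensions argument of serve(); the helper name and the module-level dictionary are hypothetical, and the import is deferred only so that the snippet loads where websockets isn't installed.

```python
# Values suggested by the experiment cited above.
DEFLATE_SETTINGS = {
    "server_max_window_bits": 11,
    "client_max_window_bits": 11,
    "compress_settings": {"memLevel": 4},
}


def low_memory_extensions():
    """Build a value for the `extensions` argument of serve()."""
    # Imported lazily so this module loads even where websockets is absent.
    from websockets.extensions import permessage_deflate

    return [permessage_deflate.ServerPerMessageDeflateFactory(**DEFLATE_SETTINGS)]
```

The result would be passed as websockets.serve(handler, "localhost", 8765, extensions=low_memory_extensions()), where handler is your connection handler.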


Under normal circumstances, buffers are almost always empty.

Under high load, if a server receives more messages than it can process, bufferbloat can result in excessive memory use.

By default websockets has generous limits. It is strongly recommended to adapt them to your application. When you call serve():

  • Set max_size (default: 1 MiB, UTF-8 encoded) to the maximum size of messages your application generates.

  • Set max_queue (default: 32) to the maximum number of messages your application expects to receive faster than it can process them. The queue provides burst tolerance without slowing down the TCP connection.

Furthermore, you can lower read_limit and write_limit (default: 64 KiB) to reduce the size of buffers for incoming and outgoing data.
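To get a feel for how max_size and max_queue interact, a back-of-the-envelope calculation may help. It is a rough upper bound, not a measurement: it assumes that every slot in the queue holds a text message of max_size bytes and that Python stores each character in up to 4 bytes. The function name is hypothetical.

```python
KIB = 1024
MIB = 1024 * KIB


def queued_messages_upper_bound(max_size=1 * MIB, max_queue=32, bytes_per_char=4):
    """Estimate the worst-case bytes held by queued incoming messages."""
    return bytes_per_char * max_size * max_queue


# With the defaults, one connection may hold up to 128 MiB of queued messages.
default_bound = queued_messages_upper_bound()

# With max_size=64 KiB and max_queue=8, the bound drops to 2 MiB.
tight_bound = queued_messages_upper_bound(max_size=64 * KIB, max_queue=8)
```

The tighter values in this sketch would be passed as serve(..., max_size=64 * KIB, max_queue=8); pick numbers that match the messages your application really exchanges.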

The design document provides more details about buffers.

Port sharing

The WebSocket protocol is an extension of HTTP/1.1. It can be tempting to serve both HTTP and WebSocket on the same port.

The author of websockets doesn’t think that’s a good idea, due to the widely different operational characteristics of HTTP and WebSocket.

websockets provides minimal support for responding to HTTP requests with the process_request() hook. Typical use cases include health checks. Here’s an example:

#!/usr/bin/env python

# WS echo server with HTTP endpoint at /health/

import asyncio
import http
import websockets

async def health_check(path, request_headers):
    if path == "/health/":
        return http.HTTPStatus.OK, [], b"OK\n"

async def echo(websocket, path):
    async for message in websocket:
        await websocket.send(message)

start_server = websockets.serve(
    echo, "localhost", 8765, process_request=health_check