Applies to: Mattermost Server v10.11 and later
Symptoms: Database connections are exhausted under load, causing service degradation or outages.
🛑 Problem
In deployments with multiple data sources configured (DataSource, DataSourceReplicas, DataSourceSearchReplicas) and especially in HA/Cluster deployments, the total number of database connections Mattermost can open grows much faster than MaxOpenConns alone suggests - and can silently exceed what the database is configured to accept.
Mattermost opens one independent connection pool per configured data source on each app node, applying the full MaxOpenConns value to each pool. In a single-node deployment the total is multiplied by the number of data sources; in an HA cluster it is multiplied by both data sources and nodes:
total connections = MaxOpenConns × data sources per node × app nodes
With the v10 default of MaxOpenConns=300 and three data sources configured (DataSource, DataSourceReplicas, DataSourceSearchReplicas), the worst case is:
300 × 3 × 1 = 900 possible connections (single node)
300 × 3 × 3 = 2700 possible connections (3-node HA cluster)
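The arithmetic above can be reproduced with a small helper. This is a minimal sketch that only restates the article's formula; the function name `total_connections` is hypothetical and not part of Mattermost.

```python
def total_connections(max_open_conns: int, data_sources_per_node: int, app_nodes: int) -> int:
    """Upper bound on connections Mattermost can open, per the formula above."""
    if min(max_open_conns, data_sources_per_node, app_nodes) < 1:
        raise ValueError("all inputs must be positive integers")
    return max_open_conns * data_sources_per_node * app_nodes


print(total_connections(300, 3, 1))  # 900  (single node)
print(total_connections(300, 3, 3))  # 2700 (3-node HA cluster)
```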
If the database max_connections (or pooler backend cap) is lower than this total, connections get rejected under load. The more nodes and data sources, the larger the gap between what Mattermost can open and what the database can accept - and the harder that gap is to survive during reconnect or other load-heavy events.
This applies equally to PostgreSQL (the supported and recommended database) and MySQL (deprecated as of Mattermost v11). The connection pool mechanics are backend-agnostic.
Symptoms
Administrators experiencing this issue will see:
# PostgreSQL
FATAL: sorry, too many clients already
# Via a connection pooler (e.g. ProxySQL)
1040: Too many connections
Max connect timeout reached while reaching hostgroup ... after 15000ms
Additional symptoms:
- HTTP 500 errors on client-bootstrap endpoints (/api/v4/users/me, /api/v4/channels) across all app nodes
- Database CPU spikes to saturation during the reconnect window
- Connection pooler logs showing backends shunned and frontend connections queuing or rejected
- Service degradation that outlasts the triggering event, sustained by connection retry loops
✅ Solution
The v11 defaults (MaxOpenConns=100, MaxIdleConns=50) are a reasonable starting point for most deployments, but the right values depend on your infrastructure and workload. Some deployments need to raise MaxOpenConns to handle higher concurrency/heavier workloads - and must raise the database max_connections (and pooler caps) accordingly. Others may want to lower it below 100, for example to allow downsizing the database. Either direction is valid. The requirement in both cases is that every layer - Mattermost, database, pooler - is sized consistently against the same total connection count.
Step 1 - Calculate your total open connections
Use the formula to understand what your current configuration allows and what you need:
total connections = MaxOpenConns × data sources per node × app nodes
Example with v11 defaults on a 3-node HA cluster with a read replica (two data sources per node):
100 × 2 × 3 = 600 possible connections
Start from what your database can handle given its memory and max_connections, and set MaxOpenConns to fit - or determine what your workload needs from Mattermost and size the database to match.
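When sizing from the database side, the formula can be inverted to find the largest MaxOpenConns that still fits. The sketch below assumes the total must stay strictly below max_connections; the function name and the `reserved` parameter (connections set aside for replication, monitoring, and admin sessions) are hypothetical and used for illustration only.

```python
def max_open_conns_that_fits(db_max_connections: int, data_sources_per_node: int,
                             app_nodes: int, reserved: int = 0) -> int:
    """Largest MaxOpenConns whose worst-case total stays below db_max_connections."""
    pools = data_sources_per_node * app_nodes
    if pools < 1:
        raise ValueError("need at least one data source and one app node")
    # Subtract 1 so the total is strictly below the database cap.
    budget = db_max_connections - reserved - 1
    return max(budget // pools, 0)


# 3 app nodes, primary plus one read replica, 20 connections reserved for non-app use.
print(max_open_conns_that_fits(700, 2, 3, reserved=20))  # 113
```

A result of 0 means the database cap cannot accommodate even one connection per pool after the reserved headroom, so the database side has to be raised first.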
Step 2 - Update SqlSettings in config.json on all app nodes
Set MaxOpenConns to your calculated value and keep MaxIdleConns at half that to maintain the recommended 2:1 ratio:
"SqlSettings": {
"MaxOpenConns": 100,
"MaxIdleConns": 50
}
⚠️ Important: Connection pools are built at server startup. The new values take effect only after a server restart. In HA deployments, perform a rolling restart - bring down and restart one node at a time and confirm it rejoins the cluster before proceeding to the next.
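Before restarting, it can help to read the values back from config.json and confirm the 2:1 ratio. The following is a rough sketch, not a Mattermost tool: the function name, the sample hostnames, and the credentials are hypothetical, and it assumes one pool for DataSource plus one pool per entry in DataSourceReplicas and DataSourceSearchReplicas.

```python
import json


def summarize_sql_settings(config_text: str) -> dict:
    """Extract pool-related values from the text of a config.json file."""
    sql = json.loads(config_text)["SqlSettings"]
    # Assumption: one pool for DataSource, plus one per listed replica entry.
    pools = (1 if sql.get("DataSource") else 0) \
        + len(sql.get("DataSourceReplicas") or []) \
        + len(sql.get("DataSourceSearchReplicas") or [])
    max_open = sql["MaxOpenConns"]
    max_idle = sql["MaxIdleConns"]
    return {
        "max_open_conns": max_open,
        "max_idle_conns": max_idle,
        "pools_per_node": pools,
        "idle_ratio_ok": max_idle * 2 == max_open,  # recommended 2:1 ratio
    }


# Hypothetical sample configuration; hostnames and credentials are placeholders.
sample = """
{
  "SqlSettings": {
    "DataSource": "postgres://mmuser:secret@db-primary:5432/mattermost",
    "DataSourceReplicas": ["postgres://mmuser:secret@db-replica:5432/mattermost"],
    "DataSourceSearchReplicas": [],
    "MaxOpenConns": 100,
    "MaxIdleConns": 50
  }
}
"""
print(summarize_sql_settings(sample))
```

Running the same check against each node's config.json is one way to catch a node that was missed during the update.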
Step 3 - Verify the database max_connections is aligned
Confirm max_connections on the database is set above Mattermost's total open connections, with headroom for non-application connections (replication, monitoring exporters, admin sessions):
database max_connections > (MaxOpenConns × data sources per node × app nodes) + non-app connections
If your workload requires a higher MaxOpenConns, raise max_connections on the database to match - provided the database host has the memory to support it. On PostgreSQL, each connection is served by its own backend process, and each one can use up to work_mem per sort or hash operation on top of its per-session buffers; on a memory-constrained host, a high connection cap can lead to OOM kills. If the database cannot be raised further, lower MaxOpenConns until Mattermost's total fits within what the database can accept.
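The inequality in this step can be checked as a simple headroom calculation. This is an illustrative sketch with a hypothetical function name; a result of zero or below means the database cap does not satisfy the rule.

```python
def connection_headroom(db_max_connections: int, max_open_conns: int,
                        data_sources_per_node: int, app_nodes: int,
                        non_app_connections: int) -> int:
    """Connections left on the database after Mattermost's worst case plus non-app use."""
    required = max_open_conns * data_sources_per_node * app_nodes + non_app_connections
    return db_max_connections - required


# 3-node cluster, 3 data sources, MaxOpenConns=100, 20 non-app connections.
print(connection_headroom(1000, 100, 3, 3, 20))  # 80   -> rule satisfied
print(connection_headroom(500, 100, 3, 3, 20))   # -420 -> database cap too low
```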
Side note - connection poolers (PgBouncer, ProxySQL)
A connection pooler placed between Mattermost and the database multiplexes frontend connections (app → pooler) onto a smaller pool of backend connections (pooler → database). This allows the pooler to accept the full connection demand while holding far fewer real connections open against the database.
If you use a pooler, size its backend connection cap - the maximum connections the pooler is allowed to open per database node - below the database max_connections, and its frontend cap at or above Mattermost's total open connections:
pooler backend cap < database max_connections (e.g. 280 per backend node on a database with max_connections=300)
pooler frontend cap ≥ total open connections (e.g. 1000 for a 3-node HA setup at MaxOpenConns=100, with read and search replica)
Start with Mattermost. If MaxOpenConns is not set first, every downstream number (pooler frontend, pooler backend, database cap) is sized against the wrong total.
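The two pooler sizing rules above can be expressed as a short check. This is a sketch under the article's stated rules only; the function and parameter names are hypothetical and do not correspond to PgBouncer or ProxySQL settings.

```python
def check_pooler_sizing(backend_cap: int, frontend_cap: int,
                        db_max_connections: int, total_open_connections: int) -> list:
    """Return a list of sizing problems; an empty list means both rules hold."""
    problems = []
    if backend_cap >= db_max_connections:
        problems.append("backend cap must be below database max_connections")
    if frontend_cap < total_open_connections:
        problems.append("frontend cap must be at or above total open connections")
    return problems


# Values from the example: 3 nodes x 3 data sources x MaxOpenConns=100 = 900 total.
print(check_pooler_sizing(280, 1000, 300, 900))  # []
print(check_pooler_sizing(300, 800, 300, 900))   # both rules violated
```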