Mattermost's new Collapsed Reply Threads (Beta) feature will increase the resource utilization of your application servers and database. Please consider the following guide when upgrading to Mattermost release v5.37 and later, or if you’re planning to enable Collapsed Reply Threads while it’s in beta.
What are Collapsed Reply Threads?
Collapsed Reply Threads (Beta) offers an enhanced experience for users communicating in threads and replying to messages. Our goal is to improve users’ ability to process channel content, find, follow, and resume conversations more easily, and keep threaded conversations focused.
Should I enable Collapsed Reply Threads while it is in beta?
Collapsed Reply Threads are available in beta in Mattermost Cloud and Mattermost Server v5.37 and later. As we work toward promoting the feature to general availability, you may encounter bugs and degraded server performance while we stabilize it.
In particular, you should expect increased utilization of your server and database resources. We do not recommend enabling Collapsed Reply Threads while it’s in beta if:
1. You do not have the infrastructure to monitor server performance, such as Prometheus and Grafana.
2. You cannot readily scale up your database size: Deployments with many users or large posts tables (10M+ rows) should be ready to roughly double their database size if needed in response to increased load.
3. You are running the Mattermost application server and database server on the same machine. We highly recommend using two separate machines whether or not you plan to enable Collapsed Reply Threads.
4. Your organization is sensitive to periods of downtime for server maintenance or upgrades that may be required during troubleshooting (see below).
If any of the above statements apply to your organization, we recommend waiting to enable Collapsed Reply Threads until it is promoted to general availability in Q1 2022.
What are the symptoms of increased server load on the system?
Affected deployments may report increased database CPU utilization and row updates. For end-users, this may result in:
- Timeouts: These manifest as failures to load channels or post messages, leaving users with infinite loading indicators. Similarly, mobile devices may have trouble connecting to the server, showing “no connection” banners or frequently disconnecting.
- Slow API responses: These manifest as slowness in critical workflows such as scrolling to load more posts, navigating between channels, and posting messages.
What can I do to troubleshoot if my deployment is experiencing heavily increased resource utilization?
The following troubleshooting steps, taken in order, can help remedy abnormal increases in resource utilization:
- Upgrade to v5.37.6 (ESR), v5.39.3, v6.0.4, v6.1.1 or later. These patch releases address server performance issues (unrelated to Collapsed Reply Threads) observed in recent releases, specifically MM-40050 and MM-39433.
- As a general practice, we recommend running `vacuum analyze` (Postgres) on the tables in your database after every Mattermost upgrade. This operation must be done for the full database, where the `Threadmemberships` tables are particularly important for Collapsed Reply Threads functionality. Please be aware that this is a resource-intensive operation, especially on large tables such as the `Posts` table with full-text indexes. This operation should be done with some consideration during off hours as the operation increases utilization while in progress and may impact live system performance. Please see related documentation for Postgres and MySQL.
- Ensure you are running the Mattermost application server and database server on separate machines. See our documentation for more information on architecture and scaling.
- Scale up your database server. Some large instances have been successful after doubling their database resources.
- After scaling up the database server, you'll also need to tune the database configuration. Otherwise, the additional database resources won't have the desired effect.
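The post-upgrade maintenance step above can be scripted. The following is a minimal sketch in Python that only builds the Postgres statements, so they can be reviewed first and then passed to a client such as `psql` during off hours. The helper name `build_vacuum_statements` and the lowercase table names in the example are assumptions for illustration, so check the actual table names in your own schema before running anything.

```python
import re

# Hypothetical helper, not part of Mattermost: it only builds SQL text.
_IDENTIFIER = re.compile(r"^[A-Za-z_][A-Za-z0-9_]*$")


def build_vacuum_statements(tables=None):
    """Return a list of Postgres VACUUM ANALYZE statements.

    With no tables given, a single database-wide statement is returned,
    matching the guidance to run the operation for the full database.
    """
    if not tables:
        return ["VACUUM ANALYZE;"]
    statements = []
    for table in tables:
        # Reject anything that is not a plain identifier so the helper
        # never emits malformed or unsafe SQL.
        if not _IDENTIFIER.match(table):
            raise ValueError(f"not a plain table name: {table!r}")
        statements.append(f"VACUUM ANALYZE {table};")
    return statements


if __name__ == "__main__":
    # Table names below are assumptions for illustration only.
    for statement in build_vacuum_statements(["threadmemberships", "posts"]):
        print(statement)
```

Building the statements separately from executing them makes it easier to review what will run and to schedule the operation for a quiet period.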
Given the complexity of Collapsed Reply Threads, increased resource utilization after enabling the feature is normal. However, if your database resource utilization is still unacceptably high after attempting the above steps, consider disabling Collapsed Reply Threads:
- First, attempt to disable `CollapsedThreads` in the System Console or config file. Disabling this setting turns off the feature for all end-users.
- If necessary, you may also attempt to disable `ThreadAutoFollow` in your config file (this setting is not available in the System Console). This setting is responsible for updating the `Threadmemberships` tables to store followed threads for each user and the read or unread state of each followed thread. Disabling this setting means that users may lose track of their conversations if Collapsed Reply Threads are ever re-enabled, since this setting doesn't retroactively track threads users participated in while this setting is disabled. We only recommend disabling this setting in extreme circumstances where database performance is unmanageable, or if you never plan on using Collapsed Reply Threads in your instance.
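For administrators who manage the config file with scripts, the two changes above might look like the following minimal Python sketch. It operates on an already loaded config dictionary and returns a modified copy. The `ServiceSettings` section name, the `"disabled"` value, and the helper name `disable_collapsed_threads` are assumptions made for illustration; verify them against the configuration settings documentation for your server version.

```python
import copy
import json


def disable_collapsed_threads(config, disable_auto_follow=False):
    """Return a copy of a config dict with Collapsed Reply Threads turned off."""
    updated = copy.deepcopy(config)
    # Assumption: both settings live under the "ServiceSettings" section.
    service = updated.setdefault("ServiceSettings", {})
    # Assumption: "disabled" is the value that turns the feature off.
    service["CollapsedThreads"] = "disabled"
    if disable_auto_follow:
        # Only intended for extreme cases, per the guidance above.
        service["ThreadAutoFollow"] = False
    return updated


if __name__ == "__main__":
    # Example input values are illustrative, not taken from a real server.
    example = {
        "ServiceSettings": {
            "CollapsedThreads": "default_off",
            "ThreadAutoFollow": True,
        }
    }
    print(json.dumps(disable_collapsed_threads(example), indent=2))
```

Keeping `disable_auto_follow` off by default mirrors the recommendation to disable `CollapsedThreads` first and to touch `ThreadAutoFollow` only if necessary.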
What is Mattermost doing to address the known performance issues?
We have a number of work items in-progress that will help us move toward a general availability release of Collapsed Reply Threads in Q1 2022, including:
- Address known optimizations around database queries and code performance. To keep track of these improvements please follow the JIRA epic here.
- Update our load testing framework to better simulate heavy load due to threading. This will allow server administrators to simulate real-world usage of their deployment at scale with Collapsed Reply Threads enabled. Load testing prior to enabling the feature into production is highly recommended to identify if the current server hardware is sufficient to support the expected demand.
- To see a full list of known issues relating to Collapsed Reply Threads, please see our Kanban board here.
What does general availability of Collapsed Reply Threads mean?
General availability (GA) means we are confident in the performance and stability of the feature, and can recommend that all our customers enable it. GA will also bring more options for how Admins configure Collapsed Reply Threads on their instance. Specifically, Admins will be able to turn the feature on by default for all users. Please see this blog post for details: Looking ahead to general availability of Collapsed Reply Threads.
As we approach general availability, we will update our hardware recommendations to give Administrators guidance on appropriate resource sizing to support the feature at scale.