Incident Alerted

Incident Report for Olark Live Chat

Postmortem

Beginning Friday, May 9th, we detected elevated disconnects affecting our Classic (legacy) agent console (chat.olark.com). Agents reported being unexpectedly disconnected from the console while logged in; they could reconnect by refreshing the page. These disconnections only affected our Classic app; our modern app was not affected by this issue.

After initial investigations, our engineering team decided to improve disconnection retry logic in the Classic agent console; this unfortunately exacerbated the problem for those agents and we rolled back those changes.
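For readers unfamiliar with the term, disconnection retry logic usually means the client waits a bit longer after each failed reconnect attempt and randomizes the wait so that many clients do not retry at the same moment. The sketch below is a generic illustration of that pattern (exponential backoff with full jitter), not the code that was deployed or rolled back; the function name `backoff_delays` and all parameter values are hypothetical.

```python
import random


def backoff_delays(max_attempts=6, base=1.0, cap=30.0, rng=None):
    """Yield one randomized wait (in seconds) per reconnect attempt.

    Hypothetical helper: the ceiling doubles on each attempt up to `cap`,
    and the actual wait is drawn uniformly from [0, ceiling] so that
    clients disconnected at the same time spread out their retries.
    """
    rng = rng or random.Random()
    for attempt in range(max_attempts):
        ceiling = min(cap, base * (2 ** attempt))
        yield rng.uniform(0.0, ceiling)


if __name__ == "__main__":
    # Print the schedule a single client might follow.
    for attempt, delay in enumerate(backoff_delays(), start=1):
        print(f"attempt {attempt}: wait {delay:.2f}s before reconnecting")
```

Without the random component, a large group of agents dropped at the same instant would all retry on the same schedule, which can make a reconnect storm worse rather than better.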

In an effort to avoid taking Olark Classic completely offline for a full maintenance window, our engineering team began rebooting and clearing state in individual systems incrementally. This helped, but disconnect rates remained elevated.

Ultimately, we re-provisioned and upgraded the cluster where our Classic console runs to forcibly clear all possible network state, and to eliminate any underlying Google Cloud network issues that could be causing these interruptions. After the cluster upgrade completed, we saw network interruptions slowly return to normal levels.

Final mitigations were in place by the afternoon of Friday, May 23rd, and we monitored the rate of disconnects closely over the holiday weekend. By Tuesday morning, May 27th, we were confident that systems were fully stable and marked the incident resolved.

Our engineering team has put new monitoring in place to detect similar issues in the future, and we have a runbook to re-provision the relevant systems if this occurs again.
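As a rough illustration of what monitoring for this kind of issue can look like, the sketch below counts disconnect events in a sliding time window and flags when the count exceeds a threshold. It is a minimal example under stated assumptions, not a description of our actual monitoring: the class name `DisconnectRateMonitor`, the window length, and the threshold are all hypothetical.

```python
from collections import deque


class DisconnectRateMonitor:
    """Hypothetical sliding-window counter for agent disconnect events."""

    def __init__(self, window_seconds=300.0, threshold=50):
        self.window_seconds = window_seconds
        self.threshold = threshold
        self._events = deque()  # timestamps of recent disconnects

    def record(self, timestamp):
        """Record one disconnect; return True if the rate is elevated.

        Assumes timestamps (in seconds) arrive in non-decreasing order.
        """
        self._events.append(timestamp)
        cutoff = timestamp - self.window_seconds
        # Drop events that have aged out of the window.
        while self._events and self._events[0] <= cutoff:
            self._events.popleft()
        return len(self._events) > self.threshold


if __name__ == "__main__":
    monitor = DisconnectRateMonitor(window_seconds=10.0, threshold=2)
    for t in (0.0, 1.0, 2.0, 20.0):
        print(t, monitor.record(t))
```

In practice the threshold would be set relative to a normal baseline, since some level of disconnects is expected even when systems are healthy.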

Additionally, our customer service team is actively working with legacy customers (who are still using our Classic agent console) to smoothly migrate to our new app for an improved agent experience.

Posted May 29, 2025 - 14:41 PDT

Resolved

This incident has been resolved.
Posted May 27, 2025 - 07:38 PDT

Monitoring

We've completed our work and are continuing to monitor closely.
Posted May 20, 2025 - 17:44 PDT

Update

We're continuing work on a permanent resolution for this issue. Our scheduled maintenance window has closed, but we expect heightened disconnects to continue for another 30-40 minutes. We'll update here when work is complete.
Posted May 20, 2025 - 17:23 PDT

Identified

Unfortunately, we're seeing an elevated rate of unexpected agent disconnects in Olark Classic. We're continuing to work on a resolution for this as our highest priority.

In the meantime, any teams that would like to utilize our newer agent experience (which is built on a separate platform and not affected by the disconnects issue we're seeing in Olark Classic) are encouraged to have an admin reach out to our support team (support@olark.com) and we'll facilitate.
Posted May 15, 2025 - 10:58 PDT

Monitoring

Further updates have been deployed and users should see Olark Classic begin to return to normal. We're continuing to monitor closely.
Posted May 14, 2025 - 14:20 PDT

Update

Changes we implemented yesterday appear to have reduced the incidence of disconnections, but we are still seeing unexpected disconnections from Olark Classic. Resolution of this issue continues to be our top priority.

In the meantime, any teams that would like to utilize our newer agent experience (which is built on a separate platform and not affected by the disconnects issue we're seeing in Olark Classic) are encouraged to have an admin reach out to our support team (support@olark.com) and we'll facilitate.
Posted May 14, 2025 - 08:50 PDT

Update

We've deployed a fix for the periodic agent disconnects in Olark Classic and are monitoring the result closely.
Posted May 13, 2025 - 15:30 PDT

Update

We are continuing to work on a resolution for the unexpected agent disconnects in Olark Classic.
Posted May 13, 2025 - 07:48 PDT

Update

We are continuing to work on a fix for this issue.
Posted May 12, 2025 - 16:00 PDT

Update

We're continuing to work on a fix for the Classic Olark (chat.olark.com) agent disconnects issue. We'll keep updating here.
Posted May 12, 2025 - 14:57 PDT

Update

We're continuing to work on a fix for the cause of the Olark Classic agent disconnects.
Posted May 12, 2025 - 12:05 PDT

Identified

We believe we've identified the cause of the Olark Classic agent disconnects and are implementing a fix.
Posted May 12, 2025 - 11:09 PDT

Update

We are continuing to investigate this issue.
Posted May 12, 2025 - 10:38 PDT

Update

We're continuing to investigate reports of agents being disconnected from Classic chat (chat.olark.com) unexpectedly.
Posted May 12, 2025 - 09:13 PDT

Update

We're investigating reports of agents being disconnected from chat unexpectedly. We'll update here as we have additional information.
Posted May 12, 2025 - 08:15 PDT

Investigating

We've detected an issue and are working to resolve this quickly. We'll have an update within the hour.
Posted May 12, 2025 - 08:03 PDT
This incident affected: chat.olark.com, Slack Integration, and Slack Messaging.