Chat.olark.com login issues
Incident Report for Olark Live Chat
Postmortem

This morning, a memory leak on our operator chat console put our cluster in a state where a restart was necessary.

Restarting these servers errantly triggered a separate code path, which disabled logins and authentication to our main chat server, effectively preventing operators from chatting. Customers started to report issues at 10:03am EST.

After investigation, we were able to revert the errant code on our chat servers, allowing authentication to work normally again. We gave the all-clear at 1:23pm EST after a period of monitoring.

Since this incident, we have taken measures, such as improved logging and error handling, to ensure the memory leak does not recur. We are also putting together a team to add safeguards that prevent similar issues in the future.

Posted Jan 25, 2016 - 17:19 EST

Resolved
We are confident the issue has been resolved and are reflecting on the outage so we can take steps to prevent similar problems from occurring in the future.
Posted Jan 25, 2016 - 14:05 EST
Monitoring
We have identified and fixed the issue affecting login to the Olark chat service. We're continuing to monitor all services. We will follow up with more information once we're satisfied the issue is fully resolved.
Posted Jan 25, 2016 - 13:36 EST
Identified
We have identified the root cause of this issue and are continuing to work towards restoring service. Thanks for your patience; more info soon.
Posted Jan 25, 2016 - 12:47 EST
Update
We are still investigating the root cause of this issue. We have also had reports of problems logging into clients other than chat.olark.com, and these appear to be related to the initial report. More info soon.
Posted Jan 25, 2016 - 11:27 EST
Investigating
We are investigating an issue this morning with some customers reporting problems connecting to our web-based chat client.
Posted Jan 25, 2016 - 10:41 EST