Customer Portal Performance Degradation
Incident Report for Yext
Postmortem

Summary

On March 21st at 2:02 PM EDT, engineering received reports of elevated page load times in the Customer Portal. Investigation began immediately, and mitigations were in place by 2:21 PM EDT, at which point page load times returned to normal.

Root Cause

A bug in an asynchronous process was triggered shortly after midnight on March 21st, causing the process to slowly increase its resource consumption. This unchecked growth reached a critical point in the early afternoon, increasing latency for Customer Portal requests. Once the source was identified, the asynchronous process was terminated immediately, which released the resources.

Because the resource consumption grew gradually, our alerting did not detect the issue until it began to affect other systems. Going forward, we plan to implement mechanisms that prevent runaway processes from consuming resources without limit. We also plan to add alerting that detects latency increases over longer periods, so that similar scenarios are caught before they impact customer requests.
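As an illustration of what alerting on gradual growth could look like, the sketch below fits a straight-line trend to a long window of metric samples and alerts when that trend is projected to reach a limit within a chosen horizon. It is a minimal sketch under stated assumptions, not a description of our actual alerting: the function names, the percent-based metric, the limit, and the 6-hour horizon are all hypothetical.

```python
from statistics import mean


def slope_per_hour(samples):
    """Least-squares slope of value against time, in units per hour.

    samples: list of (unix_seconds, value) pairs covering a long window.
    """
    times = [t / 3600.0 for t, _ in samples]
    values = [v for _, v in samples]
    t_bar, v_bar = mean(times), mean(values)
    denom = sum((t - t_bar) ** 2 for t in times)
    if denom == 0:
        raise ValueError("need samples at two or more distinct times")
    numer = sum((t - t_bar) * (v - v_bar) for t, v in zip(times, values))
    return numer / denom


def hours_until_limit(samples, limit):
    """Hours until the metric reaches `limit` if the fitted trend continues.

    Returns None when the trend is flat or falling.
    """
    rate = slope_per_hour(samples)
    if rate <= 0:
        return None
    latest = samples[-1][1]
    return max(0.0, (limit - latest) / rate)


def should_alert(samples, limit, horizon_hours=6.0):
    """Alert when the projected time to reach the limit is within the horizon.

    The 6-hour default is an illustrative value, not a recommendation.
    """
    eta = hours_until_limit(samples, limit)
    return eta is not None and eta <= horizon_hours


# Hypothetical data: usage (percent) sampled hourly, rising 5 points per hour.
usage = [(h * 3600, 20 + 5 * h) for h in range(5)]
early_warning = should_alert(usage, limit=90, horizon_hours=12.0)
```

A fixed threshold on the current value would stay silent for most of a slow climb like this one, whereas a projection over a longer window can fire hours before the limit is reached.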

Posted Apr 01, 2019 - 14:52 EDT

Resolved
This incident has been resolved.
Posted Mar 22, 2019 - 11:14 EDT
Monitoring
We have implemented mitigations and page load times have returned to normal. We will actively monitor the Customer Portal for any issues.
Posted Mar 21, 2019 - 17:21 EDT
Investigating
We are currently investigating reports of elevated page load times in the Customer Portal. We will update as soon as we have more details.
Posted Mar 21, 2019 - 14:01 EDT
This incident affected: Customer Portal.