Answers indexing delay
Incident Report for Yext
Postmortem

Summary

On April 22 at 10:00am ET, a misconfiguration caused an automated provisioning process to begin creating a very large number of new Answers Experiences. By 11:20am ET, this rapid growth had overwhelmed the available resources, halting both data updates and the provisioning of other new Answers Experiences. We identified the misconfiguration and implemented mitigation steps to clean up the in-progress provisioning. We completed that work and, at 1:47pm ET, verified that all delayed data updates had been processed. No data updates were lost.

Root Cause

While provisioning Answers for a new set of customers managed by a partner, we accidentally provisioned Answers for all of the partner's accounts, which number in the tens of thousands. We did not have enough capacity to absorb such a large increase in the number of Answers search indexes. After about 80 minutes, the system exhausted its resources and stopped processing all subsequent requests.

Remediation

In response, we plan to create a separate queue for the work required to provision new Answers Experiences, so that updates to existing Answers Experiences would be unaffected if a similar scenario occurred in the future. We are also adding an alert that fires when the number of Answers indexes approaches its limit. Finally, we are evaluating alternative designs for Answers search indexes that would avoid this resource limit entirely.
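The queue separation and index alert described above can be illustrated with a minimal Python sketch. Everything in it is hypothetical: `Job`, `IndexingScheduler`, the per-queue budgets, the assumption that each new experience adds exactly one index, and the 80% alert threshold are illustrative choices, not a description of Yext's actual system.

```python
from collections import deque
from dataclasses import dataclass, field


@dataclass
class Job:
    kind: str  # "update" for an existing experience, "provision" for a new one
    experience_id: str


@dataclass
class IndexingScheduler:
    """Hypothetical scheduler that keeps provisioning work in its own queue."""

    index_limit: int             # assumed hard cap on the number of search indexes
    alert_fraction: float = 0.8  # assumed: alert once usage reaches 80% of the cap
    index_count: int = 0
    updates: deque = field(default_factory=deque)
    provisioning: deque = field(default_factory=deque)
    alerts: list = field(default_factory=list)

    def submit(self, job: Job) -> None:
        # Route each job to a separate queue so a provisioning backlog never
        # sits in front of updates to existing experiences.
        if job.kind == "provision":
            self.provisioning.append(job)
        else:
            self.updates.append(job)

    def run_once(self, update_budget: int, provision_budget: int) -> list:
        """Process up to a fixed number of jobs from each queue; return their ids."""
        done = []
        # Updates draw on their own budget, regardless of provisioning depth.
        for _ in range(min(update_budget, len(self.updates))):
            done.append(self.updates.popleft().experience_id)
        for _ in range(min(provision_budget, len(self.provisioning))):
            job = self.provisioning.popleft()
            self.index_count += 1  # assumed: each new experience adds one index
            done.append(job.experience_id)
            self._check_index_limit()
        return done

    def _check_index_limit(self) -> None:
        # Raise a single alert once the index count crosses the threshold.
        threshold = int(self.index_limit * self.alert_fraction)
        if self.index_count >= threshold and not self.alerts:
            self.alerts.append(f"{self.index_count} of {self.index_limit} indexes in use")
```

Because each queue has its own budget, a backlog of tens of thousands of provisioning jobs consumes only the provisioning budget on each pass, and an update submitted behind that backlog is still processed on the next pass. The alert fires while capacity remains, rather than after the limit is reached.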

Posted Apr 24, 2021 - 14:04 EDT

Resolved
This incident has been resolved.
Posted Apr 22, 2021 - 16:05 EDT
Monitoring
We have completed mitigations and confirmed that all Answers Experiences are up to date. We will continue to monitor the indexing pipeline.
Posted Apr 22, 2021 - 13:50 EDT
Identified
We are seeing extended delays in indexing data from Knowledge Graph into the Answers Serving system and in provisioning new Answers Experiences. We have identified the cause and are working to mitigate it.
Posted Apr 22, 2021 - 13:01 EDT
This incident affected: Answers Serving.