On Thursday, March 2nd, 2023 at 2:51PM ET, less than 1% of calls to the Search API began timing out. At 3:19PM ET, Yext engineering was alerted to the issue and the team began investigating. At 3:44PM ET, the issue was resolved.
The deployment of a configuration change resulted in a temporary routing issue between services, causing our consumer serving regions to failover to a single consumer serving region. This led to a single region being responsible for all consumer serving, and it quickly became overwhelmed due to the spike in inbound traffic.
The team ran a restart of the affected services and once they confirmed that they were working, they enabled the consumer serving regions that had been failed over. The team also has deployed a change to allow better load distribution between systems, which should solve the issue of a single machine receiving all inbound requests during a failover.