On Thursday, April 4, 2024, starting around 6:00 AM ET, Yext resellers saw many features missing from their customer portal and its navigation menu when they or their customers logged in. Some resellers started seeing their features return after 10:00 AM ET, and all resellers were returned to a normal view in their customer portal by 12:30 PM ET.
Additionally, a small number of resellers saw their Listings removed from their accounts even after access to the rest of their features was restored. Restoration of these missing Listings was completed at around 8:00 PM ET on Friday, April 5.
Also starting around 6:00 AM ET on Thursday, April 4, all Yext customers saw delays in the addition or removal of services in response to subscription and unsubscription requests. Processing of service changes by non-reseller customers returned to normal at around 8:00 PM ET the same day, and all delayed changes for these customers were also applied by that time. Processing of all new service changes by reseller customers returned to normal at around 12:30 AM ET on Friday, April 5, but some delayed service changes from during the day on April 4 were not fully applied until around 8:00 PM ET on Friday, April 5.
Yext has a process that rechecks the subscription status of most of our reseller customers every day. A bug in that process caused the subscription-tracking system to treat a subset of those customers as no longer subscribed to Yext services. As a result, the customer portal removed access to various features for these accounts. This issue was resolved by 12:30 PM ET on April 4, when the bug was fixed.
Unlike the customer portal, which directly queries the subscription-tracking system for current subscription status, our Listings management system handles subscription changes asynchronously, reacting to messages emitted by our subscription-tracking system after every service change. Because of this bug, the subscription-tracking system emitted a very large volume of service removal notifications, but the system that relays these messages was overwhelmed by the volume and stopped processing. As a result, the service removal notifications for only a few reseller customers made it to the Listings management system, limiting the scope of this incident.
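The difference between the two access patterns can be illustrated with a minimal sketch. Everything in it is hypothetical and simplified for illustration: the class names, the in-memory outbox, and the `capacity` parameter (a stand-in for the relay stalling under load) are assumptions, not a description of Yext's actual systems.

```python
from collections import deque


class SubscriptionTracker:
    """Hypothetical stand-in for the subscription-tracking system."""

    def __init__(self):
        self.active = set()    # (account, service) pairs currently subscribed
        self.outbox = deque()  # service change notifications awaiting relay

    def add_service(self, account, service):
        self.active.add((account, service))
        self.outbox.append(("add", account, service))

    def remove_service(self, account, service):
        self.active.discard((account, service))
        self.outbox.append(("remove", account, service))

    def is_subscribed(self, account, service):
        # Synchronous path: the portal asks for current state directly.
        return (account, service) in self.active


class ListingsSystem:
    """Hypothetical consumer that only learns of changes via messages."""

    def __init__(self):
        self.enabled = set()

    def handle(self, message):
        action, account, service = message
        if action == "add":
            self.enabled.add((account, service))
        else:
            self.enabled.discard((account, service))


def relay(tracker, consumer, capacity):
    """Deliver at most `capacity` queued messages, then stop.

    The cap is an assumption that models a relay that stalls under load.
    """
    delivered = 0
    while tracker.outbox and delivered < capacity:
        consumer.handle(tracker.outbox.popleft())
        delivered += 1
    return delivered
```

In this sketch, if five accounts are erroneously unsubscribed and the relay stalls after two messages, `is_subscribed` returns false for all five accounts immediately, while the Listings consumer has removed only two of them. That mirrors, in simplified form, why the portal impact was broad and the Listings impact was limited.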
During remediation work for this incident, we paused the system that relays service change notifications from our subscriptions system to the rest of our systems until we could be confident that those notifications were no longer affected by any bugs. This pause caused the delays in the addition or removal of services for all customers through the course of April 4.
Part of stabilizing the system for all new service changes by Thursday evening involved disregarding most service change notifications for reseller accounts. Remediation work on April 5 involved identifying affected accounts and manually bringing the Listings system back in sync with the subscriptions system for those accounts.
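A resynchronization of this kind can be thought of as a diff between two views of the same accounts. The sketch below is a simplified illustration under stated assumptions, not the tooling used during the incident: `plan_resync`, `SyncAction`, and the dictionary-of-sets data shape are all hypothetical, and the subscriptions system is assumed to be the source of truth.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class SyncAction:
    account_id: str
    service: str
    action: str  # "add" or "remove"


def plan_resync(subscriptions, listings):
    """Compute the changes needed to make `listings` match `subscriptions`.

    Both arguments map an account ID to the set of services recorded for it.
    `subscriptions` is treated as the source of truth (an assumption).
    """
    actions = []
    for account_id in sorted(set(subscriptions) | set(listings)):
        expected = subscriptions.get(account_id, set())
        actual = listings.get(account_id, set())
        # Services the account is subscribed to but the Listings view lacks.
        for service in sorted(expected - actual):
            actions.append(SyncAction(account_id, service, "add"))
        # Services the Listings view still holds without a subscription.
        for service in sorted(actual - expected):
            actions.append(SyncAction(account_id, service, "remove"))
    return actions
```

Producing a plan before applying anything lets an operator review the affected accounts first, which suits a manual remediation.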
A bug in the process that rechecks the subscription status of most of our reseller customers can have widespread effects, so we plan to add further testing stages to the deployment process of our subscription-tracking system.
We are also planning to add more operational tooling to our Listings system to reduce incident response times.