Customer Portal and Pages Generation Outage
Incident Report for Yext
Postmortem

Summary

On Friday, February 17, 2023, beginning at 4:30 PM ET, Yext engineers identified high error rates across the Customer Portal, Admin Console, and CLI. Updates to entity data were also delayed across our Publisher Network, Pages, Live API, Search, and Analytics system. Updates made via platform APIs or custom ETLs may also have failed.

Pages serving, Knowledge Assistant, Sandbox environments, and our Hitchhiker site were unaffected. Services began to be restored on Saturday, February 18, at 1:00 AM ET, although some flows may initially have returned stale data. By 3:30 AM ET, the backlogged updates had been processed and all systems had caught up.

Root Cause

Our primary asynchronous processing system suffered a major hardware failure, which had downstream effects on many parts of our platform. Once the issue was identified and fixed, services were able to process the backlogged updates.

Remediation

We will be reviewing our asynchronous processing system, high availability configuration, and failover procedures to improve stability and speed to recovery.

Posted Mar 06, 2023 - 12:36 EST

Resolved
This incident has been resolved.
Posted Feb 18, 2023 - 07:04 EST
Update
We are continuing to monitor for any further issues.
Posted Feb 18, 2023 - 07:04 EST
Update
We are continuing to monitor for any further issues.
Posted Feb 18, 2023 - 01:28 EST
Monitoring
We have remediated the issue, and are processing the delayed data. We will continue to monitor the situation.
Posted Feb 18, 2023 - 01:02 EST
Update
We are continuing to work on a fix for this issue.
Posted Feb 18, 2023 - 01:01 EST
Update
We are continuing to work on a fix for this issue.
Posted Feb 18, 2023 - 01:00 EST
Update
We are continuing to work on a fix for this issue.
Posted Feb 18, 2023 - 00:59 EST
Update
We are continuing to work on a fix for this issue.
Posted Feb 18, 2023 - 00:37 EST
Update
This incident caused delayed entity data updates across our Publisher Network, Pages Platform, Live API, Search, and Analytics System. During the incident, our Customer Portal, Admin Console, and CLI were unavailable or significantly degraded.
Updates made via our Knowledge or Admin APIs may have failed; API users should check their logs for failed requests and retry them. Updates attempted via our Managed Services ETLs may also have failed. Pages Serving, Knowledge Assistant, Sandbox, and our Hitchhiker Site were unaffected.

We have identified the underlying cause and have been working to get our systems fully running again. Most of our services are now coming back online.

Entity updates are now available again via our Knowledge API, Admin Console, and CLI, but propagation to downstream systems may continue to be delayed.
Posted Feb 17, 2023 - 23:54 EST
Update
We are continuing to work on a fix for this issue. Consumer Serving services remain available but may contain stale data.
Posted Feb 17, 2023 - 23:02 EST
Update
We are continuing to work on a fix for this issue.
Posted Feb 17, 2023 - 22:19 EST
Update
We are continuing to work on a fix for this issue.
Posted Feb 17, 2023 - 21:57 EST
Update
We are continuing to work on a fix for this issue.
Posted Feb 17, 2023 - 20:06 EST
Identified
We have identified the issue and are working on remediation.
Posted Feb 17, 2023 - 17:12 EST
Investigating
We are investigating reports of errors in the Customer Portal and Pages generation. Consumer serving services are not affected.
Posted Feb 17, 2023 - 17:00 EST
This incident affected: Content (Content API, Management API), Listings (Listings Publishing), Search (Search Serving), Analytics (Analytics Ingestion), Pages (Pages Generation), and Customer Portal Login.