Summary

On March 24 between 10:32am and 8:12pm ET, and again from 9:16pm until 6:00am ET the following morning, there was elevated load on services that provide access to Knowledge Graph data. The resulting degradation in response times delayed the propagation of updates to Live API and Answers and caused sporadic failures in the Customer Portal.

Root Cause

A new usage pattern for accessing data from related entities caused a multiplicative increase in the amount of processing required for updates in one particular large account. Certain common requests began to load more than 100x the amount of profile data, which overwhelmed the in-memory caching system and delayed other requests.

Remediation

We identified the usage pattern leading to the elevated load and implemented a more efficient mechanism to support it without the accompanying expansion in required resources.

Posted Apr 08, 2021 - 08:36 EDT

Resolved

The mitigation was tested and deployed, and the elevated load on the component has not been seen again.

Posted Mar 31, 2021 - 11:26 EDT

Update

The system continues to operate within normal parameters. We are continuing to monitor the previously affected component closely and to develop and test the identified mitigation.

Posted Mar 29, 2021 - 10:25 EDT

Update

The system continues to operate within normal parameters. We are continuing to monitor the previously affected component closely and to develop the identified mitigation.

Posted Mar 26, 2021 - 09:41 EDT

Update

We are continuing to investigate the root cause of the elevated load on the component identified earlier, although it has been operating within typical ranges since 6AM ET. An additional mitigation (an efficiency improvement to our data access layer in Knowledge Graph) is under development.

Posted Mar 25, 2021 - 17:25 EDT

Update

There have not been any significant indexing delays since the last update, but we are continuing to investigate an elevated amount of load on the affected component. We will continue to monitor for indexing delays as the work day begins in the US (ET), and we will apply further mitigations if necessary.

Posted Mar 25, 2021 - 09:00 EDT

Monitoring

We have identified the root cause and implemented a mitigation. All data served by Live API and Answers is fully up to date, and we have not seen further delays since.

Posted Mar 24, 2021 - 20:58 EDT

Identified

We are investigating slowdowns to our indexing process, which began around 12 ET. Customers may experience delays in seeing new or updated entity data from Answers or the Live API.

Posted Mar 24, 2021 - 15:24 EDT

This incident affected: Content (Content API) and Search (Search Serving).