On Sunday, March 3rd, 2019, at 11:18 AM EST, engineering was alerted to elevated error rates in the Platform API. Mitigation efforts began immediately, and a fix was deployed by 11:42 AM EST, at which point error rates returned to normal.
A routine upgrade that increased the serving capacity for an API endpoint caused errors under extremely high load. Reverting the change reduced errors immediately. Going forward, we plan to double the server capacity and add load-based testing to catch such issues.