Amazon AWS Outage
Incident Report for SMRT Systems
Resolved
This incident has been resolved.
Posted Nov 26, 2020 - 02:22 PST
Update
AWS Outage Update:
The latest from Amazon:

"We have restored all traffic to Kinesis Data Streams from Internet-facing endpoints, and we are continuing to incrementally restore all requests to Kinesis Data Streams using VPC Endpoints. We are also beginning to observe the incremental recovery of CloudWatch metrics functionality for new incoming metrics, and working towards full recovery. The backlog of metrics will take additional time to populate.

We will continue to keep you updated on our progress."

SMRT knows how hard today was, and we thank you for your patience! This was the worst AWS issue since at least 2017. Interestingly enough, one customer was demoing an offline inventory tool and was able to handle pickups throughout the outage. This offline tool will be available to all SMRT customers before the new year.
Posted Nov 25, 2020 - 23:03 PST
Update
AWS Outage Update:

While the system is back online, Amazon Web Services is still experiencing issues, although it is on the mend. Expect slower performance than usual for the rest of the day, and expect reports to take longer than normal to update.

Here's the latest from Amazon:
"We continue to work towards recovery of the issue affecting the Kinesis Data Streams API in the US-EAST-1 Region. We also continue to see an improvement in error rates for Kinesis and several affected services, but expect full recovery to still take up to a few hours. For Amazon Cognito, the issues affecting APIs and authentication for user and identity pools has now recovered. For AutoScaling, delays in launching new instances has now recovered, however some scaling operations are still delayed due to delayed CloudWatch metrics. For EventBridge, we have seen partial recovery for the issue affecting delivery of Events.

We are actively working toward full recovery for all affected services, and will continue to provide updates regularly as we have new information to share."
Posted Nov 25, 2020 - 15:44 PST
Monitoring
The system is back up but with degraded performance. We're working with AWS to get us back to normal.
Posted Nov 25, 2020 - 12:31 PST
Update
Latest from Amazon:
"We continue to work towards recovery of the issue affecting the Kinesis Data Streams API in the US-EAST-1 Region. For Kinesis Data Streams, the issue is affecting the subsystem that is responsible for handling incoming requests. The team has identified the root cause and is working on resolving the issue affecting this subsystem.

The issue also affects other services, or parts of these services, that utilize Kinesis Data Streams within their workflows. While features of multiple services are impacted, some services have seen broader impact and service-specific impact details are below."
Posted Nov 25, 2020 - 11:45 PST
Update
AWS is still experiencing a severe outage that's affecting many companies, including big names such as Roku, Adobe, and Ring.
Here's an article explaining the outage. Amazon has not yet said what caused the issue.
https://techcrunch.com/2020/11/25/amazon-web-services-outage-takes-a-portion-of-the-internet-down-with-it/
Posted Nov 25, 2020 - 10:08 PST
Update
We're still working on resolving these issues. Unfortunately, all 3 of our AWS datacenter locations in us-east-1 (Virginia) are down.
Posted Nov 25, 2020 - 09:05 PST
Update
We are continuing to work with Amazon Web Services to resolve this issue. It appears to be a global outage with their infrastructure:
https://downdetector.com/status/amazon/
Posted Nov 25, 2020 - 07:55 PST
Identified
We're investigating an elevated number of reports of problems loading SMRT. It appears our datacenter provider, AWS, is having intermittent issues.
Posted Nov 25, 2020 - 07:37 PST
This incident affected: SMRT POS (POS).