October 26, 2022

Overview

On October 26, 2022, The Palace Project suffered a service-wide disruption. The disruption was first identified and reported to the technical team at 7:30am ET, and the team then began investigating.

During the disruption, patrons were unable to log in to their library in the app. Users who had previously logged in were unable to view their bookshelf or to borrow and read new books. The administrative dashboard remained available throughout the disruption.

The service disruption was resolved at 8:45am ET.

Assessment and Actions

We’ve determined that the disruption occurred because our servers were unable to route traffic to our partner libraries’ patron authentication services and to book distributor services.

This routing problem was caused by one of our engineers removing a development environment that was running in AWS. This development environment was mistakenly configured with the same network addresses as our production servers. Removing this environment inadvertently removed the configuration that the production servers used to talk to the internet.
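The kind of address conflict described above can be caught with an automated check. The following is a minimal sketch, not the tooling we use: the environment names and CIDR ranges are hypothetical values chosen for illustration, and `find_overlaps` is an illustrative helper built on Python's standard `ipaddress` module.

```python
import ipaddress


def find_overlaps(environments):
    """Return sorted pairs of environment names whose address ranges overlap."""
    names = sorted(environments)
    overlaps = []
    for i, first in enumerate(names):
        for second in names[i + 1:]:
            a = ipaddress.ip_network(environments[first])
            b = ipaddress.ip_network(environments[second])
            if a.overlaps(b):
                overlaps.append((first, second))
    return overlaps


# Hypothetical address ranges, for illustration only.
environments = {
    "production": "10.0.0.0/16",
    "development": "10.0.0.0/16",  # same range as production: a conflict
    "staging": "10.1.0.0/16",
}

for first, second in find_overlaps(environments):
    print(f"Address range conflict: {first} and {second}")
```

A check like this could run before an environment is created or removed, so that a conflict is flagged before any shared configuration is changed.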

In response to this service disruption, we are completing work to further isolate the production hosting environment so that development environment configuration cannot affect production. We are also adding monitoring to our hosting environment that will alert us to any future issues with our traffic routing.
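As an illustration of the routing monitoring mentioned above, the sketch below tries to open a TCP connection to each external service and reports the ones it cannot reach. It is an assumption about what such a probe might look like, not a description of our monitoring. The function name `check_outbound` and the hostnames in the usage comment are hypothetical.

```python
import socket


def check_outbound(targets, connect=socket.create_connection, timeout=5.0):
    """Try to connect to each (host, port) pair and return the failures.

    `connect` is injectable so the probe can be exercised without
    touching the network.
    """
    failures = []
    for host, port in targets:
        try:
            conn = connect((host, port), timeout=timeout)
            conn.close()
        except OSError as exc:
            # Covers timeouts, refused connections, and unreachable networks.
            failures.append((host, port, str(exc)))
    return failures


# Example usage (hypothetical hostnames):
# failures = check_outbound([("auth.example.org", 443), ("books.example.org", 443)])
# if failures:
#     print("Outbound routing problem:", failures)
```

Run on a schedule from inside the production network, a non-empty result would indicate that outbound traffic is failing and could trigger an alert.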