System Outage
Incident Report for Amplifi.io
Postmortem

Amplifi Admins

I am sorry for the recent short-notice service windows and unexpected issues. We are indeed moving forward, not backward, although it might look the other way around on the surface.

I can assure you that things are going to get better. We have delivered very high availability over the last several years, but the issues with this month's transition to new technology have been a cause of some concern. By the numbers, our availability was 94% over the last 30 days, compared to 99.9% over recent years.

The online tech environment is changing fast, and in our push to move forward, we made some mistakes. Things that we had tested, and that our technicians assured us would be ‘seamless’, went differently when we migrated our production servers. It boiled down to a couple of elements, out of about 20, that didn't move well: our instance admin and media conversions. Everything else migrated successfully and is running on the new technology. For these two elements, however, we had to revert and write special bridges so that we could finish the updates while leaving them in our previous environment. What the team did to solve the issues was pretty amazing, but the solution has had some issues and aftershocks of its own. It has been a pretty intense run this month, with some very long days for our technical teams.

So we will have two elements left to upgrade this year. It will be significant work, but we will take our time to do this right. When it comes time to deploy the changes, we will do so with as little disruption as possible. We will offer clear two-week advance notice and we will strive for zero unplanned outages.

Our roadmap has some fantastic upgrades planned for the instance admin and for media conversions, and we are accelerating the related work in order to wrap up this phase by the end of the year. These updates set the stage for new features coming in 2023 in the areas of workflows, collaboration, content syndication, analytics, and more. These are all tall tasks, but we are upscaling and staffing in order to tackle them. We also realize that getting our core service right and delivering high availability is what matters most to our customers. I am sorry for the recent gap on that.

There have also been some successes over the last month. We have moved to all-new, faster networks, going from 1 Gig to 25 Gig. We have updated our languages to mesh with AWS cloud services, migrated our primary storage to 100% solid state (SSD) and new, fast NVMe technology, and tuned up our cross-regional backup engine. We have installed next-generation security, operating systems, and other cloud tech, and we will continue to make updates to meet the demands of our customers as well as market forces. We intend to lead the way on many important aspects of managing and delivering your content.

Thanks for your understanding and patience during our update process. The features we are developing are driven by your input. I hope that Amplifi can continue as an indispensable system in your tech stack and that, despite a few hiccups, we can provide a high return on your investment in Amplifi. Thanks for being on the journey with us. We truly feel honored by your trust and strive to live up to it.

Ken Garff
Founder / President

Posted Jul 21, 2022 - 00:08 PDT

Resolved
This incident has been resolved.
Posted Jul 20, 2022 - 23:35 PDT
Monitoring
Hi everyone,

I am sorry for the unexpected downtime today.

Access is being restored now and our teams are continuing to stabilize and monitor the platform.

We have an older service element that is acting up on the new hardware and updated server software we installed recently. We are absolutely focused on providing near-perfect uptime and on doing the ‘basics’ very well before we add new features. We have accelerated our plan to update the older elements, which involve access management and media conversions and are behind this recent issue.

Our team has verified that data was not in jeopardy, but we understand it’s very frustrating when you can’t access the system.

Ken Garff
Posted Jul 19, 2022 - 15:51 PDT
Update
We are working closely with our technical team to resolve this outage and are committed to getting it solved as quickly as possible. Thank you for your patience!
Posted Jul 19, 2022 - 12:22 PDT
Investigating
We are working to restore the service. Some microservice elements that involve user authority, along with some other older foundational services, are giving us trouble on the new platform.
Posted Jul 19, 2022 - 08:35 PDT
This incident affected: Main System Web Interface and Network / Internet Services.