Unplanned downtime and update to Lemmy 0.19.10

a year ago by lwadmin to c/lemmyworld

Hello,

as some of you may have noticed, we just had about 25 minutes of downtime due to the update to Lemmy 0.19.10.

Lemmy release notes: https://join-lemmy.org/...

This won't fix YouTube thumbnails for us, as YouTube banned all IPs belonging to our hosting provider.

We intended to apply this update without downtime. We wanted it in place soon because, due to the recent spam waves, we need the database migration that allows marking PMs as removed.

Although this update contains database migrations, we expected to still be able to apply the migration in the background before updating the running software, as the database schema between the versions was backwards compatible. Unfortunately, once we started the migrations, we started seeing the site go down.

In the first minutes we assumed that the migrations contained in this upgrade were somehow unexpectedly blocking more than intended but still processing, but it turned out that nothing was actually happening on the database side. Our database deadlocked due to what appears to be an orphaned transaction, which didn't die even after we killed all Lemmy containers other than the one running the migrations.

While the orphaned transaction was pending, the schema migration was waiting for it to complete or be rolled back, so nothing was moving anymore. As the orphaned transaction also made no progress, everything else started to die. We're not entirely sure why the original transaction broke down: it was started about 30 seconds before the schema migration query, and running at the same time as the migration shouldn't have broken it.
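To illustrate the queueing described above, here is a toy Python model. It is not Lemmy or PostgreSQL code, only a sketch under the assumption that a new lock request has to wait behind earlier conflicting requests, which is how PostgreSQL lock queues are commonly described. The class `LockQueue`, its methods, and the two simplified lock modes ("shared" and "exclusive") are hypothetical names chosen for this example.

```python
from collections import deque


class LockQueue:
    """Toy model of the lock queue for a single table (hypothetical)."""

    def __init__(self):
        self.holders = {}       # session name -> lock mode currently held
        self.waiting = deque()  # (session, mode) pairs in arrival order

    @staticmethod
    def _conflicts(mode, other):
        # Simplification: "exclusive" conflicts with everything,
        # "shared" only conflicts with "exclusive".
        return mode == "exclusive" or other == "exclusive"

    def _can_grant(self, mode):
        return all(not self._conflicts(mode, held)
                   for held in self.holders.values())

    def request(self, session, mode):
        """Return True if the lock is granted now, False if the session waits."""
        behind_waiter = any(self._conflicts(mode, queued)
                            for _, queued in self.waiting)
        if self._can_grant(mode) and not behind_waiter:
            self.holders[session] = mode
            return True
        self.waiting.append((session, mode))
        return False

    def release(self, session):
        """Release a held lock, then grant waiters from the front of the queue."""
        del self.holders[session]
        while self.waiting and self._can_grant(self.waiting[0][1]):
            waiter, mode = self.waiting.popleft()
            self.holders[waiter] = mode


table = LockQueue()
table.request("orphaned_txn", "shared")    # granted, but never finishes
table.request("migration", "exclusive")    # has to wait for orphaned_txn
table.request("user_query", "shared")      # queues behind the migration
print(list(table.waiting))
```

In this model the user query is compatible with the lock held by the stuck transaction, but it still waits because the migration is ahead of it in the queue. Only once the stuck transaction goes away (here `table.release("orphaned_txn")`, in our case the database restart) can the migration and then everything behind it proceed.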

Lemmy has a "replaceable" schema, which is applied separately from the regular database schema migrations and is re-applied every time a DB migration runs. Unfortunately, we did not consider this replaceable schema in our planning; otherwise we would have realized that it would likely have a larger impact on the overall migration.

After we identified that the database had deadlocked, we resorted to restarting our postgres container and then ran the migration again. Once we restarted the database, everything was back online in less than 30 seconds, including running the remaining migrations and then starting up all containers again.

When we tested this process on our test instance prior to deploying it to the Lemmy.World production environment, we did not run into this issue. Everything was working fine with the backend services running on Lemmy 0.19.9 and the database already upgraded to the Lemmy 0.19.10 schema, but the major difference is that the test instance had no user activity during the migration.

Our learning from this is to always plan for downtime for Lemmy updates if any database migrations are included, as it does not appear to be possible to "safely" apply them even if they seem small enough to be theoretically doable without downtime.

voicesarefree 32 points a year ago

Appreciate the transparency

Shadow 16 points a year ago

FYI I saw the same deadlocks on lemmy.ca when I tried to do a similar hot upgrade, which seems odd since that alter is innocuous enough.

MrKaplan 17 points a year ago

yeah, the actual issue was the replaceable schema being applied, which starts with

DROP SCHEMA IF EXISTS r CASCADE;
CREATE SCHEMA r; 

and then continues here and here

Shadow 11 points a year ago

Interesting, I've got some reading to do. Thanks for the links.

voracread 12 points a year ago

Best to have fixed downtime for any maintenance like what banks do.

MrKaplan 12 points a year ago

we all do this in our spare time. if we had set working hours then it would be easy to do so, but even then I don't think a daily maintenance window would be necessary when we don't make changes that frequently.

we believed this change to be doable without downtime, otherwise we would've announced it ahead of time.

this change is important for our anti spam measures. especially if we tune them to be more aggressive, which might increase the false positive rate, it is important for us to be able to distinguish removed pms from user deleted pms in case we need to restore them at a later point.

due to that it's a somewhat urgent change that was fit in where we had spare time available to allow us to continue improving our efforts to combat pm spam effectively.

voracread 2 points a year ago

I understand. I was thinking it would give you less stress if you had a clean window. An activity that is done in spare time should be enjoyable and not stress inducing.

:)

Brkdncr 10 points a year ago

Nah, I’d rather the admins upgrade when they want. It’s already a free service no need to make it a full time unpaid job.

Lost_My_Mind 10 points a year ago

Oh good. It wasn't me. I thought I somehow broke something.

Brkdncr 10 points a year ago

Thanks!

Blaze 6 points a year ago

Well done!

woelkchen 6 points a year ago

This won’t fix YouTube thumbnails for us, as YouTube banned all IPs belonging to our hosting provider.

Isn't there a way that YT thumbnails could be generated locally by the person posting them who's then uploading the thumbnail to LW?

MrKaplan 9 points a year ago

(mobile) apps could do this, but I don't think browser based apps would be able to. the generation of YouTube thumbnails works by requesting the html content of the YouTube page and then extracting a metadata component from it, where YouTube provides the actual preview image as a link. browsers set restrictions on how you can interact with other websites for security reasons and I don't think this would be allowed there.

manually this is of course doable, but it's rather cumbersome.
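As a rough illustration of the extraction step described above, here is a minimal Python sketch. It assumes the "metadata component" is an Open Graph `og:image` meta tag, which the comment does not state explicitly, and it parses a made-up sample page instead of fetching anything from YouTube. The names `PreviewImageParser` and `extract_preview_image` and the `example.invalid` URL are hypothetical.

```python
from html.parser import HTMLParser


class PreviewImageParser(HTMLParser):
    """Remembers the content of the first <meta property="og:image"> tag."""

    def __init__(self):
        super().__init__()
        self.image_url = None

    def handle_starttag(self, tag, attrs):
        # Ignore everything except the first matching meta tag.
        if tag != "meta" or self.image_url is not None:
            return
        attributes = dict(attrs)
        if attributes.get("property") == "og:image":
            self.image_url = attributes.get("content")


def extract_preview_image(html):
    """Return the preview image link found in the page, or None."""
    parser = PreviewImageParser()
    parser.feed(html)
    return parser.image_url


# Made-up page content standing in for the html of a video page.
sample_page = """
<html><head>
<title>Example video</title>
<meta property="og:image" content="https://example.invalid/thumbnail.jpg">
</head><body></body></html>
"""

print(extract_preview_image(sample_page))
```

The hard part for a browser based app is not this parsing step but getting the html of the other site in the first place, which is where the browser restrictions mentioned above come in.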

woelkchen 2 points a year ago

Bummer

FooBarrington 2 points a year ago

Would probably be doable with a browser extension, but that's quite a hassle.

fury 5 points a year ago

I appreciate you

