Small Instance Admin Adventures: How to definitely kill and maybe resurrect a lemmy instance
from oleorun@lemmy.fan to fediverse@lemmy.world on 26 Jun 16:52
https://lemmy.fan/post/188251
Hi y’all,
I wrote up part one of an eventual two-part documentary/record-of-failure on how I killed and resurrected my dear lemmy instance. It’s in my instance’s meta community, which isn’t federated, so I thought I would share it here.
If you are like me and enjoy a technical but not overly boring look at the backside (snicker) of running a lemmy instance, read on. It’s a bit of a long post, so here’s a tl;dr front and center.
If there’s a better place to post this please let me know. Mods/admins, same: If I need to put this elsewhere just let me know and I’ll be happy to comply!
Best,
-oleo
your link errors out
Thanks for the heads-up. Is it timing out or returning a 404?
Edit: It’s not a federated community so of course it will not appear unless you are a lemmy.fan user. That’s my bad. I put the post below.
OG post in case my instance dies, which is not unexpected:
The last couple of weeks have been truly eventful for lemmy.fan.
tl;dr (and first paragraph):
My plan was to wholeheartedly go all in on PieFed for lemmy.fan. I planned a cut-over date, started moving some things, and I was on track to get everything up and running solely on PieFed[.lemmy].fan^1^. Then lemmy.fan died. It’s back now.
/slash tl;dr
The reasons behind the migration were varied, but essentially boiled down to a lack of development from two full-time, donation-sponsored-but-still-underpaid developers, nutomic and dessalines from lemmy.ml.
The lifecycle of open-source software development is well-established in lore if not in fact: under- or unpaid developers work on a project that started as a labor of love. The love disappears, and the labor quickly turns to animosity and dread, as Git repos devolve into loud, angry people demanding this or that, reporting bugs but not contributing to fixing existing ones, and always the politics, politics, politics.
Then, PostgreSQL made the decision to utterly shit itself. Lemmy.fan suffered a catastrophic database failure; from what, who knows.
Lemmy.fan, as I once knew and loved, now lies in a pile of corrupted Postgres garbage files, gnashed angrily together by some destructive, demonic, database daemon.
I know little to nothing about PostgreSQL, and that is why I absolutely despise it. My life has been spent using, manipulating, troubleshooting, and migrating MariaDB and MySQL, two very sane and easy-to-use database systems that Just Work™. I do not want to learn something new. I fear I now have no choice but to learn this garbage database system and adopt the same relationship with it as I have with so many other things in my life: Don’t fuck with me, and I will not fuck with you. Cross me once, though, and you best be prepared for total annihilation.
I think Postgres has that sorted now, as we’re circling the saloon old-west style, revolvers pointed at one another, shaking slightly in unsteady hands, and eyeing one another for the moment one of us so much as blinks.
Also, it’s good to have backups. I did, and still do (roughly what’s sketched below), but I decided not to restore them, and here’s why:
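For anyone curious, “backups” here means nothing fancier than a scheduled pg_dump out of the database container. The container name, database user, and paths in this sketch are placeholders for whatever your own setup uses.

```bash
# Nightly dump of the lemmy database out of its container.
# "lemmy-postgres", the "lemmy" user/database, and the backup path are
# placeholders -- substitute whatever your setup actually uses.
docker exec lemmy-postgres pg_dump -U lemmy -Fc lemmy \
  > /srv/backups/lemmy-$(date +%F).dump

# Restoring into a fresh, empty database would look roughly like:
# docker exec -i lemmy-postgres pg_restore -U lemmy -d lemmy --clean --if-exists \
#   < /srv/backups/lemmy-YYYY-MM-DD.dump
```

The custom-format dump (-Fc) is worth the extra step, since pg_restore can then do selective or parallel restores instead of replaying one giant SQL file.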
A long time ago, when lemmy.fan was but a tiny baby Docker container nestled snugly in a NASsinette in a suburban basement, I created the very first federated lemmy.fan using Yunohost to test things out. I was new to ActivityPub and had little idea how federation worked in broad terms, so I set about this and that. Before long, lemmy.fan was puttering along, populated by the lemmycommunity bot that would dutifully scrape and subscribe to popular communities and instances across the fediverse. I had no local communities and was very new to Docker, so lemmy.fan and I expanded our knowledge: I by learning Docker, Portainer, and other tools, and lemmy.fan by sucking in content from across the fediverse, growing and becoming better in its own right.
Then, I broke something, or lemmy.fan itself broke, or something happened that resulted in me having to destroy the instance.
I then mistakenly thought that the domain, lemmy.fan, could no longer be used for federation because I had already exchanged ActivityPub messages under it. Since the domain was now established in the greater fediverse, I figured I had to move to a subdomain.
So I added real.lemmy.fan as the federation source, CNAMEd it back to lemmy.fan, and believed that perhaps everything would just work.
And it did, for about a year or so.
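If you ever want to sanity-check a CNAME-plus-federation setup like that, the quick version is a DNS lookup and a poke at the standard discovery endpoint. The hostnames below are just mine as an example.

```bash
# Confirm the CNAME actually resolves where you think it does.
dig +short real.lemmy.fan CNAME

# Ask the instance what software and version it advertises to the fediverse.
# /.well-known/nodeinfo is the standard discovery endpoint ActivityPub servers expose.
curl -s https://lemmy.fan/.well-known/nodeinfo
```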
As I trudged along keeping lemmy.fan mostly running, I grew and fostered weirdnews, a community that surpassed 1,000 subscribers. A few friends joined, I added a few other communities, and I kept the small instance chugging along splendidly.
Then something changed.
Lemmy.fan became slow and unreliable. Server errors were commonplace and, while restarting the Docker containers cleared them up, the underlying cause was a mystery to me. Lemmy.fan began dragging down other containers on the NAS, with the PostgreSQL and lemmy frontend containers pushing load averages into the 20s. The logs were useless and showed no particular fault.
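The nightly ritual looked roughly like the following; the container and service names are placeholders for however yours are set up (Compose, Portainer, or otherwise).

```bash
# What is the box doing right now?
uptime
docker stats --no-stream

# Anything useful in the last few hundred lines? (Usually not, in my case.)
docker compose logs --tail=200 postgres lemmy

# The fix-that-isn't-a-fix: bounce the containers and move on with your evening.
docker compose restart postgres lemmy lemmy-ui
```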
I decided to migrate the instance from the basement NAS to a VPS, my thought process being that allocating more RAM and throwing a few more processor cores at lemmy.fan would fix things. For us, it was the vacation preceding the divorce; we being the hypothetical couple who tried to save a failing marriage by going to Hawaii and renewing our vows. Instead, lemmy.fan fell asleep on the couch while I gamed and watched reruns of old Star Trek episodes. Year of Hell Parts 1 and 2 back-to-back on Pluto?
Yes, please!
This is a death knell for any relationship, be it human and human or human and silicon/ele
This is almost as bad as the time The Best of Both Worlds left a fan group waiting all summer. Come on with part 2!
That might be true for some open source projects, but I personally am still very happy to work on Lemmy. If there are loud or angry people on GitHub we quickly ban them, so that has never been a real problem. And politics on Lemmy are easy to block if you want to.
The vast majority of Lemmy servers are absolutely stable. Lemmy.ml has been running for 6 years now and there have never been any problems like the ones you describe. Maybe you have faulty hardware or something, but it’s definitely not something you can blame on the Lemmy software. You should join the admin chat; people there can probably help you resolve the problem.
Development is definitely not stalled: 87 pull requests were merged and 66 issues closed in the last month alone. The only unresolved issues are very minor or only affect the development version. And there is a lot of progress on 1.0, which will include many features such as private communities and multi-communities.
PostgreSQL shitting itself is generally a hardware problem. I’ve had it “detect” faulty RAM modules in a few cases in the decades I’ve been using it.
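If you want to rule hardware in or out, a rough first pass looks something like this (assuming PostgreSQL 14+, where pg_amcheck ships with the server tools; the container and database names are made up):

```bash
# Scan heap and index pages for corruption from inside the database container.
docker exec lemmy-postgres pg_amcheck -U lemmy -d lemmy --heapallindexed

# On the host: hammer a chunk of RAM for a few passes.
# (memtester is a quick userspace check; memtest86+ from boot media is more thorough.)
sudo memtester 2G 3
```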
it’s a feature, not a bug
Postgres is pretty resilient, in my experience. If it can’t recover from a hardware failure, I’d bet no other DB would be stable on that hardware either. It’s surprising how long Linux can soldier on with an almost-dead disk, as long as it managed to boot.
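Before trusting a rebuilt database to the same disk, it’s also worth asking the drive what it thinks of itself; /dev/sda below is just an example device.

```bash
# Quick pass/fail from the drive's SMART self-assessment, then the full
# attribute table (reallocated sectors, pending sectors, and so on).
sudo smartctl -H /dev/sda
sudo smartctl -a /dev/sda
```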