azorius.net

After almost half a century, I'm still doing it...
from ExtremeDullard@lemmy.sdf.org to linux@lemmy.ml on 18 Feb 19:12
https://lemmy.sdf.org/post/29725349

So I’m working on a server from home.

I do a cat /sys/class/net/eth0/operstate and it says unknown despite the interface being obviously up, since I’m SSH’ing into the box.

I try to explicitly set the interface up to force the status to say up with ip link set eth0 up. No joy, still unknown.

Hmm… maybe I should bring it down and back up.

So I do ip link set eth0 down and… I drive 15 miles to work to do the corresponding ip link set eth0 up

50 years using Unix and I’m still doing this… 😥
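
For context on the unknown status: some network drivers never update operstate, so the file can read unknown while traffic flows normally. Below is a minimal sketch of a check that falls back to the carrier flag in that case, which may remove the urge to bounce the interface at all. The function name link_state and the SYSFS_NET override are hypothetical, added only so the sketch can be tried against a scratch directory.

```shell
#!/bin/sh
# Sketch: report link state, falling back to the carrier flag when
# operstate reads "unknown". SYSFS_NET is a hypothetical override for testing.

link_state() {
    # $1: interface name, e.g. eth0
    dir="${SYSFS_NET:-/sys/class/net}/$1"
    state=$(cat "$dir/operstate" 2>/dev/null) || { echo "absent"; return 1; }
    if [ "$state" = "unknown" ]; then
        # carrier reads 1 when a link is detected and 0 when it is not;
        # the read fails while the interface is administratively down.
        case "$(cat "$dir/carrier" 2>/dev/null)" in
            1) state="unknown (carrier detected)" ;;
            0) state="unknown (no carrier)" ;;
        esac
    fi
    echo "$state"
}

# Usage: link_state eth0
```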

#linux

despotic_machine@lemmy.dbzer0.com on 18 Feb 19:32

Why don’t you use chained commands, or better yet simply create an alias that chains down/up, then use the alias instead?

catloaf@lemm.ee on 18 Feb 19:36

Or use some kind of molly guard. Or have an OOB management channel.

You’d think you’d learn from your mistakes after one or two of them, not fifty years’ worth…
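
The molly guard idea can be sketched in a few lines of shell: ask for the machine's hostname before a disruptive command runs. This only illustrates the concept and is not the molly-guard package itself; confirm_host is a hypothetical name.

```shell
#!/bin/sh
# Sketch of a molly-guard style wrapper: the command only runs if the
# operator types this machine's hostname. confirm_host is a hypothetical name.

confirm_host() {
    expected=$(uname -n)
    printf 'About to run: %s\nType the hostname of this machine to continue: ' "$*" >&2
    IFS= read -r answer || return 1
    if [ "$answer" = "$expected" ]; then
        "$@"
    else
        echo "Hostname mismatch, not running." >&2
        return 1
    fi
}

# Usage: confirm_host ip link set eth0 down
```

Having to type the hostname is a small speed bump, but it forces a moment of thought about which box the command is about to hit.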

ExtremeDullard@lemmy.sdf.org on 18 Feb 19:39

In my defense, I just installed the machine. I was configuring it from home after hours.

DasFaultier@sh.itjust.works on 18 Feb 20:29

after hours

I’ve configured PAM to not let me log in remotely after hours, because I just know that someday I’ll want to fix “just this tiny thing” and I’ll break production because I’m too tired. I clearly need protection from myself, and this is one slice in Dr. Reason’s Swiss cheese model.

Don’t let the people drag you down, this happens to all of us.

Cyber@feddit.uk on 19 Feb 19:47

Harsh (to yourself), but fair

atzanteol@sh.itjust.works on 18 Feb 20:29

Don’t be shitty.

IsoKiero@sopuli.xyz on 18 Feb 22:58

You’d think you’d learn from your mistakes

Yes, that’s what you’d think. And then you’ll sit in front of a blank terminal once again after making some trivial mistake yet again.

A friend of mine developed a habit (working at a decent sized ISP 20+ years ago) of setting up a scheduled reboot for everything in 30 minutes, no matter what you’re going to do. The hardware back then (I think it was mostly Cisco) had a ‘running config’ and a ‘stored config’, which were two separate instances. Log in, set up the scheduled reboot, do whatever you’re planning to do, and if you mess up and lock yourself out the system will restore the previous config in a while and then you can avoid the previous mistake. Rinse and repeat.
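
On a Linux box the same habit can be approximated with shutdown. This is a minimal sketch, and it only helps under the assumption that the risky change is runtime-only (such as ip link set) and not written to persistent config, so that a reboot really does bring back the old state. The function names and the REBOOT_CMD override are hypothetical.

```shell
#!/bin/sh
# Sketch: arm a delayed reboot as a safety net before risky remote changes.
# Assumption: the change is runtime-only, so a reboot restores the old state.
# REBOOT_CMD is a hypothetical override hook so the sketch can be dry-run.
REBOOT_CMD="${REBOOT_CMD:-shutdown}"

arm_safety_reboot() {
    # $1: minutes until the reboot fires (default 30)
    "$REBOOT_CMD" -r "+${1:-30}" "safety net: cancel with shutdown -c"
}

disarm_safety_reboot() {
    # Cancel the pending reboot once the change is known to be good.
    "$REBOOT_CMD" -c
}

# Usage:
#   arm_safety_reboot 30
#   ...make the risky change, confirm you can still log in...
#   disarm_safety_reboot
```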

And, personally, I think that’s one of the best ways to differentiate actual professionals from the ‘move fast and break things’ group. Once you’ve locked yourself out of a system literally halfway across the globe too many times you’ll eventually learn to think about the next step and failovers. I’m not that much of a network guy, but I have shot myself in the foot enough that whenever there’s dd, mkfs or something similar on the root shell I automatically pause for a second to confirm the command before hitting enter.

And while experience teaches you how to avoid the pitfalls, the more important part (at least for me) is to think ahead. The constant mindset of thinking about processes, connectivity, what you can actually do if you fuck up and so on becomes a part of your workflow. Accidents will happen, no matter how much experience you have. The really good admins just know that something will go wrong at some point in the process and build stuff to guarantee that when you fuck things up you still have access to fix it instead of calling someone 6 timezones away in the middle of the night to clean up your mess.

Cyber@feddit.uk on 19 Feb 19:46

Without repeating my other comment: this approach has saved my life many times.

ExtremeDullard@lemmy.sdf.org on 18 Feb 19:38

Because I plain forgot I was remote. It’s as simple and as stupid as that.

despotic_machine@lemmy.dbzer0.com on 18 Feb 19:40

Fair enough. I’ve done worse in my time as a keyboard jockey.

melroy@kbin.melroy.org on 18 Feb 20:15

That is why you have KVMs..

eldavi@lemmy.ml on 18 Feb 20:34

time to set up a console server so that you don’t do that again.

Lodespawn@aussie.zone on 18 Feb 21:38

Until they have to troubleshoot the console server …

eldavi@lemmy.ml on 18 Feb 22:09

then set up a super console server. lol

fossphi@lemm.ee on 18 Feb 23:26

It’s console servers all the way down (up?)

eldavi@lemmy.ml on 19 Feb 00:39

and you make each one geographically closer than the previous one until there’s one right next to you. lol

fossphi@lemm.ee on 19 Feb 09:16

So that’s why we have mobile phones

eldavi@lemmy.ml on 19 Feb 16:03

so long as you’re mobile, any phone can become a mobile phone. lol

catloaf@lemm.ee on 19 Feb 06:33

Nah, you only need two, each connected to the other. Use one to work on the other.

darklamer@lemmy.dbzer0.com on 18 Feb 23:57

I have once actually used a console server console server to troubleshoot a misbehaving console server.

eldavi@lemmy.ml on 19 Feb 00:41

i once worked at a place that had something like this and, it sounds silly, but i got a live demonstration that it was the smartest thing ever.

IsoKiero@sopuli.xyz on 18 Feb 23:01

We’ve all been there. If you do this stuff for a living, you’ve done that way more than once.

friend_of_satan@lemmy.world on 20 Feb 02:03

That is a totally fair explanation. End of story. No blame. Honest mistake.

thingsiplay@beehaw.org on 18 Feb 19:36

Remember what Bruce Lee said:

I fear not the man who has practiced 10,000 kicks once, but I fear the man who has practiced one kick 10,000 times.

twinnie@feddit.uk on 18 Feb 20:08

I knew a guy who did this and had to fly to Germany to fix it because he didn’t want to admit what he’d done.

miss_demeanour@lemmy.dbzer0.com on 18 Feb 20:16

This hits…

melroy@kbin.melroy.org on 18 Feb 20:14

@ExtremeDullard You're doing it wrong. Just set up a KVM behind your server. Then you never need to leave home again.

miss_demeanour@lemmy.dbzer0.com on 18 Feb 20:14

There, but for the grace of god…

InnerScientist@lemmy.world on 18 Feb 20:32

I have a failsafe service for one of my servers, it pings the router and if it hasn’t reached it once for an entire hour then it will reboot the server.

This won’t save me from all mistakes but it will prevent firewall, link state, routing and a few other issues when I’m not present.
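
A minimal sketch of such a failsafe, assuming it is run once a minute from cron or a systemd timer, so that a limit of 60 consecutive failures corresponds to roughly an hour. The function name, the state file, the router address in the usage line, and the PING_CMD and REBOOT_CMD overrides are all hypothetical.

```shell
#!/bin/sh
# Sketch: reboot after N consecutive failed pings to the router.
# PING_CMD and REBOOT_CMD are hypothetical override hooks for dry runs.

watchdog_tick() {
    # $1: address to ping, $2: failure limit, $3: state file with the failure count
    target=$1 limit=$2 state=$3
    fails=$(cat "$state" 2>/dev/null)
    fails=${fails:-0}
    if ${PING_CMD:-ping -c 1 -W 5} "$target" >/dev/null 2>&1; then
        fails=0
    else
        fails=$((fails + 1))
    fi
    echo "$fails" > "$state"
    if [ "$fails" -ge "$limit" ]; then
        # Reset the counter first so a still-broken network does not
        # trigger another reboot right away.
        echo 0 > "$state"
        ${REBOOT_CMD:-reboot}
    fi
}

# Usage (from a once-a-minute cron job):
#   watchdog_tick 192.168.1.1 60 /var/tmp/net-watchdog.fails
```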

Cyber@feddit.uk on 19 Feb 19:42

Until you block ICMP one day and then wonder why the server keeps rebooting…

(Been there. Done it)

satans_methpipe@lemmy.world on 18 Feb 20:38

I hope you don’t admin any mission critical servers. That’s a first year mistake.

Edit:

Hi salty kiddos still making year one mistakes! Down voting this comment won’t improve your skill set. You will get walked out at a serious enough outfit for doing something like that to a prod system.

ExtremeDullard@lemmy.sdf.org on 18 Feb 20:40

This is a server I was setting up. It’s not doing anything useful at all at the moment, hence the lax work practice. The only reason I drove back to work is because it’s needed tomorrow and I wanted to finish setting it up tonight.

satans_methpipe@lemmy.world on 19 Feb 01:51

Gotcha. I disagree with your methodology of not being careful about non-prod systems. You stated you forgot it was remote. What happens the next time you do that but forget it’s prod or mix up terminals?

limelight79@lemm.ee on 19 Feb 10:19

I assume you’ve never made any mistakes, ever. What an arrogant attitude.

satans_methpipe@lemmy.world on 19 Feb 10:35

I have made mistakes. I will make more mistakes in the future. I will not repeat disruptive freshman mistakes like the one described here.

plumbercraic@lemmy.sdf.org on 18 Feb 22:19

Did this once on a router in a datacenter that was a flight away. Have remembered to set the reboot in future command since. As I typed the fatal command I remember part of my brain screaming not to hit enter as my finger approached the keyboard. 🤦‍♂️

ExtremeDullard@lemmy.sdf.org on 18 Feb 22:23

Have remembered to set the reboot in future command since

That’s not a bad idea actually. I’ll have to reuse that one. Thanks!

Cyber@feddit.uk on 19 Feb 19:38

This.

Do it. This saved my life on more than one occasion.

You’ll think “nah, it’ll be fine” and then at 11pm when your brain’s fried on vending machine coffee you’ll be glad that you did it… 3 times over…

Float@startrek.website on 18 Feb 23:07

Every network engineer must lock themselves out of a node at some point, it is a rite of passage.

Tippon@lemmy.dbzer0.com on 18 Feb 23:34

If it makes you feel any better, I did something just as infuriating a few years ago.

I had set up my home media server, and had finally moved it to my garage with just a power cable and ethernet cable plugged in. Everything was working perfectly, but I needed to check something with the network settings. Being quite new to Linux, I used a remote desktop tool to log in and do everything through a gui.

I accidentally clicked the wrong item in the menu and disconnected the network. I only had a spare ps/2 keyboard and mouse, and as the server was an old computer, it would crash if I plugged a ps/2 device in while it was running*.

The remote desktop stayed open but frozen, mocking me for my obvious mistake and lack of planning, with the remote mouse icon stuck in place on the disconnect menu.

*I can’t remember if that was a ps/2 thing, or something specific to my server, but I didn’t want to risk it

bobs_monkey@lemm.ee on 19 Feb 02:17

Old hardware used to get really upset before plug and play became common. I remember I was playing some old racing game with a joystick on a win95 box, and accidentally pulled the connector out, lost my entire game because the system flipped out.

terminhell@lemmy.dbzer0.com on 18 Feb 23:41

I formatted an OS drive by mistake last night, thought it was my flash drive…

vfsh@lemmy.blahaj.zone on 19 Feb 00:22

Almost did the same last night on a device that has its internal drive (flash) mounted as mmc and the USB drive was sda

terminhell@lemmy.dbzer0.com on 19 Feb 02:03

That entire scenario scares me lol

markstos@lemmy.world on 19 Feb 16:21

I started to DBAN (wipe) my internal drive once instead of an attached drive. That was the last time I ran DBAN on a machine with any drives of value plugged in.

MXX53@programming.dev on 19 Feb 01:26

Good to know that in another 30 years, I will still be doing the dumb shit I’ve been doing for the last 20.

sleepmode@lemmy.world on 19 Feb 03:14

I was on-call and half awake when I got paged about a cache server’s memcached being down for the third time that night. They’d all start to go down like dominoes if you weren’t fast enough at restarting the service, which could overwhelm the database and messaging tiers (baaaaad news to say the least). Two more had their daemon shit the bed while I was examining it. Often it was best to just kick it on all of them to rebalance things. It was… not a great design.

So I wrote a quick loop to ssh in and restart the service on each box in the tier to refresh them all just in case and hopefully stop the incessant pages. Well. In my bleary eyed state I set reboot in the variable instead of restart. Took out the whole cache tier (50+) and the web site. First and only time I did that but that definitely woke me up. Oddly enough the site ran better after that for months as my reboots uncovered an undiscovered problem.

LeFantome@programming.dev on 19 Feb 04:02

If I understand, this is less a complaint about how UNIX works and more a story about the consequences of careless mistakes.

This is why we have KVMs I guess. Though not every server has one of those.

catloaf@lemm.ee on 19 Feb 06:37

I don’t buy servers without iDRAC enterprise licenses. It’s too damn useful.

toynbee@lemmy.world on 19 Feb 08:21

I’ve been trying to find a network capable KVM for home use. They’re all pretty expensive or lacking functionality. I don’t actually need one or I’d pull the trigger, but I sure have been tempted.

MangoCats@feddit.it on 19 Feb 13:26

I had a remote relay box: 8 channels of power control, so I could at least power cycle machines from remote when all else failed.

I actually ended up not using it much at all, it was a nice security blanket, but the last time I decided that I wanted to power cycle something was about 6 years ago, and at that time I realized it had been over 3 years since I had previously used it, and that usage was more of a “let’s make sure this thing is working like I think it should” test.

Cyber@feddit.uk on 19 Feb 19:31

Check out JetKVM

toynbee@lemmy.world on 19 Feb 20:58

Nice! I currently have a PiKVM but haven’t been able to get it working with my NVR. Maybe this would work better.

Ephera@lemmy.ml on 19 Feb 05:42

At $DAYJOB, we’re currently setting up basically a way to bridge an interface over the internet, so it transports everything that enters on an interface across the aether. Well, and you already guessed it, I accidentally configured it for eth0 and couldn’t SSH in anymore.

Where it becomes fun, is that I actually was at work. I was setting it up on two raspis, which were connected to a router, everything placed right next to me. So, I figured, I’d just hook up another Ethernet cable, pick out the IP from the router’s management interface and SSH in that way.
Except I couldn’t reach the management interface anymore. Nothing in that network would respond.

Eventually, I saw that the router’s activity lights were blinking like Christmas decoration. I’m guessing, I had built a loop and therefore something akin to a broadcast storm was overloading the router. Thankfully, the solution was then relatively straightforward, in that I had to unplug one of the raspis, SSH in via the second port, nuke our configuration and then repeat for the other raspi.

toynbee@lemmy.world on 19 Feb 08:47

A decade and change ago, in a past life, I was tasked with switching SELinux to permissive mode on the majority of systems on our network (multiple hundreds, or we might have gotten above one thousand at that point, I don’t recall exactly). This was to be done using Puppet. A large number of the systems, including most of our servers, had already been manually switched to permissive but it wasn’t being enforced globally.

Unfortunately, at that point I was pretty familiar with Puppet but had only worked with SELinux a very few times. I did not correctly understand the syntax of the config file or setenforce and set the mode to … something incorrect. SELinux interpreted whatever that was as enforcing mode. I didn’t realize what I had done wrong until we started getting alerts from throughout the network. Then I just about had a panic attack when I couldn’t log in to the systems and suddenly understood the problem.

Fortunately, it’s necessary to reboot a system to switch SELinux from disabled to any other mode, so most customer facing systems were not impacted. Even more fortunately, this was done on a holiday, so very few customers were there to be inconvenienced by the servers becoming inaccessible. Even more fortunately, while I was unable to access the systems that were now in enforcing mode, the Puppet agent was apparently still running … So I reversed my change in the manifest and, within half an hour, things were back to normal (after some service restarts and such).

When I finally did correctly make the change, I made sure to quintuple check the syntax and not rush through the testing process.

edit: While I could have done without the assault on my blood pressure at the time, it was an effective demonstration of our lack of readiness for enforcing mode.

apt_install_coffee@lemmy.ml on 19 Feb 12:52

A few months ago I accidentally dd’d ~3GiB to the beginning of one of the drives in a 4 drive array… That was fun to rebuild.

wewbull@feddit.uk on 19 Feb 13:51

Your 4 drive raid5 array, right?

Right?!

Cyber@feddit.uk on 19 Feb 19:25

not RAID10 I hope…

apt_install_coffee@lemmy.ml on 20 Feb 07:44

I wish.

It was a bcachefs array with data replicas being a mix of 1,2 & 4 depending on what was most important, but thankfully I had the foresight to set metadata to be mirrored for all 4 drives.

I didn’t get the good fortune of only having to do a resilver, but all I really had to do was fsck to remove references to non-existent nodes until the system would mount read-only, then back it up and rebuild it.

NixOS did save my bacon re: being able to get back to work on the same system by morning.

ArsonButCute@lemmy.dbzer0.com on 19 Feb 16:40

Like 3 weeks ago on my (testing) server I accidentally DD’d a Linux ISO to the first drive in my storage array (I had some kind of jank manual “LVM” bullshit I set up with odd mountpoints to act as a NAS, do not recommend), no Timeshift, no Btrfs snapshot. It gave me the kick in the pants I needed to stop trying to use a macbook air with 6 external hard drives as a server though. Also gave me the kick in the pants I needed to stop using volatile naming conventions in my fstab.

MangoCats@feddit.it on 19 Feb 13:23

It’s not Unix, it’s you.

MangoCats@feddit.it on 19 Feb 13:28

For clarity, I have done it myself - plenty, but not just on Unix boxes.

markstos@lemmy.world on 19 Feb 16:25

I was scared to move to the cloud for this reason. I was used to running to the server room and the KVM if things went south. If that was frozen, usually unplugging the server physically from the switch would get it to calm down.

Now Amazon supports a direct console interface like KVM and you can virtually unplug virtual servers from their virtual servers too.

lka1988@lemmy.dbzer0.com on 19 Feb 19:58

It’s VMs within VMs within VMs.

dependencyinjection@discuss.tchncs.de on 19 Feb 18:42

Not SysAdmin but about a year into my first software engineer job I was working on the live DB in SQL without using BEGIN TRAN / ROLLBACK TRAN.

Suffice it to say I broke the whole system by making an UPDATE without a WHERE clause. Luckily we have regular backups but it was a lot of debugging with the boss before I realised it was me that caused the issue the client was reporting.

mlg@lemmy.world on 19 Feb 19:39

Lol I’ve locked myself out of so many random cloud and remote instances like this that now I always make a sleep chain or a kill timer with tmux/screen.

Usually like:

./risky_dumb_script.sh ; sleep 30 ; ./undo.sh

./risky_dumb_script.sh

Which starts with a 30 second sleep, and:

(tmux) sleep 300 ; kill PID
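
A variation on the same idea, as a minimal sketch: arm the undo first as a dead-man's switch, then make the change, and cancel the timer only if the session survived. arm_rollback is a hypothetical helper name, and the scripts in the usage lines are the placeholder names from the comment above.

```shell
#!/bin/sh
# Sketch: dead-man's switch. The undo command fires after a delay unless
# the timer is killed first. arm_rollback is a hypothetical helper name.

arm_rollback() {
    # $1: delay in seconds, $2: undo command line (run by sh -c)
    # nohup keeps the timer alive if the SSH session hangs up.
    nohup sh -c "sleep $1 && $2" >/dev/null 2>&1 </dev/null &
    echo $!   # PID of the timer; kill it to cancel the rollback
}

# Usage:
#   pid=$(arm_rollback 300 './undo.sh')
#   ./risky_dumb_script.sh
#   kill "$pid"    # only reachable if you still have a working session
```

Arming the undo before the change means the rollback does not depend on the risky command ever returning.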

moonpiedumplings@programming.dev on 19 Feb 20:07

Use cockpit by Red Hat. It gives you a GUI to make networking changes*, and will check if the connection still works before making the change. If the connection doesn’t work (like the ip addresses changed), it will undo the change and then warn you. You can then either force the change through or leave it be.

*via NetworkManager only.

caseyweederman@lemmy.ca on 19 Feb 21:54

That’s probably because of netplan, right? You should be able to get the same results with just netplan try.

moonpiedumplings@programming.dev on 19 Feb 22:07

Netplan is an abstraction layer, so it can go over systemd-networkd, NetworkManager, or iproute. I suppose it’s better though, because it can be used with multiple backends.

caseyweederman@lemmy.ca on 19 Feb 22:35

Right, but the entirety of Cockpit is not necessarily required.

moonpiedumplings@programming.dev on 19 Feb 23:05

You don’t need to install cockpit on the server being configured, you can use it as a gui to connect from other machines via the flatpak, over ssh.

caseyweederman@lemmy.ca on 19 Feb 23:49

Right.
My point is that a wrench was needed and a batmobile was recommended.

moonpiedumplings@programming.dev on 20 Feb 01:19

No. Netplan uses its own YAML format, which people would have to learn and use. I don’t want to do that, I would rather just configure my existing NetworkManager setup, rather than learning another abstraction layer over what is already an abstraction layer.

I understand that cockpit (and similar type tools) are “the whole kitchen sink” of utilities, and it may seem like they come with more than you may need. But that doesn’t change the fact that they get the job done, and in some use cases, are better than dedicated tools.

iriyan@lemmy.ml on 19 Feb 21:13

I still prefer net-tools and use ifconfig eth0 up. That ip mess I’d rather do without, and those funky UU device/interface names I wish out of my system.

By the way, what system/init/svc manager are you using? With 50y behind you: set up a cron job to check whether it is up and reset it while you are away. You can always remotely cancel the cronjob … but it will be a new mistake, not the old one :)

I started on Irix and ultrix if you remember those, what would I know :)

fmstrat@lemmy.nowsci.com on 20 Feb 00:50

This is why IPMI is so important.

GaMEChld@lemmy.world on 20 Feb 08:41

Can also use Pi KVM to add a similar capability to non server grade hardware that doesn’t have it. I did that for a workstation once.

fmstrat@lemmy.nowsci.com on 21 Feb 03:08

Yup, I use PiKVM, too. Fun fact, PiKVM’s first content commit is a clone of my DIY IPMI repo 😉

Look, it’s’a me: github.com/…/70eebd5c59da26dc3f6ad56730adbb616055…

GaMEChld@lemmy.world on 21 Feb 08:17

Awesome job!

fmstrat@lemmy.nowsci.com on 21 Feb 12:07

Thanks! Yea, it was a really fun project to make back before there were any real options. And I’m glad the PiKVM team could expand upon it.

Somewhere along the way I lost the “based on” credit, likely whenever they fully modernized the stack. I wasn’t really keeping track, but did find it humorous when LTT said the creator complained someone based another project on them. I was like “Hmmmmmmm…” but just laughed because I didn’t make it for it to stagnate like it had been with me.

friend_of_satan@lemmy.world on 20 Feb 02:01

I’ve done this kind of thing remotely in screen with ifdown eth0 ; sleep 10 ; ifup eth0 ;
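
For boxes without screen or tmux, a similar bounce can be sketched with nohup so the chain keeps running after the SSH session drops. This sketch uses ip link set in place of ifdown/ifup, which is a substitution and not what the comment above ran; bounce_iface and the IP_CMD override are hypothetical names.

```shell
#!/bin/sh
# Sketch: take an interface down and bring it back up in a detached chain,
# so the "up" half still runs after the SSH session is cut off.
# IP_CMD is a hypothetical override hook so the sketch can be dry-run.
IP_CMD="${IP_CMD:-ip}"

bounce_iface() {
    # $1: interface, $2: seconds to stay down (default 10), $3: log file (optional)
    iface=$1 pause=${2:-10} log=${3:-/dev/null}
    nohup sh -c "$IP_CMD link set $iface down; sleep $pause; $IP_CMD link set $iface up" \
        >"$log" 2>&1 </dev/null &
}

# Usage: bounce_iface eth0 10
```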

kittenroar@beehaw.org on 21 Feb 01:06

Lol; I’ve done this too. Thankfully not to anything important.