Stop Parsing (unstructured) Text (pc-hass.de)
from Laser@feddit.org to linux@lemmy.ml on 03 May 2025 19:20
https://feddit.org/post/11845727

#linux

Trent@lemmy.ml on 03 May 2025 19:58

You might like jc

Laser@feddit.org on 03 May 2025 20:03

Thanks, I never used it and had forgotten about it until now.

double_quack@lemm.ee on 04 May 2025 07:01

Nice! I didn’t know this

StrangeAstronomer@lemmy.ml on 03 May 2025 22:26

venerable jq

Ha! jq was the bratty kid I yelled at to get off my lawn. Now he’s a drinking buddy, but still the youngest!

Laser@feddit.org on 04 May 2025 06:08

It’s true that compared to the other utilities, it’s rather new. First release was almost 13 years ago. awk, which I think is the closest comparison, on the other hand turns 50 in 2027… though new awk is only 40.

traches@sh.itjust.works on 04 May 2025 07:49

Shout out to nushell for building an entire shell around this idea!

Laser@feddit.org on 04 May 2025 08:02

It’s a cool shell. I like it a lot more since I found out you can use ? to mark a field as optional.

MonkderVierte@lemmy.ml on 04 May 2025 11:02

It’s a tradeoff between convenience and use case. Personally, I would only use JSON/jq for complex data processing needs. But then I would use Python, not shell.

Laser@feddit.org on 04 May 2025 12:17

The issue is not only complexity, though it does play a role. You can also run into issues with pure text parsing, especially when whitespace is involved. The IP thing is a very classic example in my opinion, and while whitespace might not be an issue there (it’s more common with filenames), the queries you find online aren’t any less complex.
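As a rough illustration of the IP case (a sketch, not taken from the linked article): iproute2’s ip can emit JSON with -j, and the fields can then be selected by name instead of by column position. The sample document below is hand-written, trimmed to the fields used, and its values are illustrative; the helper name ipv4_addresses is hypothetical.

```python
import json

# Hand-written sample shaped like `ip -j addr show` output.
# Trimmed to the keys used below; the addresses are illustrative only.
SAMPLE = """
[
  {"ifname": "lo", "addr_info": [
    {"family": "inet", "local": "127.0.0.1", "prefixlen": 8},
    {"family": "inet6", "local": "::1", "prefixlen": 128}
  ]},
  {"ifname": "eth0", "addr_info": [
    {"family": "inet", "local": "192.0.2.10", "prefixlen": 24}
  ]}
]
"""


def ipv4_addresses(ip_json: str) -> dict[str, list[str]]:
    """Map each interface name to its IPv4 addresses, selecting fields by key."""
    result = {}
    for iface in json.loads(ip_json):
        addrs = [
            info["local"]
            for info in iface.get("addr_info", [])
            if info.get("family") == "inet"  # skip inet6 entries
        ]
        result[iface["ifname"]] = addrs
    return result


if __name__ == "__main__":
    print(ipv4_addresses(SAMPLE))
```

On a live system the input would presumably come from running ip -j addr show and passing its standard output to the function. Nothing here depends on column order, spacing, or translated labels.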

Normal CLI output is often meant to be consumed by humans, so the data presentation requirements are different. Then you find out that an assumption you made isn’t true (e.g. due to LANG indicating a non-English language) and suddenly your matching rules don’t fit.
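When text parsing can’t be avoided, one common mitigation for the locale problem is to pin the locale for the child process. A minimal sketch, assuming the tool in question honours LC_ALL; the helper names c_locale_env and run_in_c_locale are hypothetical.

```python
import os
import subprocess
import sys


def c_locale_env(base=None):
    """Return a copy of the environment with LC_ALL forced to the C locale."""
    env = dict(os.environ if base is None else base)
    # LC_ALL takes precedence over LANG and the other LC_* variables.
    env["LC_ALL"] = "C"
    return env


def run_in_c_locale(argv):
    """Run a command with LC_ALL=C in its environment and return its stdout."""
    done = subprocess.run(
        argv, env=c_locale_env(), capture_output=True, text=True, check=True
    )
    return done.stdout


if __name__ == "__main__":
    # Stand-in child process that just reports the variable it sees.
    child = [sys.executable, "-c", "import os; print(os.environ['LC_ALL'])"]
    print(run_in_c_locale(child))
```

This only stabilises language and number formatting. It does nothing for the whitespace and layout problems, which is where structured output helps.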

There are just a lot of pitfalls that can make things go subtly wrong, which is why parsing general CLI output that’s not intended to be parsed is often advised against. That doesn’t mean it will go wrong, only that it can.

Regarding Python, I think it has a place when you do what I’d call data set processing, while what I talk about is shell plumbing. They can both use JSON, but the tools are probably not the same.