How are you parsing JSON on the command line?
from j4k3@lemmy.world to linux@lemmy.ml on 12 Jun 07:50
https://lemmy.world/post/16443016

I want to extract and process the metadata from PNG images and the first line of .safetensors files for LLMs and LoRAs. I could spend ages farting around with sed or awk, but file formats are constantly changing. I’d like a faster way to see a summary of training and a few other details when they are available.
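For the PNG half of the question, here is a minimal Python sketch that uses only the standard library. It walks the PNG chunk list and returns the text chunks. It assumes the image tool stored its metadata in tEXt or zTXt chunks, which isn't guaranteed. iTXt (UTF-8) chunks are not handled, CRCs are not checked, and the helper name `png_text_chunks` is hypothetical.

```python
import struct
import zlib

PNG_SIGNATURE = b"\x89PNG\r\n\x1a\n"


def png_text_chunks(data):
    """Return {keyword: text} for the tEXt and zTXt chunks in PNG bytes.

    Hypothetical helper: iTXt chunks are skipped and CRCs are not verified.
    """
    if not data.startswith(PNG_SIGNATURE):
        raise ValueError("not a PNG file")
    pos = len(PNG_SIGNATURE)
    found = {}
    while pos + 8 <= len(data):
        # Chunk layout: 4-byte big-endian length, 4-byte type, data, 4-byte CRC.
        length, ctype = struct.unpack(">I4s", data[pos:pos + 8])
        body = data[pos + 8:pos + 8 + length]
        pos += 12 + length
        if ctype == b"tEXt":
            # keyword NUL text; the PNG spec says tEXt is Latin-1.
            key, _, text = body.partition(b"\x00")
            found[key.decode("latin-1")] = text.decode("latin-1")
        elif ctype == b"zTXt":
            # keyword NUL, 1-byte compression method (0 = zlib), compressed text.
            key, _, rest = body.partition(b"\x00")
            found[key.decode("latin-1")] = zlib.decompress(rest[1:]).decode("latin-1")
        elif ctype == b"IEND":
            break
    return found
```

You would call it on the raw file bytes, e.g. `png_text_chunks(open("image.png", "rb").read())` (placeholder filename). If a value turns out to be JSON, `json.loads` it and you are back in jq/Python territory.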

#linux

jet@hackertalks.com on 12 Jun 07:51

jq

rtxn@lemmy.world on 12 Jun 07:54

jq, and its YAML sibling, yq.

UpperBroccoli@lemmy.blahaj.zone on 12 Jun 09:12

Specifically this version of yq: there are other versions bundled with distros that look and act very differently and lack the power of this version.

leverage@lemdro.id on 12 Jun 13:27

Seriously, can’t get those 15 minutes back.

huginn@feddit.it on 12 Jun 10:29

I have a very handy command in my .vimrc for this:

command! JSON setlocal filetype=json | %!jq .

Any time I’m in a JSON file that isn’t formatted, it’s as simple as typing :JSON to have it all sorted.

hertg@infosec.pub on 13 Jun 04:47

And there is htmlq too, if you ever need to scrape some stuff from a website :)

rtxn@lemmy.world on 14 Jun 14:35

Naw, everybody knows that you have to use regex for that

patchexempt@lemmy.zip on 12 Jun 08:03

jq, or, if I need to do something wacky, a one-off Python script.

tiredofsametab@kbin.run on 12 Jun 08:07

Previously, I coded something in Rust real quick to spit out and manipulate some JSON, but it looks like the jq/yq suggested in this thread would work fine.

Diplomjodler3@lemmy.world on 12 Jun 08:14

Python is very good for working with JSON. Definitely will get you there faster than awk for anything not completely trivial.
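As a rough illustration of the Python route: the standard-library `json` module turns the text into plain dicts and lists, so a few lines cover the kind of nested lookup you might otherwise write in jq as `.training.datasets[0].name`. The `get_path` helper and the sample keys below are made up for the example.

```python
import json


def get_path(doc, path, default=None):
    """Follow a dotted path such as "training.datasets.0.name".

    Hypothetical helper: numeric segments index into lists, everything
    else is a dict key. Returns `default` as soon as a step is missing.
    Keys that themselves contain dots cannot be addressed this way.
    """
    node = doc
    for part in path.split("."):
        if isinstance(node, dict) and part in node:
            node = node[part]
        elif isinstance(node, list) and part.isdigit() and int(part) < len(node):
            node = node[int(part)]
        else:
            return default
    return node


# Made-up sample data standing in for a real metadata blob.
doc = json.loads('{"training": {"epochs": 10, "datasets": [{"name": "set1"}]}}')
print(get_path(doc, "training.datasets.0.name"))  # → set1
print(get_path(doc, "training.lr", "n/a"))  # → n/a
```

Returning a default instead of raising is a deliberate choice here: metadata fields come and go between file versions, so a missing key is the normal case rather than an error.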

coolmojo@lemmy.world on 12 Jun 08:20

Have a look at Miller.

rickdg@lemmy.world on 12 Jun 08:28

jless to know what the hell I’m looking at, and then maybe jq

barkdoor@infosec.pub on 12 Jun 08:53

Pipe to jless first to pick out targets, then jq.

If it’s a small file and I want to make edits, I use yq to convert it to YAML and back again.

I’m looking at whether DuckDB is a better approach, especially for querying, bulk transforms, and Python.

Gutless2615@ttrpg.network on 12 Jun 09:58

I’d probably go Python but I’m an idiot

CaptPretentious@lemmy.world on 12 Jun 11:09

Probably not a popular opinion, but pwsh (PowerShell). It’s got a lot of tooling built in, and it means I don’t have to learn a different tool just because I’m on a different system.

helpimnotdrowning@lemmy.sdf.org on 13 Jun 13:56

Big fan of running cat file.json | ConvertFrom-Json and just being able to do things quickly!

MasterBlaster@lemmy.world on 12 Jun 11:36

For me, a C# developer by trade, this is easily solved with a one-command C# call. It’s possible you already have .NET 6 or 8 on your distro, as there are many C# Linux apps now.

www.nuget.org/…/9.0.0-preview.4.24266.19

adept@programming.dev on 12 Jun 12:03

Nushell is pretty nice.

pingveno@lemmy.ml on 14 Jun 04:32

Yeah, I’ve been learning some nushell. If you’re dealing with data, it’s just a great tool. So many sharp edges in the POSIX shell come from it being stringly typed, so having a strongly typed shell is extremely helpful.

Nibodhika@lemmy.world on 12 Jun 13:01

A week ago I would have said jq, but just the other day I discovered nushell and have been loving it. If you deal with structured data often, it’s way easier; just bear in mind it’s not POSIX compatible.

bad_news@lemmy.billiam.net on 12 Jun 15:21

If you have npm anyway, the npm json package is pretty nice; you can even edit with readable syntax.

palordrolap@kbin.run on 12 Jun 17:52

There are probably pre-written awk scripts out there that already do what you want, not that I know where they'd be.

That said, you might be better off using one of the bigger but still fairly commonly installed languages. There are bound to be things on PyPI (for Python) or CPAN (for Perl) that could be bolted together, for example.

If you're really lucky there might even be something that covers your whole use-case, but I haven't checked.

semperverus@lemmy.world on 12 Jun 20:10

Python has built-in JSON parsing, as does (and I know this isn’t gonna be popular) PowerShell.

Hammerheart@programming.dev on 12 Jun 23:52

What are some good resources for learning jq? I really struggle when it comes to nested keys/values, which obviously limits my ability to use it.
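For the nested-keys problem, one trick is to list every leaf path first and then copy the one you want into jq. Below is a minimal Python sketch where `leaf_paths` and `to_jq` are hypothetical helpers (jq itself has a `paths` builtin in the same spirit). The generated filters aim to follow jq’s `.key`, `."quoted key"` and `[index]` syntax, but treat them as a starting point and check them against your jq version.

```python
import json
import re

# Keys matching this can be written as .key in jq; anything else gets quoted.
JQ_IDENT = re.compile(r"[A-Za-z_][A-Za-z0-9_]*")


def leaf_paths(node, prefix=()):
    """Yield (path, value) for every scalar (string, number, bool, null).

    `path` is a tuple of dict keys and list indices leading to the value.
    """
    if isinstance(node, dict):
        for key, value in node.items():
            yield from leaf_paths(value, prefix + (key,))
    elif isinstance(node, list):
        for index, value in enumerate(node):
            yield from leaf_paths(value, prefix + (index,))
    else:
        yield prefix, node


def to_jq(path):
    """Render a path tuple as a jq filter string."""
    out = ""
    for part in path:
        if isinstance(part, int):
            # A bare [0] would build an array in jq, so the first step needs a dot.
            out += f"[{part}]" if out else f".[{part}]"
        elif JQ_IDENT.fullmatch(part):
            out += "." + part
        else:
            # Quote keys with dashes, spaces, etc.; json.dumps yields a valid string literal.
            out += "." + json.dumps(part)
    return out or "."


doc = json.loads('{"a": {"b-c": [1, {"d": true}]}}')
for path, value in leaf_paths(doc):
    print(to_jq(path), json.dumps(value))
# prints:
# .a."b-c"[0] 1
# .a."b-c"[1].d true
```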

timbuck2themoon@sh.itjust.works on 13 Jun 00:47

Use an online JSON parser. Throw in some data and then structure a query.

It’ll keep updating the results as you tweak your query. A simple search will probably give you twenty that’ll work. I can’t remember which one I normally use off the top of my head.

bizdelnick@lemmy.ml on 13 Jun 18:17

man jq

Hammerheart@programming.dev on 14 Jun 02:07

I have perused it, but it’s both so dense and so broad that it’s not that helpful unless I know exactly what I’m looking for. I have also tried info and tldr. I actually like tldr the most, although the exhaustiveness of the man pages must be admired. I don’t find it to be the best teacher.

beejjorgensen@lemmy.sdf.org on 14 Jun 00:34

I hate to do this, but AI chatbots are typically pretty good at giving examples for things like this and you can learn from it.

xavier666@lemm.ee on 14 Jun 02:15

AI chatbots are very good for teaching. I’ll give them that.

beejjorgensen@lemmy.sdf.org on 18 Jun 16:05

I definitely use them a lot, but I think “very” is too strong a word. It’s pretty easy to get confident, contradictory information from them. They’re a good place to start and brainstorm, but all the information has to be verified either by running and testing the code, or by finding a human source.

xavier666@lemm.ee on 19 Jun 05:16

True. I wouldn’t use them for very complicated stuff. I currently use them for “what is x?” and “how is x different from y?” kinds of questions.

One advantage of using an AI is that it removes a lot of fluff that you get on blogs. However, that can change very soon when our AI overlords figure out monetization.

eldavi@lemmy.ml on 14 Jun 04:01

I’m assuming that command line means bash, in which case jq and regex are your friends.

j4k3@lemmy.world on 14 Jun 04:32

I found a Python project that does enough for my needs. jq looks super powerful though. Thanks. I managed to get yq working for PNGs, but I had trouble with both jq and yq on safetensors files. I couldn’t figure out how to parse a string embedded after a binary prefix that varies from file to file, especially with such massive files. I could get in and grab the first line with head. I tried some stuff with expansions, but that didn’t work and sent me looking for others who have solved the issue better than I have.
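On the safetensors side, the header is not really a text "line". Assuming the layout described in the safetensors format documentation, the file starts with an 8-byte little-endian unsigned integer giving the header length, followed by that many bytes of JSON, then the raw tensor data. So `head` sees binary bytes before the JSON and may not hit a newline until deep into the tensors. Below is a minimal Python sketch that reads only the header under that assumption. The function names are hypothetical, and the free-form `__metadata__` section only exists if whatever wrote the file put it there.

```python
import json
import struct


def read_safetensors_header(path):
    """Return the parsed JSON header of a .safetensors file.

    Assumed layout: 8-byte little-endian unsigned length N, then N bytes of
    UTF-8 JSON, then tensor data. Only the first 8 + N bytes are read, so
    multi-gigabyte files are never loaded into memory.
    """
    with open(path, "rb") as f:
        (header_len,) = struct.unpack("<Q", f.read(8))
        return json.loads(f.read(header_len))


def training_metadata(path):
    """Return the optional "__metadata__" dict (string keys and values), or {}."""
    return read_safetensors_header(path).get("__metadata__", {})


def summarize(path, width=200):
    """Hypothetical summary: one "key: value" line per metadata entry, truncated."""
    lines = []
    for key, value in sorted(training_metadata(path).items()):
        # Some writers pack whole JSON documents into these string values;
        # json.loads(value) may unpack them further.
        lines.append(f"{key}: {str(value)[:width]}")
    return "\n".join(lines)
```

Something like `print(summarize("model.safetensors"))` (placeholder filename) would then give a quick per-file overview, and `read_safetensors_header` alone shows tensor names, dtypes and shapes even when no metadata was stored.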