azorius.net

To boost Bookwyrm, there should be a tool to scrape book data from Amazon
from Novocirab@feddit.org to fediverse@lemmy.world on 18 Mar 2025 11:27
https://feddit.org/post/9392448

Bookwyrm’s catalog is currently quite limited. Every user can always add a book, of course, but a lot of people will not find the button or just don’t want to do this, and so they’ll leave the platform disappointed.

To bolster Bookwyrm’s chances against Goodreads etc., there should at least be a browser plugin where we can enter an Amazon link and have Bookwyrm’s data entry form automatically populated with all data available from Amazon (and, by cross-searching with the ISBN, with the book’s OpenLibrary ID etc.). Of course, the user should still check the data before ultimately submitting it. Whoever creates the tool should also check whether there are copyright traps to beware of, e.g. in the book descriptions.
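As a rough sketch of the ISBN cross-search step (this is not an existing Bookwyrm feature): the snippet below normalizes an ISBN, verifies the ISBN-13 check digit, and builds an OpenLibrary lookup URL of the form `https://openlibrary.org/isbn/<isbn>.json`. The function names are made up for illustration, and the actual network request and form filling are left out.

```python
import re


def normalize_isbn(raw: str) -> str:
    """Drop hyphens, spaces and other separators; keep digits and X."""
    return re.sub(r"[^0-9Xx]", "", raw).upper()


def is_valid_isbn13(isbn: str) -> bool:
    """Check the ISBN-13 check digit (weights alternate 1, 3, 1, 3, ...)."""
    if not re.fullmatch(r"[0-9]{13}", isbn):
        return False
    total = sum(int(d) * (1 if i % 2 == 0 else 3) for i, d in enumerate(isbn))
    return total % 10 == 0


def openlibrary_isbn_url(raw_isbn: str) -> str:
    """Build the OpenLibrary lookup URL for an ISBN-13 (hypothetical helper)."""
    isbn = normalize_isbn(raw_isbn)
    if not is_valid_isbn13(isbn):
        raise ValueError(f"not a valid ISBN-13: {raw_isbn!r}")
    return f"https://openlibrary.org/isbn/{isbn}.json"
```

A plugin along these lines would only query OpenLibrary once the ISBN passes the check, so a mistyped or badly scraped number never reaches the data entry form.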

(Theoretically, one might even go so far as to create a tool that scrapes book data in bulk, but this (a) poses increased legal risks and (b) will most likely lead to lots of duplicated books & authors on Bookwyrm, which for lack of an easy merging tool would be a real pain – the many existing duplicates are already annoying. Edit: And, as others have pointed out, there are a lot of fake & crap books on Amazon, so indiscriminate scraping would flood Bookwyrm with entries that we really don’t want there.)

Is there currently any tool like the one I described above?

#fediverse

simple@lemm.ee on 18 Mar 2025 11:49

Bookwyrm does have a feature to fetch book data from OpenLibrary, which is detailed enough. The problem, last I checked, is that the feature doesn’t replace the data of already-existing books, so even if a book exists in OL, its data won’t replace the empty listing in Bookwyrm.

If that feature worked properly, or if the admin of the instance updated to the newest OpenLibrary database, then it should work fine.
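To illustrate what "replacing the empty listing" could look like, here is a minimal sketch. It is not Bookwyrm's actual code; the dict-based records and field names are assumptions for the example. It fills in only the fields that are empty locally and leaves anything a user has already entered untouched.

```python
# Values treated as "empty" in this sketch (an assumption, not Bookwyrm's rule).
EMPTY_VALUES = (None, "", [], {})


def fill_empty_fields(local: dict, remote: dict) -> dict:
    """Return a copy of `local` with empty fields filled in from `remote`."""
    merged = dict(local)
    for key, value in remote.items():
        if merged.get(key) in EMPTY_VALUES and value not in EMPTY_VALUES:
            merged[key] = value
    return merged


# Hypothetical records: a sparse local listing and a fuller OpenLibrary one.
local_book = {"title": "Example Book", "description": "", "pages": None}
remote_book = {
    "title": "Example Book (2nd ed.)",
    "description": "A longer description.",
    "pages": 320,
    "languages": ["English"],
}
merged_book = fill_empty_fields(local_book, remote_book)
```

Here `merged_book` keeps the local title but picks up the description, page count and languages from the remote record.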

warmaster@lemmy.world on 18 Mar 2025 12:02

What if someone put up an instance dedicated only to pulling stuff from OL? Wouldn’t those reviews federate to the other Bookwyrm instances?

tofu@lemmy.nocturnal.garden on 18 Mar 2025 12:24

OL reviews are not pulled, just the book data.

warmaster@lemmy.world on 19 Mar 2025 13:27

Damn. Thanks for clarifying.

aramis87@fedia.io on 18 Mar 2025 12:32

There are a massive number of fake books on Amazon, though. There are AI-generated books, designed to be easily mistaken for books by real authors, or about recent high-news events, or popular series. There are people who steal an author's legitimate work and "publish" it as their own work, sometimes changing a small amount, sometimes changing nothing at all. There are people who watch upcoming book releases by popular authors and release fake books around the same time, hoping to pinch some of the sales. I'd rather have sparse but reliable data than give any authenticity to the scammers and thieves.

MajorHavoc@programming.dev on 18 Mar 2025 12:55

ISBN Search is a non-monopolist source of the same information.

tofu@lemmy.nocturnal.garden on 18 Mar 2025 13:44

Search currently includes OpenLibrary and Inventaire, plus some more I think but I’m not sure right now.

That doesn’t mean a browser plug-in couldn’t be useful ofc, but Bookwyrm is not limited to what its users manually add, even though, through federation, that’s quite a lot already.

morrowind@lemmy.ml on 19 Mar 2025 04:01

Just wondering, do you guys care about federation in this case, or do you just want a Goodreads alternative? Because building a global book database is not well suited to decentralization, and there are centralized indie alternatives that are more complete.

Fredthefishlord@lemmy.blahaj.zone on 19 Mar 2025 05:29

Yeah. There’s, quite frankly, no real risk of censorship on such a platform to begin with. Kinda takes away the whole point of federation for it.