scrubbles@poptalk.scrubbles.tech
on 18 Dec 20:31
Anyone know:
1. How to rip a wiki from something like Fandom and save it in a format that could be uploaded to this, and
2. Whether that’s legal in the first place?
PhilipTheBucket@ponder.cat
on 18 Dec 20:42
Paste each article’s raw source into ChatGPT and ask it to do the conversion for you. If there are too many, you can automate it through the API for a negligible cost.
It is not.
ComradeMiao@lemmy.dbzer0.com
on 18 Dec 21:17
Maybe also wget the website.
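For the wget route, a mirroring run might look like the sketch below. It only assembles and runs a GNU Wget command line from Python; the flags are standard Wget options, but the destination folder name, the one-second wait, and the example URL are assumptions. Note that this saves the rendered HTML pages, not the wikitext source, so a conversion step is still needed afterwards.

```python
import subprocess


def wget_mirror_command(url: str, dest: str = "mirror") -> list[str]:
    """Build a GNU Wget command that mirrors a site for offline use.

    The dest default and the 1-second wait are illustrative choices,
    not recommendations from the thread.
    """
    return [
        "wget",
        "--mirror",            # recursive download with timestamping
        "--convert-links",     # rewrite links so the local copy works offline
        "--adjust-extension",  # save HTML pages with a .html suffix
        "--page-requisites",   # also fetch the CSS/images each page needs
        "--no-parent",         # never climb above the starting URL
        "--wait=1",            # pause between requests to go easy on the server
        f"--directory-prefix={dest}",
        url,
    ]


def run_mirror(url: str, dest: str = "mirror") -> None:
    """Run the mirror; raises CalledProcessError if wget exits non-zero."""
    subprocess.run(wget_mirror_command(url, dest), check=True)
```

Usage would be something like `run_mirror("https://example.fandom.com/wiki/")`, where `example` is a placeholder wiki name. Wget honors `robots.txt` by default during recursive downloads, so some pages may be skipped.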
I’d be careful with using “AI.” Sometimes ChatGPT makes up answers even when you provide it with the data. Source: it lies to me all the time.
PhilipTheBucket@ponder.cat
on 18 Dec 21:57
Converting from one format to another, it works like gangbusters. I wouldn’t trust it to summarize stuff from its training data, and it does a little better summarizing stuff you give it, but it’s pretty capable at just mechanically finding the text and putting it verbatim into a different markup.
ComradeMiao@lemmy.dbzer0.com
on 18 Dec 23:28
Even reformatting has caused me issues. My best example: I gave it 100 citations in a non-standardized format and asked for MLA. It returned 100 in MLA, but 10 of the books were made up. It had randomly dropped ten of the ones I sent and invented replacements instead of just giving me what I sent.
PhilipTheBucket@ponder.cat
on 18 Dec 23:30
Oh… yeah, you might have a point. Beyond a certain size of repeated things, it sometimes goes haywire, I’ve seen that.
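Given the dropped-and-invented entries described above, one cheap safeguard when an LLM (or any tool) converts markup is to check that the plain words survived unchanged. The helper below is a hypothetical sketch, not something proposed in the thread: it tokenizes both versions with a simple regex and diffs the word sequences with Python's `difflib`, so lost or made-up text shows up as "missing" or "added". Link targets and URLs that legitimately change during conversion will appear as noise, so treat the output as a list of things to eyeball rather than a verdict.

```python
import difflib
import re


def words(text: str) -> list[str]:
    """Lowercased word tokens; markup characters like = ' [ ] * # are ignored."""
    return re.findall(r"[a-z0-9]+", text.lower())


def content_drift(source: str, converted: str) -> list[str]:
    """List words that were dropped from, or added to, the source text.

    A faithful markup conversion should leave the word sequence unchanged,
    so every non-'equal' opcode points at text that was lost or invented.
    """
    a, b = words(source), words(converted)
    matcher = difflib.SequenceMatcher(a=a, b=b, autojunk=False)
    problems = []
    for tag, i1, i2, j1, j2 in matcher.get_opcodes():
        if tag in ("delete", "replace"):
            problems.append("missing: " + " ".join(a[i1:i2]))
        if tag in ("insert", "replace"):
            problems.append("added: " + " ".join(b[j1:j2]))
    return problems
```

Running it per article (or per small batch) also fits the observation that things go haywire as inputs get longer.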
ComradeMiao@lemmy.dbzer0.com
on 18 Dec 23:37
I didn’t consider length! That’s a good point too
Another goofy example: I asked it a Python question about code that used a specific package import and sent a big chunk of code. It answered using a package I wasn’t even importing, breaking everything. It could never figure it out either lol
PhilipTheBucket@ponder.cat
on 18 Dec 23:49
Yeah, that kind of thing requires reasoning, and it goes awry almost immediately. It’s still pretty useful for generating snippets of boilerplate or finding stuff in big chunks of code, but I more or less gave up on having it actually create anything nontrivial in code.
Nothing4You@programming.dev
on 18 Dec 21:33
you might find some inspiration from breezewiki.com - either its codebase directly or using it as an intermediary while scraping
It’s legal if credit is given and it’s shared under CC-BY-SA.
www.fandom.com/licensing
Except where otherwise permitted, the text on Fandom communities (known as “wikis”) is licensed under the Creative Commons Attribution-Share Alike License 3.0 (Unported) (CC BY-SA).
You can get the raw article text from Fandom by clicking the edit button. Then you need to convert it from wikitext to Markdown; there are various tools for that. Finally, post it on Ibis. Fandom also has an API, so you could write a script to automate all this.
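The steps above (grab the raw wikitext, convert it to Markdown, then post it) could be scripted roughly as follows. This is a sketch under assumptions: it uses the standard MediaWiki `api.php` query endpoint, assuming an English-language Fandom wiki served at `https://<wiki>.fandom.com/api.php`; the User-Agent string is a placeholder; and `wikitext_to_markdown` is a deliberately naive regex converter that only handles headings, bold/italic and links. For real pages with templates and tables, a proper converter such as pandoc (`pandoc -f mediawiki -t markdown`) is a safer bet. Uploading to Ibis is left out, since its API isn't covered here.

```python
import json
import re
import urllib.parse
import urllib.request


def build_raw_wikitext_url(wiki: str, title: str) -> str:
    """MediaWiki API URL for the current wikitext of one page (assumed host layout)."""
    params = {
        "action": "query",
        "prop": "revisions",
        "rvprop": "content",
        "rvslots": "main",
        "titles": title,
        "format": "json",
        "formatversion": "2",
    }
    return f"https://{wiki}.fandom.com/api.php?" + urllib.parse.urlencode(params)


def fetch_wikitext(wiki: str, title: str) -> str:
    """Download the raw wikitext (what the edit button shows) for one page."""
    req = urllib.request.Request(
        build_raw_wikitext_url(wiki, title),
        headers={"User-Agent": "wiki-export-sketch/0.1"},  # placeholder UA
    )
    with urllib.request.urlopen(req) as resp:
        data = json.load(resp)
    page = data["query"]["pages"][0]
    if page.get("missing"):
        raise LookupError(f"page not found: {title}")
    return page["revisions"][0]["slots"]["main"]["content"]


def _heading(m: re.Match) -> str:
    # "== Title ==" -> "## Title": the number of '=' becomes the number of '#'
    return "#" * len(m.group(1)) + " " + m.group(2).strip()


def wikitext_to_markdown(text: str) -> str:
    """Very rough conversion: headings, bold/italic and links only.

    Templates, tables, <ref> tags and files are NOT handled.
    """
    # [ \t]* rather than \s* so a heading never swallows the following newline
    text = re.sub(r"^(={1,6})[ \t]*(.+?)[ \t]*\1[ \t]*$", _heading, text,
                  flags=re.MULTILINE)
    # Bold before italic so ''' is not eaten by the '' rule
    text = re.sub(r"'''(.+?)'''", r"**\1**", text)
    text = re.sub(r"''(.+?)''", r"*\1*", text)
    # [[Target|Label]] -> Label, [[Target]] -> Target (internal links flattened)
    text = re.sub(r"\[\[(?:[^\]|]*\|)?([^\]]+)\]\]", r"\1", text)
    # [https://example.com Label] -> [Label](https://example.com)
    text = re.sub(r"\[(https?://\S+) ([^\]]+)\]", r"[\2](\1)", text)
    return text
```

A typical call would be `wikitext_to_markdown(fetch_wikitext("example", "Some Page"))`, with `example` and `Some Page` standing in for a real wiki and article. To export many pages you would loop over a list of titles and pause between requests.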
This is a cute idea but I’ve generally been satisfied with the idea of using Gitit. That’s a wiki backed by a git repo, so you can use normal git commands to propagate updates between servers. github.com/jgm/gitit or “apt install gitit”.