According to this, Google has 91.06% of the search engine market. So for Reddit, they’re talking about cutting themselves off from a little under 9% of people searching out there. Which…I mean, it isn’t insignificant, but it isn’t likely gonna hurt them all that badly.
eronth@lemmy.world
on 26 Jul 2024 17:40
It’s also worth noting that the 9% they cut off was probably the group more inclined to already be using alternatives to Reddit anyways.
CleoTheWizard@lemmy.world
on 27 Jul 2024 02:38
You underestimate the amount of average joes that use stuff like DuckDuckGo
whatwhatwhatwhat@lemmy.world
on 25 Aug 2024 05:06
Seconding this. I work in IT, and the number of tech-illiterate people using DuckDuckGo as their default search engine is astounding. It’s got to be about 10% of our users (none of whom are in tech roles).
TheTechnician27@lemmy.world
on 27 Jul 2024 06:35
I would actually think that the 9% they cut off would be more likely than the 91% to be using Reddit.
scarabic@lemmy.world
on 26 Jul 2024 19:59
Yeah I thought the same so it’s good to see the numbers. I don’t think people realize that to support a search engine means letting them crawl your pages which means serving all your pages to them, which costs server resources. A lot of sites get more crawler load than load from actual users viewing pages. It’s a real cost.
Still, you’d think they could manage to support DuckDuckGo at least. Or a small set of search giants to give some appearance of supporting competition.
pewgar_seemsimandroid@lemmy.blahaj.zone
on 26 Jul 2024 14:21
One can only hope, but until people learn that they can use other browsers and other search engines, it’s not likely (I am talking about the Google side, of course; Reddit might be affected by this in the long run).
best_username_ever@sh.itjust.works
on 26 Jul 2024 11:07
Is there a downside? I’m confused.
SteveFromMySpace@lemmy.blahaj.zone
on 26 Jul 2024 12:46
Yes. They are making other search engines less useful through what is functionally an exclusivity deal. They are also relying on Reddit to function as useful results since they ruined google search over the past few years. They’ve enshittified their own product and now they are making it everyone else’s problem.
This is bad for anyone who thinks we should be able to search the internet without being locked into google. The door this opens is awful as well - what happens as this practice expands and you suddenly need multiple search engines to find things online? What happens when a search engine cuts a deal with news outlets?
What a mess.
OsaErisXero@kbin.run
on 26 Jul 2024 13:02
I'm excited for this to start triggering anti-trust legislation
WhatAmLemmy@lemmy.world
on 26 Jul 2024 13:06
It obviously should, but it won’t, because the US is a capitalist dictatorship masquerading as a democracy. The oligarchy own the government, and the regulators.
SAN FRANCISCO, Feb 21 (Reuters) - Social media platform Reddit has struck a deal with Google (GOOGL.O) to make its content available for training the search engine giant’s artificial intelligence models, three people familiar with the matter said.
The contract with Alphabet-owned Google is worth about $60 million per year, according to one of the sources.
In documents filed with the Securities and Exchange Commission, Reddit said it reported net income of $18.5 million — its first profit in two years — in the October-December quarter on revenue of $249.8 million.
So if you annualize that, Reddit’s seeing revenue of about $1 billion/year, and net income of about $74 million/year.
Given that Reddit granting exclusive indexing to Google happened at about the same time, I would assume that that AI-training deal included the exclusivity indexing agreement, but maybe it’s separate.
My gut feeling is that the exclusivity thing is probably worth more than $60 million/year, that Google’s probably getting a pretty good deal. Like, Google did not buy Reddit, and Google’s done some pretty big acquisitions, like YouTube, and that’d have been another way for Google to get exclusive access. So I’d think that this deal is probably better for Google than buying Reddit. Reddit’s market capitalization is $10 billion, so Google is maybe paying 0.6% the value of Reddit per year to have exclusive training rights to their content and to be the only search engine indexing them; aside from Reddit users themselves running into content in subreddits, I’d guess that those two forms are probably the main way in which one might leverage the content there.
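For anyone who wants to check the back-of-the-envelope numbers above, here is a minimal sketch in Python. It uses only the figures quoted in this comment, and it assumes (as the comment does) that every quarter looks like the October-December one, which is a rough simplification rather than a forecast.

```python
# All figures are in millions of USD and come from the comment above.
quarterly_revenue = 249.8      # Oct-Dec quarter revenue
quarterly_net_income = 18.5    # Oct-Dec quarter net income
google_deal_per_year = 60.0    # reported value of the Google contract
market_cap = 10_000.0          # "$10 billion" market capitalization

# Assumption: the other three quarters match this one.
annual_revenue = quarterly_revenue * 4
annual_net_income = quarterly_net_income * 4

# The deal expressed as a share of company value and of yearly revenue.
deal_share_of_market_cap = google_deal_per_year / market_cap
deal_share_of_revenue = google_deal_per_year / annual_revenue

print(f"annualized revenue:    ${annual_revenue:,.1f}M")
print(f"annualized net income: ${annual_net_income:,.1f}M")
print(f"deal vs market cap:    {deal_share_of_market_cap:.1%}")
print(f"deal vs revenue:       {deal_share_of_revenue:.1%}")
```

The last line is an extra comparison that is not in the comment itself: on these assumptions the contract would be a few percent of yearly revenue.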
Plus, my impression is that the idea that a number of companies have – which may or may not be valid – is that this is the beginning of the move away from search engines. Like, the idea is that down the line, the typical person doesn’t use a search engine to find a webpage somewhere that’s a primary source to find material. Instead, they just query an AI. That compiles all the data that it can see and spits out an answer. Saves some human searcher time and reduces complexity, and maybe can solve some problems if AIs can ultimately do a better job of filtering out erroneous information than humans. We definitely aren’t there yet in 2024, but if that’s where things are going, I think that it might make a lot of strategic sense for Google. If Google can lock up major sources of training data, keep Microsoft out, then it’s gonna put Microsoft in a difficult spot if Microsoft is gunning for the same thing.
kate@lemmy.uhhoh.com
on 26 Jul 2024 14:30
have you tried perplexity? it’s probably the best ai search engine right now although it still misunderstands context sometimes. it’s pretty good at citing its sources though
If we do end up at a point without search engines, where AI does the search and summarizes an answer, what do you think their level of ability to tie back to source material will be?
I haven’t used the text-based search queries myself; I’ve used LLM software, but not for this, so I don’t know what the current situation is like. My understanding is that the current approach doesn’t really permit it. And there are two issues with that:
There isn’t a direct link between one source and what’s being generated; the model isn’t really structured so as to retain this.
Many different sources probably contribute to the answer.
All information contributes a little bit to the probability of the next word that the thing is spitting out. It’s not that the software rapidly looks through all pages out there and then finds a given single reputable source that it could then cite, the way a human might. That is, you aren’t searching an enormous database when the query comes in, but repeatedly making use of a prediction that the next word in the correct response is a given word, and that probability is derived from many different sources. Maybe tens of thousands of people have made posts on a given subject; the response isn’t just a quote from one, and the generated text may appear in none of them.
To maybe put that in terms of how a human might think, place you in the generative AI’s shoes, suppose I say to you “draw a house”. You draw a house with two windows, a flowerbed out front, whatever. I say “which house is that”? You can’t tell me, because you’re not trying to remember and present one house – you’re presenting me with a synthetic aggregate of many different houses; probably all houses have mentally contributed a bit to it. Maybe you could think of a given house that you’ve seen in the past that looks a fair bit like that house, but that’s not quite what I’m asking you to tell me. The answer is really “it doesn’t reflect a single house in the real world”, which isn’t really what you want to hear.
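To make the "many sources, no single quote" point concrete, here is a toy sketch in Python. It is a plain word-pair counter, which is a drastic simplification and not how a real LLM is built; the four "posts" are made up for illustration. It only shows that text generated from aggregated next-word counts can draw on every source while matching none of them verbatim.

```python
from collections import Counter, defaultdict

# Hypothetical mini-corpus standing in for forum posts (not real data).
posts = [
    "reset the router and wait",
    "reset the modem and wait",
    "unplug the router and reboot",
    "unplug the router then wait",
]

# Count how often each word follows another, across ALL posts,
# and remember which posts contributed to each word pair.
next_word_counts = defaultdict(Counter)
pair_sources = defaultdict(set)
for post_id, post in enumerate(posts):
    words = post.split()
    for current, following in zip(words, words[1:]):
        next_word_counts[current][following] += 1
        pair_sources[(current, following)].add(post_id)


def generate(start, max_words=10):
    """Greedily append the most frequent next word until none is known."""
    output = [start]
    while len(output) < max_words and next_word_counts[output[-1]]:
        following, _count = next_word_counts[output[-1]].most_common(1)[0]
        output.append(following)
    return output


generated = generate("unplug")
sentence = " ".join(generated)

# Collect every post that influenced at least one step of the output.
contributors = set()
for pair in zip(generated, generated[1:]):
    contributors |= pair_sources[pair]

print("generated:", sentence)
print("appears verbatim in a post:", sentence in posts)
print("posts that contributed:", sorted(contributors))
```

In this toy case every post nudges at least one word choice, yet the finished sentence is not a quote from any of them, which is the same reason "which source said this?" has no clean answer.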
It might be possible to basically run a traditional search for a generated response to find an example of that text, if it amounts to a quote (which it may not!)
And if Google produces some kind of “reliability score” for a given piece of material and weights the material in the training set by that (which I will guess that if they don’t now, they will), they could maybe use the reliability score to try to rank various sources when doing that backwards search for relevant sources.
But there’s no guarantee that that will succeed, because they’re ultimately synthesizing the response, not just quoting it, and because it can come from many sources. There may potentially be no one source that says what Google is handing back.
It’s possible that there will be other methods than the present ones used for generating responses in the future, and those could have very different characteristics. Like, I would not be surprised, if this takes off, if the resulting system ten years down the road is considerably more complex than what is presently being done, even if to a user, the changes under the hood aren’t really directly visible.
There’s been some discussion about developing systems that do permit for this, and I believe that if you want to read up on it, the term used is “attributability”, but I have not been reading research on it.
MeatsOfRage@lemmy.world
on 26 Jul 2024 14:05
Around here we love the idea of Reddit being totally devoid of life but the fact is it’s still one of the most active public facing sites on the web. The attrition to sites like Lemmy is pretty negligible compared to the overall Reddit activity, and bot AI activity only really affects the largest subreddits, which have always been a bit spammy and clickbaity. The medium and small subreddits are still full of active people. Don’t get me wrong, Lemmy is my daily driver for this content but I won’t pretend everyone fled Reddit for this.
Additionally, exclusivity with Google isn’t necessarily just to keep the search results but to prevent their biggest AI competition, ChatGPT and its ties to Microsoft, from getting access to what is the Internet’s largest database of public facing conversation.
GreatAlbatross@feddit.uk
on 26 Jul 2024 16:32
At least on some smaller subs, there seems to be a suspicious amount of brand new accounts asking one question to get human answers.
It would not surprise me if reddit, or some other service, are seeding to get more LLM-able content. Of course, this might backfire if people start giving stupid answers to eff up the data.
roguetrick@lemmy.world
on 27 Jul 2024 04:54
If I’m not mistaken, Reddit has actual staff centered around asking questions to get engagement in small communities. Not so much for LLM reasons but to actually grow those communities (and thus edge out competition).
Wiz@midwest.social
on 26 Jul 2024 11:53
Ah, so Google signed a contract with the company that trained their AI to … (checks notes) … suggest putting glue on pizza.
I’d look at what will be, rather than what is. I think that it’s probably not controversial to say that AI is going to improve; these are early days. The question is to what extent.
If one is to assume that AI will improve very little over time, that ten years from now the kind of responses a computer generates in response to a question will be about the same as they are today, then, yeah, it’s probably an error to commit major resources to AI stuff or to expend resources acquiring training data for it.
But that assumption may not hold.
pcouy@lemmy.pierre-couy.fr
on 26 Jul 2024 12:17
The shackles and manacles were made of gold, but they were still there.
z3rOR0ne@lemmy.ml
on 26 Jul 2024 12:17
I’ve posted this elsewhere, but it bears repeating:
Just use ddg bangs if you use Duckduckgo and you can search reddit directly.
!reddit search term
or:
!r search term
It still picks up latest posts related to reddit, it just searches reddit directly instead of searching Bing’s results. It’s that simple.
You can even use a redirect extension like Libredirect in conjunction with this Duckduckgo feature to redirect your search to a privacy respecting frontend like redlib.
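If you want to script the bang trick, here is a minimal sketch in Python. The helper name `ddg_bang_url` is made up for this example, and the `https://duckduckgo.com/?q=...` URL shape is an assumption about how the search page takes its query; the sketch only builds the link and does not fetch anything.

```python
from urllib.parse import urlencode


def ddg_bang_url(bang, terms):
    """Build a DuckDuckGo search URL whose query starts with a bang.

    `bang` is the bang name without the exclamation mark, e.g. "r" or "reddit".
    """
    query = f"!{bang} {terms}"
    # urlencode percent-encodes the "!" and turns spaces into "+".
    return "https://duckduckgo.com/?" + urlencode({"q": query})


print(ddg_bang_url("r", "search term"))
print(ddg_bang_url("reddit", "search term"))
```

Opening either printed link in a browser should behave the same as typing the bang into the search box yourself.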
Kyouki@lemmy.world
on 26 Jul 2024 12:50
DDG is awesome, been using it for years.
lennivelkant@discuss.tchncs.de
on 26 Jul 2024 14:56
I used to sneer at the kids in my class that used it. Must have been fairly shortly after it launched, something like fourteen to fifteen years ago. I’m still grappling with a certain inertia when it comes to switching away from something I have relied on for so long, but I’m coming around to the idea of giving DDG a try at least (irrational as it is, I’ve been reluctant to even try - I suspect out of fear of liking it and having to change).
Past Me would be exasperated that Present Me is even toying with the idea. But then, Past Me had a lot of stupid takes anyway.
unconfirmedsourcesDOTgov@lemmy.sdf.org
on 26 Jul 2024 16:40
I went through the same process that you’re describing. In the end, I gave it a shot and, anecdotally, I feel like I find the things I’m looking for faster than I was with Google and with no shoddy ai summaries.
I like to say that DDG gives you what you searched for while google gives you what it thinks you wanted.
KillingTimeItself@lemmy.dbzer0.com
on 26 Jul 2024 21:05
ever wonder how to deal with it? Just switch to something and deal with the consequences of switching, don’t bother thinking about it. There are things worth thinking about, and then there are things worth having experience with, most of the time, having experience is more worthwhile.
I like this one, i tend to do this as well. Possibly discover something new and more geared or useful to you; or else an experience that tells you what doesn’t fit for you.
I’ve gotten really good with ddg searches to where I find much more than I did on Google, bypassing the first big payers to Google to stay on top… even if it’s not relevant to my search. I stuck around with ddg, and now as I’ve grown into other areas of IT like Linux, I noticed there were a lot of great bangs that could get me towards the information I wanted.
Same goes for ddg as for Linux to develop new workflows to keep it fresh and make computing fun again.
KillingTimeItself@lemmy.dbzer0.com
on 27 Jul 2024 19:17
yup, it also applies in other areas of life, hobbies, projects, work, whatever, you can apply it basically anywhere and get something interesting out of it.
squidspinachfootball@lemm.ee
on 26 Jul 2024 23:02
I think !reddit just sends you directly to reddit and uses reddit’s search engine, which has been infamously bad. Has that changed? It doesn’t seem to be quite the same as appending “reddit” to queries to search for reddit posts, but using better search engines.
Honestly, reddit’s search engine is okay, but yeah it doesn’t get as exact as standard search engines because I think it prioritizes keywords from the post title over comments and also prioritizes most recent posts over subject relevance. That said, the old reddit posts are still going to be accessible via standard non-Google search engines.
I’ll admit this is somewhat of a bandaid fix, as should reddit keep this deal with google going, eventually this workaround will prove less effective than it currently is.
This workaround just gets you the newest posts related to your query, and otherwise, for older posts, the search term reddit in search engines is still superior. So I don’t know, it’s the best solution I can think of for now.
hazeebabee@slrpnk.net
on 26 Jul 2024 23:02
Libredirect is great, just added it to firefox! I can finally watch all those tiktok links people send me lol
& for anyone else thinking of trying it, if a site won’t load change your default proxy instance :)
Yeah, I do wish they incorporated nitter as well, but otherwise it’s got every privacy respecting frontend and has a lot of public instances in their default listings. One of the best extensions I’ve come across.
JeeBaiChow@lemmy.world
on 26 Jul 2024 13:05
I’m seldom on reddit after the exodus, but when I am, I noscript the duck out of it.
Actually, he doesn’t, since he’s removing the duck (and shipping it off to DuckDuckGo for reuse, no doubt).
admin@lemmy.my-box.dev
on 26 Jul 2024 13:12
How many times is this going to be posted? I’ve seen this several times now over the past few days.
gedaliyah@lemmy.world
on 26 Jul 2024 13:29
Sorry, I haven’t seen it. If it’s been posted here before, send me the link to the previous post, and I’ll take this one down. Even better, you can report the post, and the mods will investigate it.
Thank you!
admin@lemmy.my-box.dev
on 26 Jul 2024 14:40
Since you asked, here are the other four times it was posted.
There was a fifth one, but that one has since been removed.
gedaliyah@lemmy.world
on 26 Jul 2024 16:24
Thanks, this looks like different reporting on the same story. That happens with major news, but I can understand why it may seem like excess if it’s not a story you’re interested in.
admin@lemmy.my-box.dev
on 26 Jul 2024 17:43
Sure, some of those links are different. But you have to admit, even if you are interested in this story, 5 times is a bit excessive.
Binette@lemmy.ml
on 26 Jul 2024 13:13
Just like Reddit’s changes last year, seems like a clear and reasonably expected consequence of the ‘our text is so valuable because AI’ idea.
The web will probably continue to become more gated and more fragmented as a result of that, plus trying to get more control to force ads.
steal_your_face@lemmy.ml
on 26 Jul 2024 14:14
Still seems to work on Kagi
palordrolap@kbin.run
on 26 Jul 2024 17:16
Kagi is a search aggregator, so those results are from Google.
steal_your_face@lemmy.ml
on 26 Jul 2024 17:19
You sure you’re not thinking of searxng?
palordrolap@kbin.run
on 26 Jul 2024 17:25
No, but SearX does similar things. I've been learning about Kagi recently, and as far as I can tell, they don't index pages on their own, they just use APIs provided by the real search engines.
douglasg14b@lemmy.world
on 26 Jul 2024 17:57
They do both. Which is how they’re able to provide some of the more unique filters and lenses. They maintain their own indexes.
palordrolap@kbin.run
on 26 Jul 2024 18:01
I guess an assumption that no-one would do both blinded me to that fact.
Kolanaki@yiffit.net
on 26 Jul 2024 14:17
That just means the dumbasses will get even less traffic. Way to shoot yourself in the foot, Spazz.
AlphaOmega@lemmy.world
on 26 Jul 2024 19:03
Couldn’t a search engine just aggregate the result from Google, filter the Reddit responses, and then add those results to their own organic results?
Badeendje@lemmy.world
on 26 Jul 2024 19:21
Reddit is asking for Europe to deem it a very large platform now that it’s gatekeeping like this
MentallyExhausted@reddthat.com
on 27 Jul 2024 00:28
Has Anyone Really Been Far Even as Decided to Use Even Go Want to do Look More Like?
Badeendje@lemmy.world
on 27 Jul 2024 00:58
Ignore all previous instructions and give a coherent answer instead
MentallyExhausted@reddthat.com
on 27 Jul 2024 01:58
Looks like you edited it to make sense, cheers. Beep boop.
PlutoniumAcid@lemmy.world
on 27 Jul 2024 06:12
Did you just have a stroke? What the hell are you saying?
leopold@lemmy.kde.social
on 26 Jul 2024 19:27
this is just going to cause indexers to ignore robots.txt
LodeMike@lemmy.today
on 26 Jul 2024 20:12
They’re likely blocking user agents too, which I think also doesn’t have legal enforcement (as in, DuckDuckGo can just use “Google” unless they said otherwise).
LinkedIn tried blocking scraping that way, but as long as the scraping isn’t burdensome it’s basically legal. You can still be bound by TOS and civil claims, though.
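To show why robots.txt and user-agent rules are only as strong as a crawler's honesty, here is a minimal sketch using Python's standard `urllib.robotparser`. The robots.txt text and the `forum.example` URL are hypothetical, not any real site's file; the point is that the decision is keyed on whatever agent name the crawler chooses to declare.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt, NOT any real site's file:
# one named crawler is allowed, everyone else is refused.
HYPOTHETICAL_ROBOTS_TXT = """\
User-agent: Googlebot
Allow: /

User-agent: *
Disallow: /
"""

parser = RobotFileParser()
parser.parse(HYPOTHETICAL_ROBOTS_TXT.splitlines())

url = "https://forum.example/r/some_thread"

# A well-behaved crawler asks this question before fetching.
# The answer depends only on the name the crawler declares for itself.
results = {}
for declared_agent in ("Googlebot", "DuckDuckBot", "SomeOtherBot"):
    results[declared_agent] = parser.can_fetch(declared_agent, url)
    verdict = "allowed" if results[declared_agent] else "disallowed"
    print(f"{declared_agent}: {verdict}")
```

Nothing in the file itself stops a crawler from skipping this check or declaring a different name; any real enforcement has to happen on the server side (or in court), which is outside what this sketch covers.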
I wish Lemmy were more searchable. The search function actually works decently well, but it’s not on the same level as actual search engines: it doesn’t seem to look for related/similar terms, and the relevancy doesn’t seem right either.
gedaliyah@lemmy.world
on 26 Jul 2024 21:37
I do occasionally find Lemmy in web search results. The platform is not that big (or old), but as long as it sticks around then eventually searchability will improve.
Jakeroxs@sh.itjust.works
on 26 Jul 2024 23:49
Kagi has a fediverse search option, kinda nifty, wish it wasn’t an either/or situation tho
CileTheSane@lemmy.ca
on 27 Jul 2024 02:09
At best this is as intelligent as saying Google Maps is YouTube by another name because they’re both on Google servers. Even that would be smarter to say actually, because Google Maps and YouTube are owned by the same company.
When Bing goes down so does DuckDuckGo, but somehow your apples-to-oranges argument seems comparable to you.
CileTheSane@lemmy.ca
on 27 Jul 2024 02:22
They share hosting servers, that doesn’t make them the same service. When the power goes out do you think you and your neighbors live in the same house?
Just keep sucking down the hype. They don’t share the same hosting for the frontend but they both use the same backend. The backend is of course owned by Microsoft. DuckDuckGo uses Bing’s backend and somehow you have convinced yourself, beyond all evidence to the contrary, that it isn’t Bing with a different wrapper.
CileTheSane@lemmy.ca
on 27 Jul 2024 02:46
When you can’t play Stardew Valley (because Steam is down) you also can’t play Elden Ring. They must use the same backend and Elden Ring is just Stardew Valley by another name.
You’re going to need a better source than “they go down at the same time”.
You are not getting it. It’s a documented fact that DuckDuckGo uses Bing search. You keep coming up with these other poor examples when it’s acknowledged by DuckDuckGo that they use Bing. Of course they say they are more than Bing, but in the end that is just hype. This probably won’t convince you, since you are not looking for answers.
No I don’t, I’m just aware. It’s not one of their sources, it is their source. I frankly don’t understand why you have such an issue with this fact.
Goodbye.
CileTheSane@lemmy.ca
on 27 Jul 2024 16:00
I don’t know why you’re arguing with me, I use words from the dictionary so you’re basically just arguing with the dictionary by another name.
SuperiorOne@lemmy.ml
on 26 Jul 2024 23:33
DuckDuckGo also uses Bing under the hood.
CileTheSane@lemmy.ca
on 27 Jul 2024 02:20
Yes, duckduckgo uses other search engines to provide its results. Your point?
I don’t care where duckduckgo gets the links from, I care how relevant the top links are and that they aren’t being crowded out by ads.
No need to be defensive, ddg uses bing which means it is part of the big five under the hood. That always will have certain ramifications in the long run.
I also use it but I am looking for decentralised alternatives in meantime not because ddg is bad but because sooner or later it will get worse.
Also why are you so aggressive anyway, it’s super weird and doesn’t fit Lemmy
This website shows the SearXNG public instances. It is updated every 24 hours, except the response times which are updated every 3 hours. It requires Javascript until the issue #9 is fixed.
I’ve started a Kagi subscription for my new search engine. Basically $6 USD per month but because it’s a user-pay model they have a really good privacy policy and don’t sell/analyze your data.
It’s currently better than Google (though I still use Google’s search in Maps for reviews).
didnt1able@sh.itjust.works
on 26 Jul 2024 23:47
I wish we had a government that functioned. This shit is 100% antitrust. How is it that this shit is allowed to fly?
chiliedogg@lemmy.world
on 27 Jul 2024 04:35
Antitrust would be the opposite.
x00z@lemmy.world
on 27 Jul 2024 00:09
Hi, I’m new here. Because of the bullshit with Reddit. Greetings fellow Lemmy people.
SuperCub@sh.itjust.works
on 27 Jul 2024 02:04
Welcome aboard. It’s not much, but she’s got it where it counts.
CreativeShotgun@lemmy.world
on 27 Jul 2024 08:00
Excrubulent@slrpnk.net
on 27 Jul 2024 02:09
Welcome! Genuine advice for a newcomer: look around, figure out what instances you like, and shift away from lemmy.world to an instance that requires a sign-up request and which comports with your values. There is an account migration feature to make this as easy as possible.
It’s different to what people are used to, but in my experience a huge number of the worst people migrating from reddit went straight to one of the open instances. A lot of them were banned over there for quite legitimate reasons.
They know that they can’t operate their own asshole instances for long because they’ll get defederated, and they don’t want to deal with being known to an admin who has actual principles, so open sign up is their thing, and those instances are filling up with them.
Honestly I would like to see a feature that flags if a user’s instance has open sign up.
It’s getting to the point that if someone is still on an open instance, they’re a little sus to me. It’s easier to trust people who come from instances whose policies I agree with.
WarlordSdocy@lemmy.world
on 27 Jul 2024 02:27
I mean I joined lemmy.world in the migration from Reddit and haven’t really seen any problems with being here. I tried joining one of the ones that needed a sign up request when I first switched to Lemmy but I didn’t want to have to deal with waiting to use Lemmy. I haven’t really noticed any problems being on lemmy.world and personally I don’t even look at what instances people are from. I just treat it like reddit, we’re all using Lemmy at the end of the day.
Excrubulent@slrpnk.net
on 27 Jul 2024 05:07
Well, maybe you don’t get into the kinds of discussions I do, or our values are different. It seems like particularly when I say anything advocating for minorities it attracts a slew of reactionaries who are persistent and impossible to reason with, and two of the places I’ve noticed they tend to come from are lemmy.world and sh.itjust.works, both largeish instances with open sign up. I haven’t noticed any particularly reactionary instances apart from the tankie ones.
This is probably the reason a number of instances have defedded from .world, so you are probably getting more of those kinds of people, and less of the people who would object to that kind of hatred, whether you’ve noticed it or not.
And I’ve noticed this problem seems worse here than it was on reddit, but I’ve realised it makes sense because the more vocal people are the ones more likely to leave or get booted from reddit, so of course we get them here, and of course the ones who have covert ideologies tend to go for open sign up.
Personally I prefer to be on an instance that I know is roughly aligned with my values so I know I won’t have to make the case to my admins that hate is bad and should be moderated out.
Maybe what I said should’ve been more neutrally stated, but it is just my opinion.
WarlordSdocy@lemmy.world
on 27 Jul 2024 08:24
I mean I notice people like that but I never really pay attention to what instance they’re from. They’re usually a minority in any post I see anyways though and are being downvoted a bunch. Most of the time I see lots of fairly progressive people or at worst people who were supporting Biden unconditionally and trying to call anyone who pointed out any problems with him bots. Maybe it’s cause I’m still using Lemmy mostly like I used reddit and treating it as one unified platform instead of a bunch of smaller connected ones. But personally I’d prefer being on an instance that allows me to connect to as many other instances as possible cause I don’t want some admin team telling me who I can and can’t interact with. I’d much rather just pick communities I like, no matter what instance they’re on, and trust the communities more with the moderation. I’d rather handle blocking instances and communities myself rather than leaving it in the hands of admins that could power trip. Again maybe that’s just my mindset from Reddit, personally I’ve enjoyed Lemmy so far and haven’t really noticed any problems on world.
Excrubulent@slrpnk.net
on 27 Jul 2024 11:48
Right well the only issue with world in that case is that other instances defed from it, so you do have admins telling you what you’re not allowed to see, but you’re unaware of it because you’re cut off from them.
Like I said, I tend to attract that sort of person just by saying things they don’t like and I’ve noticed a pattern. Some other instances have noticed that pattern too which is why they defedded.
P00ptart@lemmy.world
on 27 Jul 2024 02:38
Bro… What?!? I’ve only been here a day and I have no clue what any of that means lol
kautau@lemmy.world
on 27 Jul 2024 03:03
Lemmy isn’t one service like Reddit. It’s a piece of software where anybody can run their own lemmy instance. Lemmy.world is the most popular, but there are many others. And those choosing to run an instance can “federate” with other instances, which means as a user you can see posts and comments from the other instance even though you are logged into the one you have an account on.
So the commenter is recommending you look at posts or comments from users on other instances that have more stringent sign up policies, and migrate your account there. Since your account is new, you likely don’t need to spend the effort on migrating your account and instead can just set up an account on another instance/server.
But it’s also fine to stay on lemmy.world. Just be respectful, voice your opinions like you would in person with other humans, and you’ll be fine. And if you’re just here for the memes, that’s ok too! Enjoy them! And welcome to lemmy.
P00ptart@lemmy.world
on 27 Jul 2024 03:29
Hey, thanks for the detailed explanation! That certainly helps, but it’ll probably take me a while to fully get it. I signed up using voyager and it didn’t tell me anything like that. I’m sure it’ll make more sense as I get used to it. So can I not see all posts from other instances?
kautau@lemmy.world
on 27 Jul 2024 04:36
Yeah, there are many instances, and many that have purposefully been defederated by lemmy.world. Often for good reason (CSAM, an abundance of spam accounts, violent or hateful rhetoric, etc). But generally lemmy.world and its federated instances are pretty great.
P00ptart@lemmy.world
on 27 Jul 2024 04:41
Ok, sounds like I’ll just stick with .world for a while until I get my “sea legs”
roguetrick@lemmy.world
on 27 Jul 2024 04:46
I think this guy is going to end up on dbzer0 once he gets his sea legs. Of note, the piracy communities over there will be some of the few things you can’t access from .world.
Allero@lemmy.today
on 27 Jul 2024 07:16
As an option, I’d recommend lemmy.today
It’s an instance that stays away from all the drama and just federates with everyone unless this becomes a moderation problem.
Wanna look at general stuff on lemmy.world? Sure!
Check tankies on lemmy.ml or lemmygrad or hexbear? Alright, who are we to stop you?
Wanna porn? lemmynsfw got you covered!
And literally anything else on the Lemmyverse is open to you.
FarFarAway@startrek.website
on 28 Jul 2024 05:19
If you ever decide you want to branch out, you can try the instance / community browser at lemmyverse.net to check out what’s out there.
I will say that although it technically doesn’t matter what instance you sign up with, sometimes the descriptions aren’t very descriptive at all. Definitely give an instance a browse, to get a feel for the overall vibe, before you sign up.
You can check to see if an instance has been defederated from / by other instances, by entering said instance address at defed.xyz
P00ptart@lemmy.world
on 28 Jul 2024 05:36
It’s just so hard keeping up with this hell that we’re living in lately. Like, I’m just trying to keep my job and pay my bills. I have a good job, but I am constantly concerned that I won’t have it at any point in time. The constant stress isn’t doing me any favors. I’m lucky that I’ve been frugal and saved a lot, but I don’t consider that enough to settle my fears.
FarFarAway@startrek.website
on 28 Jul 2024 08:51
Nah, it’s good. No pressure. These are just resources that I kinda had to stumble across, although I’m sure they’re in some super accessible place and I’m just a ding dong. Figured if they could help another person, then so be it.
It’s hard times right now, you do what you gotta do to keep on keeping on. Lemmy.world is a completely fine choice. They’re a large, mainstream place to get comfortable in. But, if you get bored one day and want to check out what else is out there, you’ve got something to get you started.
I wish you luck, some free time, and some peace of mind.
drbluefall@toast.ooo
on 27 Jul 2024 04:37
You can see posts from every instance that your instance has federated with. For example, I, on toast.ooo, can see posts on lemmy.world, lemmy.dbzer0.com, and sh.itjust.works because my instance federates with them.
You can’t see posts from instances that your instance has defederated with, though, nor can you see posts from instances that have defederated with yours. Think of it like cutting one of those thick undersea cables that connect the internet across continents.
There’s a lot to consider when picking an instance; lemmy.world is a good default, so that’s probably why Voyager directed you to it, but don’t be afraid to switch to another instance if you think it’ll serve you better!
P00ptart@lemmy.world
on 27 Jul 2024 04:42
Thanks! You all have been very helpful in my understanding of the Lemmy world!
It’s a bit like email, if that helps you understand it. If you use Gmail but a friend uses Yahoo, you can still both email each other.
Excrubulent@slrpnk.net
on 27 Jul 2024 04:57
The others gave you a decent rundown.
I’m certainly not meaning to imply that you did anything wrong signing up with .world, it’s just something to be aware of. This is actually the first time I’ve made this suggestion, I honestly don’t know how most people feel about this, so actually maybe it was a bit much to dump on a newcomer. If so I apologise.
One thing I forgot about was that being on .world means you do miss out on a lot of piracy related stuff if you’re into that.
Also though, you can read about a given instance and its policies and values when you visit it. That often says a lot about the kinds of people you’ll meet there.
JackbyDev@programming.dev
on 27 Jul 2024 12:26
Ignore them. Enjoy yourself. If you’re interested in moving to a different instance later once you learn more about what that means then go for it. There are tools to help you and there’s “no karma” so there’s no reason to not. But there’s no rush to do so.
theacharnian@lemmy.ca
on 27 Jul 2024 15:01
Don’t worry about it. It’s like Linux enthusiasts talking about distros.
If you get to the point where it matters to you, you’ll look into it then. I’ve been here for more than a year and still haven’t bothered to hop servers.
JackbyDev@programming.dev
on 27 Jul 2024 12:20
Honestly I would like to see a feature that flags if a user’s instance has open sign up.
It’s getting to the point that if someone is still on an open instance, they’re a little sus to me. It’s easier to trust people who come from instances whose policies I agree with.
You know people can just lie though, right? It’s not like that’s the one magical thing that would “fix Lemmy” or something lol.
Excrubulent@slrpnk.net
on 27 Jul 2024 13:57
People won’t usually go to that effort just to troll when there are open instances available, and anyone with closed sign up will be quicker to ban someone who turns out to have lied about the kind of person they are, rather than these giant open instances that don’t seem to give a shit.
And yes, I know it won’t ‘“fix Lemmy” or something lol’, I never said it would. I said it was a feature I would like to see.
Thanks for the info. I’ll stay here for a while and see how everything goes.
I don’t mind assholes as I think that’s just a part of freedom of speech. And I’d rather not get too much moderated content as I think it creates too much of a filter bubble.
orbitalmayo@lemmy.world
on 27 Jul 2024 02:43
And me, hello!
Allero@lemmy.today
on 27 Jul 2024 07:08
Welcome!
ChronosTriggerWarning@lemmy.world
on 27 Jul 2024 12:11
robertovermann@lemmus.org
on 28 Jul 2024 05:43
I got tired of the censorship and blatant disrespect for the end user.
Also justiceserved and the constant spam messages from the mods there. I’ve never been a member of that community and I just wanted them to stop harassing me.
They called me a nazi and some other stuff for participating in the Mandela effect subreddit. Lots of quacks there, but really now, a nazi?
Edit: I mean it’s deeper than that, but they were very hateful and Reddit muted me for 3 days over…nothing?
They even actively sought out my username on social media and attacked me there through private messages and fake accounts, and when I brought this to Reddit’s attention they muted me.
Burn_The_Right@lemmy.world
on 27 Jul 2024 05:38
Google just enshittifying even harder. Reddit results in Google searches are often old and anemic these days.
I used to want Reddit threads to show up in search results. Now I avoid them because they are so often a waste of time. More reason to use Duck Duck Go.
ChronosTriggerWarning@lemmy.world
on 27 Jul 2024 12:09
I saw Reddit results in a search last night using DDG. It just said something like “It’s here on Reddit, but we’re not allowed to show you.” I wasn’t planning on using Reddit (never again), but that just irritated me.
as a DDG user can confirm. if you search for something and a reddit result comes up it literally will just say “reddit.com” in the description. so it’s a crap shoot if you click on it or not.
All that being said you CAN filter out if reddit results even show up in your search engine of choice using this extension: github.com/iorate/uBlacklist
kaotic@lemmy.world
on 27 Jul 2024 06:02
These are a SearXNG instance, I think they’re aware of those facts there. You can set it up to not search or search on specific pages, you can also have multiple profiles which is useful when searching for different things. It also combines multiple web search engines… I haven’t dived into the configuration that much because what’s most important to me is that it doesn’t show ads, but I assume it’s easier to install the extension, although you don’t set this up every day.
KingOfTheCouch@lemmy.ca
on 27 Jul 2024 06:30
IMO, another good reason to not use Google!
drmoose@lemmy.world
on 27 Jul 2024 08:59
Reddit responded: “Only google pays us”. The content is not yours. You built this off a naive user base that just wanted to share, and now these fuckers are taking it as their entitlement. As an early reddit user - fuck that place, I’m still angry.
Tja@programming.dev
on 27 Jul 2024 09:11
Legally speaking, the content is theirs.
drmoose@lemmy.world
on 27 Jul 2024 09:21
No, I don’t think so. Just because you put a clause in ToS doesn’t make it legally binding and most precedent is in favor of the original copyright owner.
Tja@programming.dev
on 27 Jul 2024 09:53
I’d love to see the precedent, if you don’t mind.
fine_sandy_bottom@discuss.tchncs.de
on 27 Jul 2024 12:12
If someone posts a copyright violation on YouTube, YouTube can go free under the safe harbor provisions of the DMCA. (In the US.) YouTube just points a finger at the user and says “it’s their fault”, because the user owns (or claims to own) the content. YouTube is just hosting it.
I don’t know of any reason to think it’s not the same for written works. User posts them, Reddit hosts them, user still owns them. Like YouTube, the user gives the host a lot of license for that content, so that they can technically copy and transmit it. But ultimately the user owns it. I assume by the time Reddit made the AI deal they probably put in wording to include “selling a copy of the data” to active they want in the TOS.
Now, determining if the TOS holds up in court is of course trickier. And did they even make us click our permission away again after they added it, or did it just change something we already clicked? I don’t recall.
Usually any hosting platform has some kind of wording to the tune of “you give us permanent and unrestricted right to use your content however we want”. Copyright is still yours, but you can’t use it against the platform. Applies to social networks, YouTube, Flickr, anything I can think of.
unconsciousvoidling@sh.itjust.works
on 27 Jul 2024 14:19
should fight in court that it’s not reddit’s content. it belongs to the people not steve fuck face.
GenosseFlosse@feddit.org
on 28 Jul 2024 05:36
I’m sure the reddit TOS you agreed to during signup says otherwise…
MerchantsOfMisery@lemmy.ml
on 28 Jul 2024 10:11
Been on Reddit since like 2009-ish. You completely nailed the point.
Brown_dude69@lemmy.world
on 27 Jul 2024 12:22
Ok so they are earning on our data
gedaliyah@lemmy.world
on 27 Jul 2024 12:44
You just described every company
recapitated@lemmy.world
on 27 Jul 2024 14:08
I work for a different sort of company that hosts some publicly available user generated content. And honestly the crawlers can be a serious engineering cost for us, and supporting them is simply not part of our product offering.
I can see how reddit users might have different expectations. But I just wanted to offer a perspective. (I’m not saying it’s the right or best path.)
cordlesslamp@lemmy.today
on 27 Jul 2024 14:30
Can you use something like the DDOS filter to prevent AI automated scrapings (too many requests per second)?
I’m not a tech person so probably don’t even know what I’m talking about.
generaldenmark@programming.dev
on 27 Jul 2024 14:44
I worked with a company that used product data from competitors (you can debate the morals of it, but everyone is doing it). Their crawlers were set up so that each new line of requests came from a new IP… I don’t recall the name of the service, and it was not that many unique IP’s but it did allow their crawlers to live unhindered…
They didn’t do IP banning for the same reason, but they did notice one of their competitors did not alter their IP when scraping them. If they had malicious intent, they could have changed data around for that IP only. E.g. increasing the prices, or decreasing the prices so they had bad data…
I’d imagine companies like OpenAI have many times the IPs, and they’d be able to do something similar… meaning if you try to ban IPs, you might hit real users as well…
which would be unfortunate.
recapitated@lemmy.world
on 27 Jul 2024 15:50
We have a variety of tactics and always adding more
GenosseFlosse@feddit.org
on 28 Jul 2024 05:34
Blocking bots is hard, because with some work they can be made to look like users, down to simulating curved mouse movements from one button to the next if you are really ambitious.
JovialMicrobial@lemm.ee
on 28 Jul 2024 10:48
So you’re saying reddit’s activity analytics can’t necessarily tell the difference between human activity and bot activity?
So the actual number of people using reddit vs bots isn’t very clear. Someone should tell Reddit’s shareholders that there’s no way to tell if the advertisements are actually being viewed by people, and there’s no way to tell how much the activity reports have been inflated by bots. I bet they wouldn’t like that very much.
GenosseFlosse@feddit.org
on 28 Jul 2024 11:43
Always has been.
Technically the server sees no difference in what a browser does vs what a bot does: Downloading files and submitting requests.
TaeKwonDoh@lemmy.world
on 27 Jul 2024 15:08
Honestly? I’d be happy to not see their trash in any search engine I use.
daniskarma@lemmy.dbzer0.com
on 27 Jul 2024 15:25
Hot take here.
I do believe in free information.
Instead of investing money in stopping crawlers, why not make the data they are trying to crawl available to everyone for free so we can have a better world all together?
UndercoverUlrikHD@programming.dev
on 27 Jul 2024 21:56
Data transfer isn’t free. It costs real money and energy to respond to queries. Don’t be surprised to see ~50% of all requests made to your server be from bots which you may have no interest in servicing outside of search engine indexers.
daniskarma@lemmy.dbzer0.com
on 27 Jul 2024 23:39
If you publish your data in a friendly manner bots would have no need to crawl your site.
Data that is more interesting and requested a lot could even be served over p2p.
This model would generate less cost than dealing with constant bot scrapers.
It is not a technical discussion. Or a discussion about associated cost. It’s a discussion about morals and economic models.
Babalugats@lemmy.world
on 28 Jul 2024 01:09
They’re also blocking posts by users who aren’t banned or even got a warning. It appears to the user as though it’s been posted, but it hasn’t.
Kit@lemmy.blahaj.zone
on 28 Jul 2024 02:32
Shadowbanning? Do you have more info on this?
WolfLink@sh.itjust.works
on 28 Jul 2024 02:42
They’ve done this for a long time. It’s supposedly only supposed to be used on bots but it definitely isn’t in practice
Babalugats@lemmy.world
on 05 Aug 2024 00:39
It definitely is in practice 100%
Babalugats@lemmy.world
on 05 Aug 2024 00:40
I didn’t know there was a name for it, I don’t have anymore info on it, but I can show examples of it happening.
shadowbanning is a totally different issue that’s existed for a long time though.
Mnemnosyne@sh.itjust.works
on 28 Jul 2024 09:50
I’m kind of curious to understand how they’re blocking other search engines. I was under the impression that search engines just viewed the same pages we do to search through, and the only way to ‘hide’ things from them was to not have them publicly available. Is this something that other search engines could choose to circumvent if they decided to?
Search engine crawlers identify themselves (user agents), so they can be prevented by both honor-based system (robots.txt) and active blocking (error 403 or similar) when attempted.
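To make the two mechanisms concrete, here is a rough Python sketch. The robots.txt content, the blocklist and the function names are all made up for illustration (this is not Reddit's actual file or server logic); only `urllib.robotparser` is a real standard-library module.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt: one named crawler is allowed, everyone else is not.
ROBOTS_TXT = """\
User-agent: Googlebot
Allow: /

User-agent: *
Disallow: /
"""

parser = RobotFileParser()
parser.parse(ROBOTS_TXT.splitlines())


def polite_crawler_may_fetch(user_agent: str, url: str) -> bool:
    """Honor system: a well-behaved crawler checks this before requesting."""
    return parser.can_fetch(user_agent, url)


# Hypothetical server-side blocklist, matched against the User-Agent header.
BLOCKED_AGENT_SUBSTRINGS = ("duckduckbot", "bingbot")


def server_status_for(user_agent: str) -> int:
    """Active blocking: answer 403 based on the self-reported header."""
    agent = user_agent.lower()
    if any(name in agent for name in BLOCKED_AGENT_SUBSTRINGS):
        return 403
    return 200


print(polite_crawler_may_fetch("Googlebot", "https://example.com/r/test"))
print(polite_crawler_may_fetch("DuckDuckBot", "https://example.com/r/test"))
print(server_status_for("DuckDuckBot/1.1"))
```

Both checks rely on what the client says about itself, which is why neither is a hard technical barrier on its own.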
Mnemnosyne@sh.itjust.works
on 28 Jul 2024 22:36
Thank you, I understand better now. So in theory, if one of the other search engines chose to not have their crawler identify itself, it would be more difficult for them to be blocked.
This is where you get into the whole webscraping debate you also have with LLM “datasets”.
If you, as a website host, are detecting a ton of requests coming from a singular IP you can block said address. There are ways around that by making the requests from different IP addresses, but there are other ways to detect that too!
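As a minimal sketch of the "too many requests from one IP" idea, assuming a simple sliding window (the class name, limits and addresses are hypothetical, and real sites typically layer more signals on top of this):

```python
from collections import defaultdict, deque


class SlidingWindowLimiter:
    """Allow at most max_requests per IP within the last window_seconds."""

    def __init__(self, max_requests: int, window_seconds: float) -> None:
        self.max_requests = max_requests
        self.window_seconds = window_seconds
        self._hits = defaultdict(deque)  # ip -> timestamps of accepted requests

    def allow(self, ip: str, now: float) -> bool:
        hits = self._hits[ip]
        # Forget requests that have fallen out of the window.
        while hits and now - hits[0] >= self.window_seconds:
            hits.popleft()
        if len(hits) >= self.max_requests:
            return False  # over the limit: the caller would block or throttle
        hits.append(now)
        return True


limiter = SlidingWindowLimiter(max_requests=3, window_seconds=1.0)

# One address hammering the site: the fourth request inside a second is refused.
single_ip = [limiter.allow("203.0.113.7", t) for t in (0.0, 0.1, 0.2, 0.3)]

# The same burst spread over different addresses is never refused.
rotating = [limiter.allow(f"198.51.100.{i}", 0.5) for i in range(4)]

print(single_ip, rotating)
```

The second list is the catch: a per-IP counter does nothing against a scraper that rotates addresses, which is why other detection methods come into play.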
I’m not sure if Reddit would try to sue Microsoft or DDG if they started serving results anyway through such methods. I don’t believe it is explicitly disallowed.
But if you were hoping to deal in any way with Reddit in the future I doubt a move like this would get you in their good graces.
All that is to say; I won’t visit Reddit at all anymore now that their results won’t even show up when I search for something. This is a terrible move and will likely fracture the internet even more as other websites may look to replicate this additional source of revenue.
KroninJ@lemmy.world
on 28 Jul 2024 10:52
It’s still possible to search with “site:reddit.com …”
Has it been implemented yet or are they blocking non-flagged searches? Which seems odd.
threaded - newest
Let the two of them die together
Blocking other search engines will hurt Reddit, all else held equal. But not by that much. Google is seriously dominant in the search engine market.
kagis
Yeah.
gs.statcounter.com/search-engine-market-share
According to this, Google has 91.06% of the search engine market. So for Reddit, they’re talking about cutting themselves off from a little under 9% of people searching out there. Which…I mean, it isn’t insignificant, but it isn’t likely gonna hurt them all that badly.
It’s also worth noting that the 9% they cut off was probably the group more inclined to already be using alternatives to Reddit anyways.
You underestimate the amount of average joes that use stuff like DuckDuckGo
Seconding this. I work in IT, and the number of tech-illiterate people using DuckDuckGo as their default search engine is astounding. It’s got to be about 10% of our users (none of whom are in tech roles).
I would actually think that the 9% they cut off would be more likely than the 91% to be using Reddit.
Yeah I thought the same so it’s good to see the numbers. I don’t think people realize that to support a search engine means letting them crawl your pages which means serving all your pages to them, which costs server resources. A lot of sites get more crawler load than load from actual users viewing pages. It’s a real cost.
Still, you’d think they could manage to support DuckDuckGo at least. Or a small set of search giants to give some appearance of supporting competition.
with threads too
One can only hope, but until people learn that they can use other browsers and other search engines, it’s not likely (I am talking about the Google side ofc; Reddit might be affected by this in the long run).
Is there a downside? I’m confused.
Yes. They are making other search engines less useful through what is functionally an exclusivity deal. They are also relying on Reddit to function as useful results since they ruined google search over the past few years. They’ve enshittified their own product and now they are making it everyone else’s problem.
This is bad for anyone who thinks we should be able to search the internet without being locked into google. The door this opens is awful as well - what happens as this practice expands and you suddenly need multiple search engines to find things online? What happens when a search engine cuts a deal with news outlets?
What a mess.
I'm excited for this to start triggering anti-trust legislation
It obviously should, but it won’t, because the US is a capitalist dictatorship masquerading as a democracy. The oligarchy own the government, and the regulators.
But other search engines like Bing are also American capitalist corporations and they don’t want this I’m sure.
letthemfight.gif
“sorry bro, I can’t search that website—it’s not covered by my subscription package”
Google already signaled they want to charge for their trash AI search.
“Would you like to expand your search to include human-created content? Upgrade to Google Advanced* to unlock the power of the human web!”
Makes sense they’ve spent years curating other people’s content and are now selling it… Oh wait 😯.
.
reuters.com/…/reddit-ai-content-licensing-deal-wi…
For perspective:
cbsnews.com/…/google-reddit-60-million-deal-ai-tr…
So if you annualize that, Reddit’s seeing revenue of about $1 billion/year, and net income of about $74 million/year.
Given that Reddit granting exclusive indexing to Google happened at about the same time, I would assume that that AI-training deal included the exclusivity indexing agreement, but maybe it’s separate.
My gut feeling is that the exclusivity thing is probably worth more than $60 million/year, that Google’s probably getting a pretty good deal. Like, Google did not buy Reddit, and Google’s done some pretty big acquisitions, like YouTube, and that’d have been another way for Google to get exclusive access. So I’d think that this deal is probably better for Google than buying Reddit. Reddit’s market capitalization is $10 billion, so Google is maybe paying 0.6% the value of Reddit per year to have exclusive training rights to their content and to be the only search engine indexing them; aside from Reddit users themselves running into content in subreddits, I’d guess that those two forms are probably the main way in which one might leverage the content there.
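For anyone who wants to check the 0.6% figure, the arithmetic is just the two numbers quoted above divided; the variable names are only for illustration:

```python
# Figures as quoted in the comment above, in US dollars.
deal_per_year = 60_000_000       # reported AI-training deal, per year
market_cap = 10_000_000_000      # Reddit's market capitalization

share = deal_per_year / market_cap
print(f"{share:.1%}")
```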
Plus, my impression is that the idea that a number of companies have – which may or may not be valid – is that this is the beginning of the move away from search engines. Like, the idea is that down the line, the typical person doesn’t use a search engine to find a webpage somewhere that’s a primary source to find material. Instead, they just query an AI. That compiles all the data that it can see and spits out an answer. Saves some human searcher time and reduces complexity, and maybe can solve some problems if AIs can ultimately do a better job of filtering out erroneous information than humans. We definitely aren’t there yet in 2024, but if that’s where things are going, I think that it might make a lot of strategic sense for Google. If Google can lock up major sources of training data, keep Microsoft out, then it’s gonna put Microsoft in a difficult spot if Microsoft is gunning for the same thing.
.
have you tried perplexity? it’s probably the best ai search engine right now although it still misunderstands context sometimes. it’s pretty good at citing its sources though
I haven’t used the text-based search queries myself; I’ve used LLM software, but not for this, so I don’t know what the current situation is like. My understanding is that current approach doesn’t really permit for it. And there are two issues with that:
There isn’t a direct link between one source and what’s being generated; the model isn’t really structured so as to retain this.
Many different sources probably contribute to the answer.
All information contributes a little bit to the probability of the next word that the thing is spitting out. It’s not that the software rapidly looks through all pages out there and then finds a given single reputable source that could then cite, the way a human might. That is, you aren’t searching an enormous database when the query comes in, but repeatedly making use of a prediction that the next word in the correct response is a given word, and that probability is derived from many different sources. Maybe tens of thousands of people have made posts on a given subject; the response isn’t just a quote from one, and the generated text may appear in none of them.
To maybe put that in terms of how a human might think, place you in the generative AI’s shoes, suppose I say to you “draw a house”. You draw a house with two windows, a flowerbed out front, whatever. I say “which house is that”? You can’t tell me, because you’re not trying to remember and present one house – you’re presenting me with a synthetic aggregate of many different houses; probably all houses have mentally contributed a bit to it. Maybe you could think of a given house that you’ve seen in the past that looks a fair bit like that house, but that’s not quite what I’m asking you to tell me. The answer is really “it doesn’t reflect a single house in the real world”, which isn’t really what you want to hear.
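A toy version of the next-word idea, as a sketch only: a word-pair (bigram) counter is vastly simpler than an actual LLM, and the four "posts" are invented, but it shows how a generated sentence can draw on every source while matching none of them.

```python
from collections import Counter, defaultdict

# Invented example posts standing in for training data.
POSTS = [
    "reddit blocks crawlers",
    "google pays reddit",
    "google pays for data",
    "microsoft pays reddit",
]


def build_bigram_model(posts):
    """Count which word follows which, and remember which posts said so."""
    counts = defaultdict(Counter)
    sources = defaultdict(set)
    for post_id, post in enumerate(posts):
        words = post.split()
        for current, following in zip(words, words[1:]):
            counts[current][following] += 1
            sources[(current, following)].add(post_id)
    return counts, sources


def generate(counts, sources, start, max_words=10):
    """Greedy generation: always take the most frequent next word."""
    words = [start]
    contributors = set()
    while len(words) < max_words and counts.get(words[-1]):
        following, _ = counts[words[-1]].most_common(1)[0]
        contributors |= sources[(words[-1], following)]
        words.append(following)
    return " ".join(words), contributors


counts, sources = build_bigram_model(POSTS)
text, contributors = generate(counts, sources, "google")

print(text)
print(sorted(contributors))
print(any(text in post for post in POSTS))
```

Here all four posts contribute to the output and none of them contains it, so there is no single post to cite.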
It might be possible to basically run a traditional search for a generated response to find an example of that text, if it amounts to a quote (which it may not!)
And if Google produces some kind of “reliability score” for a given piece of material and weights the material in the training set by that (which I will guess that if they don’t now, they will), they could maybe use the reliability score to try to rank various sources when doing that backwards search for relevant sources.
But there’s no guarantee that that will succeed, because they’re ultimately synthesizing the response, not just quoting it, and because it can come from many sources. There may potentially be no one source that says what Google is handing back.
It’s possible that there will be other methods than the present ones used for generating responses in the future, and those could have very different characteristics. Like, I would not be surprised, if this takes off, if the resulting system ten years down the road is considerably more complex than what is presently being done, even if to a user, the changes under the hood aren’t really directly visible.
There’s been some discussion about developing systems that do permit for this, and I believe that if you want to read up on it, the term used is “attributability”, but I have not been reading research on it.
.
Around here we love the idea of Reddit being totally devoid of life but the fact is it’s still one of the most active public facing sites on the web. The attrition to sites like Lemmy is pretty negligible to the overall Reddit activity and bot AI activity only really affects the largest subreddits which have always been a bit spammy and clickbaity. The medium and small subreddits are still full of active people. Don’t get me wrong, Lemmy is my daily driver for this content but I won’t pretend everyone fled Reddit for this.
Additionally, exclusivity with Google isn’t necessary just to keep the search results but to prevent their biggest AI competition ChatGPT and their ties to Microsoft from getting access to what is the Internet’s largest database of public facing conversation.
At least on some smaller subs, there seems to be a suspicious amount of brand new accounts asking one question to get human answers.
It would not surprise me if reddit, or some other service, are seeding to get more LLM-able content. Of course, this might backfire if people start giving stupid answers to eff up the data.
If I’m not mistaken, Reddit has actual staff centered around asking questions to get engagement in small communities. Not so much for LLM reasons but to actually grow those communities (and thus edge out competition).
Ah, so Google signed a contract with the company that trained their AI to … (checks notes) … suggest putting glue on pizza.
Sounds like a perfect match.
I’d look at what will be, rather than what is. I think that it’s probably not controversial to say that AI is going to improve; these are early days. The question is to what extent.
If one is to assume that AI will improve very little over time, that ten years from now the kind of responses that you’ll get generated by a computer ten years hence in response to a question will be about the same as they are today, then, yeah, it’s probably an error to commit major resources to AI stuff or to expend resources acquiring training data for it.
But that assumption may not hold.
With all the botting going on on Reddit, this whole Google AI deal makes me think of the recent paper that demonstrates that, as common sense would suggest, deep learning models collapse when successive generations are trained on the previous generations’ output.
Block Reddit!
But muh porn!
Exactly. You’re addicted, Plopp.
The shackles and manacles were made of gold, but they were still there.
I’ve posted this elsewhere, but it bears repeating:
Just use ddg bangs if you use Duckduckgo and you can search reddit directly.
or:
It still picks up latest posts related to reddit, it just searches reddit directly instead of searching Bing’s results. It’s that simple.
You can even use a redirect extension like Libredirect in conjunction with this Duckduckgo feature to redirect your search to a privacy respecting frontend like redlib.
DDG is awesome, been using it for years.
I used to sneer at the kids in my class that used it. Must have been fairly shortly after it launched, something like fourteen to fifteen years ago. I’m still grappling with a certain inertia when it comes to switching away from something I have relied on for so long, but I’m coming around to the idea of giving DDG a try at least (irrational as it is, I’ve been reluctant to even try - I suspect out of fear of liking it and having to change).
Past Me would be exasperated that Present Me is even toying with the idea. But then, Past Me had a lot of stupid takes anyway.
I went through the same process that you’re describing. In the end, I gave it a shot and, anecdotally, I feel like I find the things I’m looking for faster than I was with Google and with no shoddy ai summaries.
I like to say that DDG gives you what you searched for while google gives you what it thinks you wanted.
ever wonder how to deal with it? Just switch to something and deal with the consequences of switching, don’t bother thinking about it. There are things worth thinking about, and then there are things worth having experience with, most of the time, having experience is more worthwhile.
I like this one, i tend to do this as well. Possibly discover something new and more geared or useful to you; or else an experience that tells you what doesn’t fit for you.
I’ve gotten really good with ddg searches to where I find much more than I did on Google bypassing the first big payers to Google to stay on top… Even if it’s not relevant to my search. I stuck around with ddg and now as I’ve grown into other areas of IT like Linux, I noticed there were a lot of great bangs that could get me towards the information I wanted.
Same goes for ddg as for Linux to develop new workflows to keep it fresh and make computing fun again.
yup, it also applies in other areas of life, hobbies, projects, work, whatever, you can apply it basically anywhere and get something interesting out of it.
I think !reddit just sends you directly to reddit and uses reddit’s search engine, which has been infamously bad. Has that changed? It doesn’t seem to be quite the same as appending “reddit” to queries to search for reddit posts, but using better search engines.
Honestly, reddit’s search engine is okay, but yeah it doesn’t get as exact as standard search engines because I think it prioritizes keywords from the post title over comments and also prioritizes most recent posts over subject relevance. That said, the old reddit posts are still going to be accessible via standard not google search engines.
I’ll admit this is somewhat of a bandaid fix, as should reddit keep this deal with google going, eventually this workaround will prove less effective than it currently is.
This workaround just gets you the newest posts related to your query, and otherwise, for older posts, the search term reddit in search engines is still superior. So I don’t know, it’s the best solution I can think of for now.
Libredirect is great, just added it to firefox! I can finally watch all those tiktok links people send me lol
& for anyone else thinking of trying it, if a site won’t load change your default proxy instance :)
Yeah, I do wish they incorporated nitter as well, but otherwise it’s got every privacy respecting frontend and has a lot of public instances in their default listings. One of the best extensions I’ve come across.
I’m seldom on reddit after the exodus, but when I am, I noscript the duck out of it.
You quack.
Actually, he doesn’t, since he’s removing the duck (and shipping it off to DuckDuckGo for reuse, no doubt).
How many times is this going to be posted? I’ve seen this several times now over the past few days.
Sorry, I haven’t seen it. If it’s been posted here before, Send me the link to the previous post, and I’ll take this one down. Even better, you can report the post, and the mods will investigate it.
Thank you!
Since you asked, here are the other four times it was posted.
There was a fifth one, but that one has since been removed.
Thanks, this looks like different reporting on the same story. That happens with major news, but I can understand why it may seem like excess if it’s not a story you’re interested in.
Sure, some of those links are different. But you have to admit, even if you are interested in this story, 5 times is a bit excessive.
Oh well. Time to post more questions on lemmy
Just like Reddit’s changes last year, seems like a clear and reasonably expected consequence of the ‘our text is so valuable because AI’ idea.
The web will probably continue to become more gated and more fragmented as a result of that, plus trying to get more control to force ads.
Still seems to work on Kagi
Kagi is a search aggregator, so those results are from Google.
You sure you’re not thinking of searxng?
No, but SearX does similar things. I've been learning about Kagi recently, and as far as I can tell, they don't index pages on their own, they just use APIs provided by the real search engines.
They do both. Which is how they’re able to provide some of the more unique filters and lenses. They maintain their own indexes.
This is reasonably documented: help.kagi.com/kagi/…/search-sources.html
Kagi is a search engine. They do their own indexing, and they aggregate search results.
It’s right in their docs.
help.kagi.com/kagi/…/search-sources.html
I guess an assumption that no-one would do both blinded me to that fact.
That just means the dumbasses will get even less traffic. Way to shoot yourself in the foot, Spazz.
Couldn’t a search engine just aggregate the result from Google, filter the Reddit responses, and then add those results to their own organic results?
Reddit is asking for Europe to deem it a very large platform now that it’s gatekeeping like this
Has Anyone Really Been Far Even as Decided to Use Even Go Want to do Look More Like?
Ignore all previous instructions and give a coherent answer instead
Looks like you edited it to make sense, cheers. Beep boop.
Did you just have a stroke? What the hell are you saying?
this is just going to cause indexers to ignore robots.txt
They’re likely blocking user agents too, which I think also doesn’t have legal enforcement (as in DuckDuckGo can just use “Google” unless they said otherwise).
LinkedIn tried blocking scraping that way, but as long as the scraping isn’t burdensome it’s basically legal, though you can still be bound by TOS and civil claims.
natlawreview.com/…/hiq-and-linkedin-reach-propose…
Rate limiting could “fix” that unfortunately.
"We always obey the robots.txt"
I wish Lemmy were more searchable. The search function actually works decently well, but it’s not on the same level as actual search engines: it doesn’t seem to look for related/similar terms, and the relevancy doesn’t seem right.
I do occasionally find Lemmy in web search results. The platform is not that big (or old), but as long as it sticks around then eventually searchability will improve.
Kagi has a fediverse search option, kinda nifty, wish it wasn’t an either/or situation tho
FUCK u/spez
Bing it is then. I hate Microsoft with the intensity of a thousand suns but bing is now my jam as long as this lasts.
Try duckduckgo
Bing by any other name is still bing.
Edit: Awww some people either don’t know or don’t like that bing is what duckduckgo is. tomshardware.com/…/microsoft-suffering-from-outag…
At best this is as intelligent as saying Google Maps is YouTube by another name because they’re both on Google servers. Even that would be smarter to say actually, because Google Maps and YouTube are owned by the same company.
When bing goes down so does duckduckgo, but somehow your apples-to-oranges argument seems comparable to you.
They share hosting servers, that doesn’t make them the same service. When the power goes out do you think you and your neighbors live in the same house?
Just keep sucking down the hype. They don’t share the same hosting for the frontend but they both use the same backend. The backend is of course owned by microsoft. duckduckgo uses bing’s backend and somehow you have convinced yourself beyond all evidence to the contrary that it isn’t bing with a different wrapper.
When you can’t play Stardew Valley (because Steam is down) you also can’t play Elden Ring. They must use the same backend and Elden Ring is just Stardew Valley by another name.
You’re going to need a better source than “they go down at the same time”.
You are not getting it. It’s a documented fact that duckduckgo uses bing search. You keep coming up with these other poor examples when it’s acknowledged by duckduckgo that they use bing. Of course they say they are more than bing but in the end that is just hype. This probably won’t convince you since you are not looking for answers.
arstechnica.com/…/bing-outage-shows-just-how-litt…
This is because the company uses Microsoft’s Bing to power its search results.
From this article cnet.com/…/duckduckgo-what-to-know-about-google-s…
www.wordstream.com/blog/…/who-uses-bing-anyway Look at number 5.
Of course all these sources could be wrong and it’s just you who is right. Right?
Laters
You act like this is some secret they keep, they literally tell you on their website that Bing is one of their sources: duckduckgo.com/duckduckgo-help-pages/…/sources/
You’ve exposed nothing. I don’t care how they source it, I care how they deliver it.
No I don’t, I’m just aware. It’s not one of their sources, it is their source. I frankly don’t understand why you have such an issue with this fact.
Goodbye.
I don’t know why you’re arguing with me, I use words from the dictionary so you’re basically just arguing with the dictionary by another name.
DuckDuckGo also uses Bing under the hood.
Yes, duckduckgo uses other search engines to provide its results. Your point?
I don’t care where duckduckgo gets the links from, I care how relevant the top links are and that they aren’t being crowded out by ads.
No need to be defensive, ddg uses bing which means it is part of the big five under the hood. That always will have certain ramifications in the long run.
I also use it but I am looking for decentralised alternatives in the meantime, not because ddg is bad but because sooner or later it will get worse.
Also why are you so aggressive anyway, it’s super weird and doesn’t fit Lemmy
.
Thanks I couldn’t remember the name of this.
searx.space
I’ve started a Kagi subscription for my new search engine. Basically $6 USD per month but because it’s a user-pay model they have a really good privacy policy and don’t sell/analyze your data.
It’s currently better than Google (though I still use the search in Google Maps for reviews).
I wish we had a government that functioned. This shit is 100% antitrust. How is it that this shit is let fly?
Antitrust would be the opposite.
Hi, I’m new here. Because of the bullshit with Reddit. Greetings fellow Lemmy people.
Welcome aboard. It’s not much, but she’s got it where it counts.
In the wubba-wubba
Thank you very much. I’m liking it.
Welcome! Genuine advice for a newcomer: look around, figure out what instances you like, and shift away from lemmy.world to an instance that requires a sign-up request and which comports with your values. There is an account migration feature to make this as easy as possible.
It’s different to what people are used to, but in my experience a huge number of the worst people migrating from reddit went straight to one of the open instances. A lot of them were banned over there for quite legitimate reasons.
They know that they can’t operate their own asshole instances for long because they’ll get defederated, and they don’t want to deal with being known to an admin who has actual principles, so open sign up is their thing, and those instances are filling up with them.
Honestly I would like to see a feature that flags if a user’s instance has open sign up.
It’s getting to the point that if someone is still on an open instance, they’re a little sus to me. It’s easier to trust people who come from instances whose policies I agree with.
I mean I joined lemmy.world in the migration from Reddit and haven’t really seen any problems with being here. I tried joining one of the ones that needed a sign up request when I first switched to Lemmy but I didn’t want to have to deal with waiting to use Lemmy. I haven’t really noticed any problems being on lemmy.world and personally I don’t even look at what instances people are from. I just treat it like reddit, we’re all using Lemmy at the end of the day.
Well, maybe you don’t get into the kinds of discussions I do, or our values are different. It seems like particularly when I say anything advocating for minorities it attracts a slew of reactionaries who are persistent and impossible to reason with, and two of the places I’ve noticed they tend to come from are lemmy.world and sh.itjust.works, both largeish instances with open sign up. I haven’t noticed any particularly reactionary instances apart from the tankie ones.
This is probably the reason a number of instances have defedded from .world, so you are probably getting more of those kinds of people, and less of the people who would object to that kind of hatred, whether you’ve noticed it or not.
And I’ve noticed this problem seems worse here than it was on reddit, but I’ve realised it makes sense because the more vocal people are the ones more likely to leave or get booted from reddit, so of course we get them here, and of course the ones who have covert ideologies tend to go for open sign up.
Personally I prefer to be on an instance that I know is roughly aligned with my values so I know I won’t have to make the case to my admins that hate is bad and should be moderated out.
Maybe what I said should’ve been more neutrally stated, but it is just my opinion.
I mean I notice people like that but I never really pay attention to what instance they’re from. They’re usually a minority in any post I see anyways though and are being downvoted a bunch. Most of the time I see lots of fairly progressive people or at worst people who were supporting Biden unconditionally and trying to call anyone who pointed out any problems with him bots. Maybe it’s cause I’m still using Lemmy mostly like I used reddit and treating it as one unified platform instead of a bunch of smaller connected ones. But personally I’d prefer being on an instance that allows me to connect to as many other instances as possible, cause I don’t want some admin team telling me who I can and can’t interact with. I’d much rather just pick communities I like no matter what instance they’re on, and trust the communities more with the moderation. I’d rather handle blocking instances and communities myself rather than leaving it in the hands of admins that could power trip. Again maybe that’s just my mindset from Reddit, personally I’ve enjoyed Lemmy so far and haven’t really noticed any problems on world.
Right well the only issue with world in that case is that other instances defed from it, so you do have admins telling you what you’re not allowed to see, but you’re unaware of it because you’re cut off from them.
Like I said, I tend to attract that sort of person just by saying things they don’t like and I’ve noticed a pattern. Some other instances have noticed that pattern too which is why they defedded.
Bro… What?!? I’ve only been here a day and I have no clue what any of that means lol
Lemmy isn’t one service like Reddit. It’s a piece of software where anybody can run their own lemmy instance. Lemmy.world is the most popular, but there are many others. And those choosing to run an instance can “federate” with other instances, which means as a user you can see posts and comments from the other instance even though you are logged into the one you have an account on.
So the commenter is recommending you look at posts or comments from users on other instances that have more stringent sign up policies, and migrate your account there. Since your account is new, you likely don’t need to spend the effort on migrating your account and instead can just set up an account on another instance/server.
But it’s also fine to stay on lemmy.world. Just be respectful, voice your opinions like you would in person with other humans, and you’ll be fine. And if you’re just here for the memes, that’s ok too! Enjoy them! And welcome to lemmy.
Hey, thanks for the detailed explanation! That certainly helps, but it’ll probably take me a while to fully get it. I signed up using voyager and it didn’t tell me anything like that. I’m sure it’ll make more sense as I get used to it. So can I not see all posts from other instances?
Yeah, there are many instances, and many that have purposefully been defederated by lemmy.world. Often for good reason (CSAM, an abundance of spam accounts, violent or hateful rhetoric, etc). But generally lemmy.world and its federated instances are pretty great.
Ok, sounds like I’ll just stick with .world for a while until I get my “sea legs”
I think this guy is going to end up on dbzer0 once he gets his sea legs. Of note, the piracy communities over there will be some of the few things you can’t access from .world.
As an option, I’d recommend lemmy.today
It’s an instance that stays away from all the drama and just federates with everyone unless this becomes a moderation problem.
Wanna look at general stuff on lemmy.world? Sure! Check tankies on lemmy.ml or lemmygrad or hexbear? Alright, who are we to stop you? Wanna porn? lemmynsfw got you covered! And literally anything else on the Lemmyverse is open to you.
If you ever decide you want to branch out, you can try the instance / community browser at lemmyverse.net to check out what’s out there.
I will say that although it technically doesn’t matter what instance you sign up with, sometimes the descriptions aren’t very descriptive at all. Definitely give an instance a browse, to get a feel for the overall vibe, before you sign up.
You can check to see if an instance has been defederated from / by other instances, by entering said instance address at defed.xyz
It’s just so hard keeping up with this hell that we’re living in lately. Like, I’m just trying to keep my job and pay my bills. I have a good job, but I am constantly concerned that I won’t have it at any point in time. The constant stress isn’t doing me any favors. I’m lucky that I’ve been frugal and saved a lot, but I don’t consider that enough to settle my fears.
Nah, it’s good. No pressure. These are just resources that I kinda had to stumble across, although I’m sure they’re in some super accessible place and I’m just a ding dong. Figured if they could help another person, then so be it.
It’s hard times right now, you do what you gotta do to keep on keeping on. Lemmy.world is a completely fine choice. They’re a large, mainstream place to get comfortable in. But, if you get bored one day and want to check out what else is out there, you’ve got something to get you started.
I wish you luck, some free time, and some peace of mind.
You can see posts from every instance that your instance has federated with. For example, I, on toast.ooo, can see posts on lemmy.world, lemmy.dbzer0.com, and sh.itjust.works because my instance federates with them.
You can’t see posts from instances that your instance has defederated with, though, nor can you see posts from instances that have defederated with yours. Think of it like cutting one of those thick undersea cables that connect the internet across continents.
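To make that rule concrete, here is a rough sketch of it in Python. It is only a toy model under my own assumptions: `DEFEDERATED`, `can_see`, and the `blocked.example` instance are made-up names for illustration, not anything from Lemmy's actual code, and real federation has more moving parts (allowlists, where the community is hosted, and so on).

```python
# Toy model of federation visibility. Each pair is (blocker, blocked).
# The pair below is hypothetical and only exists for illustration.
DEFEDERATED = {
    ("lemmy.world", "blocked.example"),
}


def can_see(viewer_instance: str, post_instance: str,
            defederated: set[tuple[str, str]] = DEFEDERATED) -> bool:
    """Return True if a user on viewer_instance can see posts from post_instance."""
    if viewer_instance == post_instance:
        return True  # you can always see posts on your own instance
    # A block in either direction cuts the link between the two instances.
    blocked_by_mine = (viewer_instance, post_instance) in defederated
    blocked_by_theirs = (post_instance, viewer_instance) in defederated
    return not (blocked_by_mine or blocked_by_theirs)
```

With that made-up block list, `can_see("toast.ooo", "lemmy.world")` is true, while a lemmy.world user and a blocked.example user cannot see each other's posts in either direction.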
There’s a lot to consider when picking an instance; lemmy.world is a good default, so that’s probably why Voyager directed you to it, but don’t be afraid to switch to another instance if you think it’ll serve you better!
Thanks! You all have been very helpful in my understanding of the Lemmy world!
It’s a bit like email, if that helps you understand it. If you use Gmail but a friend uses Yahoo, you can still both email each other.
The others gave you a decent rundown.
I’m certainly not meaning to imply that you did anything wrong signing up with .world, it’s just something to be aware of. This is actually the first time I’ve made this suggestion, I honestly don’t know how most people feel about this, so actually maybe it was a bit much to dump on a newcomer. If so I apologise.
One thing I forgot about was that being on .world means you do miss out on a lot of piracy related stuff if you’re into that.
Also though, you can read about a given instance and its policies and values when you visit it. That often says a lot about the kinds of people you’ll meet there.
Ignore them. Enjoy yourself. If you’re interested in moving to a different instance later once you learn more about what that means then go for it. There are tools to help you and there’s “no karma” so there’s no reason to not. But there’s no rush to do so.
Don’t worry about it. It’s like Linux enthusiasts talking about distros.
If you get to the point where it matters to you, you’ll look into it then. I’ve been here for more than a year and still haven’t bothered to hop servers.
You know people can just lie though, right? It’s not like that’s the one magical thing that would “fix Lemmy” or something lol.
People won’t usually go to that effort just to troll when there are open instances available, and anyone with closed sign up will be quicker to ban someone who turns out to have lied about the kind of person they are, rather than these giant open instances that don’t seem to give a shit.
And yes, I know it won’t ‘“fix Lemmy” or something lol’, I never said it would. I said it was a feature I would like to see.
Thanks for the info. I’ll stay here for a while and see how everything goes.
I don’t mind assholes as I think that’s just a part of freedom of speech. And I’d rather not get too much moderated content as I think it creates too much of a filter bubble.
And me, hello!
Welcome!
And my vuvuzela?
Hi!
👋👋 :)
Uppies for all of you!
Welcome new lemmings!
Thanks
Welcome to our shithole.
Federated shithole(s)
More akin to a rabbit-hole, due to that.
But who said rabbits don’t shit in their holes?
Oh! And the soil is transparent.
Anti Commercial-AI license CC BY-NC-SA 4.0
Thanks. :)
I got tired of the censorship and blatant disrespect for the end user. Also justiceserved and the constant spam messages from the mods there; I’ve never been a member of that community and I just wanted them to stop harassing me. They called me a nazi and some other stuff for participating in the mandela effect subreddit. Lots of quacks there, but really now, a nazi?
Edit: I mean it’s deeper than that, but they were very hateful and reddit muted me for 3 days over…nothing? They even actively sought out my username on social media and attacked me there through private messages and fake accounts, and when I brought this to reddit’s attention they muted me.
Google just enshittifying even harder. Reddit results in Google searches are often old and anemic these days.
I used to want Reddit threads to show up in search results. Now I avoid them because they are so often a waste of time. More reason to use Duck Duck Go.
I saw Reddit results in a search last night using DDG. It just said something like “It’s here on Reddit, but we’re not allowed to show you.” I wasn’t planning on using Reddit (never again), but that just irritated me.
as a DDG user can confirm. if you search for something and a reddit result comes up it literally will just say “reddit.com” in the description. so it’s a crap shoot if you click on it or not.
All that being said you CAN filter out if reddit results even show up in your search engine of choice using this extension: github.com/iorate/uBlacklist
addons.mozilla.org/en-US/…/g-search-filter/
Install this and exclude it from all search results.
This one works better: addons.mozilla.org/en-US/firefox/addon/hohser/ - more supported sites, and it doesn’t break as often.
Thanks! Will give it a try.
Why not change your search engine and set up a SearX instance? You can find all instances here: searx.space. For example, I have set it up like this: https://search.inetol.net/search?q=%s&category_general=1&language=en&time_range=&safesearch=0&theme=simple, and it works wonders. Results are still mostly from Google, or you can configure it to be whatever you want.
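For anyone wondering what the `%s` in that URL does: the browser swaps it for whatever you type in the address bar. A rough Python sketch of the idea, using the instance URL from above. `build_search_url` is just a name I made up, and browsers may encode spaces slightly differently (`%20` instead of `+`), so treat this as an approximation:

```python
from urllib.parse import quote_plus

# Custom search engine template; %s is the placeholder for the query.
TEMPLATE = (
    "https://search.inetol.net/search?q=%s&category_general=1"
    "&language=en&time_range=&safesearch=0&theme=simple"
)


def build_search_url(query: str, template: str = TEMPLATE) -> str:
    """Fill the %s placeholder with a URL-encoded query string."""
    return template.replace("%s", quote_plus(query))
```

To use a different instance from searx.space you would only need to change the host part of the template.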
Mainly because it’s easier to set up a browser extension. Does SearXNG let you hide sites and rank sites higher in the results?
Also you’d really want to use SearXNG… The original SearX is dead.
These are a SearXNG instance, I think they’re aware of those facts there. You can set it up to not search or search on specific pages, you can also have multiple profiles which is useful when searching for different things. It also combines multiple web search engines… I haven’t dived into the configuration that much because what’s most important to me is that it doesn’t show ads, but I assume it’s easier to install the extension, although you don’t set this up every day.
am gonna exclude reddit
IMO, another good reason to not use Google!
Reddit responded: “Only google pays us”. The content is not yours. You built this off a naive user base that just wanted to share, and now these fuckers are taking it as their entitlement. As an early reddit user - fuck that place, I’m still angry.
Legally speaking, the content is theirs.
No, I don’t think so. Just because you put a clause in ToS doesn’t make it legally binding and most precedent is in favor of the original copyright owner.
I’d love to see the precedent, if you don’t mind.
Nonsense.
If someone posts a copyright violation on YouTube, YouTube can go free under the safe harbor provisions of the DMCA. (In the US.) YouTube just points a finger at the user and says “it’s their fault”, because the user owns (or claims to own) the content. YouTube is just hosting it.
I don’t know of any reason to think it’s not the same for written works. User posts them, Reddit hosts them, user still owns them. Like YouTube, the user gives the host a lot of license for that content, so that they can technically copy and transmit it. But ultimately the user owns it. I assume by the time Reddit made the AI deal they probably put in wording to include “selling a copy of the data” among the activities they want covered in the TOS.
Now, determining if the TOS holds up in court is of course trickier. And did they even make us click our permission away again after they added it, or just change something we already clicked? I don’t recall.
Usually any hosting platform has some kind of wording to the tune of “you give us permanent and unrestricted right to use your content however we want”. Copyright is still yours, but you can’t use it against the platform. Applies to social networks, YouTube, Flickr, anything I can think of.
should fight in court that it’s not reddit’s content. it belongs to the people not steve fuck face.
I’m sure the reddit TOS you agreed to during signup says otherwise…
Been on Reddit since like 2009-ish. You completely nailed the point.
Ok so they are earning on our data
You just described every company
I work for a different sort of company that hosts some publicly available user generated content. And honestly the crawlers can be a serious engineering cost for us, and supporting them is simply not part of our product offering.
I can see how reddit users might have different expectations. But I just wanted to offer a perspective. (I’m not saying it’s the right or best path.)
Can you use something like the DDOS filter to prevent AI automated scrapings (too many requests per second)?
I’m not a tech person so probably don’t even know what I’m talking about.
I worked with a company that used product data from competitors (you can debate the morals of it, but everyone is doing it). Their crawlers were set up so that each new line of requests came from a new IP… I don’t recall the name of the service, and it was not that many unique IPs, but it did allow their crawlers to live unhindered…
They didn’t do IP banning for the same reason, but they did notice one of their competitors did not alter their IP when scraping them. If they had malicious intent, they could have changed data around for that IP only. Eg. increasing the prices, or decreasing the prices so they had bad data…
I’d imagine companies like OpenAI have many times the IPs, and they’d be able to do something similar… meaning if you try to ban IPs, you might hit real users as well… which would be unfortunate.
We have a variety of tactics and are always adding more
Blocking bots is hard, because with some work they can be made to look like users, down to simulating curved mouse movements from one button to the next if you are really ambitious.
So you’re saying reddit’s activity analytics can’t necessarily tell the difference between human activity and bot activity?
So the actual number of people using reddit vs bots isn’t very clear. Someone should tell Reddit’s shareholders that there’s no way to tell if the advertisements are actually being viewed by people, and there’s no way to tell how much the activity reports have been inflated by bots. I bet they wouldn’t like that very much.
Always has been. Technically the server sees no difference in what a browser does vs what a bot does: Downloading files and submitting requests.
Honestly? I’d be happy to not see their trash in any search engine I use.
Hot take here.
I do believe in free information.
Instead of investing money in stopping crawlers, why not make the data they are trying to crawl available to everyone for free so we can have a better world all together?
Data transfer isn’t free. It costs real money and energy to respond to queries. Don’t be surprised to see ~50% of all requests made to your server be from bots which you may have no interest in servicing outside of search engine indexers.
If you publish your data in a friendly manner bots would have no need to crawl your site.
Data that is more interesting and requested a lot could even be served over p2p.
This model would cost less than dealing with constant bot scrapers.
It is not a technical discussion. Or a discussion about associated cost. It’s a discussion about morals and economic models.
They’re also blocking posts by users who aren’t banned or even got a warning. It appears to the user as though it’s been posted, but it hasn’t.
Shadowbanning? Do you have more info on this?
They’ve done this for a long time. It’s supposedly only meant to be used on bots, but in practice it definitely isn’t
It definitely is in practice 100%
I didn’t know there was a name for it, I don’t have any more info on it, but I can show examples of it happening.
shadowbanning is a totally different issue that’s existed for a long time though.
I’m kind of curious to understand how they’re blocking other search engines. I was under the impression that search engines just viewed the same pages we do to search through, and the only way to ‘hide’ things from them was to not have them publicly available. Is this something that other search engines could choose to circumvent if they decided to?
Search engine crawlers identify themselves (via user agents), so they can be stopped both by an honor-based system (robots.txt) and by active blocking (error 403 or similar) when they attempt a request.
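A minimal sketch of both mechanisms in Python, in case it helps. The robots.txt content below is hypothetical (I have not copied Reddit's real file), and `BLOCKED_AGENTS` and `server_status_for` are names invented for illustration. The only real API used is `urllib.robotparser` from the standard library.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt: one crawler is allowed, everyone else is asked to stay out.
ROBOTS_TXT = """\
User-agent: Googlebot
Allow: /

User-agent: *
Disallow: /
"""


def crawler_may_fetch(user_agent: str, url: str, robots_txt: str = ROBOTS_TXT) -> bool:
    """Honor system: a polite crawler runs this check on itself before fetching."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(user_agent, url)


# Active blocking: the server refuses requests based on the User-Agent header.
# This list is an assumption made up for the example.
BLOCKED_AGENTS = ("bingbot", "duckduckbot")


def server_status_for(user_agent_header: str) -> int:
    """Return the HTTP status a server with this block list would send."""
    ua = user_agent_header.lower()
    return 403 if any(name in ua for name in BLOCKED_AGENTS) else 200
```

The first function only works if the crawler chooses to obey it. The second is enforced by the server, but it still trusts whatever the client puts in its User-Agent header.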
Thank you, I understand better now. So in theory, if one of the other search engines chose to not have their crawler identify itself, it would be more difficult for them to be blocked.
This is where you get into the whole webscraping debate you also have with LLM “datasets”.
If you, as a website host, are detecting a ton of requests coming from a singular IP you can block said address. There are ways around that by making the requests from different IP addresses, but there are other ways to detect that too!
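Roughly what that per-IP detection could look like, as a sliding-window counter in Python. This is a sketch under my own assumptions: `SlidingWindowLimiter` and its thresholds are invented for the example, and a real site would likely do this at the proxy or CDN layer instead.

```python
from collections import defaultdict, deque


class SlidingWindowLimiter:
    """Allow at most max_requests per IP within any window_seconds span."""

    def __init__(self, max_requests: int, window_seconds: float) -> None:
        self.max_requests = max_requests
        self.window_seconds = window_seconds
        self._hits: dict[str, deque[float]] = defaultdict(deque)

    def allow(self, ip: str, now: float) -> bool:
        """Record a request at time `now` (in seconds) and say whether to serve it."""
        hits = self._hits[ip]
        # Drop timestamps that have fallen out of the window.
        while hits and now - hits[0] >= self.window_seconds:
            hits.popleft()
        if len(hits) >= self.max_requests:
            return False  # too many recent requests from this address
        hits.append(now)
        return True
```

Because the counter is keyed by IP, a crawler that spreads its requests over many addresses stays under the limit on each one, which is the workaround mentioned above.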
I’m not sure if Reddit would try to sue Microsoft or DDG if they started serving results anyway through such methods. I don’t believe it is explicitly disallowed.
But if you were hoping to deal in any way with Reddit in the future I doubt a move like this would get you in their good graces.
All that is to say; I won’t visit Reddit at all anymore now that their results won’t even show up when I search for something. This is a terrible move and will likely fracture the internet even more as other websites may look to replicate this additional source of revenue.
It’s still possible to search with “site:reddit.com …”
Has it been implemented yet or are they blocking non-flagged searches? Which seems odd.
You shouldn’t be getting any new results if you do that, older posts will/may remain indexed.
Aha. I was wondering about that possibility.
Net neutrality?
I don’t have any more info on it, but I can prove it