Data centers contain 90% crap data (gerrymcgovern.com)
from fantawurstwasser@feddit.org to technology@lemmy.world on 07 Apr 2025 12:10
https://feddit.org/post/10394330

#technology

ptz@dubvee.org on 07 Apr 2025 12:16

Checks out, at least in my case.

I self-host my email and pretty much every other cloud service I’d otherwise be using. My Gmail account is literally a spam catcher address, so everything there and elsewhere I haven’t already deleted is 100% crap.

Vortieum@sopuli.xyz on 07 Apr 2025 12:31

Solutions?

nyan@lemmy.cafe on 07 Apr 2025 12:43

Massive deduplication across all accounts on all servers of image, audio, and video data would theoretically be possible, but ain’t gonna happen. Or we could just discourage people from posting cat videos and bad memes (even less likely to happen).

lemmyng@lemmy.ca on 07 Apr 2025 13:12

I would argue that duplication of content is a feature, not a bug. It adds resilience, and is explicitly built into systems like CDNs, git, and blockchain (yes I know, blockchains suck at being useful, but nevertheless the point is that duplication of data is intentional and serves a purpose).

nyan@lemmy.cafe on 07 Apr 2025 13:27

If the data has value, then yes, duplication is a good thing up to a point. The thesis is that only 10% of the data has value, though, and therefore duplicating the other 90% is a waste of resources.

The real problem is figuring out which 10% of the data has value, which may be more obvious in some cases than others.

muntedcrocodile@lemm.ee on 07 Apr 2025 13:28

Technically git is a blockchain

futatorius@lemm.ee on 09 Apr 2025 10:04

explicitly built into systems like CDNs, git, and blockchain

Git only duplicates blobs; textual content is generally stored as deltas (look at git repack for more details). And it’s bad practice to version-control blobs: the more correct approach is to control the source from which the blob is generated.

CDNs don’t all work alike so it’s impossible to generalize. I won’t comment on blockchain, since in my work as a developer and architect, I’ve never encountered a valid use case for it.

lemmyng@lemmy.ca on 09 Apr 2025 11:08

You’re missing the forest for the trees here.

Given identical client setups, two clones of a git repo are identical. That’s duplication, and it’s an intentional feature to allow concurrent development.

A CDN works by replicating content in various locations. Anycast is then used to deliver the content from any one of those locations, which couldn’t be done reliably without content duplication.

Blockchains work by checking new blocks against previous blocks. In order to fully guarantee the validity of a block you need to guarantee every block, going back to the beginning of the chain. This is why each root node on a chain needs a full local copy of it. Duplication.

My point is that we have a lot of processes that rely on full or partial duplication of data, for several purposes: concurrency, faster content delivery, verification, etc. Duplicated data is a feature, not a bug.
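The blockchain point above (a block is only trustworthy if every earlier block checks out) can be illustrated with a toy hash chain. This is a minimal sketch, not how any particular blockchain is implemented: the block layout and the names `block_hash`, `append_block`, and `verify_chain` are made up for illustration, and SHA-256 over JSON is an assumed encoding.

```python
import hashlib
import json

GENESIS_PREV = "0" * 64  # assumed placeholder for "no previous block"


def block_hash(block: dict) -> str:
    """Hash a block's full contents, including its link to the previous block."""
    encoded = json.dumps(block, sort_keys=True).encode()
    return hashlib.sha256(encoded).hexdigest()


def append_block(chain: list, payload: str) -> None:
    """Add a block that records the hash of the block before it."""
    prev = block_hash(chain[-1]) if chain else GENESIS_PREV
    chain.append({"index": len(chain), "prev_hash": prev, "payload": payload})


def verify_chain(chain: list) -> bool:
    """Walk from the first block forward, checking every link."""
    expected_prev = GENESIS_PREV
    for block in chain:
        if block["prev_hash"] != expected_prev:
            return False
        expected_prev = block_hash(block)
    return True


chain = []
for text in ["genesis", "second", "third"]:
    append_block(chain, text)

print(verify_chain(chain))
```

Because `verify_chain` has to read every block from the start, a node that wants to validate independently needs its own complete copy, which is the duplication being described.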

Brkdncr@lemmy.world on 07 Apr 2025 14:52

Deduplication is trivial when applied at the block level, as long as the data is not encrypted, or is encrypted at rest by the storage system.
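As a rough illustration of block-level deduplication, the sketch below splits data into fixed-size blocks and keeps each distinct block once, keyed by its hash. The `BlockStore` class, the 4096-byte `BLOCK_SIZE`, and the file names are assumptions for the example; real storage systems differ in block sizing, hashing, and metadata handling.

```python
import hashlib

BLOCK_SIZE = 4096  # assumed fixed block size; real systems vary


class BlockStore:
    """Toy content-addressed store: each unique block is kept only once."""

    def __init__(self):
        self.blocks = {}  # digest -> block bytes
        self.files = {}   # file name -> ordered list of digests

    def put(self, name: str, data: bytes) -> None:
        digests = []
        for offset in range(0, len(data), BLOCK_SIZE):
            block = data[offset:offset + BLOCK_SIZE]
            digest = hashlib.sha256(block).hexdigest()
            self.blocks.setdefault(digest, block)  # store only if unseen
            digests.append(digest)
        self.files[name] = digests

    def get(self, name: str) -> bytes:
        return b"".join(self.blocks[d] for d in self.files[name])

    def stored_bytes(self) -> int:
        return sum(len(b) for b in self.blocks.values())


store = BlockStore()
payload = b"A" * BLOCK_SIZE + b"B" * BLOCK_SIZE
store.put("cat1.jpg", payload)
store.put("cat1_copy.jpg", payload)  # second copy adds no new blocks

print(store.stored_bytes())
```

The encryption caveat follows from the same mechanism: if each client encrypted the payload with its own key before upload, the two copies would generally produce different bytes and therefore different digests, so nothing would be shared.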

nyan@lemmy.cafe on 07 Apr 2025 17:30

If the storage all belongs to one machine, yes. If it’s spread across multiple machines with similar setups that share a LAN, then you need to put in a little thought to make sure that there’s only one copy for all machines, but it’s still doable.

In this case, we’re talking millions of machines with different owners, OSs, network security setups, etc. that are only connected across the Internet. The logistics are enough to make a hardened sysadmin blanch.

acosmichippo@lemmy.world on 07 Apr 2025 13:09

Charge customers more for long-term data storage. Allow short-term storage for free.

Fluffy_Ruffs@lemmy.world on 07 Apr 2025 14:00

How do you differentiate old from new? I can just create a fresh copy of whatever I’m storing and it’ll look new.

Flagstaff@programming.dev on 07 Apr 2025 15:47

If the files are exact copies, then MD5 checks will catch them; tweaking so many files just to bypass this could prove to be too tedious of a process for people to bother exploiting it.

However, people could create scripts for others to mass-download, -edit, and -upload their files accordingly to reduce this tedium.
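A quick sketch of both halves of this: an exact copy hashes to the same MD5 digest, while changing even one byte gives a different digest. The sample bytes and the helper name `md5_hex` are made up for illustration.

```python
import hashlib


def md5_hex(data: bytes) -> str:
    """Return the MD5 digest of the data as a hex string."""
    return hashlib.md5(data).hexdigest()


original = b"blurry cat meme bytes"
exact_copy = bytes(original)
tweaked = original + b"\x00"  # one extra byte is enough to change the digest

print(md5_hex(original) == md5_hex(exact_copy))  # True
print(md5_hex(original) == md5_hex(tweaked))     # False
```

So a digest check only catches byte-for-byte duplicates; any script that appends or alters a byte gets past it.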

futatorius@lemm.ee on 09 Apr 2025 10:02

then MD5 checks will catch them

That can be trivially defeated.

Flagstaff@programming.dev on 09 Apr 2025 12:35

I know; I just gave an example of how immediately after in the same comment lol.

acosmichippo@lemmy.world on 07 Apr 2025 15:59

It doesn’t matter; strict enforcement is not the point. We’re talking about reducing “crap data”, which is data people don’t care about long-term. If you care enough about the data to copy it manually, more power to you. If you don’t care that much, you’ll let it get purged, which is the goal.
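A retention rule along these lines could be sketched as follows. Everything here is hypothetical: the 90-day `FREE_RETENTION` window, the `purge_expired` helper, and the file names are invented to show the idea, not taken from any real provider.

```python
from datetime import datetime, timedelta

FREE_RETENTION = timedelta(days=90)  # hypothetical free short-term window


def purge_expired(objects: dict, now: datetime, paid: set) -> dict:
    """Keep objects that are paid for, or that were stored within the free window."""
    return {
        name: stored_at
        for name, stored_at in objects.items()
        if name in paid or now - stored_at <= FREE_RETENTION
    }


now = datetime(2025, 4, 7)
objects = {
    "tax_return.pdf": datetime(2020, 1, 1),  # old, but the owner pays to keep it
    "old_meme.jpg": datetime(2021, 6, 1),    # old and unpaid: purged
    "new_meme.jpg": datetime(2025, 3, 1),    # recent: still inside the free window
}
kept = purge_expired(objects, now, paid={"tax_return.pdf"})

print(sorted(kept))
```

Re-uploading a fresh copy resets `stored_at`, which is the manual effort described above: data someone bothers to renew survives, and the rest ages out.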

vane@lemmy.world on 07 Apr 2025 13:24

sudo rm -rf /data

mr_jaaay@lemmy.ml on 07 Apr 2025 17:23

I’m imagining Data from Star Trek being deleted…

Captain, this is most illogical.

TacticalCheddar@lemm.ee on 07 Apr 2025 16:50

We fully transition to clean energy like nuclear and build more power plants to allow us to store our online stuff.

The author of this article is not a serious person. He’s in the same bucket as Greta Thunberg. They just like to scream and blame people instead of providing practical solutions. It’s frankly tiring to hear them despite their honorable intentions.

kkj@lemmy.dbzer0.com on 07 Apr 2025 16:58

Thunberg’s solution has always been “listen to the experts who have been screaming at you for 50 years.” You don’t have to be an expert to care about things or to want to listen to people who are experts.

TacticalCheddar@lemm.ee on 07 Apr 2025 17:13

“listen to the experts who have been screaming at you for 50 years.”

That would be fine provided that it’s done correctly and civilized. Which is my point. Raising awareness is fine. Throwing insults loudly left and right to raise awareness is not. It only makes you seem delusional and sheds a bad light on your cause. This allows climate change deniers to take advantage of that to further their agenda.

kkj@lemmy.dbzer0.com on 07 Apr 2025 17:49

People have tried to politely call attention to the climate crisis for decades. They were ignored. Sometimes, you have to be chaotic to get noticed. See also: Stonewall, the Black Panthers.

futatorius@lemm.ee on 09 Apr 2025 09:58

That would be fine provided that it’s done correctly and civilized.

Tone-policing is never a good look. If you’re opposed to something, just admit it.

partial_accumen@lemmy.world on 07 Apr 2025 17:02

He’s in the same bucket as Greta Thunberg. They just like to scream and blame people instead of providing practical solutions.

Greta Thunberg is 22 years old right now, and was “screaming” and “blaming people” when she was 11 years old.

She saw the world she was going to inherit and forced conversation to work toward solutions. Expecting an 11 year old to provide answers that none of the established world has is silly.

TacticalCheddar@lemm.ee on 07 Apr 2025 17:26

Greta Thunberg is 22 years old right now, and was “screaming” and “blaming people” when she was 11 years old.

Expecting an 11 year old to provide answers that none of the established world has is silly.

Fully agreed.

She saw the world she was going to inherit and forced conversation to work toward solutions

I disagree. I saw her speak and the reactions to some of her speeches. Her inflammatory and derogatory speeches did nothing more than help opponents of the energy transition. To give you an example, when asked about it during an interview, Putin jumped at the opportunity to discredit the energy transition. While the public saw Greta behaving like a petulant child during the speech, they then saw Putin speaking calmly, asking real questions like “How are poor nations going to transition when they need cheap fossils to sustain themselves?”. They then take this bit and plaster it on every social media site. People see it and are inclined to take Putin’s side since he appears more knowledgeable and in control of himself. And just like that he gets a boost in his reputation.

This is why I don’t like activists like her and the author of this article. They do more harm than good by expressing themselves in such a violent manner.

partial_accumen@lemmy.world on 08 Apr 2025 02:52

Wait, you think Putin has credibility when speaking on climate change? The late Sen. John McCain described Russia as “a gas station masquerading as a country”. Putin’s life and livelihood depend on the world’s continued, unchecked consumption of fossil fuels. Putin has zero credibility on the subject. Why would anyone consider him an objective source?

While the public saw Greta behaving like a petulant child during the speech

You and I must have seen different speeches. Part of Thunberg’s appeal was her eloquence in speech, especially when speaking truth to power. Here’s part of 16-year-old Greta Thunberg’s speech at the UN:

"The popular idea of cutting our emissions in half in 10 years only gives us a 50% chance of staying below 1.5 degrees [Celsius], and the risk of setting off irreversible chain reactions beyond human control.

"Fifty percent may be acceptable to you. But those numbers do not include tipping points, most feedback loops, additional warming hidden by toxic air pollution or the aspects of equity and climate justice. They also rely on my generation sucking hundreds of billions of tons of your CO2 out of the air with technologies that barely exist.

“So a 50% risk is simply not acceptable to us — we who have to live with the consequences.”

source

I can’t imagine a world where you’re calling that “petulant”. At 16 years old she had more poise and gravitas than many of the world leaders she was speaking to. You say she hasn’t done anything. I beg to differ. Further, if what she has done is nothing, it raises the obvious question: what have you done to avert climate catastrophe?

TacticalCheddar@lemm.ee on 08 Apr 2025 03:10

Wait, you think Putin has credibility when speaking on climate change?

I never said that. I said that to the general public he may appear more credible than Greta, based on his behavior when addressing the issue, which would undermine environmental efforts. This is what I’m concerned about.

I can’t imagine a world where you’re calling that “petulant”. At 16 years old she had more poise and gravitas than many of the world leaders she was speaking to. You say she hasn’t done anything. I beg to differ. Further, if what she has done is nothing, it raises the obvious question: what have you done to avert climate catastrophe?

Ok, usually I welcome debates, but I can see that you were very angry when you wrote this so I think it’s better that we stop here. Before finishing off, I would like to clarify one more time that I’m not a climate change denier. I just don’t like the way Greta and others like her operate since it’s my belief that their aggressiveness undermines the movement and increases the risk of turning people into reactionaries. That’s my opinion. I’m entitled to it just as much as you’re entitled to your own.

partial_accumen@lemmy.world on 08 Apr 2025 04:41

I can see that you were very angry when you wrote this

I’m not angry. I’m shocked at your position though. I see your position as dismissive of someone who is actually doing something about the crisis she will inherit with the tiny fractional power she had before adulthood. She, and her generation, have no time for a timid approach. We’re going to be long dead and she’ll still be here trying to live through the mess we, and our parents, have caused her and everyone else her age.

so I think it’s better that we stop here.

That’s fine. I don’t see a path to anything that would yield productive conversation from here.

partial_accumen@lemmy.world on 07 Apr 2025 17:04

Solutions?

Carbon tax.

In this micro example, imagine if you could access all of your data for free when there was abundant sunshine (carbon-free), or had to pay for carbon-based energy at night. You’d start to sort your data for what you really wanted so that you’d only be paying a small amount for a small amount of data.

HubertManne@piefed.social on 07 Apr 2025 17:04

I don't see one unless our society becomes less dependent on bullshit and honors privacy. I don't know about anyone else but I constantly bullshit specifics about myself online to dirty up any data collected on me.

sugar_in_your_tea@sh.itjust.works on 08 Apr 2025 04:52

That depends on the problem.

I disagree w/ the author that storing blurry cat memes is what’s “destroying our environment.” Transportation is our biggest net polluter in terms of CO2, which is higher than all electrical generation combined. If we want to solve CO2 emissions, we have to solve transportation, since that’s the 500 pound gorilla in the room.

If we look specifically at datacenters, storage makes up a tiny fraction of the overall energy use. That article mentions that datacenters probably have a similar CO2 footprint as the aviation industry, which makes up about 2.5% of the world’s carbon emissions, or about 10% of the total transportation emissions from the above link.
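As a quick sanity check on those two figures (taking the 2.5% and 10% numbers quoted above at face value, not as independently verified data), the transportation share they imply can be worked out directly:

```python
# Figures as quoted in the comment above; treated here as given.
aviation_share_of_world = 0.025     # aviation: about 2.5% of world emissions
aviation_share_of_transport = 0.10  # aviation: about 10% of transportation emissions

# If aviation is 10% of transportation and 2.5% of the world total,
# transportation as a whole is 2.5% / 10% of the world total.
transport_share_of_world = aviation_share_of_world / aviation_share_of_transport

print(f"{transport_share_of_world:.0%}")  # 25%
```

On those numbers, data centers (similar to aviation) sit at roughly a tenth of the transportation total.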

If the goal is to fix climate change, data centers are pretty far down the list in terms of priorities. Higher priorities are, roughly in this order:

  1. ground transportation - electrify or switch to something like hydrogen
  2. electrical power generation - this will directly reduce the impact of data centers, be part of 1, and solve a number of other issues
  3. residential heating - switch from fossil fuels to heat pumps for heating, which should be a relatively “drop-in” replacement and could save customers money
  4. industry - largely solved by 2, but there may need to be some shifts in certain types of production processes to reduce emissions

Changing anything about data centers is way down the list of priorities, and it’ll be largely solved by something much higher up. So it’s really the wrong target to attack.

Neon@lemmy.world on 09 Apr 2025 11:08

You forget the production and disposal stage of datacenters which are the biggest polluters.

sugar_in_your_tea@sh.itjust.works on 09 Apr 2025 11:14

Sure, but how does that compare to all the plastic crap people buy? Or electronic waste from consumer goods? Businesses keeping offices open when WFH is a thing?

I haven’t looked up the supply chain stats here, but I imagine it’s also relatively small potatoes when compared to other 500 pound gorillas in the room.

We should certainly deal with it, but it should be much lower priority than the larger sources of pollution.

CHKMRK@programming.dev on 09 Apr 2025 12:56

How is that different from producing and disposing a modern car? Those things are essentially large computers with wheels and a combustion engine

nyan@lemmy.cafe on 07 Apr 2025 12:39

Sturgeon’s Law in action again.

Skullgrid@lemmy.world on 07 Apr 2025 13:09

1980s-2000s : the information age

2000s-present : the data age.

Information implies it’s correct, data implies it can be anything, true or false.

HubertManne@piefed.social on 07 Apr 2025 16:59

The aughts were not bad, but things were declining, and once we got into the teens, ugh. Oh, and an old-man thing: the pre-WWW internet was advertisement-free, which was awesome.

Skullgrid@lemmy.world on 07 Apr 2025 17:01

Sure. The cutoff can be somewhere around there, and the start can be earlier too.

0x0@lemmy.zip on 07 Apr 2025 15:18

You’ll pry my kitten pictures from my cold dead hands!

Endymion_Mallorn@kbin.melroy.org on 09 Apr 2025 04:20

Yes, but 90% of everything is crap. Why should we expect data centers to be any different?
