I’m extremely wary of any law that can be used to censor or otherwise remove material online, but one gripe I have with the Techdirt article is their assertion that hash matching is expensive or difficult.
Generating a SHA hash of an image when it is uploaded is very inexpensive in terms of processing, and there’s already going to be a db somewhere that stores the image metadata, so it’s not like putting the hash there is hard. Similarly, a SQL or NoSQL lookup for a known hash is simple and non-intensive.
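To make that concrete, here is a minimal sketch of exact-match hashing at upload time. The in-memory set stands in for whatever db the site already has, and the names `known_hashes`, `report`, and `is_blocked` are made up for illustration, not taken from any real moderation tool.

```python
import hashlib

# Hypothetical blocklist of hex digests for files that were reported and removed.
# A real site would keep this in its existing database, not in memory.
known_hashes: set[str] = set()


def sha256_hex(data: bytes) -> str:
    """Return the SHA-256 digest of the raw file bytes as a hex string."""
    return hashlib.sha256(data).hexdigest()


def report(data: bytes) -> None:
    """Record the digest of a file that must not be re-uploaded."""
    known_hashes.add(sha256_hex(data))


def is_blocked(upload: bytes) -> bool:
    """Check a new upload against the blocklist (exact byte-for-byte match only)."""
    return sha256_hex(upload) in known_hashes


# Placeholder bytes standing in for an image file.
reported = b"\x89PNG placeholder image bytes for illustration"
report(reported)
```

The lookup is a single set (or indexed column) membership test, so the cost per upload is dominated by reading the file once to hash it.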
The real issues are the lack of an appeal mechanism, the lack of any penalty for false reports or legal mechanism for ignoring them (which should probably be based on spam or volume of requests, rather than single requests), and the lack of a definition of what exactly a site must do to show good-faith, reasonable compliance.
ryannathans@aussie.zone
on 20 Dec 05:04
(5) “visual depiction” includes undeveloped film and videotape, data stored on computer disk or by electronic means which is capable of conversion into a visual image, and data which is capable of conversion into a visual image that has been transmitted by any means, whether or not stored in a permanent format;
Change one bit, now we have a brand new hash
Depends on “how identical” is “identical”.
The SHA hash of a file is easy to calculate, but pretty much useless at detecting similar images; change a single bit, and the SHA hash changes.
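A quick way to see this, as a sketch with made-up bytes standing in for an image file (the helper names are hypothetical):

```python
import hashlib


def flip_lowest_bit(data: bytes) -> bytes:
    """Return a copy of data with the lowest bit of its last byte flipped."""
    return data[:-1] + bytes([data[-1] ^ 0x01])


def differing_bits(a: bytes, b: bytes) -> int:
    """Count the bit positions where two equal-length byte strings differ."""
    return sum(bin(x ^ y).count("1") for x, y in zip(a, b))


# Placeholder content; any file's bytes would behave the same way.
original = bytes(range(256)) * 4
modified = flip_lowest_bit(original)

digest_original = hashlib.sha256(original).digest()
digest_modified = hashlib.sha256(modified).digest()

print(differing_bits(original, modified))                # input differs by one bit
print(differing_bits(digest_original, digest_modified))  # digest differs in many bits
print(digest_original == digest_modified)
```

The two files are visually indistinguishable, but an exact-match lookup treats them as unrelated.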
In order to detect similar content, you need perceptual hashes, which are no longer that easy to calculate.
Why “no longer”?
Because of the “perceptual” part.
A normal hash has the property that it produces wildly different hashes for even the tiniest of changes in the file.
Perceptual hashing flips that requirement on its head, and therefore makes finding a suitable hash function much harder.
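For illustration, here is a sketch of one of the simplest perceptual hashes, an average hash, written from scratch rather than taken from any particular library. It assumes the image has already been converted to grayscale and shrunk to a small grid (8x8 here); that resizing step is omitted, and the sample "images" are synthetic.

```python
def average_hash(pixels: list[list[int]]) -> int:
    """One bit per pixel: 1 if the pixel is brighter than the mean, else 0."""
    flat = [p for row in pixels for p in row]
    mean = sum(flat) / len(flat)
    bits = 0
    for p in flat:
        bits = (bits << 1) | (1 if p > mean else 0)
    return bits


def hamming(a: int, b: int) -> int:
    """Number of differing bits; small distance means perceptually similar."""
    return bin(a ^ b).count("1")


# Synthetic 8x8 grayscale image: dark left half, bright right half.
image = [[20] * 4 + [220] * 4 for _ in range(8)]
# Same picture, slightly brightened: every byte changes, the structure does not.
brighter = [[min(255, p + 10) for p in row] for row in image]
# A genuinely different picture: the inverse.
inverted = [[255 - p for p in row] for row in image]

print(hamming(average_hash(image), average_hash(brighter)))  # 0
print(hamming(average_hash(image), average_hash(inverted)))  # 64
```

Matching then means "distance below some threshold" instead of "equal", and picking that threshold is where false positives and false negatives come in. A hash this simple also does not survive crops or rotations, which is part of why finding a suitable function is hard.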
Oh, the way I read it, it seemed like they were saying perceptual hashes used to be easier to calculate.
The problem lies in what is a “depiction”:
section 2256(5) of title 18
uscode.house.gov/view.xhtml?req=(title:18 section…
via: section 1309 of the Consolidated Appropriations Act, 2022 (15 U.S.C. 6851).
uscode.house.gov/view.xhtml?req=(title:15 section…
via: the definitions section of the act
www.congress.gov/bill/118th-congress/…/text#idE94…
The way it is written, even cropped, rotated, blurred, or otherwise processed files of that “depiction”, even the values learned by a neural network (capable of conversion into a visual image), would fall under the “identical” part.
Since perceptual hashing does exist, there are open source libraries to run it, and even Beehaw runs an AI-based image filter, the “reasonable effort” is arguably to use all those tools as the bare minimum, even if they sometimes (or always) fail at removing all instances of a depiction.
But ultimately, deciding whether a service has applied all “reasonable efforts” to remove “identical copies” of a “depiction” will fall on the shoulders of a judge… and even starting to go there can bankrupt most sites.