Can Watermarking Help Find GPL Violations?
bitkid writes "I recently ran across techniques that can be used to watermark program code. While I have yet to see any source code for this to play with, the authors claim that the watermarks can be introduced into the source code and can be found in the compiled executable. My question for the Slashdot crowd is: do you think free software (GPL or other viral licenses) should be watermarked? This could help to find GPL violations (think Everybuddy or Linksys) or could be used in court someday against the next SCO to prove authorship. What might be the ramifications of this?"
Useful, but easy to get around. (Score:5, Insightful)
Re:Useful, but easy to get around. (Score:5, Insightful)
Furthermore, you could automate the process by writing a script to do things like randomising white space, replacing variable names, and even rewriting simple flow control constructs.
I would suggest that if it is deemed important to be able to establish the originator of the code, then the originator should publish it as theirs as soon as it is written, or at least give it to an independent witness for safekeeping.
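To make the "write a script" point concrete, here is a minimal sketch of such a scrambler, assuming C-like source. The keyword list is deliberately tiny and the function name `scramble` is made up for illustration; a real tool would need a proper parser so that it does not touch string literals or library calls.

```python
import re

# Tiny, incomplete keyword list (an assumption for this sketch only).
C_KEYWORDS = {"int", "char", "return", "if", "else", "for", "while", "void", "const"}

def scramble(source: str) -> str:
    """Rename identifiers to v0, v1, ... and collapse all whitespace."""
    names = {}

    def rename(match):
        word = match.group(0)
        if word in C_KEYWORDS:
            return word
        # The same original name always maps to the same new name.
        return names.setdefault(word, f"v{len(names)}")

    renamed = re.sub(r"\b[A-Za-z_]\w*\b", rename, source)
    return re.sub(r"\s+", " ", renamed).strip()

original = "int   total = count +  1;\nreturn total;"
print(scramble(original))  # int v0 = v1 + 1; return v0;
```

Any watermark carried by naming or layout would not survive this, which is the weakness the comment describes.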
Re:Useful, but easy to get around. (Score:3, Interesting)
Re:Useful, but easy to get around. (Score:3, Informative)
Of course registering every cvs checkin is going to get expensive
Re:Useful, but easy to get around. (Score:2)
Excellent advice, but it doesn't work for those of us outside the US. Here in the UK (and AFAIK the rest of the EU) copyright resides with the creator, but there is no place of registration.
The registration road is one that the US followed many years ago, [copyright.gov] and it provides an excellent degree of legal protection. But the history of copyright law in the UK has tended towards an "If you wrote it then it's yours, now can you prove
Re:Useful, but easy to get around. (Score:2)
I believe that only applies to *printed* works.
Re:Useful, but easy to get around. (Score:3, Insightful)
But the history of copyright law in the UK has tended towards an "If you wrote it then it's yours, now can you prove it?" model, and proving it is a difficult thing to do nowadays if you publish an original work on the net.
Here we have no national registry of claims of copyright as a matter of public record. Maybe the EU should move towards this model (although I deeply distrust any approach that would actually require registration).
I disagree. If you have created something, you usually have kept the
More on accepting code for storage (Score:2)
Re:More on accepting code for storage (Score:2)
Some organizations (like say SCO) might not want to release their code to an outside party.
I, too, will admit to being embarrassed about some old code I've written.
Re:Useful, but easy to get around. (Score:2)
No legal value (Score:2)
Re:Such an organization already exists (Score:2, Informative)
The UK doesn't have a Copyright Office. Neither does Australia. I guess they'd have to use their traditional notary channels.
UK Method (Score:4, Informative)
Later when Parasitesoft tries to claim you stole it from them, the solicitor can produce this as legally acceptable evidence of its date of existence.
Re:UK Method (Score:2)
Re:UK Method (Score:3, Informative)
I do this with my own writing (that is, I post it to myself) so I have the means to prove creation date should it ever become an issue.
Re:UK Method (Score:2)
There are already (and have been for ages) court cases that were lost by trying this method. There are also cases where someone trying the "post mark method" tried to forge his evidence.
And if you ask a lawyer, his first move will be to get the "post mark method" out of your mind.
angel'o'sphere
Re:Useful, but easy to get around. (Score:5, Informative)
Those are things that cannot be seen in the resulting executable, whereas the watermark is claimed to be detectable even there. (Yes, I know in some cases variable names can be visible in the executable, but you can easily prevent that.) I somehow doubt this watermarking is possible at all. With optimizing compilers it is hard to find any resemblance between source and executable. Finally, knowing how the watermarks are made, it is probably easy to write a similar algorithm that removes the watermark.
Re:Useful, but easy to get around. (Score:2)
If you read the PDF you will see that it is still detectable in the executable. That is because the watermark isn't actually in the executable directly, it is placed in the data structures that the executable creates/uses.
I do agree with your other point though. If you know about the process then I can't imagine any way they can prevent you from writing a program ca
Re:Useful, but easy to get around. (Score:2)
I somehow doubt this watermarking is at all possible.
const char* watermark = "This is mine";
This should work, but is trivial to remove if you have the source. It might be less trivial if you don't and have only decompiled something, which is what the linked article discusses.
So while watermarking can work for closed source I don't think it can work for opensource software, if the copyright infringers have any clue at all. It's likely to be known that the ugly kludge in foo.c is actually a trick to get the string
Re:Useful, but easy to get around. (Score:2)
Trivial even when compiled, just have to care more about the lengths of strings.
$ sed -e 's/Free Software Foundation/SCO Group. All rights reserved/' /usr/local/bin/bash > /usr/local/bin/scosh
$ chmod +x /usr/local/bin/scosh
$ scosh --version
GNU bash, version 2.05b.0(1)-release (i386-portbld-freebsd5.1)
Copyright (C) 2002 SCO
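The "care more about the lengths of strings" remark can be sketched as follows: an in-place patch has to keep every byte offset in the binary unchanged, so the replacement is padded to the original length. This is only an illustration on a made-up byte string, not on a real bash binary, and `patch_same_length` is a hypothetical helper name.

```python
def patch_same_length(blob: bytes, old: bytes, new: bytes) -> bytes:
    """Replace `old` with `new`, NUL-padding so that no offset shifts."""
    if len(new) > len(old):
        raise ValueError("replacement is longer than the original string")
    padded = new + b"\x00" * (len(old) - len(new))
    return blob.replace(old, padded)

# Stand-in for an executable image (placeholder data, not a real ELF file).
blob = b"\x7fELF...Copyright (C) 2002 Free Software Foundation\x00...rest"
patched = patch_same_length(blob, b"Free Software Foundation", b"SCO")
print(len(patched) == len(blob))  # True
```

A plain string constant in a binary is therefore no harder to strip than one in the source.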
Re:Useful, but easy to get around. (Score:2)
Of course that can be changed also, even far enough to "obscure" the origin. But changing the inheritance graph of code is not that easy
the problem with watermarking, IMHO, is that as soon as the schema is known it can be "removed" or "destroyed".
angel'o'sphere
Re:Useful, but easy to get around. (Score:2)
Not easy -- story submitter is confused (Score:5, Interesting)
The approaches they're talking about are done at the compilation phase or post-compilation on Java bytecode.
It's *extremely* difficult to produce good, reliable watermarks, because different compilers will build software differently, as will different optimization options.
I'd essentially say that source-based watermarks are a lost cause (at least with C, and with the current constraints of readability and simplicity on code).
A much better approach would be a project that does fuzzy comparisons on binaries, and is somewhat aware of ELF. Basically, you'd have a program that would have a set of known GPL code (a compiled Linux system would work well) and compare it to a set of compiled code.
This is still not perfect if the person is malicious and just tries using a different compiler. This has happened before with xvid and use of icc. However, there aren't *too* many compilers out there.
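A very rough sketch of what a "fuzzy comparison on binaries" could look like, assuming nothing more than raw byte n-grams and a Jaccard score. It is not ELF-aware, and the function names and the n-gram size are my own choices; a real tool would compare disassembled functions rather than raw bytes, because addresses and padding differ between builds.

```python
def ngrams(data: bytes, n: int = 4) -> set:
    """All overlapping byte sequences of length n."""
    return {data[i:i + n] for i in range(len(data) - n + 1)}

def similarity(a: bytes, b: bytes, n: int = 4) -> float:
    """Jaccard similarity of the two n-gram sets (0.0 to 1.0)."""
    ga, gb = ngrams(a, n), ngrams(b, n)
    if not ga and not gb:
        return 0.0
    return len(ga & gb) / len(ga | gb)

known = bytes(range(64))           # stand-in for known GPL object code
unrelated = bytes(range(64, 128))  # stand-in for unrelated object code
print(similarity(known, known))      # 1.0
print(similarity(known, unrelated))  # 0.0
```

Scores somewhere in between are where a human would have to start looking at the disassembly.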
Hmm...this is an interesting problem.
A more interesting approach that just occurs to me now -- in general, the proportions of compiled code should be roughly the same, independent of compiler -- adding padding, etc. Generate a call graph of the function tree in a set of GPL code. Then your checker would do fuzzy matching on chunks of that call graph against the suspicious code. It'd take a bit of massaging. It'd also still need some manual looking at the target once identified. However, this should be able to run in a pretty automated manner (even if it takes a long time to run) and could potentially turn up some interesting goodies. It'd certainly discourage commercial folks from ripping off GPL-using authors and companies.
Try taking a Windows system with a lot of installed (non-GPL) software and a Linux system with a lot of (GPL) installed software. Start a comparison running. See what turns up.
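The call-graph idea could be sketched like this, assuming the graphs have already been extracted from the binaries (that extraction is the hard part and is not shown). Function names are ignored, since a stripped binary will not have them; each function is reduced to its fan-out plus the fan-outs of its callees. The signature scheme and all names below are assumptions made for illustration.

```python
from collections import Counter

def signatures(graph: dict) -> Counter:
    """Name-free shape of each function: its fan-out and its callees' fan-outs."""
    sigs = Counter()
    for callees in graph.values():
        child_fanout = tuple(sorted(len(graph.get(c, [])) for c in callees))
        sigs[(len(callees), child_fanout)] += 1
    return sigs

def graph_similarity(g1: dict, g2: dict) -> float:
    """Multiset overlap of the two signature collections (0.0 to 1.0)."""
    s1, s2 = signatures(g1), signatures(g2)
    shared = sum((s1 & s2).values())
    total = sum((s1 | s2).values())
    return shared / total if total else 0.0

gpl = {"main": ["parse", "run"], "parse": ["read"],
       "run": ["read", "emit"], "read": [], "emit": []}
# Same structure with the names stripped, as in a suspicious binary.
suspect = {"f1": ["f2", "f3"], "f2": ["f4"],
           "f3": ["f4", "f5"], "f4": [], "f5": []}
print(graph_similarity(gpl, suspect))  # 1.0
```

Inlining and dead-code elimination will change fan-outs, so a real checker would need something far more tolerant than exact signature matches.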
Re:Not easy -- story submitter is confused (Score:5, Informative)
Even with compiler optimizations and processor specific instructions AND EVEN different compilers, you can actually find and detect "similar HLL code" (there's a tool called DATING that can do that - contact me for a copy, it's hard to find - and the name is a pun on IDA's FLIRT abilities). I don't know about different CPUs, but I guess it would be resource hungry, and I don't know of a tool that can catch those for now. Anyway, try having a look at the VMWARE binaries - win32/linux - with it; you'd probably be surprised.
blah, dunno what i wanted to say next it's late here... ~<:(
Re:Not easy -- story submitter is confused (Score:3)
[shrug] If you post a link to a source archive, I expect more than a few people would be interested. I think it would have to go well beyond FLIRT to be useful for this problem, though.
I just took a look at FL
Re:Not easy -- story submitter is confused (Score:2, Funny)
And also an excellent name if you want people to NEVER be able to find it using Google.
Re:Not easy -- story submitter is confused (Score:2)
Why? How do you define false positives?
Re:Not easy -- story submitter is confused (Score:2)
A tool like this couldn't distinguish between positives and non-positives alone. But it could isolate code to be reviewed by authors -- if it could email the GNOME folks and say "80KB of very similar looking code is in Adobe Photoshop", it'd let people start poking at it. Given a bit of poking at disassembly, it's not that hard to see whether the code's been swiped.
And I'm not sure what you'd mean by "if it *does* become legal"...surely there'd be no legal problems wit
Re:Not easy -- story submitter is confused (Score:2)
Hard to be certain. In order to operate, it would have to create a temporary copy of the program being analysed in memory. This may or may not constitute a copyright infringement, depending on your particular jurisdiction and how the judge / jury / evil fascist dictator (delete as applicable) is feeling on that particular day.
Many countries have specific prov
Why become like them? (Score:2)
Are we no better than the big conglomerations where we can't trust anyone and are filled with fear and dread of all the abuse that _MIGHT_ happen?!?
I just ask. It's up to you to answer..
Re:Useful, but easy to get around. (Score:2)
Erm... isn't the history of digital watermarking pretty dismal so far? I mean, the RIAA tried to watermark music but Ed Felten and gang pretty much demonstrated the futility of that (and pretty quickly, too).
Are software binaries really so different that watermarking would work for it?
Re:Useful, but easy to get around. (Score:2)
This is not getting around, this is the legal way of doing it without violating the GPL. Reusing the code directly under non-GPL licenses is forbidden, but writing code that implements the same algorithm is not. Unless, of course, you have a software patent on that...
Er.... (Score:2)
The only way it would be a violation is if you could prove it was a derivative work, and for that there'd have to be at least some line of code the same... having *functionally equivalent* lines of code != derivative work. If that were the case then encyclopedias would have been suing each other since the beginning of time for publishing "functionally equivalent" information.
Beware the flipside (Score:5, Insightful)
Re:Beware the flipside (Score:2)
Thanks!
Re:Beware the flipside (Score:4, Insightful)
Re:Beware the flipside (Score:2)
Actually, there would be nothing wrong with this. If it's good for GPL'd software, it's good for all software to protect IP.
The real trick is what if some non-GPL code is watermarked, but in fact watermarked by the real author (in other words they "borrowed" the code and then watermarked it)?
Re:Beware the flipside (Score:2)
Re:Beware the flipside (Score:2)
Because only open source developers should be able to protect their IP.
I think not (Score:4, Insightful)
If the trademark stuff gets too hectic, then maybe this will be needed, but for now I don't think it's needed.
Re:I think not (Score:5, Insightful)
If?
Can I have directions to your hole, I'd like to live there too.
Re:I think not (Score:2)
We don't know what AOL is and we don't know what DMCA is. What more could you ask?
Re:I think not (Score:2)
If people object to this sort of thing in commercial products, then they can't very well turn around and advocate it for this and still have a moral leg to stand on.
Re: (Score:2)
The ramifications. (Score:5, Funny)
It might cause the sky to fall down on our heads, or the atmosphere to evaporate, killing us all with solar radiation.
Re:The ramifications. (Score:2)
Watermark? (Score:2, Insightful)
Re:Watermark? (Score:5, Funny)
we are talking about a bunch of 1s and 0s here. If it can be watermarked, it can be unwatermarked. A simple script will be able to rearrange stuff to disrupt the watermark without affecting the execution of the program.
Yes, a bit like how it's easy to reconstruct a burned down house from its ashes.
Re:Watermark? (Score:2)
On the other hand, digital watermarking is hardly http://lzip.sourceforge.net territory!
Watermarking can't be lossy, since it can't affect operation. Any non-lossy translation, however clever, can be detranslated?
Haven't digital watermarks and obfuscation etc. got a really bad press recently? There was that DMCA case with the professor not allowed to say how he cracked watermarks, iirc.
Re:Watermark? (Score:2, Informative)
I envisage a 'watermarker' as being some program you run your app through and it records a signature, which you can treat as a 'fingerprint'. You can then run that watermarked program through a checker, and it will tell you how close (100%) the match is?
There are commercial programs which translate binary applications from one instruction set at a time, sometimes as a simulator, sometimes outputting a compiled program.
A program is just a flowchart
Re:Watermark? (Score:2)
Before you discount their techniques, you should probably read the papers. I have a sneaking suspicion that these people, who have spent a great deal of time researching this topic, may know some things you don't.
Re:Watermark? (Score:5, Informative)
More info can be found in this [ucla.edu] paper, if you're into reading that sort of thing.
Re:Watermark? (Score:2)
Just an extra step (Score:4, Interesting)
I can still see this being useful if blatant copying of the software is the biggest problem the project faces; however, I'm having trouble envisioning a scenario where that's the case.
details about watermarking techniques (Score:5, Informative)
On his website [arizona.edu] I found his full article, if you want some details about watermarking techniques. It has a lot more meat than the presentation slides.
Re:details about watermarking techniques (Score:4, Informative)
as usual (Score:5, Insightful)
The main idea is that you embed the watermark into the code and then obfuscate it. The resulting code is unreadable, otherwise watermark would be trivial to remove, which makes it absolutely useless as far as open source is concerned.
Re:as usual (Score:5, Informative)
Re:as usual (Score:2)
Re:as usual (Score:2)
If you can think of a source-code watermarking scheme which can't be trivially defeated with search-and-replace, I'll concede the point.
Unreadable == useless? (Score:2)
Um, surprising as it may sound, I have looked at some open source code, you know, and some bits of it could reasonably be described as, you know, just a tad "unreadable". So there's nothing to be lost here.
Re: (Score:2, Insightful)
...and NOT open source (Score:2)
Furthermore, they are not talking about techniques that you could use if the "attacker" had access to the source code. (See the full paper, linked to in a comment above.)
This would work about as well for open source software as adding easter eggs (which they also discuss). From my perspective, this is a fine paper but easter eggs are still a lot more fun to write.
-- MarkusQ
its for java and its binary watermark, not source (Score:3, Interesting)
So,
1) only protects binaries not source
2) it's for Java, which is easier due to the canonical form (bytecodes) that can be manipulated by the watermarking tool. You could probably do this to protect GPL binaries but with less portability
IMHO, not useful for source, but sure, if you're worried that some of your precompiled binaries are being ripped, then maybe.
For source, you need to detect common code patterns and use source tools that have been discussed elsewhere on
MOD UP PARENT (Score:2)
Does it really matter??? (Score:5, Insightful)
So Intel is working on a product and they just swipe up the code out of the book, never ask for permission or anything, and use it in a commercial product (VTune). The author of the code, of course, was furious. He approached Intel. They blew him off. He had reverse engineered their code. He could produce an exact replica of the binary with his own code using the MS C compiler.
He never got anything out of Intel. I suppose he could have hired attorneys, but he wasn't a wealthy guy. He couldn't find attorneys to take it without cash up front. So my question is: How do watermarks help him? I mean the guy could put the binaries side-by-side, and there was no question, it was his code.
Your code is as protected as the lawyer you can afford...
Reply: Does it really matter??? No ... (Score:2)
As for all the copyright and patent stuff, let them protect their shit, it is their rig
Re:Does it really matter??? (Score:2)
Your code is protected as long as you are willing to protect it. Too often people seem to think that they will NEED a lawyer to do a lawsuit. If this guy had just followed up in court, there was a darn good chance that he could have gotten a settlement.
The legal process may seem complicated but when it comes down to it, a judge will listen to a regular Joe just as intently as he would a lawyer. If you are too intimidated to go in front of a judge, you have already lost. Don't let some lame-ass attorney
Re:Does it really matter??? (Score:2)
If he's given up on trying to pursue it himself he could put the code under a GPL license and transfer the copyright to the Free Software Foundation. Let them pursue it. The case should be a slam-dunk if you have a copy of the book that pre-dates VTune.
-
Re:Does it really matter??? (Score:2)
A programmer could easily hold up a copy of your book, and say "I learnt how to do this kind of thing from this book. It says to do it this way, so I did". This would be a fairly convincing argument.
I'm not saying that Intel were right to take the code, but rather that:
a) the lawyer was probably right not to take the case without an advance payment. I'd say there was at best a 30% chance of winni
Re:Does it really matter??? (Score:2)
Pointless. (Score:2, Interesting)
I like the idea behind it but I don't think it's the answer. It would be easier and more applicable to have a 3rd party database that held published coding rather than having to graph and mark my work everytime I released etc... this way I have it (1) in the public domain and (2) have a published reference f
Re: Pointless. (Score:3, Insightful)
> Does this not go against what open source is all about? It's open code given without the extremes of ownership like water/copy/trade-marks.
No, all GPL'd code is supposed to be copyrighted. The GPL just grants the user certain rights that are not normally available under copyright law.
Look at the headers in the source for some of the GPL'd programs on your system, or visit the FSF Web site and see what is recommended for those headers.
No, it can't (Score:5, Insightful)
Isn't the code itself a watermark? Sure, you can change things here and there, but ultimately the similarities are going to be far too much to be pure coincidence.
The purpose of digital watermarking seems to be to identify unique instances of the thing being watermarked. So if I have a copy of Britney Spears' album, it's obviously copyrighted by her record company. With watermarking I can get more specific, and see that it was burned from a CD which was sold to Bob Jones. With the GPL this isn't useful. Sure, the code might have been derived from a copy sold to Bob Jones, but he may have legally made a million copies and distributed them around the globe before the GPL was violated, by someone else. You can't control the watermarks, because you can't control the distribution.
Re:No, it can't (Score:2)
A lot of code is very generic. Much code could easily be reimplemented as a new implementation that produces almost identical object code just by starting with the same specification and compiling with the same compiler.
How then do you tell if the code was copied?
(Actually, the answer's easier than watermarking - use a non-obvi
RTFA (Score:2, Informative)
Is this a big problem? (Score:3, Interesting)
If it's happening all the time and this is a method to slow the progress of it, then I don't see a huge issue with it. But if it is a once in a while type of thing, then how could this have anything but a negative impact on GPL? The potential is there (reality could tell a different story) for people to shy away from it, worrying that they haven't quite got all their ducks in a row. If it's easy to automatically scan their code and say they're in violation, well then what? I guess what I'm trying to say is that it could be mishandled, thus treating the users of GPL code like they're potentially thieves. It strikes me that one of the compelling factors of GPL is their reliance on the honor system. Whatever you do, don't play games that can damage that bright point of GPL.
Maybe I'm looking at this the wrong way. I suppose it could be used to defend against an accusation not unlike what SCO has claimed. "You copied our code!" "No, we used GPL'd code, see?" In that case, my previous comment about disrupting GPL's trust might not be as likely. "Well, we're just doing it so that this sort of thing doesn't happen again." I can see people nodding their head in agreement in that case.
In short, it's one thing to do it if your aim is to defend yourself from SCO'esque accusations, it's another to use it to look for victims to sue. Whatever is implemented, be very careful about damaging GPL's image to the community that values it.
Re:Is this a big problem? (Score:2)
Sorta like if MS could prevent piracy then more people would turn to the free stuff.
Not possible with open source (Score:3, Interesting)
Read the presentation. Although complete sentences aren't exactly present, there seems to be the indication that access to the source can provide an attack on the watermarking scheme: well, duh, if it's open source just modify the source to eliminate the watermark.
But what's the likelihood a lazy company/individual will actually do this before violating the GPL? Probably slim, but more of the world seems to be going GPL anyway; and if the whole world did GPL, why would you need watermarks?
Point is: if the monopolies of the world insist on using GPL code without releasing the source, they'll expend the effort to remove the watermark.
Re: (Score:2)
using watermarks for tracking (Score:2)
personally, as the lead developer of a large and significant (though niche) libre software project, my interest in watermarking is not to prevent illegal copying but merely to trace copying. i have thought recently about embedding serial numbers in executables. nothing would check them, providing little incentive for hackers to remove them, but they would allow me to learn who redistributed the program and on what scale. perhaps.
The question is SHOULD it be done? (Score:2, Insightful)
Code comes naturally watermarked, just use what is (Score:2)
The nice thing about this approach is you can wait until you suspect someone of stealing before you even bother thinking about the issue.
Oh, and in response to someone who asked if GPL v
Coffee Marking (Score:2, Funny)
Heck, I've lost two keyboards to spilled coffee so far this year...
Seems pointless... (Score:3)
Not worth the trouble. (Score:2)
The reason is that once the nature of such a watermark is known, all currently published schemes can be easily removed. Proving publicly that one piece of code was stolen is enough for that.
In addition, depending on the language and compiler used, finding a watermark can be extremely difficult. Just think of different levels of optimization, different compiler versions and different libraries used. The often propos
How does this help GPL? (Score:5, Informative)
What is the essence of this watermarking technique?:
- For embedding copyright information into individual
- It modifies compiled Java bytecode, shuffling eight bytecode operators in targeted "dummy" class methods. The shuffling is able to encode only three bits per operation, so watermarks need to be short or dummy methods need to be large.
- It relies on the watermarked dummy method(s) appearing in stolen (decompiled/recompiled)
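The "three bits per operation" figure follows from having eight interchangeable operators: picking one of eight carries log2(8) = 3 bits. Below is a minimal sketch of that encoding step only. The `OPS` table is my own illustrative choice of eight JVM integer opcodes, and the paper's actual table and bytecode rewriting may well differ; no class file is touched here.

```python
# Eight operators -> each choice encodes 3 bits (table order is an assumption).
OPS = ["iadd", "isub", "imul", "idiv", "irem", "iand", "ior", "ixor"]

def embed(mark: bytes) -> list:
    """Turn watermark bytes into a sequence of operators, 3 bits each."""
    bits = "".join(f"{byte:08b}" for byte in mark)
    bits += "0" * (-len(bits) % 3)  # pad to a multiple of 3 bits
    return [OPS[int(bits[i:i + 3], 2)] for i in range(0, len(bits), 3)]

def extract(ops: list, length: int) -> bytes:
    """Recover `length` watermark bytes from the operator sequence."""
    bits = "".join(f"{OPS.index(op):03b}" for op in ops)
    return bytes(int(bits[i:i + 8], 2) for i in range(0, length * 8, 8))

ops = embed(b"GPL")
print(len(ops))         # 8 operators for a 3-byte (24-bit) mark
print(extract(ops, 3))  # b'GPL'
```

This also shows why the dummy methods grow with the watermark: every extra byte needs roughly three more operators to hide in.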
What are its downfalls?:
- The technique is specific to Java. Forget about using it for other languages which output platform-specific machine code binaries, although it might be possible to modify it for use in
- If an intelligent thief (or smart optimizing compiler) is able to detect the always-false condition used to shield the dummy method(s) the watermark(s) will be removed.
- The larger your watermark, the larger you need to make your dummy method(s), or you need to embed more of them. The larger you make your dummy methods, the more obvious it will be that there's something strange about them.
- Optimizing compilers could still destroy the modified operators used to form the watermarks.
The paper also claims it protected more
How does this technique benefit GPL? I'm not sure that it would. Even if the above problems were fixed:
- To submit "source code" for your protected
- It's really designed to embed personal/corporate copyrights into code, protecting the IP of the submitter not the GPL community. I suppose the GPL community could design a community-wide watermark policy, but then that would become public knowledge and so thieves would be aware of its existence and be inclined to search harder to remove it.
You missed the point of Free Software (Score:4, Insightful)
You missed the point of Free Software. Ignoring some of the antics of zealous fringe, the idea of "Free Software" isn't to be a separate-but-equal analogue to proprietary software. The point of Free Software is freedom, not surveillance. Too many advocates for Free Software say their contributions are free, but act as proprietary masters with their obsession over owning, controlling and regulating the software.
It saddens me to see people advocating watermarking Free Software. Next they'll want a "FSSA" analogue to the BSA and their brownshirts.
ubiquitous GPL code == BAD? (Score:4, Interesting)
Ok, assume a corporation CAN successfully steal GPL code, with or without watermark. Let's say M$ paints an IE browser look on top of the mozilla firebird codebase:
So aside from ethical issues, why should the GPL community really care?
Re:ubiquitous GPL code == BAD? (Score:2)
Absolutely. The GPL community has to get its story straight.
If the Open Source way of producing software is self-evidently going to produce better results, as we are repeatedly told, it doesn't actually matter much if a company nicks a bit and hides it in a proprietary product, as, by doing this, according to OS logic, they will immediately start to fall behind the OS fork of the code.
If that's what we believe, who cares about copyleft violation? If it isn't what we believe, can we please change our propa
We need a copyright registry. FSF? (Score:2)
It would be great if the Free Software Foundation would create a copyright registry. Anyone would be able to upload any file and get back an MD5 sum and a digital time stamp.
The U.S. Copyright Office copyright registry is too expensive! It costs at least $20, it is necessary to fill forms, mail by snail mail, it takes weeks to get acknowledgement, and it is not private!
I suggest that the cost be $1. Pay a minimum of $10 by credit card, and have credit for 10 uploads of 20 megabytes or less.
With
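The "upload a file, get back an MD5 sum and a time stamp" step might look like the sketch below. The function name and the receipt layout are hypothetical. I have added a SHA-256 digest next to the MD5 the comment asks for, because MD5 is known to be vulnerable to collisions. Note that an unsigned time stamp like this proves nothing by itself; a real registry would have to sign the receipt with its own key.

```python
import hashlib
from datetime import datetime, timezone

def register(data: bytes) -> dict:
    """Return a receipt with digests of the upload and the current UTC time."""
    return {
        "md5": hashlib.md5(data).hexdigest(),
        "sha256": hashlib.sha256(data).hexdigest(),
        "registered_at": datetime.now(timezone.utc).isoformat(),
    }

receipt = register(b"int main(void) { return 0; }\n")
print(receipt["md5"], receipt["registered_at"])
```

Only the digest needs to be kept by the registry, so the upload itself could stay private.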
Re:WTF?! (Score:2)
Re:WTF?! (Score:2)
Re:WTF?! (Score:2)
Either you are an idiot, or you just were in such a rush to make the 3rd post you forgot to read the article or even think about the concept of watermarking.
It would be trivial to put a watermark in source code, you just have to develop the program so that the watermark was essential to the program's operation.
Re:WTF?! (Score:2)
Unless you are talking about obfuscating the code so that it is incomprehensible to most... in which case, why bother with open source at all?
Re:WTF?! (Score:2)
even if you did have the code as with gpl software, it wouldn't be totally trivial to remove the watermark, because it wouldn't be immediately obvious which code contributed to the watermarking.
Re:WTF?! (Score:2)
I had read the article, but the post to which I was responding made reference to "source code", which is what I addressed.
People who write code that can't be easily understood by
You can't? (Score:2)
they all require secrecy, until the time of need. (lawsuit)
simplest form would be to insert extra characters to the text based on a set formula.. i.e. after every 49th "A" insert a space.. and after every 273rd "e" insert a tilde
most people will take it for a typo..
yet if you can show the consistency, you might be able to defend it..
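A small sketch of the "after every 49th A insert a space" formula, with a matching check. The function names are made up, and this only covers a single rule; as other posters point out, anyone who knows the formula can strip or reflow the marks.

```python
def embed_marks(text: str, letter: str = "A", every: int = 49, mark: str = " ") -> str:
    """Insert `mark` after every `every`-th occurrence of `letter`."""
    out, seen = [], 0
    for ch in text:
        out.append(ch)
        if ch == letter:
            seen += 1
            if seen % every == 0:
                out.append(mark)
    return "".join(out)

def count_marks(text: str, letter: str = "A", every: int = 49, mark: str = " ") -> int:
    """Count the positions where the formula's mark is actually present."""
    hits, seen = 0, 0
    for i, ch in enumerate(text):
        if ch == letter:
            seen += 1
            if seen % every == 0 and text[i + 1:i + 2] == mark:
                hits += 1
    return hits

plain = "A" * 100
marked = embed_marks(plain)
print(count_marks(plain), count_marks(marked))  # 0 2
```

The consistency the poster wants to show in court is exactly the gap between those two counts.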
Re:Tottaly offtopic...somewhat. (Score:2)
Celebrity Obsession (Score:2)
For instance, if an expensive football player goes to the hairdresser, that is front-page news on all the tabloid newspapers.
I sympathise with your complaint that outsiders criticise the "wrong things" about the USA