GNU is Not Unix Software

Can Watermarking Help Find GPL Violations?

bitkid writes "I recently ran across techniques that can be used to watermark program code. While I have yet to see some source code for this to play with, the authors claim that the watermarks can be introduced into the source code and can be found in the compiled executable. My question for the Slashdot crowd is: Do you think free software (GPL or other viral licenses) should be watermarked? This could help to find GPL violations (think Everybuddy or Linksys) or could be used in court someday against the next SCO to prove authorship. What might be the ramifications of this?"

  • This would be useful to prove that code is under the GPL, but this could be simply gotten around by just looking at the code, then rewriting it yourself. But, of course, this will take time and money, something big businesses hate to spend. Still, the technology is useful.
    • by floydigus ( 415917 ) on Sunday October 26, 2003 @04:42PM (#7314800)
      Absolutely right.

      Furthermore, you could automate the process by writing a script to do things like randomising white space, replacing variable names, and even rewriting simple flow control constructs.

      I would suggest that if it is deemed important to be able to establish the originator of the code, then the originator should publish it as theirs as soon as it is written, or at least give it to an independent witness for safekeeping.
      • You may be on to something... create an organization that accepts code and stores it with a datestamp forevermore. No need for random-access hard disk, just archive the material to tapes, CDs, or DVDs and properly maintain them. If your ownership of the code is ever called into question, you can obtain 3rd party proof that the code was on that time period's record, proving you had the code as of that time.
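A minimal sketch of what such an archive could store per deposit. It assumes the archive keeps a SHA-256 digest and a timestamp rather than the code itself (a swapped-in simplification, not something the comment proposes), and the names `deposit_record` and `matches` are hypothetical.

```python
import hashlib
import json
from datetime import datetime, timezone


def deposit_record(source: bytes, project: str, when: datetime) -> str:
    """Build the record a (hypothetical) archive would keep for one deposit."""
    record = {
        "project": project,
        "sha256": hashlib.sha256(source).hexdigest(),  # fingerprint of the code
        "deposited": when.isoformat(),                 # the datestamp
    }
    return json.dumps(record, sort_keys=True)


def matches(record_json: str, source: bytes) -> bool:
    """Check whether the code produced later is the code that was deposited."""
    record = json.loads(record_json)
    return record["sha256"] == hashlib.sha256(source).hexdigest()


code = b"int main(void) { return 0; }\n"
receipt = deposit_record(code, "project-42", datetime(2003, 10, 26, tzinfo=timezone.utc))
print(matches(receipt, code))
```

The digest only proves anything if the archive itself is trusted to have recorded the date honestly; the sketch does not address that part.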
        • Or you could just register the copyright and use the existing institution [copyright.gov] (or the equivalent in your jurisdiction) that has been doing that task [copyright.gov] since before computers were invented.

          Of course registering every cvs checkin is going to get expensive :)
          • Or you could just register the copyright and use the existing institution

            Excellent advice, but it doesn't work for those of us outside the US. Here in the UK (and AFAIK the rest of the EU) copyright resides with the creator, but there is no place of registration.

            The registration road is one that the US followed many years ago, [copyright.gov] and it provides an excellent degree of legal protection. But the history of copyright law in the UK has tended towards an "If you wrote it then it's yours, now can you prove

            • Hmm... strictly speaking, every copyrighted work should be deposited with the British Library, Bodleian Library (Oxford), Cambridge University Library, National Library of Scotland, Library of Trinity College Dublin and the National Library of Wales.

              I believe that only applies to *printed* works.

            • But the history of copyright law in the UK has tended towards an "If you wrote it then it's yours, now can you prove it?" model, and proving it is a difficult thing to do nowadays if you publish an original work on the net.

              Here we have no national registry of claims of copyright as a matter of public record. Maybe the EU should move towards this model (although I deeply distrust any approach that would actually require registration).

              I disagree. If you have created something, you usually have kept the
        • I'll go a step further: Some organizations (like, say, SCO) might not want to release their code to an outside party. And an outside party might not want to accept it and then be sued if there is ever any doubt as to how some Linux code ended up in SCO Unix. But there is really no need for any concern here: The submitter can always (and in the case of non-open code should perhaps be required to) submit the code in encrypted form, using a well-known encryption system (perhaps PGP). Then if there was ever any nee
        • Print it out, snail-mail it to yourself, with some marking on the outside that will allow you to identify it as project #42 revision 6.9 or whatever. Should the need arise, hand it to the judge un-opened.
      • by kasperd ( 592156 ) on Sunday October 26, 2003 @05:03PM (#7314901) Homepage Journal
        randomising white space, replacing variable names

        Those are things that cannot be seen in the resulting executable, and the watermark is claimed to be found even in the resulting executable. (Yes, I know in some cases variable names can be visible in the executable, but you can easily prevent that.) I somehow doubt this watermarking is at all possible. With optimizing compilers it is hard to find resemblance between source and executable. Finally, once you know how the watermarks are made, it is probably easy to write a slightly different algorithm that removes the watermark.
        • I somehow doubt this watermarking is at all possible. With optimizing compilers it is hard to find resemblance between source and executable.

          If you read the PDF you will see that it is still detectable in the executable. That is because the watermark isn't actually in the executable directly, it is placed in the data structures that the executable creates/uses.

          I do agree with your other point though. If you know about the process then I can't imagine any way they can prevent you from writing a program ca
        • I somehow doubt this watermarking is at all possible.

          const char* watermark = "This is mine";

          This should work, but is trivial to remove if you have the source. Might be less trivial if you don't, but have decompiled something, which is what the linked article discusses.

          So while watermarking can work for closed source I don't think it can work for opensource software, if the copyright infringers have any clue at all. It's likely to be known that the ugly kludge in foo.c is actually a trick to get the string

          • This should work, but is trivial to remove if you have the source. Might be less trivial if you don't, but have decompiled something, which is what the linked article discusses.

            Trivial even when compiled, just have to care more about the lengths of strings.

            $ sed -e 's/Free Software Foundation/SCO Group. All rights reserved/' /usr/local/bin/bash > /usr/local/bin/scosh
            $ chmod +x /usr/local/bin/scosh
            $ scosh --version
            GNU bash, version 2.05b.0(1)-release (i386-portbld-freebsd5.1)
            Copyright (C) 2002 SCO
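The remark about string lengths can be made concrete. Note that the replacement in the sed line above is longer than the string it replaces, so every byte after it would shift. A minimal sketch, assuming the binary is treated as a plain byte string and the new text is padded to the old length; `patch_same_length` and the sample bytes are hypothetical and not taken from bash.

```python
def patch_same_length(blob: bytes, old: bytes, new: bytes, pad: bytes = b" ") -> bytes:
    """Replace an embedded string without changing the size of the binary.

    The replacement is padded to the length of the original so that no
    offsets inside the file move.
    """
    if len(pad) != 1:
        raise ValueError("pad must be exactly one byte")
    if len(new) > len(old):
        raise ValueError("replacement is longer than the original string")
    padded = new + pad * (len(old) - len(new))
    return blob.replace(old, padded)


# Stand-in for an executable: some header bytes, a notice, some trailing bytes.
blob = b"\x7fELF\x00Copyright (C) 2002 Free Software Foundation\x00\x01\x02"
patched = patch_same_length(blob, b"Free Software Foundation", b"SCO")
print(len(blob) == len(patched))
```

This is the same weakness the parent describes: a plain string watermark survives compilation, but it also survives as something anyone can find and overwrite.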

        • I agree completely. Further, I think there are far easier and more reliable possibilities, e.g. creating a kind of checksum based on higher-level information such as data flow, control flow, or program structure (inheritance etc.).
          Of course that can be changed also, even far enough to "obscure" the origin. But changing the inheritance graph of code is not that easy ...
          The problem with watermarking, IMHO, is that as soon as the scheme is known it can be "removed" or "destroyed".
          angel'o'sphere
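A rough sketch of the "checksum over program structure" idea. It assumes Python's own `ast` module as a stand-in for a real C or Java front end, and it hashes only the sequence of syntax-node types, which is a much cruder notion of structure than data flow or inheritance; `structure_fingerprint` is a hypothetical name.

```python
import ast
import hashlib


def structure_fingerprint(source: str) -> str:
    """Hash the shape of the syntax tree, ignoring names, literals and layout."""
    tree = ast.parse(source)
    # ast.walk visits nodes in a fixed breadth-first order; only the node
    # type is kept, so identifiers and constant values do not matter.
    shape = [type(node).__name__ for node in ast.walk(tree)]
    return hashlib.sha256(" ".join(shape).encode()).hexdigest()


original = "def total(xs):\n    s = 0\n    for x in xs:\n        s += x\n    return s\n"
renamed = "def sum_all( values ):\n\n    acc = 0\n    for v in values:\n            acc += v\n    return acc\n"
rewritten = (
    "def total(xs):\n    s = 0\n    i = 0\n"
    "    while i < len(xs):\n        s += xs[i]\n        i += 1\n    return s\n"
)

print(structure_fingerprint(original) == structure_fingerprint(renamed))
print(structure_fingerprint(original) == structure_fingerprint(rewritten))
```

Renaming and reindenting leave the fingerprint unchanged, while rewriting the loop changes it, which matches the caveat above that structure can also be altered with enough effort.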
      • Furthermore, you could automate the process by writing a script to do things like randomising white space, replacing variable names, and even rewriting simple flow control constructs.
        Now, I'd like to know by which compiling process randomized white space, and "different" variable names will show up in object code...
    • by 0x0d0a ( 568518 ) on Sunday October 26, 2003 @05:11PM (#7314936) Journal
      Look at the techniques. This stuff is designed for use on binary-only software (with the sole exception of the comment embedding, which is easy to strip, and the embedded strings, which are easy to remove/modify).

      The approaches they're talking about are done at the compilation phase or post-compilation on Java bytecode.

      It's *extremely* difficult to produce good, reliable watermarks, because different compilers will build software differently, as will different optimization options.

      I'd essentially say that source-based watermarks are a lost cause (at least with C, and with the current constraints of readability and simplicity on code).

      A much better approach would be a project that does fuzzy comparisons on binaries, and is somewhat aware of ELF. Basically, you'd have a program that would have a set of known GPL code (a compiled Linux system would work well) and compare it to a set of compiled code.

      This is still not perfect if the person is malicious and just tries using a different compiler. This has happened before with xvid and use of icc. However, there aren't *too* many compilers out there.

      Hmm...this is an interesting problem.

      A more interesting approach that just occurs to me now -- in general, the proportions of compiled code should be roughly the same, independent of compiler -- adding padding, etc. Generate a call graph of the function tree in a set of GPL code. Then your checker would do fuzzy matching on chunks of that call graph against the suspicious code. It'd take a bit of massaging. It'd also still need some manual looking at the target once identified. However, this should be able to run in a pretty automated manner (even if it takes a long time to run) and could potentially turn up some interesting goodies. It'd certainly discourage commercial folks from ripping off GPL-using authors and companies.

      Try taking a Windows system with a lot of installed (non-GPL) software and a Linux system with a lot of (GPL) installed software. Start a comparison running. See what turns up.
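A toy version of the call-graph comparison described above. It assumes the call graphs have already been extracted from the binaries (the hard part, not shown) into plain dictionaries mapping each function to the functions it calls, and it compares name-independent "shape" signatures with a multiset Jaccard score. The function names and sample graphs are made up for illustration.

```python
from collections import Counter


def shape_signatures(call_graph):
    """Describe every function by its fan-out and the fan-out of its callees.

    Function names are never used, so stripping symbols or renaming does
    not change the result.
    """
    sigs = Counter()
    for callees in call_graph.values():
        child_degrees = tuple(sorted(len(call_graph.get(c, ())) for c in callees))
        sigs[(len(callees), child_degrees)] += 1
    return sigs


def similarity(graph_a, graph_b):
    """Multiset Jaccard similarity of the two signature collections (0.0 to 1.0)."""
    a, b = shape_signatures(graph_a), shape_signatures(graph_b)
    shared = sum((a & b).values())
    total = sum((a | b).values())
    return shared / total if total else 1.0


gpl = {"main": ["parse", "run"], "parse": ["lex"], "lex": [],
       "run": ["lex", "emit"], "emit": []}
stripped = {"f0": ["f1", "f2"], "f1": ["f3"], "f3": [],
            "f2": ["f3", "f4"], "f4": []}
unrelated = {"a": ["b"], "b": ["c"], "c": ["d"], "d": []}

print(similarity(gpl, stripped))
print(similarity(gpl, unrelated) < 0.5)
```

A high score would only flag a candidate for the manual inspection the comment mentions; small or generic call graphs will match by coincidence.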
      • by sICE ( 92132 ) on Sunday October 26, 2003 @05:39PM (#7315057) Homepage
        It's not that fuzzy - I mean, you seem to know what all this stuff is about, and no offense is intended here - but, sadly, you underestimate the power of the modern cracking and reverse engineering tools you have at your disposal.

        Even with compiler optimizations and processor specific instructions AND EVEN different compilers, you can actually find and detect "similar HLL code" (there's a tool called DATING that can do that - contact me for a copy, it's hard to find - and whose name is a pun on the IDA FLIRT abilities). I don't know about different CPUs, but I guess it would be resource-hungry, and I don't know of a tool that can catch those for now. Try anyway to have a look at the VMware binaries - win32/linux - with it; you'd probably be surprised.

        blah, dunno what i wanted to say next it's late here... ~<:(
        • Even with compiler optimizations and processor specific instructions AND EVEN different compilers, you can actually find and detect "similar HLL code" (there's a tool called DATING that can do that - contact me for a copy, it's hard to find - and whose name is a pun on the IDA FLIRT abilities).

          [shrug] If you post a link to a source archive, I expect more than a few people would be interested. I think it would have to go well beyond FLIRT to be useful for this problem, though.

          I just took a look at FL
        • (there's a tool called DATING that can do that - contact me for a copy, it's hard to find - and whose name is a pun on the IDA FLIRT abilities).

          And also an excellent name if you want people to NEVER be able to find it using Google.
      • While your idea is noble, it's also more of an academic exercise... and even if it *does* become legal, I would abhor it.

        Why? How do you define false positives?

        • How do you define false positives?

          A tool like this couldn't distinguish between positives and non-positives alone. But it could isolate code to be reviewed by authors -- if it could email the GNOME folks and say "80KB of very similar looking code is in Adobe Photoshop", it'd let people start poking at it. Given a bit of poking at disassembly, it's not that hard to see whether the code's been swiped.

          And I'm not sure what you'd mean by "if it *does* become legal"...surely there'd be no legal problems wit
          • And I'm not sure what you'd mean by "if it *does* become legal"...surely there'd be no legal problems with running something like this.

            Hard to be certain. In order to operate, it would have to create a temporary copy of the program being analysed in memory. This may or may not constitute a copyright infringement, depending on your particular jurisdiction and how the judge / jury / evil fascist dictator (delete as applicable) is feeling on that particular day.

            Many countries have specific prov
    • Right. And how about ol'fashioned TRUSTING PEOPLE on this issue? If somebody misuses code, they will be found out sooner or later. Anyhow, it's not like you lose anything on it. Later they'll be all the wiser for being permitted to make a mistake.

      Are we no better than the big conglomerations where we can't trust anyone and are filled with fear and dread of all the abuse that _MIGHT_ happen?!?

      I just ask. It's up to you to answer..


    • Erm... isn't the history of digital watermarking pretty dismal so far? I mean, the RIAA tried to watermark music but Ed Felten and gang pretty much demonstrated the futility of that (and pretty quickly, too).

      Are software binaries really so different that watermarking would work for it?

    • but this could be simply gotten around by just looking at the code, then rewriting it

      This is not getting around, this is the legal way of doing it without violating the GPL. Reusing the code directly under non-GPL licenses is forbidden, but writing code that implements the same algorithm is not. Unless, of course, you have a software patent on that...
    • If you looked at the code, and re-wrote it yourself, it wouldn't be a GPL violation.

      The only way it would be a violation is if you could prove it was a derivative work, and for that there'd have to be at least some line of code the same... having *functionally equivalent* lines of code != derivative work. If that were the case, then encyclopedias would have been suing each other since the beginning of time for publishing "functionally equivalent" information.
  • by egg troll ( 515396 ) on Sunday October 26, 2003 @04:34PM (#7314753) Homepage Journal
    I would be very careful with using something like this. It's nice to think that one could use watermarking for protecting GPL'ed code. However, should the technique prove successful, expect to see everything under the sun watermarked by less benevolent entities.
    • Could you please elaborate on the "less benevolent entities", perhaps giving some examples of what bad situations might arise? I just don't get it yet.
      Thanks!
      • by Directrix1 ( 157787 ) on Sunday October 26, 2003 @04:56PM (#7314866)
        It doesn't matter what you do to code as far as watermarking goes. If the watermarking method is publicly known, then the mark can easily be changed anyway to look like it was made by someone else. For instance, you could watermark your code by having variable length whitespace before your comments or something. But that could easily be changed.
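A small sketch of why a whitespace mark is so fragile. It assumes a simplified variant of the scheme in the comment: one bit per line, stored as a trailing space (0) or tab (1), rather than whitespace in front of comments. The names `embed`, `extract` and `strip_mark` are hypothetical.

```python
def embed(source: str, bits: str) -> str:
    """Hide one bit per line as trailing whitespace: space = 0, tab = 1."""
    lines = source.splitlines()
    if len(bits) > len(lines):
        raise ValueError("not enough lines to carry the mark")
    marked = [line.rstrip() + ("\t" if bit == "1" else " ")
              for line, bit in zip(lines, bits)]
    return "\n".join(marked + lines[len(bits):])


def extract(source: str, n: int) -> str:
    """Read the first n bits back from the trailing whitespace."""
    return "".join("1" if line.endswith("\t") else "0"
                   for line in source.splitlines()[:n])


def strip_mark(source: str) -> str:
    """What any code formatter does anyway: drop trailing whitespace."""
    return "\n".join(line.rstrip() for line in source.splitlines())


code = "int a;\nint b;\nint c;\nint d;"
marked = embed(code, "1011")
print(extract(marked, 4))
print(extract(strip_mark(marked), 4))
```

Overwriting the mark with someone else's bits is just another call to `embed`, which is the forgery scenario the comment raises.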
      • non-free or open source (ie commercial) entities.

        Actually, there would be nothing wrong with this. If it's good for GPL'd software, it's good for all software to protect IP.

        The real trick is: what if some non-GPL code is watermarked, but not in fact by the real author (in other words, they "borrowed" the code and then watermarked it)?

    • So? You already shouldn't be using code owned by the said 'less benevolent entities.' Your argument boils down to "It makes it harder to steal!" I'm all for the protection of software copyrights by both technical and legal means. Now, if it is a question of someone illegally claiming ownership via watermark, the true owner should have plenty of evidence on hand to prove the rogue watermarker has no legal ground to claim ownership-- just as if I took all of the linux source files, put a comment at the top t
    • "expect to see everything under the sun watermarked by less benevolent entities"

      Because only open source developers should be able to protect their IP.

  • I think not (Score:4, Insightful)

    by Espectr0 ( 577637 ) on Sunday October 26, 2003 @04:35PM (#7314759) Journal
    The GPL appeals to the common sense still found in people, and to simple decency.

    If the trademark stuff gets too hectic, then maybe this will be needed, but for now I don't think it is.
  • by caluml ( 551744 ) <slashdot&spamgoeshere,calum,org> on Sunday October 26, 2003 @04:36PM (#7314767) Homepage
    What might be the ramifications of this?"

    It might cause the sky to fall down on our heads, or the atmosphere to evaporate, killing us all with solar radiation.

  • Watermark? (Score:2, Insightful)

    we are talking about a bunch of 1s and 0s here. If it can be watermarked, it can be unwatermarked. A simple script will be able to rearrange stuff to disrupt the watermark without affecting the execution of the program.
    • by Doomrat ( 615771 ) on Sunday October 26, 2003 @04:40PM (#7314792) Homepage

      we are talking about a bunch of 1s and 0s here. If it can be watermarked, it can be unwatermarked. A simple script will be able to rearrange stuff to disrupt the watermark without affecting the execution of the program.

      Yes, a bit like how it's easy to reconstruct a burned down house from its ashes.

      • Well I think burning is lossy.

        On the other hand, digital watermarking is hardly http://lzip.sourceforge.net territory!

        Watermarking can't be lossy, since it can't affect operation. Any non-lossy translation, however clever, can be detranslated?

        Haven't digital watermarks, obfuscation, etc. gotten really bad press recently? There was that DMCA case with the professor not allowed to say how he cracked watermarks, iirc.
    • Re:Watermark? (Score:5, Informative)

      by Naerbnic ( 123002 ) on Sunday October 26, 2003 @04:46PM (#7314821)
      Perhaps this is true for static data (as in a bunch of source code), but you can insert a watermark into code that creates a dynamic watermark (i.e. something that depends on the runtime operation of the program). To make a long story short, you cannot easily remove it by rearranging binary code, and it's difficult (i.e. NP-complete for those in the know) to analyze the software to remove it. Tack on the fact that you can tamperproof the code (i.e. make the behavior of the program depend on the existence of the watermark), and you have a pretty difficult path to walk if you want to remove it.

      More info can be found in this [ucla.edu] paper, if you're into reading that sort of thing.
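A toy illustration of the "dynamic" idea only, not the scheme from the linked paper. It assumes the mark is an integer that the program encodes as the order of a data structure it builds at runtime, so a recognizer inspects the running state rather than the code bytes. The encoding used here is the ordinary factorial number system, and all names are hypothetical.

```python
import math


def int_to_perm(n: int, size: int) -> list:
    """Encode n as an ordering of 0..size-1 (factorial number system)."""
    if not 0 <= n < math.factorial(size):
        raise ValueError("n does not fit in a permutation of this size")
    items = list(range(size))
    perm = []
    for i in range(size, 0, -1):
        idx, n = divmod(n, math.factorial(i - 1))
        perm.append(items.pop(idx))
    return perm


def perm_to_int(perm: list) -> int:
    """Recover the number from the order in which the structure was built."""
    items = sorted(perm)
    n = 0
    for i, p in enumerate(perm):
        idx = items.index(p)
        n += idx * math.factorial(len(perm) - 1 - i)
        items.pop(idx)
    return n


# The "program" would build some internal list in this order while running;
# the recognizer reads the order back out of the live data structure.
runtime_order = int_to_perm(2003, 8)
print(runtime_order, perm_to_int(runtime_order))
```

In this toy the mark is trivially visible to anyone who knows where to look; the difficulty claimed above depends on hiding and tamperproofing the structure, which the sketch does not attempt.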
  • Just an extra step (Score:4, Interesting)

    by Moeses ( 19324 ) on Sunday October 26, 2003 @04:44PM (#7314810)
    I think this would only help against the most blatant copying. If the watermark code is embedded in the data structures of the source code, either it would be fairly easy to remove or the software would be in such a state that it would be hard to maintain and evolve. The attempt to avoid piracy would have a negative long-term effect on the project.

    I can still see this being useful if blatant copying of the software is the biggest problem the project faces; however, I'm having trouble envisioning a scenario where that's the case.

  • by gripdamage ( 529664 ) on Sunday October 26, 2003 @04:44PM (#7314812)
    The paper cited in the first link is from a professor I once had.

    On his website [arizona.edu] I found his full article, if you want some details about watermarking techniques. It has a lot more meat than the presentation slides.
  • as usual (Score:5, Insightful)

    by snarkh ( 118018 ) on Sunday October 26, 2003 @04:44PM (#7314813)
    The submitter did not bother to look at the article (or rather the presentation).

    The main idea is that you embed the watermark into the code and then obfuscate it. The resulting code is unreadable (otherwise the watermark would be trivial to remove), which makes it absolutely useless as far as open source is concerned.

    • Re:as usual (Score:5, Informative)

      by dspeyer ( 531333 ) <dspeyer&wam,umd,edu> on Sunday October 26, 2003 @05:23PM (#7314982) Homepage Journal
      From the GPL [gnu.org] (section 3):
      You may copy and distribute the Program (or a work based on it, under Section 2) in object code or executable form under the terms of Sections 1 and 2 above provided that you also do one of the following:

      * a) Accompany it with the complete corresponding machine-readable source code,
      ...
      The source code for a work means the preferred form of the work for making modifications to it. (emphasis added)

      So, unless you plan to do maintenance on obfuscated code, this is no good for GPL software. In fact, it's no good for Open Source [opensource.org] software of any kind.

      Admittedly, you could use unobfuscated code and refuse to reveal the watermark, but it's kind of tricky to keep things secret in the OSS world.

    • The resulting code is unreadable ... which makes it absolutely useless as far as open source is concerned.

      Um, surprising as it may sound, I have looked at some open source code, you know, and some bits of it could reasonably be described as, you know, just a tad "unreadable". So there's nothing to be lost here.
  • Re: (Score:2, Insightful)

    Comment removed based on user account deletion

    • Furthermore, they are not talking about techniques that you could use if the "attacker" had access to the source code. (See the full paper, linked to in a comment above.)

      This would work about as well for open source software as adding easter eggs (which they also discuss). From my perspective, this is a fine paper but easter eggs are still a lot more fun to write.

      -- MarkusQ

  • by Anonymous Coward on Sunday October 26, 2003 @04:48PM (#7314830)
    Caveat - I haven't read the paper, but from the description it looks like you apply your watermark to the class files after compilation.

    So,
    1) It only protects binaries, not source ... therefore it is not applicable in its current form to source code, which would be required for it to be of any use to the GPL.

    2) It's for Java, which is easier due to the canonical form (bytecode) that can be manipulated by the watermarking tool. You could probably do this to protect GPL binaries, but with less portability.

    IMHO, it's not useful for source, but sure, if you're worried that some of your precompiled binaries are being ripped off, then maybe.

    For source, you need to detect common code patterns and use source tools that have been discussed elsewhere on /.

  • by Pedrito ( 94783 ) on Sunday October 26, 2003 @04:56PM (#7314864)
    I wrote a book ages ago about Windows File Formats. Included in the book was some code which was written by a third party. I obtained permission from the code's author to put it in the book, but it was very clearly copyrighted by the author of the code, both in the code, and in the book.

    So Intel is working on a product and they just swipe up the code out of the book, never ask for permission or anything, and use it in a commercial product (VTune). The author of the code, of course, was furious. He approached Intel. They blew him off. He had reverse engineered their code. He could produce an exact replica of the binary with his own code using the MS C compiler.

    He never got anything out of Intel. I suppose he could have hired attorneys, but he wasn't a wealthy guy. He couldn't find attorneys to take it without cash up front. So my question is: How do watermarks help him? I mean the guy could put the binaries side-by-side, and there was no question, it was his code.

    Your code is as protected as the lawyer you can afford...
    • Watermarking will just be another way of clouding an issue. With GPL-IP you need a protective shield, that is accepted by the UN (under international treaties) and major governments that are willing to pound and fine pirates and thieves of GPL-IP. The UN needs to come together to provide for the common defense of GPL-IP, the Public property of humanity (the Genetics, Genome, Evolution, ...), ... a few other major items.

      As for all the copyright and patent stuff, let them protect their shit, it is their rig
    • Your code is protected as long as you are willing to protect it. Too often people seem to think that they will NEED a lawyer to do a lawsuit. If this guy had just followed up in court, there was a darn good chance that he could have gotten a settlement.

      The legal process may seem complicated but when it comes down to it, a judge will listen to a regular Joe just as intently as he would a lawyer. If you are too intimidated to go in front of a judge, you have already lost. Don't let some lame-ass attorney

    • Just a random thought...

      If he's given up on trying to pursue it himself, he could put the code under a GPL license and transfer the copyright to the Free Software Foundation. Let them pursue it. The case should be a slam-dunk if you have a copy of the book that pre-dates VTune.

      -
    • The case would have been clouded by the publication of the code in an instructive book.

      A programmer could easily hold up a copy of your book, and say "I learnt how to do this kind of thing from this book. It says to do it this way, so I did". This would be a fairly convincing argument.

      I'm not saying that Intel were right to take the code, but rather that:

      a) the lawyer was probably right not to take the case without an advance payment. I'd say there was at best a 30% chance of winni
  • Pointless. (Score:2, Interesting)

    by Clinoti ( 696723 )
    Does this not go against what open source is all about? It's open code given without the extremes of ownership like water/copy/trade-marks. Where does this apply with CVS and open project developments?

    I like the idea behind it but I don't think it's the answer. It would be easier and more applicable to have a 3rd party database that held published coding rather than having to graph and mark my work everytime I released etc... this way I have it (1) in the public domain and (2) have a published reference f

    • Re: Pointless. (Score:3, Insightful)

      by Black Parrot ( 19622 )


      > Does this not go against what open source is all about? It's open code given without the extremes of ownership like water/copy/trade-marks.

      No, all GPL'd code is supposed to be copyrighted. The GPL just grants the user certain rights that are not normally available under copyright law.

      Look at the headers in the source for some of the GPL'd programs on your system, or visit the FSF Web site and see what is recommended for those headers.

  • No, it can't (Score:5, Insightful)

    by anthony_dipierro ( 543308 ) on Sunday October 26, 2003 @05:00PM (#7314887) Journal

    Isn't the code itself a watermark? Sure, you can change things here and there, but ultimately the similarities are going to be far too much to be pure coincidence.

    The purpose of digital watermarking seems to be to identify unique instances of the thing being watermarked. So if I have a copy of Britney Spears' album, it's obviously copyrighted by her record company. With watermarking I can get more specific, and see that it was burned from a CD which was sold to Bob Jones. With the GPL this isn't useful. Sure, the code might have been derived from a copy sold to Bob Jones, but he may have legally made a million copies and distributed them around the globe before the GPL was violated, by someone else. You can't control the watermarks, because you can't control the distribution.

    • Isn't the code itself a watermark? Sure, you can change things here and there, but ultimately the similarities are going to be far too much to be pure coincidence.

      A lot of code is very generic. Much code could easily be reimplemented as a new implementation that produces almost identical object code just by starting with the same specification and compiling with the same compiler.

      How then do you tell if the code was copied?

      (Actually, the answer's easier than watermarking - use a non-obvi
  • RTFA (Score:2, Informative)

    OK, I'll make it easy on all of you. Here is the article:

    A Practical Method for Watermarking Java Programs

    Akito Monden, Hajimu Iida, Ken-ichi Matsumoto, Koji Torii, Nara Institute of Science and Technology
    Katsuro Inoue , Osaka University

    Java programs distributed through Internet are now suffering from program theft. It is because Java programs can be easily decomposed into reusable class files and even decompiled into source code by program users. In this paper, we propose a practical method that disc

  • by NanoGator ( 522640 ) on Sunday October 26, 2003 @05:16PM (#7314960) Homepage Journal
    Pardon my naivety. I just wanted to ask, are GPL violations a big problem?

    If it's happening all the time and this is a method to slow its progress, then I don't see a huge issue with it. But if it is a once in a while type of thing, then how could this have anything but a negative impact on the GPL? The potential is there (reality could tell a different story) for people to shy away from it, worrying that they haven't quite got all their ducks in a row. If it's easy to automatically scan their code and say they're in violation, well then what? I guess what I'm trying to say is that it could be mishandled, thus treating the users of GPL code like they're potentially thieves. It strikes me that one of the compelling factors of the GPL is its reliance on the honor system. Whatever you do, don't play games that can damage that bright point of the GPL.

    Maybe I'm looking at this the wrong way. I suppose it could be used to defend against an accusation not unlike what SCO has claimed. "You copied our code!" "No, we used GPL'd code, see?" In that case, my previous comment about disrupting GPL's trust might not be as likely. "Well, we're just doing it so that this sort of thing doesn't happen again." I can see people nodding their head in agreement in that case.

    In short, it's one thing to do it if your aim is to defend yourself from SCO'esque accusations, it's another to use it to look for victims to sue. Whatever is implemented, be very careful about damaging GPL's image to the community that values it.
    • Ah yes, if the GPL were perfectly enforced, then businesses would stop using the GPL and move to freer licenses, I believe.
      Sorta like if MS could prevent piracy then more people would turn to the free stuff.
  • by HoleNdaBitBucket ( 667995 ) on Sunday October 26, 2003 @05:17PM (#7314966)

    Read the presentation. Although complete sentences aren't exactly present, there seems to be the indication that access to the source can provide an attack on the watermarking scheme: well, duh, if it's open source just modify the source to eliminate the watermark.

    But what's the likelihood a lazy company/individual will actually do this before violating the GPL? Probably slim, but more of the world seems to be going GPL anyway; and if the whole world did GPL, why would you need watermarks?

    Point is: if the monopolies of the world insist on using GPL code without releasing the source, they'll expend the effort to remove the watermark.

  • personally, as the lead developer of a large and significant (though niche) libre software project, my interest in watermarking is not to prevent illegal copying but merely to trace copying. i have thought recently about embedding serial numbers in executables. nothing would check them, providing little incentive for hackers to remove them, but they would allow me to learn who redistributed the program and on what scale. perhaps.

  • ESR and others argue that GPL is "free" as in "free speech." Well, in the United States, we enjoy a lot of this "freedom" (at least until the RWEs are through with us). Much of this has to do with the fact that we go to great lengths to NOT encumber ourselves with systems designed "to get the bad guys." Rather, we depend on a system of mutual responsibility and respect for the law. It's only when an infraction occurs that we should seriously consider using effort to detect such fraud. Americans need to be l
  • In any piece of code there are certain patterns to it. Look for them. Particularly data structures which the code's effectiveness is tightly linked to. Most thieves are lazy, so they will leave some of the code unchanged. Very few persons are both willing to steal code and willing to take the time to fully obfuscate it.

    The nice thing about this approach is you can wait until you suspect someone of stealing before you even bother thinking about the issue.
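
    A minimal sketch of what "look for patterns" could mean in practice, assuming a simple token-shingle comparison (my choice of technique, not something the original comment specifies). The function names `shingles` and `similarity` are made up for illustration, and replacing every identifier and keyword with a placeholder is deliberately crude: it only shows that renaming variables does not hide copied structure.

```python
import re

def shingles(source: str, k: int = 4) -> set:
    """Return the set of k-token windows of the source, with names normalized."""
    # Split into identifiers, numbers, and single punctuation characters.
    tokens = re.findall(r"[A-Za-z_]\w*|\d+|\S", source)
    # Replace every identifier/keyword with a placeholder so that renaming
    # variables does not change the fingerprint (a coarse assumption).
    norm = ["ID" if re.match(r"[A-Za-z_]", t) else t for t in tokens]
    return {tuple(norm[i:i + k]) for i in range(len(norm) - k + 1)}

def similarity(a: str, b: str) -> float:
    """Jaccard similarity of the two shingle sets (0.0 = disjoint, 1.0 = identical)."""
    sa, sb = shingles(a), shingles(b)
    if not sa or not sb:
        return 0.0
    return len(sa & sb) / len(sa | sb)

original = "for (i = 0; i < n; i++) total += buf[i];"
renamed = "for (k = 0; k < len; k++) sum += data[k];"
unrelated = "while (p) { p = p->next; count--; }"

print(similarity(original, renamed))    # structure survives the renaming
print(similarity(original, unrelated))  # different structure scores low
```

    A high score is only a hint that two pieces of code deserve a closer look by a human, not proof of copying.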

    Oh, and in response to someone who asked if GPL v
  • I don't know about water marking of code, but coffee marking is common.

    Heck, I've lost two keyboards to spilled coffee so far this year...

  • by Junior J. Junior III ( 192702 ) on Sunday October 26, 2003 @07:58PM (#7315690) Homepage
    If you have access to the source, you could probably find a way to remove the watermarks, unless they are somehow tightly worked into the executable code itself. And, if they're tightly worked into the executable code itself, then this has to mean that the code will not be as efficient, and that there'd be some kind of performance cost to watermarking that does not benefit the end user at runtime.
  • The problem is that it would need wide deployment but could be used only once or a few times.

    The reason is that once the nature of such a watermark is known, the watermarks of all currently published schemes can be easily removed. Proving publicly that one piece of code was stolen is enough for that.

    In addition, depending on the language and compiler used, finding a watermark can be extremely difficult. Just think of different levels of optimization, different compiler versions and different libraries used. The often propos
  • by scdeimos ( 632778 ) on Sunday October 26, 2003 @08:55PM (#7315930)
    Having read the .PDF paper and then skimmed the /. comments it would seem few people have taken the time to actually read (or understand) the paper before commenting on it. Hats-off to those who have.

    What is the essence of this watermarking technique?:
    - For embedding copyright information into individual .class files, as opposed to signing .cab's for whole Java apps/applets.
    - It modifies compiled Java bytecode, shuffling eight bytecode operators in targeted "dummy" class methods. The shuffling is able to encode only three bits per operation, so watermarks need to be short or dummy methods need to be large.
    - It relies on the watermarked dummy method(s) appearing in stolen (decompiled/recompiled) .class, which is achieved by pretending to call the dummy method(s) from other methods using always-false logic constructs.
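
    To make the "three bits per operation" point concrete, here is a rough sketch, not the paper's actual algorithm. It assumes a table of eight interchangeable operators (the JVM mnemonics below are real, but treating them as freely swappable inside a dummy method is my assumption), so each operator slot carries log2(8) = 3 bits. The names `OPS`, `embed` and `extract` are hypothetical.

```python
# Hypothetical table of eight interchangeable operators; the index of the
# operator chosen for a slot is the 3-bit value stored in that slot.
OPS = ["iadd", "isub", "imul", "idiv", "irem", "iand", "ior", "ixor"]

def embed(watermark: bytes) -> list:
    """Turn the watermark into the operator sequence a dummy method would hold."""
    bits = "".join(f"{b:08b}" for b in watermark)
    bits += "0" * (-len(bits) % 3)  # pad to a multiple of three bits
    return [OPS[int(bits[i:i + 3], 2)] for i in range(0, len(bits), 3)]

def extract(ops: list, length: int) -> bytes:
    """Recover `length` bytes of watermark from the operator sequence."""
    bits = "".join(f"{OPS.index(op):03b}" for op in ops)
    return bytes(int(bits[i:i + 8], 2) for i in range(0, length * 8, 8))

ops = embed(b"GPL")
print(len(ops))          # 24 bits of watermark need 8 operator slots
print(extract(ops, 3))   # round-trips back to the original bytes
```

    An n-byte mark needs ceil(8n/3) operator slots under this assumption, which is why the watermark has to stay short or the dummy method has to grow.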

    What are its downfalls?:
    - The technique is specific to Java. Forget about using it for other languages which output platform-specific machine code binaries, although it might be possible to modify it for use in .NET and other bytecode environments.
    - If an intelligent thief (or smart optimizing compiler) is able to detect the always-false condition used to shield the dummy method(s) the watermark(s) will be removed.
    - The larger your watermark, the larger you need to make your dummy method(s), or you need to embed more of them. The larger you make your dummy methods, the more obvious it will be that there's something strange about them.
    - Optimizing compilers could still destroy the modified operators used to form the watermarks.
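
    The always-false guard is the weak point in that list, so here is a small sketch of the idea, written in Python rather than Java bytecode purely for illustration. The predicate is my own example (the product of two consecutive integers is always even), and `dummy_watermark_carrier`, `opaquely_false` and `real_work` are made-up names.

```python
calls = []  # records whether the dummy method ever actually runs

def dummy_watermark_carrier() -> int:
    # In the scheme described above, this body would hold the shuffled operators.
    calls.append("dummy")
    return 0

def opaquely_false(x: int) -> bool:
    # x * (x + 1) is a product of consecutive integers, hence always even,
    # so this comparison can never be true for any integer x.
    return (x * (x + 1)) % 2 == 1

def real_work(x: int) -> int:
    if opaquely_false(x):  # looks like a live call site, but is never taken
        return dummy_watermark_carrier()
    return x * 2

print([real_work(x) for x in range(5)])
print(calls)  # stays empty: the dummy method is dead code
```

    Anyone, or any optimizer, that can prove the guard is constant can delete both the branch and the dummy method, and the watermark goes with them.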

    The paper also claims it protected more .class files from decompile/recompile attacks than *I* feel it should have: five of the ten .class files crashed their test decompiler (Mocha), thereby "protecting" their watermarks. If someone is keen to re-source your .class file, particularly if there's money to be made, I'm fairly certain they'd try another decompiler instead of giving up after just one crash. I suspect that these five .class files could be decompiled by another utility, so the question of their watermark protection remains unanswered. Potentially this could mean that up to 18 (instead of 3) of their 23 watermarks are actually defeated. This is entirely feasible, since only 3 of the 8 watermarks fully tested survived (the other 15 being in the five .class files which crashed Mocha).

    How does this technique benefit GPL? I'm not sure that it would. Even if the above problems were fixed:
    - To submit "source code" for your protected .class, you'd have to compile it, watermark it, decompile it and then post the decompiled version. Not very pretty and what about comments? I suppose you could have a Perl script reinsert comments from the original source, or copy-and-paste the watermarked dummy methods back in.
    - It's really designed to embed personal/corporate copyrights into code, protecting the IP of the submitter not the GPL community. I suppose the GPL community could design a community-wide watermark policy, but then that would become public knowledge and so thieves would be aware of its existence and be inclined to search harder to remove it.
  • by Brandybuck ( 704397 ) on Sunday October 26, 2003 @09:24PM (#7316090) Homepage Journal
    Do you think free software (GPL or other viral licenses) should be watermarked? This could help to find GPL violations (think Everybuddy or Linksys)

    You missed the point of Free Software. Ignoring some of the antics of the zealous fringe, the idea of "Free Software" isn't to be a separate-but-equal analogue to proprietary software. The point of Free Software is freedom, not surveillance. Too many advocates for Free Software say their contributions are free, but act as proprietary masters with their obsession over owning, controlling and regulating the software.

    It saddens me to see people advocating watermarking Free Software. Next they'll want a "FSSA" analogue to the BSA and their brownshirts.
  • by natron8080 ( 719257 ) on Monday October 27, 2003 @12:12AM (#7316747)

    Ok, assume a corporation CAN successfully steal GPL code, with or without watermark. Let's say M$ paints an IE browser look on top of the mozilla firebird codebase:

    1. Is it a bad thing that their software just got better, faster, and more standards compliant?
    2. Doesn't this even out the playing field, as far as proprietary technology goes? Everyone starts at 0.
    3. The mozilla developers would have real speed/memory/feature competition from M$, as opposed to the "we'll never touch IE code again" stance of M$.
    4. More company coders would be familiar with and able to develop on open source projects in their spare time (or convert even!).
    5. GPL projects aren't really in competition with corporate firms. GPL software doesn't lose profit margins if there's better software out there.

    So aside from ethical issues, why should the GPL community really care?

    • Absolutely. The GPL community has to get its story straight.

      If the Open Source way of producing software is self-evidently going to produce better results, as we are repeatedly told, it doesn't actually matter much if a company nicks a bit and hides it in a proprietary product, as, by doing this, according to OS logic, they will immediately start to fall behind the OS fork of the code.

      If that's what we believe, who cares about copyleft violation? If it isn't what we believe, can we please change our propa


  • It would be great if the Free Software Foundation would create a copyright registry. Anyone would be able to upload any file and get back an MD5 sum and a digital time stamp.
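
    A minimal sketch of what such a registry might hand back, assuming it keeps a secret key and authenticates each receipt with an HMAC. The key, the receipt layout, and the names `register` and `verify` are all hypothetical; a real service would more likely use public-key signatures so that third parties could check a receipt without knowing the secret.

```python
import hashlib
import hmac
from datetime import datetime, timezone

REGISTRY_KEY = b"hypothetical-registry-secret"  # assumption: known only to the registry

def register(data: bytes, when: datetime) -> dict:
    """Return a receipt: MD5 of the upload, a UTC time stamp, and an HMAC tag."""
    digest = hashlib.md5(data).hexdigest()
    stamp = when.astimezone(timezone.utc).isoformat()
    tag = hmac.new(REGISTRY_KEY, f"{digest}|{stamp}".encode(),
                   hashlib.sha256).hexdigest()
    return {"md5": digest, "timestamp": stamp, "tag": tag}

def verify(data: bytes, receipt: dict) -> bool:
    """Check that the data matches the receipt and the receipt was not altered."""
    digest = hashlib.md5(data).hexdigest()
    expected = hmac.new(REGISTRY_KEY, f"{digest}|{receipt['timestamp']}".encode(),
                        hashlib.sha256).hexdigest()
    return digest == receipt["md5"] and hmac.compare_digest(expected, receipt["tag"])

when = datetime(2003, 10, 27, tzinfo=timezone.utc)
receipt = register(b"int main(void) { return 0; }", when)
print(receipt["md5"], receipt["timestamp"])
print(verify(b"int main(void) { return 0; }", receipt))
```

    Note that the registry only ever needs the hash, not the file itself, which would also take care of the privacy complaint.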

    The U.S. Copyright Office copyright registry is too expensive! It costs at least $20, you have to fill out forms and send them by snail mail, it takes weeks to get an acknowledgement, and it is not private!

    I suggest that the cost be $1. Pay a minimum of $10 by credit card, and have credit for 10 uploads of 20 megabytes or less.

    With
