Data Storage Security

Delta Compression for Linux Security Patches? 289

cperciva asks: "For people without fast internet connections, it is often impractical to download large security patches. In order to reduce patch sizes, some operating systems -- starting with FreeBSD over a year ago, and recently followed by Mac OS X and Windows XP SP2 -- have started to use delta compression (also known as binary diffs, which constitutes a portion of my doctoral thesis), and can often reduce patch sizes by over a factor of 50. In light of the obvious benefits, I have to ask: When will Linux vendors follow suit?"
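
For readers who haven't seen the mechanics, the bsdiff/bspatch tools mentioned later in this discussion illustrate the idea; the file names below are made up, but the command forms are the standard ones:

  bsdiff httpd-2.0.50 httpd-2.0.51 httpd.bspatch        # build a binary patch from the old binary to the new one
  bspatch httpd-2.0.50 httpd-2.0.51.new httpd.bspatch   # reconstruct the new binary from the old one plus the small patch
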
  • by drinkypoo ( 153816 ) <drink@hyperlogos.org> on Friday September 03, 2004 @09:50PM (#10154973) Homepage Journal

    Certainly for your primary commercial auto-updated Linux distributions it does, but for anything else it usually doesn't. What makes more sense (because it's easier) is breaking up media and programs, and distributing them separately so you don't have to update one when you update the other. Some projects do this already, and even package their sources this way.

    Personally I'd prefer to see binary distributions move to a model of using something like cvs, so you can just do a cvs up (or equivalent) and update everything. Some files would have to be marked to always be overwritten, while config files would be merged. This solves both your differential update problem (if the right system is used - I'm thinking that's pretty much not CVS but I don't know if there's a way to make it do all of that - CVS doesn't handle binaries amazingly intelligently from what I understand) and your updates in general. Plus, you can use it both for source and binary updates.

    • I disagree. I've used smartversion [smartversion.com] on Windows for a couple of years now for making versioned archives of important files, and I wish Linux had something comparable. It's like having a portable single tar.gz of an entire cvs repository without all the headaches...
    • by GweeDo ( 127172 ) on Friday September 03, 2004 @10:24PM (#10155155) Homepage
      What you are requesting can already basically be done in Gentoo (emerge -Uupv world), Debian (apt-get something or other) and Red Hat/Fedora (up2date something or other). So why do we need something else again? :) Oh, and with Gentoo, add ccache to that for faster compiles too.
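
      For reference, the usual invocations look roughly like this (a sketch only; exact flags vary by distro and release, so treat them as assumptions):

        emerge sync && emerge -uUpv world    # Gentoo: sync the tree, then preview updates for everything installed
        apt-get update && apt-get upgrade    # Debian: refresh package lists, then fetch updated packages
        up2date -u                           # Red Hat/Fedora: download and install available updates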
      • I use Gentoo. I have never found any ccache settings that make much of a difference. None of these systems do a binary differential update; they download a whole package and install it, or in the case of Gentoo, download a source package and compile it. Neither of these approaches is what is being called for in this article, nor what I suggested above.

        Mind you, I'm fine with the way gentoo does things, but I have a fairly powerful system - not incredibly fast by modern standards but faster than anything I've run linux on before, or probably any Unix at all for that matter. For a dialup user on an older computer, atomically differential updates would make a really big difference.

        • Binary updates are not a good fit for Gentoo! Not only because most people don't use the binary packages, but because in order to generate the diff, the server must know the exact contents of the file on your system, as well as the exact contents of the updated file. The number of different binary patches would be exponential in the number of compile switches, compiler versions, USE flags, and so on - for both "old" and "new" file versions, so square it again!

          I guess if you really wanted to be clever, yo

          • Binary updates are not a good fit for Gentoo! Not only because most people don't use the binary packages, but because in order to generate the diff, the server must know the exact contents of the file on your system, as well as the exact contents of the updated file.

            No, those "binary" diffs for Gentoo would be done against the sources used for the previous version of the gentoo "package", which would then be used to download the diff so that the gentoo computer could then construct new sources to build a

      • by morcego ( 260031 ) on Friday September 03, 2004 @10:48PM (#10155284)
        I'm not sure about Gentoo, but I'm positive that is not what happens for Debian, Red Hat, Fedora, Mandrake, SuSE, Conectiva etc.

        On those systems, when you do an upgrade (apt-get upgrade), you will get a fresh package, including not only the files that changed, but all the files for that package. And if we have a package with 1 binary and 50 images, and only the binary changed, we get to download all the images again.

        Some distributions have been implementing package fragmentation for this (package-core and package-images for this example), and that is a good thing for these cases, although it is a nightmare to manage. Not as fine-grained as proposed by the grandparent post, but good enough for most cases.
        • Some distributions have been implementing package fragmentation for this (package-core and package-images for this example)

          Yep, but still, in almost all cases, you have to upgrade both packages when a new version of the game comes out. For Debian, this splitting helps in another way: instead of having the images in the archive once for every supported architecture (more than ten, these days), they're now in the archive only once.

          Don't know about other distros, but usually such an imag
      • Yeah... except it's better not to use Gentoo at all.

        I shouldn't say that. It's better not to use Gentoo if you like to have a system that works after your updates more than, oh, 9 times in 10.

        I used Gentoo for almost a year total (most recently in August of this year). Compile-time options change without notice, major and easy-to-spot bugs slip through on a regular basis, and when something like KDE gets updated, it can be a week before things work again. That is, when you can get YOUR config files update
    • License of BSDiff (Score:4, Interesting)

      by gnuman99 ( 746007 ) on Saturday September 04, 2004 @12:59AM (#10155848)
      Certainly for your primary commercial auto-updated Linux distributions it does, but for anything else it usually doesn't.

      Especially since the license of bsdiff is not even close to a BSD license (don't let the name of the BSD Protection License fool you). Unless the license is changed to something like BSD, BSDiff is not going to be implemented anywhere except in closed source software. Debian cannot even package this software because it is non-free [debian.org].

      I guess the bottom line is: if you want to have something accepted in open source *and* in proprietary software, you want to license under BSD. If you cater to only one group (closed source in this case), you will lose the other.

  • by Fermier de Pomme de ( 570654 ) on Friday September 03, 2004 @09:51PM (#10154981)
    ... their biggest customers start using dialup.
  • SP2? (Score:5, Funny)

    by keiferb ( 267153 ) on Friday September 03, 2004 @09:52PM (#10154983) Homepage
    You mean to tell me that beast I downloaded was just a diff? Jesus H. Christ!
    • Re:SP2? (Score:5, Informative)

      by cperciva ( 102828 ) on Friday September 03, 2004 @09:59PM (#10155021) Homepage
      Sorry, the writeup was a bit unclear. Windows XP SP2 contains a new version of Windows Installer (or whatever they're calling it today). This new version includes support for downloading updates via binary diffs, and most updates to XP after this point should be done that way.
      • Even before SP2 you got the BITS and WinHTTP 5.1 update the first time you logged in to Windows Update v5. The problem is that it made the downloads faster, but the installs painfully slow.
    • Re:SP2? (Score:5, Funny)

      by dracvl ( 541254 ) on Friday September 03, 2004 @10:23PM (#10155149) Homepage
      You mean to tell me that beast I downloaded was just a diff? Jesus H. Christ!

      If you look at the URL...

      http://www.microsoft.com/windows2000/techinfo/planning/redir-binarydelta.asp

      ...you will clearly see that what you downloaded was Windows 2000, with a binary patch that turned it into Windows XP SP2.

  • by sPaKr ( 116314 ) on Friday September 03, 2004 @09:52PM (#10154984)
    As soon as binary diffs get hacked into RPM then it might happen. Binary diffs from one RPM to a later version won't really work, as binary diffs are only small when they are produced on uncompressed, unencrypted data. The real issue is that Linux doesn't really need binary diffs. Linux distros already have fine-grained packages (lots of little packages, not a few big ones). Security updates usually just require one or a very few packages to be updated. Binary diffs only really make sense when you have huge packages that require a whole new package for upgrade. I bet the average RPM is about the same size as the minimum binary diff from MS.
    • Binary diffs only really make sense when you have huge packages that require a whole new package for upgrade

      Binary diffs make sense any time you've got large files being updated. On my system, libssl (library archive + shared object file + profiled library) is 600kB; that's large enough to justify using a 10kB binary diff instead.

      I bet the average RPM is about the same size as the minimum binary diff from MS.

      I can't say anything about Microsoft's patches directly, but the patches used by FreeBSD Updat
      • 65 times smaller? So a patch that's normally 100k is now 1.5k?

        Maybe sometimes, but I don't see that happening on average.
        • 65 times smaller? So a patch that's normally 100k is now 1.5k?

          Maybe sometimes, but I don't see that happening on average.


          Look at the statistics yourself [daemonology.net]. The average patch compression ratio (ie, [size of new file] / [size of patch file]) for FreeBSD Update is 66.404 right now. (Ignore the "Speedup due to patching" line -- that includes files which were downloaded before delta compression support was added.)

          In fact, my current development code produces patches around 30% smaller than that, but I haven
    • The real issue is that Linux doesn't really need binary diffs. Linux distros already have fine-grained packages (lots of little packages, not a few big ones). Security updates usually just require one or a very few packages to be updated.

      I beg to differ. SuSE 9.1 came out only 5 months ago:

      $ du -h /var/lib/YaST2/you/mnt/i386/update/9.1/rpm/

      417M /var/lib/YaST2/you/mnt/i386/update/9.1/rpm/i586
      14M /var/lib/YaST2/you/mnt/i386/update/9.1/rpm/noarch
      431M /var/lib/YaST2/you/mnt/i386/update/9.1/rpm/

      That's almos

    • Actually, Red Hat were using binary diffs a long time ago - see rhmask [maruhn.com]. Of course, when they switched from shipping some proprietary software (CDE, Red Baron, Metrolink's(?) X11) to only shipping 100% FOSS, rhmask fell into disuse.

      It probably wouldn't take much to take rhmask and update it to use xdelta [berkeley.edu] or something, though. Note, however, what the xdelta manpage says about using it on compressed data:

      Gzip processing
      Attempting to compute a delta between compressed input files usually
      results in poor
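
      A minimal sketch of the usual workaround, assuming xdelta 1.x's delta/patch subcommands and made-up file names -- diff the uncompressed tarballs rather than the .gz files:

        gunzip -c foo-1.0.tar.gz > foo-1.0.tar
        gunzip -c foo-1.1.tar.gz > foo-1.1.tar
        xdelta delta foo-1.0.tar foo-1.1.tar foo-1.0-to-1.1.xd   # small, because the inputs are uncompressed
        xdelta patch foo-1.0-to-1.1.xd foo-1.0.tar foo-1.1.tar   # reconstructs the new tarball on the client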

    • I disagree. Binary diffs are needed now.

      I have followed the release cycle of Mandrake 10 very closely and the amount of updates is huge, which is fine, it means that bugs are being addressed. However, the updates can come in at 100MB at a time, simply because they just have no way of doing real patches and thus redownload the whole of OpenOffice or kdelibs for a small change.

      I love Mandrake to death, but this is something that needs to be addressed as soon as possible. This issue has been enough of a showsto
  • Mindvision (Score:5, Informative)

    by shawnce ( 146129 ) on Friday September 03, 2004 @09:53PM (#10154987) Homepage
    The folks at MindVision made an installer/installer-creation tool that allowed one to scan two different sets of files and directories to find differences between them (binary differences), and it would just package up those differences in the installer archive. In fact you could use it to diff and package deltas between several versions at once. When the user ran the installer (really an updater) it would apply the binary patch to the file set as needed.

    I was using this tool over 7 years ago now on Mac OS, so I don't see what is so new about this concept... but I am glad it looks like it is starting to be used more.
    • Does anyone remember "rescompare" for the older Macs? Same idea: you'd give it two apps and turn it loose, and it would isolate the differences for you to browse, and had the option to generate a self-contained executable patch file that would turn one of the images into the other. VERY handy tool for generating updates, and overall a very smart app to be able to find inserted and removed blocks of code cleanly without biting more than needed to be chewed, so to speak.

      Version 2.6 was (c) 1989-1996, so
  • Linux makes it very easy to install new packages and upgrade packages from sources farther away from the vendor. If a vendor tried to release a patch using delta versioning, it could totally wreck a system. Since neither RPM nor DPKG is designed to handle checking MD5 hashes against each file, and making sure the patch can be installed safely, it will have to wait until this feature is incorporated into either system.
    • If you maintained a library of all the installation packages you've ever downloaded (assuming space is not an issue), a network installer could just look through your library, check hashes on matching filenames and download the smallest available new diff that can be used to build the desired new installation package. Then it doesn't even matter what's been installed or compiled, just what you've downloaded. Kind of like a cross between a cache and a progressive JPEG.
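
      A rough shell sketch of that idea, with hypothetical paths -- the point is that the cache is indexed by content hash rather than by what happens to be installed:

        # index every package ever downloaded by its MD5 sum
        find ~/pkg-cache -type f \( -name '*.rpm' -o -name '*.deb' \) -exec md5sum {} \; > ~/pkg-cache/index.md5
        # a delta-aware downloader could then pick the best local base file to patch from
        grep "$BASE_MD5" ~/pkg-cache/index.md5   # $BASE_MD5 would come from the update server (hypothetical)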
  • SUSE (Score:4, Interesting)

    by DreadSpoon ( 653424 ) on Friday September 03, 2004 @09:57PM (#10155012) Journal
    SUSE already does this.

    RPM in general, however, doesn't nicely support this feature. Either RPM needs to be extended/modified, or a new format needs to be made. While I favor a new format for many reasons other than this, modifying RPM is probably the best solution in order to provide backwards compatibility.
    • Any excuse is a good excuse for getting rid of RPM. First of all, RPM puts you in dependency hell. Second of all, there's no reason whatsoever that we can't just do away with RPM tomorrow. All you have to do is create an RPM of your new package manager and start distributing packages. Bonus points if you figure out a way to turn RPM repository information into repository information for your system, but it's not strictly necessary. Extra bonus points if you design all your applications to be installable any
      • Someone correct me if I'm wrong, but isn't the reason RPMs are so particular about dependencies that whoever does the packaging doesn't research whether their app will actually work with an _older_ version of a distribution? Then, if it did work, they could define a broader set of other packages it would work with in the spec file.

        Other than the RPMs needlessly not installing in older environments, applications like urpmi, yum, yast and redcarpet take care of other dependencies painlessly.
    • Re:SUSE (Score:4, Informative)

      by cperciva ( 102828 ) on Friday September 03, 2004 @10:12PM (#10155082) Homepage
      SUSE already does this.

      Nope. SuSE's "patches" are created by packaging all the files which are affected by a security fix; those files are packaged intact, without any delta compression.

      Now, this is certainly a step forward from the common (eg, Debian, RedHat) approach of having people download a complete new package, including copies of files which haven't changed at all, but SuSE's approach is still suboptimal by more than an order of magnitude.
    • As you point out, SuSE already does this, using, you guessed it, RPM. So the changes have already been made; they just need to be adopted by other distros (if they haven't already been picked up upstream by Red Hat) and (probably more importantly) someone needs to document the complex magic required to create the patch RPMs.
  • Well... (Score:3, Interesting)

    by iamdrscience ( 541136 ) on Friday September 03, 2004 @09:58PM (#10155015) Homepage
    On that topic, why does almost everybody distribute source code as gzipped tars instead of bzip2'ed tars (just about everybody that does use bzip2 also distributes gzipped tars)? Sure, in the beginning gzip made more sense for people on slow machines, but nowadays the difference in the time it takes to decompress is trivial, whereas the compression benefits of bzip2 on text are phenomenal in my experience.
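
    The trade-off is easy to measure for yourself; a quick sketch (kernel tree chosen arbitrarily):

      tar cf - linux-2.6.8 | gzip -9  > linux-2.6.8.tar.gz
      tar cf - linux-2.6.8 | bzip2 -9 > linux-2.6.8.tar.bz2
      ls -l linux-2.6.8.tar.gz linux-2.6.8.tar.bz2    # compare the sizes
      time bzip2 -t linux-2.6.8.tar.bz2               # -t decodes without writing, so this times decompression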
    • Well, I don't know for sure, but here's my $0.02 (twice).

      Ramble 1: In my admittedly limited experience (since '94), it was a while before traditional old-school Unix (tm) like OSF (now Digital Unix) and Solaris abandoned the encumbered compress/uncompress utilities and started shipping gzip.

      Even now, the old Solaris Ultra 10 sitting in the corner of my office doing nothing (running 5.7, which has had uptimes of a couple of years solid) doesn't have bzip2 - can't be arsed to ask the sysadmin to update it as I'm

    • One advantage of gzip is that it requires less memory to decompress. It probably doesn't matter if someone's old Pentium 90 with 16MB of RAM takes a while to decompress a file, but that machine will probably *never* successfully extract an archive compressed with bzip2 (at least with the default 900kB block size).
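
      If low-memory machines matter, bzip2 itself has two knobs for this (flags per bzip2(1)):

        bzip2 -1 big.tar          # compress with 100kB blocks instead of the default 900kB
        bzip2 -ds big.tar.bz2     # -s/--small: decompress using roughly half the memory, at some cost in speed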
    • Re:Well... (Score:2, Insightful)

      by p3d0 ( 42270 )
      bzip2 is a retarded name, for one thing. It makes it sound like it's in flux (gee, should I wait for bzip3?).

      They definitely should have done whatever was necessary to keep the name as just "bzip".

    • Re:Well... (Score:3, Insightful)

      by spitzak ( 4019 )
      It would help a lot if tar would do it if you just provided -z instead of having to remember to provide -j. Come to think of it, it would be nice if tar just detected compression and you did not have to give it -z either! Can this be done?
      • Come to think of it, it would be nice if tar just detected compression and you did not have to give it -z either! Can this be done?

        Yes. bsdtar [freebsd.org] does this.
      • I don't see why not - just add a trivial front-end that parses the output of the file command. IIRC less already does this; typing "less foo.gz" decompresses the file on the fly.
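
        Such a front-end really is only a few lines; a sketch (call it xtar -- a made-up name) keyed off file(1) output:

          #!/bin/sh
          # xtar: extract a tarball, picking the decompressor from what file(1) reports
          case "`file -b "$1"`" in
            gzip*)  tar xzf "$1" ;;
            bzip2*) tar xjf "$1" ;;
            *)      tar xf  "$1" ;;
          esac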
    • I think Gentoo has a policy of using bzip2 to compress all the source tarballs that they mirror themselves. Gzip is of course still used extensively for files that Gentoo legally cannot mirror on their own servers or cannot repackage from a binary format.

    • bzip2 compresses only slightly better than gzip, but uses up MUCH more time to do it. It's more of an issue for the distributors than the users.

      bzip2 uses up much more memory just to uncompress. This makes low-end machines incredibly slow because it has to swap lots of data. On my Psion 5mx, it makes it impossible to uncompress bzip2 files, unless they were originally compressed with "-s" (not common).

      bzip2's legality is questionable. Nobody has done a patent search, so the methods it uses could very
    • You mean like Mandrake does? Mandrake won't accept a src.rpm unless it's bzip2.
    • Or even better, rzip [samba.org].

      You can set it to have a buffer of up to 900 megs, as opposed to bzip2's 900k. So instead of looking for redundant information in small blocks of 900k, it looks for it in everything you compress (up to 900 megs).

      And surprisingly, I haven't found it to be noticeably slower than bzip2, even on my ancient hardware (the only thing is that if you want to use it to its full potential, you need a lot of RAM, but it'll work anyway without that... just slower).
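
      For the curious, basic usage is a one-liner; this is from memory of rzip's manpage, so treat the exact flags as assumptions:

        rzip -k big.tar       # compresses to big.tar.rz; -k keeps the original
        rzip -d big.tar.rz    # decompress it again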
    • Re:Well... (Score:5, Insightful)

      by Sunspire ( 784352 ) on Saturday September 04, 2004 @04:03AM (#10156322)
      For example, I always grab the "regular" tar.gz version of the kernel, for two reasons:

      1) I always forget the j option to tar, since bz2 packages are not that common. It should autodetect it.
      2) I have the perception that the combined download time and unpacking is longer for bz2

      Point two was subjective up until now, but just for the hell of it I decided to measure it. I used the time command to measure how long it took to download the kernels and how long it took to unpack them:

      time to download linux-2.6.8.tar.bz2 1m4.414s
      time to download linux-2.6.8.tar.gz 1m9.706s

      time to unpack linux-2.6.8.tar.bz2 2m05.457s
      time to unpack linux-2.6.8.tar.gz 0m26.309s

      This is on a P4C 3.2GHz, 1GB RAM, 8Mbit connection. So there you have it, with a fast enough connection the difference is significantly in favor of the old gz format. The size difference between the bz2 and gz kernel, about 8.8 MB, is not nearly good enough to merit the slower unpacking. If you have a slower machine but also a slower connection the result is likely in the same ballpark.

      This goes to show that if you want to provide faster (subjective) update times to users, especially in the future with faster connections, you have to study the problem in detail and not just blindly try to optimize one aspect of the process (size in this case), since the overall performance might in fact get worse. Premature optimization and all that... What's the time for patching using delta compression anyway? If a 600KB RPM update can be delta compressed to 10KB, but the patching process takes longer than 15 seconds, I'm likely to see a slowdown in system update time.
  • use rsync (Score:3, Interesting)

    by stonebeat.org ( 562495 ) on Friday September 03, 2004 @09:59PM (#10155018) Homepage
    Delta-based patch distribution on the Linux platform is quite easy. Just use rsync to sync the application files to the source. I have used this technique of patching (i.e. rsync) to provide updates/patches to an in-house application. Works very nicely.
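
    For a single in-house application that really is about one command; the host and paths here are hypothetical:

      rsync -avz --delete rsync://updates.example.com/myapp/ /opt/myapp/
      # only the changed blocks of changed files cross the wire; --delete prunes files removed upstream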
    • Rsync is certainly good, but it has limitations. First, it is a protocol, which means that you need to be running a daemon (possible security issue), and it needs to be accessible (offline patching is impossible). Second, rsync tends to perform very poorly on compiled binaries, due to artifacts introduced in the linking process.
      • First, it is a protocol, which means that you need to be running a daemon (possible security issue)

        There is an rsyncd available, but that's just a rarely-used alternative to running it over rsh/ssh. The situation is similar to CVS: there is a native daemon version, but smart people just run it via ssh.

        Although even if rsync was only a daemon, that'd still be backwards: the client of a protocol never needs to run a daemon to download!
  • In light of the obvious benefits, I have to ask: When will Linux vendors follow suit?

    Perhaps, right after they get a good package management system...

    I can't even imagine the mess that would be caused if someone tried to uninstall a binary-diff RPM/DEB.


    There are some rsync servers out there, which provide essentially the same service, and then some.

    Also, if download size is your #1 concern, why not download the source patches, and compile? A whole 10K may need to be downloaded...
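
    The source-patch route mentioned above looks roughly like this (package name, versions and URL are invented for illustration):

      wget http://ftp.example.org/foo/foo-1.2.3-to-1.2.4.patch.gz   # typically a few kB
      cd foo-1.2.3
      zcat ../foo-1.2.3-to-1.2.4.patch.gz | patch -p1
      ./configure && make && make install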

  • Gentoo (Score:3, Interesting)

    by SuperBanana ( 662181 ) on Friday September 03, 2004 @10:05PM (#10155051)
    Jokes about gentoo aside, the source tarballs are cached in /var, and only removed when they exceed configured limits for max disk space. Patches are contained in the portage tree, along with the "ebuild" files which are the build instruction files.

    If the update is just a patch to the source, there's sometimes a minor revision made and an updated Gentoo ebuild file and source code patch added to the portage tree, which is of course done via rsync. All in all, it's decently efficient. This mostly (I think) happens with unstable package versions, where a security update may make it into portage before the official project bumps their release, but that's not the case with stable stuff.

    I think for basic systems, compile time complaints are slightly exaggerated. My -original- celeron 450 isn't shabby at all at compiling most of the more basic system packages and server apps. Even glibc and gcc build with relative ease, and when I set up distcc amongst my three systems, it became even less of a hassle. Even without distcc, the time to clear out 50 packages of updates on a mail server is surprisingly low on a low-powered system.

    • Re:Gentoo (Score:2, Informative)

      by bzBetty ( 787223 )
      Gentoo users should make sure they know about this [gentoo.org]; it's called deltup, basically a script for portage that grabs xdelta patches instead of downloading the entire file again. It seems to save me a lot of bandwidth anyway.
  • It makes a lot more sense for non-open-source operating systems, because when you have the source there are going to be more people who compiled programs themselves and are thus unable to use binary diffs. I'm sure the fact that FreeBSD has it is more because it's a neat hack than because a lot of people find it necessary (although I'm sure a few people do). So it's not really a necessity for Linux distros to have this, but I bet that in the near future Debian and Redhat (maybe a few others
  • by !ucif3r ( 713159 ) on Friday September 03, 2004 @10:07PM (#10155066) Homepage
    OK, before I get berated by the karma (whoring) police, I do realize these are not binary diffs. But, seriously, Linux has been using diffs as a way to save bandwidth since before Windows even offered 'updates'. Another example of Windows 'innovation', I guess.

    Yes, I see how it is neat that there is a binary version of this process with Windows, but Linux is primarily a source-based operating system. It is that way because the software is designed to be compiled for a variety of systems and setups and work with all of them.

    I do understand the author's question though, but it really should be reworded. Linux is not an OS in the sense that Windows is an OS. He should perhaps be more correctly asking when one of the 'binary' distributions of Linux (or of a Linux 'based' OS to be exact) will plan on offering this. Binary packages are really only offered on a per-distribution basis, with the binaries not being very compatible between distros and systems (although some basic compatibility is generally there). As to that question who knows and who cares I use Gentoo, and after trying almost every one of the binary distro's
    • He should perhaps be more correctly asking when one of the 'binary' distributions of Linux (or of a Linux 'based' OS to be exact) will plan on offering this.

      I asked about Linux vendors... isn't that clear enough? Certainly when I hear "Linux vendors" I think "Redhat, SuSE, Mandrake, and other companies which make money by distributing operating systems built around the Linux kernel".
    • What about the grammar police? Since I'm such a nerd, I think I'll assume that role for the next minute (my Karma's so bright I have to wear shades):

      As to that question who knows and who cares I use Gentoo, and after trying almost every one of the binary distro's

      a) "As" - you really shouldn't start a sentence with a preposition.

      b) Run-on sentences are very hard to read and don't often tend to make very much sense since they are run-on sentences but I guess I shouldn't worry about it in this situation I ju
  • by avida ( 683037 ) on Friday September 03, 2004 @10:07PM (#10155067)
    Delta compression requires the vendor to create a delta for each older version that you can upgrade from. So if a package has had ten updates, the next update will need to have eleven deltas. I don't think so. Unless you want to do something like Windows Update, where an agent scans your binaries, compares the difference with the update and then downloads individual files... but that's a lot more complicated and isn't justified by the bandwidth savings.
    • And how is this different from source code patches? It seems to me that they'll only provide patches from version to version, like they do with GNU Emacs [gnu.org]. If you need to update multiple versions then you have to make a decision about going through 10 patches, or doing a full download of the desired version.
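
      Chaining binary patches works the same way in principle; a sketch using bsdiff's bspatch, with hypothetical file and patch names:

        f=libfoo.so.1
        for p in libfoo-1.0-1.1.bspatch libfoo-1.1-1.2.bspatch libfoo-1.2-1.3.bspatch; do
          bspatch "$f" "$f.new" "$p" && mv "$f.new" "$f"   # apply each delta in order
        done
        # ...or the vendor can publish one direct 1.0-to-1.3 delta and skip the intermediate hops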
      • Source code patches are text and generally follow a simple set of rules, i.e.: replace this line of text (surrounded by these other lines of text) with this other line. Source code patches generally don't automatically resolve conflicts (i.e.: the line of text is different from the source, or the surrounding lines aren't quite what was expected). Even then, it's still possible for the patch to go bad, depending on what else has changed.

        Binary diffs don't have any rules other than the start/end point. It is
    • You are exactly right... I did limited delta patching in the '80s and '90s. All of our modules were under 64k, so it was a matter of finding the set that actually changed and sending it out.

      Why is delta patching coming up? Because of poor design. Linking unnecessary functions into a runtime to save some time, MAYBE. The only reason for the delta patching the original poster is asking about is poor development standards.

      Yes, this leads back to the monolithic vs. microkernel arguments. Each has thei
    • Assuming the files are properly versioned, the WU client would simply check the file's current version and send that to the WU server. The WU server would reply with all the diffs needed to get you up to speed, or if it would be faster, the updated file.

      It probably wouldn't be too hard to combine multiple diffs into one single diff and strip out any redundant or unnecessary modifications -- There are only a fixed number of versions in place between each service pack, and the service packs could be used as
    • I'm not sure it's that difficult. Rsync already does a lot of the work.
  • by Malc ( 1751 )
    This [slashdot.org] was supposed to be the last word.

  • ...toggle their diffs in from the front panel.

  • XDelta3 (Score:5, Informative)

    by TheBashar ( 13543 ) on Friday September 03, 2004 @10:23PM (#10155150)
    XDelta3 recently reached its first public release.

    http://xdelta.org/xdelta3.html [xdelta.org]

    XDelta3 is a library which is designed to foster exactly this kind of functionality. If distributions integrated the xdelta functionality into their package management frameworks, we would be well on our way to what the poster is looking for.
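
    For anyone who wants to experiment, the command-line front end is simple; these are the -e/-d/-s flags as I understand the xdelta3 tool, so double-check against its documentation:

      xdelta3 -e -s foo-1.0.tar foo-1.1.tar foo.vcdiff   # encode a delta of the new file against the old one
      xdelta3 -d -s foo-1.0.tar foo.vcdiff foo-1.1.tar   # decode it back on the client side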
    • I tried xdelta3 on a large text file; it was much slower & produced a much larger patch than xdelta1. That may be a pathologically bad example. Also, that's the first public release of xdelta3; the author's written that there's lots of tuning to be done.
      The article link for "binary diff" talks about a utility offering 50%-80% reduction from the equivalent xdelta1. If the point is saving space, I would compare the patch size first.
      The overhead of tracking deltas with the versions involved seems like en
    • Re:XDelta3 (Score:3, Interesting)

      by Spy Hunter ( 317220 )
      Oh man, I just had a great idea. What if you incorporated XDelta3 into a Reiser4 filesystem plugin? Versioning built into the filesystem would be an *awesome* feature. I'm sure it's been done before on some other OS, but it could really go mainstream on Linux with Reiser4.
  • Shareware (Score:4, Insightful)

    by slittle ( 4150 ) on Friday September 03, 2004 @10:28PM (#10155175) Homepage
    Used to do this back in ye olde DOS shareware days. I think RTPatch was the most common of the commercial ones.
    • Yep, RTPatch [pocketsoft.com] is some 12 years old and has been able to do binary diff patches since the beginning. Heck, you could even package patches against patches in order to work against multiple versions of the installed software.

      It actually still exists and there are versions available for Linux, DOS, Windows, etc. I imagine the support for binary patches in WISE and InstallShield have hurt them quite a bit.
  • Gentoo Portage (Score:4, Interesting)

    by WamBamBoozle ( 113151 ) on Friday September 03, 2004 @10:40PM (#10155237) Homepage
    I wonder why Gentoo [gentoo.org] doesn't do this. Gentoo, as far as I can tell, always distributes a bzip2'ed tar of any particular distribution.

    It works beautifully but I can't help but think it is a waste of bandwidth.

  • by lakeland ( 218447 ) <lakeland@acm.org> on Friday September 03, 2004 @10:57PM (#10155332) Homepage
    Firstly, linux programs tend to be smaller than windows programs (do one thing, and do it well). Even a huge beast like tetex is 'only' 14.4MB -- compare to SP2... This has reduced the demand for delta compression.

    Secondly, in the Windows world people release rarely. However, the opposite is true in the Linux world -- projects with daily releases are not unheard of, and weekly releases are fairly common. This means enumerating patches (v 3.4 -> v 3.7) is infeasible in Linux, whereas it is feasible in Windows.

    More sophisticated algorithms than delta checksums do exist (as I guess you know if your thesis is on them) -- rolling checksums have been used in several projects I know of. However, there is a widespread rumour that these techniques are patented. I have never seen any evidence, but it puts a damper on any implementations.

    There is a semi-vapourware project implementing all of this (part of the apache project IIRC). However the project fizzled away several years ago.
    • by tyrione ( 134248 ) on Saturday September 04, 2004 @04:12AM (#10156346) Homepage
      Well congratulations.

      You point out teTeX at 14+MB, which is as bare as it gets for teTeX; then come teTeX-doc and teTeX-extra, and by now we're up to over 50MB.

      Oh, and here is the real kicker: Debian has updated 2.02 three if not four times this month. That's 150MB+ to over 200MB of fixes. SP2 looks a bit smaller now, doesn't it?

      And that doesn't even touch the -1,-2,..-20 Debian patches they keep spewing out for project after project.

      The only plus of 56k access is that they don't cap your downloads on a monthly basis. The bad side obviously is bandwidth, but for me it's the time spent waiting for important packages like teTeX to update.

      Having an SVN approach to patching systems makes sense. Or CVS, if you prefer a different versioning system approach.

      It's already been said but it is worth repeating, especially when one runs KDE or GNOME: just build a freakin' base package and update us with the binary images that are new or replaced, documentation that is new or revised, and the executables, libraries and so on that change -- not the mountains of inert parts that don't change.

      You can't tell me kdelibs and kdebase need to be completely rebuilt for each .x or -x revision by Debian -- and by completely rebuilt I mean all the inert files that don't actually get touched during the build process other than to make sure some wallpaper image still exists. Hell, the wallpaper backdrops etc. should be add-ons, not part of the distributions. But then again I suppose everyone thinks we all have T1 access.

      K.I.S.S.

  • by Mongo222 ( 612547 ) on Friday September 03, 2004 @11:07PM (#10155376)


    http://www.daemonology.net/bsdiff/

    bsdiff and bspatch are tools for building and applying patches to binary files. By using suffix sorting (specifically, Larsson and Sadakane's qsufsort) and taking advantage of how executable files change, bsdiff routinely produces binary patches 50-80% smaller than those produced by Xdelta, and 15% smaller than those produced by .RTPatch (a $2750/seat commercial patch tool).

    http://sourceforge.net/projects/diffball

    A general delta compression/differencing suite for any platform that supports autoconf/automake, written in C, with built-in support for reading, writing and converting between multiple file formats, and an easy framework to drop in new algorithms.

  • when will Gentoo get this? ;)

  • The current Fedora Core 2 DVD ISO is about a 5GB beast. It would sure be nice to only have to download a 500MB Fedora Core 3 DVD ISO patch when FC3 becomes available later this year.
  • The FreeBSD link looks like some dude's pet project. Cool, but it is not the official method for distributing patches.
  • by mr_zorg ( 259994 ) on Saturday September 04, 2004 @03:13AM (#10156200)
    ...have started to use delta compression (also known as binary diffs...
    Why does the poster make this sound like a new technology? And why does one of the high ranked comments link to a Microsoft tech note from 3/04 talking about this new thing called Binary Diff Compression?

    What's so new about it? I remember working with InstallShield, RTPatch, and others, way back in the Windows 3.11 days... New? <yawn>

  • I don't think it will happen on Linux for the reason that it is "too free".

    o Gentoo - builds from sources so you can't ship binary diffs
    o RPM based - symlinking and nature of open source (lots of individuality between systems running the same version of OS; such as workarounds and such)
    o APT-GET - similar to RPM
    o Others - wouldn't know but it just doesn't sound feasible

    Some may call this insignificant, but when you have to patch the kernel for a vulnerability, every minute can be important. Downloading a 3
  • by rigolo ( 416338 ) on Saturday September 04, 2004 @04:28AM (#10156374)
    Well, Gentoo is known for the fact that you download the source of every program and then start compiling. These sources are distributed in .tar.gz or .tar.bz2 form and can be very large. A version change (even a change from .0.0.1 to .0.0.2) has its own tarball and is therefore downloaded again completely. But the real changes between these two can be small.

    Enter "deltup" a tool that looks at to tarrballs and gives you a diff between the 2 that you can use to "transform the old tarball to a exact copy of the new tarball", it even preserves MD5 checksums compatibility. Now some enterprising gentoo user create a "dynamic deltup server" that automates the creation of these delta files, and people can reuse the delta files that other people used.


    Using this technique in combination with Gentoo portage, people can reduce their traffic by 75% on average.


    Have a look at the following URLs for more information:

    http://forums.gentoo.org/viewtopic.php?t=215262 [gentoo.org]

    http://linux01.gwdg.de/~nlissne/deltup-status.atime.html [linux01.gwdg.de]


    Rigolo

  • by borud ( 127730 ) on Saturday September 04, 2004 @05:27AM (#10156487) Homepage
    Why does Linux need this? How many people have a connection which is so bad they really benefit from this?

    Sure it is always nice to have faster downloads. But is it worth the extra work involved in setting this up both at the distribution point and on the client side?

    I am not being rhetorical. I am just wondering.

"Protozoa are small, and bacteria are small, but viruses are smaller than the both put together."

Working...