Delta Compression for Linux Security Patches? 289
cperciva asks: "For people without fast internet connections, it is often impractical to download large security patches. In order to reduce patch sizes, some operating systems -- starting with FreeBSD over a year ago, and recently followed by Mac OS X and Windows XP SP2 -- have started to use delta compression (also known as binary diffs, which constitutes a portion of my doctoral thesis), which can often reduce patch sizes by over a factor of 50. In light of the obvious benefits, I have to ask: When will Linux vendors follow suit?"
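(For readers who haven't met the idea: here is a rough sketch, in Python, of what delta compression buys. This is not bsdiff's algorithm -- bsdiff uses suffix sorting -- it just primes zlib with the old file as a preset dictionary, which is one simple way to get a delta; the data is fabricated for illustration.)

```python
import hashlib
import zlib

# Fabricate ~8KB of incompressible "old binary" data for illustration.
old = b"".join(hashlib.sha256(bytes([i])).digest() for i in range(256))
new = old[:4000] + b"SECURITY FIX" + old[4000:]   # a small change

# Plain compression has to ship the whole new file:
full = zlib.compress(new)

# Delta compression: prime the compressor with the old file, so the
# unchanged regions become cheap back-references into it.
co = zlib.compressobj(zdict=old)
delta = co.compress(new) + co.flush()

# The receiver, who already has `old`, rebuilds `new` exactly:
do = zlib.decompressobj(zdict=old)
rebuilt = do.decompress(delta) + do.flush()
assert rebuilt == new
```

The receiver needs the exact old file for this to work, which is why real patch tools also verify a checksum of the old file before patching.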
Doesn't make as much sense to use for Linux (Score:5, Informative)
Certainly for your primary commercial auto-updated Linux distributions it does, but for anything else it usually doesn't. What makes more sense (because it's easier) is breaking up media and programs, and distributing them separately so you don't have to update one when you update the other. Some projects do this already, and even package their sources this way.
Personally I'd prefer to see binary distributions move to a model of using something like CVS, so you can just do a cvs up (or equivalent) and update everything. Some files would have to be marked to always be overwritten, while config files would be merged. This solves both your differential update problem and your updates in general -- if the right system is used. I'm thinking that's pretty much not CVS, but I don't know if there's a way to make it do all of that; CVS doesn't handle binaries amazingly intelligently, from what I understand. Plus, you can use it both for source and binary updates.
Re:Doesn't make as much sense to use for Linux (Score:3, Interesting)
Re:Doesn't make as much sense to use for Linux (Score:4, Informative)
Re:Doesn't make as much sense to use for Linux (Score:4, Informative)
I use Gentoo. I have never found any ccache settings that make much of a difference. None of these systems do a binary differential update; they download a whole package and install it, or in the case of Gentoo, download a source package and compile it. Neither of these approaches is what is being called for in this article, nor what I suggest above.
Mind you, I'm fine with the way Gentoo does things, but I have a fairly powerful system - not incredibly fast by modern standards, but faster than anything I've run Linux on before, or probably any Unix at all for that matter. For a dialup user on an older computer, atomic differential updates would make a really big difference.
Re:Doesn't make as much sense to use for Linux (Score:3, Interesting)
I guess if you really wanted to be clever, yo
Re:Doesn't make as much sense to use for Linux (Score:3, Informative)
No, those "binary" diffs for Gentoo would be done against the sources used for the previous version of the Gentoo "package"; the diff would then be downloaded so that the Gentoo machine could construct the new sources to build a
Re:Doesn't make as much sense to use for Linux (Score:5, Informative)
On those systems, when you do an upgrade (apt-get update), you will get a fresh package, including not only the files that changed, but all the files for that package. And if we have a package with 1 binary and 50 images, and only the binary changed, we get to download all the images again.
Some distributions have been implementing package fragmentation for this (package-core and package-images for this example), and that is a good thing for these cases, although it is a nightmare to manage. Not as fine-grained as proposed by the grandparent post, but good enough for most cases.
Re:Doesn't make as much sense to use for Linux (Score:2)
Yep, but still, in almost all cases, you have to upgrade both packages when a new version of the game comes out. For Debian, this splitting helps in another way: Instead of having the images in the archive for every supported architecture (more than ten, these days), they're now in the archive only once for every architecture.
Don't know about other distros, but usually such an imag
Re:Doesn't make as much sense to use for Linux (Score:2)
I shouldn't say that. It's better not to use Gentoo if you like to have a system that works after your updates more than, oh, 9 times in 10.
I used Gentoo for almost a year total (most recently in August of this year). Compile-time options change without notice, major and easy-to-spot bugs slip through on a regular basis, and when something like KDE gets updated, it can be a week before things work again. That is, when you can get YOUR config files update
Re:Doesn't make as much sense to use for Linux (Score:2)
License of BSDiff (Score:4, Interesting)
Especially since the license of bsdiff is not even close to a BSD license (don't let the name of BSD Protection License fool you). Unless the license is changed to something like BSD, BSDiff is not going to be implemented anywhere except in closed source software. Debian cannot even package this software because it is non-free [debian.org].
I guess the bottom line is: if you want to have something accepted in open source *and* in proprietary software, you want to license under BSD. If you cater to one group (closed source in this case), you will lose the other.
Right after... (Score:4, Funny)
SP2? (Score:5, Funny)
Re:SP2? (Score:5, Informative)
Re:SP2? (Score:2)
Re:SP2? (Score:5, Funny)
If you look at the URL...
http://www.microsoft.com/windows2000/techinfo/planning/redir-binarydelta.asp
as soon as it gets hacked in to RPM (Score:3, Insightful)
Re:as soon as it gets hacked in to RPM (Score:3, Insightful)
Binary diffs make sense any time you've got large files being updated. On my system, libssl (library archive + shared object file + profiled library) is 600kB; that's large enough to justify using a 10kB binary diff instead.
I bet the average RPM is about the same size as the minimum binary diff from MS.
I can't say anything about Microsoft's patches directly, but the patches used by FreeBSD Updat
Re:as soon as it gets hacked in to RPM (Score:2)
Maybe sometimes, but I don't see that happening on average.
Re:as soon as it gets hacked in to RPM (Score:2)
Maybe sometimes, but I don't see that happening on average.
Look at the statistics yourself [daemonology.net]. The average patch compression ratio (ie, [size of new file] / [size of patch file]) for FreeBSD Update is 66.404 right now. (Ignore the "Speedup due to patching" line -- that includes files which were downloaded before delta compression support was added.)
In fact, my current development code produces patches around 30% smaller than that, but I haven
Re:as soon as it gets hacked in to RPM (Score:2, Informative)
I beg to differ. SuSE 9.1 came out only 5 months ago:
That's almos
Re:as soon as it gets hacked in to RPM (Score:3, Informative)
Re:as soon as it gets hacked in to RPM (Score:2)
It probably wouldn't take much to take rhmask and update it to use xdelta [berkeley.edu] or something, though. Note what the xdelta manpage says about using it on compressed data, though:
Re:as soon as it gets hacked in to RPM (Score:2)
I have followed the release cycle of Mandrake 10 very closely and the number of updates is huge, which is fine; it means that bugs are being addressed. However, the updates can come at 100MB at a time, simply because they just have no way of doing real patches and thus redownload the whole of OpenOffice or kdelibs for a small change.
I love Mandrake to death, but this is something that needs to be addressed as soon as possible. This issue has been enough of a showsto
Mindvision (Score:5, Informative)
I was using this tool over 7 years ago now on Mac OS, so I don't see what is so new about this concept... but I am glad it looks like it's starting to be used more.
Re:Mindvision (Score:2)
Version 2.6 was (c) 1989-1996, so
Here the problem: (Score:2, Interesting)
Re:Here the problem: (Score:2)
SUSE (Score:4, Interesting)
RPM in general, however, doesn't nicely support this feature. Either RPM needs to be extended/modified, or a new format needs to be made. While I favor a new format for many reasons other than this, modifying RPM is probably the best solution in order to provide backwards compatibility.
Re:SUSE (Score:2)
Dependancy Hell (Score:2)
Other than the RPMs needlessly not installing in older environments, applications like urpmi, yum, yast and redcarpet take care of other dependencies painlessly.
Re:SUSE (Score:2)
Re:SUSE (Score:4, Informative)
Nope. SuSE's "patches" are created by packaging all the files which are affected by a security fix; those files are packaged intact, without any delta compression.
Now, this is certainly a step forward from the common (eg, Debian, RedHat) approach of having people download a complete new package, including copies of files which haven't changed at all, but SuSE's approach is still suboptimal by more than an order of magnitude.
Re:SUSE (Score:2)
Well... (Score:3, Interesting)
Re:Well... (Score:2)
Ramble 1: In my admittedly limited experience (since '94), it was a while before traditional old-school Unix (tm) like OSF (now Digital Unix) and Solaris abandoned the encumbered compress/uncompress utilities and started shipping gzip.
Even now, the old Solaris Ultra 10 sitting in the corner of my office doing nothing (running 5.7, which has had uptimes of a couple of years solid) doesn't have bzip2 - can't be arsed to ask the sysadmin to update it as I'm
Re:Well... (Score:2)
Re:Well... (Score:2, Insightful)
They definitely should have done whatever was necessary to keep the name as just "bzip".
Re:Well... (Score:3, Insightful)
Re:Well... (Score:2)
Yes. bsdtar [freebsd.org] does this.
Re:Well... (Score:2)
Re:Well... (Score:2)
I think Gentoo has a policy of using bzip2 to compress all the source tarballs that they mirror themselves. Gzip is of course still used extensively for files that Gentoo legally cannot mirror on their own servers or cannot repackage from a binary format.
Re:Well... (Score:2)
bzip2 uses up much more memory just to uncompress. This makes low-end machines incredibly slow because it has to swap lots of data. On my Psion 5mx, it makes it impossible to uncompress bzip2 files, unless they were originally compressed with "-s" (not common).
bzip2's legality is questionable. Nobody has done a patent search, so the methods it uses could very
Re:Well... (Score:2)
rzip (Score:2)
You can set it to have a buffer of up to 900 megs, as opposed to bzip2's 900k. So instead of looking for redundant information in small blocks of 900k, it looks for it in everything you compress (up to 900 megs).
And surprisingly, I haven't found it to be noticeably slower than bzip2, even on my ancient hardware (the only thing is that if you want to use it to its full potential, you need a lot of RAM, but it'll work anyway without that... just slower).
Re:Well... (Score:5, Insightful)
1) I always forget the j option to tar, since bz2 packages are not that common. It should autodetect it.
2) I have the perception that the combined download time and unpacking is longer for bz2
Point two was subjective up until now, but just for the hell of it I decided to measure it. I used the time command to measure how long it took to download the kernels and how long it took to unpack them:
time to download linux-2.6.8.tar.bz2 1m4.414s
time to download linux-2.6.8.tar.gz 1m9.706s
time to unpack linux-2.6.8.tar.bz2 2m05.457s
time to unpack linux-2.6.8.tar.gz 0m26.309s
This is on a P4C 3.2GHz, 1GB RAM, 8Mbit connection. So there you have it, with a fast enough connection the difference is significantly in favor of the old gz format. The size difference between the bz2 and gz kernel, about 8.8 MB, is not nearly good enough to merit the slower unpacking. If you have a slower machine but also a slower connection the result is likely in the same ballpark.
This goes to show that if you want to provide faster (subjective) update times to users, especially in the future with faster connections, you have to study the problem in detail and not just blindly try to optimize some aspect of the process (size, in this case), since global performance might in fact get worse. Premature optimization and all that... What's the time for patching using delta compression, anyway? If a 600KB RPM update can be delta-compressed to 10KB, but the patching process takes longer than 15 seconds, I'm likely to see a slowdown in system update time.
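The unpack half of that comparison is easy to reproduce. A rough sketch, assuming you already have the two kernel tarballs locally (the filenames are just the poster's examples; substitute whatever you have):

```python
import bz2
import gzip
import time

def decompress_seconds(path, opener):
    """Time a streaming decompression of `path` via gzip.open or bz2.open."""
    start = time.perf_counter()
    with opener(path, "rb") as f:
        while f.read(1 << 20):   # read decompressed data in 1MB chunks
            pass
    return time.perf_counter() - start

# Example usage (files from the comparison above):
# decompress_seconds("linux-2.6.8.tar.gz", gzip.open)
# decompress_seconds("linux-2.6.8.tar.bz2", bz2.open)
```

Streaming in chunks keeps memory flat, so the measurement reflects decompression cost rather than buffering.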
use rsync (Score:3, Interesting)
Re:use rsync (Score:2)
Re:use rsync (Score:2)
There is an rsyncd available, but that's just a rarely-used alternative to running it over rsh/ssh. The situation is similar to CVS: there is a native daemon version, but smart people just run it via ssh.
Although even if rsync was only a daemon, that'd still be backwards: the client of a protocol never needs to run a daemon to download!
I have to answer... (Score:2)
Perhaps, right after they get a good package management system...
I can't even imagine the mess that would be caused if someone tried to uninstall a binary-diff RPM/DEB.
There are some rsync servers out there, which provide essentially the same service, and then some.
Also, if download size is your #1 concern, why not download the source patches, and compile? A whole 10K may need to be downloaded...
Re:I have to answer... (Score:2)
Gentoo (Score:3, Interesting)
If the update is just a patch to the source, there's sometimes a minor revision made, and an updated Gentoo ebuild file and source code patch are added to the portage tree, which is of course synced via rsync. All in all, it's decently efficient. This mostly (I think) happens with unstable package versions, where a security update may make it into portage before the official project bumps their release, but that's not the case with stable stuff.
I think for basic systems, compile time complaints are slightly exaggerated. My -original- celeron 450 isn't shabby at all at compiling most of the more basic system packages and server apps. Even glibc and gcc build with relative ease, and when I set up distcc amongst my three systems, it became even less of a hassle. Even without distcc, the time to clear out 50 packages of updates on a mail server is surprisingly low on a low-powered system.
Re:Gentoo (Score:2, Informative)
Makes more sense for proprietary operating systems (Score:2)
Re:Makes more sense for proprietary operating syst (Score:2)
Sun, are you kidding me? (Score:2)
They only replace individual files, never binary diffs.
Ummm... diffs? Not for Linux? Are you kidding? (Score:5, Insightful)
Yes, I see how it is neat that there is a binary version of this process with Windows, but Linux is primarily a source-based operating system. It is that way because the software is designed to be compiled for a variety of systems and setups and to work with all of them.
I do understand the author's question, though it really should be reworded. Linux is not an OS in the sense that Windows is an OS. He should perhaps more correctly be asking when one of the 'binary' distributions of Linux (or of a Linux-'based' OS, to be exact) will plan on offering this. Binary packages are really only offered on a per-distribution basis, with the binaries not being very compatible between distros and systems (although some basic compatibility is generally there). As to that question who knows and who cares I use Gentoo, and after trying almost every one of the binary distro's
Re:Ummm... diffs? Not for Linux? Are you kidding? (Score:2)
I asked about Linux vendors... isn't that clear enough? Certainly when I hear "Linux vendors" I think "Redhat, SuSE, Mandrake, and other companies which make money by distributing operating systems built around the Linux kernel".
Re:Ummm... diffs? Not for Linux? Are you kidding? (Score:2)
As to that question who knows and who cares I use Gentoo, and after trying almost every one of the binary distro's
a) As You really shouldn't start a sentence with a preposition.
b) Run-on sentences are very hard to read and don't often tend to make very much sense since they are run-on sentences but I guess I shouldn't worry about it in this situation I ju
Too complicated and confusing (Score:4, Informative)
Re:Too complicated and confusing (Score:3, Insightful)
Re:Too complicated and confusing (Score:2)
Binary diffs don't have any rules other than the start/end point. It is
Re:Too complicated and confusing (Score:2)
Why is delta patching coming up? Because of poor design. Linking unnecessary functions into a runtime to save some time, MAYBE. The only reason for the delta patching the original poster is asking about is poor development standards.
Yes, this leads back to the monolithic vs. microkernel arguments. Each has thei
Re:Too complicated and confusing (Score:2)
It probably wouldn't be too hard to combine multiple diffs into one single diff and strip out any redundant or unnecessary modifications -- There are only a fixed number of versions in place between each service pack, and the service packs could be used as
Re:Too complicated and confusing (Score:2)
Re:Too complicated and confusing (Score:2)
Shhhh! (Score:2)
Real hackers... (Score:2, Funny)
...toggle their diffs in from the front panel.
Re:Real hackers... (Score:2)
She slapped me.
I kinda liked it.
XDelta3 (Score:5, Informative)
http://xdelta.org/xdelta3.html [xdelta.org]
XDelta3 is a library which is designed to foster exactly this kind of functionality. If distributions integrated the xdelta functionality into their package management frameworks, we would be well on our way to what the poster is looking for.
Re: XDelta3 (Score:2)
The article link for "binary diff" talks about a utility offering 50%-80% reduction from the equivalent xdelta1. If the point is saving space, I would compare the patch size first.
The overhead of tracking deltas with the versions involved seems like en
Re:XDelta3 (Score:3, Interesting)
Shareware (Score:4, Insightful)
Re:Shareware (Score:2)
It actually still exists, and there are versions available for Linux, DOS, Windows, etc. I imagine the support for binary patches in WISE and InstallShield has hurt them quite a bit.
Gentoo Portage (Score:4, Interesting)
It works beautifully but I can't help but think it is a waste of bandwidth.
Re:Gentoo Portage (Score:4, Informative)
http://www.daemonology.net/bsdiff/, this util is already in Gentoo.
Several reasons, but not all technical (Score:4, Insightful)
Secondly, in the Windows world people release rarely. However, the opposite is true in the Linux world -- projects with daily releases are not unheard of, and weekly releases are fairly common. This means enumerating patches (v3.4 -> v3.7) is infeasible in Linux where it is feasible in Windows.
More sophisticated algorithms than delta checksums do exist (as I guess you know if your thesis is on them) -- rolling checksums have been used in several projects I know of. However, there is a widespread rumour that these techniques are patented. I have never seen any evidence, but it puts a damper on any implementations.
There is a semi-vapourware project implementing all of this (part of the apache project IIRC). However the project fizzled away several years ago.
Re:Several reasons, but not all technical (Score:4, Interesting)
You point out TeTeX at 14+MB, which is as bare as it gets for TeTeX; then come TeTeX-Doc and TeTeX-Extra, and by now we're up to over 50MB.
Oh, and here is the real kicker. Debian has updated 2.02 three if not four times this month. Now that's 150MB+ to over 200MB of fixes. SP2 looks a bit smaller now, don't it?
And that doesn't even touch the -1, -2, ... -20 Debian patches they keep spewing out for project after project.
The only plus of 56k access is they don't cap your downloads on a monthly basis. The bad side obviously is bandwidth, but for me it's time spent waiting for important packages like TeTeX to update.
Having an SVN approach to patching systems makes sense. Or CVS, if you prefer a different versioning system approach.
It's already been said, but it is worth repeating, especially when one runs KDE or GNOME. Just build a freakin' base package and update us with binary images that are new or replaced, documentation that is new or has revision updates, and the executables, libraries, and so on that change, not the mountains of inert parts that don't change.
You can't tell me KDELIBS and KDEBASE need to be completely rebuilt for each .x or -x revision by Debian; and by completely rebuilt I mean all the inert files that don't actually get touched during the build process, other than to make sure some wallpaper image still exists. Hell, the wallpaper backdrops etc. should be add-ons, not part of the distributions. But then again, I suppose everyone thinks we all have T1 access.
K.I.S.S.
It's already doing it. (Score:3, Interesting)
http://www.daemonology.net/bsdiff/
bsdiff and bspatch are tools for building and applying patches to binary files. By using suffix sorting (specifically, Larsson and Sadakane's qsufsort) and taking advantage of how executable files change, bsdiff routinely produces binary patches 50-80% smaller than those produced by Xdelta, and 15% smaller than those produced by
http://sourceforge.net/projects/diffball
A general delta compression/differencing suite for any platform that supports autoconf/automake, written in C, with built-in support for reading, writing, and converting between multiple file formats, and an easy framework for dropping in new algorithms.
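To make the copy/insert shape of such patches concrete, here is a toy delta encoder in Python. It is nothing like bsdiff's suffix-sorting approach -- just a naive greedy matcher -- but the output (copy ranges from the old file plus literal inserts) has the same basic structure these tools produce.

```python
def make_delta(old, new, min_match=8):
    """Greedy toy delta: emit ('copy', offset, length) for runs of `new`
    found in `old`, and ('insert', bytes) for everything else."""
    ops, i, lit = [], 0, bytearray()
    while i < len(new):
        # Naively extend the longest prefix of new[i:] that occurs in old.
        best_off, best_len = -1, 0
        probe = min_match
        off = old.find(new[i:i + probe])
        while off != -1 and i + probe <= len(new):
            best_off, best_len = off, probe
            probe += 1
            off = old.find(new[i:i + probe])
        if best_len >= min_match:
            if lit:
                ops.append(("insert", bytes(lit)))
                lit.clear()
            ops.append(("copy", best_off, best_len))
            i += best_len
        else:
            lit.append(new[i])
            i += 1
    if lit:
        ops.append(("insert", bytes(lit)))
    return ops

def apply_delta(old, ops):
    """Rebuild the new file from the old file plus the op list."""
    out = bytearray()
    for op in ops:
        if op[0] == "copy":
            out += old[op[1]:op[1] + op[2]]
        else:
            out += op[1]
    return bytes(out)
```

Real tools replace the quadratic search here with suffix sorting (bsdiff) or block hashing (xdelta), and then compress the op stream itself; the patch format is the part that stays recognizable.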
My question is... (Score:2, Funny)
Fedora Core 3 DVD ISO could use this (Score:2)
The current Fedora Core 2 DVD ISO is about a 5GB beast. It would sure be nice to only have to download a 500MB Fedora Core 3 DVD ISO patch when FC3 becomes available later this year.
Re:Fedora Core 3 DVD ISO could use this (Score:2)
You should be able to use apt or yum to upgrade from the test release to the final release, and the Fedora team should be testing to assure that this flow works.
FreeBSD support not official (Score:2)
Why do they make it sound new? (Score:3, Insightful)
What's so new about it? I remember working with InstallShield, RTPatch, and others, way back in the Windows 3.11 days... New? <yawn>
Won't happen on Linux (Score:2)
o Gentoo - builds from sources so you can't ship binary diffs
o RPM based - symlinking and nature of open source (lots of individuality between systems running the same version of OS; such as workarounds and such)
o APT-GET - similar to RPM
o Others - wouldn't know but it just doesn't sound feasible
Some may call this insignificant but when you have to patch kernel for vulnerability then every minute could be important. Downloading a 3
Gentoo now has "source delta's" reducing traffic (Score:3, Interesting)
Enter "deltup", a tool that looks at two tarballs and gives you a diff between the two that you can use to "transform the old tarball into an exact copy of the new tarball"; it even preserves MD5 checksum compatibility. Now an enterprising Gentoo user has created a "dynamic deltup server" that automates the creation of these delta files, so people can reuse the delta files that other people used.
Using this technique in combination with Gentoo Portage, people can reduce their traffic by 75% on average.
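The MD5-preservation point is the crucial property: after the delta is applied, the rebuilt tarball is byte-identical to upstream's, so the checksum already recorded in the ebuild still verifies it. A sketch of that check (the function name is my own):

```python
import hashlib

def md5_matches(path, expected_hex):
    """Stream a file through MD5 and compare against a published digest."""
    h = hashlib.md5()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(1 << 20), b""):
            h.update(chunk)
    return h.hexdigest() == expected_hex
```

If the digest doesn't match, the client falls back to downloading the full tarball, so a bad delta can never leave you with a corrupt source tree.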
Have a look at the following URLs for more information:
http://forums.gentoo.org/viewtopic.php?t=215262 [gentoo.org]
http://linux01.gwdg.de/~nlissne/deltup-status.atime.html [linux01.gwdg.de]
Rigolo
is there any real benefit? (Score:3, Interesting)
Sure it is always nice to have faster downloads. But is it worth the extra work involved in setting this up both at the distribution point and on the client side?
I am not being rhetorical. I am just wondering.
Re:Is this an Issue? (Score:4, Insightful)
Yes, it is. I just switched to broadband less than two months ago. A lot of my friends are still on dialup. Also, do not forget rural areas which do not have access to broadband. You would be surprised how many people still have dialup; I believe the number of broadband users only recently surpassed the number of dialup users. This means, obviously, that nearly half of all internet users are still on dialup.
Re:Is this an Issue? (Score:3, Informative)
As somebody pointed out before in this article [slashdot.org], there is rsync [samba.org], which minimizes transmitted data using an xdelta-like algorithm. This is not really new, and some sites offer anonymous rsync downloads for exactly this reason.
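Roughly, rsync's weak rolling checksum works like this (sketched in Python; the real algorithm's constants and details differ):

```python
def weak_sum(block):
    """Two running sums over a block, rsync-style (simplified)."""
    a = sum(block) % 65536
    b = sum((len(block) - i) * x for i, x in enumerate(block)) % 65536
    return a, b

def roll(a, b, out_byte, in_byte, block_len):
    """Slide the window one byte in O(1) instead of rescanning the block."""
    a = (a - out_byte + in_byte) % 65536
    b = (b - block_len * out_byte + a) % 65536
    return a, b
```

Because the checksum can be rolled one byte at a time, the side holding the new file can cheaply scan it at every byte offset for blocks the other side already has, which is what lets rsync transfer only the changed parts.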
(Rumours were that some people actually use rsync in the following way to get the latest Debian ISOs from a collection of old, already downloaded packages: They
Yes, Linux has low HW mins (Score:2)
Re:Huh?? (Score:2)
Re:Huh?? (Score:3, Insightful)
Re:How about this... (Score:4, Funny)
Re:Its well known (Score:2, Informative)
Re:Its well known (Score:3, Insightful)
This is probably because of Portage. Precompiled packages coming from all different sources can be a bitch to maintain; Mandrake is my example for this, if you ever want to update a package they don't have RPMs for. And as for compile time, I'd rather let the computer sit for an hour or two overnight compiling a huge package than have to deal with the dependencies myself.
Re:Its well known (Score:2)
That is the proper way to maintain an rpm distribution. And it would serve you well to learn the power of urpmi.
By the way, when was the last time that you failed to find a package for Mandrake? Put together contrib, plf, and elsac, just to name a few, and you have thousands of packages available.
Do you need to berate other distributions to feel better about whatever it is you run, Gentoo in this case?
Re:Its well known (Score:2)
Re:Warez (Score:3, Informative)
Re:What exactly is binary diff/delta compression ? (Score:2)
and I modified it to become:
Then convention would require me to distribute the second version as a replacement for the first. Using a binary diff, I could distribute a command file that represents something like:
(i.e. insert 1 character '4' at position 4, delete characters at original positions 13-16, then append D to the end). Of cours
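That command file can be sketched as data plus a tiny interpreter. The op names and format below are made up for illustration, but they capture the parent's insert-at/delete-range/append idea:

```python
def apply_commands(old, commands):
    """Apply positional edit commands to `old`.  Toy format:
    ("insert", pos, data)  - insert data before original position pos
    ("delete", start, end) - drop original positions start..end inclusive
    ("append", data)       - add data at the very end
    Positions refer to the ORIGINAL bytes, so positional edits are
    applied back-to-front to keep earlier offsets valid."""
    out = bytearray(old)
    positional = [c for c in commands if c[0] != "append"]
    for cmd in sorted(positional, key=lambda c: c[1], reverse=True):
        if cmd[0] == "insert":
            out[cmd[1]:cmd[1]] = cmd[2]
        elif cmd[0] == "delete":
            del out[cmd[1]:cmd[2] + 1]
    for cmd in commands:
        if cmd[0] == "append":
            out += cmd[1]
    return bytes(out)
```

With old = b"0123456789ABCDEFG" and the parent's commands -- insert b"4" at position 4, delete original positions 13-16, append b"D" -- this returns b"01234456789ABCD".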
Re:this gives me an idea... (Score:2)
Have you used Microsoft Office lately? It already attempts this, and it's poorly implemented. You frequently get a message saying "Office setup needs to install new components" when you click on a certain feature.
It frequently gets broken, so you have to go through the install progress bar each time you restart the program. (This was on Slashdot 10 days ago.)