Subversion as Automatic Software Upgrade Service?
angel'o'sphere asks: "I'm working on a contract where the customer wants an automated, Internet-based check-for-updates, update, and install system. So far we've considered a Subversion-based solution. The numbers: a typical upgrade is about 10MB in size. Usually it's about 30 to 50 new files (with an average size of about 200kB) and 2 database files (anywhere from 500MB to 2GB) that change regularly. Upgrades are released about every 3 months, and this will probably become more frequent as the system matures. The big files are the problem, as we estimate about 100-300 changes in every file.
The total user base is currently 2000 users, creeping up to probably 5000 over the next year, and might finally end up at some 30,000 users.
Any suggestions from the crowd about setting up a meaningful test environment? How about calculating the estimated throughput of our server farm? Does anyone know of projects that have tried something similar using an RCS or a configuration management system?"
"We want to support as many concurrent users as possible (bandwidth is not an issue). We use an Apache front end as a load balancer and as many Subversion servers as necessary on the backend.
My largest worry, from my calculations, is disk access on the Subversion server. We could not run meaningful tests, because a typical PC kills itself if you try to run more than 4 or 5 parallel Subversion clients doing an upgrade (due to insanely high disk IO and high seek times)."
rsync (Score:4, Insightful)
Re:rsync (Score:2)
On the customer site, run a script that applies a visual file merge to any config files that have changed in both places. The customer will have a good chance of recognizing changes they've made, and if there are clashes they will tend to call you on the phone and ask what to do next.
--dave
Agreed, rsync rocks (Score:2, Informative)
My general routine: I have a "development server", and a staging farm (set up exactly like one of the customer's locations, right down to the network hardware). After changes are made and unit-tested, the changes are pushed to the staging servers using rsync. When all the various remaining tests pass, the software is pushed out to a customer's location
Re:rsync (Score:2)
I tend to agree with the parent. You might want to do version control on your software releases with Subversion, but ultimately you should check out the new stable copy you want everyone upgraded to and then distribute it via other means, like rsync. rsync is a particularly good choice because it will only send the minimum amount of data necessary to get the job done efficiently.
Re:rsync (Score:4, Informative)
Subversion gives excellent control (tags, anyone?) of binary installations. We use it for things way beyond the usual source code storage.
I have also found disk IO is the main killer. I would suggest looking into caching. The Subversion client sends straightforward HTTP commands to the server. I have a custom PostgreSQL backend which does some caching; in his place, I would have a Squid set up to cache some basic data fetches. Obviously, you need to be careful not to cache stale data, but that's not hard.
So yes, Subversion is excellent for this, and with a little thought, the heavy disk IO can be reduced. Cache, cache, cache.
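A hypothetical sketch of that Squid setup, using the 2.5-era "accelerator" directive names; the host names and sizes are made up, and with per-user access restrictions you'd have to make sure authenticated responses are never served to the wrong client:

```
# Hypothetical squid.conf fragment: Squid as a reverse proxy in front of
# the Apache/mod_dav_svn backend.
http_port 80
httpd_accel_host svn-backend.example.com
httpd_accel_port 8080
httpd_accel_single_host on
cache_mem 512 MB                 # keep hot revision data in RAM
maximum_object_size 4096 KB      # don't try to cache the 2GB blobs whole
```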
Re:rsync (Score:1)
When all you have is a hammer.... (Score:2, Redundant)
You want to update large files over the 'net, files which have changes in the middle of the file.
Why use Subversion? Why not use rsync?
Transfer file to compare, then change file (Score:2)
Re:Transfer file to compare, then change file (Score:1)
Rsync? (Score:4, Informative)
Wouldn't Rsync be better for what you want? Why do you need to be able to choose different versions to fetch?
If the files contain parts that are constant along with parts that vary, then rsync will in many cases transfer only part of the file. With Subversion that won't apply for binary files, but rsync will still recognise partial matches even on those.
Re:Rsync? (Score:1)
Re:Rsync? (Score:1)
times two (Score:3, Informative)
Re:times two (Score:1)
double the actual space required to hold the files for a "working copy"
True. However, using an export (svn export), you can get a clean copy of the code without the working-copy metadata.
rsync is probably a better solution anyway. If you want to track what went into each release, maybe use a Subversion backend with a cron job that exports everything to an rsync server.
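That combination could look something like this hypothetical cron entry on the distribution server (repository URL, tag name, and paths are all made up):

```
# Hypothetical crontab entry: nightly, export the current release tag from
# Subversion into the directory that the rsync daemon serves.
0 3 * * *  svn export --force https://svn.example.com/repo/tags/current /srv/rsync/release
```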
Re:times two (Score:3, Insightful)
Re:times two (Score:2)
Re:times two (Score:2)
Even worse, we make (as a client configuration option) a third copy, to allow a local rollback that reverses changes without needing to access the upgrade server over the Internet.
angel'o'sphere
If this was in java... (Score:3, Insightful)
Apt? (Score:3, Funny)
Re:Apt? (Score:2)
It's not very convenient, though: apt doesn't do binary diffs as far as I know, so the 2GB file would have to be downloaded every time it changed... With 30,000 users that would be 60 terabytes per update.
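The arithmetic behind that figure, as a quick sketch:

```shell
# Back-of-the-envelope for the 2GB-per-user worst case.
file_gb=2
users=30000
total_gb=$(( file_gb * users ))
echo "$total_gb GB = $(( total_gb / 1000 )) TB per update"
```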
Not Subversion (Score:3, Insightful)
How about bsdiff/patch and some scripts? (Score:5, Interesting)
rsync is highly general-purpose; your servers will end up generating hashes for every n bytes of every file for every client, which is a lot more heavyweight than just serving patches you generate once. Subversion may be more efficient, since it should know something about the files it has checked out previously, but it's still going to end up dynamically generating diffs between whatever versions each client has and the latest; this likely gets worse if your clients aren't tracking HEAD.
Also note that a custom solution can likely get away with a single tag file detailing the latest patches; rsync and svn are going to be scanning their directory trees religiously. Both you and your users will probably appreciate a single GET to a small file on a webserver more than a load of CPU use and disk thrashing.
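A sketch of that client-side check, with the HTTP GET simulated by a local file; the file names, the numeric release scheme, and the patch naming are all made up:

```shell
# Sketch: update check against one small "latest release" tag file.
# In real use latest.txt would come from a single HTTP GET, e.g.:
#   wget -q -O latest.txt http://updates.example.com/latest.txt
cd "$(mktemp -d)"
echo 12 > installed.txt     # release currently on this client
echo 14 > latest.txt        # simulated server-side tag file

installed=$(cat installed.txt)
latest=$(cat latest.txt)

# Walk forward, applying each pre-generated patch in order.
v=$installed
while [ "$v" -lt "$latest" ]; do
    v=$((v + 1))
    echo "would fetch and apply patch-$v (e.g. with bspatch)"
done
```

The server does no per-client work at all here; the patches are generated once per release, and the only hot file is the tiny tag file.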
Re:How about bsdiff/patch and some scripts? (Score:2)
I think the right solution for the submitter is "talk to someone with experience in this area" -- ideally, me. I'm no longer looking for a job, but I'd still be happy to hear details about a problem and offer my opinion on how best to attack it.
CVS (Score:2, Insightful)
I have been using CVS to manage many different websites and/or projects on various servers. It doesn't store more than it needs (just the CVS folders), and it adds, updates, patches, and removes files according to your repository.
Additionally you can use branches and sticky tags to keep track of files that don't need to be updated, or files that vary from client to client.
It is also easy to trigger an update over ssh or cron.
One downside compared to SVN is the lack of a binary diff mechanism, but I
Re:CVS (Score:1)
Re:CVS (Score:2)
Second drawback is user management on the server.
Regarding binaries, CVS might not be able to merge binaries, and probably its default configuration does not even DIFF them, but: it can do binary diffs!
Also, we can't work without diffs. If everything else failed us, we would likely diff the big files manually and distribute them as a "new release" of a patch file.
angel'o'sphere
Re:CVS (Score:2)
This is important because you DON'T need to be storing 100 large revisions of your software release in the repo with no way to ever remove it.
Of course, CVS sucks when tagging a huge repo, and removing releases is a PITA, but you have no such options in SVN.
Disk Accesses (Score:2, Informative)
Put enough ram in your server, and the changed portion will likely fit in cache. If that's not an option, use RAID to speed up disk accesses.
Others have mentioned rsync. You might also consider xdelta.
Disk I/O (Score:2, Insightful)
Your problem is either that you don't have enough RAM in the system, or you have an OS that doesn't do a rational job of caching disk.
Or both.
-Peter
perhaps (Score:3, Informative)
cfengine (Score:1, Informative)
Some clarifications, especially about rsync (Score:3, Informative)
First, I'd like to clarify a bit; probably my original question was not clear enough!
The clients of the system are customers. They have Windows PCs, as the software runs on Windows. On the server side we need to be able to authenticate every client, as there are several region and user-level restrictions on who may access which file.
You can assume there are simply 5 to 10 user levels, where a user on level 10 may access everything and a user on level 5 only a subset.
So far SVN looks good:
* authentication via the Apache front end, probably via a LDAP server
* structuring the "download area" into directories with user level appropriated content
Regarding rsync:
* first of all, I did not know about it
* my first investigation indicates several drawbacks
It seems not to run on Windows (without Cygwin), users need to be unix/linux users on the server, and building a distribution seems "more complicated" than making a tag/version with SVN.
Please consider: from the point of view of the service provider, the system is much the same as hosting a huge pile of source code. The starting distribution probably has 3000 files and is about 2.5 GB.
The users need the ability to fall back to an earlier revision in case of errors during distribution.
Users need to be able to upgrade to the latest HEAD (there is only one main trunk anyway).
Regarding performance of SVN: yes, we are clear we need to put a lot of RAM into the servers. But we can't get rid of the disk IO, it seems, as SVN does not cache requests (in this case all clients always want the same release to upgrade to, and most of the time they have either the previous or the second-oldest release installed).
However: alternatives to SVN are very welcome! I only wanted to make clear why we considered SVN in the first place.
angel'o'sphere
Re:Some clarifications, especially about rsync (Score:3, Interesting)
o DON'T use SVN (imo)
o check out your latest rev to a staging 'folder'
o rename your previous release 'folder' to backup name
o rsync the data from your staging 'folder' to all your clients one by one.
If you have issues with the release, just roll back to the previous release 'folder'.
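The folder dance above can be sketched with plain coreutils; the release names and directory layout are hypothetical:

```shell
# Sketch: releases live side by side; "current" is just a symlink.
cd "$(mktemp -d)"
mkdir -p releases/r41 releases/r42
ln -sfn releases/r41 current      # previous release is live

# Deploy r42 by repointing the symlink...
ln -sfn releases/r42 current

# ...and rolling back is the same one-liner in reverse.
ln -sfn releases/r41 current
```

Because the symlink flip is a single rename, clients (or the rsync daemon) never see a half-deployed tree.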
The other thought is to use rsync a
All this should let you get by with a 1GB or less ram ma
Re:Some clarifications, especially about rsync (Score:2)
If anything, the clients can rsync from me, and as rsync does not run natively on Windows, we can't rely on rsync, imho.
Strange, did I use the wrong term? Does no one of you have a program with an automated check-for-updates-from-vendor option?
That's what we want to do. A client, over the Internet, not via LAN, has to be able to use HTTP!!! and needs to be authenticated, and it's pull, not push, distribution.
BitTorrent is completely out of the question, as we have several different access
rsync on Windows (Score:2)
All one needs to run a Cygwin binary in general is the cygwin1.dll library. rsync in particular requires cygpopt-0.dll from the libpopt0 package. It can be daemonized with srvany.exe and instsrv.exe from the Windows 2003 Resource Kit [microsoft.com]. You might have to adjust the timestamp window to account for client time zones or the two-second resolution of FAT32, but it doesn't require exceptional wi
Re:Some clarifications, especially about rsync (Score:4, Insightful)
Subversion doesn't need to cache requests -- the OS* does this itself. With plenty of RAM, whatever isn't being used by processes is used for cache. If you don't trust the disk caching algorithm, just make a 2.5G ramdisk and copy your files over to that when you want to release them. Then the disk won't be a problem.
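On Linux that ramdisk could be a tmpfs mount; this fragment needs root, the contents vanish on reboot, and the paths are hypothetical:

```
# Hypothetical ops fragment: serve release files from a RAM-backed
# filesystem so reads never hit the disk.
mount -t tmpfs -o size=3g tmpfs /srv/ramdisk
cp -a /srv/release/. /srv/ramdisk/
```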
* Assuming you're using a Real OS, and not Windows. Don't use Windows for anything that requires speed or reliability.
Re:Some clarifications, especially about rsync (Score:3, Informative)
Consider CFEngine (Score:2)
Also, if you're the sort who can/does go to conferences, the LISA '05 [usenix.org] conference (Dec. 4-9, 2005) features several sessions on cfengine by Mark Burgess. (LISA is the "Large Installation System Administration Conference", put on by USENIX [usenix.org] and SAGE [sage.org].) There's also a conference BLOG [lisaconference.org], and this is the link to the tech program info [usenix.org].