
Subversion as Automatic Software Upgrade Service?

Posted by Cliff
from the thinking-out-of-the-box dept.
angel'o'sphere asks: "I'm working on a contract where the customer wants an automated, Internet-based check-for-updates, update and install system. So far we've considered a Subversion-based solution. The numbers are: a typical upgrade is about 10MB in size. Usually it's about 30 to 50 new files (which have an average size of about 200kB) and 2 database files (which can be anywhere from 500MB to 2GB) that change regularly. Upgrades are released about every 3 months, and this will probably become more frequent as the system matures. The big files are the problem, as we estimate about 100-300 changes in every file. The total user base is currently 2000 users, creeping up to probably 5000 over the next year, and might finally end up at some 30,000 users. Any suggestions from the crowd about setting up a meaningful test environment? How about calculating the estimated throughput of our server farm? Does anyone know of projects that have tried something similar using an RCS or a configuration management system?"
"We want to support as many concurrent users as possible (bandwidth is not an issue). We use an Apache front end as a load balancer and as many Subversion servers as necessary on the backend. My largest worry, from my calculations, is disk access on the Subversion server. We could not run meaningful tests, because a typical PC kills itself if you try to run more than 4 or 5 parallel Subversion clients doing an upgrade (due to insanely high disk IO and high seek times)."
This discussion has been archived. No new comments can be posted.

  • rsync (Score:4, Insightful)

    by ¡ (97947) on Friday September 16, 2005 @05:31PM (#13580373)
    Why not use rsync instead of Subversion? Subversion wasn't really designed for this, whereas rsync is used for mirroring and syncing large repositories all over the place all the time.
    • by davecb (6526) *
      Actually use both: distribute binaries as binaries, configuration and xml files as subversion files, both via rsync.

      On the customer site, run a script that applies a visual file merge to any config files that have changed in both places. The customer will have a good chance of recognizing changes they've made, and if there are clashes will tend to call you on the phone and ask what to do next.

      --dave

    • Agreed, rsync rocks (Score:2, Informative)

      by Anonymous Coward
      I have several apps like this. One is deployed to more than a dozen locations around the country, each having roughly 5000 users. It's a mod_perl app on BSD.

      My general routine: I have a "development server", and a staging farm (set up exactly like one of the customer's locations, right down to the network hardware). After changes are made and unit-tested, the changes are pushed to the staging servers using rsync. When all the various remaining tests pass, the software is pushed out to a customer's location

    • I tend to agree with the parent. You might want to do version control on your software releases with Subversion, but ultimately you should check out the new stable copy you want everyone upgraded to and then distribute it via other means, like rsync. rsync is a particularly good choice because it will only send the minimum amount of data necessary to get the job done efficiently.
      • Re:rsync (Score:4, Informative)

        by commanderfoxtrot (115784) on Friday September 16, 2005 @08:10PM (#13581548) Homepage
        Subversion uses binary diffs in a similar way to rsync. The original poster pointed out bandwidth was not an issue- therefore any bandwidth advantages rsync gives (and yes, there are plenty) are meaningless.

        Subversion gives excellent control (tags anyone?) of binary installations. We use it at for things way beyond the usual source code storage.

        I have also found disk IO is the main killer. I would suggest looking into caching. The Subversion client sends straightforward HTTP commands to the server. I have a custom PostgreSQL backend which does some caching; in his place, I would have a Squid set up to cache some basic data fetches. Obviously, you need to be careful not to cache old data, but that's not hard.

        So yes, Subversion is excellent for this, and with a little thought, the heavy disk IO can be reduced. Cache, cache, cache.
        • If it's economically feasible in this case, I would suggest a better disk subsystem. The more spindles, the better. Something fibre channel, if possible. A memory size large enough to get to a supercached state will certainly help, but disks are cheap in quantity and using more of them in a RAID configuration is an orthodox solution to high service times.
  • This sounds to me a bit like "All I Have Is A Hammer, So Everything Is A Nail".

    You want to update large files over the 'net, files which have changes in the middle of the file.

    Why use Subversion? Why not use rsync?
  • Sounds like twice the work for thrice the price.
  • Rsync? (Score:4, Informative)

    by Karora (214807) on Friday September 16, 2005 @05:45PM (#13580519) Homepage

    Wouldn't Rsync be better for what you want? Why do you need to be able to choose different versions to fetch?

    If the files contain parts that are constant along with parts that vary, then rsync will in many cases only transfer part of the file. With Subversion that won't apply for binary files, but rsync will still recognise partial matches even on those.
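    For illustration only, here is a minimal Python sketch of the block-matching idea behind that kind of partial transfer. It is not rsync's actual implementation: the block size, the function names and the choice of Adler-32 plus SHA-256 are all assumptions made for the example, and the weak checksum is recomputed at every offset rather than rolled forward as a real tool would do.

```python
import hashlib
import zlib

BLOCK = 4  # tiny block size so the example is easy to follow; real tools use much larger blocks


def signatures(old: bytes, block: int = BLOCK) -> dict:
    """Index every full block of the old file by weak checksum, then by strong hash."""
    table = {}
    for index in range(len(old) // block):
        chunk = old[index * block:(index + 1) * block]
        strong = hashlib.sha256(chunk).digest()
        table.setdefault(zlib.adler32(chunk), {}).setdefault(strong, index)
    return table


def delta(new: bytes, table: dict, block: int = BLOCK) -> list:
    """Describe the new file as ('copy', block_index) and ('data', literal_bytes) steps."""
    ops, literal, pos = [], bytearray(), 0
    while pos < len(new):
        chunk = new[pos:pos + block]
        # Cheap weak check first; the strong hash is only computed on a weak hit.
        candidates = table.get(zlib.adler32(chunk)) if len(chunk) == block else None
        index = candidates.get(hashlib.sha256(chunk).digest()) if candidates else None
        if index is None:
            literal.append(new[pos])  # no known block starts here: send this byte literally
            pos += 1
        else:
            if literal:
                ops.append(("data", bytes(literal)))
                literal = bytearray()
            ops.append(("copy", index))  # the receiver already holds this block
            pos += block
    if literal:
        ops.append(("data", bytes(literal)))
    return ops


def patch(old: bytes, ops: list, block: int = BLOCK) -> bytes:
    """Rebuild the new file from the old file plus the delta steps."""
    out = bytearray()
    for kind, value in ops:
        out += old[value * block:(value + 1) * block] if kind == "copy" else value
    return bytes(out)
```

    With an old file of b"aaaabbbbccccdddd" and a new file that inserts two bytes after the first block, only those two bytes travel as literal data; everything else is expressed as references to blocks the receiver already has.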

  • times two (Score:3, Informative)

    by Lord Bitman (95493) on Friday September 16, 2005 @05:48PM (#13580551) Homepage
    remember that svn always uses more than double the actual space required to hold the files for a "working copy". For "one-way" updates, svn is _NOT_ the answer.
    • double the actual space required to hold the files for a "working copy"

      True. However, using an export (svn export), you can just get a non-working copy of the code.

      rsync is probably a better solution anyway. If you want to track what went into each release, maybe a Subversion backend, with a cronjob to update everything to an rsync server.

      • Re:times two (Score:3, Insightful)

        by saurik (37804)
        By non-working it should be noted that you also mean non-upgradable. Once you do an export, you can't do an update, which makes that feature useless for this purpose.
    • They're also looking at using compression in upcoming versions for the local "hidden" originals.
    • We know that and we accept that.

      Even worse, we make (as a client configuration option) a third copy to allow a local rollback that reverses changes without needing to access the upgrade server via the Internet.
      angel'o'sphere
  • by hexghost (444585) on Friday September 16, 2005 @06:03PM (#13580674) Homepage
    You would use Java Web Start. Maybe you should consider writing something like it for this project?
  • Apt? (Score:3, Funny)

    by cortana (588495) <sam@[ ]ots.org.uk ['rob' in gap]> on Friday September 16, 2005 @06:07PM (#13580717) Homepage
    Can the clients run dpkg and apt? A daily apt-get update && apt-get upgrade is very convenient. Server-side, you don't need anything more complicated than a web server.
    • Why is this funny?

      It's not very convenient though: apt doesn't do binary diffs as far as I know, so the 2GB file would have to be downloaded every time it's changed... With 30000 users that would be 60 terabytes per update.
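      The arithmetic behind that figure, next to the roughly 10MB typical upgrade quoted in the question, can be checked in a few lines of Python. Decimal units (1 TB = 1000 GB) and the variable names are assumptions for the sake of the example.

```python
USERS = 30_000      # eventual user base from the question
FULL_FILE_GB = 2    # upper bound for one database file
PATCH_MB = 10       # typical upgrade size quoted in the question

# Everyone re-downloads the whole 2GB file.
full_tb = USERS * FULL_FILE_GB / 1000

# Everyone downloads only a typical 10MB upgrade instead.
patch_tb = USERS * PATCH_MB / 1_000_000

print(f"full download: {full_tb:.0f} TB, delta download: {patch_tb:.1f} TB")
```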

  • Not Subversion (Score:3, Insightful)

    by the eric conspiracy (20178) on Friday September 16, 2005 @06:11PM (#13580745)
    rsync is excellent at this, and rdist can have benefits too if you are updating a bunch of servers at once.

  • by Fweeky (41046) on Friday September 16, 2005 @06:14PM (#13580773) Homepage
    This is the technique used by portsnap [daemonology.net]; basically you generate binary diffs from a known starting point, and the client keeps track of what new patches it needs to keep in sync. Since you're just serving static files, scaling it should be as easy and cheap as it gets.

    rsync is highly general purpose; your servers will end up generating hashes for every n bytes of every file for every client, which is a lot more heavyweight than just serving patches you generate once. Subversion may be more efficient since it should know something about the files it's checked out previously, but it's still going to end up dynamically generating diffs between whatever versions each client has and the latest; this likely gets worse if your clients aren't tracking HEAD.

    Also note that a custom solution can likely get away with a single tag file detailing the latest patches; rsync and svn are going to be scanning their directory trees religiously. Both you and your users will probably appreciate a single GET to a small file on a webserver more than a load of CPU use and disk thrashing.
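    As a rough sketch of how a client might use such a tag file, the Python below works out which pre-generated patches it still has to download. The JSON layout, the patch file names and the function name are all hypothetical; this is not portsnap's actual format, only one way the idea could look.

```python
import json

# Hypothetical contents of the small tag file a client would fetch with a single GET.
MANIFEST = json.loads("""
{
  "latest": 4,
  "patches": [
    {"from": 1, "to": 2, "file": "patch-1-2.bin"},
    {"from": 2, "to": 3, "file": "patch-2-3.bin"},
    {"from": 3, "to": 4, "file": "patch-3-4.bin"}
  ]
}
""")


def patches_needed(installed: int, manifest: dict) -> list:
    """Return the patch files, in order, that take `installed` up to the latest version."""
    by_source = {p["from"]: p for p in manifest["patches"]}
    chain, version = [], installed
    while version != manifest["latest"]:
        step = by_source.get(version)
        if step is None:
            # Unknown starting point: the client would need a full download instead.
            raise LookupError(f"no patch leads on from version {version}")
        chain.append(step["file"])
        version = step["to"]
    return chain
```

    A client on version 2 would get back the two patches leading to version 4, and a client already on version 4 would get an empty list, so the server only ever serves static files.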
    • Yes, this might be the best approach; but it's hard to say without knowing more details.

      I think the right solution for the submitter is "talk to someone with experience in this area" -- ideally, me. I'm no longer looking for a job, but I'd still be happy to hear details about a problem and offer my opinion on how best to attack it.
  • CVS (Score:2, Insightful)

    by alexpach (807980)

    I have been using CVS to manage many different websites and/or projects on various servers. It doesn't store more than it needs (just the CVS folders) and it adds, updates, patches and removes the files according to your repository.

    Additionally you can use branches and sticky tags to keep track of files that don't need to be updated, or files that vary from client to client.

    It is also easy to trigger an update over ssh or cron.

    One downside compared to SVN is the lack of a binary diff mechanism, but I

    • CVS updates are not atomic, unlike subversion. If integrity of data is important to your customers, don't consider CVS. As far as using Subversion is concerned, I would be wary of giving customers that type of access to my systems.
    • In our eyes, CVS lacks easy access via HTTP, and with it an easy way through firewalls at the client site.

      The second drawback is user management on the server.

      Regarding binaries, CVS might not be able to merge binaries, and its default configuration probably does not even diff them, but it can do binary diffs!

      Also, we can't work without diffs. If everything else failed us, we would likely diff the big files manually and distribute the result as a "new release" patch file.

      angel'o'sphere
    • CVS is better than SVN here because SVN lacks the 'obliterate', or 'admin -o' ability that Perforce and CVS have.

      This is important because you DON'T need to be storing 100 large revisions of your software release in the repo with no way to ever remove it.

      Of course CVS sucks when tagging a huge repo, and removing releases is a PITA, but you got no such options in SVN.
  • Disk Accesses (Score:2, Informative)

    by Anonymous Coward
    My largest worry, from my calculations, is disk access on the Subversion server.

    Put enough RAM in your server, and the changed portion will likely fit in cache. If that's not an option, use RAID to speed up disk accesses.

    Others have mentioned rsync. You might also consider xdelta.
  • Disk I/O (Score:2, Insightful)

    by pete-classic (75983)
    Let's see. You have a ceiling of 2.01GB worth of updates. You have disk I/O problems.

    Your problem is either that you don't have enough RAM in the system, or you have an OS that doesn't do a rational job of caching disk.

    Or both.

    -Peter
  • perhaps (Score:3, Informative)

    by /dev/trash (182850) on Friday September 16, 2005 @08:47PM (#13581710) Homepage Journal
    rdiff-backup
  • cfengine (Score:1, Informative)

    by Anonymous Coward
    First of all, it's obvious you are not using enough RAM on the servers. Get 8 GB. Don't do the balancing with Apache. If you are using Linux, resort to IPVS instead. For the large database files you'll want to use rsync. After the transfer, though, most likely you'll still need to perform the actual update. That's where cfengine comes in. You set it up to run rsync every N hours, then perform operations (restarting programs, cleaning up, whatever) when there's new data. You can also use it to restart dead i
  • by angel'o'sphere (80593) on Saturday September 17, 2005 @09:43AM (#13584157) Homepage Journal
    First of all, thanks for so many replies!

    First I'd like to clarify a bit; my original question was probably not clear enough!

    The clients of the system are customers. They have Windows PCs, as the software runs on Windows. On the server side we need to be able to authenticate every client, as there are several region and user level restrictions on who may access which file.

    You can assume there are simply 5 to 10 user levels, where a user on level 10 may access everything and a user on level 5 only a subset.

    So far SVN looks good:

    * authentication via the Apache front end, probably via an LDAP server

    * structuring the "download area" into directories with content appropriate to each user level
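    To make the user-level idea concrete, here is a small Python sketch of a per-directory check. The directory names and the level assigned to each are invented for the example; in the setup described above, the real decision would be made by whatever authorization is configured on the Apache front end.

```python
# Hypothetical layout: each top-level directory of the download area
# requires a minimum user level.
REQUIRED_LEVEL = {"base": 1, "regional": 5, "full": 10}


def may_access(user_level: int, path: str) -> bool:
    """Allow access when the user's level meets the level required by the top directory."""
    top = path.strip("/").split("/", 1)[0]
    required = REQUIRED_LEVEL.get(top)
    # Directories that are not listed are denied by default.
    return required is not None and user_level >= required
```

    Under these assumed levels a level 10 user can read everything, while a level 5 user can read "base" and "regional" but not "full".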

    Regarding rsync:

    * first of all, I did not know about it :D

    * my first investigation indicates several drawbacks

    It seems not to run on Windows (without Cygwin), users need to be Unix/Linux users on the server, and building a distribution seems "more complicated" than making a tag/version with SVN.

    Please consider: from the point of view of the service provider, the system is just the same as hosting a huge pile of source code. The starting distribution probably has 3000 files and is about 2.5 GB in size.

    The users need to have the ability to fall back on an earlier revision in case of errors during distribution.

    Users need to be able to upgrade to the latest HEAD (there is only one main trunk anyway).

    Regarding performance of SVN, yes, we are clear that we need to put a lot of RAM into the servers. But it seems we can't get rid of the disk IO, as SVN does not cache requests (in this case all clients always want the same release to upgrade to, and most of the time they either have the previous or the second oldest release installed).

    However: alternatives to SVN are very welcome! I only wanted to make clear why we considered SVN in the first place.

    angel'o'sphere
    • Here's a combination of available strategies:

      o DON'T use SVN (imo)
      o check out your latest rev to a staging 'folder'
      o rename your previous release 'folder' to backup name
      o rsync the data from your staging 'folder' to all your clients one by one.

      If you have issues with the release, just roll back to the previous release 'folder'.
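      A minimal Python sketch of the folder rotation described above, assuming the staging, release and backup locations are plain local directories. The function names are made up for the example, and the rsync distribution step itself is deliberately left out.

```python
import shutil
from pathlib import Path


def publish(staging: Path, release: Path, backup: Path) -> None:
    """Keep the current release as a backup, then copy the staged files into place."""
    if backup.exists():
        shutil.rmtree(backup)      # only one previous release is kept in this sketch
    if release.exists():
        release.rename(backup)     # rename the previous release 'folder' to the backup name
    shutil.copytree(staging, release)


def roll_back(release: Path, backup: Path) -> None:
    """Discard the current release and restore the previous one."""
    if not backup.exists():
        raise FileNotFoundError("no previous release to roll back to")
    if release.exists():
        shutil.rmtree(release)
    backup.rename(release)
```

      Calling publish() for each new release and roll_back() when a release turns out to be bad gives exactly one level of undo; keeping more history would need numbered backup folders.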

      The other thought is to rsync a .torrent file and use something like bittornado to distribute from your 'staging' folder.

      All this should let you get by with a 1GB or less ram ma
      • I can't rsync to my clients.
        If at all, the clients can rsync from me, and as rsync does not run natively on Windows, we can't rely on rsync, imho.

        Strange, did I use the wrong term? Does none of you have a program with an automated "check for updates from the vendor" option?

        That's what we want to do. A client, over the Internet, not via LAN, has to be able to use HTTP!!! It needs to be authenticated, and it's pull and not push distribution.

        BitTorrent is completely out of the question as we have several different access
        • If at all, the clients can rsync from me, and as rsync does not run natively on Windows, we can't rely on rsync, imho.

          All one needs to run a Cygwin binary in general is the cygwin1.dll library. rsync in particular requires cygpopt-0.dll from the libpopt0 package. It can be daemonized with srvany.exe and instsrv.exe from the Windows 2003 Resource Kit [microsoft.com]. You might have to adjust the timestamp window to account for client time zones or the two-second resolution of FAT32, but it doesn't require exceptional wi

    • by jrockway (229604) * <jon-nospam@jrock.us> on Sunday September 18, 2005 @02:53PM (#13590488) Homepage Journal
      > Regarding performance of SVN, yes, we are clear that we need to put a lot of RAM into the servers. But it seems we can't get rid of the disk IO, as SVN does not cache requests (in this case all clients always want the same release to upgrade to, and most of the time they either have the previous or the second oldest release installed).

      Subversion doesn't need to cache requests -- the OS* does this itself. With plenty of RAM, whatever isn't being used by processes is used for cache. If you don't trust the disk caching algorithm, just make a 2.5G ramdisk and copy your files over to that when you want to release them. Then the disk won't be a problem.

      * Assuming you're using a Real OS, and not Windows. Don't use Windows for anything that requires speed or reliability.
    • You may be interested in the Unison project. More info can be found here: http://www.cis.upenn.edu/~bcpierce/unison/ [upenn.edu]
  • A previous poster mentioned cfengine [cfengine.org] briefly. If I understand cfengine correctly, it may be just what you're looking for.

    Also, if you're the sort who can/does go to conferences, the LISA '05 [usenix.org] conference (Dec. 4-9 2005) features several sessions on cfengine by Mark Burgess. (LISA is the "Large Installation System Administration Conference", put on by USENIX [usenix.org] and SAGE [sage.org]. There's also a conference blog [lisaconference.org], and this is the link to the tech program info [usenix.org].)
