The Internet

Software Distribution via Multicast?

RockyMountain asks: "When it took me over 24 hours to download the latest Mandrake ISOs, I got to wondering... why do we still put up with servers overloaded with zillions of simultaneous TCP connections, all sending copies of the same thing? Hasn't multicasting evolved to the point where there's a better way? A quick look at Freshmeat turned up no obvious candidates. Are there any protocols or programs for distributing software via multicast? Are there any evolving standards? Or are there fundamental problems with this approach that I am overlooking?" An interesting question. With my limited understanding of multicast, I would think that, at the very least, if you are a software distribution site you could have software distribution "channels", where each channel serves one piece of software. Multicast clients wanting a specific piece of software would connect to the right channel and wait until the next time it starts serving the software from the beginning (or, in the case of an interrupted connection, when the channel gets to the appropriate resume point). Might such a system be ideal for multicast? Can any of you come up with others?
This discussion has been archived. No new comments can be posted.

  • There are a few problems with multicast. First of all, you need the right hardware in line to handle multicast packets and get them all the way to the end user (routers, firewalls, etc.). Secondly, you need the right content to send. This is a big one. People want the content they want, when they want it. They don't feel like waiting, even if it would be more efficient.

    But a multicast model is worth a look. On September 11th, it was, for the most part, a multicast model (broadcast television) that got us our information. Most of the news web sites could not handle the unicast method. So it's a good idea, but like you said, not that much seems to be going on with it. I'd love to hear about some good, real-world implementations of multicast.

    KidA
    • Why wait? (Score:1, Interesting)

      by Will Dyson ( 40138 )
      For things like .iso files, where the user needs the whole file before they can use any of it, there is no need to start from the beginning of the file.

      The server sends the file over and over again, as long as there is at least one member of the multicast group.

      Clients join the multicast group and start recording wherever in the file they happen to find themselves. When the file stream ends, it simply starts from the beginning again and the client proceeds to capture the part that it missed the first time around before disconnecting.
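
      A minimal sketch of that wrap-around capture in Python, assuming fixed-size blocks and packets that carry an 8-byte block index in front of the data; the group address, port, sizes, and file name are made-up placeholders, not any real distribution service:

import socket
import struct

GROUP, PORT = "239.1.2.3", 5000            # hypothetical multicast group/port
BLOCK_SIZE, TOTAL_BLOCKS = 1024, 700000    # placeholder sizes (~700 MB image)

# Join the multicast group.
sock = socket.socket(socket.AF_INET, socket.SOCK_DGRAM, socket.IPPROTO_UDP)
sock.setsockopt(socket.SOL_SOCKET, socket.SO_REUSEADDR, 1)
sock.bind(("", PORT))
mreq = struct.pack("4sl", socket.inet_aton(GROUP), socket.INADDR_ANY)
sock.setsockopt(socket.IPPROTO_IP, socket.IP_ADD_MEMBERSHIP, mreq)

# Record blocks wherever in the loop we happen to join; stop once all are seen.
received = set()
with open("image.iso", "wb") as out:
    out.truncate(BLOCK_SIZE * TOTAL_BLOCKS)            # preallocate the full file
    while len(received) < TOTAL_BLOCKS:
        packet = sock.recv(BLOCK_SIZE + 8)
        (block_no,) = struct.unpack("!Q", packet[:8])  # 8-byte block index header
        if block_no not in received:
            out.seek(block_no * BLOCK_SIZE)
            out.write(packet[8:])
            received.add(block_no)
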
  • When you're distributing .iso files on an FTP or web server you do get a huge load for a few files, but the problem is that the downloads start at random points in time. For multicast to work, everyone must receive at the same time. I guess you could just start a download sequence in multicast every 5 minutes and be efficient, but it's just too much hassle for these once-or-twice-a-year events.

    What multicast is very good for is replicating installs. If you want to burn one image to every disk in a room full of computers you can easily start the download client on every computer and then start the multicast session.

    Overall, multicast could be useful if anyone actually wrote convenient software to serve and receive it, and for the geek crowd that downloads distribution .iso files it actually might work, but the normal internet public has enough issues with the current download-on-demand model to be bothered with multicast downloading.

    For video and audio broadcasts it's just ideal. With one simple cable/DSL connection *anyone* can run an Internet radio or TV station.

    What is the state of multicast capabilities throughout the net? Do ISPs use it? Do they let their clients use it?

    • Some ISPs support multicast, at least within their own networks, but it's not widespread on the Internet. For example, we've been trying to get a multicast link working on a Cable & Wireless T3 for over a year, and while they list it as a feature in their sales literature, it's not really available for delivery.

      Multicast is ideal for streams where you can join and leave at any time, and for that reason it's being used mostly for audio and video.

      The primary use I see for multicast in the near future might be Usenet news. It's clear that Usenet is going to continue to grow, and it's already reached a crazy size of about 175 gigs per day. Various multicast streams containing different Usenet hierarchies would allow many end sites to take in the newsgroups they wanted, in sections, without straining the backbones.

      A more realistic solution for the problem you're currently having with your huge download is probably edge caching. While it's not without problems, it would allow your ISP to avoid taxing the remote FTP server and the bandwidth after one user gets the file.

      Multicast and/or caching of any sort of content work well when you have the following:

      A large number of users
      At a small number of points of access
      Accessing a small number of data objects
      That are large in size as compared with available resources

      --
      Dane Jasper
      Sonic.net
  • Big Problem. (Score:2, Insightful)

    The biggest problem I see with a multicast method for ISO images isn't the varying start times, but the varying connection speeds of users.

    Multicast video works because only X number of bits are needed at any one time. That is, a 24k video stream only needs 24k of pipe to work; having a T1 downstream won't help you get the video faster. But having a T1 will help you download an ISO image faster so you can start the install process.

    Also, streaming video and such does not require a perfect stream... if a piece is missing it just ignores it and goes on its way. But an ISO image needs to be perfect. If not, you just made a nice coaster for your coffee cup.

    The only way I see it working is if everyone agreed to download at the speed of the slowest link. And I'm not going to agree to let my DSL line go to waste so I can download at the 33.6k of a dialup user who wants to wait 4 days for a download. Also, having to be perfect would require the server to resend any time a client reported a lost or corrupted packet. One needs only to be familiar with Norton Ghost and a lab with one bad NIC or HDD to see the crawl this results in until the bad box times out.

    So while nice in theory I doubt it would have much benefit outside of a controlled lab environment where everyone is on the same high-speed connection and there is very little loss of packets.
    • I thought about this a bit a while ago. There's a fairly simple solution to this: all you have to do is set up a large number of low-bandwidth channels broadcasting from different offsets. For example, you set up 64 16kbps streams, all broadcasting in a loop at different offsets. This way, everyone can use the full potential of their bandwidth. Joe Modem User connects to one or two streams, while Leet Cable Dude connects to about half of them, and finishes much sooner because he's downloading from many channels at once. This also helps Joe Modem User, because he just has to wait for one of the other streams to catch up with the one he was downloading from, and in the meantime he can download from another stream which has data he hasn't gotten yet.

      Alternatively, you could split the file into many chunks and have a separate broadcast stream for each chunk.

      It ought not to be too hard to set up something like this with a bit of creative programming.
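
      A rough sketch of the arithmetic behind that idea, assuming 64 looping 16kbps streams (all rates, counts, and names here are made up for illustration): each stream covers one chunk of the file, and a client simply joins as many streams as its downlink can absorb.

STREAM_RATE_KBPS = 16      # hypothetical per-stream rate
NUM_STREAMS = 64           # hypothetical number of looping streams

def chunk_ranges(file_size, num_streams=NUM_STREAMS):
    """Byte range of the file covered by each per-chunk stream."""
    chunk = -(-file_size // num_streams)   # ceiling division
    return [(i * chunk, min((i + 1) * chunk, file_size))
            for i in range(num_streams)]

def streams_to_join(downlink_kbps, rate=STREAM_RATE_KBPS, total=NUM_STREAMS):
    """Joe Modem joins a stream or two; Leet Cable Dude joins dozens."""
    return max(1, min(total, downlink_kbps // rate))

print(chunk_ranges(650 * 2**20, 4)[:2])   # first two chunks of a 650 MB image
print(streams_to_join(33))                # dial-up: 2 streams at a time
print(streams_to_join(1500))              # cable/DSL: all 64 streams at once
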
  • Look at BitTorrent.

    http://bitconjurer.org/BitTorrent/

    It's not multicast per se, but seeks to avoid the horrific inefficiencies you've noted.

    You could think of it as inspired by mojoNation, but it's a different architecture focusing on a different problem.

  • I'm the original poster of the AskSlashdot question.

    I'm no expert on network protocols. I'm not even a software guy. So some of what follows may seem very naive. Bear with me and see if this makes sense. Here's how I see it working.

    Data Rate. The server would send several streams at once on several channels, each one paced for a different data rate. For example, the T1 user would pick a different channel than the 28K modem user. Each channel endlessly repeats the same data set, over & over.

    Keeping Track. Each datagram sent would contain an offset value that shows where it fits into the big picture. Thus, the client knows which parts of the whole have been received, and which ones have not. As we shall see, this helps deal with start time synchronisation and dropped packet issues.

    Start Time. You don't even try to synchronise start times. If a client connects in the middle, so what. It just stores the second half of the data set, then stays on the line for the next repetition of the first half. The client knows when it has received the whole data set, because each datagram is tagged with an offset that shows where it fits into the big picture.

    Missed Packets. This is the hard part. If a client misses a packet because it is dropped en route, or for whatever reason, there are a few ways to deal with it.

    • The client could just wait for the next iteration of the data set, and listen for the datagrams that fill in the blanks.
    • The protocol could use a UDP backchannel which allows clients to request retransmissions of datagrams by offset. The server could keep track of which datagrams have been requested, and periodically retransmit those datagrams out of sequence. If there are too many, and forward progress is threatened, the server could keep a histogram of which packets have been requested most often, and resend only the most-requested ones -- let the others wait for the next iteration.
    • My favorite approach: the protocol could get most of the data across, and just not worry about the occasional gap. Once the client has a mostly-complete data set, it could use a connected point-to-point protocol to fill in the gaps. Rsync, for example, is very good at filling small gaps in otherwise complete data sets. (True, this is point-to-point, and partially defeats the purpose of using multicast, but since it's only used for relatively tiny parts of the data set, the connections should be short-lived and relatively few in number.)
    Does any of this make sense?
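
    A small sketch of the client-side bookkeeping described above, assuming fixed-size blocks so the offset tag can be treated as a block index (the function name and values are invented for illustration). Once the multicast pass is mostly complete, the remaining gaps could be requested over a unicast backchannel or patched with an rsync-style transfer.

def missing_ranges(received_blocks, total_blocks, block_size):
    """Turn the set of block indices seen so far into a list of
    (byte_offset, length) gaps to request over a point-to-point channel."""
    gaps, start = [], None
    for i in range(total_blocks + 1):
        if i < total_blocks and i not in received_blocks:
            if start is None:
                start = i                      # a new gap begins here
        elif start is not None:
            gaps.append((start * block_size, (i - start) * block_size))
            start = None                       # the gap ended at block i
    return gaps

# Example: a 10-block file where blocks 3 and 7-8 were dropped en route.
print(missing_ranges({0, 1, 2, 4, 5, 6, 9}, 10, 1024))
# -> [(3072, 1024), (7168, 2048)]
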
    • These issues have already been solved by the reliable multicast transport [ietf.org] group at the IETF.

      The cornerstone technology of any reliable multicast system is FEC (Forward Error Correction), an encoding technique that can repair lost or corrupt packets.

      We at Onion Networks [onionnetworks.com] have created a very solid FEC library that will form the foundation of our open source implementations of the reliable multicast protocols. The FEC library can be had at http://onionnetworks.com/components.html
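
      As a toy illustration of the FEC idea (this is not the Onion Networks library's API, just a single-parity example): XOR all the data packets in a group to make one extra parity packet, and any single lost packet in that group can be rebuilt from the survivors with no retransmission. Real codes such as Reed-Solomon extend this to repair several losses per group.

from functools import reduce

def xor_packets(packets):
    """Byte-wise XOR of equal-length packets."""
    return reduce(lambda a, b: bytes(x ^ y for x, y in zip(a, b)), packets)

def make_parity(group):
    """One parity packet protects a whole group against a single loss."""
    return xor_packets(group)

def recover_lost(survivors, parity):
    """Rebuild the single missing packet from the survivors plus parity."""
    return xor_packets(survivors + [parity])

group = [b"AAAA", b"BBBB", b"CCCC"]
parity = make_parity(group)
print(recover_lost([group[0], group[2]], parity))   # b'BBBB' -- the lost packet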

  • OpenCola has a commercial and open source solution for a similar problem. http://www.opencola.org:8080/
  • by X ( 1235 )
    This is definitely a job for Swarmcast [swarmcast.com]. It avoids most of the problems people have identified, although extensive firewalling can still undermine it.

    Alternatively, programs like Gnutella or eDonkey [edonkey2000.com] can be used.
    • Agreed, it's just a pity that more places don't use it. (I've only used it on their demo file, but it does work. The more people that joined, the faster it got.)

      Just FYI Swarmcast is developed by the OpenCola guys mentioned above. So it is in fact the same thing.

      The basic idea is to make it possible to share the file between the downloading clients. That is, client A begins a download from server S. This is done in the normal fashion. After a while, client B joins in. Server S then begins to transmit to B and also tells B about the existence of A, and vice versa. Now A sends parts of its download to B and B sends to A. Both still get data from S. This continues in the same fashion as more clients join the "mesh".

      The smart part is that the file is first coded using an FEC (forward error correction) algorithm. (Also used when communicating with satellites.) Basically you can think of it as RAID for packets. The packets are coded redundantly so you don't need all of the coded ones. (There are more /coded/ packets than original ones; it doesn't compress the file.) This means that none of the involved computers need to do smart things like track what has been sent/received. They just fire away; odds are that the receiver can use the packet in some way.

      The same algorithm (FEC) can be used as-is for multicasting. (And the site contains links to papers describing this.)
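
      A very rough sketch of the swarming exchange described above (made-up message shapes, not Swarmcast's actual wire protocol): because the coded packets are interchangeable, each peer only needs to advertise which blocks it holds and hand a neighbour a few that the neighbour is missing.

def blocks_to_send(mine, peer_has, limit=4):
    """Pick a few coded blocks the peer is missing; with FEC coding,
    any missing block is roughly as useful as any other."""
    return sorted(mine - peer_has)[:limit]

# Client A and client B trade while the server keeps feeding both.
a_has, b_has = {1, 2, 3, 5, 8}, {1, 2, 4, 9}
print(blocks_to_send(a_has, b_has))   # A sends B: [3, 5, 8]
print(blocks_to_send(b_has, a_has))   # B sends A: [4, 9]
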
  • I remembered this from the dead-tree edition, and luckily it's one of the articles that has full text available online.

    Check it out here [ddj.com]...

  • Because you're downloading Mandrake, this might not be useful for you. I'm posting this in case it might help some Windows admins out there. Intel makes a product called LANDesk Management Suite that does multicast software distribution. There are two things that I should explain at this point. 1) In the current version (6.4) of the product, multicast software distribution is an add-on that must be purchased separately. 2) It is misnamed, because it doesn't use multicast packets.

    The way it works is that the server will send a command to one node on each subnet telling them to fetch the software from a specified location. Then the rest of the nodes will be given a command telling them to fetch the software from the designated computer on their subnet. So it should be called a multi-tiered distribution instead of multicast distribution. It works well and is worth looking into if you have to do this sort of task all the time.
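
    A sketch of what that multi-tiered plan looks like (this is not LANDesk's actual interface, just the idea, with invented names): pick one node per subnet to pull from the central server, then point every other node at its subnet's designated peer.

from collections import defaultdict
import ipaddress

def plan_distribution(nodes, prefix=24, server="central-server"):
    """Group nodes by subnet; the first node in each subnet fetches from the
    server, and the rest fetch from that designated node."""
    by_subnet = defaultdict(list)
    for ip in nodes:
        net = ipaddress.ip_network(f"{ip}/{prefix}", strict=False)
        by_subnet[net].append(ip)
    plan = {}
    for members in by_subnet.values():
        head, rest = members[0], members[1:]
        plan[head] = server
        for ip in rest:
            plan[ip] = head
    return plan

print(plan_distribution(["10.0.1.5", "10.0.1.9", "10.0.2.7"]))
# {'10.0.1.5': 'central-server', '10.0.1.9': '10.0.1.5', '10.0.2.7': 'central-server'}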

    • Just after I posted my last message, I found out that Intel is releasing version 6.5, which will include the multicasting add-on for no additional fee.
  • Hi, I am Justin Chapweske, the inventor of OpenCola's Swarmcast [swarmcast.com]. I am now working on another software project to specifically address the needs of content distribution over multicast, and the Onion Networks FEC Library [onionnetworks.com] is the first step in building that solution. The FEC library will provide the foundation of our future open source multicast content distribution software, so keep an eye out at http://onionnetworks.com [onionnetworks.com] for more info.
    • Very cool stuff Justin!

      My question for you is: Will it work over the Internet as it is now, or do all the routers in between the source and the destinations have to be specially set up to handle the multicast traffic? I did a number of multicast experiments a couple years ago and found multicast to be unusable over the net because the routers dropped all the packets.

      --jeff
    • Thanks, Justin. I just played around with the FEC Library a little, and it is really cool. I wish I had the time (and skill) to make use of it myself. I also took a browse through the IETF documents (someone posted a link [ietf.org]), and learned a lot.

      Do you get much feedback from users of your library? I'm curious what other projects are going on, especially open source ones that I could follow. How much can you reveal about the content distribution project you are working on? For example, what platforms will it support, and when will it be on the market?

      How long do you think it will be until multicast becomes the mainstream delivery method for popular packages over the internet? That would obviously take generally accepted standards, widely-adopted packages, and I expect a lot of expensive Cisco upgrades! Do you foresee it any time soon?
