The Internet

Building a "Distributed" FTP Server? 13

austad asks: "At my company, we run a fairly large Web site. It's distributed across multiple servers in three geographic locations. In each of these locations, we have several Real Video servers which all serve the same content for redundancy and load balancing. We have a central FTP server in one geographic location that files get uploaded to, and then replicated out to the Real Video servers. The problem with this model is that there is a single point of failure. We would like to put an identical FTP server in each location; when a producer wants to upload a file, they are randomly directed to an active FTP server (we use a distributed DNS system that will direct users to machines that are marked as "up") and upload the file there." (Continued in body...)

"The problem is keeping the other two FTP servers current. How does each FTP server know who has the most current file tree? What if multiple producers are uploading simultaneously, and each has been directed to a different FTP server? Keep in mind that when replicating, we need to delete files on the Realservers that are no longer in the file tree on the FTP server. "


Comments Filter:
  • by Anonymous Coward
    I have a somewhat similar problem. I have two FTP servers that need to be kept in sync, and people can upload to either of the two servers. I don't control one of the two servers so I don't think I could run rsync on it. I wrote a quick and dirty perl script [zevils.com] that seems to work alright. Perhaps you can use that as a basis for your own sync-control program.
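
    For what it's worth, the general idea (minus delete handling) can be sketched in a few lines. This is not the linked Perl script, just an illustration; the hostname and path are made up, it assumes both machines speak ssh and have rsync installed, and it runs on one of the two servers:

      #!/usr/bin/env python3
      # Crude two-way sync of one FTP tree between this host and a peer.
      # -a preserves permissions/times; -u never overwrites a newer file on
      # the receiving side, so whichever server took the upload "wins".
      # Deletes are NOT propagated -- that is the hard part of the question.
      import subprocess

      PEER = "ftp2.example.com"   # hypothetical peer server
      TREE = "/home/ftp/pub/"     # assumed location of the shared tree

      # Pull anything newer from the peer, then push anything newer to it.
      subprocess.run(["rsync", "-au", "-e", "ssh", f"{PEER}:{TREE}", TREE], check=True)
      subprocess.run(["rsync", "-au", "-e", "ssh", TREE, f"{PEER}:{TREE}"], check=True)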

    --
    matthewg {matthewg@zevils.com} (Matthew Sachs) [zevils.com], not at home

  • by davew ( 820 ) on Wednesday February 09, 2000 @04:07AM (#1294855) Journal

    The other comments about rsync et al. are spot on for replication (rsync is great), but they don't address the problem of authority: if a file exists on a particular server, is it new (so replicate it) or old (so delete it)?

    I think you need an upload procedure to get around that. Try this:

    • Restrict uploads to a particular "upload" directory, on every server.
    • Wait for a file to be uploaded to this dir.
    • Use a separate rsync to copy this file to the appropriate place on the nominal master server.
    • Use your regular rsync to synchronise the mirrors with the master.

    There are a couple of issues with this, but you can get around them with a little added complexity in your uploading-to-master algorithm. If the master server goes down then it's true, you can't update; but the master doesn't need to be static, it just needs to be consistent. If uploads are that critical, you can use another protocol - say DNS? - to designate an arbitrary server as master.
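
    A rough sketch of that procedure, run from cron on each front-end server. Everything concrete here (hostnames, paths, the single "master" designation) is an assumption for illustration, not part of the original suggestion:

      #!/usr/bin/env python3
      # Sketch of the upload-then-promote procedure described above.
      import os
      import subprocess

      UPLOAD_DIR  = "/home/ftp/incoming/"    # the restricted "upload" directory
      MASTER      = "ftp-master.example.com" # whichever server is currently authoritative
      MASTER_TREE = "/home/ftp/pub/"         # the canonical tree on the master
      LOCAL_TREE  = "/home/ftp/pub/"         # this server's published tree

      def promote_uploads():
          """Push newly uploaded files to the master's canonical tree."""
          if not os.listdir(UPLOAD_DIR):
              return
          # --remove-source-files empties the upload dir once the master has
          # each file, so nothing is promoted twice.
          subprocess.run(["rsync", "-a", "--remove-source-files", "-e", "ssh",
                          UPLOAD_DIR, f"{MASTER}:{MASTER_TREE}"], check=True)

      def mirror_from_master():
          """The regular mirror pass; --delete is what makes the master authoritative."""
          subprocess.run(["rsync", "-a", "--delete", "-e", "ssh",
                          f"{MASTER}:{MASTER_TREE}", LOCAL_TREE], check=True)

      if __name__ == "__main__":
          promote_uploads()
          mirror_from_master()

    Because the upload directory sits outside the published tree, the --delete pass can never remove a file that is still waiting to be promoted.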

    Dave

    --

  • I love rsync. I have a client who cannot handle rotating backup tapes (I know... I know), so I took their tape drive from them; I rsync their fileserver to mine and back up from my local machine once a day. And you don't HAVE to have an rsync server running on each end, just the executable: it can launch itself over rsh (ewww) or ssh (woohoo!). Depending on the size of the FTP server, you will have some lag before all the sites update.
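
    The no-daemon case really is just one command; something like the following, run nightly from cron, is the whole backup (hostnames and paths invented):

      import subprocess

      # Pull the client's file server over ssh into a local mirror.
      # -a preserves ownership/permissions/times, -z compresses on the wire,
      # --delete keeps the local copy an exact mirror of the remote tree.
      subprocess.run(["rsync", "-az", "--delete", "-e", "ssh",
                      "fileserver.client.example.com:/export/home/",
                      "/backup/client/home/"], check=True)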
  • by Zaffle ( 13798 ) on Tuesday February 08, 2000 @11:51AM (#1294857) Homepage Journal
    Sounds like you need rsync [samba.org]. From what little I know of it, it basically maintains a mirror of directories. I think it's normally used one way (as in, mirroring from a central server), but I can't see why you couldn't use it both ways. Run rsync in a cron job, say every 10 minutes, and that should be fine. I would definitely take a close look at rsync if I were you.

    Taking a very quick look at the documentation myself, I see that you'd probably have an rsync server running on each site, and then have a cron job run on each site that mirrors every other site. If all three sites do this, it should mirror pretty well. The lag time will probably be something like 2T, where T is the time between cron runs.
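
    Assuming each site exports its FTP tree through an rsync daemon as a module called "ftp" (the module name and hostnames below are made up), the per-site cron job could look roughly like this:

      #!/usr/bin/env python3
      # Mirror pass run every 10 minutes from cron on each site, e.g.
      #   */10 * * * * /usr/local/bin/mirror_peers.py
      import subprocess

      PEERS = ["site2.example.com", "site3.example.com"]  # the other two sites
      LOCAL_TREE = "/home/ftp/pub/"

      for peer in PEERS:
          # -a archive mode; -u so a peer's older copy never overwrites a file
          # that was just uploaded locally. Note there is no --delete here:
          # removals still don't propagate, which is the open problem from
          # the original question.
          subprocess.run(["rsync", "-au", f"rsync://{peer}/ftp/", LOCAL_TREE],
                         check=True)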

    In regard to your specific what-if questions, I think the best way to answer them will be to try it out yourself. :) Hope that helps.

    ---

  • How about some Cisco DistributedDirector love along with Veritas clustering/mirroring solutions. If you're running Solaris, you can't go wrong with this combination. If you're running Linux or something else, then s/Veritas/CODA/ or something.
  • Red Hat provides Piranha, which can load-balance FTP sessions... what he is looking for is really Coda with Piranha, or commercial AFS with Piranha if you have 15K to blow.
  • There you can build identical "points of failure", so that if one falls out, the other one takes over. Or something like that. Good luck!
    _
    / /pyder.....
    \_\ sig under construction
  • Your own FTP server would do nicely. Log all incoming files in a special place and then set up a cron job that mirrors these files to the other servers (you'd have to use a special user whose transfers were not logged in the same way so you wouldn't be mirroring hundreds of times). Similarly, a delete request would pass to the other servers.

    There is quite likely an FTP server available that is flexible enough with its logging to do this. The capability would not have to be in the FTP server; it could be a script that searched the server's log files. However, implementing it on the server side allows you to ensure that the mirroring is accurate and keep any parsing scripts from worrying about parsing date/times (unless you have a server that logs in Unix ticks; in that case you would just store the tick when the script last mirrored it, and only be concerned about the transfers after that date).
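
    A sketch of that log-driven pass, assuming (purely for illustration) a server that can be told to log one line per completed upload as "<unix-time> <absolute-path>", in the spirit of the "Unix ticks" idea above, and that the mirror user's own transfers land in a different log so they are never re-mirrored:

      #!/usr/bin/env python3
      # Push files uploaded since the last run out to the mirror hosts.
      import os
      import subprocess

      UPLOAD_LOG = "/var/log/ftp-uploads.log"     # hypothetical upload log
      STATE_FILE = "/var/lib/ftpmirror/last-tick" # tick of the last mirrored upload
      MIRRORS = ["rv1.example.com", "rv2.example.com"]

      def last_tick():
          try:
              return float(open(STATE_FILE).read().strip())
          except (OSError, ValueError):
              return 0.0

      since = last_tick()
      newest = since
      pending = []
      for line in open(UPLOAD_LOG):
          if not line.strip():
              continue
          tick_str, path = line.split(None, 1)
          tick, path = float(tick_str), path.strip()
          if tick > since and os.path.exists(path):
              pending.append(path)
              newest = max(newest, tick)

      for mirror in MIRRORS:
          for path in pending:
              # Copy each new file to the same place on every mirror; -R
              # (--relative) recreates the directory part of the path remotely.
              subprocess.run(["rsync", "-aR", "-e", "ssh", path, f"{mirror}:/"],
                             check=True)

      os.makedirs(os.path.dirname(STATE_FILE), exist_ok=True)
      open(STATE_FILE, "w").write(str(newest))

    Deletions could be propagated the same way from a log of delete requests.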

    I would suggest running an rsync every so often just to make sure.

    The key here would be to ensure that everything you are doing is accurate; this is a "high-profile" environment. You might want to consider something other than FTP, e.g., HTTP POSTs. (Yes, there are problems with using this method to upload large files, such as no progress indication. However, considering most users will be on a fast network, this should not be too much of a problem, and a Java(Script) applet that broke the upload into manageable chunks and displayed the progress to the user might cut it.) An HTTP POST would let you keep track of other information along with the file, such as specific user comments.
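
    A minimal sketch of the receiving side of that idea, using nothing but the Python standard library. The X-Filename/X-Comment headers, the port, and the storage path are all invented for illustration; a real deployment would add authentication, size limits, and the chunking/progress handling mentioned above:

      #!/usr/bin/env python3
      # Tiny HTTP upload receiver: stores the POSTed body plus a metadata file
      # recording who sent it and any comment they attached (the kind of extra
      # information a bare FTP upload can't carry).
      import os
      from http.server import BaseHTTPRequestHandler, HTTPServer

      STORE = "/home/ftp/incoming"

      class UploadHandler(BaseHTTPRequestHandler):
          def do_POST(self):
              length = int(self.headers.get("Content-Length", 0))
              name = os.path.basename(self.headers.get("X-Filename", "upload.bin"))
              comment = self.headers.get("X-Comment", "")
              data = self.rfile.read(length)
              os.makedirs(STORE, exist_ok=True)
              with open(os.path.join(STORE, name), "wb") as f:
                  f.write(data)
              with open(os.path.join(STORE, name + ".meta"), "w") as f:
                  f.write(f"from={self.client_address[0]} comment={comment}\n")
              self.send_response(201)
              self.end_headers()

      if __name__ == "__main__":
          HTTPServer(("", 8080), UploadHandler).serve_forever()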

    Kenneth
