Building a "Distributed" FTP Server?
austad asks: "At my company, we run a fairly large Web site. It's distributed on multiple servers in three geographic locations. In each of these locations, we have several Real Video servers which all serve the same content for redundancy and load balancing. We have a central FTP server in one geographic location that files get uploaded to, and then replicated out to the Real Video servers. The problem with this model is that there is a single point of failure. We would like to put an identical FTP server in each location; when a producer wants to upload a file, they are randomly directed to an active FTP server (we use a distributed DNS system that directs users to machines that are marked as 'up'), and they upload the file."
"The problem is keeping the other two FTP servers current. How does each FTP server know who has the most current file tree? What if multiple producers are uploading simultaneously, and each has been directed to a different FTP server? Keep in mind that when replicating, we need to delete files on the Realservers that are no longer in the file tree on the FTP server."
You need some intelligence in the uploading (Score:3)
The other comments about rsync et al. are spot on for replicating (rsync is great), but they don't address the problem of authority: if a file exists on one server but not another, is it new (so replicate it) or old (so delete it)?
I think you need an upload procedure to get around that: designate one server as the master, have producers upload only to it, and replicate from the master out to the others.
There are a couple of issues with this, but you can get around them with a little added complexity in your uploading-to-master algorithm. If the master server goes down then it's true, you can't update; but the master doesn't need to be static, it just needs to be consistent. If uploads are that critical, you can use another mechanism (say, DNS?) to designate an arbitrary live server as master.
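One hedged way to make the master "consistent but not static": every node deterministically picks the same master from the set of live servers (which the poster's distributed DNS system already tracks). A minimal sketch, with made-up hostnames:

```python
# Sketch: each node computes the same master from the set of live servers.
# The live set would come from the distributed DNS / health checks the
# original poster describes; the hostnames below are purely illustrative.

def pick_master(live_servers):
    """Deterministically choose one master: same inputs give the same
    answer on every node, so no coordination protocol is needed."""
    if not live_servers:
        raise RuntimeError("no FTP server is up")
    # Lexicographically first live host; any fixed total order works.
    return min(live_servers)
```

If `ftp1` goes down, every node independently agrees that the master is now the next server in the ordering, without any election traffic.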
Dave
--
Well, it sounds to me... (Score:3)
Taking a very quick look at the documentation myself, I see that you'd probably have an rsync server running on each site, and then have a cron job run on each site that mirrors every other site. If all 3 sites do this, it should mirror pretty well. The lag time will probably be something like 2T, where T is the time between cron-job runs.
As for your specific what-if questions, I think the best way to answer those will be to try it out yourself. :) Hope that helps
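The cron-driven mirroring above might look like the sketch below, assuming an rsync daemon on each site. The hostnames and the `ftp` rsync module name are assumptions, not anything from the thread:

```python
# Sketch: each site periodically pulls its peers' trees via rsync.
# Hostnames and the "ftp" rsync module are illustrative only.
import subprocess

PEERS = ["ftp2.example.com", "ftp3.example.com"]

def mirror_cmd(peer, dest="/var/ftp/"):
    """Build the rsync command that pulls a peer's tree into dest."""
    # --delete removes local files the peer no longer has, matching the
    # requirement to purge stale files from the tree. Note it is unsafe
    # to run --delete naively against several peers in turn: a peer that
    # is behind would delete files a fresher peer just delivered, so a
    # real setup needs the authority scheme discussed elsewhere in this
    # thread.
    return ["rsync", "-a", "--delete", f"{peer}::ftp/", dest]

def mirror_all():
    """The body of the per-site cron job."""
    for peer in PEERS:
        subprocess.run(mirror_cmd(peer), check=True)
```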
---
If you have the time, resources, and trust... (Score:2)
Writing your own FTP server would do nicely. Log all incoming files in a special place, then set up a cron job that mirrors those files to the other servers (you'd have to use a special user whose transfers were not logged in the same way, so you wouldn't end up mirroring the mirrors hundreds of times). Similarly, a delete request would be passed on to the other servers.
There is quite likely an existing FTP server flexible enough with its logging to do this. The capability need not be in the FTP server itself; it could be a script that searches the server's log files. However, implementing it on the server side lets you ensure that the mirroring is accurate and spares any scripts from parsing dates and times (unless you have a server that logs in Unix ticks; in that case you would just store the tick of the last mirror run, and only be concerned with transfers after that point).
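The tick-based log scan described above might be sketched as follows. The three-field log format (epoch tick, action, path) is an assumption for illustration; a real script would parse whatever transfer-log format the chosen FTP server actually writes:

```python
# Sketch of the tick-based log scan: remember the tick of the last mirror
# run, and on each run act only on log entries newer than that tick.
# The "tick action path" log format here is an assumption.

def transfers_since(log_lines, last_tick):
    """Return (newest_tick, uploads, deletes) for entries after last_tick."""
    uploads, deletes = [], []
    newest = last_tick
    for line in log_lines:
        tick_s, action, path = line.split(None, 2)
        tick = int(tick_s)
        if tick <= last_tick:
            continue  # already mirrored on a previous run
        newest = max(newest, tick)
        if action == "STOR":
            uploads.append(path)
        elif action == "DELE":
            deletes.append(path)
    return newest, uploads, deletes
```

The cron job would mirror `uploads` to the peers, pass `deletes` on to them, and persist `newest` as the starting tick for the next run.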
I would suggest running an rsync every so often just to make sure.
The key here is to ensure that everything you are doing is accurate; this is a "high-profile" environment. You might also want to consider something other than FTP, e.g., HTTP POSTs. Yes, there are problems with using that method to upload large files, such as the lack of a progress indication; but considering most producers will be on a fast network, this should not be too much of a problem, and a Java(Script) applet that broke the upload into manageable chunks and displayed progress to the user might cut it. An HTTP POST would also let you keep track of other information along with the file, such as specific user comments.
Kenneth