Guaranteed Transmission Protocols For Windows?
Posted by timothy from the no-charge-for-autocompression dept.
Michael writes "Part of our business at my work involves transferring mission-critical files across a 2 Mbit microwave connection, into a government-run telecommunications center with a very dodgy internal network, and then finally to our own server inside the center. The computers at both ends run Windows. What sort of protocols or tools are available to me that will guarantee to get the data transferred across better than a straight Windows file system copy? Since before I started working here, they've been using FTP to upload the files, but many times the copied files are a few kilobytes smaller than the originals."
Any encrypted transmission protocol actually (Score:5, Informative)
SFTP should do: since the communications are encrypted, if something changes along the way it should be rejected by the other end. HTTPS and any other protocol-over-SSL should do as well.
FTP is a plain-text protocol, so if something changes along the way it won't complain and you'll never know.
Use BITS (Score:5, Informative)
Background Intelligent Transfer Service (BITS) can be used to transfer files between Windows servers. It is the technology behind Windows Update. We use it in our company to transfer files across a low-bandwidth satellite connection. The great thing is that it can automatically resume a transfer after rebooting both machines. SharpBits offers a nice .NET API. You can find it here: http://www.codeplex.com/sharpbits [codeplex.com]
Re:Robocopy? (Score:3, Informative)
Re:TCP? (Score:5, Informative)
rsync (Score:5, Informative)
... is what you want. Yes, you can use it with Windows (with or without cygwin bloat). Use -c and a short --timeout and you're good to go. If you're using it over ssh you're looking at three layers of integrity (rsync checksums, ssh and TCP), two of them quite strong even against malicious attacks, not only against ordinary corruption. Put it in a script with a short --timeout: if anything is wrong with the link your ssh session will freeze completely, and as soon as your --timeout is reached rsync will die and your script can respawn a new one (which will resume the transfer using whatever chunks with good checksums you have already transferred, and will again checksum the whole file when it finishes).
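The "put it in a script and respawn" idea above might look roughly like the sketch below. It is only an illustration: the function names are made up, and --partial is my own assumption (the comment names only -c and --timeout), added so that an interrupted file is kept for the next attempt.

```python
import subprocess
import time


def build_rsync_command(source, destination, timeout_seconds=30):
    """Build an rsync-over-ssh command line using the flags named above."""
    return [
        "rsync",
        "-c",                            # compare whole-file checksums, not just size/mtime
        "--partial",                     # assumption: keep partial files so a retry can resume
        f"--timeout={timeout_seconds}",  # give up quickly when the link stalls
        "-e", "ssh",                     # run the transfer over ssh
        source,
        destination,
    ]


def transfer_with_retries(command, max_attempts=10, pause_seconds=5,
                          run=subprocess.call, sleep=time.sleep):
    """Re-run the command until it exits with status 0 or attempts run out.

    Returns the number of attempts used, or None if every attempt failed.
    """
    for attempt in range(1, max_attempts + 1):
        if run(command) == 0:
            return attempt
        sleep(pause_seconds)
    return None
```

The run and sleep parameters exist only so the loop can be exercised without a real link; in normal use the defaults call rsync directly.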
Re:Correct me if I'm wrong... (Score:4, Informative)
Implementations of TCP in most operating systems fall a bit short of that, killing off stalled connections, etc. Also, some firewall suites and some routers make a habit of killing off connections after a certain amount of time, sometimes without regard to whether or not they are 'active'.
You might have some luck boosting reliability with the TcpMaxDataRetransmissions registry setting in Windows. But ultimately, the poster is going to need to find a file copy suite which retries when connections die.
RTFM - set binary mode in FTP (Score:5, Informative)
Ritchie's Law - assume you have screwed something up *first*, before blaming the tool...
Re:Robocopy? (Score:5, Informative)
Yeah but that extra functionality contains things like the ability to resume a transfer, retry if things fail, and verify the files after copying.
Re:Well...duh (Score:3, Informative)
You don't need to MD5 if you're using rsync. The rsync algorithm already uses checksums to ensure the files are bit-for-bit identical. In fact, rsync 3.x uses MD5.
Re:rsync (Score:4, Informative)
Re:Robocopy? (Score:5, Informative)
MOD PARENT UP. Not to mention it's multithreaded, so it's not really the same as copy/paste - it's the same as a whole bunch of copy/pastes at the same time.
Why people keep fighting Robocopy, I'll never know.
RSync would do the trick nicely (Score:1, Informative)
Why don't you try rsync? That should do the trick nicely.
Re:Any encrypted transmission protocol actually (Score:5, Informative)
So, in short, something like SSH or any other properly encrypted communication mechanism is a great way to both secure the data from snooping (in the case of a microwave link, a VERY real problem) as well as to safeguard the data from corruption (intentional or unintentional). I sincerely hope, for the asker's sake and possibly for the country's sake, that these files he works with are trivial.
Re:rsync should do the trick (Score:3, Informative)
Re:Line endings! (Score:2, Informative)
Twenty bucks says you're converting from DOS line endings (\r\n) to Unix line endings (\n).
There, fixed that for you.
Re:TCP? (Score:3, Informative)
Re:TCP? (Score:5, Informative)
Re:TCP? (Score:5, Informative)
I used to get dropped characters and groups of characters in text files using FTP back in the 1990s and early 21st century. It seemed to be a bug in the FTP client, because it only happened when we used the Windows Explorer interface for the product. When we did command line or used the native GUI there was no problem. If you're seeing this type of a pattern where you can see that characters are missing, switch to a different FTP client or try the Windows command line FTP.
Another possibility is that the target Windows system is mimicking a Unix system, so that an ASCII transfer is stripping the CR characters from CR/LF sequences.
On the other hand, if you really want a "guaranteed delivery" with formal acknowledgment and validation, try using a secured protocol like SSH or SFTP or a messaging system like JMS with a handshaking architecture around it. There are plenty of Open Source architectures you can build around (xBus for example), but I don't know of any ready-built executables. Commercially, vendors like IBM (MQ) and Tibco have products that deal with the messaging at a similar level.
Re:TCP? (Score:4, Informative)
You could deal with a situation like this by zipping or rarring it into multiple small files and including parity files.
http://en.wikipedia.org/wiki/Parchive [wikipedia.org]
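To illustrate why parity files help, here is a deliberately simplified sketch using a single XOR parity chunk. This is not how Parchive itself works (PAR2 uses Reed-Solomon coding and can repair several damaged blocks); the function names are hypothetical and the chunks are assumed to be of equal length.

```python
def xor_parity(chunks):
    """Return one parity chunk: the byte-wise XOR of equal-length chunks."""
    parity = bytearray(len(chunks[0]))
    for chunk in chunks:
        for i, byte in enumerate(chunk):
            parity[i] ^= byte
    return bytes(parity)


def recover_missing(chunks_with_gap, parity):
    """Rebuild the single chunk marked None from the survivors plus parity."""
    survivors = [c for c in chunks_with_gap if c is not None]
    return xor_parity(survivors + [parity])
```

With one parity chunk you can lose any one data chunk and still rebuild it at the far end without asking for a retransmit.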
Re:Robocopy? (Score:5, Informative)
Actually, you can specify a single file, it just has a silly syntax.
robocopy source destination file
So "robocopy c:\a c:\b myfile.txt" will copy c:\a\myfile.txt to c:\b\myfile.txt.
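If you drive robocopy from a script, the single-file syntax above combines with its restart and retry switches. A rough sketch, with hypothetical helper names; /Z, /R:n and /W:n are robocopy's restartable-mode, retry-count and wait-time switches, and the default values here are arbitrary.

```python
def robocopy_single_file(source_dir, dest_dir, filename, retries=5, wait_seconds=10):
    """Build a robocopy command that copies one file with restart and retry."""
    return [
        "robocopy", source_dir, dest_dir, filename,
        "/Z",                  # restartable mode: resume an interrupted copy
        f"/R:{retries}",       # number of retries on a failed copy
        f"/W:{wait_seconds}",  # seconds to wait between retries
    ]


def robocopy_succeeded(exit_code):
    """Robocopy exit codes of 8 or higher mean at least one failure."""
    return exit_code < 8
```

Note that a plain "exit code is zero" check is wrong for robocopy, since it returns 1 when files were copied successfully.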
Re:rsync should do the trick (Score:2, Informative)
Re:UDP. (Score:3, Informative)
UDP is actually a great basis for accelerated file transfer. Several file transfer utilities / protocols have been built around it. I deal with really large files, but I have been using Aspera on several projects with great success. Worth a look.
http://www.asperasoft.com/ [asperasoft.com]
Re:rsync should do the trick (Score:1, Informative)
Agree... rsync is the way to go. Built-in hashing, diffing, session reliability, retries... what more could you ask for?
Re:Well...duh (Score:3, Informative)
"You don't need to MD5 if you're using rsync. The rsync algorithm already uses checksums to ensure the files are bit-for-bit identical. In fact, rsync 3.x uses MD5."
Rsync, by default, does not necessarily do this. I've seen situations where rsync would happily copy files from a remote host over ssh to a destination host and the resulting files failed an independent MD5 test. Rsync was not causing this trouble - but it did fail to detect it. Forcing a checksum of every file (using "-c") would let rsync detect the failure to copy properly (after the entire file was done) and it would retry.
In the end, a router and one of the hosts were rebooted and the problem went away. The point is that just using rsync and ssh does not guarantee anything.
A.
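An independent check like the one described above can be as small as the sketch below. The function names are hypothetical, and SHA-256 is my choice rather than the MD5 mentioned in the thread; pass algorithm="md5" to match it.

```python
import hashlib


def file_digest(path, algorithm="sha256", chunk_size=1 << 20):
    """Hash a file in fixed-size chunks so large files need not fit in memory."""
    digest = hashlib.new(algorithm)
    with open(path, "rb") as handle:   # binary mode: hash the exact bytes on disk
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def copies_match(original_path, copy_path, algorithm="sha256"):
    """True when both files hash to the same digest."""
    return file_digest(original_path, algorithm) == file_digest(copy_path, algorithm)
```

Across a link you would run file_digest on each machine and compare the two hex strings, rather than calling copies_match on one host.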
Re:TCP? (Score:5, Informative)
While others point out, probably correctly, that the problem is probably a binary/ascii conversion, in actuality the error checking on TCP is simply not that good.
TCP uses a 16-bit checksum, so you have a 1 in 65536 chance of an error packet being incorrectly validated as being correct. To make matters worse, it uses 1's complement instead of 2's complement arithmetic, so the 16-bit words 0x0000 and 0xFFFF are indistinguishable.
Ethernet has a 32-bit CRC, so if you're transmitting over that link-layer you're probably in good shape. But depending on that from a systems point of view seems risky.
Much better to only transfer ZIPs and check them at the other end if you only have control over the endpoints. If you can control the transmission, use a better error-correcting high-level protocol or even a forward-error correction protocol on top of TCP.
Or just use rsync.
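To make the weakness of a 16-bit one's-complement sum concrete, here is a sketch in the style of RFC 1071. It covers only a payload (real TCP also sums a pseudo-header and the TCP header), and the function name is my own.

```python
def internet_checksum(data: bytes) -> int:
    """16-bit one's-complement checksum in the style of RFC 1071."""
    if len(data) % 2:
        data += b"\x00"                 # pad odd-length input with a zero byte
    total = 0
    for i in range(0, len(data), 2):
        total += (data[i] << 8) | data[i + 1]
    while total >> 16:                  # fold carries back into the low 16 bits
        total = (total & 0xFFFF) + (total >> 16)
    return ~total & 0xFFFF
```

Because the sum is order-independent, swapping two 16-bit words leaves the checksum unchanged, and so does replacing a 0x0000 word with 0xFFFF when the rest of the data is non-zero. Both are corruptions this checksum cannot see.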
The protocol needs to be a part of the discussion (Score:3, Informative)
But I digress. What the user is running into here is a fundamental problem with TCP over lossy networks. It really was not designed with really lossy networks in mind. E.g., the congestion control mechanism in TCP ("exponential backoff") makes the assumption that there is a wire sitting there and that certain parameters (like bandwidth) are not going to change. If you need certain QoS guarantees on a wireless link, TCP may be hard-pressed to deliver, because TCP's [limited] QoS mechanisms may make the problem worse. There is a HUGE amount of overhead on 802.11 networks to make sure that TCP doesn't suck.
I don't know how this person's microwave link is configured, but they might be better served by thinking about the QoS guarantees in the various layers in their network stack. I know a previous poster was joking when they said UDP might be a good option, but look, part of the problem on wireless is TCP's retransmission mechanism. With UDP it is up to the user/application to ask for a retransmit. BitTorrent works exactly like this, so something like BitTorrent, where each small file chunk gets its own hash, and those hashes are checked upon receipt, might not be a bad idea. I like rsync as well (because it has a rolling checksum feature), but again, you have TCP in the mix, and if I recall correctly, rsync will not retry automatically on failures, which is what you want.
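The per-chunk hashing idea above can be sketched as follows. This is an illustration only, not the BitTorrent wire protocol; the function names and the 256 KiB chunk size are arbitrary, and SHA-1 is used simply because the original BitTorrent format hashes pieces with it.

```python
import hashlib


def chunk_hashes(data: bytes, chunk_size: int = 256 * 1024):
    """Return one SHA-1 hex digest per fixed-size chunk of data."""
    return [
        hashlib.sha1(data[offset:offset + chunk_size]).hexdigest()
        for offset in range(0, len(data), chunk_size)
    ]


def chunks_to_resend(expected_hashes, received: bytes, chunk_size: int = 256 * 1024):
    """List the indexes of chunks that are missing or fail their hash check."""
    actual = chunk_hashes(received, chunk_size)
    return [
        index for index, expected in enumerate(expected_hashes)
        if index >= len(actual) or actual[index] != expected
    ]
```

The payoff is that a single flipped bit costs one chunk of retransmission instead of the whole file.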
Re:TCP? (Score:5, Informative)
Because of differences between systems like Unix and Windows, where line ends are a simple newline on Unix but a CR/LF pair on Windows. Also systems like VMS which have (had) about thirteen different file formats all inherent in the file structure itself.
In other words, because all ASCII files are not represented the same way by all different operating systems.
I know that Windows uses CR/LF for line termination and *nix uses just LF. That's a very minor inconvenience at worst,
Not if you have an "ASCII" file you are trying to read on Windows that has Unix newline conventions. Try opening a newlined file with notepad, for example.
"Little standalone utilities" are really handy for small files and small numbers of files. It's really handy when you know the format the file you have is in and what it needs to be. Please tell me how you will identify a VMS fixed record file that you have just ftp'd from a VMS FTP server when it gets to your Windows system. It has NO newlines or CR/LF pairs. You might dump the file somehow and notice that the lines are all 93 characters long and then write yourself a perl script to split it up -- or you could simply tell your FTP client that you are in ASCII mode and let the FTP server/client negotiate some resulting format that your system likes. Now try that with a VMS variable length record file, where the lines are variable length, still without line endings.
FTP wasn't designed just for hobbyists who want a file or two and have the time to deal with file formats by hand. It was designed to move data, and anything that can be automated should be. "Little standalone utilities" are a pain in the ass when trying to automate something, especially when the critical information necessary to know what specific utility to use has been lost, or is completely unknown to the recipient's system. Like VMS fixed length records on Unix or Windows.
It just seems like it's not the job of a file transfer protocol to concern itself with what an independent, unrelated application can or cannot do with the file after it's transferred.
ASCII mode in FTP has nothing to do with anyone trying to tell anyone what they can or cannot do with a file after it's transferred. It's all about knowing how to deal with a hundred different ways of representing ASCII data on dozens of different operating systems and making life EASIER for people who have to do that on a daily basis.
If YOU would rather operate in BIN mode and worry about which file formats you've just downloaded and how to convert them to an ASCII representation that your software knows how to deal with, more power to you. I got tired of dealing with this the first time I had to convert a VMS "ASCII" file to Unix and I'll let FTP do it silently for me. Yes, I've dealt with users who didn't know what ASCII mode was and downloaded a zipped file in ASCII mode and it didn't work, but the time I've saved just myself not having to deal with converting crap has more than made up for the time I've spent telling them to use BIN mode.
Set BINARY MODE in FTP!!! (Score:1, Informative)
FTP and TCP cannot "Drop" packets or bytes. You need to learn-up on TCP and FTP.
In ASCII mode, FTP _does_ replace DOS end-of-line sequences (carriage-return followed by new-line -- 2 bytes) with Unix end-of-line sequences (just new-line -- 1 byte). So your files may become shorter by as many bytes as they contain lines.
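The arithmetic is easy to check for yourself. A small sketch (the function name is made up) that counts how many bytes a CR/LF to LF rewrite removes:

```python
def ascii_mode_shrinkage(data: bytes) -> int:
    """Bytes lost when every CR/LF pair is rewritten as a bare LF."""
    converted = data.replace(b"\r\n", b"\n")
    return len(data) - len(converted)
```

A file with 3,000 CR/LF line endings would shrink by 3,000 bytes, which is in line with copies coming out "a few kilobytes smaller" than the originals.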
The solution is to tell FTP to not treat the file as text, but as binary image information in which new-line characters are treated with no special processing. Traditionally, FTP called this "file type I" and the command to set it is "bin" as in "binary":
C:\Documents and Settings\fred>ftp abc.net
Connected to abc.net.
220 ProFTPD 1.3.1 Server (ABC Global Enterprise Group) [10.13.131.34]
User (abc.net:(none)): freddy
331 Password required for freddy
Password:
230 User freddy logged in
ftp> bin
200 Type set to I
ftp>
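If the uploads are scripted rather than typed, the same fix in Python's standard ftplib might look like the sketch below. The function name is hypothetical, ftp is assumed to be an already logged-in ftplib.FTP object, and the size comparison relies on the server supporting the SIZE command.

```python
import os


def upload_binary(ftp, local_path, remote_name):
    """Upload in image (binary) mode, then compare local and remote sizes."""
    with open(local_path, "rb") as handle:
        # storbinary switches the connection to TYPE I before sending STOR.
        ftp.storbinary(f"STOR {remote_name}", handle)
    remote_size = ftp.size(remote_name)   # SIZE is widely supported but optional
    return remote_size == os.path.getsize(local_path)
```

A matching size only rules out the line-ending problem; it says nothing about corrupted bytes, so a hash comparison is still worth doing for anything critical.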
Re:UDP. (Score:2, Informative)
TCP is so horrible. I wish HTTP used UDP by default so I wouldn't have the pro
Aspera is little better than Tsunami. [sourceforge.net]
As an exercise for the reader, guess which one is cheaper.
Re:TCP? (Score:2, Informative)
Windows reports file sizes exactly, to the byte.
It reports both the true file size and the file size on the disk, which is based on the block size and the number of blocks required to store the file.
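The difference between the two numbers is just rounding up to whole allocation units. A sketch, assuming a 4096-byte cluster (a common default, but check your volume) and ignoring special cases such as compression or very small files stored inside the file table:

```python
def size_on_disk(file_size: int, cluster_size: int = 4096) -> int:
    """Round a file size up to a whole number of allocation units."""
    clusters = -(-file_size // cluster_size)   # ceiling division
    return clusters * cluster_size
```

So when comparing a copy against its original, compare the true sizes, not the on-disk sizes, since the latter hide differences smaller than a cluster.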
Re:Any encrypted transmission protocol actually (Score:3, Informative)
Poster isn't concerned about whether the data has errors. That's a problem for the data creators. He's worried about it getting screwed up in transmission, either accidentally or maliciously
Sigh. You're welcome to nitpick my prose, but would you mind doing so in a way that makes sense? Data that got screwed up in transmission can be said to have errors. And that's what I meant.
and encryption absolutely solves that issue.
How? Not all encryption algorithms break if you mung the data after it was encrypted. Do all the algorithms break if this happens? Show me where it says this, and I'll admit that encryption is sufficient.
BTW, checksum hasn't been considered a trustworthy means of ensuring data integrity for more than a decade.
Dude, you really need to start listening to how people actually talk. For more than a decade, the word "checksum" has been used to apply to algorithms that don't simply add up bits, such as MD5 [google.com]. Not strictly logical, but language rarely is.
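For what it's worth, the thing that detects tampering is a keyed integrity check (a MAC), not the encryption as such; as far as I know, SSH and TLS both attach one to the data they carry. A minimal sketch of the idea with Python's standard hmac module, assuming both ends already share a secret key (the function names are made up):

```python
import hashlib
import hmac


def make_tag(key: bytes, message: bytes) -> bytes:
    """Compute an HMAC-SHA256 tag that travels alongside the message."""
    return hmac.new(key, message, hashlib.sha256).digest()


def verify(key: bytes, message: bytes, tag: bytes) -> bool:
    """Recompute the tag on receipt and compare in constant time."""
    return hmac.compare_digest(make_tag(key, message), tag)
```

Unlike a bare MD5 of the file, someone who alters the data in transit cannot also fix up the tag without knowing the key.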
Re:FTP is fairly reliable... (Score:3, Informative)
Windows reports file sizes exactly, to the byte.
It reports both the true file size and the file size on the disk, which is based on the block size and the number of blocks required to store the file.
Re:TCP? (Score:3, Informative)
Or they'd rather just have you use the already included Wordpad that does handle new lines correctly.
Re:TCP? (Score:3, Informative)
Who rated this "insightful"?
I'm sorry, but I've worked in this area for years. I was responsible for moving data and source files to and from Unix to DOS to VMS to OSs that are even deader than VMS, and the problem is hardly unique to "notepad". YOU may see it only in notepad because YOU only use Windows, but there are a lot of other OSs out there. If you've never worked on an OS that has structured files inherent in the filesystem, well, lucky you. I have. The newlines in those kinds of files are completely lost when you copy the byte stream contents, because the newlines are implicit and defined in the file structure itself. A fixed-record file doesn't need newlines because every line is the same length.
Every other text editor I've ever tried handles files with Unix-style newlines correctly.
There is much more to the world than Windows and Unix-style newlines. If all you have seen is Windows and Unix newlines, I suppose you could think the problem was limited to that, but it really isn't. In fact, if you use FTP much at all, I suspect even you have been protected by ASCII mode, to the point that you never even knew that an FTP site you visited was VMS-based. I know I've been to VMS sites, and ASCII mode is critical if you are dealing with ASCII files.
Re:TCP? (Score:4, Informative)
Re:Robocopy? (Score:3, Informative)
It really is phenomenal how much effort Microsoft forces you to go through just to back up their servers. These days, I just go with image-based software for server backups - they seem to do a far more reliable job of getting Windows servers back up in a hurry than file-level products (which Robocopy + NTBackup would qualify as). But, that's just me, and I primarily deal with smallish networks, so I'm not entirely sure how well that scales.
Re:Any encrypted transmission protocol actually (Score:3, Informative)
Re:Any encrypted transmission protocol actually (Score:3, Informative)