

Ask Slashdot: Automated Verification For Uploaded Files?
VernonNemitz writes: There are a lot of ways for hackers to abuse a web site, but it seems to me that one of them is receiving less attention than it deserves: the simple uploading of a malware file that has an innocent file-name extension. I'm looking for a simple file-type verification program that the site could run automatically on each uploaded file, to test whether it is actually the type of file its file-name extension claims it is. That way, if it ever gets double-clicked, we can be assured it won't hijack the system or worse. At the moment I'm only interested in testing .png files, but I'm sure plenty of web site operators would want to be able to test other file types. A quick Googling indicates the existence of a validator project under the OWASP umbrella, but is it the best choice, and what other choices are there?
Would be easier to check if potentially harmful (Score:3)
It would be simpler to just check whether it's executable in some way and, if it has a file extension that doesn't match, throw up a red flag.
Re:Would be easier to check if potentially harmful (Score:5, Informative)
this is pretty easy in *nix:
$ file lobotomy.png
lobotomy.png: PNG image data, 298 x 300, 8-bit/color RGB, non-interlaced
$ file jetpack.png
jetpack.png: ELF 64-bit LSB executable, x86-64, version 1 (SYSV), dynamically linked (uses shared libs), for GNU/Linux 2.6.24, not stripped
Re:Would be easier to check if potentially harmful (Score:5, Insightful)
This bears pointing out.
UNIX systems have used "magic" for decades, and try to identify based on the actual file contents instead of its name.
And then Microsoft came along, decided the extension was magic and reliable, and then also decided to hide well known extensions (which created new problems).
Relying on the file name has pretty much always been a terrible way of dealing with this. It became exactly how malware targeted people: naming a file .gif.exe hid the .exe part, and people thought it was a .gif.
Trusting a file name for an operating system to take action has pretty much always been a terrible idea. But, historically, Microsoft has been more focused on dumbing down the system than making it more secure.
Re: (Score:2)
Methinks you don't understand how 'magic' works. It is a tag, embedded inside the file, that identifies the file type.
Since it is inside the file, it is far harder to change than a simple rename operation.
Re: (Score:2)
It is nothing of the sort. It iterates through a list of definitions looking for specific values at specific places and assumes a file which matches those to be of the type identified by the definition. They are (supposedly) chosen to be values which can't be changed without breaking the usage of the file, so editing them renders the file useless for its intended purpose. For example, replacing the first two bytes of a Windows executable will cause it to stop being identified as a Windows executable, but
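That matching loop is easy to sketch. Below is a minimal Python version, assuming a small, hypothetical signature table (SIGNATURES, EXPECTED, sniff and matches_extension are illustrative names, not any library's API). As the comment says, a match only means the bytes at those offsets look right, not that the rest of the file is harmless.

```python
import os

# Hypothetical, deliberately tiny signature table: (offset, magic bytes, label).
# file(1)'s real database is far larger and also covers formats whose
# signatures sit at non-zero offsets.
SIGNATURES = [
    (0, b"\x89PNG\r\n\x1a\n", "PNG image"),
    (0, b"GIF87a", "GIF image"),
    (0, b"GIF89a", "GIF image"),
    (0, b"%PDF-", "PDF document"),
    (0, b"PK\x03\x04", "ZIP archive"),
    (0, b"\x7fELF", "ELF executable"),
    (0, b"MZ", "DOS/Windows executable"),
]

# Extension -> label the content must match. Unlisted extensions are refused.
EXPECTED = {".png": "PNG image", ".gif": "GIF image", ".pdf": "PDF document"}


def sniff(data: bytes) -> str:
    """Return the label of the first signature found at its offset."""
    for offset, magic, label in SIGNATURES:
        if data[offset:offset + len(magic)] == magic:
            return label
    return "unknown"


def matches_extension(filename: str, data: bytes) -> bool:
    """True only if the extension is allowed and the content agrees with it."""
    ext = os.path.splitext(filename)[1].lower()
    return ext in EXPECTED and EXPECTED[ext] == sniff(data)
```

For example, matches_extension("jetpack.png", elf_bytes) is False because the leading bytes say ELF, mirroring the file output shown earlier in the thread.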
Re: (Score:3)
UNIX systems have used "magic" for decades, and try to identify based on the actual file contents instead of its name.
That sounds terrible.
And then Microsoft came along, decided the extension was magic and reliable
Better to use the file extension than it is to load and execute dancingbunny.png.
No, UNIX systems don't arbitrarily execute just any file. dancingbunny.exe would not run unless its executable bit is set; on Windows, by contrast, the .exe is what makes it executable, and if that's hidden, well, you have the nightmare that exists because of it. Of course, if you're just clicking icons on your desktop you're in deep shit with either OS.
Re: (Score:3, Informative)
It's worth noting that this is just a heuristic. A pretty good heuristic for most cases, but a heuristic nonetheless. A file can be a valid-looking PNG and still be malicious. (Heck, it can be valid and still malicious.)
As far as validity is concerned, if you want to go further than file magic checks, you can parse the uploaded file as the expected type. For example, opening it with a library or utility intended for working with those files.
A simple PNG check with ImageMagick:
$ convert png:rot66.png info:-
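The same kind of structural parse can be done without shelling out. A minimal Python sketch using only the standard library: walk_png (a hypothetical helper) walks the PNG chunk layout (4-byte length, 4-byte type, data, CRC-32), checks every CRC, and rejects files that do not start with IHDR, do not end with IEND, or carry bytes after IEND (a common place to hide an appended archive). It checks structure only and does not decode pixels, so it complements rather than replaces a real decoder.

```python
import struct
import zlib

PNG_SIGNATURE = b"\x89PNG\r\n\x1a\n"


def walk_png(data: bytes) -> list[str]:
    """Parse the chunk structure of a PNG; raise ValueError on any defect.

    Returns the chunk types in file order. Pixel data is not decompressed.
    """
    if not data.startswith(PNG_SIGNATURE):
        raise ValueError("bad signature")
    pos, types = len(PNG_SIGNATURE), []
    while pos < len(data):
        if pos + 8 > len(data):
            raise ValueError("truncated chunk header")
        length, ctype = struct.unpack(">I4s", data[pos:pos + 8])
        end = pos + 8 + length + 4  # header + data + CRC
        if end > len(data):
            raise ValueError("chunk runs past end of file")
        body = data[pos + 8:pos + 8 + length]
        (crc,) = struct.unpack(">I", data[end - 4:end])
        if zlib.crc32(ctype + body) != crc:  # CRC covers type + data
            raise ValueError(f"CRC mismatch in {ctype!r}")
        types.append(ctype.decode("latin-1"))
        pos = end
        if ctype == b"IEND":
            break
    if not types or types[0] != "IHDR" or types[-1] != "IEND":
        raise ValueError("IHDR must come first and IEND last")
    if pos != len(data):
        raise ValueError("trailing data after IEND")
    return types
```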
Re: (Score:2)
It gets problematic when the intended library is vulnerable to the malicious payload. If libpng, for example, was broken and decided to arbitrarily execute a malicious payload hidden within the PNG's otherwise-valid data, your approach is just what the attacker wants - get this file parsed by the vulnerable library.
To combat this, it's a pretty common method for web uploads to be sent directly to a virtual machine which runs a locked-down OS which can be periodically reset, which performs a raft of tests o
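Short of a full virtual machine, the same principle (never parse untrusted input inside the web server's own process) can be approximated by running the checker as a separate, resource-limited process. A minimal POSIX-only Python sketch; the CPU and memory limits are assumptions, and a child process with rlimits is much weaker isolation than the VM described above (no filesystem or network confinement). It also assumes the checker signals failure with a non-zero exit status.

```python
import resource
import subprocess
import sys

CPU_SECONDS = 5                   # assumption: CPU-time budget for the checker
MEMORY_BYTES = 512 * 1024 * 1024  # assumption: 512 MiB of address space


def _apply_limits() -> None:
    # Runs in the child after fork and before exec (POSIX only).
    resource.setrlimit(resource.RLIMIT_CPU, (CPU_SECONDS, CPU_SECONDS))
    resource.setrlimit(resource.RLIMIT_AS, (MEMORY_BYTES, MEMORY_BYTES))


def run_checker(argv: list[str], timeout: float = 10.0) -> bool:
    """Run a validator in a resource-limited child; True only on exit status 0."""
    try:
        proc = subprocess.run(argv, capture_output=True, timeout=timeout,
                              stdin=subprocess.DEVNULL, preexec_fn=_apply_limits)
    except subprocess.TimeoutExpired:
        return False  # the child is killed before the exception is raised
    return proc.returncode == 0


if __name__ == "__main__":
    # Stand-in for a real checker command such as ["pngcheck", path].
    print(run_checker([sys.executable, "-c", "pass"]))
```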
Re: Would be easier to check if potentially harmful (Score:2)
For PNG files specifically, there is a "pngcheck" utility that parses the file and verifies the contents are valid.
If you want to go a step further, you can use "pngcrush" to parse and repack/compress the file and strip out any extra data chunks that are not required to display the image. That should strip out any malicious or malformed content, and can be run on a sandbox that is not directly accessible, so if there is a compromise of pngcrush or pngcheck the effects can be isolated.
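The chunk-stripping idea can be sketched by rebuilding the file from a whitelist of chunks. This is a minimal Python illustration of that idea, not pngcrush itself: it assumes the input already passed a structural check, keeps only IHDR, PLTE, IDAT, IEND and tRNS (a hypothetical whitelist; dropping color chunks such as gAMA or iCCP may shift how some viewers render the image), recomputes every CRC, and discards anything after IEND.

```python
import struct
import zlib

PNG_SIGNATURE = b"\x89PNG\r\n\x1a\n"
# Hypothetical whitelist: the four critical chunks plus tRNS (transparency).
# Text, EXIF, private and unknown chunks are all discarded.
KEEP = {b"IHDR", b"PLTE", b"IDAT", b"IEND", b"tRNS"}


def strip_png(data: bytes) -> bytes:
    """Rebuild a PNG from whitelisted chunks only; bytes after IEND are dropped."""
    if not data.startswith(PNG_SIGNATURE):
        raise ValueError("not a PNG")
    out = [PNG_SIGNATURE]
    pos = len(PNG_SIGNATURE)
    while pos + 8 <= len(data):
        length, ctype = struct.unpack(">I4s", data[pos:pos + 8])
        body = data[pos + 8:pos + 8 + length]
        pos += 12 + length  # skip header, data and CRC
        if ctype in KEEP:
            crc = zlib.crc32(ctype + body)  # recomputed, never copied
            out.append(struct.pack(">I", len(body)) + ctype + body + struct.pack(">I", crc))
        if ctype == b"IEND":
            break
    return b"".join(out)
```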
Re: (Score:2)
Wow, did you piss off someone? That AC seems to have it in for you.
fileutils (Score:4, Insightful)
Well, if you are running on a Linux or Unix/BSD host, you can use the "file" utility.
Of course, that means that you need to have shell_exec() or exec() or whatever your programming language of choice uses for running shell commands, and the other security dangers/issues involved with allowing that type of stuff.
What may be best/easiest/safest would be to NOT allow direct HTTP access to the uploaded files, but rather use a wrapper script that sends appropriate headers, such as a generic content type like application/octet-stream together with "Content-Disposition: attachment", to force a "save as" dialog instead of opening the file with a plugin, auto-opening it with a local application, etc.
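The wrapper-script idea mostly comes down to which response headers it sends. A minimal Python sketch of one such header set, assuming the script already knows the stored file's name and size (download_headers is a hypothetical helper). application/octet-stream plus Content-Disposition: attachment is the usual way to get a save-as dialog, and X-Content-Type-Options: nosniff asks browsers not to second-guess the declared type.

```python
import os
import re


def download_headers(stored_name: str, size: int) -> list[tuple[str, str]]:
    """Response headers that make a browser save an upload rather than render it."""
    # Keep a conservative character set so quotes, CR/LF or path pieces in
    # the name cannot break out of the header value.
    safe = re.sub(r"[^A-Za-z0-9._-]", "_", os.path.basename(stored_name))
    return [
        ("Content-Type", "application/octet-stream"),               # opaque bytes
        ("Content-Disposition", f'attachment; filename="{safe}"'),  # save-as dialog
        ("X-Content-Type-Options", "nosniff"),                      # don't guess a type
        ("Content-Length", str(size)),
    ]
```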
Bobby Tables strikes again (Score:1)
And what if there's a semicolon or another interesting character in the filename?
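If the uploaded name ever reaches a shell (say, through shell_exec() with the name pasted into the command string), a semicolon in it becomes a command separator. A minimal Python sketch of the two usual defenses, assuming uploads are stored under a server-generated name: never reuse the client's filename, and call external tools with an argument list rather than a shell string. storage_name, file_argv and sniff_mime are hypothetical helpers; --brief and --mime-type are standard file(1) options.

```python
import secrets
import subprocess
from pathlib import Path


def storage_name(ext: str = ".png") -> str:
    """Server-chosen name: 32 random hex characters plus a fixed extension.

    The client's filename never touches the disk, so ';', '|', '$(...)',
    '../' or a leading '-' in it never reach the filesystem or a shell.
    """
    return secrets.token_hex(16) + ext


def file_argv(path: Path) -> list[str]:
    # '--' ends option parsing, so even a path starting with '-' is an operand.
    return ["file", "--brief", "--mime-type", "--", str(path)]


def sniff_mime(path: Path) -> str:
    """Ask file(1) for the MIME type without involving a shell.

    A list argv (with the default shell=False) hands the path to the program
    as a single argument, metacharacters and all.
    """
    result = subprocess.run(file_argv(path), capture_output=True, text=True, check=True)
    return result.stdout.strip()
```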
yes, but directory traversal and buffer dos, so. . (Score:4, Informative)
This is on the right track, because as others have said, just because it's valid PNG doesn't mean it's not also valid PHP and JavaScript. I just pulled a file like that off a server yesterday.
HOWEVER, -all- of the "download.php" scripts I've ever looked at have at least two of the same three vulnerabilities: directory traversal (protection against it is harder than it looks), fopen_url, and memory depletion from failing to disable the output buffer before reading and writing chunks of the file.
A better, safer, higher-performance option is to RemoveHandler PHP and RemoveHandler cgi-script in the designated upload directory, which should be the only directory that's writable.
A further problem this solves: since the directory is writable, the designated upload script that checks the files is probably NOT the only mechanism for putting files there. Imperfections in other scripts will allow bad guys to upload any file they want to the world-writable directory*. Therefore, use httpd.conf to ensure that no scripts in that directory can run.
* Instead of making it -explicitly- world-writable, you can use SuExec, which effectively makes the ENTIRE SITE world-writable. This is extremely stupid.
Re: (Score:2)
HOWEVER, -all- of the "download.php" scripts I've ever looked at have at least two of the same three vulnerabilities.
1) Protection from directory traversal is harder than it looks,
2) fopen_url, and
3) memory depletion from failing to disable the output buffer before reading and writing chunks of the file.
I'm a PHP dev, and the first two are relatively straightforward to prevent. E.g.: check that basename($file) == realpath(basename($file)), that kind of stuff. But #3 is interesting to me; how would the following cause any problem?
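$fp = fopen($hugefile, 'rb');
// Compare against false explicitly: a line consisting of just "0" is falsy.
while (($line = fgets($fp, 1024)) !== false) {
    echo $line;
}
fclose($fp);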
In this case, the buffered output will be spooled to Apache/end user as it fills. Or did you mean OOM errors from trying to load a 2 GB file into RAM?
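For the traversal half of the question, the usual approach is to resolve the requested path and confirm it is still inside the upload directory. A minimal Python sketch (Python 3.9+ for Path.is_relative_to), not a PHP port of the basename/realpath check above; UPLOAD_ROOT and resolve_upload are hypothetical names.

```python
from pathlib import Path

UPLOAD_ROOT = Path("/srv/uploads")  # hypothetical upload directory


def resolve_upload(requested: str, root: Path = UPLOAD_ROOT) -> Path:
    """Map a client-supplied name to a file inside root, or raise ValueError."""
    root = root.resolve()
    # resolve() collapses '..' and follows symlinks before the containment
    # check; an absolute `requested` replaces root entirely and is caught too.
    candidate = (root / requested).resolve()
    if candidate == root or not candidate.is_relative_to(root):
        raise ValueError(f"refusing path outside upload root: {requested!r}")
    return candidate
```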
ob_flush() and flush(), Content-Length, x-sendfile (Score:2)
You need to call ob_flush() and then flush() after each echo, or, depending on the output_buffering setting, PHP may buffer nearly the entire thing in RAM. When a bad guy hits it, he'll have it buffer 100,000 copies in RAM.
You'll also need to send Content-Length header manually in the PHP, otherwise the header can't be set without buffering the whole file. Compression and encoding can bite you here, so disable compression. Of course you've kinda broken resume, if someone loses their connection halfway through the download. OR ...
Check out X-Sendfile. That's an all
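Outside PHP the same pattern looks like this: take Content-Length from the file's size on disk, then copy the file in fixed-size chunks and flush each one. A minimal Python sketch; the 64 KiB chunk size and the stream_file name are assumptions, and a front-end feature such as X-Sendfile avoids doing this in application code at all.

```python
import os
from typing import BinaryIO

CHUNK_SIZE = 64 * 1024  # assumption: 64 KiB per read/write


def stream_file(path: str, out: BinaryIO, chunk_size: int = CHUNK_SIZE) -> int:
    """Copy path to out in fixed-size chunks; memory use stays about chunk_size.

    The caller sends Content-Length: os.path.getsize(path) beforehand, so
    nothing has to be buffered just to compute the length.
    """
    sent = 0
    with open(path, "rb") as f:
        while chunk := f.read(chunk_size):
            out.write(chunk)
            out.flush()  # hand each chunk onward instead of accumulating it
            sent += len(chunk)
    return sent
```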
Re: (Score:1)
Yeah. Some would probably argue it's overkill, and of course it opens a potential new exploit (if ImageMagick or the GD library or whatever you use has a serious flaw), but for the really paranoid applications I've worked on, I generate a new image from the old one, using a trusted library. I figure by converting wha
Unix 'file' is not sufficient (Score:5, Insightful)
Re: (Score:3)
A paranoid (or sensible, depending on how juicy a target you are) way to handle it is to isolate the thing that verifies the file in s
Re: (Score:2)
Why would the browser be opening a save-as dialog for uploading a file? The question didn't say that the uploaded files were going to be accessible on the website. Many sites don't bother checking images if they are just going to be displaying them on the site.
I would have the files upload to a designated directory. Once the upload is complete, and if the user doesn't need to interact with the files right away, a message saying that the transfer was successful would be displayed. One of the last jobs of the
Re: (Score:2)
TFS mentions "double clicking a file" - I took that to mean that someone downloads the file from the server and double-clicks it on their local machine, or someone is browsing the directory on the server itself (as their local machine) and opens files...
Re: (Score:2)
And if someone is browsing the directory on the server, or the files are copied somewhere else for them to do that, then there wouldn't be a save-as dialog brought up in the browser. So don't go bringing up TFS with me when you've answered your own question about why there wouldn't be a dialog.
PHP (Score:3)
In PHP, simply run something like the following against the file and see if you get a valid result back:
http://php.net/manual/en/funct... [php.net]
http://php.net/manual/en/funct... [php.net]
Re: (Score:1)
Also
http://php.net/manual/en/ref.fileinfo.php
The file command? (Score:3)
The file command does exactly this. Type in "file foo", it will tell you what it is.
No need to add any additional software to the Linux box.
RTFM (Score:2)
Not the right question. (Score:3, Insightful)
test it to see if it is actually the type of file that its file-name extension claims it is.
There are various ways to make "hybrid" files which are multiple types. Graphics files which are also archives, etc. What you really want to do is normalize the files to the type they're supposed to be. PNGs are a good candidate for this because PNG is lossless, so you can decode the image and re-encode it without losing information.
Re: (Score:2)
test it to see if it is actually the type of file that its file-name extension claims it is.
There are various ways to make "hybrid" files which are multiple types. Graphics files which are also archives, etc. What you really want to do is normalize the files to the type they're supposed to be. PNGs are a good candidate for this because PNG is lossless, so you can decode the image and re-encode it without losing information.
This is exactly what we did on a production site. We wanted to support several different document types but wanted everything uniform, so we use a locked-down version of headless OpenOffice that converts everything (doc, txt, png, spreadsheets, etc.) to PDF. In our case PDF made the most sense because 99% of the stuff that was supposed to be uploaded was documents, and by using OpenOffice we automatically support anything that OpenOffice does. You still have to worry about viruses that affe
Re: (Score:3)
But it could be even worse depending on your server configuration. I believe (but I haven't tested) that some Apache configurations can result in unknown file extensions being ignored. So if someone uploads a file named say "myhack.php.foobar" and it is placed in a publicly accessible directory, Apache will ignore the "foobar" extension because it doesn't recognise it, and then decide it's a PHP file, and execute it.
Also check out Apache content negotiation [apache.org] (and mod_mi [apache.org]
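A defensive check for the multiple-extension behavior described above is to look at every dot-separated segment of the name, not just the last one. A minimal Python sketch; the RISKY set is a hypothetical, incomplete list that would need to match whatever handlers your server actually has configured, and storing files under a server-generated name sidesteps the issue entirely.

```python
# Hypothetical, incomplete list of extensions that commonly map to a handler.
RISKY = {".php", ".php3", ".php4", ".php5", ".phtml", ".phar", ".cgi",
         ".pl", ".py", ".asp", ".aspx", ".jsp", ".shtml"}


def is_suspicious_name(filename: str) -> bool:
    """True if any extension segment is risky, or the name is a dotfile."""
    name = filename.replace("\\", "/").rsplit("/", 1)[-1].lower()
    if name.startswith("."):
        return True  # e.g. an uploaded .htaccess
    # Every segment after the first dot counts, so "x.php.foobar" is caught.
    return any("." + seg in RISKY for seg in name.split(".")[1:])
```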
Won't work (Score:4, Insightful)
This won't work because a file can be a valid file in multiple formats at once and it can also be an invalid file that is nevertheless interpreted as a valid file as well.
Take for example, a plain-text file. Harmless, right? Nope. It can also be a valid HTML file containing executable JavaScript. Or an XML file containing a billion laughs attack.
Or take media type sniffing. Some browsers bend over backwards to interpret crap as HTML even when labelled otherwise by the Content-Type HTTP header. So one attack is to stuff enough HTML into PNG metadata to confuse a browser that doesn't follow the standards into thinking that it's HTML. This is a valid PNG file and anything that checks to see if it's really a PNG file will tell you that much. But it's still not safe.
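One cheap extra layer against this particular attack is to reject uploads whose leading bytes contain markup, in addition to sending X-Content-Type-Options: nosniff. A minimal Python heuristic, assuming a hypothetical pattern list and a 1024-byte window; it will have both false positives and false negatives, which is the point made above: a "valid PNG" check is not the same as "safe".

```python
import re

# Hypothetical pattern list: tags and schemes a sniffing browser might treat
# as HTML or script.
HTML_HINTS = re.compile(
    rb"<\s*(?:!doctype|html|head|body|script|iframe|svg)\b|javascript:",
    re.IGNORECASE,
)


def looks_like_html(data: bytes, window: int = 1024) -> bool:
    """Heuristic: does markup appear in the first `window` bytes of the upload?"""
    return HTML_HINTS.search(data[:window]) is not None
```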
Re: (Score:1)
That's what I was thinking also. One could hide a sinister executable inside an image file, for example. It might look like modern art when projected as an image, but still be a "valid" image from the computer's perspective. The file (or parts of) can be a valid EXE and a valid image at the same time.
The trick is not to allow "running" a given file in the wrong application on both client and the server. For example, a text file with a script in it is only a text file with a script in it if one views it in a
The easy way (Score:2)
If I recall correctly, you have the file in memory before you save it to disk. Check whether the first bytes are 0x89504E470D0A1A0A [wikipedia.org] and it should be "close enough". I'm not sure if anyone has compiled a list of headers and file extensions, but it seems a little overkill.
look out for sql injection as well. (Score:2)
look out for sql injection as well.
Reverse Proxy (Score:2)
Try a reverse proxy with a malware scanning component.
Or subscribe to the premium service for VirusTotal and use the API to check all uploads to your server.
Zip of death (Score:2)
Zip of death [wikipedia.org]
Is it a zip file? Yes
Is it dangerous? Yes
So how do you test for this without opening the file in a virtual environment and seeing what happens?
I have a feeling that testing for malicious files is akin to solving the halting problem.
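There is a partial answer for the classic case: a ZIP's central directory declares each entry's compressed and uncompressed size, so suspicious ratios can be refused before anything is extracted. A minimal Python sketch with hypothetical limits; because those sizes are declared by the archive itself and can be forged, any real extraction step still has to enforce the same limits while decompressing.

```python
import io
import zipfile

MAX_TOTAL = 100 * 1024 * 1024  # assumption: 100 MiB uncompressed in total
MAX_RATIO = 100                # assumption: flag entries compressing > 100:1
MAX_ENTRIES = 1000             # assumption


def zip_looks_safe(data: bytes) -> bool:
    """Inspect a ZIP's central directory without extracting anything."""
    try:
        with zipfile.ZipFile(io.BytesIO(data)) as zf:
            infos = zf.infolist()
    except zipfile.BadZipFile:
        return False
    if len(infos) > MAX_ENTRIES:
        return False
    total = 0
    for info in infos:
        total += info.file_size  # declared, not verified
        if total > MAX_TOTAL:
            return False
        if info.compress_size and info.file_size / info.compress_size > MAX_RATIO:
            return False
        if info.filename.lower().endswith(".zip"):
            return False  # policy choice: refuse nested archives rather than recurse
    return True
```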
Nope (Score:2)
There's no way to determine what type a file really is. File types are designated in the Windows world by extensions (the .jpg in bigdick.jpg), but applications and other OSes use actual file information (typically the first few / few dozen bytes) of the file to determine what to do with it.
This typically involves some specific byte sequence, or "magic number", which alerts the OS/program to start trying to read a particular type of header, or tells it whether the file is big- or little-endian.
However, ANY file can co
Just check the file headers (Score:1)
It's always the following decimal values:
137 80 78 71 13 10 26 10
Things get more tricky when you're talking about an exploitable file type, in which additional validation is required, but for most purposes, if the file being broken won't ruin the application, this is fine.
About extensions (Score:1)
Easy (Score:2)
For image files, just convert it to another format at the highest possible resolution and then back again. Maybe an executable could survive that, but I haven't yet seen one get through (and yes, I've tried it with some infected and/or bogus files).
And yes, I fully admit that it's a sleazy trick but it seems to work pretty well.
For other file types, I dunno.
Lots of layers to consider (Score:2)
Personally if we are just talking about PNGs then I think that one of the safest things for your clients/customers would be to not serve the file as uploaded, but to serve a file that is
Router firmware (Score:2)
Signed checksum, private key, verified public key systems.