Security Software

Ask Slashdot: Automated Verification For Uploaded Files? 74

VernonNemitz writes: There are a lot of ways for hackers to abuse a web site, but it seems to me that one of them is receiving less attention than it deserves. This is the simple uploading of a malware file, that has an innocent file-name extension. I'm looking for a simple file-type verification program that the site could automatically run, on each uploaded file, to test it to see if it is actually the type of file that its file-name extension claims it is. That way, if it ever gets double-clicked, we can be assured it won't hijack the system or worse. At the moment I'm only interested in testing .png files, but I'm sure plenty of web site operators would want to be able to test other file types. A quick Googling indicates the existence of a validator project under the OWASP umbrella, but is it the best choice, and what other choices are there?

  • by alzoron ( 210577 ) on Thursday November 12, 2015 @04:09PM (#50916997) Journal

    It would be simpler to just check whether it's executable in some way, and throw up a red flag if its file extension doesn't match.

    • by Anonymous Coward on Thursday November 12, 2015 @04:15PM (#50917027)

      this is pretty easy in *nix:

      $ file lobotomy.png
      lobotomy.png: PNG image data, 298 x 300, 8-bit/color RGB, non-interlaced

      $ file jetpack.png
      jetpack.png: ELF 64-bit LSB executable, x86-64, version 1 (SYSV), dynamically linked (uses shared libs), for GNU/Linux 2.6.24, not stripped

      • by gstoddart ( 321705 ) on Thursday November 12, 2015 @04:48PM (#50917261) Homepage

        this is pretty easy in *nix:

        $ file lobotomy.png
        lobotomy.png: PNG image data, 298 x 300, 8-bit/color RGB, non-interlaced

        $ file jetpack.png
        jetpack.png: ELF 64-bit LSB executable, x86-64, version 1 (SYSV), dynamically linked (uses shared libs), for GNU/Linux 2.6.24, not stripped

        This bears pointing out.

        UNIX systems have used "magic" for decades, and try to identify based on the actual file contents instead of its name.

        And then Microsoft came along, decided the extension was magic and reliable, and then also decided to hide well known extensions (which created new problems).

        Relying on the file name has pretty much always been a terrible way of dealing with this. Because it became exactly how things targeted people -- because calling .gif.exe hid the .exe part, and people thought it was a .gif.

        Trusting a file name for an operating system to take action has pretty much always been a terrible idea. But, historically, Microsoft has been more focused on dumbing down the system than making it more secure.

      • Re: (Score:3, Informative)

        by Anonymous Coward

        It's worth noting that this is just a heuristic. A pretty good heuristic for most cases, but a heuristic nonetheless. A file can be a valid-looking PNG and still be malicious. (Heck, it can be valid and still malicious.)

        As far as validity is concerned, if you want to go further than file magic checks, you can parse the uploaded file as the expected type. For example, opening it with a library or utility intended for working with those files.

        A simple PNG check with ImageMagick:
        $ convert png:rot66.png info:-

        • by dave420 ( 699308 )

          It gets problematic when the intended library is vulnerable to the malicious payload. If libpng, for example, was broken and decided to arbitrarily execute a malicious payload hidden within the PNG's otherwise-valid data, your approach is just what the attacker wants - get this file parsed by the vulnerable library.

          To combat this, it's a pretty common method for web uploads to be sent directly to a virtual machine which runs a locked-down OS which can be periodically reset, which performs a raft of tests o

        • For PNG files specifically, there is a "pngcheck" utility that parses the file and verifies the contents are valid.

          If you want to go a step further, you can use "pngcrush" to parse and repack/compress the file and strip out any extra data chunks that are not required to display the image. That should strip out any malicious or malformed content, and can be run on a sandbox that is not directly accessible, so if there is a compromise of pngcrush or pngcheck the effects can be isolated.
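A rough sketch of the "verify, then strip extra chunks" idea above, using only the Python standard library. It is not pngcheck or pngcrush and does not decode pixel data; it only walks the chunk list, verifies each CRC, and keeps the chunk types listed in `KEEP` (a set chosen here for illustration). Dropping colour-management chunks such as gAMA may slightly change how the image renders. The function name `strip_png` is hypothetical.

```python
import struct
import zlib

PNG_SIGNATURE = b"\x89PNG\r\n\x1a\n"
# Chunk types kept in the output (an illustrative choice); all others are dropped.
KEEP = {b"IHDR", b"PLTE", b"tRNS", b"IDAT", b"IEND"}


def strip_png(data: bytes) -> bytes:
    """Return a copy of a PNG without extra chunks or trailing bytes.

    Raises ValueError if the signature, chunk layout, or any CRC is wrong.
    """
    if not data.startswith(PNG_SIGNATURE):
        raise ValueError("missing PNG signature")
    out = [PNG_SIGNATURE]
    pos = len(PNG_SIGNATURE)
    seen_types = []
    while True:
        if pos + 8 > len(data):
            raise ValueError("truncated chunk header")
        # Each chunk is: 4-byte big-endian length, 4-byte type, body, 4-byte CRC.
        length, ctype = struct.unpack(">I4s", data[pos:pos + 8])
        end = pos + 8 + length + 4
        if end > len(data):
            raise ValueError("truncated chunk body")
        body = data[pos + 8:pos + 8 + length]
        (crc,) = struct.unpack(">I", data[end - 4:end])
        # The CRC covers the chunk type and body, not the length field.
        if zlib.crc32(ctype + body) != crc:
            raise ValueError("bad CRC in chunk %r" % ctype)
        seen_types.append(ctype)
        if ctype in KEEP:
            out.append(data[pos:end])
        pos = end
        if ctype == b"IEND":
            break  # bytes after IEND are never copied to the output
    if seen_types[0] != b"IHDR" or b"IDAT" not in seen_types:
        raise ValueError("missing IHDR or IDAT")
    return b"".join(out)
```

As the comment above suggests, something like this would still be best run in a sandbox, and a full decode and re-encode with a maintained image library is a stronger normalization than chunk filtering alone.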

    • Wow, did you piss off someone? That AC seems to have it in for you.

  • fileutils (Score:4, Insightful)

    by i.r.id10t ( 595143 ) on Thursday November 12, 2015 @04:10PM (#50917003)

    Well, if you are running on a Linux or Unix/BSD host, you can use the "file" utility.

    Of course, that means that you need to have shell_exec() or exec() or whatever your programming language of choice uses for running shell commands, and the other security dangers/issues involved with allowing that type of stuff.

    What may be best/easiest/safest would be to NOT allow direct HTTP access to the uploaded files, but rather use a wrapper script that would send appropriate headers to make the browser believe that the file is of the type "x-application/unknown" or whatever content type that will force a "save as" dialog instead of opening with a plugin, auto opening with a local application, etc.
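The wrapper-script idea above might look roughly like this in Python. The function name `download_headers` is made up for illustration, and `application/octet-stream` is swapped in for the "x-application/unknown" type mentioned in the comment. The `X-Content-Type-Options: nosniff` header is an addition not mentioned above; it asks browsers not to guess a different type.

```python
import re


def download_headers(display_name: str, size: int) -> list:
    """Build response headers that make a browser save the file, not open it."""
    # Keep only a conservative character set so the name cannot break out of
    # the quoted filename parameter or inject extra header fields.
    safe = re.sub(r"[^A-Za-z0-9._-]", "_", display_name) or "download"
    return [
        ("Content-Type", "application/octet-stream"),
        ("Content-Disposition", 'attachment; filename="%s"' % safe),
        ("Content-Length", str(size)),
        ("X-Content-Type-Options", "nosniff"),
    ]
```

How these pairs get sent depends on the web framework in use; the sketch only shows which headers the wrapper would set.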

    • by Anonymous Coward

      And what if there's a semicolon or another interesting character in the filename ?
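One way to sidestep the semicolon problem is to never build a shell command string at all. The sketch below (Python; `file_argv` and `detect_mime` are hypothetical names) passes the filename as a single argument-vector element, so a shell never interprets it. It assumes a `file` binary that accepts `--brief` and `--mime-type`, and it does nothing about bugs in `file` itself.

```python
import subprocess


def file_argv(path: str) -> list:
    """Argument vector for file(1); the path stays one element, never shell-parsed."""
    # "--" ends option parsing, so a name that starts with "-" is not read as a flag.
    return ["file", "--brief", "--mime-type", "--", path]


def detect_mime(path: str) -> str:
    """Run file(1) without a shell and return the MIME type it reports."""
    result = subprocess.run(
        file_argv(path),
        capture_output=True,
        text=True,
        check=True,   # raise CalledProcessError on a non-zero exit status
        timeout=10,   # do not let a pathological input hang the request
    )
    return result.stdout.strip()
```

Storing uploads under a server-generated name rather than the client-supplied one avoids the problem from the other direction.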

    • by raymorris ( 2726007 ) on Thursday November 12, 2015 @05:00PM (#50917313) Journal

      This is on the right track, because as others have said, just because it's valid png doesn't mean it's not also valid PHP and Javascript. I just pulled a file like that off a server yesterday.

      HOWEVER, -all- of the "download.php" scripts I've ever looked at have at least two of the same three vulnerabilities: directory traversal (protection is harder than it looks), allow_url_fopen, and memory depletion from failing to disable the output buffer before reading and writing chunks of the file.

      A better, safer, higher performance option is to RemoveHandler PHP and RemoveHandler cgi-script in the designated upload directory, which should be the only directory that's writable.

      This also solves a further problem: since the directory is writable, the designated upload script which checks the files is probably NOT the only mechanism that can put files there. Imperfections in other scripts will allow bad guys to upload any file they want to the world-writable directory*. Therefore, use httpd.conf to ensure that scripts in that directory cannot run.

      * Instead of making it -explicitly- world-writable, you can use SuExec, which effectively makes the ENTIRE SITE world-writable. This is extremely stupid.

      • by mcrbids ( 148650 )

        HOWEVER, -all- of the "download.php" scripts I've ever looked at have at least two of the same three vulnerabilities.

        1) Protection from directory traversal is harder than it looks,

        2) allow_url_fopen, and

        3) memory depletion from failing to disable the output buffer before reading and writing chunks of the file.

        I'm a PHP dev, and the first two are relatively straightforward to prevent. EG: Check that basename($file) == realpath(Basename($file)) kind of stuff. But #3 is interesting to me; how would the following cause any problem?

        $fp = fopen($hugefile, 'r');
        while ($line = fgets($fp, 1024))
            echo $line;

        In this case, the buffered output will be spooled to Apache/end user as it fills. Or did you mean OOM errors from trying to load a 2 GB file into RAM?
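For comparison, here is a Python sketch of items 1 and 3 from the quoted list: confining a requested name to the upload directory, and reading the file in fixed-size pieces so that memory use stays bounded by the chunk size. The names `resolve_upload` and `iter_chunks` are hypothetical, and the sketch says nothing about how PHP's own output buffering behaves.

```python
import os


def resolve_upload(base_dir: str, requested: str) -> str:
    """Resolve a requested name and refuse anything outside base_dir."""
    base = os.path.realpath(base_dir)
    # realpath collapses ".." components and follows symlinks before the check.
    target = os.path.realpath(os.path.join(base, requested))
    if os.path.commonpath([base, target]) != base or not os.path.isfile(target):
        raise ValueError("path escapes the upload directory or is not a file")
    return target


def iter_chunks(path: str, chunk_size: int = 64 * 1024):
    """Yield the file in pieces of at most chunk_size bytes."""
    with open(path, "rb") as handle:
        while True:
            chunk = handle.read(chunk_size)
            if not chunk:
                break
            yield chunk
```

The check happens after resolution on purpose: comparing the raw string would miss `../` sequences and symlinks that point outside the directory.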

        • You need to flush() and ob_flush() after each echo, or PHP will buffer ~ the entire thing in RAM. When a bad guy hits it, he'll have it buffer 100,000 copies in RAM.

          You'll also need to send Content-Length header manually in the PHP, otherwise the header can't be set without buffering the whole file. Compression and encoding can bite you here, so disable compression. Of course you've kinda broken resume, if someone loses their connection halfway through the download. OR ...

          Check out X-Sendfile. That's an all

      • This is on the right track, because as others have said, just because it's valid png doesn't mean it's not also valid PHP and Javascript. I just pulled a file like that off a server yesterday.

        Yeah. Some would probably argue it's overkill; and of course it opens a potential new exploit (if ImageMagick or the GD library or whatever you use has a serious flaw) - but for the really paranoid applications I've worked on, I generate a new image from the old one, using a trusted library. I figure by converting wha

    • by Techmeology ( 1426095 ) on Thursday November 12, 2015 @05:04PM (#50917339) Homepage
      Sadly Unix's 'file' utility is not sufficient for security purposes. Generally, file only checks for magic numbers near the beginning of the file. Many file formats remain valid, even with prepended data. For example, Python programs with several source files can be archived into a single zip file and still be executed, but you can stick a shebang onto the beginning, and still have Python (or most zip programs) recognise the archive as a zip file. There's a good video on youtube about this kind of thing: https://www.youtube.com/watch?... [youtube.com] tl;dr: This is security. It goes wrong in amusing and unobvious ways.
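The prepended-data point can be reproduced with Python's own zipfile module. In this sketch, `looks_like_zip_by_magic` stands in for a naive "check the first bytes" test (it is not a model of what file(1) actually does), and all three function names are made up for illustration.

```python
import io
import zipfile


def build_polyglot(prefix: bytes) -> bytes:
    """Create a small zip archive in memory and put arbitrary bytes in front."""
    buffer = io.BytesIO()
    with zipfile.ZipFile(buffer, "w") as archive:
        archive.writestr("__main__.py", "print('hello')\n")
    return prefix + buffer.getvalue()


def looks_like_zip_by_magic(data: bytes) -> bool:
    """Naive check: local-file-header magic at offset 0."""
    return data.startswith(b"PK\x03\x04")


def opens_as_zip(data: bytes) -> bool:
    """Ask a real zip reader, which locates the directory from the end of the file."""
    try:
        with zipfile.ZipFile(io.BytesIO(data)) as archive:
            return archive.testzip() is None
    except zipfile.BadZipFile:
        return False
```

With a shebang line as the prefix, the offset-0 check says "not a zip" while the zip reader still opens the archive, which is the mismatch described above.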
    • by zarr ( 724629 )
      Even if you manage to invoke file in a safe manner, you probably shouldn't. The file utility isn't immune to security issues either. A quick google found at least 3 different CVEs from 2014 alone. Don't expose stuff that wasn't designed with a hostile Internet in mind, to a hostile Internet. Anyway, if file says it's a png file, it doesn't mean it's a _safe_ png file.

      A paranoid (or sensible, depending on how juicy a target you are) way to handle it is to isolate the thing that verifies the file in s
    • Why would the browser be opening a save as dialog for uploading a file? The question didn't say that the uploaded files were going to be accessible on the website. Many sites don't bother checking images if they are just going to be displaying them on the site.

      I would have the files upload to a designated directory. Once the upload was complete and the user doesn't need to interact with them right away then a message saying that the transfer was successful would be displayed. One of the last jobs of the

      • TFS mentions "double clicking a file" - I took that to mean that someone downloads the file from the server and double clicks it on their local machine, or someone is browsing the directory on the server itself (as their local machine) and opens files...

        • And if someone is browsing the directory on the server, or the files are copied somewhere else for them to do that, then there wouldn't be a save as dialog brought up in the browser. So don't go bringing up TFS with me when you answer your own questioning why there wouldn't be a dialog.

  • by darkain ( 749283 ) on Thursday November 12, 2015 @04:14PM (#50917025) Homepage

    In PHP, simply run something like the following against the file and see if you get a valid result back

    http://php.net/manual/en/funct... [php.net]
    http://php.net/manual/en/funct... [php.net]

    • by Anonymous Coward

      Also

      http://php.net/manual/en/ref.fileinfo.php

  • by mlts ( 1038732 ) on Thursday November 12, 2015 @04:16PM (#50917037)

    The file command does exactly this. Type in "file foo", it will tell you what it is.

    No need to add any additional software to the Linux box.

  • by mi ( 197448 )
    libmagic(3) and file(1). Plus, if you need to tune them, magic(5).
  • by Anonymous Coward on Thursday November 12, 2015 @04:22PM (#50917087)

    test it to see if it is actually the type of file that its file-name extension claims it is.

    There are various ways to make "hybrid" files which are multiple types. Graphics files which are also archives, etc. What you really want to do is normalize the files to the type they're supposed to be. PNGs are a good candidate for this because PNG is lossless, so you can decode the image and re-encode it without losing information.

    • test it to see if it is actually the type of file that its file-name extension claims it is.

      There are various ways to make "hybrid" files which are multiple types. Graphics files which are also archives, etc. What you really want to do is normalize the files to the type they're supposed to be. PNGs are a good candidate for this because PNG is lossless, so you can decode the image and re-encode it without losing information.

      This is exactly what we did on a production site. We wanted to support several different document types but wanted everything uniform, so we use a locked-down version of headless OpenOffice that converts everything (doc, txt, png, spreadsheets, etc.) to PDF format. In our case PDF made the most sense because 99% of the stuff that was supposed to be uploaded was documents, and by using OpenOffice we can automatically support anything that OpenOffice does. You still have to worry about viruses that affe

  • Won't work (Score:4, Insightful)

    by Bogtha ( 906264 ) on Thursday November 12, 2015 @04:33PM (#50917155)

    test it to see if it is actually the type of file that its file-name extension claims it is.

    This won't work because a file can be a valid file in multiple formats at once and it can also be an invalid file that is nevertheless interpreted as a valid file as well.

    Take for example, a plain-text file. Harmless, right? Nope. It can also be a valid HTML file containing executable JavaScript. Or an XML file containing a billion laughs attack.

    Or take media type sniffing. Some browsers bend over backwards to interpret crap as HTML even when labelled otherwise by the Content-Type HTTP header. So one attack is to stuff enough HTML into PNG metadata to confuse a browser that doesn't follow the standards into thinking that it's HTML. This is a valid PNG file and anything that checks to see if it's really a PNG file will tell you that much. But it's still not safe.

    • by Tablizer ( 95088 )

      That's what I was thinking also. One could hide a sinister executable inside an image file, for example. It might look like modern art when projected as an image, but still be a "valid" image from the computer's perspective. The file (or parts of) can be a valid EXE and a valid image at the same time.

      The trick is not to allow "running" a given file in the wrong application on both client and the server. For example, a text file with a script in it is only a text file with a script in it if one views it in a

  • If I recall correctly, you have the file in memory before you save it to disk. Check if the first bytes are 0x89504E470D0A1A0A [wikipedia.org] and it should be "close enough". I'm not sure if anyone has compiled a list of headers and file extensions, but it seems a little overkill.
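A minimal sketch of that in-memory check in Python, assuming the upload is already available as a bytes object. The eight-byte PNG signature is 89 50 4E 47 0D 0A 1A 0A in hex, or 137 80 78 71 13 10 26 10 in decimal. The function name `has_png_signature` is made up for illustration.

```python
# The PNG file signature, written as decimal byte values.
PNG_SIGNATURE = bytes([137, 80, 78, 71, 13, 10, 26, 10])


def has_png_signature(data: bytes) -> bool:
    """True if the buffer starts with the eight-byte PNG signature."""
    return data[:8] == PNG_SIGNATURE
```

As other comments in this thread point out, passing this test only shows that the file starts like a PNG, not that the rest of it is well formed or harmless.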

  • Don't accept foreign input and put it out as your own (on your web page). It's just a disaster waiting to happen. Misconfigurations or bugs could happen at any point.

    What you do is take the input and verify that it's the input you're expecting. Don't just check that it's a PDF or PNG file: accept only PDF/PNG, then parse it and rewrite it in a way that takes out any and all foreign input. If you're expecting text, parse only text; if images, parse only images; and parse anything within a jail with limite

  • look out for sql injection as well.

  • Try a reverse proxy with a malware scanning component.

    Or subscribe to the premium service for Virus Total and use the API to check all uploads to your server.

  • Zip of death [wikipedia.org]

    Is it a zip file? Yes
    Is it dangerous? Yes

    So how do you test for this without opening the file in a virtual environment and seeing what happens?

    I have a feeling that testing for malicious files is akin to solving the halting problem
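For the specific zip-bomb case, one partial answer that avoids unpacking anything is to read the sizes declared in the archive's directory and refuse archives that claim to expand too far. The Python sketch below uses arbitrary illustrative limits and a hypothetical function name. Declared sizes can be forged and nested archives are not inspected, so a real extractor would still need to cap the bytes it actually writes.

```python
import io
import zipfile


def zip_within_limits(data: bytes,
                      max_total: int = 50 * 1024 * 1024,
                      max_ratio: float = 100.0) -> bool:
    """Check declared sizes in a zip directory without extracting members."""
    with zipfile.ZipFile(io.BytesIO(data)) as archive:
        total = 0
        for info in archive.infolist():
            total += info.file_size  # declared uncompressed size
            # Expansion ratio of this member; guard against division by zero.
            ratio = info.file_size / max(info.compress_size, 1)
            if total > max_total or ratio > max_ratio:
                return False
    return True
```

This does not settle the general question raised above; it only rejects one well-known pattern cheaply.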

  • There's no way to determine what type a file really is. File types are designated in the Windows world by extensions (the .jpg in bigdick.jpg), but applications and other OSes use actual file information (typically the first few / few dozen bytes) of the file to determine what to do with it.

    This typically involves some specific byte sequence, or "magic number", which alerts the OS/program to start trying to read a particular type of header, or tells it the file is big/little endian.

    However, ANY file can co

  • When you're talking about PNG, if you're looking to avoid malicious files, you can just check the headers.
    It's always the following decimal values:
    137 80 78 71 13 10 26 10

    Things get more tricky when you're talking about an exploitable file type, in which additional validation is required, but for most purposes, if the file being broken won't ruin the application, this is fine.
    • In addition to the above method, I simply ignore the original filename (and save it somewhere) and rename the file to a random UUID+the auto detected extension (for images you only need a couple of headers, for example).
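The rename step might be sketched like this in Python. The signature table is a small illustrative subset (PNG, JPEG, GIF), and `stored_name_for` is a hypothetical name. The original filename would be recorded separately, as the comment says.

```python
import uuid

# Leading bytes for a few image formats, mapped to the extension to store under.
SIGNATURES = {
    b"\x89PNG\r\n\x1a\n": ".png",
    b"\xff\xd8\xff": ".jpg",
    b"GIF87a": ".gif",
    b"GIF89a": ".gif",
}


def stored_name_for(data: bytes):
    """Return a random name with the detected extension, or None if unrecognized."""
    for magic, extension in SIGNATURES.items():
        if data.startswith(magic):
            # The client-supplied name is never used for the stored file.
            return uuid.uuid4().hex + extension
    return None
```

Because the stored name never contains client input, tricks such as `picture.png.php` or shell metacharacters in the filename have nothing to act on.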
  • For image files just convert it to another format at the highest possible resolution and then back again. Maybe an executable could survive that, but I have yet to see one get through (and yes, I've tried it with some infected and/or bogus files).

    And yes, I fully admit that it's a sleazy trick but it seems to work pretty well.

    For other file types, I dunno.

  • There are several layers here that make a solution quite "interesting". On the one hand you are trying to protect your users by avoiding serving them bad content. On the other hand you want to protect your service. Protecting your users means doing more work on the uploaded content which increases your own attack surface.

    Personally if we are just talking about PNGs then I think that one of the safest things for your clients/customers would be to not serve the file as uploaded, but to serve a file that is
  • Read up on the efforts some router and modem brands go to in order to protect their firmware updates over the life of a product line.
    Signed checksum, private key, verified public key systems.
