How Do You Know Your Code is Secure?
bvc writes "Marcus Ranum notes that 'It's really hard to tell the difference between a program that works and one that just appears to work.' He explains that he just recently found a buffer overflow in Firewall Toolkit (FWTK), code that he wrote back in 1994. How do you go about making sure your code is secure? Especially if you have to write in a language like C or C++?"
Same way you hunt bugs (Score:5, Informative)
0) Don't "roll your own" security unless absolutely necessary. Find someone else's implementations and work with those.
1) Design the code for security, code to that design. I've seen a lot of security bugs creep into code because it was never designed to be secure.
2) Use static code checkers--such as Splint [splint.org] for C/C++ and FindBugs [sourceforge.net] for Java--that look for security vulnerabilities.
3) Peer reviews/code audits. Sit down with your code (and have others who know how to look for security vulnerabilities sit down with your code) and do a full review.
Nothing is foolproof, but every little bit helps. It should be noted that all of the above also improve the overall quality of the code and reduce the overall number of bugs: finding existing implementations of features can reduce maintenance and reduce bugs; designing the code and putting it through a proper design review can catch a lot of logic problems and ensure that the code fits the requirements list--I've seen a huge number of synchronization bugs in Java simply because the author didn't know how to use synchronization properly; static code checkers find a lot more than just security bugs; and peer reviews/code audits can help isolate a variety of problems.
Valgrind (Score:5, Informative)
However, security is a lot more than buffer overflows... but at least it brings you up to the relative security of Java, with speed to boot.
Re:The answer is simple - you never know (Score:3, Informative)
Re:Avoid direct memory access (Score:5, Informative)
There are a couple of solutions to this problem:
1.) Pass character arrays at the interfaces between your components and immediately put those character arrays under the control of your library once they come in.
2.) Write or find your own string library and pass that string class between program components. Be careful when doing this. Mistakes will come back to byte you.
All of it's kind of nasty. It'd be nice if C++ could standardize their binary representation, even if it's only a standard valid per platform.
Then there's also:
3.) Choose a language which unlike C++ already has a standardized binary representation for strings, or a system global interpreter for a varying binary representation. This is just an extension of the "higher-level library which does the memory management for you" option really.
Don't get me wrong -- I'm agreeing with the parent post. I'm just adding a caveat.
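Option 1 above might look something like the sketch below. The function name count_vowels and the pointer-plus-length signature are made up for illustration; the point is only that plain C types cross the component boundary and the data is copied into a library-managed string straight away.

```cpp
#include <cassert>
#include <cstddef>
#include <string>

// Hypothetical component boundary: only plain C types cross it, so caller
// and callee never have to agree on the binary layout of std::string.
extern "C" std::size_t count_vowels(const char* text, std::size_t length) {
    if (text == nullptr) return 0;

    // Copy into a std::string owned by this component right away. From here
    // on the library manages the memory and always knows the length.
    const std::string owned(text, length);

    static const std::string kVowels = "aeiouAEIOU";
    std::size_t vowels = 0;
    for (char c : owned) {
        if (kVowels.find(c) != std::string::npos) ++vowels;
    }
    return vowels;
}
```

The caller passes an explicit length, so the callee never has to trust that the array is NUL-terminated.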
Re:I don't. (Score:5, Informative)
Um, how's that?
Your poor grammar has a chilling effect on me. If I were you, I'd find a way to effect an improvement in your knowledge. Luckily it affects me only a little. But the fact that so few seem to understand that these two words are both verb and noun leaves me of sad affect.
String overflows (Score:3, Informative)
For example, I recently fixed a bug in Blob And Conquer to do with strings. The code was something like this:
char nm[2];
nm[0] = mission[11];
nm[1] = mission[12];
The code then went on to do a
missionNum = atoi(nm);
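Most of the time, this'd work OK because atoi stops at the first non-digit character, and the byte after nm usually isn't a digit. Other times, though, since nm has no terminating NUL, it'd stray off into other memory, pick up stray digits, and return a number of three or more digits instead.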
Obviously there's an easy way to fix it.
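One way to write the fix, as a sketch: make the buffer one byte bigger and terminate it before calling atoi. The helper name mission_number and the length check are my additions; the offsets 11 and 12 come from the snippet above.

```cpp
#include <cassert>
#include <cstdlib>
#include <cstring>

// Hypothetical helper: parse the two-digit mission number stored at
// offsets 11 and 12 of the mission string.
int mission_number(const char* mission) {
    // Refuse strings too short to contain both digits.
    if (mission == nullptr || std::strlen(mission) < 13) return -1;

    char nm[3];            // two digits plus the terminating NUL
    nm[0] = mission[11];
    nm[1] = mission[12];
    nm[2] = '\0';          // atoi now stops here instead of reading on
    return std::atoi(nm);
}
```

The length check matters as much as the terminator: without it, a short string moves the out-of-bounds read from atoi to the two array accesses.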
Re:Don't use C++ as if it was only "C with classes (Score:4, Informative)
Yes, sure, if you use STL, you need not worry about getting the buffer size wrong. And that's about it - container indexing is not bounds-checked (unless you use at() instead of operator[] - and that's about the only instance of a run-time safety check I remember seeing in STL!), iterators can go outside their container without notice, or can suddenly become invalid depending on what their container is and what was done to it. Even leaving library issues aside, there are some nasty things about the language itself - it's just way too easy to get an uninitialized variable or class member, or to mess up the order of field initializers in a constructor.
This is not to say that C++ is not a good language. All of the above are features in the sense that they are there for a reason - but they certainly don't make writing secure software easier.
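To make the at() versus operator[] point concrete, here is a small sketch. The helper name readable_with_at is invented for the example; std::vector::at() throwing std::out_of_range is standard behaviour.

```cpp
#include <cassert>
#include <cstddef>
#include <stdexcept>
#include <vector>

// Hypothetical helper: returns true if index i can be read through at().
// Note that v[i] with i >= v.size() would be undefined behaviour rather
// than an exception, so it cannot be probed safely like this.
bool readable_with_at(const std::vector<int>& v, std::size_t i) {
    try {
        (void)v.at(i);      // bounds-checked: throws std::out_of_range
        return true;
    } catch (const std::out_of_range&) {
        return false;
    }
}
```

With operator[] the same out-of-range index compiles just as happily and simply reads whatever is past the end.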
Re:The only sure way I know of: Lambda calculus (Score:2, Informative)
Assume you have an algorithm (however complex) that can determine if a program in some Turing-complete language is secure; call it IsSecure(). IsSecure() is provably secure, because you've run it on itself.
Now, write a program that has a security hole if and only if IsSecure() returns true:
#Program A
start(input)
{
if(IsSecure(input))
ExposeSecurityHoleInSelf()
else
#The hole must be in the function IsSecure(), which is silly, because you've used IsSecure() to secure IsSecure()
}
Call program A passing itself as input.
Q.E.D.
Re:Easy (Score:3, Informative)
It certainly won't allow you to execute arbitrary code in, for example, a Java application -- in fact, you'd have to find a bug in the JVM itself or in one of the native implementations of basic classes like String to have any chance of that. That is, however, highly unlikely given the amount of use these core parts of Java see.
There are no good open source tools, unfortunately (Score:3, Informative)
Re:What's the matter with C/C++? (Score:3, Informative)
Which is irrelevant. That code can be thoroughly tested and safe, even with the fundamental issues of C++. What matters is your code. You probably won't get the chance to test that code thousands or millions of times the way the compiler/library or interpreter has been.
It's not that C/C++ is so insecure by itself, the problem is that programmers may not have used the best programming practices. There are plenty of libraries for handling strings and memory allocation in C, in C++ there are string and storage classes that do as much or as little checking as you need.
C/C++ IS insecure by itself, because of what it allows you to do. No programmer is perfect, and we all make mistakes. Driving without a safety belt is fundamentally less safe, and you can't argue it away by talking about 'people not driving skillfully enough'.
When you are an expert programmer there are places where you need more efficiency than the super-safe string routines can give you. It's the job of the expert to determine exactly how to balance efficiency against security, and only C/C++ can give you this balance.
I would be very surprised to know exactly when you think this is the case. If it is, it is for the most specialised circumstances. The problem with C/C++ is you have this division between safe (with checks) or fast. Other languages get around this problem, by including safety, but allowing the safety checks to be optimised away by code analysis, often at run time. For example, you could have code something like this:
int[] array = new int[4];
for (int i = 0; i < 4; i++) array[i] = i;
The compiler/runtime can analyse this code, and because it is obvious that the bounds of 'array' are never exceeded here, remove any checking and optimise hugely.
It is a myth that you need to balance efficiency against security the C/C++ way.
Re:The only sure way I know of: Lambda calculus (Score:1, Informative)
If you know lambda calculus then you also know the Halting Problem. There is an entire set of exploits based upon it. They are real ones; they don't generally lead to data compromise, but they negatively impact performance and hide other things. Snort, for example, allows regular expressions to be used in signatures, and there are pathological datasets that cause Snort to spend tens of thousands of times longer processing regexes than initially expected. There are signatures with datasets that can cause a modern machine to spend minutes processing regexes while real hacker data passes through unseen. It's a classic Halting Problem example.
FP is about algorithm correctness.
Another problem is that programs are attacked at their touch points to the world: users and other programs. FP nicely ignores those problems as side effects and doesn't have a clearly defined lambda calculus for dealing with them.
I definitely think FP solves some set of problems and should be used more, but it won't make anything more secure any time soon.
Re:Don't use C++ as if it was only "C with classes (Score:5, Informative)
Those bugs aren't harder to track down than "old-style" bugs, in fact I think they're vastly easier to track down than, say, a wild pointer. The difference is that you're less experienced at dealing with the new problems, so they seem harder to you. With time and practice, you'll see through copy/reference errors quickly. In the meantime, a little discipline can cover your lack of experience -- never store raw pointers in collections, always "objects". If you don't want to create copies, then store objects of a smart pointer class. In fact, avoid ever using raw pointers at all. *Always* assign the result of a 'new' operation to a smart pointer (auto_ptr works for a surprisingly large set of cases, but you may have to get a reference counted pointer type or similar for others -- the BOOST library has some good options if you haven't already rolled your own).
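A minimal sketch of the "never store raw pointers in collections" rule. It uses std::unique_ptr rather than auto_ptr, since auto_ptr was deprecated in C++11 and removed in C++17 (and was never safe to put in containers). The Widget type and its live counter are invented purely to make ownership observable.

```cpp
#include <cassert>
#include <memory>
#include <vector>

// Hypothetical type that counts live instances so ownership can be observed.
struct Widget {
    static int live;
    Widget() { ++live; }
    ~Widget() { --live; }
    Widget(const Widget&) = delete;
    Widget& operator=(const Widget&) = delete;
};
int Widget::live = 0;

// Fill a collection with owning smart pointers and report how many Widgets
// were alive while the collection existed.
int peak_live_widgets(int n) {
    std::vector<std::unique_ptr<Widget>> widgets;
    for (int i = 0; i < n; ++i) {
        // The result of 'new' goes straight into a smart pointer.
        widgets.push_back(std::unique_ptr<Widget>(new Widget()));
    }
    return Widget::live;   // the vector's destructor frees every Widget
}
```

There is no delete anywhere: when the vector goes out of scope, each unique_ptr destroys its Widget. For shared ownership, std::shared_ptr plays the role of the reference-counted pointer mentioned above.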
If you really run into different behavior with different compilers, then at least one of the compilers is buggy. That does happen, but it's a lot rarer today than it was a few years ago. When you find that situation, wrap the tricky bit behind another abstraction layer and implement compiler-specific workarounds so that your application code can just use the abstraction and get consistent behavior. In most cases, someone else has already done this work for you. Again, look into BOOST.
Re:I don't. (Score:1, Informative)
Perhaps your constant exposure to the poor grammar here has affected your ability to use it properly - I hear it's a common effect amongst Slashdot readers and posters.
Oh, and as a few other people have stated, they can both be used either way, but the common usage is exactly the opposite of what you've stated.
HTH, HAND.
Re:Verified (Score:2, Informative)
On the other hand, MS has some of the best code analysis technology available in Prefast, FXCop, SAL, and Application Verifier:
http://msdn.microsoft.com/msdnmag/issues/05/11/SD
Disclaimer from the linked content:
"Security tools will not make your software secure. They will help, but tools alone do not make code resilient to attack. There is simply no replacement for having a knowledgeable work force that will use the tools to enforce policy."
Re:The only sure way I know of: Lambda calculus (Score:3, Informative)
Congratulations, you have won today's "Ignorant undergraduate misunderstanding of the Halting problem" prize.
You're wrong on every significant point. You can write a program in a Turing-complete language to determine if another program in a Turing-complete language is 100% secure. The way that you specified implementing such a program is flawed. The Halting problem says that there exist certain ways to specify a problem which admit no solution, even though a solution to the problem exists. It does not say that there exist no alternative ways to specify the problem which do admit solutions. People always seem to think that the purpose of the Halting proof was to demonstrate that the real-world problem couldn't be solved - this is wrong. The purpose of the proof was merely to demonstrate that there exist certain non-trivial, interesting mathematical problem specifications which don't have solutions. This has interesting results in computability theory. It has very little relevance to the question of what sort of software we can write. It's all about how you reduce the real-world problem into a mathematical problem specification.
IsSecure returns false when passed this program as input. It doesn't matter that you think the answer is silly. This program is not secure because there exists a call to ExposeSecurityHoleInSelf in it and IsSecure failed to prove that this call was unreachable, or just didn't give a damn that it was unreachable. That is defined as an insecure program for the purpose of the IsSecure function. By specifying the problem in this way, we admit the possibility of a solution, and the Halting problem is avoided.
In most cases, the Halting problem can be avoided in this manner. Nothing compels you to define your program as having no false positives.
For the purposes of automated security validation, false positives are not a serious problem - we can easily write the program in a manner that can be proven secure by a given prover. We don't have to accept arbitrary programs as input.
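As a deliberately crude illustration of a checker that is allowed false positives, consider the toy below. It is nothing like a real prover; the function name and the text-matching approach are assumptions for the example only. It rejects any program text that mentions the forbidden call, reachable or not, so it always terminates and never accepts Program A.

```cpp
#include <cassert>
#include <string>

// Toy conservative checker: a program is accepted only if the forbidden
// call never appears anywhere in its text. Dead code is rejected too,
// which is exactly the kind of false positive the argument permits.
bool is_secure(const std::string& source) {
    return source.find("ExposeSecurityHoleInSelf") == std::string::npos;
}
```

The checker never has to decide whether the call is reachable, which is where the Halting problem would otherwise come in.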
In practice, we don't do it like this. The function we use in the real world is is_proof_of_security_valid(), and it takes two inputs - a program and a proof of the program's security. The function checks that the proof is valid for this program. The proof itself is generated semi-automatically, but some parts are supplied by humans - typically via markup in the program's source (lint tags are a classic example of this sort of thing). It's much easier to write the thing this way.
Re:Don't use C++ as if it was only "C with classes (Score:4, Informative)
There's something attractive about the Java and C# languages having all constructs so well-defined. But both of those languages could afford not to support real hardware. Both target abstract machines and are happy with the results. C++ can afford no such conceit: it thrives in high-performance, customized, and otherwise exotic environments.
Re:Don't use C++ as if it was only "C with classes (Score:3, Informative)
Re:Easy (Score:3, Informative)
#define BUFSZ 1024
char buf[BUFSZ];
printf("Enter something: ");
fgets(buf, BUFSZ, stdin);
strip_newline(buf, BUFSZ);
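strip_newline() is not a standard function, so the snippet above presumably relies on a helper along these lines. This is a guess at its behaviour, not the poster's actual code; read_line is an extra hypothetical wrapper that also checks the return value of fgets.

```cpp
#include <cassert>
#include <climits>
#include <cstddef>
#include <cstdio>
#include <cstring>

// Hypothetical implementation of strip_newline(): cut the string at the
// first '\n', looking no further than the terminating NUL or `size` bytes.
void strip_newline(char* buf, std::size_t size) {
    if (buf == nullptr) return;
    for (std::size_t i = 0; i < size && buf[i] != '\0'; ++i) {
        if (buf[i] == '\n') {
            buf[i] = '\0';
            return;
        }
    }
}

// Hypothetical wrapper: read one line into buf, returning false on EOF,
// on a read error, or if size does not fit the int that fgets expects.
bool read_line(std::FILE* in, char* buf, std::size_t size) {
    if (buf == nullptr || size == 0 || size > INT_MAX) return false;
    if (std::fgets(buf, static_cast<int>(size), in) == nullptr) return false;
    strip_newline(buf, size);
    return true;
}
```

fgets writes at most size - 1 characters plus a NUL, so the buffer cannot overflow; an overlong line is simply left in the stream for the next read.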
Re:You don't (Score:2, Informative)
Re:The only sure way I know of: Lambda calculus (Score:3, Informative)
This simply isn't the case - there are lots of programs for which you can easily prove termination. The catch with the Halting problem is that you cannot find a procedure that will work for all programs. In other words you may find yourself in a situation where you cannot prove termination for certain programs; that does not mean, however, that you can't prove termination for others, nor that such proofs are invalid. Trying to prove termination is far from futile - the Halting problem will at worst tell you that you might not be able to do it, but if you can (and often enough you can indeed) then the proof is perfectly valid.
Again, this isn't quite true. Certainly, for some problems, an accurate specification will be equivalent in complexity to an implementation. On the other hand, there are a great many problems where that isn't the case. Take a specification for finding a square root (to within a specified error tolerance epsilon): given an input number x, the function must produce a value y such that abs(x - y*y) < epsilon. That's a complete specification (and it isn't hard to formalise that into a suitable specification language) but you certainly can't compile and run it and get anything useful - the actual implementation of how to find the square root is going to be more detailed, and quite important.
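The gap between that specification and an implementation can be seen in a few lines. The sketch below writes the spec as a predicate and pairs it with one possible implementation (Newton's method, my choice; the post names no algorithm). It assumes x >= 0 and an epsilon that is not smaller than what double precision can deliver.

```cpp
#include <cassert>
#include <cmath>

// The specification: y is an acceptable result iff |x - y*y| < epsilon.
bool meets_sqrt_spec(double x, double y, double epsilon) {
    return std::fabs(x - y * y) < epsilon;
}

// One possible implementation (Newton's method). Assumes x >= 0 and
// epsilon > 0. The iteration cap is a safety net: if epsilon is below
// floating-point resolution the result may not satisfy the spec.
double newton_sqrt(double x, double epsilon) {
    double y = x > 1.0 ? x : 1.0;          // starting guess >= sqrt(x)
    for (int i = 0; i < 100 && !meets_sqrt_spec(x, y, epsilon); ++i) {
        y = 0.5 * (y + x / y);             // Newton step for f(y) = y*y - x
    }
    return y;
}
```

The predicate is a one-liner that says nothing about how to find y; the implementation needs a starting guess, an update rule, and a termination argument.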
Similarly we can specify a sort function: given a list of items (comparable by '<') the function must return a list that is a permutation of the input list (that is, they contain the same elements), and such that for each list index i (except the index of the last element) result[i+1] < result[i] does not hold, so that equal elements are allowed. Again, that's a complete specification for a sort function - it ensures that the function does indeed sort the list. On the other hand it is not compilable (except, maybe, into bogosort), and any particular sort implementation will have to use a specific sorting algorithm (be it quicksort, mergesort, or otherwise) which will be undoubtedly more complex than the simple specification given.
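The sort specification translates almost word for word into a checker. This is a sketch over std::vector<int> (the element type and the name meets_sort_spec are assumptions); it uses non-strict ordering so that lists with duplicate elements can satisfy it.

```cpp
#include <algorithm>
#include <cassert>
#include <vector>

// The specification: result is a permutation of input and is in
// non-decreasing order. It says nothing about how to produce result.
bool meets_sort_spec(const std::vector<int>& input,
                     const std::vector<int>& result) {
    return input.size() == result.size()
        && std::is_permutation(input.begin(), input.end(), result.begin())
        && std::is_sorted(result.begin(), result.end());
}
```

Any sorting algorithm can be tested against this predicate, but the predicate itself gives no hint whether to use quicksort, mergesort, or anything else.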
Re:Don't use C++ as if it was only "C with classes (Score:2, Informative)
C# might be appropriate for your domain but it certainly isn't in Ada's - safety critical or mission critical systems.
It's also easy to learn as can be seen here http://www.stsc.hill.af.mil/crosstalk/2000/08/mcc