Static Code Analysis Tools?
rewt66 asks: "We are looking for a good static analysis tool for a fairly large (half a million lines) C/C++ project. What tools do you recommend? What do you recommend avoiding? What experience (good or bad) have you had with such tools?"
Re:Ideas (Score:4, Insightful)
That's great and all, but some things just take a lot of code. Refactoring into libraries only goes so far: you're still going to have a ton of code, it'll just be split up into libraries. That's useful, and it's good advice, but since the poster didn't ask about it, you could at least give him the benefit of the doubt and assume the project is already organized appropriately. Half a million lines isn't that big, certainly not big enough to automatically assume their codebase is organized badly.
Re: (Score:2)
For example, I worked on DB2 for a while. I routinely saw 3000 line files that implement such complicated things as hash lists. Then there was another 2000 line file that performs modular reduction in a dozen different ways because
Re: (Score:3, Insightful)
I agree that much code is far longer than it needs to be, but I don't think it's fair to equate this with large projects.
IME, large projects (over a million lines, say) often get that way because they have been built around some sort of framework, and the boilerplate code pushes the line count up. When you get past a certain scale -- more than a handful of developers, or with the team split across multiple geographic locations, that sort of thing -- such frameworks can be very valuable in retaining a sane
Re: (Score:2)
Whoever wrote that code obviously failed "problem statement" 101. Worse yet, the code had bugs in it and wasn'
Re: (Score:2)
Part of IBM's problem is turnaround. Many of the developers are new to DB2 and fresh out of uni. The hash template I saw was a prime example of "I found this in a textbook somewhere." It was complete overkill since it's only used to hash an array of bytes (why a template?) and the Montgomery reduction used to perform the bucketing is not needed since the hash is invoked only upon startup/shutdown.
I have to stop you there. Turnaround on DB2 developers, at least in my area, is almost zero. Most of the developers around me have 5 or more years' experience, some having been with the project for 20-plus years.
Now we do hire a fair number of IIP students each year for 16-month sessions - maybe you were surrounded by students.
In my experience, DB2 concentrates on functionality, stability and performance. Code-size is tackled when it impacts one of those areas and is otherwise unimportant.
Cheers
Re: (Score:2)
I don't doubt OO is shitty -- I wouldn't poke it with a stick. But one important thing to realize is that smart people end up writing shitty programs all the time.
For example, I once tested an API that was obviously designed and written by utter morons. Yet each time I had to talk to one of the programmers, or their manager, I was pleasantly surprised. They were smart, committed, had the
Re: (Score:2)
The key here is that once some piece of (relatively) independent code is in a library, you can make a test suite for it.
After any change is committed to the library, run the local tests and see whether it still works.
This approach works wonders for the reusability and maintainability of code.
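A minimal sketch of what such a library-plus-test pairing might look like. The routine `fnv1a32` and its test are hypothetical stand-ins (nothing here comes from DB2 or any other project mentioned in this thread); the only point is that an independent piece of code gets its own small, fast check that can be run after every change.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>

// Hypothetical library routine: 32-bit FNV-1a hash over a byte buffer.
std::uint32_t fnv1a32(const unsigned char* data, std::size_t len) {
    std::uint32_t h = 2166136261u;      // FNV-1a offset basis
    for (std::size_t i = 0; i < len; ++i) {
        h ^= data[i];
        h *= 16777619u;                 // FNV prime, wraps modulo 2^32
    }
    return h;
}

// Local regression test, meant to be run after every change to the library.
void test_fnv1a32() {
    // An empty input must return the offset basis unchanged.
    assert(fnv1a32(nullptr, 0) == 0x811c9dc5u);

    // Published FNV-1a test vector for the single byte 'a'.
    const unsigned char a[] = {'a'};
    assert(fnv1a32(a, sizeof a) == 0xe40c292cu);
}
```

Calling `test_fnv1a32()` from a tiny test driver is enough to catch a regression in this one routine without building the rest of the project.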
Re: (Score:2)
Look at something like my LibTomCrypt. It covers a wide range of cryptographic algorithms, it's only ~48K lines of code, quite a bit of which are tables for the ciphers/hashes. There are also plenty of comments, etc. Of actual code there is probably only ~30K or so.
And in that 30K I do symmetric ciphers, hashes, PRNGs, MACs, RSA (with PKCS #1), ECC (DSA/DH), DSA (DSS) and a decent subset of ASN.1.
Would it be more impressive if I did all that in 100K line
Re: (Score:2)
That's, I think, part of the problem: people think they have to have all of the source in one build to make a project.
A hello world program execution is the result of a kernel, shell, standard C library, etc... none of which you count as lines of code in the program.
Tom
Re: (Score:1, Insightful)
I've seen projects in the 1.5Msloc range, but they were broken down into 1500+ different modules to make them manageable. It was all homegrown because there were no free or commercial alternatives to any of the
Re: (Score:2)
How DO you define a project?
Perhaps it's already split into numerous sub-projects with even more sub-sub-projects.
I've seen a project where large quantities of source code were automatically recompiled with a new compiler. That single project easily had several million lines of code
Re: (Score:2)
Would you consider a Fedora Core installation a single project? No, it's the amalgamation of hundreds of independent OSS projects.
No one DLL or application should be 500k lines of code. If it is, it's either a lot of tables, or shitty code that finds new and inventive ways of doing things you don't need.
Tom
Re: (Score:2)
That's a very bold statement! Is there some reason for adopting that particular magic number?
Re: (Score:2)
Frankly, I'd be disappointed if any one part was larger than 100K lines of code.
Tom
It All Depends on Context (Score:2)
Re: (Score:2)
What would it be doing that it can't refactor the code into manageable and verifiable libraries?
Tom
Re: (Score:3, Insightful)
Is SAP a single project, and are all those individual parts considered projects too? Perhaps a single
You seem to be missing the point that there is no clear definition or scale for a project, at least not in the world outside of yours where every single compiled module seems to be a "project".
In real life, a project may be anything from rebuilding an entire set of applications to fixing a typo in a batch file.
Re: (Score:3, Insightful)
Re: (Score:2)
If you look at things like the kernel or GCC they're already split up into mini libraries inside the host project. So yeah, all of GCC may be several million lines of code (I don't know the exact numbers) but it's not just
Re: (Score:2)
Anyways, point being, you shouldn't have 500K lines in any single part of a project. It makes testing and verification impossible
Re: (Score:2, Informative)
C/C++ checkers:
http://www.coverity.com/ [coverity.com] (commercial)
http://www.dwheeler.com/flawfinder/ [dwheeler.com] (OSS)
Re: (Score:1)
Disclaimer: I have never used this tool and actually know relatively little about it. However, my current research uses other software the same company makes (CodeSurfer) and is very much tied to this company, and I have an internship with them this summer. The company was started by my adviser and his adviser, employs a couple former advisees of my adviser, etc.
Prodev Workshop (Score:4, Informative)
http://www.sgi.com/products/software/irix/tools/p
Looks like it's IRIX only though, so YMMV, to put it mildly.
LLVM (Score:2, Informative)
http://llvm.org/ [llvm.org]
PreFast (Score:2, Informative)
Just add the
It's the tool that is used by MS to test its own code, known internally as PreFast.
It helped me find many bugs in other people's code.
Re: (Score:1)
With the latest versions of the Windows SDK,
Static analysis tool? (Score:5, Funny)
Re: (Score:3, Insightful)
That may be part of the problem. Cheap junior programmers from India doing cut'n paste coding.
Re: (Score:1)
Re:Static analysis tool? (Score:5, Funny)
"Sorry, I needed it somewhere else."
Copy and paste coding is much better.
Coverity (Score:5, Informative)
They have excellent checks as well as the best framework for creating custom tests that I have ever come across.
NOTE: I am not affiliated with Coverity, just a very satisfied user.
Re: (Score:2)
Re: (Score:2)
We used it once on a large set of code from a company we acquired. Since none of us were very familiar with the code, and the code had a lot of stability problems, the thought was that it might help us find some of the more elusive bugs and improve the stability of the software.
Coverity did find a lot of "problems".
Re: (Score:2)
Re: (Score:2)
FlexeLint / PC-lint (Score:5, Informative)
http://www.gimpel.com/html/lintinfo.htm/ [gimpel.com]
I've never tried it on a code base as large as 500k lines. My guess is that I used it on up to 15k. I was very pleased with it. I agreed with just about every warning it raised, and was able to easily suppress individual instances or whole classes of errors. I also found it somewhat easier to get started with compared to the big tools from Rational et al.
I think it's a bit pricey for an open-source coder like me, but it should be cheap enough for a company with a tools budget.
Re: (Score:1)
Re: (Score:1)
I've been using PC-Lint for over 10 years now. I think it's made me a better programmer.
I love PC-Lint, but I really do wish its handling of C++ was better. It was really rough at first, generating all kinds of false errors on even the most harmless-looking template code. It's better now, but it still has a lot o
Re: (Score:3, Informative)
A few points, though:
- It is purely text-based, so if you are looking for a shiny GUI-based tool (easier to sell to the PHB), you are out of luck.
- Depending on the quality of your code, running it for the first time can result in a huge (make that HUGE) number of warnings. You might want to start small and only turn on more and more options later. Initially, you will have to invest quite a bit of time to get your code "lint-clean". In the long run, thi
Re: (Score:2)
I'd agree with the recommendation, and FWIW I work on a project with over 1,000,000 lines of C++ code.
I also agree with the warnings from others about Lint being a bit verbose until you shut off a few stylistic things you might not care about, which fortunately is easy to do.
I also also agree with the caveat about false positives with non-trivial C++ code: sometimes it just plain misunderstands and gives incorrect warning/error messages. It's been improving steadily in recent versions, though, and the v
Re: (Score:1)
I only wish the Linux version was as cheap as the Windows one, so I could afford to buy a copy.
Re: (Score:1)
All the statistics I ever use (Score:4, Funny)
For finding duplicated code... (Score:1)
Splint (Score:2)
http://www.splint.org/ [splint.org]
END OF LINE
Careful what you wish for (Score:2, Informative)
C and C++ Static Analysis tools (Score:3, Informative)
I work on a C/C++ code base that is a lot bigger than 500k lines. I've worked with results produced by Klocwork [klocwork.com] and also with the output from Reasoning [reasoning.com]. Both of these services/packages will cost you money but both provide good insight into your code. The commercial packages generally produce more focused results with fewer false positives, so while they cost you money up front, your developers will spend less time weeding out the noise.
If paying money out for a commercial package isn't your thing, don't overlook the old standby lint or splint [splint.org], an updated successor.
Also well worth investigating, to see how your code is actually running, is Valgrind and its associated tools [valgrind.org]. The Valgrind toolkit will give you a good idea where memory is being leaked and where variables and pointers are going off the rails. Valgrind hooks into a running program, so it's important to make sure that you test all the corners of the codebase if you go this route.
Cheers,
Toby Haynes
Re: (Score:2)
One minor clarification: valgrind can't attach to an already-running program the way a debugger can. Valgrind is actually an x86 emulator, so you have to ask valgrind to execute your program from the very beginning.
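To make the "test all the corners" point concrete, here is a small sketch of the kind of leak a run-time checker only reports if the offending path is actually executed. The function names are hypothetical and made up for this illustration. With Valgrind you would typically start the program under it, e.g. `valgrind --leak-check=full ./your_program`, and the leak shows up only when some test drives the early-return branch.

```cpp
#include <cstddef>
#include <cstring>
#include <vector>

// Hypothetical helper: copies src into a scratch buffer of `limit` bytes.
// Returns the number of bytes copied (including the terminator),
// or 0 if src does not fit.
std::size_t copy_leaky(const char* src, std::size_t limit) {
    char* buf = new char[limit];
    const std::size_t n = std::strlen(src) + 1;
    if (n > limit) {
        return 0;               // BUG: buf is never freed on this path
    }
    std::memcpy(buf, src, n);
    delete[] buf;
    return n;
}

// Same logic with the buffer owned by std::vector, so every return
// path releases the memory automatically.
std::size_t copy_fixed(const char* src, std::size_t limit) {
    std::vector<char> buf(limit);
    const std::size_t n = std::strlen(src) + 1;
    if (n > limit) {
        return 0;
    }
    std::memcpy(buf.data(), src, n);
    return n;
}
```

A test suite that only ever passes short strings to `copy_leaky` would come back clean under a run-time tool, which is one reason people pair such tools with static checkers that look at every path.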
Re: (Score:2)
Re: (Score:2)
If valgrind isn't available on Windows (I wouldn't know, or care), there's always the classic, Rational Purify. It's probably expensive.
Purify (Score:3, Informative)
A coworker of mine who's quite a C/C++ jockey used it recently (this month), and said it's still very good.
Re: (Score:2)
Our experience was just the opposite. We recently gave up using Purify entirely, because it wasn't finding anything that tools like Valgrind didn't find more reliably and much faster. YMMV.
Re: (Score:1)
For what? (Score:2)
Re: (Score:2)
What software analysis tool? That all depends... (Score:3, Informative)
One important thing to consider is the set of compilers, tools, target systems, and build environments you are using. If you are using MS-only products then you will most likely have very good support, because almost all source code analysis suites will simply import the build information and you will be off and running right away. If your environment is Unix or embedded systems then things may be more difficult, because you will need to hook into the build process somehow. The scanner tools usually intercept the CC command from a "make" build and call their back end with their custom processing rather than the compiler proper. Different products do this in different ways, so be sure the product you choose knows how to deal with your specific build environment. In my case I walked into another party's environment every time and needed to simulate a build for a build environment that I had never seen before. Not one environment ever looked like the next, so the setup and configuration was always a big challenge, just to get started.
Prexis is primarily a tool for life cycle scanning of source code for security issues. There are two ways to perform the code scanning, with either the main engine component which can schedule nightly scans and track progress over time or with the additional Prexis Pro utility, which is designed for quick assessments by the engineers on their own code without logging everything into the main database. The Pro tool worked best for my code assessments since I had no need for tracking changes over time, and it was a little easier to configure which counts for a lot in my situation.
PolySpace is a completely different tool with a different purpose from Prexis. PolySpace attempts to mathematically discover runtime flaws in the code while only using static analysis to do so. It does a great job on smaller projects, but because of the complexity and thoroughness of its analysis, it is somewhat slow. PolySpace needs to evaluate an entire application all at once in order to do a good analysis. If your .5 MSLOC of code is many separate programs/executables then you will be fine, but if you are talking about one huge monolithic application then you may have to evaluate it in chunks which just increases the false positives and forces the engineer to do more manual chasing of details to determine if the issue is really a problem or not. From what I have seen this product is in a class by itself.
BTW - keep your eyes on this site: http://samate.nist.gov/index.php/Main_Page [nist.gov]
Re: (Score:2)
Last time I heard, Polyspace didn't do C++ -- just C and some random toy language (Java or Ada?). Cool but extremely expensive.
CodeSonar + other commercial tools (Score:2)
Our major competitors in this space are Coverity [coverity.com] and Klocwork [klocwork.com].
All three tools can (to some extent) infer how a program will behave at run-time, so they find more subtle bugs than tools that just look for suspicious patterns in your code.
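As a rough illustration of the difference (my own toy example, not taken from any vendor's material, and all names in it are hypothetical): nothing in `verbosity_unchecked` below looks suspicious on its own line. A checker has to follow the call into `find_config` and notice that one of its return paths yields a null pointer before it can flag the dereference.

```cpp
#include <cstddef>

struct Config {
    int verbosity;
};

// Hypothetical lookup: returns nullptr for ids outside the table.
const Config* find_config(int id) {
    static const Config table[] = {{0}, {1}, {2}};
    return (id >= 0 && id < 3) ? &table[id] : nullptr;
}

// BUG: dereferences the result without checking it. The problem only
// exists on the path where find_config() returned nullptr.
int verbosity_unchecked(int id) {
    const Config* c = find_config(id);
    return c->verbosity;
}

// Checked variant: handles the nullptr path explicitly.
int verbosity_checked(int id, int fallback) {
    const Config* c = find_config(id);
    return c ? c->verbosity : fallback;
}
```

A purely pattern-based tool would likely have to either warn on every unchecked pointer use (lots of noise) or stay silent here; whether a given product catches this particular case is something to verify during an evaluation.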
SPLINT is the answer for C. (Score:2)
I have used it sometimes, and I have noticed that in some cases the version from CVS is better than the released version (but as always, your mileage may vary).
For C++ it's a lot harder, but the programming rules for C++ and the compilers are a bit stricter too, so you may be helped there.
To make things worse (or better, depending on how you see it :-) ) you can always take a look at PurifyPlus from IBM. It conta
Management Side (Score:2)
Regardless of what tool you select, you will have to decide what rules you want to apply and what you are trying to get out of using the tool. If management doesn't understand the purpose of the tools, they may make inappropriate decisions on how to use them. As an example, I worked on a large project (hundreds of developers), and management decided that we needed to use a static analysis tool and that code had to be "clean" before it could be checked in. It was phased in, so we had a month to eliminate
Klocwork static analysis suite (Score:1)
My Static Analysis (Score:1, Troll)
Re: (Score:2, Informative)
Most console computer games for example start at around 500K lines...
Re: (Score:1, Troll)
Klocwork (Score:1)
I am an employee of Klocwork.
If you are researching this for your enterprise I suggest you evaluate Klocwork (and its competitors: Coverity, Grammatech, Parasoft, there are others). We handle large-scale C/C++ projects, our own codebase is much larger than yours and we run Klocwork in-house to track defects in our own code on a daily basis and on developer desktops for subprojects. In fact we successfully handled mammoth projects as big as 10M lines of code and beyond (but frankly, it is getting rather trick
Understand for C++ (Score:2, Interesting)
From their marketing blurb...
Understand, our flagship product, helps thousands of companies maintain impossibly large or complex amounts of source code. It parses source code for reverse engineering, automatic documentation, and calc
Re: (Score:2)
Re: (Score:1)
c++test (Score:2)
I prefer the Insure++ product myself. It really helps in finding bugs.
IT IS NOT FREE.
Depends on platform and kind of analysis (Score:1)
If you're on Windows, the latest Visual Studio C/C++ compilers include a pretty good (but basic) code analysis tool built in. Just use the
Fortify Software has a static tool (Score:2, Informative)
OK, I should have given more detail... (Score:2)
One of the CPUs is running Linux. This code we compile with gcc on a Linux box.
Another CPU is running ThreadX. We cross-compile this on Windows using the Green Hills compiler.
A couple other CPUs run Nucleus OS. These are also cross-compiled on Windows using the Green Hills compiler.
We have gotten evaluations of Klocwork and Coverity (and I've probably said enough here for them to figure out who we are). And they do good stuff, too. But I'm trying to look a
sparse (Score:1)
http://www.kernel.org/pub/software/devel/sparse/ [kernel.org]
The compiler (Score:2)
If you are like all projects I have seen, you haven't turned on the relevant compiler switches for ANSI/ISO compliance and full warnings. Do that first.
Second, get yourselves a few more compilers. If you use gcc, fetch the latest version. It doesn't matter if you can run the compiled code.
Third, write type-safe code. If it's really C++ code you are writing, disable C-style casts and see how many of those monstrosities (from a C++ poi
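A small sketch of the cast advice, under the assumption that you build with g++ (other compilers spell the switches differently): something along the lines of `-Wall -Wextra -pedantic -Wold-style-cast` turns on the usual warnings and flags every C-style cast. The function names below are hypothetical and exist only for this illustration.

```cpp
// Hypothetical function that modifies its argument.
void bump(int* p) { ++*p; }

int cast_demo() {
    int value = 41;
    const int* view = &value;

    // C-style cast: compiles without comment and silently drops const.
    // bump((int*)view);

    // Named cast: same effect, but it says exactly what is being done
    // and is easy to search for during review.
    bump(const_cast<int*>(view));

    // static_cast covers the benign conversions and refuses the rest;
    // static_cast<int*>(view) would not compile, for example.
    const double ratio = 0.5;
    const int percent = static_cast<int>(ratio * 100.0);

    return value + percent;     // 42 + 50
}
```

Modifying `value` through the `const_cast` is well defined here only because the underlying object was not declared const; in real code each such cast is a spot worth a second look.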
C++test from Parasoft (Score:1)
IncludeManager (Score:1)