Memory Leaks 34

Posted by michael
from the bloatware dept.
G3ck0G33k writes: "Is there any free software version/clone of Rational's programs PureCoverage and/or Purify? I have worked with both of them on fairly large projects (>150,000 lines of code) and they were great to work with. When the first runs of Purify found nearly fifty instances of minor memory leaks, I was deeply frustrated/impressed. A free (perhaps GPLd) clone would be so interesting; Rational's licensing is killing my current budget. Of course, the more kinds of leaks it may detect, the better. GeckoGeek" We had a similar question last year but there's no harm in seeing what the current answers are.

  • by Lumpish Scholar (17107) on Friday August 10, 2001 @12:04PM (#2114924) Homepage Journal
    For the other kinds of stuff Purify does (aside from memory leaks), look at Greg McGary's bounded pointer [gnu.org] work.

    Bad news: You'll have to build your own gcc (Greg's changes haven't yet been accepted into the gcc trunk), and rebuild all your libraries (just as Purify re-writes all your libraries).

    Good news: The resulting code is much faster than Purify'ed code, and finds some problems Purify doesn't. I know of a major software development effort (hundreds of developers, millions of lines of code; sorry, can't give details) that uses bounded pointers to great advantage.

    Other tools: GNU Checker, dbmalloc, Bruce Perens' Electric Fence, MemProf, mpatrol, and Mprof; Google searches will turn them all up.
    • by Anonymous Coward
      Interesting - I had the same idea for a hardware implementation, where each index register would store the bounds. The operation "restrict to tighter bounds" is allowed, while the operation "expand bounds" is not. So the kernel begins with a single pointer to all RAM, and then gives pointers to the user-mode code which point to subsections. It's not even much extra silicon, although a pointer becomes 96 bits!
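
      A software sketch of that scheme in C might look like the following. The names bounded_ptr, bp_make, bp_restrict, and bp_load are made up for illustration; they are not taken from Greg McGary's gcc patches or from any real hardware. The point is only that narrowing the bounds succeeds while any request that would widen them is refused.

      ```c
      #include <assert.h>
      #include <stdbool.h>
      #include <stddef.h>

      /* Hypothetical fat pointer: an address plus the bounds it may touch. */
      typedef struct {
          unsigned char *base;   /* lowest address the holder may access */
          unsigned char *limit;  /* one past the highest accessible address */
          unsigned char *cur;    /* current position, base <= cur <= limit */
      } bounded_ptr;

      /* Wrap a whole region, as the kernel would for "all of RAM". */
      bounded_ptr bp_make(void *region, size_t len)
      {
          assert(region != NULL);
          bounded_ptr p = { region, (unsigned char *)region + len, region };
          return p;
      }

      /* Narrow p to [offset, offset + len) inside its current bounds.
         Returns false and leaves *out untouched if that would widen them. */
      bool bp_restrict(bounded_ptr p, size_t offset, size_t len, bounded_ptr *out)
      {
          size_t span = (size_t)(p.limit - p.base);
          if (offset > span || len > span - offset)
              return false;
          out->base = p.base + offset;
          out->limit = out->base + len;
          out->cur = out->base;
          return true;
      }

      /* Checked load: reads byte i, or reports failure instead of overrunning. */
      bool bp_load(bounded_ptr p, size_t i, unsigned char *value)
      {
          if (i >= (size_t)(p.limit - p.cur))
              return false;
          *value = p.cur[i];
          return true;
      }
      ```

      The struct holds three machine words, which on a 32-bit machine is the 96 bits mentioned above. There is deliberately no function that grows the bounds, so code handed a restricted pointer has no way back to the parent region.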
      • You might be interested in reading about the Unisys A-Series Mainframe architecture. The hardware does automatic bounds checking on arrays (and, with support from the operating system, kills your program if it tries to touch something it's not supposed to). The only recent public document that I know of is the architecture support reference manual at:

        http://public.support.unisys.com/aseries/docs/HMPN X05_SSR461_SSP4/PDF/70126610.PDF

        Unfortunately, that document is quite dense (and you're going to have to remove the lameness filter modifications). The A-Series actually uses a structure called an ASD (actual segment descriptor) to store information about the base address, length and type of data in the array, among other things. Of course, the processor can take a look at that data in parallel with accessing data in the array (and throw an exception before committing any changed data), so it has almost no performance cost (aside from reading the ASD, which is probably on par with the cost of loading the array length into a register at the beginning of a loop).

        More food for thought: the architecture also has additional "tag" bits on every data word. These give some primitive type information (e.g. code, single-precision real, array element, ASD, etc.). The processor will not allow a program to arbitrarily change data in a code segment, or things such as return addresses on your stack. I don't know if there are any other machines around today that still have this attribute (if anyone knows of some, please post!). For example, it makes many of the recent buffer overflow attacks moot, since a string transfer operator would not be allowed to touch the stack frame!

  • by pthisis (27352) on Thursday August 09, 2001 @01:52PM (#2118340) Homepage Journal

    The Boehm-Weiser garbage-collecting malloc() can be built in a leak-detection mode. Every time an object is leaked, it prints out the address of the memory in question. Do that. Then it's 15 lines of python to correlate that back with the malloc() calls; I wrapped malloc/realloc to print out the line number and filename, e.g.

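    #include <stdio.h>
    #include <gc.h>

    /* Allocate through the collector and log the call site to stderr. */
    void *our_malloc(size_t howbig, int line, const char *file)
    {
        void *p;

        p = GC_malloc(howbig);
        /* One conversion per argument: line, file, returned address. */
        fprintf(stderr, "Line %d of %s/(): %p\n", line, file, p);
        return p;
    }
    /* Must come after the definition above so it does not rewrite it. */
    #define malloc(x) our_malloc(x, __LINE__, __FILE__)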

    with similar for realloc (and make free do GC_free).

    Then run the proggy, redirecting stderr through a simple python script:

    import sys

    allocations = {}  # address -> log line of the malloc that returned it

    for line in sys.stdin:
        line = line.strip()
        start = line.find("0x")
        if start < 0:
            continue  # no address on this line
        # The address runs up to the next space, or to the end of the line.
        num = line[start:].split(" ", 1)[0]

        if line.startswith("Line "):
            allocations[num] = line  # our wrapper's log line
        elif num in allocations:
            print("Leaked object: " + allocations[num])  # collector's report

    When I run my program this way I get the following output:

    Leaked object: Line 43 of leak_stuff.c/(): 0x806efe0
    Leaked object: Line 43 of leak_stuff.c/(): 0x806eff0
    Leaked object: Line 55 of leak_stuff.c/(): 0x806dfd8

    This tells me which lines made the initial allocations of the leaked objects.

    The garbage-collecting malloc is really cool; it's at:

    http://www.hpl.hp.com/personal/Hans_Boehm/gc/

    for now, but rumor has it that gcc will become the official source for it at some point (it's needed for the Java compiler).

    Sumner

    • this is all fine and dandy, but what about closed source, stripped 3rd party libraries?

      i'm using closed source libraries in a multi-million line project, and I think they have a memory leak.

      I can't wrap malloc in their code, 'cos I don't have it. I call functions like FMLAdd() and it all happens magically.

      • You can use LD_PRELOAD to wrap malloc, assuming they're dynamically linked against libc (almost definitely). If they use GNU libc and don't dynamically link, they're required by the LGPL to distribute object files so you can relink against your own libc.

        If you don't have the source, fixing a leak is tough, but you can rebuild the garbage-collecting malloc in a redirect mode so their app uses it instead of libc's malloc, then LD_PRELOAD it. I used to do this with netscape-communicator back when it leaked like mad; it worked great, though as I mentioned there is a chance that gcc's optimizations could confuse the gc. In practice it seemed to work okay. For any app where a very rare crash isn't the end of the world (netscape crashed all the time anyway) and which is already leaking, it's worth a try.

        Sumner

    • I was going to write the same thing about the Boehm GC, but yours was the first message I saw. Seriously, this collector is an excellent thing. A person would have to be crazy to do manual memory management unless they had a solid technical reason to.
      • by pthisis (27352)
        Be careful using Boehm in production code; the web page has this caveat:

        C compilers may not hide pointers in the generated object code. In our experience, standard commercial compilers obey this restriction in unoptimized code. Most aggressive optimizing compilers do not obey this restriction for all optimized code. For details and examples see papers/pldi96.ps.gz. However, it is difficult to construct examples for which they violate it, especially for single-threaded code. In our experience, the only examples we have found of a failure with the current collector, even in multi-threaded code, were contrived.

        However, the gcc developers claim that gcc does in fact violate this constraint. So using Boehm gc with gcc may not be safe in production code. The gcc mailing list has had a couple of threads on how to make gcc garbage-collector friendly in the future (once again, Java is one impetus for this). Until then, I'd stick to manual mm and use the gc only to help find leaks.

        Sumner

        • Thanks, good to know. But personally I won't have to worry about this. All of my gcc code is permanently under development. I once read about something called "optimization". I hope to use it when I get a program completed, someday.

          (I finish things at work to my manager's satisfaction. At home, I finish things to my satisfaction, and I'm never satisfied.)

    • by d^2b (34992) on Thursday August 09, 2001 @10:41PM (#2169884) Homepage
      dmalloc (www.dmalloc.com) seems to work pretty well for finding memory leaks. It is distributed under a BSDish license.

      Compiles and runs out of the box on an alpha running Linux.

      GUI? uh no. It has a nifty command line utility to control logging etc...

  • ccmalloc (Score:1, Informative)

    by Anonymous Coward
    I went through this phase of trying to fix up the memory handling of all the code I'd ever written. I found ccmalloc [inf.ethz.ch] to be the best. It's the easiest to use: instead of gcc -o prog prog.o you just prefix the link command with ccmalloc, e.g. ccmalloc gcc -o prog prog.o. It provides a nicely formatted output log file, with configurable filtering, showing the stack trace of each leak, and also catches over/underflows and lots of other stuff. Hint: if you are using the C++ standard library, get g++-3 (with libstdc++-3) and #define __USE_MALLOC to disable malloc pooling. RPMs here [rpmfind.net]
  • Write a malloc wrapper and #define it in place of the real thing. With #define you can easily log the location in the code, amount of RAM, and location in memory to a file, then write a script in the language of your choice to see which locations in RAM weren't dealloced, and match them with the appropriate malloc call, which also contains the location in code. It took me about an hour to implement this in a multi-thousand line program and it works very well. The only thing it doesn't catch is when a library call mallocs something and expects you to dealloc it, but I solved this by including a fake malloc call that just logs but doesn't actually malloc, so you'd call it right after the library call that actually does the malloc.
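
    A minimal sketch of such a wrapper follows. It assumes a fixed-size in-memory table rather than a log file plus script, so that the example stays self-contained. The names dbg_malloc, dbg_free, dbg_adopt, and dbg_outstanding are invented here; dbg_adopt plays the role of the fake malloc call that only logs.

    ```c
    #include <assert.h>
    #include <stdio.h>
    #include <stdlib.h>

    #define MAX_LIVE 1024  /* arbitrary table size chosen for this sketch */

    static struct { void *ptr; size_t size; const char *file; int line; } live[MAX_LIVE];

    /* Record a block in the first free slot; drops it if the table is full. */
    static void remember(void *p, size_t size, const char *file, int line)
    {
        assert(file != NULL);  /* call sites always pass __FILE__ */
        for (int i = 0; i < MAX_LIVE; i++) {
            if (live[i].ptr == NULL) {
                live[i].ptr = p;
                live[i].size = size;
                live[i].file = file;
                live[i].line = line;
                return;
            }
        }
    }

    void *dbg_malloc(size_t size, const char *file, int line)
    {
        void *p = malloc(size);
        if (p != NULL)
            remember(p, size, file, line);
        return p;
    }

    /* The "fake malloc": logs a block that a library allocated on our behalf. */
    void dbg_adopt(void *p, const char *file, int line)
    {
        if (p != NULL)
            remember(p, 0, file, line);  /* size unknown, recorded as 0 */
    }

    void dbg_free(void *p)
    {
        for (int i = 0; p != NULL && i < MAX_LIVE; i++) {
            if (live[i].ptr == p) {
                live[i].ptr = NULL;
                break;
            }
        }
        free(p);
    }

    /* Print every block still allocated and return how many there are. */
    int dbg_outstanding(FILE *out)
    {
        int count = 0;
        for (int i = 0; i < MAX_LIVE; i++) {
            if (live[i].ptr != NULL) {
                fprintf(out, "%s:%d: %zu bytes at %p never freed\n",
                        live[i].file, live[i].line, live[i].size, live[i].ptr);
                count++;
            }
        }
        return count;
    }

    /* Route ordinary calls through the wrapper; must follow the definitions. */
    #define malloc(n) dbg_malloc((n), __FILE__, __LINE__)
    #define free(p)   dbg_free(p)
    ```

    Calling dbg_outstanding(stderr) just before exit lists whatever was never freed, with the file and line of each allocation. A linear scan is slow for programs with many live blocks; a hash table keyed on the pointer would be the obvious next step.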
  • by epperly (188343) on Friday August 10, 2001 @11:34AM (#2129544)

    I like dmalloc [dmalloc.com] for memory debugging. It even found a memory bug for a program that purify choked on. It doesn't have a GUI.

  • use C++ (Score:2, Interesting)

    by mj6798 (514047)
    In C, this is a never ending battle. Even with Purify, you are going to spend lots of time introducing bugs, then tracking them down. If you must stick with C, consider using one of the C interpreters (EiC, cint, etc.). Machines have gotten fast enough that you can use them for debugging your code. Or stop worrying about it and just use the Boehm garbage collector as a garbage collector.

    I switched from C to C++ basically because I couldn't get Purify for Linux. C++ has allowed me to adopt clear, well-defined memory management strategies and automate various pointer checks. I hardly ever get memory leaks or pointer errors in my C++ code anymore.

    But no matter what you do in your own code, if you are using C or C++, you will always be exposed to numerous pointer bugs and leaks in library code. Most real-world C++ code commits the same memory allocation sins and has the same pointer bugs as real-world C code--people aren't taking sufficient advantage of C++'s smart pointer facilities (even STL is flawed in that way). Therefore, for multiprogrammer projects, I wouldn't use anything but Java or another safe language anymore.

  • One of the places I'm involved with doesn't use the standard malloc calls. Instead we use something more like:
    get_mem(ptr, size, "widget hash table")
    When debugging, get_mem keeps track of all allocs. At the end, just before the program shuts down, the heap dump routine is called; it lists all outstanding memory blocks along with the debug string so you can see where they were allocated.

    It's also often practical to call the dump routine at various points within the program and give the output a quick look-over or diff - it's amusing how often you can nip these problems in the bud this way.

    Also, if you get really desperate, change the get_mem routine to increment a global counter and tag that to the end of each allocation info block. If you keep a program debug log and log each allocation it makes it easy to see where a loose block was allocated - grab the unique ID from the dump and search the log file for it.

    A handy feature about this trick is that you use #define to define get_mem, so when you go to production you simply define it to malloc and throw the debug string away - no speed or size cost in the running program. In addition, it basically costs nothing except an hour or so to set it up in the first place. The catch is you have to use it religiously from the start of your project.

    A really simple trick, but it has saved me so much work!
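
    One way this could look, as a sketch only: the post gives the call get_mem(ptr, size, "widget hash table") but not the implementation, so free_mem, heap_dump, the DEBUG_MEM switch, and the list layout below are all assumptions.

    ```c
    #include <assert.h>
    #include <stdalign.h>
    #include <stddef.h>
    #include <stdio.h>
    #include <stdlib.h>

    #define DEBUG_MEM 1   /* assumed build switch; set to 0 for production */

    #if DEBUG_MEM
    /* Info block stored in front of each allocation. The alignas keeps the
       caller's memory, which starts right after it, suitably aligned. */
    typedef struct mem_info {
        alignas(max_align_t) struct mem_info *prev;
        struct mem_info *next;
        const char *label;      /* debug string passed to get_mem */
        unsigned long serial;   /* unique ID, searchable in a debug log */
        size_t size;
    } mem_info;

    static mem_info *heap_head = NULL;
    static unsigned long next_serial = 0;

    static void *get_mem_debug(size_t size, const char *label)
    {
        assert(label != NULL);
        mem_info *m = malloc(sizeof *m + size);
        if (m == NULL)
            return NULL;
        m->label = label;
        m->size = size;
        m->serial = ++next_serial;
        m->prev = NULL;
        m->next = heap_head;
        if (heap_head != NULL)
            heap_head->prev = m;
        heap_head = m;
        return m + 1;
    }

    static void free_mem_debug(void *p)
    {
        if (p == NULL)
            return;
        mem_info *m = (mem_info *)p - 1;
        if (m->prev != NULL) m->prev->next = m->next; else heap_head = m->next;
        if (m->next != NULL) m->next->prev = m->prev;
        free(m);
    }

    /* List every outstanding block; returns how many were found. */
    static int heap_dump(FILE *out)
    {
        int count = 0;
        for (mem_info *m = heap_head; m != NULL; m = m->next, count++)
            fprintf(out, "#%lu: %zu bytes, \"%s\"\n", m->serial, m->size, m->label);
        return count;
    }

    #define get_mem(ptr, size, label) ((ptr) = get_mem_debug((size), (label)))
    #define free_mem(ptr)             free_mem_debug(ptr)
    #else
    /* Production: plain malloc/free; the debug string is thrown away. */
    #define get_mem(ptr, size, label) ((ptr) = malloc(size))
    #define free_mem(ptr)             free(ptr)
    #define heap_dump(out)            0
    #endif
    ```

    Because the info block sits in front of the caller's memory, memory obtained through get_mem has to be released through free_mem and never through plain free; this is part of why the scheme has to be used from the start of a project.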

  • Roll your own (Score:2, Interesting)

    by Ratbert42 (452340)
    In college, I rolled my own wrapper for malloc(), free(), and array/pointer dereferences. A couple hours of coding that wrapper caught most of my memory leaks and seg faults. If I could do it when I was half-drunk and didn't know what I was doing, you've probably got a developer on staff who can handle it.
  • Free Beer? (Score:2, Interesting)

    by Ratbert42 (452340)

    A free (perhaps GPLd) clone would be so interesting; Rational's licensing is killing my current budget.

    Maybe you should put a developer or two on that project and see how long it takes them to build something similar. I think Purify runs about $1,500 now (could be wrong). That's what, two Aeron chairs? That shouldn't kill any real company's budget. Numega's Boundschecker is a viable cheaper alternative though. Or just rip off the free trial versions.

    When I've seen Purify bought, a developer downloaded the trial and built a list of all the problems he found and fixed using it. When he showed his manager how much pain and suffering the product could save it was an easy sell. (The hardest part was countering the "so everything's fixed already?" mentality.)

  • MEMPROF (Score:3, Informative)

    by kijiki (16916) on Thursday August 09, 2001 @02:47PM (#2151318) Homepage
    It's by one of the RHAD labs kids. It's basically just a GUI around Boehm's garbage collector in leak-detector mode.

    It's not purify (it really aims for leak detection, not all the other errors purify finds), but the efence + memprof combination gets you about 85% of purify's functionality.

    It seems to handle threaded apps reasonably well, and C++ doesn't faze it. The only downside is that it's hard to get running on non-x86 platforms.
  • find a collection of different memory usage problems, and is reasonably easy to use even on large projects
  • they told me they use electric fence. While it is definitely not the same (I have used purify as well), it is basically a library that you link against; then when you run your program it checks your mallocs and things like that to make sure you have allocated the correct amount of space.

    But to answer the question are there any out there? NO, not with pretty GUIs and all.

    • by szomb (318129)
      ElectricFence detects overruns of malloc()d buffers (hence its name). Unless this changed recently I am fairly sure it has nothing to do with leak detection?
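
      The trick behind that kind of overrun detection can be sketched in a few lines of POSIX C. This is not Electric Fence's actual source, and guarded_alloc and guarded_free are made-up names. The idea is to place each buffer so that it ends exactly where an inaccessible page begins, so the first byte written past the end faults immediately.

      ```c
      #define _DEFAULT_SOURCE   /* for MAP_ANONYMOUS on glibc */
      #include <assert.h>
      #include <stddef.h>
      #include <stdint.h>
      #include <sys/mman.h>
      #include <unistd.h>

      /* Allocate `size` bytes that end right at an inaccessible guard page. */
      void *guarded_alloc(size_t size)
      {
          size_t page = (size_t)sysconf(_SC_PAGESIZE);
          size_t data_pages = (size + page - 1) / page;
          if (data_pages == 0)
              data_pages = 1;
          size_t total = (data_pages + 1) * page;   /* data pages + guard page */

          unsigned char *map = mmap(NULL, total, PROT_READ | PROT_WRITE,
                                    MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
          if (map == MAP_FAILED)
              return NULL;

          unsigned char *guard = map + data_pages * page;
          if (mprotect(guard, page, PROT_NONE) != 0) {
              munmap(map, total);
              return NULL;
          }
          /* Push the buffer up against the guard page; the start may be unaligned. */
          return guard - size;
      }

      /* Release a block from guarded_alloc; the caller must pass the same size. */
      void guarded_free(void *p, size_t size)
      {
          size_t page = (size_t)sysconf(_SC_PAGESIZE);
          size_t data_pages = (size + page - 1) / page;
          if (data_pages == 0)
              data_pages = 1;
          unsigned char *guard = (unsigned char *)p + size;
          assert((uintptr_t)guard % page == 0);   /* p must come from guarded_alloc */
          munmap(guard - data_pages * page, (data_pages + 1) * page);
      }
      ```

      With a 100-byte buffer from guarded_alloc, a write to index 100 lands on the guard page and the process gets a SIGSEGV at the offending instruction. Nothing here records who allocated what, so this approach catches overruns but says nothing about leaks. The cost is at least two pages per allocation.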
  • Checker (Score:2, Informative)

    by anaymouse (513946)
    Try Checker [gnu.org]. I think AX.25 pointed to some relevant information, but was modded as redundant for some odd reason.
  • mpatrol (Score:2, Informative)

    by brianmed (131838)
    mpatrol is another tool to help with this.

    It can:
    - log your memory usage
    - report on improper memory usage
    - profile your memory usage
    - work with your applications *without* re-linking (assuming your OS allows this)

    The web page is at:

    http://www.cbmamiga.demon.co.uk/mpatrol/

    In addition, the author has excellent documentation. The pdf manual actually has a section that lists competing products and what they do.

    http://www.cbmamiga.demon.co.uk/mpatrol/files/mpatrol.pdf
