Ultra-Stable Software Design in C++? 690
null_functor asks: "I need to create an ultra-stable, crash-free application in C++. Sadly, the programming language cannot be changed, for reasons of efficiency and availability of core libraries. The application can be naturally divided into several modules, such as GUI, core data structures, a persistent object storage mechanism, a distributed communication module and several core algorithms. Basically, it allows users to crunch a god-awful amount of data over several computing nodes. The application is meant to primarily run on Linux, but should be portable to Windows without much difficulty." While there's more to this, what strategies should a developer take to ensure that the resulting program is as crash-free as possible?
"I'm thinking of decoupling the modules physically so that, even if one crashes/becomes unstable (say, the distributed communication module encounters a segmentation fault, has a memory leak or a deadlock), the others remain alive, detect the error, and silently re-start the offending 'module'. Sure, there is no guarantee that the bug won't resurface in the module's new incarnation, but (I'm guessing!) it at least reduces the number of absolute system failures.
How can I actually implement such a decoupling? What tools (System V IPC/custom socket-based message-queue system/DCE/CORBA? my knowledge of options is embarrassingly trivial :-( ) would you suggest should be used? Ideally, I'd want the function call abstraction to be available just like in, say, Java RMI.
And while we are at it, are there any software _design patterns_ that specifically tackle the stability issue?"
You're not the first one.... (Score:4, Insightful)
Re:You're not the first one.... (Score:2, Informative)
As for your question about CORBA, look into ZeroC's Ice [zeroc.com], which has native C++ bindings. I read about it somewhere and it sounded cool.
Re:You're not the first one.... (Score:5, Funny)
Learn to use STL and don't *ever* type "new[]" (Score:3, Informative)
1) Do *all* memory management via STL vector/string.
2) Don't ever type "new[]/delete[]".
Just don't do it. Not. Ever. Use std::vector instead.
"Arrays are evil" - the C++ FAQ.
PS: You can still use malloc()/free() but only as a last resort in low-level classes which are designed for data storage.
3) Get a reference-counted pointer and use it.
Automatic memory management...'nuff said.
4) Attach an alarm bell to your "~" key.
If you're writing destructors for classes which don't control system resou
Re:Learn to use STL and don't *ever* type "new[]" (Score:3)
By splitting the app into more than one language you end up with a clean separation between the *fast* code and the *stable* core. Chances are you don't really need to work out complex thread communication all that often, but if people think the app needs to be fast then they will start optimising stuff that they have no reason to touch. If this is not fast enough, prof
Re:You're not the first one.... (Score:3, Informative)
What a frightening post. (Score:3, Insightful)
Re:You're not the first one.... (Score:5, Insightful)
If not, then great! Please post some references to literature which demonstrates how what you've suggested is sane and/or possible
Re:You're not the first one.... (Score:3, Interesting)
* No (or very, very limited) side-effects. In other words the result of a function is not dependent on the current program state. Once it is exhaustively verified in testing, that function will forever more return the correct results because the run-time state won't affect it.
* The language itself can often be treated as a specification of correctness, and even formally proved through static analysis. As a trivial example if you
Re:You're not the first one.... (Score:3, Insightful)
This is not a comment against Haskell, but against suggesting it as appropriate means, given the poster's situation.
Re:You're not the first one.... (Score:5, Interesting)
Re:You're not the first one.... (Score:5, Funny)
Referential transparency. [google.com]
That comp.lang.functional thread is interesting because there are guys from Ericsson elaborating on some real-world aspects of referential transparency. As you know, Ericsson uses the functional programming language Erlang for their switches. See more in: Welcome to a Smarter Way of Programming [erlang.se]. Of course, you can't take their use of Erlang seriously, because they're from Sweden, and Sweden, being a fucked-up third-world country with no tech at all, is not an example for America. The mighty AT&T pushed C++, and now the world is a better, safer place, where software errors are a thing of the past.
Obvious ! (Score:5, Funny)
Thomas-
Re:Obvious ! (Score:3, Insightful)
Re:You're not the first one.... (Score:4, Interesting)
And, in visual studio
Re:You're not the first one.... (Score:5, Insightful)
Re:You're not the first one.... (Score:5, Insightful)
a good result (correctness). And it won't.
What the guy really needs is a great team and some decent
process to backstop that team. Not a silver bullet.
Re:You're not the first one.... (Score:5, Insightful)
What I'm hearing is the guy's boss telling him "And it'd better not crash!"
Typically, when absolute reliability is needed (nuclear power plants, spacecraft, pacemakers), you start subtracting libraries which aren't known to be absolutely reliable, yet in this case he's adding them. In addition, he wants it to run on multiple platforms, which radically increases your testing workload.
On top of that, he admits he's got no experience in the techniques needed to produce reliable software. Probably has a short deadline, too.
My crystal ball says he's doomed to failure.
Chip H.
Re:You're not the first one.... (Score:3, Insightful)
mmm...huge inline claim. Sounds like that could lead to memory thrashing. But seriously folks, if you took the time to actually read my post (you know, left to right, top to bottom), you'll see that I never made any such claim. There is simply nothing inherent in C# that will protect you from most classes of errors. Especially the really insidious ones that A.) You don't discover until it's too late and B.) Usually are the result of bad design,
misc. advice and a small rant? (Score:3, Interesting)
Where do people get this idea? I have ported quite a few applications, and usually the porting is done by locating the libraries you need on the new platform and fixing a few oddities in the current platform (like closing sockets on z/OS, or switching to unsafe multitasking (p-threads) on Windows). Porting to Linux is so trivial that I often do it just to get access to the superior tools available there, especially Valgrind. GUI is the exception, of course.
Re:Bullshit (Score:3, Interesting)
I agree.
This is by nature one of the biggest strengths of C and C++; how someone could conclude that using C++ adds some sort of complexity to cross-platform development actually amazes me.
If it adds complexity, in comparison to what? I would like to see the poster above you explain what is actually easier to use for diverse application development that is actually better at c
Good call (Score:2, Interesting)
Executive summary of this post: Keep it simple. As simple as it can be while getting the job done. The more buzzwords you think about implementing, the more you need to reconsider whether you really need that whiz-bang feature.
You need to abstract your design into really independent layers, such that the backend proc
Listen to what he said!! (Score:5, Insightful)
I hate to break it to people but there *are* libraries, especially for some types of scientific computing, that are only (reasonably) available in C++ or sometimes Fortran. Not only would abandoning these libraries mean he would have to completely reinvent the wheel, it might also cause serious compatibility problems, not to mention a much greater ongoing maintenance responsibility (he can't just check his program to make sure things still work when someone fixes a library bug).
Moreover, the idea that because he is considering using CORBA, IPC or whatever else, speed can't matter enough to require C/C++ is dead wrong. It is true that whatever *parts* of the process are done using these components may not require huge amounts of speed, but this doesn't mean one of those components isn't doing something very processor-heavy.
In particular, what he says sounds like the situation in some areas of scientific computing. If one is writing a program to do some sort of simulation or similar math-intensive operations, speed can be *very* important in the critical parts of the code, but (in some cases) transferring information to the GUI or other components need not be particularly speedy (slowing it by an order of magnitude may make only a small difference in overall runtime). Imagine a program that does some kind of weather or nuclear detonation simulation. The cross-processor communication and the core simulation kernel need to be very fast, but the GUI and data input components need not be. Also, it is my understanding that the critical libraries in this area are often only available (at least freely) with C/C++ or Fortran bindings.
Anyway, I think it is important to distinguish several different goals: ultra-stability, minimal downtime, and minimal data/computation loss. For instance, for a climate simulation that may run on a supercomputer for months, it is very important to have minimal data/computation loss (i.e., if something goes bad you don't lose months of very valuable supercomputer time), but you need not have ultra-stability or minimal downtime. As long as the simulation can easily be restarted without loss of data when any node crashes, there is no problem. On the other hand, if you are running a website like Slashdot, it is minimal downtime that is important; it doesn't really matter if some of the web server processes are rebooted once in a while. If, on the other hand, you are writing code to monitor a nuclear power plant, it is ultra-stability that is important (though I can't at the moment think of something that requires both distributed processing and ultra-stability, but I'm probably just missing something).
So I think the answer depends on what sort of stability you want. If it is important that no individual *node* crashes (though the GUI/other non-core components can crash), then you should pursue the separation you described above. I have to admit I'm not an expert here, but the client-server model (like MySQL, X, etc.) seems to work well in this context. However, this depends a lot on what sort of data you need to transfer. If you just need to send the core setup commands and get back mostly unstructured info (say, a grid of temperatures or other simple datasets), then I would suggest sticking with one of the simpler abstractions and not getting lost in CORBA. On the other hand, if you need to send back and forth real objects with significant structure, then creating your own serialization system/bindings is just asking for bugs.
On the other hand if what you want is minimal data/computation loss, downtime, or any other property where it is the overall system you care about not a crash at any particular node then I suggest concentrating less on dividing any one node into comp
Re:Listen to what he said!! (Score:3, Insightful)
And what he is asking cannot be done BECAUSE of the libraries.
You CANNOT guarantee that the libraries are 100% stable. Typically I find that stability points to the libraries and you have to write your own to
Re:Listen to what he said!! (Score:4, Informative)
So what he needs to do is develop a design that is robust in the face of errors. In other words, it needs to be fault-tolerant. There are well-known design practices for doing this (checkpoints, watchdogs, rollbacks, etc.) as well as design patterns for robust distributed computation (see, for one example, Joe Armstrong's thesis on making reliable systems in the presence of software errors [www.sics.se]).
No, the situation the OP is in is not ideal. But it's also not impossible to work with, and there are techniques that can help him to get closer to achieving his goals within the constraints placed upon him.
Re:You're not the first one.... (Score:5, Insightful)
On Garbage Collection and Stability (Score:4, Insightful)
If your code is unstable in a way that memory leaks and segmentation faults are not only a "remote possibility" but a recurring (even if only rare) event, then any safeguards you implement won't be overly successful unless you fix the code that causes the errors first. (Disclaimer: there is no perfect code. Even if there were no bugs in the code, the program still has the "remote possibility" of crashing due to errors in the hardware/OS.)
That said, garbage collection or not is a different discussion. Some say it is bad and breeds lazy programmers, while others (I amongst them) argue that it is a terrific tool for designers, since it almost eliminates the occurrence of memory leaks (unless you do some really bad programming) and it might even speed up your program [wikipedia.org]
Re:You're not the first one.... (Score:3, Informative)
The use of a managed language will not necessarily result in more stable code. Recovering from SIGSEGV by installing a POSIX signal handler, or detecting the death of a forked child process, can be done in C++ with the same ease as catching a NullPointerException in Java.
I would agree that having to write memory management code is error-prone, but it is possible to be careful (i.e., use auto pointers, STL vectors instead of arrays, etc.). You do need to be very good with C++, however.
My suggestion to
I don't know why this dominates the first page... (Score:5, Insightful)
Question 1: what strategies should a developer take to insure that the resulting program is as crash-free as possible?
Answer:
a. Use OO techniques and keep all objects in your system extremely simple; furthermore, keep all methods in your system extremely short, well-contained and well-defined.
b. Don't use C++ arrays, ever. Especially not for strings. Use and abuse the STL. copy(istream_iterator<int>(cin), istream_iterator<int>(), back_inserter(v)) is just plain beautiful IMHO.
c. Check extensively the behaviour of your constructors and destructors.
d. Make an object-lifecycle diagram of each class you program. In the diagram, relate it to the neighboring classes (parents, children, siblings, classes it participates in design patterns with, classes aggregated, classes value-aggregated, classes where this is aggregated or value-aggregated).
e. Use, carefully, and always when possible, smart pointers. Remember std::auto_ptr is your best friend -- its limitations are a defining part of its strength. Remember boost::shared_ptr is also a good friend, but its cousin boost::intrusive_ptr is even more friendly -- but use one of those (and their other cousins scoped_{ptr,array}, shared_array, weak_ptr) only in the (rare) cases where auto_ptr does not apply.
f. As a corollary to (e) above, use boost. This is really an extension of (b), too.
Question 2: How can I actually implement such a decoupling?
Answer:
I would use a simple, socket-based, take-my-data, gimme-my-results scheme. It would be network-distributable, and easy to detect via timeouts whether some service is or isn't alive... If you want something more sophisticated/RMI-like, SOAP (with binary or compressed XML) may be an option. The simpler the better IMHO.
Question 3: are there any software _design patterns_ that specifically tackle the stability issue?
Answer:
All of them? IMHO, DPs can be a huge tool for increasing the stability of a system. Take a look here [WARNING: PDF] [uiuc.edu] (and in the bibliography) for some ideas.
I know many of my posts were self-marketing lately, but if you need someone to work with you, I'll be happy to send you my resume... write me at hmassa (at) gmail.
Re:I don't know why this dominates the first page. (Score:4, Insightful)
> copy(istream_iterator<int>(cin), istream_iterator<int>(), back_inserter(v))
> is just plain beautiful IMHO.
I'm sorry, but I just can't agree. It might appeal to a mathematician who wants to see everything use functional notation and hates every language except lisp, but to a non-abstract-elite-ivory-tower-mathematician this is absurd. cin is not an array of integers and the use of the adapter obfuscates the fact that you are using a conversion from a char array to an int. The back_inserter also makes it harder to see where the data is going by losing "v" in it. Many would also frown at it for taking a non-const reference, although since it is a standard adaptor it is probably ok.
C++ programmers are often unnaturally attached to efficiency and have to be watchful for template bloat. Your copy generates 88 instructions, whereas an equivalent iterative solution is only 33 instructions long, most of them belonging to the inlined push_back. Not only is the generated machine code smaller, but the source code is smaller as well, and is far more readable, making the algorithm obvious at a glance to any procedural programmer, who make up the majority outside the hopelessly out-of-touch with reality academia.
Academics love integer and float arrays because that's what they usually work with. Scientific simulations produce data in that form and require processing programs that take something from a data file, crunch some numbers, and output something to cout. In the real world people work on user interfaces, databases, and other complicated things, where one normally works with arrays of objects rather than numbers. If you ever tried to apply a functional algorithm to a vector of objects, trying to manipulate some member variables or call a member function, you would know that the result is so hideous that it isn't even worth considering. There is a reason people prefer iterative solutions; they are how the real world works. Reality is algorithmic, not functional, and so are user specifications for the things they want done. Trying to cram them into an abstract mathematical functional model is insanity.
> Use, carefully, and always when possible, smart pointers.
> Remember std::auto_ptr is your best friend
Most of the time, no. While I would not deny the utility of auto_ptr in localized situations manipulating the object state during reallocation, its constant use indicates lack of understanding of object lifecycle in the program. It is fashionable in Java to create objects left and right, without consideration of who is supposed to own them. Hey, just let the garbage collector take care of it! Who cares how long the object lives? Obviously, such immature mentality produces plenty of memory leaks for which Java is so infamous. In a good design object ownership is strictly defined. Objects belong to collections that manage their lifecycle. There ought to be no "dangling" objects that just "hang there". If you don't know to which collection the object belongs, you have no business creating it. If you think your objects are "special", you haven't thought beyond their internal functionality or considered where it fits in your overall design.
> Question 2: How can I actually implement such a decoupling?
> I would use a simple, socket-base, take-my-data, gimme-my-results scheme
And thereby slow your program to a crawl? There is a reason people use CORBA and the like: those frameworks optimize distributed object calls to avoid network hits, often being able to reduce the overhead to the equivalent of a virtual function call. Furthermore, networked applications have their own set of complexities and security considerations. You get to keep an open port somewhere, handle authentication (because wherever there's an open port, there will be malicious connections), and do extensive data validation (for the same reason). While these problems are applicable to dis
Re:I don't know why this dominates the first page. (Score:3, Interesting)
Re:I don't know why this dominates the first page. (Score:4, Insightful)
> so let's reject it as being "elite-ivory-tower"
I did not say I did not understand it. I said I did not like it. I do not like it because it does not fit with the reality of computer operation, as discussed below.
> Reality is reality. Algorithmic or Functional are just ways people look at it.
On the contrary, you can see reality being algorithmic. Things happen one after another. To type "algorithmic", you depress a, l, g, etc. in order; you don't declare a set of letters, fill it with appropriate values and throw it at the computer. When you receive a specification for your program, it will say something like "get this from the user, then do this, then do that, then print out the result". No specification is ever written in functional notation outside the academic world.
More importantly, the computer itself works algorithmically. It does one thing, then another. No computer has ever worked functionally, and no computer ever will. All of them will decode and execute a sequence of instructions, and if you refuse to write your code likewise, you're only adding translation overhead.
Even in the hallowed halls of science overuse of the functional notation creates serious problems. The entire hodge-podge nonsense we call quantum mechanics stems from the attempt to describe a complicated system as a function. Instead of trying to get a set of time-value maps for the whole system, it would be more appropriate to look at the system's constituent parts and algorithmically simulate them through time. That way you wouldn't get any "spooky action at a distance", stuff being there and not there at the same time, and all other equally ridiculous denials of reality.
> I advise you, that your use of the common peoples' fear for mathematics
> in your arguments is not going to help.
I wasn't using that argument, but, now that you mention it, it is a reasonable one. Most programmers couldn't care less about higher mathematics, and, even if they were forced to study it in college, they likely have forgotten it all by now. Computer algorithms require minimal mathematical background. The most I ever used was a bit of calculus to write scan-conversion routines. So, whether from lack of practice, or from lack of interest, most programmers will prefer you didn't drag them into the world of useless mathematics. (and I use the word literally here)
> Templates, being code generators, differ by nature to hand-tuned codes.
> So your code generates only 33 instructions vs the template's 88. Great
> - now tell me - which architecture? What compiler?
That is quite irrelevant in this case. istream_iterator notation generates extra code for reasons that will not go away no matter how hard you try to optimize it. Yes, I might be able to write an istream_iterator that would have no overhead over my iterative version, but it will not be standard compliant. The istream iterator has to read on construction; it has to store the read value; it has to be constructed, since it must keep a reference to the source stream; it has to handle special cases, like the end-of-file, and the subsequent conversion to the end iterator value. However good you might be at optimization, you will not be able to discard these and still be compliant with the specification.
Also, which compiler or architecture you use will not make all that much difference in the size of the compiled code. I guarantee you that your functional copy will never generate smaller code than my iterative loop, no matter what compiler you use or what architecture you compile for. There is a certain amount of work to be done, and my version does less work. It is as simple as that.
> And before you count the instructions, did you realize that this code is waiting
> for keyboard inputs, therefore what you're doing is unnecessary (and obviously
> premature) optimization?
First, you should note that I
Re:While I can certainly respect your opinions, (Score:4, Interesting)
> for_each(components.begin(), components.end(), _1.disable())
It is never that simple. The fact that you can't do what you've typed is one of the reasons I dislike it so much. What you really need is:
Things suddenly got uglier, didn't they? But wait, what if you need to call a function with an argument? Gotta use a bind2nd adaptor to wrap it, and then it becomes:
Wait 'till you try to explain to some maintaining programmer how to untangle that! Oh, and just for laughs, try to debug this thing. Put an assert in SetParameter, and you get a lovely callstack from gdb:
Now that's something to scare newbie programmers with! Oh, and forget about putting a breakpoint inside the loop; templated functions aren't targetable until executed.
> in some code I need to maintain then to encounter
> for(i = 0; i < components.count(); ++i) components[i].disable()
So why not just use an iterator loop? for_each does not have a monopoly on it:
(foreach is a macro I wrote because I use this construct so often)
> first form permits, for instance, components to be a linked list or even a hash.
> The second is implementation-dependent and if you change the underlying data
> structure, you'll have extra work to refactor.
If you use iterator loops, this wouldn't happen to you.
> I once worked, changing all instances of SomeObject* to auto_ptr
> eliminated altogether 35 bugs we had lurking in the BTS for a long, long time,
> with less than one day of work (strange, delayed errors were suddenly
> transformed into EARLY null-pointer dereferences
Why were you using SomeObject* in the first place? When I was advocating moderation in the use of auto_ptr, I wa
Re:You're not the first one.... (Score:3, Insightful)
Design is about turning analysis models into implementation. Analysis models are the result of analyzing what your system needs to do; they describe what it does, not how it does it. Use some type of formal modeling language (there are several) to exhaustively describe t
Re:You're not the first one.... (Score:3, Informative)
Of course it doesn't - you just killed it.
C++ runtime libraries are loaded into the program's memory space, just like the Java runtime is in the memory space of whatever Java program you are running. The difference is that the operating system has a specific loader for the kind of files (ELF in Linux) C++ comp
inline code (Score:4, Informative)
You can easily embed C/C++ in other languages. Take a look at Inline::CPP [cpan.org], for example. With code like:
use Inline CPP;
print "9 + 16 = ", add(9, 16), "\n";
print "9 - 16 = ", subtract(9, 16), "\n";
__END__
__CPP__
int add(int x, int y) {
return x + y;
}
int subtract(int x, int y) {
return x - y;
}
you can put the parts that need to be fast in C++, and the parts that need to be easy in Perl. (If you do the GUI in Perl, you won't have to worry about portability or memory allocation. And the app will still be fast, because the computation logic is written in C++.)
> The application can be naturally divided into several modules, such as GUI, core data structures, a persistent object storage mechanism, a distributed communication module and several core algorithms.
Yup. There's no need for the GUI to know how to do computations, remember. The more separate components you have, the more reliable your application can be. Make sure you have good specs for communication between components. Ideally, someone should be able to write one component without having the other one to test against. For testing, write unit tests that emulate the specs... and make sure your tests are correct!
Re:inline code (Score:5, Insightful)
Re:inline code (Score:3, Interesting)
It worked amazingly well. There's a little bit of interfacing work that needs to be done, but I found that, in that project at least, the C code didn't need to be modified very often.
It very often DOES simplify things to use two programming languages.
Re:inline code (Score:3, Informative)
A language that leaks memory in real use cases is just not good enough. A language that slows down execution for periods of time may be good enough. The difference is that the first impairs correctness, while the second does so to performance. While bad performance can be tolerated, incorrectness can't.
I'm gonna take a guess, but.. (Score:5, Funny)
Re:I'm gonna take a guess, but.. (Score:5, Insightful)
The secret of stable system design is designing from failure. Designing and implementing defensively. If you want to design an ultrastable system you start with the failure analysis for every component, following with failure analysis of modules and the entire thing as it grows.
This, in the world of C++ (and C for that matter), quite often means paranoidly checking everything everywhere for NULLs before doing anything.
Designing and writing from failure means that every system or library call should be assumed to fail first, and all failures handled cleanly. This may be quite painful because it usually requires the development of special tools, like wrappers around malloc, file calls, etc. that return error conditions which are nearly impossible to achieve on a live system.
Only after all codepaths for "bad" results have been handled, the actual "normal" codepaths should be written. This unfortunately is not the way code is written in 99% of the shops out there. Most design and implement from success first and add failure handling later.
Just ask in your shop: "Where is our memalloc wrapper that simulates a failed memory allocation? I need to link against it to do some testing, to see how our app handles NULLs in a few places." The usual answer you will get is: "Ugh? WTF are you talking about, dude... We do not smoke that stuff here... Just go and write the code you have been assigned to write..."
And the results are quite bloody obvious.
Re:I'm gonna take a guess, but.. (Score:3, Informative)
This is why all of the comments about using a managed language are completely missing the point. Catching exception conditions before they terminate the process is great for tracking down bugs and for code that's allowed to recover from errors, but if you need error-free code, then you cannot afford to have these
Performance? (Score:2, Insightful)
If you can provide more details about the specific requirements, you might get more informed responses. As it is, though, your stated goals really don't seem to add up.
Even as stated, I would write the core in a highly tuned fashion (although C++ might not be my
Development Practices (Score:4, Insightful)
Bulletproof code isn't cheap, but it can be done.
Re:Development Practices (Score:2)
Re:Development Practices (Score:4, Informative)
Jedidiah.
Re:Development Practices (Score:4, Informative)
There is no silver bullet for what you describe other than sound development practices.
True, but it should be pointed out that C++ is well-equipped to make such sound development practices easy. Consider the major sources of instability in C programs:
In my experience, doing the above religiously will ensure you never see segmentation faults. The next step, of course, is to make sure your code correctly implements the desired functionality. C++ is no different from Java or any other OO language in this respect. Clear requirements definition, modularity, clean separation of concerns and testing, both automated and manual, are the basic keys to generating correct and maintainable code in any language.
[*] A story: I once asked a guy on my team to write a little program to monitor a bank of modems, accepting incoming calls and exchanging data with the callers. He spent two weeks and produced nearly 10,000 lines of code
My Top Ten (Score:3, Informative)
Bulletproof code isn't cheap, but it can be done.
This is the most insightful comment I've seen so far. Particular tools can fix particular problems, but that's the easy part. The hard part is finding and noticing the problems, so that you know to look for (or make) the t
Here's your best bet. (Score:5, Interesting)
1. Write the whole thing in Python.
2. Once it's bullet-proof, replace each function and object with C++ code.
3. Profit.
Re:Here's your best bet. (Score:2)
Plus, you are also able to include C++ libraries in Python. So after you get it to work, you can replace each Python module with a C++ one and make sure the app still works.
I tend to do most of my programming in Python for proof of concept; even if it takes hours to run the code, at least I know whether the concept works. If it does, then I move on to optimizing in a lower-level language.
Re:Here's your best bet. (Score:2, Informative)
Re:Here's your best bet. (Score:5, Informative)
1) Wrap your legacy libs with SWIG
2) Code a working prototype in Python
3) Profile it (never skip this step)
4) Use SWIG to write the bottleneck parts in C++
5) Use Valgrind to ensure you are still OK memory wise
6) Profit!!
I am invoking Greenspun's 10th law (Score:2)
A variable in Python is a variable as in anything else, but it is a reference to an instance of a type that could be anything -- the referenced instance has a type, as opposed to being some universal type like a string, but it can be reassigned on the fly, and it can be a number, a string, an object instance, a
Re:I am invoking Greenspun's 10th law (Score:3, Insightful)
I've been following this methodology (Python first, then C++ as/where needed) for a number of years. In all of that time, I've only had one application where I ended up needing to drop into C++ at all. In that case, a couple of pages of Python did translate into a couple of pages of C++, virtually line for line. Heavy use of STL allowed this, as there are a lot of data structures and algorithms there that map more-
Re:Here's your best bet. (Score:3, Informative)
Two additional points:
1.) You don't need to replace all the python code.
2.) Use a garbage collector like http://www.hpl.hp.com/personal/Hans_Boehm/gc/ [hp.com] for your C++ code.
They Write the Right Stuff (Score:5, Interesting)
Re:They Write the Right Stuff (Score:3, Informative)
You can still be a balls-out code-monkey if the verification-analysis-requirements-code loop in your organization is well designed.
The part about blame is important, too. The fact that gangs of humans are applying a vast store of partially-learned rules to a purely imagined set of requirements through a skein of lossy transmission lines with any number of distractions means that noise is inevitable, and in something as literal as code for Von
Re:They Write the Right Stuff (Score:3, Insightful)
Follow NASA's advice... http://www.fastcompany.com/online/06/writestuff.html [fastcompany.com]
Your post should have been ranked Informative +10; it is underrated. Those who think they are "professional" programmers ought to read this and memorize it. This thread has so much BS about what is right for making code stable that it just shows how many poorly qualified people are out there. But for other readers---
I have been in both kinds of shops, including the dime-a-dozen, out-of-control, cowboy-mentality workshops
For starters: (Score:2)
http://www.hpl.hp.com/personal/Hans_Boehm/gc/ [hp.com]
This isn't necessarily something you'd have to design around, either. You can add it later.
You need three things (Score:2)
lots of time
lots of money
then you have a chance of pumping out the good product
Don't get too fancy... (Score:5, Informative)
Furthermore, you're crunching large amounts of data, so I'm guessing batch processing. If you can have the application not be a server, then you simplify things a lot. Make it a utility that takes data on standard input and runs whatever analysis you need, and duct tape it together with cron or a simple program that watches for new input files.
Also, I'd like to suggest that you consider whether other languages could be efficient for the task. For example, Java is pretty good numerically, and as far as your libraries go, see if you can use SWIG to generate JNI wrappers. Also, then you get Java RMI.
Next, get them down to one platform. It's *way* easier to develop software with tight constraints on a single platform (versus multiple platforms). Investigate QNX: a reliable operating system (though admittedly quirky) with a beautiful IPC API. In any case, make sure you get a well-tested library with message queues, etc. You don't want to be using raw sockets; you could but that's just another pain in the ass on top of everything else.
Last, figure out what the cost of a failure is. Getting that last few percent of reliability is very very expensive. Unless you're a pacemaker or respirator, the cost of failure is probably not as high as the cost of five nines of uptime.
Don't code to impress. (Score:5, Informative)
When coding something that needs to be stable, you need to set your ego aside and concentrate on the task at hand. Stick with tried and true methods; don't go with any algorithm that you are not 100% comfortable with, even if it makes the code less ugly. Be sure to follow good practices: break the work into many functions/methods, and make each one as simple as possible; it is easier to check each function for bugs when they are simple. Secondly, document the code as if you never want to touch it again (both in the code and outside of it); you want to know what is going on at all times, and the bigger the code gets, the greater the chance you could get lost in it. When working in a team, if you change someone else's code, document that you made the change.
Next, take into account what causes most crashes:
Bad/Overflow memory allocation.
Memory leaks.
Endless loops.
Bad calls to the hardware.
Bad calls to the OS.
Deadlock
If you are going to decouple modules, keep in mind that you will need to do as much processing as possible with minimal message passing, and allow for mirrors so that if one system is down another can take its place without killing the network.
For IPC I tend to like a TCP/IP client/server design, because it offers platform independence and allows for expansion across the network. Or try other server methods, such as a good SQL server where you can put all the shared data in one spot and get it back. But not knowing the actual requirements, it may just be a stupid idea.
I would suggest that you also ask in places other than Slashdot. While there are many experts on this topic here, there is an equal if not greater number of kids who think they know what they are talking about, or who have their ego invested in a particular technology or method.
Agreed. and a few more thoughts. (Score:5, Insightful)
Yes, some of those do conflict. How to keep things simple AND have fault-tolerence, for example. That's where a good design comes in handy, because you can get a better feel for where you should make the trade-off between certainty of working, certainty of working later on and getting some sleep this side of 2008. It's all a matter of weighing the options and investing time in the place most likely to benefit.
(Because everything is a trade-off, anything listed above may not apply. But then, it may not need to. If you've tested a component thoroughly along all boundaries, a good sample of valid conditions and a good sample of erroneous conditions, AND everything has been kept as simple as possible so that really weird cases are unlikely to crop up, then you may decide you can simplify or eliminate fault-tolerant components. There is no point in catching errors that won't occur. In fact, that adds complexity and violates the Keep It Simple rule.)
Oh, and as this is a networked system, testing should include testing network I/O. Use packet generators if necessary, to see how the system handles erroneous packets or massive packet floods. You don't want "perfect" responses (unless you can define what "perfect" means), you want reliable responses. If X occur
Uphill Battle (Score:2, Informative)
To make your program as crash proof as YOU can control you should validate your requirements using Use Cases, minimize Design Complexity, use good C++ programming practices, and do ext
Use state machines (Score:2, Informative)
Here's a great framework to start with:
http://www.quantum-leaps.com/products/qf.htm [quantum-leaps.com]
And the book:
http://www.quantum-leaps.com/writings/book.htm [quantum-leaps.com]
Fault Tolerance Vs. Stability (Score:5, Insightful)
Re:Fault Tolerance Vs. Stability (Score:3, Informative)
Wow. An entire thread devoted to this question, and so far this is the only answer that actually addresses the problem. Every other suggestion seems to be "change languages", or "here's how to avoid bugs".
Anyway, let's talk specifics here. For the theoretical end of software fault tolerance, you can get a quick overview here [ibm.com] or here [cmu.edu].
In terms of practicalities, I know of an older fault tolerance library for Unix that includes watchdog, checkpointing, and replication utilities, and was created by AT&T
be assertive (Score:2, Informative)
Pretend you are writing code for an airplane... (Score:2, Informative)
I've spent over a decade refining how best to create stable, great software. And guess what? I still learn things every day. If you are really new to enterprise-grade software, the best thing you can do is search amazon and choose 3 to 5 great books about writing stable, bug-free enterprise code and just start reading and scheming. Give yourself lots of time. Be neurotic, type-A, attention to every detail, stay up at night wondering how your system could fail and what y
It's simple, really (Score:4, Funny)
Use TPS reports [wikipedia.org]. You'll thank me later.
Small code (Score:2)
I don't know how complex your system has to be, but I'd strip out anything that isn't 100% necessa
Another tool: Microreboots (Score:2)
c.f. http://crash.stanford.edu/ [stanford.edu]
+5 Funny (Score:2, Funny)
C++, automatics without limits will destroy you (Score:2, Insightful)
JimD.
A few tips off the top of my head (Score:2)
*avoid pointer arithmetic
*declare your copy constructors private (with no body) if you don't plan to use them. With this you'll catch unintentional use of the copy constructor through parameter passing.
*Use unit testing and make sure you can regression test your system
*Get a tool such as purify to find memory leaks and use of uninitialized memory
*turn on compiler warnings at their most anal setting
*Create a system to give you a call stack in case of errors (to quickly squash bugs b
Know your client. (Score:2)
But what they really need is a simple solution that is better then what they currently have. This is not an excuse to write sloppy code. But to keep in mind what is needed. If you can get the job done in something s
must steralize biological units... (Score:2)
...
insure that the resulting program is as crash-free as possible?
errrror.... eeeeeeeeeror... (computer explodes)
test with valgrind! (Score:5, Interesting)
It gives you massive amounts of great information about the memory usage of your program.
The other day I spent nearly 3 hours trying to decode what was happening from walking the backtrace in gdb. Couldn't for the life of me figure out what was happening. Valgrind figured out the problem on the first run and after that, I had a solution in a few minutes.
Highly recommended software, and installed by default on several distributions, AFAIK.
Enjoy!
Oh come on... (Score:2, Flamebait)
I know: -1 Flamebait. But really, this is Slashdot. A story with such a minor reference to Windows going without a Windows-bashing comment for this long is just inexcusable.
Forget it. (Score:5, Funny)
Those are low-level programming-jock languages disguised as high-level languages. As long as the punks who program them will have pissing contests in code obfuscation, you can count on having buffer overflows and memory leaks.
Re:Forget it. (Score:3, Interesting)
Unit Testing and Smart Pointers (Score:5, Insightful)
The reasons? A unit test suite that implements several million test cases (mostly pseudo-random probes -- the actual test code is about 1/3 the size of the functional code). In fact, the "defects" that hit production were more "oversights"; stuff that didn't get accounted for and hence didn't get implemented.
Just as importantly; every dynamically allocated object just got assigned to a "smart pointer" (see Boost's boost::shared_ptr implementation).
Quite frankly, compared to any Java implementation I've seen, I can't say that "Garbage Collection" would give me anything I didn't get from smart pointers -- and I had sub-millisecond determinism, and objects that destructed precisely when the last reference to them was discarded. The only drawback: loops of self-referencing objects, which are very simple to avoid, and dead trivial if you use Boost's Weak Pointer implementation.
We didn't have access to Boost (which I Highly Recommend using, instead of our reference counted pointer) when we first started the project, so we implemented our own Smart Pointers and Unit Testing frameworks [2y.net].
I've since worked on "Traditional" C++ applications, and it is literally "night and day" different; trying to do raw dynamic memory allocation without reference-counting smart pointers is just insane (for anything beyond the most trivial algorithm). And developing without Unit Testing feels like being beaten with a bat, with a sack tied around your head...
Congratulations! Nice Work! (Score:5, Funny)
From zero to flame war in under 20 words. Well done!
my experience (Score:3, Informative)
I've dealt with software that automatically restarts a dead process, and in my experience, it doesn't work very well. If you want ultra-stable software, you want to know what caused the crash and why.
For your situation, where I guess you're doing lots of time consuming computing, I'd think you should also set checkpoints, save intermediate results, or something, so if it does crash, you can restart in the middle instead of going back to 0. (A standard practice when I was analyzing large databases for corruption, a task that could take days)
Don't reinvent the wheel (Score:5, Insightful)
Next, spend some time upfront on your design, with things like use cases, sequence diagrams, and other visualization tools to help you understand just what you want to happen in best case situations as well as failures. The level of detail/formality required is a moving target, so update as needed. You should have a solid error detection/correction plan so that you can design each component to follow it. Also design for test and with logging - it will help you while debugging, while testing, and while fixing the bug the customer is seeing.
Make sure management will allow sufficient time for testing. A lot more lip service goes into support for testing than actual schedule and money. Your test plan should be as bulletproof as your design.
That's my 2 cents. And a random book recommendation: books like Scott Meyers' "Effective C++" provide info on effective/error-reducing ways to use the language/libraries, but won't help you get started with the architecture.
robust software (Score:5, Interesting)
As a result, for ten years Apple technical support would tell customers experiencing unexplained system problems to run the Graphing Calculator Demo mode overnight, and if it crashed, they classified that as a *hardware* failure. I like to think of that as the theoretical limit of software robustness.
Sadly, it was a unique and irreproducible combination of circumstance which allowed so much effort to be focused on quality. Releases after 1.0 were not nearly so robust.
good coding techniques (Score:5, Informative)
As for GUI programming, if you are strictly tied to C++, I would recommend Qt (www.trolltech.com); they have a fabulous API (takes getting used to, but it makes sense once you do). The nice part about Qt is that it is source-portable to just about every major platform (X11, Win32, Mac).
It is possible to write reliable, fault-tolerant code in C++ (realize, please, that perfect code is impossible in any language); it just has to be well thought out and done right.
proxy
A few guidelines (Score:5, Insightful)
1. Unless the GUI will be I/O bound, and that's unlikely, try to write it in a safer language that has better GUI support.
2. Make all your classes small and simple, and create test harnesses that are as complete as possible. Try to make the classes simple enough that they can be individually tested in such a way that all code paths are exercised.
3. Check your arguments. This includes checking for invalid combinations, and arguments that are invalid given the state of the object.
4. Don't use new or pointers directly. If there may be multiple references to an object, then reference count it and create handle classes that hold the references so all instantiation is controlled, and all destruction is implicit. Make these handles STL compatible, and never pass around pointers to them.
5. Try to design the application to fail fast and recover from failure. For example, maintain the state of work being done in discrete transactions that can be aborted if a failure is detected. This can be on disk or in memory depending on your performance needs. This could be combined with the ability to restart the app in a new process and have it pick up where the last one left off.
6. Have the app keep track of its memory usage, and be prepared to recover from memory leaks, possibly by restarting as in item 5.
7. If the compiler you're using supports structured exceptions, then use them. They can degrade performance a bit, but they can also enable you to recover from NULL pointer exceptions.
8. If you have multiple threads, then to avoid both the performance hit from context switches and the chance of deadlocks, don't let them access the same data directly. Instead, have them communicate through lock free queue structures. That way, all your main threads can pretty much spin freely. Spawn worker threads for any I/O or other operations that can block. A context switch can take as much time as thousands of instructions. You want to use as much of every time slice as possible.
9. Keep the number of main threads down to the number of CPU's or less. That way, except for the times when the CPU is being used by the OS or other processes, (should be relatively rare) each non blocked thread gets its own CPU.
10. Have an experienced QA team, that understands their job goes beyond unit testing.
Now here's a few that are always important, but for what you want to do, they become critical.
11. Have the design laid out at least roughly before you start.
12. If at all possible, don't let requirements change in midstream.
13. Overestimate the time it will take very generously. You will probably still be crunched.
Recoverability (Score:4, Insightful)
It's simple to write a 100% correct program that checks the health of your main application and restarts it when it isn't responding.
Nooo!!! No separately restartable modules!! (Score:3, Interesting)
No, I'd go for:
* A "monolithic" application with module separation provided by OO design. At least you know that either your whole application is there, or it isn't. No inconsistencies between modules because of individual module re-starts, and if the app breaks, restart the whole thing. Starting the app is the code path you've tested, restarting separate modules usually isn't (and even if it were, there's usually 2^27324 different situations to test, i.e., all possible combinations of modules failing in any sort of way).
* Use smart pointers exclusively, preferably Boost's shared_ptr. Use weak pointers (Boost provides an implementation for that as well) to automatically break reference cycles.
* For error handling, use exception handling exclusively. Incredibly many bugs are caused by ignored return codes.
* Use "auto" objects for all resources that you acquire and that need to be released at the end of a code section. Cleanup that doesn't happen when a code path encounters an exception can cause resource leakage, instability and hangups (locks, anyone?). In my programming practice, when I allocate a resource (memory, refcount, open/close a recordset, etc.), I always wrap it in an auto object immediately, so that I can forget about managing it through all the code paths that follow.
* Use the correctness features that the language provides: write const-correct code from the start.
* Use automated testing right from the start, both unit testing and integration testing. If you don't do this, you will be forever tied to whatever bad design decisions you make in the first months of the project. Automated testing allows you to always make large implementation changes, giving you confidence that it will not break existing behaviour.
Coincidence from the toilet (Score:4, Insightful)
In a desperate rush for some reading material for the toilet, I grabbed what must be a 5-year-old C/C++ Users Journal from a storage room. The theme of that month's issue was MULTITHREADING.
I thumbed through it and came across an interesting article ``ALWAYS HANDLED ERROR CODES''. The idea being that a lot of errors can go undetected because programmers are lazy about checking return values. And why not, who bothers checking printf()'s return value, for instance?
Simple enough design. The object constructor sets the result, the destructor will abort() the application if the Checked variable is false. The overridden == and != operators evaluate the result, and also set the Checked variable.
In your functions, instead of return SUCCESS; you write return ErrorCode(SUCCESS);
Wondering if anybody does this. If I needed something ULTRA STABLE I guess I might...
Use STL; avoid pointers; avoid fixed buffers. (Score:4, Informative)
I'm sure that there's tons I've left out, but this has worked reasonably well for me. The only problem is that STL can be slow. Sure, map may be O(log(n)), but the constants are huge. Unfortunately, for practical reasons, performance and security are often inversely proportional.
Have you have flown on a commercial airline? (Score:4, Informative)
There is a standard called DO-178B Level A that applies to aircraft software upon which lives depend. There is a saying in the commercial avionics business: "Nobody has ever died from a software failure on an airplane, yet." There have been some accidents where software played a role, but I won't quibble with that now.
The point is that safety-critical software is developed routinely. It has been developed in assembly language. It has certainly been developed in Ada, C, and subsets of C++. It is expensive. Validation of avionics software and certification in an aircraft can easily cost an order of magnitude more than just writing the software, and writing the software using the required processes and producing the required artifacts is not cheap either.
QuickCheck (Score:3, Informative)
The idea is simply to define the "space" of legal inputs for each module and the correctness criterion for each input, and then generate random inputs based on the spec. This is far more effective than traditional hand-coded test data at both unit and system test levels, and as an added bonus the test spec doubles as a formal specification of the correct behaviour that coders can actually work from. This is similar to the XP practice of "test-driven development".
Paul.
Sorry, *not* in C++ (Score:4, Interesting)
Re:Sorry, *not* in C++ (Score:3, Informative)
You might want to let them know about that.
Aeralib [faa.gov]
Re:Get another programmer (Score:5, Funny)
Re:XP (Score:4, Insightful)
Extreme programming is your worst enemy on this one. If you need a system that is truly reliable, you cannot take an approach that fundamentally bases its quality controls on a finite number of tests, unless you can test absolutely every possible set of inputs your program can ever receive (legitimately or otherwise).
Testing is good, of course, but for this sort of job, you must have a proper design, such that all components can be properly verified. (And of course, you must have a proper spec against which to verify.) The XP methodology is pretty much the antithesis of what's needed here.
Re:XP (Score:2, Informative)
Exactly. Three words: Test Driven Development.
Since you're tied to C++, may I suggest CppUnitLite2 1.1 [gamesfromwithin.com]...
It's incredible how much more productive you can be writing the tests first (contrary to what you might think initially). I hardly ever need a debugger anymore, and I know that the
Re: Consider Ada (Score:3, Informative)
Yes, it's very frustrating when you first start using it because it makes you say what you mean and mean what you say, but once you get over that you find yourself writing programs where certain kinds of bugs simply never happen.
The strong typing (no, strong, not the fluff some other languages call "strong") an