Ultra-Stable Software Design in C++? 690
null_functor asks: "I need to create an ultra-stable, crash-free application in C++. Sadly, the programming language cannot be changed due to reasons of efficiency and availability of core libraries. The application can be naturally divided into several modules, such as GUI, core data structures, a persistent object storage mechanism, a distributed communication module and several core algorithms. Basically, it allows users to crunch a god-awful amount of data over several computing nodes. The application is meant to primarily run on Linux, but should be portable to Windows without much difficulty." While there's more to this, what strategies should a developer take to ensure that the resulting program is as crash-free as possible?
"I'm thinking of decoupling the modules physically so that, even if one crashes/becomes unstable (say, the distributed communication module encounters a segmentation fault, has a memory leak or a deadlock), the others remain alive, detect the error, and silently re-start the offending 'module'. Sure, there is no guarantee that the bug won't resurface in the module's new incarnation, but (I'm guessing!) it at least reduces the number of absolute system failures.
How can I actually implement such a decoupling? What tools (System V IPC/custom socket-based message-queue system/DCE/CORBA? my knowledge of options is embarrassingly trivial :-( ) would you suggest should be used? Ideally, I'd want the function call abstraction to be available just like in, say, Java RMI.
And while we are at it, are there any software _design patterns_ that specifically tackle the stability issue?"
Here's your best bet. (Score:5, Interesting)
2. Once it's bullet-proof, replace each function and object with C++ code.
3. Profit.
They Write the Right Stuff (Score:5, Interesting)
Re:You're not the first one.... (Score:4, Interesting)
And, in visual studio
Try the following (Score:1, Interesting)
1. Hire good programmers.
2. Make sure that EVERY function is defined with a specification, describing everything within the function. This allows you to debug much more easily.
3. Make sure that you've got all requirements written.
4. Try not to use fancy stuff such as function pointers, pointer dereferencing, etc. Not all programmers are geniuses.
5. One good practice: if you allocate memory in an object, make sure that the same object is responsible for deallocating it. This is common practice.
6. For IPC, try not to use shared memory. A message queue makes your work easier because of its guaranteed delivery. Try MQ Series or something similar; it provides a robust mechanism for transferring and retrying data, and it is worth the money. It is compatible with both Windows and Linux.
7. Stick to ANSI C++ to ensure compatibility.
8. Use a portable UI toolkit such as Qt.
9. Test, test and test. Peer-review code.
10. Establish a naming convention for variables and classes.
Good call (Score:2, Interesting)
Executive summary of this post: Keep it simple. As simple as it can be while getting the job done. The more buzzwords you think about implementing, the more you need to reconsider whether you really need that whiz-bang feature.
You need to abstract your design into really independent layers, such that the backend processing can be done across Linux, Windows and even BeOS slaves simultaneously, and the frontend is viewable via a web interface, fed into Excel or whatever. You can't look at this as one big project, but as many independent (and more easily verifiable!!) applications cooperating with each other.
My impression from the description is that you want a system like folding@home for corporate customers - they have a whole heap of data they want analyzed (parallel workload across many clients) and a small subset of results they're interested in. Don't make things any more complicated than they have to be - the data sets could simply be files that are partitioned by a master, sent out when requested to client workhorse computers, getting there by http, nfs or whatever, processed, and the results returned into an incoming directory for a simple frontend to tabulate.
The biggest mistake you could make is having one gargantuan application in charge of everything. The race conditions will drive you mad, be they in data access, allocation, retrieval, dispatch or anything else you're trying to manage that the OS could do for you.
Just look at Froogle. Their millions upon millions of store/price listings are fed by people ftp'ing a feed of tab-separated text values.
test with valgrind! (Score:5, Interesting)
It gives you massive amounts of great information about the memory usage of your program.
The other day I spent nearly 3 hours trying to decode what was happening from walking the backtrace in gdb. Couldn't for the life of me figure out what was happening. Valgrind figured out the problem on the first run and after that, I had a solution in a few minutes.
Highly recommended software, and installed by default on several distributions, AFAIK.
Enjoy!
Re:Forget it. (Score:3, Interesting)
robust software (Score:5, Interesting)
As a result, for ten years Apple technical support would tell customers experiencing unexplained system problems to run the Graphing Calculator Demo mode overnight, and if it crashed, they classified that as a *hardware* failure. I like to think of that as the theoretical limit of software robustness.
Sadly, it was a unique and irreproducible combination of circumstance which allowed so much effort to be focused on quality. Releases after 1.0 were not nearly so robust.
Re:You're not the first one.... (Score:2, Interesting)
Re:You're not the first one.... (Score:1, Interesting)
*cough* *choke* WHAT?
C++ takes a lot of platform-specific work to become portable, unless you're using portable libraries to do it for you. C#, if done well, is already portable -- and what's more, has Java's "Compile once, run anywhere", to some extent -- look at Mono.
C++ has lots of nice GUIs, and so does C. C# has some nice GUIs, but is lacking a few major things, like OpenGL. You can write bindings for them, but in C++, nobody has to write bindings, because almost all GUIs are natively C or C++, and must have bindings written for anything else.
And a well designed C# program is easier to write than a well designed C++ program. If your algorithm is broken, sure, nothing will work, but if it's your implementation that needs work, C# is a big help.
Now, if only someone could tell me why C# is better/worse than Java?
Re:inline code (Score:3, Interesting)
It worked amazingly well. There's a little bit of interfacing work that needs to be done, but I found that, in that project at least, the C code didn't need to be modified very often.
It very often DOES simplify to use two programming languages.
Nooo!!! No separately restartable modules!! (Score:3, Interesting)
No, I'd go for:
* A "monolithic" application with module separation provided by OO design. At least you know that either your whole application is there, or it isn't. No inconsistencies between modules because of individual module re-starts, and if the app breaks, restart the whole thing. Starting the app is the code path you've tested, restarting separate modules usually isn't (and even if it were, there's usually 2^27324 different situations to test, i.e., all possible combinations of modules failing in any sort of way).
* Use smart pointers exclusively, preferably Boost's shared_ptr. Use weak pointers (Boost provides an implementation for that as well) to automatically break reference cycles.
* For error handling, use exception handling exclusively. Incredibly many bugs are caused by ignored return codes.
* Use "auto" objects for all resources that you acquire and that need to be released at the end of a code section. Cleanup that doesn't happen when a code path encounters an exception can cause resource leakage, instability and hangups (locks, anyone?). In my programming practice, when I allocate a resource (memory, refcount, open/close a recordset, etc.), I always wrap it in an auto object immediately, so that I can forget about managing it through all the code paths that follow.
* Use the correctness features that the language provides: write const-correct code from the start.
* Use automated testing right from the start, both unit testing and integration testing. If you don't do this, you will be forever tied to whatever bad design decisions you make in the first months of the project. Automated testing allows you to always make large implementation changes, giving you confidence that it will not break existing behaviour.
Re:You're not the first one.... (Score:3, Interesting)
* No (or very, very limited) side-effects. In other words the result of a function is not dependent on the current program state. Once it is exhaustively verified in testing, that function will forever more return the correct results because the run-time state won't affect it.
* The language itself can often be treated as a specification of correctness, and even formally proved through static analysis. As a trivial example if you write an implementation of factorial in Haskell, it strongly resembles the mathematical definition of factorial -- the code is more of a description of what the correct result is, rather than a set of low-level steps for carrying out the computation as in C.
Haskell is nice, however I think the original questioner is better off with something like Erlang, which was designed for just this kind of situation. If it's good enough for telephone switches...
misc. advice and a small rant? (Score:3, Interesting)
Where do people get this idea? I have ported quite a few applications, and usually the porting is done by locating the libraries you need on the new platform and fixing a few oddities in the current platform (like closing sockets on z/OS, or switching to unsafe multitasking (p-threads) on Windows). Porting to Linux is so trivial that I often do it just to get access to the superior tools available there, especially valgrind. GUI is the exception, of course, unless you use a cross-platform kit from the beginning.
Which leads me to my recommendations, in no particular order
The above approach works for me. Your mileage may vary.
Re:You're not the first one.... (Score:5, Interesting)
Re:Bullshit (Score:3, Interesting)
I agree.
This is by nature one of the biggest strengths of C and C++; how someone could conclude that using C++ adds some sort of complexity to cross-platform development actually amazes me.
If it adds complexity, in comparison to what? I would like to see the poster above you explain what is actually easier to use for diverse application development and actually better at cross-platform work.
And if they start with Java, la la, then they need to get a life and see what JAVA is built upon itself.
C and C++ are a great solution for cross-platform development; look at the nature of Linux, BSD, and even NT, and then ask why they are as portable as they are. Do people think these OSes would be more portable in another language?
Take Care.
Re:You're not the first one.... (Score:2, Interesting)
Do some googling, and see that Microsoft says that "managed code" is something that is executed by the
Re:I don't know why this dominates the first page. (Score:3, Interesting)
Not understanding something is one thing, but rejecting something you don't understand as "elite-ivory-tower" and "academic" is another. I've seen a lot of buggy C++ code rewritten in this style in the obvious places - many defects were automatically addressed.
I disagree. Reality is reality. Algorithmic or functional are just ways people look at it. Isn't "algorithmic" also abstract? Isn't "object-oriented" abstract as well?
By the way, using your vocabulary, I view the world as a mixture of "algorithmic" and "functional". No pure anything can describe the world, in my opinion.
Being functional or algorithmic has *NOTHING* to do with one being "more mathematical" and the other "less mathematical". And I advise you that playing on people's fear of mathematics in your arguments is not going to help.
Templates, being code generators, differ by nature from hand-tuned code. So your code generates only 33 instructions vs the template's 88. Great - now tell me - which architecture? What compiler? What version of that compiler, and whose STL are you using, and which version of THAT?
And before you count the instructions, did you realize that this code is waiting for keyboard inputs, therefore what you're doing is unnecessary (and obviously premature) optimization?
How does the constant use of auto_ptr relate to the understanding (or the lack thereof) of object lifecycle? Sorry, but understanding object lifecycle and the liberal use of smart pointers are not mutually exclusive.
It is fashionable *among incompetent* Java developers to create objects left and right, which makes their programs memory hogs. It is also fashionable for *incompetent* C++ programmers to forget deallocations, leaking memory. What's your point? This mentality, immature or not, is not unique to managed languages.
Sorry, *not* in C++ (Score:4, Interesting)
Re:While I can certainly respect your opinions, (Score:4, Interesting)
> for_each(components.begin(), components.end(), _1.disable())
It is never that simple. The fact that you can't do what you've typed is one of the reasons I dislike it so much. What you really need is:
Things suddenly got uglier, didn't they? But wait, what if you need to call a function with an argument? Gotta use a bind2nd adaptor to wrap it, and then it becomes:
Wait 'till you try to explain to some maintaining programmer how to untangle that! Oh, and just for laughs, try to debug this thing. Put an assert in SetParameter, and you get a lovely callstack from gdb:
Now that's something to scare newbie programmers with! Oh, and forget about putting a breakpoint inside the loop; templated functions aren't targetable until executed.
> in some code I need to maintain then to encounter
> for(i = 0; i < components.count(); ++i) components[i].disable()
So why not just use an iterator loop? for_each does not have a monopoly on it:
(foreach is a macro I wrote because I use this construct so often)
> first form permits, for instance, components to be a linked list or even a hash.
> The second is implementation-dependent and if you change the underlying data
> structure, you'll have extra work to refactor.
If you use iterator loops, this wouldn't happen to you.
> I once worked, changing all instances of SomeObject* to auto_ptr
> eliminated altogether 35 bugs we had lurking in the BTS for a long, long time,
> with less than one day of work (strange, delayed, errors were suddently
> transformed in EARLY null-pointer dereferences
Why were you using SomeObject* in the first place? When I was advocating moderation in the use of auto_ptr, I wa
How I do it... (Score:1, Interesting)
You have to think really hard about state, and do everything as a two-phase commit. Network connections to applications that don't resume cleanly are particularly tricky; you have to save and restore the sequence numbers across the crash/reboot. This requires NOT using the OS TCP implementation, or hacking it heavily to not send the ack until you've committed the state produced by the packet reception.
I have crashing bugs happily running in production, because it gets back up and keeps going with no problems. The bad problems are reboot loops, when the "resume from crash" code crashes. You have to be very paranoid there.
But it really does work remarkably well. Oh, one more tip: add a version number to your state files. Any time you don't change it, you can perform a software upgrade in-place by crashing the old version and letting the new one resume. Otherwise, you have to have scheduled downtime for every upgrade.