Debugging Asynchronous Applications?
duncan bayne asks: "I'm attempting to debug a complicated telephony application, written in C#, that's almost entirely event driven. This is the first time I've debugged a large asynchronous application that isn't a GUI, and I'm curious to know what advice the Slashdot crowd has to share - have you any recommended tools, best practices, or common pitfalls to avoid?"
VS2005 (Score:5, Informative)
One other suggestion... "event bus" apps like the one you describe are good candidates for capturing as much runtime data as possible, so make sure you adjust your build parameters and do as much of that as possible, especially in problem assemblies. Oh, and don't forget to build NUnit tests. It sounds like you're walking into some prewritten code, but the effort might be worthwhile.
Re:VS2005 (Score:2)
First get the interfaces set up properly so that you know each function is getting the right data.
Then work on full functionality (disabling locking if need be; it only needs to work the first time, and it's OK if you have to reboot everything before it will work again).
Then work on timing / locking issues (logging each lock is a good idea).
Then work on cleaning up, i.e. releasing resources.
Then start disabli
Re:VS2005 (BlueJ-based "innovation") (Score:2)
A Stab at Some Solutions & Strategies (Score:4, Informative)
Also, as I recall from my days of drudgery at college, create tons of output.
So I will suggest as a preliminary requirement that you create a nice logging system (if you haven't done so already). I haven't written much C# so I'm going to be talking abstractly. Hopefully the rest of Slashdot can help with the specifics to C#. Now, what I mean is that you should create a class that just creates an output log file that you can read for output later. I don't mean to put a message for every packet sent but maybe it wouldn't hurt to put a message for each stream or connection opened. It's going to help for you to generate random IDs for each call and to put the destination/receiving IP:Port in your log. This would most likely be helpful with a server. It also will be helpful to store printlns in your code (redirect standard out to the logger).
Now use this on every machine in the system. If one machine should start to give you problems, create a mutual exclusion on this log (or put all of the log entries in critical regions). In Java, you can use object locks or the synchronized keyword--in C# I'm pretty sure they have something similar. Just because it's not a GUI doesn't mean you can't record output.
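A minimal sketch of the logger described above, written in Python since the poster was speaking abstractly. The class name `CallLogger`, the 8-character ID length, and the example addresses are made up for illustration; in C# the `with self._lock` block would correspond to a `lock` statement.

```python
import io
import threading
import uuid


class CallLogger:
    """Append-only log shared by many threads; a lock guards each write."""

    def __init__(self, stream):
        self._stream = stream
        self._lock = threading.Lock()

    def new_call_id(self):
        # One random ID per call, so its lines can be grepped out later.
        return uuid.uuid4().hex[:8]

    def log(self, call_id, endpoint, message):
        line = f"[{call_id}] {endpoint} {message}\n"
        with self._lock:  # critical region: one writer at a time
            self._stream.write(line)


def demo():
    out = io.StringIO()  # stands in for the log file
    logger = CallLogger(out)

    def worker(n):
        call_id = logger.new_call_id()
        endpoint = f"10.0.0.{n}:5060"  # hypothetical IP:port
        logger.log(call_id, endpoint, "connection opened")
        logger.log(call_id, endpoint, "connection closed")

    threads = [threading.Thread(target=worker, args=(n,)) for n in range(4)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return out.getvalue().splitlines()
```

Each connection gets one "opened" and one "closed" line carrying the same ID, which is what makes the interleaved output readable afterwards.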
Just a friendly warning: time stamping is usually worthless unless you have a logical clock scheme (e.g. a Lamport clock [wikipedia.org]) set up (which usually requires lots of time on one's hands). You could shoot for an NTP server but I wouldn't trust the accuracy past 500 ms. If you absolutely need a clock scheme, I recommend having one machine on the network tick tock an increasing number that is reflected in all the logs. Make the time between ticks adjustable--this way you'll be able to check out events roughly relevant to these ticks (assuming the time it takes to get there is similar).
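For reference, a Lamport clock is small enough to sketch in a few lines. This is the textbook rule (increment on local events and sends, take max-plus-one on receipt), not anything specific to the poster's system; the class and method names are illustrative.

```python
class LamportClock:
    """Logical clock: orders events without trusting wall-clock time."""

    def __init__(self):
        self.time = 0

    def tick(self):
        """Local event or message send: advance the counter."""
        self.time += 1
        return self.time

    def receive(self, sender_time):
        """Message receipt: jump past the stamp carried by the message."""
        self.time = max(self.time, sender_time) + 1
        return self.time


def demo():
    a, b = LamportClock(), LamportClock()
    a.tick()                      # a does some local work
    stamp = a.tick()              # a sends a message stamped with its clock
    received = b.receive(stamp)   # b receives it and moves ahead of the stamp
    return stamp, received
```

Writing the returned value into every log line gives an ordering in which a receive always sorts after the matching send, whatever the machines' real clocks say.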
In the end, your best tool is your brain. Design tests and double-check the logs on each machine to see that the linear time sequence of relative events is correct. Logic will be your only friend in this journey. Don't be afraid to kick off more threads on the client side if they don't need to share resources. If you have a server side, be careful in how many threads you have and make sure you realize what memory scope they're limited to.
For the love of god, if you use ports--don't forget to free them when you're done using them!
Unfortunately, Nornir [www.sics.se] is not OSS
Good luck! Happy debugging!
Re:A Stab at Some Solutions & Strategies (Score:2)
async, you can end up serializing your app. I.e., don't
hold up the caller on logging.
Also, there are application blocks for logging in C#.
I don't recall the name of it off the top of my head.
Another way to solve the timing issue: make it so that you have one
app in your system to receive and record the messages.
That one app should timestamp the message when it comes in,
put it in a queue, and release the caller. Then you don't
have to worry about
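A rough in-process sketch of the "single recorder" idea: callers only enqueue, and one background thread stamps and stores. `LogRecorder` is a made-up name, the `records` list stands in for a log file, and an arrival counter is used here in place of a wall-clock timestamp (both are assumptions of this sketch).

```python
import itertools
import queue
import threading


class LogRecorder:
    """Callers enqueue and return; one thread stamps and stores."""

    def __init__(self):
        self._queue = queue.Queue()
        self._arrival = itertools.count(1)
        self.records = []  # stand-in for the log file
        self._thread = threading.Thread(target=self._run, daemon=True)
        self._thread.start()

    def log(self, source, message):
        # Caller-side cost is a single queue put; no I/O happens here.
        self._queue.put((source, message))

    def _run(self):
        while True:
            item = self._queue.get()
            if item is None:  # sentinel posted by close()
                break
            source, message = item
            # The queue preserves arrival order, so stamping here gives
            # every message a position in one global sequence.
            self.records.append((next(self._arrival), source, message))

    def close(self):
        self._queue.put(None)
        self._thread.join()
```

Because only the recorder assigns stamps, the clocks of the individual senders never need to agree.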
Re:A Stab at Some Solutions & Strategies (Score:1)
Re:A Stab at Some Solutions & Strategies (Score:2)
(Yes, log4net and log4j are practically identical, and I can always use its documentation, but this was a problem with the EventLogAppender, which log4j doesn't have...)
Re:A Stab at Some Solutions & Strategies (Score:2)
Ok, so you're either smarter than Dr. Mills or smoking crack - I'd bet the latter. What crevice did you pull your 500ms threshold from? Even a stratum 10+ timeserver should give you a relative accuracy better than that as long as it's in a room with a stable temperature.
Re:A Stab at Some Solutions & Strategies (Score:2)
Re:A Stab at Some Solutions & Strategies (Score:2)
If you need to test network events never use a real network in the first instance. Do it with a simulated network like BSD DUMMYNET and configure NTP to pass through unmolested. This will allow you to introduce arbitrary delays, packet loss, jitter and bandwidth constraints while retaining nearly perfect synchronisation between systems.
Don't use printf (Score:5, Informative)
I had this one project. It was to build a model car, not related to programming at all. I started out doing well, following the instructions and generally getting along fine. But then I lost patience with the tedium and left to get a beer to relax. When I finally got back to working on the model, I found that the dog had chewed it up and the wife had thrown it out (the trashed model, not the dog, but she'd love to throw out the dog too). I left it in the garage where I thought it would have been safe, but I guess you can't expect things to stay the same if you leave it sitting there for a year and a half.
The moral of the story is that if I look to see where things went wrong, it was the point where I lost patience and decided to do something different than what I should have been focused on. This is like how many people try to put breakpoints all over their code rather than where they should put them. Don't debug willy-nilly and expect to make any good progress. But also don't try to throw in some seemingly helpful actions (like printf) because it may end up changing the whole state of the program.
Re:Don't use printf (Score:2, Informative)
Re:Don't use printf (Score:2)
However more
Re:Don't use printf (Score:2)
Debugging Asynchronous Applications For Dummies (Score:3, Funny)
Stand up and slowly back away from the keyboard.
Chapter 2.
There is no chapter 2.
Reference (Score:5, Informative)
Logfiles (Score:5, Interesting)
The trick is to learn how to correlate information between different logfiles to build up a picture of how all the components (process or thread) behave together. The classic Unix utilities like find, grep, awk, cut and less are your friends.
Re:Logfiles (Score:2)
You do not debug a complex network application (or any other asynchronous application) via a debugger.
You log it.
Further on that, it is important to have selective logging. An example of good logging is recent sendmail whose logging can be selectively tuned and turned on and off for various parts of the application.
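As an illustration of selective logging, here is a sketch using Python's standard `logging` hierarchy, where each subsystem gets its own named logger whose level can be changed at runtime. The logger names (`telephony`, `sip`, `media`) are hypothetical; log4net, mentioned elsewhere in this thread, is the usual place to look for the C# equivalent.

```python
import io
import logging


def configure(stream):
    """Send everything under 'telephony' to stream, warnings and above only."""
    base = logging.getLogger("telephony")
    base.handlers.clear()
    base.propagate = False  # keep the sketch away from the real root logger
    base.addHandler(logging.StreamHandler(stream))
    base.setLevel(logging.WARNING)


def set_verbose(subsystem, enabled):
    """Turn debug output for one subsystem on or off at runtime."""
    level = logging.DEBUG if enabled else logging.NOTSET  # NOTSET = inherit
    logging.getLogger(f"telephony.{subsystem}").setLevel(level)


def demo():
    stream = io.StringIO()
    configure(stream)
    sip = logging.getLogger("telephony.sip")
    media = logging.getLogger("telephony.media")
    set_verbose("sip", True)
    sip.debug("INVITE received")   # emitted: sip is verbose
    media.debug("RTP packet")      # dropped: media inherits WARNING
    set_verbose("sip", False)
    sip.debug("ACK received")      # dropped again
    return stream.getvalue()
```

Only the subsystem under suspicion pays the cost of debug output; everything else stays quiet.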
Re:Logfiles (Score:2)
Also, consider the likely need for time synch within your infrastructur
Re:Logfiles (Score:2)
Friends: Even though I find the MSVC search tool very useful, one of the first things I do when setting up a development box at work is to dow
Re:Logfiles (Score:2)
Re:Logfiles (Score:2)
If you haven't seen this project, check out http://unxutils.sourceforge.net/ [sourceforge.net]. These tools end up on every Windows machine I use. Make sure to grab the updates. Especially helpful on Windows are the pclip and gclip utilities. Copy any text to the Windows clipboard and the pclip utility will output the clipboard contents to STDOU
Re:Logfiles (Score:2)
We have such a large (multiplatform) telephony application, and it is capable of generating huge logfiles if necessary (think multiple GB per day, and that is already compressed).
Some hints on that.
Write a tracing component that every other component talks to. This tracing component will be responsible for:
1. after receiving a certain number of lines, analysing whether the logfile is interesting enough to be written to disk (i.e. are there errors in there?)
2
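The first responsibility in that list (buffer a batch of lines, then keep it only if it contains something interesting) could look roughly like this. `TraceBuffer`, the `"ERROR"` level string, and the list used as a sink are assumptions of the sketch; a real component would also need locking if several threads call it.

```python
class TraceBuffer:
    """Collects trace lines in memory and persists only interesting batches."""

    def __init__(self, capacity, sink):
        self.capacity = capacity
        self.sink = sink      # list standing in for the on-disk logfile
        self._lines = []

    def trace(self, level, message):
        self._lines.append((level, message))
        if len(self._lines) >= self.capacity:
            self._flush()

    def _flush(self):
        # Keep the whole batch if any line in it is an error, so the
        # context leading up to the error is preserved; otherwise drop it.
        if any(level == "ERROR" for level, _ in self._lines):
            self.sink.extend(f"{level} {msg}" for level, msg in self._lines)
        self._lines = []
```

With a capacity of 3, three INFO lines vanish, while INFO, ERROR, INFO is written out in full.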
Re:Logfiles (Score:2)
10:18:27.283 > main(a,b,c)
10:18:27.294 > HelloWorld(a,b)
10:18:27.301 Written 7 bytes
10:18:27.307 < HelloWorld returned 3
10:18:27.312 > GoodByeWorld(c)
10:18:27.401 c == NULL
10:18:27.416 < GoodByeWorld returned 7
10:18:27.472 < main() return
When using multiple threads, adding your thread name in there will also help, of co
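One way to get entry/exit lines like the ones above without hand-writing them is a wrapper around each function. This Python decorator is only a sketch: `traced`, `emit`, and the in-memory `TRACE` list are made-up names, and the thread name is added in brackets as the poster suggests.

```python
import functools
import threading
from datetime import datetime

TRACE = []  # stand-in for the log file


def emit(text):
    stamp = datetime.now().strftime("%H:%M:%S.%f")[:-3]  # millisecond precision
    TRACE.append(f"{stamp} [{threading.current_thread().name}] {text}")


def traced(func):
    """Log a '>' line on entry and a '<' line on return."""
    @functools.wraps(func)
    def wrapper(*args):
        emit(f"> {func.__name__}({', '.join(map(repr, args))})")
        result = func(*args)
        emit(f"< {func.__name__} returned {result!r}")
        return result
    return wrapper


@traced
def hello_world(a, b):
    emit("Written 7 bytes")
    return 3
```

Calling `hello_world(1, 2)` appends three lines to `TRACE`: the entry, the message from inside the function, and the return value.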
Re:Logfiles (Score:2)
Does anyone know of any "plug-in" logging systems? This particular wheel has been reinvented enough times.
Re:Logfiles (Score:2)
Re:Logfiles (Score:1)
sort *.log > fulllog.txt
All my logs have microsecond precision, and they tend to grow at around 50 lines per second. Still, sort does fine on many 100+MB log files. This is INVALUABLE to solving strange timing bugs.
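If you also want to know which file each line came from after merging, a small script can do what `sort` does and tag the source as well. A sketch, assuming every line starts with a fixed-width timestamp that sorts correctly as text and that each log is already in time order; `merge_logs` and the sample lines are illustrative.

```python
import heapq


def merge_logs(named_logs):
    """Merge per-component logs into one timeline, tagging each line.

    named_logs maps a source name to its lines, each already in time order.
    """
    streams = []
    for name, lines in named_logs.items():
        # Pull the leading timestamp out as the sort key.
        streams.append([(line.split(" ", 1)[0], name, line) for line in lines])
    merged = heapq.merge(*streams, key=lambda entry: entry[0])
    return [f"{name}: {line}" for _stamp, name, line in merged]


def demo():
    return merge_logs({
        "client": ["10:18:27.283 dial", "10:18:27.401 hangup"],
        "server": ["10:18:27.294 accept", "10:18:27.416 release"],
    })
```

The text comparison only holds while all stamps share one format and one day; logs that cross midnight need the date in the timestamp too.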
Re:Logfiles (Score:2)
Re:Logfiles (Score:1)
http://logging.apache.org/log4net/ [apache.org]
Mock Objects (Score:3, Insightful)
The usual rules apply (Score:3, Insightful)
Look for invariants and ensure that they hold true where they are supposed to. That doesn't require fine analysis, but will detect problems in the logic when you're running the full system. Use profilers and coverage analysers to make sure that when you DO do invariant checks that you're actually checking all the areas they're supposed to hold up.
Test "normal" values, borderline/extreme values (that's where overflows, underruns and other assorted nasties are likely to show up), and completely erroneous data. Borderline can include borderline values, but since this is a network app, it can also include extreme volumes (very little or vast amounts of traffic, or even massive variations). Erroneous data can be data that is invalid in and of itself (malformed packets, for example), or data that has no rational meaning (a nonexistent codec, or a value that decodes to something absurd - I doubt many people will be doing 11.1 audio streams for VoIP, for example!)
Beyond that, it's all much of a muchness. There's very little that is async specific to testing, so long as you concentrate on the logic rather than the means of getting there.
Re:The usual rules apply (Score:2)
Huh? My experience is that it's usually the low level components that are most sensitive. Create an accidental buffer overrun and it won't show up in any unit tests.
Re:The usual rules apply (Score:2)
components will not care if they are called from the
real system or from a test harness. So, do the test
harness, and use it to test the snot out of that
sensitive component.
Re:The usual rules apply (Score:2)
Re:The usual rules apply (Score:2)
component before you begin integration testing. Then you will
be focusing on "more real" bugs.
B: There is nothing that says that the test harness cannot run
multiple threads and test some "real world" type conditions.
I know it will not drive out every integration or real world
bug, but it will be a good
Re:The usual rules apply (Score:2)
A unit test finds simple errors simply.
Sure, as you move closer to using real data in the field, you will find more errors. But it's stupid to run the whole system in real time in order to trigger a bug where an "and" should have been an "or", then go through mountains of debugging data in order to localize it. Get
Re:The usual rules apply (Score:2)
Hook into TAPI (Score:2)
Message queues (Score:2)
Re:Firebug (Score:2)
Re:Firebug (Score:2)
I award you no points, and may the FSM have mercy, on your immortal soul.
Unit Testing & Visual Studio .Net (Score:1)
Re:Unit Testing & Visual Studio .Net (Score:1)
Simple (Score:2)
An event driven system is an event driven system is an event driven system. The only difference you're likely to find is that you can often see the results of a GUI application, but you can't "see" the results of a telephony application. Otherwise, debugging isn't really that different. If you need to get a feel for it, configure a logging system and plac
Test cases!!! (Score:2)
Truss (Score:2)
Build a simulator (Score:1)
you call THAT asynchronous? (Score:2, Troll)
As for debugging asynchronous logic? You'll probably have best luck with a divining rod. These things can have sensitivities to supply voltages, temperature, humidity, EMI, and other things.
Re:you call THAT asynchronous? (Score:2)
examine code assumptions (Score:2, Interesting)
(a) when a "change text" event handler in the editor is invoked, the editor will always be done reporting the result of the previous change.
(b) event z will always be preceded by event w
If you know the assumptions for each event handler, cases where they break down may become obvious. I
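Assumptions like (b) can be checked mechanically while the system runs. A sketch, reading "preceded by" as "seen earlier in the same session" (that reading, the class name `OrderingChecker`, and the session keys are assumptions of this example):

```python
class OrderingChecker:
    """Flags events that arrive before the event they depend on."""

    def __init__(self, requires):
        # requires maps an event to the event that must have come first.
        self.requires = requires
        self.seen = {}  # session -> set of events observed so far

    def observe(self, session, event):
        seen = self.seen.setdefault(session, set())
        required = self.requires.get(event)
        violation = None
        if required is not None and required not in seen:
            violation = f"{session}: {event} arrived before {required}"
        seen.add(event)
        return violation
```

Feeding every event through `observe` and logging any non-None result turns a silent broken assumption into a line you can search for.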
Load Test (Score:5, Informative)
For this kind of application, you must, *must*, MUST create a heavy load on a production system. I've done work with big, complex, multi-threaded web apps that have similar characteristics -- event-driven (when an HTTP request comes in) and server-only (no GUI). There are many bugs that don't show up until you put the system under load, as in dozens or hundreds of transactions per second. For instance, under light load a queue will never fill up, but under heavy load bizarro, difficult-to-trace bugs will crop up that you can't reproduce on your development system. Even under the same load, your development system may run into a different constraint (e.g. CPU-bound so that it can't fill the queue fast enough and thus never hits the bug).
To have any hope of catching these bugs, you need to instrument your application heavily, with logging calls that you can turn on and off easily with some sort of switch (kill signal, special dialing code, etc.). Running with a debugger attached will likely be next to impossible on your production or staging systems.
Lastly, definitely invest in an automated test environment. You will need to do these kinds of debugging runs hundreds of times in the course of developing your app, and it just isn't feasible to have everyone in the company drop what they're doing and call into your app a dozen times a day. While there are plenty of load test tools for web apps, I'm not familiar with any for telephony apps, although some must exist. You may end up rolling your own from a bunch of old modems.
Good luck, as the bugs in these systems are notoriously difficult to hunt down.
--Paul
Not nearly enough info. (Score:4, Informative)
By event driven, do you mean that you have events requiring immediate attention, or do you have events you can buffer with water marks?
Is it a direct producer-consumer with only 1-way communication, or do you need bi-directional communication?
If you give some one a poorly worded problem, they cannot be expected to solve it.
Recommendations (Score:4, Interesting)
1 - be cautious about testing debug mode - there's an awful lot the compiler tosses in to enable debugging which may impact how the code actually executes.
2 - use logging extensively. I'd recommend using log4net or something like that.
3 - use an integration model for your unit testing. Start with the smallest unit tests and build upwards. This will allow you to gradually build "correct" code and focus on the messages/events between components.
4 - build a simulator (someone mentioned that before), they are truly invaluable. Keep it as simple as possible.
5 - check, double check, and triple check variable access. It's easy to run into a race condition between reads and writes. Study and understand lock(...), reader-writer-locks, semaphores, and mutexes.
6 - when testing, don't forget to test expected conditions, unexpected conditions, boundary conditions (null objects, empty strings, negative values, 0's, positive values, and overflows), errors (like zombie conditions where a response is _never_ generated, dropped connections, garbage results), etc.
7 - learn Debug.Assert and check your pre- and post-conditions
8 - if you use strings, make sure you understand how strings and StringBuilder operate - they can have dramatic differences in efficiency, memory utilization, and GC.
9 - events can be static, and don't forget to encapsulate your event accessors (they look like properties, but instead of get/set, they're add/remove)
10 - if you plan on using compiler optimization switches, use them last during testing - after you can prove the app works correctly. Optimization switches can dramatically reorder things which is definitely not good if you're trying to determine correctness.
11 - set the compiler to give the maximum warning level. Your app should generate no warnings or errors while compiling.
12 - walk through the code with someone to double check your logic and field access. If you can convey it to someone else either through comments, design, etc., and can justify all field accesses as well as access control, you'll be in good shape. Yes, this is peer review. It's even useful to haul in a project manager that knows nothing about coding and nods like a chicken. Listen to yourself as you talk and if you stumble or have a hard time explaining something, that's a hint that a redesign might be in order.
13 - you might want to put in some profiling counters so you can capture metrics on it. This way, as you change the code over time, you can almost quantifiably determine if the code is truly improving with respect to throughput, responses, etc. or not.
That's all I can think of off the top of my head.
Good luck, it's a fun journey.
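Points 5 and 7 combine naturally: guard shared state with a lock and assert the conditions you believe hold. A Python sketch with a made-up `CallCounter`; in C# the same shape would use `lock(...)` and `Debug.Assert`.

```python
import threading


class CallCounter:
    """Shared counter: every read-modify-write happens under one lock."""

    def __init__(self):
        self._lock = threading.Lock()
        self._active = 0

    def call_started(self):
        with self._lock:
            self._active += 1

    def call_ended(self):
        with self._lock:
            # Precondition: a call cannot end before it has started.
            assert self._active > 0, "call_ended without call_started"
            self._active -= 1

    @property
    def active(self):
        with self._lock:
            return self._active


def demo(threads=8, calls=1000):
    counter = CallCounter()

    def worker():
        for _ in range(calls):
            counter.call_started()
            counter.call_ended()

    pool = [threading.Thread(target=worker) for _ in range(threads)]
    for t in pool:
        t.start()
    for t in pool:
        t.join()
    return counter.active
```

With the lock in place the count always returns to zero; the assertion is there to fail loudly if some code path ever ends a call twice.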
Log everything (Score:3, Informative)
I'm unfortunately not too familiar with C#, so I can't comment on its logging facilities (or lack thereof) other than the .NET EventLog [microsoft.com] class.
There is a project on Sourceforge called C# Logger [sourceforge.net] that is supposedly similar to log4j [apache.org] in Java. But it seems to be stuck in alpha release mode, and not particularly active.
Just my two cents. Hopefully it helps. :)
create a logger. (Score:2)
The trick is not having two threads running at the same time. Or create one logger per thread. (Just be careful with avoiding deadlocks and all that multithreaded stuff)
If your app is multitier, have a logger for each tier, although that would be trickier.
Robust Logging (Score:2)
If you're still peering through a text file trying to figure out a stream of events, stop. Set up a database and a logging system that will track entries by session and process. It will save you a huge amount of time in the long run.
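A minimal sketch of what "track entries by session and process" might look like, using SQLite from Python's standard library. The table layout and the function names are assumptions for illustration, not a recommended schema.

```python
import sqlite3


def open_log(path=":memory:"):
    """Open (or create) the log database; in-memory by default."""
    db = sqlite3.connect(path)
    db.execute(
        "CREATE TABLE IF NOT EXISTS log ("
        " id INTEGER PRIMARY KEY AUTOINCREMENT,"
        " session TEXT, process TEXT, message TEXT)"
    )
    return db


def write_entry(db, session, process, message):
    db.execute(
        "INSERT INTO log (session, process, message) VALUES (?, ?, ?)",
        (session, process, message),
    )
    db.commit()


def session_history(db, session):
    """Everything that happened in one session, in insertion order."""
    rows = db.execute(
        "SELECT process, message FROM log WHERE session = ? ORDER BY id",
        (session,),
    )
    return rows.fetchall()
```

The payoff is the query: one `WHERE session = ?` replaces scrolling through interleaved text to follow a single call.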
-Rick
Proper design helps a lot (Score:2)
Wow... Short answer, "Don't ask Slashdot". (Score:4, Interesting)
First of all - Debugging takes hard work. Sorry folks, no matter how easy Microsoft tries to make it, no matter how tightly they integrate Java-killer-P into app-Q, you still need the ability to follow the flow of bits from point A to Z, and more importantly, figure out what B through Y need to do.
How to debug asynchronous events... Since you mentioned c#, I will presume you have a REALLY coarse granularity here when you say "async".
So... First step, force non-reentrancy and non-overlapped event handling. Does your problem go away? Find the global data you clobbered.
Step 2 (if #1 fails) - Run both ends of your app on the same machine. Does the problem go away? Don't trust
Step 3 - Okay, you have a "real" bug, in your code. But on the bright side, if you got here, you can probably reproduce it, so, piece of cake. Load up your trusty debugger and dump a COMPLETE stack trace up to the error. Don't trust the last line to have caused the error, it just failed to deal with whatever broken crap the actual problem threw its way.
Step 4 - trace through your code, on both sides, one line at a time. Sound tedious? Yup. You might spend a week on a single run. But you'll sure as hell know the flow of your code by the time it finishes.
Step 5 - No step 5 exists. Step 4 WILL let you find your problem, as long as it resides in your code and not in the aether between the two sides of the connection (which steps 1 and 2 should have eliminated as a problem).
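The first step (force non-reentrant, non-overlapped event handling) can be tried by routing every event through one dispatcher thread. A Python sketch; `SerialDispatcher` is a hypothetical name, and the `busy` flag in the demo exists only to show that handlers never overlap.

```python
import queue
import threading


class SerialDispatcher:
    """Runs every handler on one thread, one event at a time."""

    def __init__(self):
        self._events = queue.Queue()
        self._thread = threading.Thread(target=self._run, daemon=True)
        self._thread.start()

    def post(self, handler, *args):
        # Safe from any thread, including from inside a running handler:
        # the event is queued, never executed re-entrantly.
        self._events.put((handler, args))

    def _run(self):
        while True:
            item = self._events.get()
            if item is None:  # sentinel posted by stop()
                break
            handler, args = item
            handler(*args)

    def stop(self):
        self._events.put(None)
        self._thread.join()


def demo(posters=4, events_each=50):
    state = {"busy": False, "overlaps": 0, "handled": 0}

    def handler(_n):
        if state["busy"]:
            state["overlaps"] += 1
        state["busy"] = True
        state["handled"] += 1
        state["busy"] = False

    dispatcher = SerialDispatcher()

    def poster():
        for n in range(events_each):
            dispatcher.post(handler, n)

    threads = [threading.Thread(target=poster) for _ in range(posters)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    dispatcher.stop()
    return state
```

If the bug disappears once everything is serialized like this, shared state touched by overlapping handlers is the likely culprit.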
Messages will get you (Score:2)
A good starting point is an old-school data flow diagram. Draw each of your processes as a bubble, and show links between them. If you have one signal going from bubble A to bubble B, and another signal going from bubble B back to bubble A, you have a possibility for deadlocks or races. A data flow diagram will give you a good insight into which signals need checking.
Grab.
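Once the diagram exists as a list of links, the A-to-B-and-back check can be automated. A sketch, assuming each signal is recorded as a (sender, receiver) pair; it only finds direct two-way links, not longer cycles through three or more processes.

```python
def two_way_links(signals):
    """Return process pairs that signal each other in both directions.

    signals is an iterable of (sender, receiver) pairs from the diagram.
    """
    edges = set(signals)
    return sorted((a, b) for (a, b) in edges if a < b and (b, a) in edges)


def demo():
    signals = [("A", "B"), ("B", "A"), ("B", "C"), ("C", "D")]
    return two_way_links(signals)
```

Each pair it reports is a place where two processes can end up waiting on each other, so those are the signals to review first.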
Re:Messages will get you (Score:4, Funny)
. . . deadlocks are going to be your enemies. Especially deadlocks.
An easily vanquished enemy. Just log into Visual Haircut 2005 at least once a month or two, and don't forget to run the Help Me Stop Smoking Ganjj wizard as needed.
Re:Messages will get you (Score:2)
Software Engineering 101, Week 4 (Score:1)
log4net (Score:2)
Suggested format for async logging (Score:1)
Because you have multiple threads and sub systems we find this log format to be the best for our environment:
Issue command example:
2006-02-07 12:02:58,385 DEBUG [7852] (ts.cs:1231) im_lib_aic613 - (5739:agent2200:1426:43e88c6)
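A line in that shape is easy to pick apart for filtering. The field names below are guesses from the layout alone (the bracketed number is assumed to be a thread ID, the parenthesised part a source file and line); the poster's explanation of the fields is cut off, so treat every name in this sketch as hypothetical.

```python
import re
from datetime import datetime

LINE = re.compile(
    r"^(?P<timestamp>\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2},\d{3}) "
    r"(?P<level>\w+) "
    r"\[(?P<thread>\d+)\] "               # assumed: thread ID
    r"\((?P<source>[^:]+):(?P<line>\d+)\) "  # assumed: file and line number
    r"(?P<component>\S+) - "
    r"(?P<message>.*)$"
)


def parse(line):
    """Split one log line into named fields, or return None if it doesn't match."""
    match = LINE.match(line)
    if match is None:
        return None
    fields = match.groupdict()
    fields["timestamp"] = datetime.strptime(
        fields["timestamp"], "%Y-%m-%d %H:%M:%S,%f"
    )
    return fields


SAMPLE = (
    "2006-02-07 12:02:58,385 DEBUG [7852] (ts.cs:1231) "
    "im_lib_aic613 - (5739:agent2200:1426:43e88c6)"
)
```

With the fields separated, grouping by the bracketed number or by component becomes a dictionary lookup instead of a grep pattern.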