Clearcase Problems with Linux?
joecooler asks: "I work for an ASIC company in the verification group. We use VCS and Vera to write and run simulations, Clearcase for revision control, and LSF to manage our server farm. At my instigation my employer has begun to move to Linux PCs for our simulation server farm instead of the much more expensive and much slower Solaris Sun machines. Everything has been working well and everyone has been very pleased with the performance except for one 'small' problem - every two weeks or so we will suddenly see all jobs running on Linux machines crash. After much pain we have been able to isolate this to an issue with Clearcase returning files 'slowly' to the Linux machines, causing VCS compiles to die. Has anyone else had issues with Clearcase and Linux running on a PC? If so, how did you debug this and isolate the exact source of the problem? Is this solvable, or is it one of the mysteries of networking?"
NFS mounting remote clearcase VOBS (Score:3, Interesting)
IIRC, we had that problem at a former place of employment once.
Try using snapshot views (Score:1)
You do not specify that you are using dynamic views, but it sounds like you are.... Try using snapshot views instead. Another ( ugly ) idea is a preemptive reboot.
Rational customer support is always very friendly too... Have you called yet? Check the Knowledge Base too... [rational.com]
Are you kidding me?! (Score:5, Informative)
No personal offense intended. We've wrestled with the same problems ourselves and have ultimately decided to look at alternatives to clearcase.
There's a couple big problems. The biggest one is that clearcase requires you to use a modified linux kernel, and they only provide stable modifications for a handful of older, stale kernels. If you want to keep up with security updates, you are on your own. If you want to update to a newer kernel that solves some device driver problem, forget it. If your product depends on you using a custom kernel like ours does, you are totally screwed. Unless rational finds some way to make their product work without requiring specific kernel versions, it will never be a good fit with Linux. Your stability problems may be caused by not using a Rational-approved kernel.
The second huge problem with clearcase is not linux specific- it has to do with clearcase's architecture. Clearcase requires each client to use a proprietary NFS-like filesystem (MVFS) in order to interface nicely with the server. MVFS has a very high overhead both in terms of network traffic and server CPU time. It has poor security, poor performance, and poor reliability. Even on solaris it's ugly, and on rational's second tier systems such as Linux and Irix it's even worse. Imagine trying to maintain an entire closed-source network filesystem codebase just for one application. That's the problem that clearcase's development team faces, and I guess I can't fault them for not doing it well.
Clearcase's architecture realistically limits your clients to being on the same local network with a persistent, always-on connection. In addition, the server needs to be a very expensive top-end solaris box. Also, if you want to support remote development you either have to wrestle with the unfriendly, unpolished "snapshot views" configuration or shell out huge dollars for a multisite license and a dedicated person to support it.
If you are unfortunate enough to be stuck with an older or poorly performing network, clearcase can be unusable. You absolutely must have high bandwidth, low latency paths between your clearcase server, build platforms, and clients. It sounds offhand like this may be your problem. Put in a direct (no hops) 100bT line between a linux client and the clearcase server, make sure the clearcase server isn't under heavy load from other people, and rerun your tests.
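One way to check whether files really are coming back slowly is to time reads from the view on a Linux client. The following is only a minimal sketch: the function name `probe_slow_reads` and the 2-second default threshold are hypothetical, and it measures whole seconds only, so it will catch multi-second stalls but not small latency differences.

```shell
#!/bin/sh
# Hypothetical helper (not part of Clearcase): list files under a directory
# whose full read takes at least THRESHOLD whole seconds.
# The body runs in a subshell so its variables do not leak to the caller.
probe_slow_reads() (
    dir=$1
    threshold=${2:-2}   # seconds; assumed default, tune for your site
    find "$dir" -type f | while IFS= read -r f; do
        start=$(date +%s)
        cat "$f" > /dev/null 2>&1      # force the file contents to be fetched
        end=$(date +%s)
        elapsed=$((end - start))
        if [ "$elapsed" -ge "$threshold" ]; then
            # Output format: "<seconds> <path>"
            printf '%s %s\n' "$elapsed" "$f"
        fi
    done
)
```

For example, `probe_slow_reads /view/yourview/yourvob 2` run from cron on one Linux client and one Solaris client would show whether the stalls are specific to the Linux machines or to the server.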
Rational encourages you to use clearcase to manage your entire build operation, and version binaries and object files as well as source. This does have some benefits, but it makes already bad performance become downright abysmal and makes it very difficult to switch products once you realize Clearcase is no longer the right fit for your organization.
Finally, Rational appears to be completely ignoring these shortcomings with clearcase on Unix. Over the last couple years they have ported Clearcase to Windows and rewritten all of the administration tools. However, the second-generation admin tools are WINDOWS ONLY. If you want to use tools that don't suck, you need a Windows box. I find it incredible that rational had a cross-platform product, and when they had the opportunity to make cross platform tools using any number of high quality cross-platform libraries, they chose to go with one platform only. I've asked when the next generation tools will be ported back to Unix/Linux, and they have no plans to do that. I love the command line as much as any card-carrying unix geek, but I demand the best tools for the job. I don't like being on rational's second-class platform.
To me, this underscores the fact that sales and marketing are running the show over at Rational. Rational acquires products so they can lock in customers, and then they scale back development and move on to the next product. Unfortunately people using clearcase on unix have invested so much time integrating clearcase into their workflow that the costs of changing to a different SCM platform are unbearable. Yet, if you look around, you will find competitors like Perforce and BitKeeper offering better products at orders of magnitude lower license/maintenance fees. These competing products scale better, can be used over the internet easily, don't require a custom kernel (!!!), and require substantially less dedicated support staff to maintain.
Shop around. Moving to Linux might be a good time to use something that works better and costs less than maintaining clearcase, even in the short term.
responding to my own post (Score:5, Informative)
One thing you might try is switching your build machines to use snapshot views. This reduces the network overhead and allows for a more disconnected style of operation. It's a huge win for compile-farms where you only want to pull recent files and rarely if ever commit changes back. Doing this may solve your reliability issues as well as speed up compiles.
Re:Are you kidding me?! (Score:1)
The ClearCASE tools wouldn't be such a big deal, but with major releases 3 and 4 they altered some of the Unix GUI tools to make them actually harder to use, slower, and less intuitive. I don't know if this is because of the Windows port or not, but it's made the tool a lot worse, and ClearCASE v4.1 offers really no added benefit over ClearCASE v2.1 from five or so years ago.
Curious (Score:2)
My question is, why does your software require a custom kernel, especially if you think that the use of a custom kernel is a bad idea?
Re:Curious (Score:2, Insightful)
We provide a customized kernel that includes the most up to date drivers for the peripherals the box needs to talk with, some of which are esoteric. We also do some performance tuning and add some publicly available security patches.
I believe what we do is fundamentally different from what Rational does. We're selling a black box solution that solves one particularly complex problem. That's what our customers want, and there is no expectation that the customer will be able to run other applications on the platform, never mind use a different kernel. The product includes hardware and software maintenance that keeps the system up to date and secure, so it's important for maintenance purposes that we keep the system configuration under tight control.
Rational sells a software development tool. The expectation is that the end user will be running the client on the development system, which presupposes a wide variety of both hardware and software, depending on whatever the customer wants to develop. When rational ties their product to a small subset of Linux kernels, they dramatically limit what kinds of development you can do, which is not a particularly competitive thing to do. Worse, their supported kernels do not keep pace with security patches or major driver bugs (like the ext3 bug in redhat's initial 7.3 kernel release).
Hope that clears it up..
Not a big surprise (Score:1)
The horrible problem (that you don't mention in your post) is that because changes aren't atomic, any time the system crashes, your repository could be left in a corrupt state. At that point it takes a Clearcase-trained admin to unwedge it, which could take a while.
In any event, don't beat yourself up over it; it's not likely to be something that your IT department is able to fix.
What's with all the custom kernel crap? (Score:3, Insightful)
Now, it's true that one had to handle checkins and checkouts from a Sun box, but, as the build farms mounted the exported views read-only, what's the big deal? Is it really necessary to integrate the source control system that tightly with the Linux-based development environment?
Re:What's with all the custom kernel crap? (Score:1)
Still, it's clear that Clearcase support is dependent on specific Linux kernels. You can decide for yourself how bad this is.
http://www.rational.com/support/documentation/r
Re:What's with all the custom kernel crap? (Score:1, Informative)
Did you RTFM ? (Score:1)
No real problems here (Score:2, Interesting)
That said, I'm no big fan of ClearCase. It seems needlessly complex and sluggish, has limited platform support (compared to CVS, which is what we used to use and would basically run on anything you could compile it on), and I think there's something just wrong about having a version control system have modules that run in kernel mode.
They're sleeping (Score:5, Informative)
We've supposedly opened a call with Rational, but I haven't heard anything.
I can make the glibc binaries available that we use on my website if anyone is interested and doesn't want to go through the effort of recompiling glibc themselves.
Now why would they be sleeping for 6 seconds when it doesn't appear to be necessary:
1: conspiracy theory-- M$ told them to
2: inept programming-- deadlock in their code, ahh, just put a sleep in to fix it
3: smart programmer-- It's review time, need to make this faster, I know, change that sleep(6) to sleep(5).
4: problem with Linux NFS... no can't be!
Re:They're sleeping (Score:5, Informative)
To generate your own glibc for use by cleartool:
- extract the src (this is for RedHat)
mkdir my_glibc; cd my_glibc
rpm2cpio | cpio -iumd
tar jxf glibc-...-tar.bz2
- edit sysdeps/unix/sysv/linux/sleep.c to just return 0 if seconds==6
- make build dir within glibc-2.2... dir created by extract
- from within build dir, configure and make
cd my_glibc/glibc-2.2..../build
make
- put the pieces together
mkdir ~/myct
cp my_glibc/glibc-2.2..../libc.so.X ~/myct
- create a wrapper script to execute cleartool using this glibc:
#!/bin/bash
LD_LIBRARY_PATH=~/myct:$LD_LIBRARY
exec cleartool "$@"
- use it
~/myct/ct update
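A variation on the wrapper step, as a minimal sketch rather than the poster's exact script: the function name `run_with_private_libs` is hypothetical. It exports the search path so the child process actually inherits it, keeps any existing path after the private directory, and passes arguments through with `"$@"` so arguments containing spaces survive intact.

```shell
#!/bin/sh
# Hypothetical helper: run a command with a private library directory placed
# first on the dynamic loader's search path.
# The body runs in a subshell, so the caller's environment is left unchanged.
run_with_private_libs() (
    libdir=$1
    shift
    # Prepend libdir; append the existing search path only if one is set.
    LD_LIBRARY_PATH="$libdir${LD_LIBRARY_PATH:+:$LD_LIBRARY_PATH}"
    export LD_LIBRARY_PATH
    # "$@" preserves each argument exactly as it was given.
    exec "$@"
)
```

With the rebuilt library copied into ~/myct as described above, the equivalent of the last step would be `run_with_private_libs ~/myct cleartool update`.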
Here's a stacktrace taken while cleartool is making the sleep call, showing that their sysutl_nfs_flush function is indeed calling sleep(6). Luckily, I've overridden sleep(6) to return immediately:
#0 0x409a9f01 in __libc_nanosleep () from
#1 0x409a9e82 in __sleep (seconds=6) at
#2 0x40815c3d in sysutl_nfs_flush () from
#3 0x40815beb in sysutl_nfs_flush () from
#4 0x40815beb in sysutl_nfs_flush () from
#5 0x40815beb in sysutl_nfs_flush () from
#6 0x40815beb in sysutl_nfs_flush () from
#7 0x40815beb in sysutl_nfs_flush () from
#8 0x40815beb in sysutl_nfs_flush () from
#9 0x407fc15f in fileutl_walk_tree_any () from
#10 0x407fc389 in fileutl_walk_tree () from
#11 0x407fdf69 in fileutl_cp () from
#12 0x406d0482 in ws_copy_file () from
#13 0x406d40a6 in ws_add_wso_file () from
#14 0x406d44e9 in ws_add_wso () from
#15 0x406d70e6 in ws_load_one_object () from
#16 0x406d63fb in ws_load_dir_ents () from
#17 0x406d72a0 in ws_load_one_object () from
#18 0x406d63fb in ws_load_dir_ents () from
#19 0x406d72a0 in ws_load_one_object () from
#20 0x406d7678 in ws_load_one_scope () from
#21 0x406d9c34 in ws_load_scopes () from
#22 0x40120062 in cmd_update_subr () from
#23 0x4011f86c in cmd_update () from
#24 0x40050013 in cmdsyn_update () from
#25 0x4002feea in cmdsyn_do_command () from
#26 0x400300cd in cmdsyn_execv_dispatch () from
#27 0x4044b92e in tool_main () from
#28 0x080499cc in main ()
My God, someone mod parent up... (Score:1)
Why do my Mod points always expire when nothing interesting is going on...
Clearcase prob (Score:1)
We were using RH 7.2, and when doing a build of our Java source code it would regularly crash and then hang the JVM. We found out it was a problem with the very complex tables used by Clearcase that the automounter could not understand.
We solved the problem by updating the system with automounter 4 rc1 and installing the latest version of libc6, 2.2.4.
Suggestion (Score:1)
Re:Suggestion (Score:1)
The CCase commands are:
o cleartool lsview -long yourviewname
o /usr/atria/etc/export_mvfs -I 1 /view/yourview/yourvob
Get the number to pass to '-I' from the previous command.