Clearcase Problems with Linux?

joecooler asks: "I work for an ASIC company in the verification group. We use VCS and Vera to write and run simulations, Clearcase for revision control, and LSF to manage our server farm. At my instigation my employer has begun to move to Linux PCs for our simulation server farm instead of the much more expensive and much slower Solaris Sun machines. Everything has been working well and everyone has been very pleased with the performance except for one 'small' problem - every two weeks or so we will suddenly see all jobs running on Linux machines crash. After much pain we have been able to isolate this to an issue with Clearcase returning files 'slowly' to the Linux machines, causing VCS compiles to die. Has anyone else had issues with Clearcase and Linux running on a PC? If so, how did you debug this and isolate the exact source of the problem? Is this solvable, or is it one of the mysteries of networking?"
  • by renehollan ( 138013 ) <[rhollan] [at] [clearwire.net]> on Tuesday September 03, 2002 @08:52PM (#4192579) Homepage Journal
    Are you, by any chance, NFS-mounting remote clearcase VOBS?

    IIRC, we had that problem at a former place of employment once.

  • You do not specify that you are using dynamic views, but it sounds like you are.... Try using snapshot views instead. Another ( ugly ) idea is a preemptive reboot.

    Rational customer support is always very friendly too... Have you called yet?

    Check the Knowledge Base too... [rational.com]

  • Are you kidding me?! (Score:5, Informative)

    by Outland Traveller ( 12138 ) on Tuesday September 03, 2002 @09:57PM (#4192915)
    Clearcase is returning files slowly? I don't believe it!

    No personal offense intended. We've wrestled with the same problems ourselves and have ultimately decided to look at alternatives to clearcase.

    There's a couple big problems. The biggest one is that clearcase requires you to use a modified linux kernel, and they only provide stable modifications for a handful of older, stale kernels. If you want to keep up with security updates, you are on your own. If you want to update to a newer kernel that solves some device driver problem, forget it. If your product depends on you using a custom kernel like ours does, you are totally screwed. Unless rational finds some way to make their product work without requiring specific kernel versions, it will never be a good fit with Linux. Your stability problems may be caused by not using a Rational-approved kernel.

    The second huge problem with clearcase is not linux specific- it has to do with clearcase's architecture. Clearcase requires each client to use a proprietary NFS-like filesystem (MVFS) in order to interface nicely with the server. MVFS has a very high overhead both in terms of network traffic and server CPU time. It has poor security, poor performance, and poor reliability. Even on solaris it's ugly, and on rational's second tier systems such as Linux and Irix it's even worse. Imagine trying to maintain an entire closed-source network filesystem codebase just for one application. That's the problem that clearcase's development team faces, and I guess I can't fault them for not doing it well.

    Clearcase's architecture realistically limits your clients to being on the same local network with a persistent, always-on connection. In addition, the server needs to be a very expensive top-end solaris box. Also, if you want to support remote development you either have to wrestle with the unfriendly, unpolished "snapshot views" configuration or shell out huge dollars for a multisite license and a dedicated person to support it.

    If you are unfortunate enough to be stuck with an older or poorly performing network, clearcase can be unusable. You absolutely must have high bandwidth, low latency paths between your clearcase server, build platforms, and clients. It sounds offhand like this may be your problem. Put in a direct (no hops) 100bT line between a linux client and the clearcase server, make sure the clearcase server isn't under heavy load from other people, and rerun your tests.
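One way to make "rerun your tests" repeatable is to time plain file reads through the view before and after a network change. The sketch below is only an illustration: the function name slow_reads and the example path are hypothetical, the timing is in whole seconds, and a second pass over the same files may be served from cache and look faster than the first.

```shell
#!/bin/sh
# Report files under a directory whose read takes at least N whole seconds.
# Usage: slow_reads DIR [THRESHOLD_SECONDS]   (default threshold: 2)
slow_reads() {
    dir=$1
    threshold=${2:-2}
    find "$dir" -type f | while IFS= read -r f; do
        start=$(date +%s)
        cat "$f" > /dev/null 2>&1      # force the file contents to be fetched
        end=$(date +%s)
        elapsed=$((end - start))
        if [ "$elapsed" -ge "$threshold" ]; then
            printf '%s\t%ss\n' "$f" "$elapsed"
        fi
    done
}

# Example (hypothetical view path):
#   slow_reads /view/yourview/vobs/test 2
```

Running it from a Linux client and from a Solaris client against the same directory gives a rough side-by-side of how slowly files are being returned to each.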

    Rational encourages you to use clearcase to manage your entire build operation, and version binaries and object files as well as source. This does have some benefits, but it makes already bad performance become downright abysmal and makes it very difficult to switch products once you realize Clearcase is no longer the right fit for your organization.

    Finally, Rational appears to be completely ignoring these shortcomings with clearcase on Unix. Over the last couple years they have ported Clearcase to Windows and rewritten all of the administration tools. However, the second-generation admin tools are WINDOWS ONLY. If you want to use tools that don't suck, you need a Windows box. I find it incredible that rational had a cross-platform product, and when they had the opportunity to make cross platform tools using any number of high quality cross-platform libraries, they chose to go with one platform only. I've asked when the next generation tools will be ported back to Unix/Linux, and they have no plans to do that. I love the command line as much as any card-carrying unix geek, but I demand the best tools for the job. I don't like being on rational's second-class platform.

    To me, this underscores the fact that sales and marketing are running the show over at Rational. Rational acquires products so they can lock in customers, and then they scale back development and move on to the next product. Unfortunately people using clearcase on unix have invested so much time integrating clearcase into their workflow that the costs of changing to a different SCM platform are unbearable. Yet, if you look around, you will find competitors like Perforce and BitKeeper offering better products for orders of magnitude less in license/maintenance fees. These competing products scale better, can be used over the internet easily, don't require a custom kernel (!!!), and require substantially less dedicated support staff to maintain.

    Shop around. Moving to Linux might be a good time to use something that works better and costs less than maintaining clearcase, even in the short term.
    • by Outland Traveller ( 12138 ) on Tuesday September 03, 2002 @10:01PM (#4192928)
      Sorry to have ranted a little off subject to your original question.

      One thing you might try is switching your build machines to use snapshot views. This reduces the network overhead and allows for a more disconnected style of operation. It's a huge win for compile-farms where you only want to pull recent files and rarely if ever commit changes back. Doing this may solve your reliability issues as well as speed up compiles.
    • The ClearCASE tools wouldn't be such a big deal, but with major releases 3 and 4 they altered some of the Unix GUI tools to make them actually harder to use, slower, and less intuitive. I don't know if this is because of the Windows port or not, but it's made the tool a lot worse, with really no added benefit in ClearCASE v4.1 over ClearCASE v2.1 from five or so years ago.

    • I don't mean to be a troll, but I am curious about something you said. You stated that Clearcase uses a custom kernel and that this is bad, which I agree with. But then you say that the software that you are developing, in Clearcase, requires a customized kernel.

      My question is, why does your software require a custom kernel, especially if you think that the use of a custom kernel is a bad idea?
      • Re:Curious (Score:2, Insightful)

        The main difference is that we sell a "turn-key solution". It's a black box destined to go straight to end users who don't interact with the OS directly and in 99% of cases don't want to even know whether Linux is under the hood or Windows, or Solaris, etc, so long as the box does what it is supposed to do. Of course a super power user could get into the OS and run other applications and scripts- unlike some other vendors we make it relatively easy to do that.

        We provide a customized kernel that includes the most up to date drivers for the peripherals the box needs to talk with, some of which are esoteric. We also do some performance tuning and add some publicly available security patches.

        I believe what we do is fundamentally different from what Rational does. We're selling a black box solution that solves one particularly complex problem. That's what our customers want, and there is no expectation that the customer will be able to run other applications on the platform, never mind use a different kernel. The product includes hardware and software maintenance that keeps the system up to date and secure, so it's important for maintenance purposes that we keep the system configuration under tight control.

        Rational sells a software development tool. The expectation is that the end user will be running the client on the development system, which presupposes a wide variety of both hardware and software, depending on whatever the customer wants to develop. When rational ties their product to a small subset of Linux kernels, they dramatically limit what kinds of development you can do, which is not a particularly competitive thing to do. Worse, their supported kernels do not keep pace with security patches or major driver bugs (like the ext3 bug in redhat's initial 7.3 kernel release).

        Hope that clears it up..
  • What you are describing is classic, textbook Clearcase behaviour. It's not known for speed or stability. It's most likely to be a bug in the kernel patches you're required to install.

    The horrible problem (that you don't mention in your post) is that because changes aren't atomic, any time the system crashes, your repository could be left in a corrupt state. At this point it takes a Clearcase trained admin to unwedge it, which could take a while.

    In any event, don't beat yourself up over it; it's not likely to be something that your IT department is able to fix.
    • Back in the day, when I used Clearcase with Linux (c. 1999-2001), our Sun boxes were set up to make Clearcase views visible as NFS-mounted directories. No shmancy proprietary MVFS-hacked Linux. We used Multisite without any problems, either.

      Now, it's true that one had to handle checkins and checkouts from a Sun box, but, as the build farms mounted the exported views read-only, what's the big deal? Is it really necessary to integrate the source control system that tightly with the Linux-based development environment?

      • That's a good question. I don't know whether it's absolutely required for Clearcase to use MVFS.

        Still, it's clear that Clearcase support is dependent on specific Linux kernels. You can decide for yourself how bad this is.

        http://www.rational.com/support/documentation/release/l-k_support_policy.jsp
        • by Anonymous Coward
          ClearCase servers do not require MVFS, just the client machine that accesses the VOBs or dynamic views. On Linux, to use MVFS you need to insmod a kernel module that supports it. This module can be re-linked to support a slightly different kernel version; however, if any kernel structure sizes change it will not work.
  • Looks like you aren't sure what the problem is. Just don't panic; these things do happen once in a while. Do read the manual properly.
  • I use ClearCase on Linux where I work and haven't had any major problems (except that no one here can quite figure out how to get the Linux automounter to work with ClearCase and I'm too lazy to try to figure it out myself).

    That said, I'm no big fan of ClearCase. It seems needlessly complex and sluggish, has limited platform support (compared to CVS, which is what we used to use and would basically run on anything you could compile it on), and I think there's something just wrong about having a version control system have modules that run in kernel mode.
  • They're sleeping (Score:5, Informative)

    by XavierPenguin ( 81787 ) on Wednesday September 04, 2002 @01:18PM (#4195794)
    Do a strace of cleartool. After every file they have a 6 second sleep. So we just link in a glibc with sleep(6) overridden to not sleep. Works like a charm.

    We've supposedly opened a call with Rational, but I haven't heard anything.

    I can make the glibc binaries available that we use on my website if anyone is interested and doesn't want to go through the effort of recompiling glibc themselves.

    Now why would they be sleeping for 6 seconds when it doesn't appear to be necessary:
    1: conspiracy theory-- M$ told them to
    2: inept programming-- deadlock in their code, ahh, just put a sleep in to fix it
    3: smart programmer-- It's review time, need to make this faster, I know, change that sleep(6) to sleep(5).
    4: problem with Linux NFS... no can't be!
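To check for the sleeps described above on your own machine, one option is to capture an strace log and tally the requested sleep lengths. This is a rough sketch, not part of the original post: the log name ct.trace and the function name summarize_sleeps are made up here, and the line format depends on the strace version (older builds print {6, 0}, newer ones {tv_sec=6, tv_nsec=0}).

```shell
#!/bin/sh
# Capture a log first, for example:
#   strace -f -T -e trace=nanosleep -o ct.trace cleartool update
#
# Tally nanosleep calls in an strace log by requested whole seconds.
# Output is "count seconds", one line per distinct sleep length.
summarize_sleeps() {
    # $1: path to the strace log
    sed -nE 's/.*nanosleep\([^{]*\{(tv_sec=)?([0-9]+),.*/\2/p' "$1" \
        | sort -n | uniq -c
}

# Example:
#   summarize_sleeps ct.trace
```

A large count next to a 6 would match what the post reports; a log with no such lines suggests the delay is coming from somewhere else.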
    • Re:They're sleeping (Score:5, Informative)

      by XavierPenguin ( 81787 ) on Wednesday September 04, 2002 @03:50PM (#4196676)
      A little more info from my notes in case anyone is interested. YMMV and don't blame me if your views get trashed, but we haven't seen any problems with this approach:

      To generate your own glibc for use by cleartool:

      - extract the src (this is for RedHat)

      mkdir my_glibc; cd my_glibc
      rpm2cpio glibc-....src.rpm | cpio -iumd
      tar jxf glibc-...-tar.bz2

      - edit sysdeps/unix/sysv/linux/sleep.c to just return 0 if seconds==6

      - make build dir within glibc-2.2... dir created by extract

      - from within build dir, configure and make

      cd my_glibc/glibc-2.2..../build
      ../configure --enable-add-ons=yes --without-cvs
      make

      - put the pieces together

      mkdir ~/myct
      cp my_glibc/glibc-2.2..../libc.so.X ~/myct

      create a wrapper script to execute cleartool using this glibc:

      #!/bin/bash
      export LD_LIBRARY_PATH=~/myct:$LD_LIBRARY_PATH
      exec cleartool "$@"

      - use it

      ~/myct/ct update

      Here's a stack trace taken while cleartool is making the sleep call. It shows that their sysutl_nfs_flush function is indeed calling sleep(6); luckily I've overridden sleep(6) to return immediately:

      #0 0x409a9f01 in __libc_nanosleep () from /home/xp/glibc-2.2.4/build/libc.so.6
      #1 0x409a9e82 in __sleep (seconds=6) at ../sysdeps/unix/sysv/linux/sleep.c:85
      #2 0x40815c3d in sysutl_nfs_flush () from /usr/atria/shlib/libatriaks.so
      #3 0x40815beb in sysutl_nfs_flush () from /usr/atria/shlib/libatriaks.so
      #4 0x40815beb in sysutl_nfs_flush () from /usr/atria/shlib/libatriaks.so
      #5 0x40815beb in sysutl_nfs_flush () from /usr/atria/shlib/libatriaks.so
      #6 0x40815beb in sysutl_nfs_flush () from /usr/atria/shlib/libatriaks.so
      #7 0x40815beb in sysutl_nfs_flush () from /usr/atria/shlib/libatriaks.so
      #8 0x40815beb in sysutl_nfs_flush () from /usr/atria/shlib/libatriaks.so
      #9 0x407fc15f in fileutl_walk_tree_any () from /usr/atria/shlib/libatriaks.so
      #10 0x407fc389 in fileutl_walk_tree () from /usr/atria/shlib/libatriaks.so
      #11 0x407fdf69 in fileutl_cp () from /usr/atria/shlib/libatriaks.so
      #12 0x406d0482 in ws_copy_file () from /usr/atria/shlib/libatriaview.so
      #13 0x406d40a6 in ws_add_wso_file () from /usr/atria/shlib/libatriaview.so
      #14 0x406d44e9 in ws_add_wso () from /usr/atria/shlib/libatriaview.so
      #15 0x406d70e6 in ws_load_one_object () from /usr/atria/shlib/libatriaview.so
      #16 0x406d63fb in ws_load_dir_ents () from /usr/atria/shlib/libatriaview.so
      #17 0x406d72a0 in ws_load_one_object () from /usr/atria/shlib/libatriaview.so
      #18 0x406d63fb in ws_load_dir_ents () from /usr/atria/shlib/libatriaview.so
      #19 0x406d72a0 in ws_load_one_object () from /usr/atria/shlib/libatriaview.so
      #20 0x406d7678 in ws_load_one_scope () from /usr/atria/shlib/libatriaview.so
      #21 0x406d9c34 in ws_load_scopes () from /usr/atria/shlib/libatriaview.so
      #22 0x40120062 in cmd_update_subr () from /usr/atria/shlib/libatriacmd.so
      #23 0x4011f86c in cmd_update () from /usr/atria/shlib/libatriacmd.so
      #24 0x40050013 in cmdsyn_update () from /usr/atria/shlib/libatriacmdsyn.so
      #25 0x4002feea in cmdsyn_do_command () from /usr/atria/shlib/libatriacmdsyn.so
      #26 0x400300cd in cmdsyn_execv_dispatch () from /usr/atria/shlib/libatriacmdsyn.so
      #27 0x4044b92e in tool_main () from /usr/atria/shlib/libatriatool.so
      #28 0x080499cc in main ()
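A lighter-weight variation on the same idea, and not what the poster did, is to interpose sleep() with an LD_PRELOAD shim instead of rebuilding glibc. The sketch below only writes out the C source and lists the build steps in comments; the names nosleep.c, nosleep.so and write_nosleep_source are hypothetical, and whether cleartool picks up the shim on a given install is an assumption that would need checking. The same caveat from the post applies: skipping the sleep may not be safe for your views.

```shell
#!/bin/sh
# Write the source for a small shim that short-circuits sleep(6)
# and passes every other sleep() call through to the real libc.
write_nosleep_source() {
    # $1: directory to write nosleep.c into
    cat > "$1/nosleep.c" <<'EOF'
#define _GNU_SOURCE
#include <dlfcn.h>

/* Return at once for the 6-second case; defer to libc otherwise. */
unsigned int sleep(unsigned int seconds)
{
    static unsigned int (*real_sleep)(unsigned int);

    if (seconds == 6)
        return 0;
    if (!real_sleep)
        real_sleep = (unsigned int (*)(unsigned int))dlsym(RTLD_NEXT, "sleep");
    return real_sleep(seconds);
}
EOF
}

# Build and use (not run by this script):
#   write_nosleep_source .
#   gcc -shared -fPIC -o nosleep.so nosleep.c -ldl
#   LD_PRELOAD=$PWD/nosleep.so cleartool update
```

Compared with a private glibc build, the shim is a few lines to maintain and leaves the system libc untouched, at the cost of depending on the dynamic linker honouring LD_PRELOAD for the cleartool binary.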
    • People posting rants about how bad the software is get +5 Informatives (twice even), people suggesting open source alternatives get insightfuls, and an actual cool hack to get around lazy/stupid programming that ANSWERS the question posed and involves actually getting down into the nitty gritty hidden details of how Linux handles system calls and ways to make bad programs behave using some neat coding goes UNMODDED!? Sheesh people!
      Why do my Mod points always expire when nothing interesting is going on...
  • We had a similar problem, this may help..
    We were using RH 7.2, and when doing a build of our java sourcecode it would regularly crash and then hang the jvm. We found out it was a problem with the very complex tables used by Clearcase that the automounter could not understand.
    We solved the problem by updating the system with automounter 4 rc1 and installing the latest versions of libc6 2.2.4
  • Don't want to put down the commercial software bashers but who do you call to support your 'solutions'? Rational has good customer support available anytime. We now have CCase clients running on the 7 platforms we port our software to, and clean integration with CQuest, a defect tracking product. Rational's products are expensive but management sees their solutions as supportable even when turnover (and RIFs) dilute the company's knowledge base.

    Not trolling, on to my suggestion. One thing you can try is mounting your dynamic view from your server directly to your linux box. This is what we use for unsupported kernel levels, assuming your server is *nix.

    Create, or modify, a dynamic view to be MVFS exportable with the '-nca' flag. Then when you use command 'cleartool lsview -long ' you will see 'View export ID (registry): 1'. On your CCase server, then use command '/usr/atria/etc/export_mvfs -I 1 /view//' to share this MVFS drive. Example: /usr/atria/etc/export_mvfs -I 1 /view/john/vobs/test.

    At this point you can mount this view/VOB combination like it is a shared NFS drive. Good luck, John
    • Some of my syntax got garbled when I used special characters

      The CCase commands are:

      o cleartool lsview -long yourviewname
      o /usr/atria/etc/export_mvfs -I 1 /view/yourview/yourvob
      Get the number to pass to '-I' from previous command
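The two steps above can be chained so the export ID does not have to be copied by hand. This is a minimal sketch that assumes the lsview output contains a line of the form quoted in the parent post ('View export ID (registry): 1'); the function names are made up for illustration, and the second function only prints the export command instead of running it.

```shell
#!/bin/sh
# Pull the export ID out of `cleartool lsview -long` output on stdin.
view_export_id() {
    sed -n 's/^.*View export ID (registry): *\([0-9][0-9]*\).*$/\1/p'
}

# Print (do not run) the export command for a view/VOB pair.
print_export_cmd() {
    # $1: export ID, $2: view name, $3: VOB path under the view
    printf '/usr/atria/etc/export_mvfs -I %s /view/%s/%s\n' "$1" "$2" "$3"
}

# Example, using the placeholder names from the post:
#   id=$(cleartool lsview -long yourviewname | view_export_id)
#   print_export_cmd "$id" yourview yourvob
```

Printing the command first makes it easy to eyeball the ID and path before exporting anything on the server.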
