Debian

Solaris And Linux NFS Problems

mrgrumpy asks: "I run Debian (unstable, woody) kernel 2.2.14 with everything dangerously up to date, and I also run a Solaris7 box (Sparc, not Intel). I've had NFS working (not with autofs, just mount and share) between the two boxes fine for a few weeks since I set them up. I recently applied a patch cluster from SunSolve to the Sun box, and lo and behold, NFS stopped working. In the patch list there were quite a few NFS fixes with the kernel patch 106541-10. I have the home directories and a development directory from the Sun box (which serves NIS, NFS, and just about everything) mounted on the Linux box. Most of the time nothing goes wrong. But when I run the distributed.net client on the Linux box, which needs to read and write files that are mounted across from the Sun box, it locks up and I get messages such as 'Apr 17 14:20:31 boink kernel: nfs: task 1473 can't get a request slot...' in the logs." Can anyone figure out what's going wrong here?

"My machines are: boink (GNU/Linux 2.2.14 kernel), and splat (Solaris7 [Sparc]). If I run snoop on the Sun box I get:

root@splat$ snoop splat and boink rpc nfs
boink.home.cyber4.org -> splat.home.cyber4.org NFS C LOOKUP2 FH=009B_lJAQ.boink
splat.home.cyber4.org -> boink.home.cyber4.org NFS R LOOKUP2 OK FH=9668
boink.home.cyber4.org -> splat.home.cyber4.org NFS C LOOKUP2 FH=009B root.lock
splat.home.cyber4.org -> boink.home.cyber4.org NFS R LOOKUP2 OK FH=9E2D
boink.home.cyber4.org -> splat.home.cyber4.org NFS C REMOVE2 FH=009B_lJAQ.boink
splat.home.cyber4.org -> boink.home.cyber4.org NFS R REMOVE2 OK
boink.home.cyber4.org -> splat.home.cyber4.org NFS C LOOKUP2 FH=009B root.lock
splat.home.cyber4.org -> boink.home.cyber4.org NFS R LOOKUP2 OK FH=9E2D
boink.home.cyber4.org -> splat.home.cyber4.org NFS C WRITE2 FH=79D9 at0 for 4096 (retransmit)
boink.home.cyber4.org -> splat.home.cyber4.org NFS C CREATE2 FH=009B_yGAQ.boink
splat.home.cyber4.org -> boink.home.cyber4.org NFS R CREATE2 OK FH=AA9B
boink.home.cyber4.org -> splat.home.cyber4.org NFS C WRITE2 FH=AA9B at0 for 1
all throughout the logs. I had read that NFS locking on Linux is not great, and I am using knfsd with NFS compiled into the kernel.

Usually the errors cascade down a spiral of death until I reboot the Linux box. in.lockd on Linux doesn't have a debug mode (Solaris does, restart it with -d3) and I don't seem to be able to find any other way of debugging it."

So at this point, mrgrumpy needs either a solution or a way to debug in.lockd. Are there any methods that may prove useful in attempting to recover from an NFS "spiral of death"?
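
Short of a lockd debug mode, one cheap starting point is to measure how often the kernel reports the error and whether it clusters around particular moments. The following is only a rough sketch in Python: the regular expression is modelled on the single log line quoted above, so the syslog layout it assumes may not match every system, and the helper names slot_errors and errors_per_minute are made up for illustration.

```python
import re
from collections import Counter

# Pattern modelled on the one log line quoted in the post, e.g.
#   Apr 17 14:20:31 boink kernel: nfs: task 1473 can't get a request slot...
# The exact syslog layout is an assumption and may differ elsewhere.
SLOT_RE = re.compile(
    r"^(?P<stamp>\w{3}\s+\d+\s+[\d:]{8})\s+(?P<host>\S+)\s+kernel:\s+"
    r"nfs: task (?P<task>\d+) can't get a request slot"
)


def slot_errors(lines):
    """Yield (timestamp, host, task_id) for each 'request slot' message."""
    for line in lines:
        match = SLOT_RE.match(line)
        if match:
            yield match.group("stamp"), match.group("host"), int(match.group("task"))


def errors_per_minute(lines):
    """Count 'request slot' messages per minute to show when they pile up."""
    counts = Counter()
    for stamp, _host, _task in slot_errors(lines):
        counts[stamp[:-3]] += 1  # drop the trailing ":SS" to bucket by minute
    return counts
```

Feeding it the lines of the kernel log (wherever syslog writes them on the machine in question) gives a per-minute count that can be compared against the times the distributed.net client was running.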

This discussion has been archived. No new comments can be posted.

  • by dlc ( 41988 ) <dlc@noSPaM.sevenroot.org> on Friday April 21, 2000 @03:04AM (#1120012) Homepage

    NFS under Linux is notorious for its locking problems. It looks like there are a few options here:

    • Back out of that Solaris patch. You did back everything up and install the patch with the backout option, didn't you?
    • If it only happens with the distributed.net client, stop using it. ("Doctor, it hurts when I do this" "Then don't do that"). Or, you could run the entire distributed.net process on a partition local to the Linux box.
    • Make sure that you are running the latest stable glibc (you're using a development version of Debian, right?), and see about any NFS-related kernel patches (I don't know if there are any). Consider trying a development kernel; I've had good luck with 2.3.51, for example. The latest is 2.3.99-pre-something-or-other; these might also be good to try. Try a few different kernels: the NFS stuff is constantly changing. The 2.3.51 config contains experimental support for NFS v3; you might want to try that out. If you're running something like the distributed.net stuff, then it can't be too essential a machine and you can afford to test new kernels.

    You'll probably run into this problem with anything that tries to do locking over NFS. I would slap together a few scripts or small programs that do locking over NFS to see whether the problem is something strange that the distributed.net client is doing. Maybe it's compiled for libc5 or libc6, and that might be causing problems (wild guesses here).
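
    A throwaway lock test along those lines might look like the sketch below. It is written in Python and is only one possible shape for such a script: the function name lock_cycle and its parameters are invented for illustration, and it assumes that fcntl-style (POSIX) locks are what the client ends up requesting, which is not confirmed anywhere in this thread. It repeatedly takes an exclusive lock on a file, writes a line, and releases the lock, timing how long each acquisition takes.

```python
import fcntl
import os
import time


def lock_cycle(path, rounds=5, hold_seconds=0.1):
    """Repeatedly take and release an exclusive POSIX lock on `path`.

    Returns the time in seconds that each lock acquisition took, so a
    hang or a slow lock manager shows up as a large (or missing) value.
    """
    timings = []
    for i in range(rounds):
        with open(path, "a+") as fh:
            start = time.monotonic()
            fcntl.lockf(fh, fcntl.LOCK_EX)  # blocks until the lock is granted
            timings.append(time.monotonic() - start)
            try:
                fh.write(f"round {i} pid {os.getpid()}\n")
                fh.flush()
                os.fsync(fh.fileno())  # push the write out to the server
                time.sleep(hold_seconds)
            finally:
                fcntl.lockf(fh, fcntl.LOCK_UN)
    return timings
```

    Pointing lock_cycle at a file on the NFS mount and then at a file on a local partition gives a rough comparison. If the NFS run stalls while the local run finishes instantly, locking is a likely suspect; if both behave, the client is probably doing something else.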

    good luck.

    darren


    Cthulhu for President! [cthulhu.org]
  • *sun* NFS support (not just plain vanilla NFS) compiled into the 2.2.14 kernel. Works fine with my Solaris 2.6 and 2.7 boxen.
  • I'd start at the following URL.

    http://sourceforge.net/project/?group_id=14

    Linux and NFS are gradually maturing, but there still seem to be some incompatibilities lurking around. Following the NFS mailing list is a good way to keep on top of things, and it would be the best place to post this question if none of the web pages answers it.
