Jeff Thomas
2008-05-13 21:48:12 UTC
Hello All,
We have a cluster with a Windows 2003 master node and 8 Linux (Fedora 4)
slave nodes. Everything was working properly, but now rsh fails to connect
to nodes 1-7.
pvm> add node1
add node1
0 successful
HOST DTID
node1 Can't start pvmd
Auto-Diagnosing Failed Hosts...
node1...
Verifying Local Path to "rsh"...
Error - File /usr/ucb/rsh Not Found!
Determine the path to the "rsh" command on your
system, and edit PVM_ROOT\conf\WIN32.def
to adjust the path for the -DRSHCOMMAND=\"\"
flag. Then recompile PVM and your applications.
I have restarted the entire cluster and cleaned out the pvm*.* files in /tmp
on each node multiple times. I have also restarted the BsdRshd service.
I cannot rsh from the slave to the master:
[root@node1 ~]# rsh master "c:\cluster\wrshd\id.exe"
connect to address 192.168.66.250: Connection refused
Trying krb4 rsh...
connect to address 192.168.66.250: Connection refused
trying normal rsh (/usr/bin/rsh)
Access denied.
WRSHD in debug mode yields this:
C:\Cluster\wrshd>rshd -d
(/5/9 10:48:58) Checking WinSockets Version... (/5/9 10:48:58) done.
(/5/9 10:48:58) Loading Equivalence List...(/5/9 10:48:58) Getting Information from Trustbase
(/5/9 10:48:58) done.
(/5/9 10:48:58) Binding main socket.
(/5/9 10:48:58) cannot bind to the rshd daemon port.Debugging BsdRshd
In StartServiceCtrlDispatcher
Error number: 1063
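A bind failure like the one in the rshd debug output above may simply mean that another process, possibly the BsdRshd service itself, is already holding the port. The following is a minimal sketch for checking whether anything accepts TCP connections on the rsh port of the master. It assumes the standard rsh "shell" port 514; the helper name port_is_open is hypothetical and not part of PVM or WRSHD.

```python
import socket

RSH_PORT = 514  # standard TCP port for the rsh "shell" service (assumed here)


def port_is_open(host, port=RSH_PORT, timeout=2.0):
    """Return True if a TCP connection to host:port succeeds, else False."""
    try:
        # create_connection raises OSError on refusal, timeout, or no route
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False


if __name__ == "__main__":
    # 192.168.66.250 is the master address that appears in the logs
    print(port_is_open("192.168.66.250"))
```

A True result only shows that something is listening on the port. It does not show that the listener is rshd or that the equivalence list will grant access.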
The pvml log file shows:
[t80040000] master (192.168.66.250:1036) WIN32 3.4.3
[t80040000] ready Fri May 09 10:46:05 2008
[t80040000] netinput() bogus pkt from 192.168.66.1:32774
[t80040000] netinput() bogus pkt from 192.168.66.2:32771
[t80040000] netinput() bogus pkt from 192.168.66.3:32771
[t80040000] netinput() bogus pkt from 192.168.66.5:32771
[t80040000] netinput() bogus pkt from 192.168.66.6:32771
[t80040000] netinput() bogus pkt from 192.168.66.7:32771
[t80040000] netinput() bogus pkt from 192.168.66.8:32770
[t80040000] startack() host node1 expected version, got "PvmCantStart"
[t80040000] startack() host node2 expected version, got "PvmCantStart"
[t80040000] startack() host node3 expected version, got "PvmCantStart"
[t80040000] startack() host node4 expected version, got ""
[t80040000] startack() host node5 expected version, got "PvmCantStart"
[t80040000] startack() host node6 expected version, got "PvmCantStart"
[t80040000] startack() host node7 expected version, got "PvmCantStart"
[t80040000] startack() host node8 expected version, got "PvmCantStart"
[t80040000] netinput() bogus pkt from 192.168.66.1:32775
[t80040000] startack() host node1 expected version, got "PvmCantStart"
[t80040000] netinput() bogus pkt from 192.168.66.8:32771
[t80040000] startack(
I know it must be something simple because it was working fine before
this. Any suggestions would be greatly appreciated.
Thanks
Jeff Thomas