Zhiliang Hu
2008-09-09 18:18:28 UTC
Digging further into my qsub/mpirun problems, I have come to an interesting situation:
(1)
I used to have the following qsub/mpirun setup, which worked for half a year (I reported its initial success on this list last December):
--------------------------------------
qsub -l nodes=6:ppn=2 \
     -e /path/to/locationA \
     -o /path/to/locationA \
     /path/to/program
where "program" is:
/path/to/bin/mpirun \
     /path/to/mpiblast \
     -p blastn \
     -d seq.db \
     -i /path/to/input.seq \
     -o /path/to/output.txt
--------------------------------------
After we fixed some hardware issues (I can't see how they are relevant, but I mention them for your information), it now complains (in Torque's "..ER" file): "Sorry, mpiBLAST must be run on 3 or more nodes". (The same error also shows up in the node's /undelivered/ errors.)
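Since that message presumably means mpiBLAST was started with fewer than 3 processes, one thing that could be checked from inside "program" is how many slots Torque actually hands to the job. The following is only a minimal sketch, not part of the original setup: it assumes Torque exports PBS_NODEFILE inside the job, and the helper names count_slots and count_hosts are hypothetical.

```shell
#!/bin/sh
# Diagnostic sketch: report what Torque allocated to this job.
# Assumption: PBS_NODEFILE points to a file with one line per allocated slot.

# Hypothetical helper: number of slots (non-empty lines) in a node file.
count_slots() {
    grep -c . "$1"
}

# Hypothetical helper: number of distinct hosts in a node file.
count_hosts() {
    grep . "$1" | sort -u | wc -l | tr -d ' '
}

# Only report when running inside a job that provides a readable node file.
if [ -n "${PBS_NODEFILE:-}" ] && [ -r "$PBS_NODEFILE" ]; then
    echo "slots: $(count_slots "$PBS_NODEFILE")"
    echo "hosts: $(count_hosts "$PBS_NODEFILE")"
fi
```

With nodes=6:ppn=2 one would expect 12 slots on 6 hosts; a smaller number in the job's output file would suggest the allocation, rather than mpirun, is where things changed.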
(2)
If I modify "program" and run it directly on the command line as follows, it works fine:
----------------------------------------------
/path/to/bin/mpirun -np 12 -machinefile /path/to/mpimachines \
     /path/to/mpiblast \
     -p blastn \
     -d seq.db \
     -i /path/to/input.seq \
     -o /path/to/output.txt
----------------------------------------------
(3)
I do not think this is right, but as a trial, if I run it as follows:
--------------------------------------
qsub -l nodes=6:ppn=2 \
     -e /path/to/locationA \
     -o /path/to/locationA \
     /path/to/program
where "program" is:
/path/to/bin/mpirun -np 12 -machinefile /path/to/mpimachines \
     /path/to/mpiblast \
     -p blastn \
     -d seq.db \
     -i /path/to/input.seq \
     -o /path/to/output.txt
--------------------------------------
It fails with the error: "pls:tm: failed to poll for a spawned proc, return status = 17002".
I am hoping that some improvement to (1) will make it work again, but this is beyond my knowledge, so I am seeking help here.
Thank you in advance,
Zhiliang