You may want to try this: http://www.biotech.ufl.edu/people/sun/esprit.html
Post by Paulo Nuin
Hi,
Just my two cents. Aligning rRNA is not a straightforward process, and
it shouldn't be done fully automatically. Muscle, MAFFT and other fast
algorithms will generate very low quality alignments if run blindly.
Given the number of sequences you have, and their nature, you would be
OK wrapping a script around ClustalW or ClustalW-MPI:
- align two sequences
- add a third sequence to it by using the first two as a profile
- add a fourth sequence using the first three as a profile
- add a fifth sequence ...
- at some point you will have a good enough profile that you can use
the aligned sequences as a model for the ones still to be added to the
alignment
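The steps above can be sketched as a small Python wrapper that only builds the ClustalW command lines. This is a rough sketch, not a tested pipeline: the file names, the build_profile_commands helper and the clustalw2 executable name are assumptions, and the -PROFILE1, -PROFILE2 and -SEQUENCES options should be checked against the help output of the installed ClustalW.

```python
from typing import List


def build_profile_commands(seed_fasta: str,
                           remaining_fastas: List[str],
                           workdir: str = "profiles",
                           clustalw: str = "clustalw2") -> List[List[str]]:
    """Return one ClustalW command per step of the incremental alignment.

    seed_fasta       -- hypothetical FASTA file holding the first two sequences
    remaining_fastas -- hypothetical FASTA files, one new sequence each
    """
    # Step 1: ordinary alignment of the first two sequences.
    profile = f"{workdir}/profile_0002.aln"
    commands = [[clustalw, f"-INFILE={seed_fasta}", "-ALIGN",
                 f"-OUTFILE={profile}"]]

    # Later steps: add one sequence at a time, using everything aligned
    # so far as the profile.
    for count, fasta in enumerate(remaining_fastas, start=3):
        next_profile = f"{workdir}/profile_{count:04d}.aln"
        commands.append([clustalw, f"-PROFILE1={profile}",
                         f"-PROFILE2={fasta}", "-SEQUENCES",
                         f"-OUTFILE={next_profile}"])
        profile = next_profile
    return commands


if __name__ == "__main__":
    # Print the commands instead of running them, so they can be inspected.
    for cmd in build_profile_commands("first_two.fasta",
                                      ["seq3.fasta", "seq4.fasta"]):
        print(" ".join(cmd))
```

Each command could then be passed to subprocess.run(cmd, check=True) once the working directory exists. Keeping every intermediate profile on disk makes it possible to inspect the alignment by eye before more sequences are added.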
The reason is that rRNA has a secondary (and tertiary) structure that
contains stems and loops. Stems are short segments that are somewhat
"duplicated" along the flat sequence and attach to each other when
forming the secondary structure. These pairings sometimes don't
follow the usual A-T(U), C-G rule. Because of the stems there is a
pattern in the primary structure that has to be followed to generate a
good (but not excellent) alignment.
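To make the stem idea concrete, here is a small Python sketch that checks whether two segments could pair up as a stem. It allows the G-U "wobble" pair alongside the Watson-Crick pairs, as one well-known example of a pairing outside the usual rule; the function name and any segments you feed it are purely illustrative and not taken from real rRNA.

```python
# Watson-Crick pairs plus the G-U wobble pair, stored as unordered pairs.
PAIRS = {frozenset("AU"), frozenset("CG"), frozenset("GU")}


def can_form_stem(left: str, right: str) -> bool:
    """True if `left` (5'->3') can pair with `right` (5'->3') as a stem.

    The two strands of a stem run antiparallel, so the first base of
    `left` pairs with the last base of `right`.
    """
    # Treat DNA-style input (T) as RNA (U).
    left = left.upper().replace("T", "U")
    right = right.upper().replace("T", "U")
    if len(left) != len(right):
        return False
    return all(frozenset((a, b)) in PAIRS
               for a, b in zip(left, reversed(right)))
```

Two stems can differ base by base and still both pass this check, which is why an aligner that only compares the flat sequences can misplace them.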
I guess dedicated rRNA alignment software would be too slow for your
requirements, but by using ClustalW-MPI with some sequences as a
profile you would probably get a fairly good alignment in maybe a
couple of days.
Hope that helps
Paulo
Post by Nick Holway
Hello,
Steve actually posted this on behalf of me, so to cut out the middle
man I'll answer.
I'm trying to assist a scientist with a bioinformatics project. He's
trying to align 16S rDNA sequences to identify the bacterial species.
I launched a Muscle job on his behalf which took ~5.5 days to run (on
3GHz "Harpertown" Xeons). The file the scientist gave me had ~5000
sequences, most of them 1000-1500 bases long.
I'm trying to persuade the scientist to see if he can reduce the
number of sequences that he needs to align, and also to see whether his
data really needs Muscle to run to completion rather than just the
first two iterations.
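Both ideas in the paragraph above can be sketched in Python. Dropping exact duplicate sequences is only one possible way to shrink the input, and it is an assumption that the data set contains duplicates at all; the helper names and file names are hypothetical. The -in, -out and -maxiters options are those of MUSCLE 3.x, so check them against the installed version.

```python
from typing import Iterable, List, Tuple

Record = Tuple[str, str]  # (header, sequence)


def read_fasta(lines: Iterable[str]) -> List[Record]:
    """Parse FASTA lines into (header, sequence) tuples."""
    records: List[Tuple[str, List[str]]] = []
    for line in lines:
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            records.append((line[1:], []))
        elif records:
            records[-1][1].append(line)
    return [(header, "".join(parts)) for header, parts in records]


def drop_exact_duplicates(records: List[Record]) -> List[Record]:
    """Keep the first record for each distinct sequence, ignoring case."""
    seen = set()
    unique = []
    for header, seq in records:
        key = seq.upper()
        if key not in seen:
            seen.add(key)
            unique.append((header, seq))
    return unique


def muscle_command(in_fasta: str, out_fasta: str,
                   max_iters: int = 2, muscle: str = "muscle") -> List[str]:
    """Build a MUSCLE 3.x command that stops after `max_iters` iterations."""
    return [muscle, "-in", in_fasta, "-out", out_fasta,
            "-maxiters", str(max_iters)]
```

Whether two iterations are enough is a question for the scientist; the command only makes it cheap to compare a quick run against the full one on a subset.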
My reason for wanting to know if there are any good parallel sequence
alignment tools is that we've seen some excellent speed increases with
our MD code. Knowing this scientist I imagine he'll need the entire
data set to be aligned :)
If you need me to find out any more information from the scientist
please let me know.
Thanks
Nick
Post by Juan Carlos Perin
Are you looking to align short reads from NGS, or other data?
~ juan
Post by slitster
Does anyone have recommendations for a parallel sequence alignment tool?
User investigation so far has turned up ClustalW-MPI, but it seems to be
using an older version of ClustalW.
Any input much appreciated.
Cheers
Steve
_______________________________________________
Bioclusters maillist - Bioclusters at bioinformatics.org
http://www.bioinformatics.org/mailman/listinfo/bioclusters