Discussion:
[Bioclusters] BLAST speed mystery
Justin Powell
2010-02-14 20:17:04 UTC
I have a couple of new fast servers with 24GB RAM and RAID 0 15K SAS hard drives (two in each server). I've run some tests using BLASTN on the est_mouse database, which is 1.7GB. As one would expect, the results when repeating identical BLASTs are fairly impressive, as the database becomes effectively cached in RAM. However, I have also been timing the first BLAST, when no cached data is available. I've found that if I have the database on the local 15K drives, the first BLAST takes about 45 seconds with my particular query sequence. However, doing the identical BLAST but instead NFS mounting the databases off an old Apple G4 Xserve attached to an old Apple Xraid (which has 8 parallel ATA disks), the first BLAST runs in 30 seconds. (I ensure that there is no cached data on the NFS server either.)
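
(For completeness: each timed run is a single plain BLASTN search of est_mouse. With the classic NCBI toolkit that is something along the lines of the command below; the query file name and output name are placeholders, not my exact invocation.)

blastall -p blastn -d est_mouse -i query.fa -o query_vs_est_mouse.out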

Measuring straight throughput off disk using dd shows that the 15K disks can deliver 300MB/sec, whereas the NFS-mounted Xserve/Xraid combination only delivers 70MB/sec. So it's not a problem with streaming throughput. Possibly it's something to do with IOPS - I'm not sure what a decent benchmarking tool for that would be, so I don't have figures currently - is BLAST particularly sensitive to this?
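
(Roughly, the streaming measurement amounts to the lines below; the database path is only an example, and the page cache needs dropping first so the read really hits the disks.)

sync; echo 3 > /proc/sys/vm/drop_caches                # drop cached data (Linux)
dd if=/data/blastdb/est_mouse.nsq of=/dev/null bs=1M   # sequential read; dd reports MB/sec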

Interestingly, I have tried mounting the database via NFS from one of the new servers across to the other. When using an 8K block size (the same as for the Xserve NFS mount) I again get 45 seconds for the first BLAST iteration. When I increase the block size to 32K, the time for the first BLAST iteration drops to 30 seconds, comparable to the Xserve case.
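
(The block size here is the NFS rsize/wsize mount option, so the two mounts look roughly like this; the server name, export path and mount point are invented.)

mount -t nfs -o rsize=8192,wsize=8192 server:/data/blastdb /mnt/blastdb      # 8K, as on the Xserve mount
umount /mnt/blastdb
mount -t nfs -o rsize=32768,wsize=32768 server:/data/blastdb /mnt/blastdb    # 32K for comparison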

I'm not sure what this means. Possibly the block-size result implies some sort of read-ahead would improve things, but turning on read-ahead on the RAID controller did not improve the performance of the SAS-disk-based BLAST. Is the problem possibly an IOPS limitation, solvable by putting more disks in the RAID 0 array? The NFS block-size results imply some sort of tuning should be possible even with the existing disks, but I'm not sure what to try.

Anyone have any ideas?

Justin
Georgios Magklaras
2010-02-20 20:12:47 UTC
In general, I tend to use iozone (http://www.iozone.org/) to measure
IOPS before I put cluster nodes into production. I assume that the
BLAST versions in your G4 and new server (Linux?) environments are
the same.
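
Something along these lines (record size, file size and path are only
examples) gives random-read figures to put next to the dd streaming numbers:

iozone -i 0 -i 2 -r 8k -s 4g -I -f /data/iozone.tmp

(-i 0 is the sequential write that creates the test file, -i 2 adds random
read/write, -r 8k sets the record size, and -I uses O_DIRECT so the 24GB of
RAM does not mask the disks.)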

Running vm_stat (on Mac OS X) or vmstat (on Linux) during the BLAST run
(both before caching and with est_mouse cached) can give you rough figures
for disk throughput and buffer cache usage (yes, having more stripes is
useful, but something else might be happening here).

However, it would be useful to give us the software (OS/kernel version)
and hardware (RAID controller) details for your new servers.

GM
Justin Powell
2010-03-02 18:19:15 UTC
I should clarify that the actual BLAST executable is running on the same Intel-based Linux server in both cases; it's just that in one case the data is coming off the local RAID 0 15K SAS drives and in the other the data is coming in over NFS from the Mac server. So there is no BLAST version on the G4.

Meanwhile I did some tests altering the read-ahead setting on the RAID 0 config on the local drives. The default read-ahead is 256 sectors, which corresponds to 128K. I've found that putting this up to 1024 sectors (512K) or higher reduces the times significantly (see below). However, tests with dd show that 1024-sector read-ahead is not really any faster than 256-sector read-ahead for simple streaming of a file (both are a lot better than no read-ahead). So I'm wondering whether this is still fundamentally some kind of IOPS limitation resulting from BLAST not streaming the entire file in but instead making random accesses to it - the larger read-ahead would just mean that a random access is more likely to find its data already present in RAM.
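
(For anyone wanting to try the same thing: 256 sectors is also the Linux block-layer default, and that layer's read-ahead can be queried and changed with blockdev; the device name below is just an example, and a controller-level setting may behave differently.)

blockdev --getra /dev/sda        # report current read-ahead in 512-byte sectors
blockdev --setra 1024 /dev/sda   # set read-ahead to 1024 sectors (512K)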

So I can now get performance sufficient for my needs by tweaking the read-ahead, but I'm still curious as to what is going on.

I have run vmstat - however, it does not seem to report any IO for the NFS-based BLAST. I assume the io columns relate only to local disk IO? For the server this data is taken from, uname -a gives:
2.6.18-164.11.1.el5 #1 SMP Wed Jan 20 07:32:21 EST 2010 x86_64 x86_64 x86_64 GNU/Linux
The RAID hardware in this case is a Dell SAS 6/iR, but I get similar results from the other system which has a PERC6i.
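
(One way to see the NFS-side IO that vmstat misses - just a sketch, I haven't chased this further - is the client-side NFS counters:)

nfsstat -c     # cumulative client NFS/RPC call counts, including reads
nfsstat -m     # mounted NFS filesystems and the mount options in effect (rsize etc.)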

BLAST here is the 32-bit binary (with the 64-bit binary, NFS has less of an advantage over the 256-sector read-ahead on local disks).

Results are:

BLASTN with data on local RAID 0 drives, read-ahead 1024, BLAST completes in 24 seconds
[root@cam-clu-01 testblast]# vmstat 2
procs -----------memory---------- ---swap-- -----io---- --system-- -----cpu------
r b swpd free buff cache si so bi bo in cs us sy id wa st
0 0 0 24450688 644 36156 0 0 2 0 2 1 0 0 100 0 0
0 0 0 24450700 644 36152 0 0 0 0 1017 150 0 0 100 0 0
2 0 0 24424908 784 60212 0 0 12318 0 1099 304 3 0 96 1 0
2 0 0 24364052 848 120192 0 0 30224 40 1139 416 11 0 87 1 0
2 0 0 24304600 912 179536 0 0 29648 0 1138 410 11 0 88 1 0
2 0 0 24246444 968 237600 0 0 29228 0 1137 411 11 0 88 1 0
2 0 0 24189060 1020 295976 0 0 28630 0 1132 390 11 0 88 1 0
1 2 0 24129816 1064 353904 0 0 29436 6 1134 390 11 0 87 1 0
2 0 0 24073940 1104 410124 0 0 27912 0 1129 405 11 0 88 1 0
2 0 0 24016624 1140 465988 0 0 28556 0 1131 407 11 0 88 1 0
2 0 0 23959072 1184 524148 0 0 28754 4 1135 412 11 0 88 1 0
2 0 0 23901584 1204 581564 0 0 28568 0 1132 401 11 0 88 1 0
2 0 0 23843364 1216 639748 0 0 29036 0 1133 399 11 0 88 1 0
0 1 0 23743492 1676 738508 0 0 49406 6 1197 490 2 0 93 5 0
0 0 0 23701168 1820 782056 0 0 21732 0 1123 347 0 0 96 3 0
0 0 0 23701300 1828 781900 0 0 0 24 1018 155 0 0 100 0 0
0 0 0 23701424 1828 781908 0 0 0 0 1014 151 0 0 100 0 0
0 0 0 23701416 1828 781908 0 0 0 0 1018 155 0 0 100 0 0

BLASTN with data on local RAID 0 drives, read-ahead 256, BLAST completes in 44 seconds
[root@cam-clu-01 testblast]# vmstat 2
procs -----------memory---------- ---swap-- -----io---- --system-- -----cpu------
r b swpd free buff cache si so bi bo in cs us sy id wa st
0 0 0 24449624 664 35808 0 0 2 0 2 1 0 0 100 0 0
0 0 0 24449872 664 36028 0 0 0 0 1017 157 0 0 100 0 0
0 0 0 24449872 664 36028 0 0 0 0 1019 161 0 0 100 0 0
1 1 0 24421508 796 64032 0 0 14038 40 1254 792 5 0 91 4 0
1 1 0 24391328 824 93744 0 0 15080 0 1257 852 6 0 92 2 0
1 1 0 24359532 856 126048 0 0 16022 0 1268 891 5 0 90 4 0
1 1 0 24329476 876 155972 0 0 14946 4 1254 851 6 0 91 4 0
1 1 0 24299776 900 185672 0 0 14880 0 1251 840 6 0 90 4 0
1 1 0 24270904 936 214556 0 0 14432 12 1245 825 6 0 91 3 0
1 1 0 24240948 968 244624 0 0 15018 0 1257 854 6 0 91 3 0
1 1 0 24211732 996 273732 0 0 14568 0 1247 833 6 0 91 3 0
2 0 0 24182448 1020 302824 0 0 14590 0 1249 841 6 0 91 4 0
1 1 0 24153228 1052 331860 0 0 14598 0 1247 832 6 0 91 4 0
2 0 0 24122556 1076 362432 0 0 15298 0 1267 870 6 0 92 2 0
1 1 0 24095168 1088 389920 0 0 13654 100 1254 774 5 0 91 4 0
1 1 0 24066904 1100 417856 0 0 14070 0 1239 800 6 0 92 2 0
0 2 0 24038824 1116 445800 0 0 13916 2 1235 809 6 0 90 4 0
1 1 0 24008820 1128 475476 0 0 14938 0 1251 837 6 0 92 2 0
1 1 0 23979948 1136 504272 0 0 14434 0 1244 819 6 0 91 3 0
2 1 0 23950004 1160 534244 0 0 14934 0 1254 846 6 0 91 4 0
0 2 0 23920188 1172 563884 0 0 14872 0 1248 839 5 0 92 2 0
1 1 0 23891020 1180 592824 0 0 14484 0 1246 824 6 0 91 3 0
1 1 0 23859216 1188 624540 0 0 15838 0 1267 887 6 0 91 4 0
0 1 0 23829436 1316 654112 0 0 14706 10 1241 783 5 0 92 4 0
0 1 0 23795088 1780 687504 0 0 17088 4 1290 686 0 0 94 6 0
0 0 0 23791312 1780 692876 0 0 2556 0 1043 204 0 0 99 1 0
0 0 0 23791324 1788 692876 0 0 0 26 1019 164 0 0 100 0 0
0 0 0 23791436 1788 692876 0 0 0 0 1014 152 0 0 100 0 0

BLASTN with data from XServe via NFS, BLAST completes in 32 seconds
[root@cam-clu-01 testblast]# vmstat 2
procs -----------memory---------- ---swap-- -----io---- --system-- -----cpu------
r b swpd free buff cache si so bi bo in cs us sy id wa st
0 0 0 24450716 620 36028 0 0 2 0 2 1 0 0 100 0 0
0 0 0 24450728 620 36324 0 0 0 0 1015 154 0 0 100 0 0
2 1 0 24419696 708 65760 0 0 1940 0 4349 4198 5 1 91 3 0
2 0 0 24372632 716 112204 0 0 0 42 6924 7277 9 1 87 3 0
3 0 0 24325280 716 159144 0 0 0 0 6935 7335 9 1 87 3 0
1 1 0 24279788 716 204156 0 0 0 4 6740 7062 9 1 87 3 0
3 0 0 24234508 716 249572 0 0 0 0 6766 7109 9 1 87 3 0
2 0 0 24188752 716 294300 0 0 0 0 6722 7090 9 1 87 3 0
2 0 0 24141920 732 339748 0 0 70 6 6850 7157 9 1 87 3 0
2 0 0 24094916 732 386520 0 0 0 0 6952 7291 9 1 87 3 0
1 1 0 24049868 732 431720 0 0 0 0 6753 7019 9 1 87 3 0
2 0 0 24003308 740 478280 0 0 0 6 7003 7356 9 1 87 3 0
1 1 0 23956744 740 524452 0 0 0 0 6937 7228 9 1 87 3 0
2 0 0 23921464 740 559912 0 0 0 0 5552 5580 7 1 87 5 0
0 2 0 23875432 740 605580 0 0 0 0 6850 7156 9 1 87 3 0
1 0 0 23832364 740 648584 0 0 34 40 6531 6806 8 1 88 3 0
0 1 0 23811748 740 669672 0 0 0 0 3548 2958 0 0 93 6 0
0 1 0 23788460 748 690408 0 0 82 12 3732 3263 0 0 93 6 0
0 0 0 23791388 748 691296 0 0 64 0 1025 162 0 0 100 0 0

Re-running with data already in RAM from previous run, BLAST completes in 11 seconds
[root@cam-clu-01 ~]# vmstat 2
procs -----------memory---------- ---swap-- -----io---- --system-- -----cpu------
r b swpd free buff cache si so bi bo in cs us sy id wa st
0 0 0 23701684 1688 781800 0 0 2 0 2 1 0 0 100 0 0
0 0 0 23701968 1688 781836 0 0 0 0 1018 157 0 0 100 0 0
2 0 0 23700952 1688 781836 0 0 0 0 1016 166 11 0 89 0 0
2 0 0 23701220 1696 781836 0 0 0 44 1017 153 12 0 87 0 0
2 0 0 23701204 1696 781836 0 0 0 0 1015 146 12 0 87 0 0
2 0 0 23701228 1696 781836 0 0 0 0 1018 149 12 0 87 0 0
2 0 0 23701104 1696 781836 0 0 0 0 1015 147 12 0 88 0 0
0 0 0 23701880 1696 781836 0 0 0 0 1017 159 6 0 94 0 0
0 0 0 23702004 1696 781836 0 0 0 22 1019 153 0 0 100 0 0

Justin


