
requires a preprocessing stage to prepare the input files, so this additional time was incorporated into the measurements. pBWA, in contrast, requires executing each phase of the BWA-backtrack algorithm independently, even though the phases themselves can run in parallel; pBWA times were therefore computed as the sum of the individual phase times. pBWA performs no preprocessing. Since BWA-backtrack was designed for shorter reads (<100 bp), we used D1 as the input dataset, but D2 is also included in the comparison for completeness.

Fig 9 shows the alignment times for different numbers of mappers. In this case each map process uses one core, so the terms mapper and core are equivalent. The results show that SparkBWA clearly outperforms SEAL and pBWA in all cases. As mentioned previously, SEAL times include the overhead of the preprocessing phase, which takes on average about 1.9 and 2.9 minutes for D1 and D2, respectively. This overhead has a large impact on performance, especially for the smaller dataset.

The corresponding speedups obtained by the aligners for BWA-backtrack are displayed in Fig 10, using the sequential BWA time as reference. The results confirm the good behavior of SparkBWA with respect to SEAL and pBWA: SparkBWA reaches speedups of up to 57× and 77× for D1 and D2, respectively, whereas the maximum speedups achieved by SEAL are only about 31× and 42×, and those of pBWA are 46× and 59×. SparkBWA is thus on average 1.9× faster than SEAL and 1.4× faster than pBWA.

Fig 10. Speedup considering several BWA-based aligners running the BWA-backtrack algorithm (axes are in log scale).

Finally, the BWA-MEM algorithm is evaluated considering the following tools: BWA, BigBWA, Halvade, and SparkBWA. Fig 11 shows the corresponding execution times for all the datasets, varying the number of mappers (cores). BWA uses Pthreads to parallelize the alignment process, so it can only be executed on a single cluster node (64 cores). Both BigBWA and Halvade are based on Hadoop and require a preprocessing stage to prepare the input data for the alignment process. BigBWA needs, on average, 2.4, 5.8, and 23.6 minutes to preprocess each dataset, whereas Halvade spends 1.8, 6.6, and 22.7 minutes, respectively. Preprocessing is carried out sequentially by BigBWA, while Halvade is able to perform it in parallel. This overhead does not depend on the number of mappers used in the computations. For fairness of comparison, the overhead of this phase is included in the execution times of both tools, since the times for BWA and SparkBWA encompass the whole alignment process.

Performance results show that BWA is competitive with the Hadoop-based tools (BigBWA and Halvade) when 32 mappers are used, but its scalability is very poor: using more threads does not compensate for the synchronization overhead unless the dataset is big enough. BigBWA and Halvade show better overall performance than BWA; both tools behave similarly, with only small differences between them. Finally, SparkBWA outperforms all the considered tools. To illustrate the benefits of our proposal, note that SparkBWA is, for example, on average 1.5× faster than BigBWA and Halvade when.
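The accounting behind these comparisons is simple but worth making explicit: tools with a preprocessing stage (SEAL, BigBWA, Halvade) are charged preprocessing plus alignment time, and speedup is measured against the sequential BWA baseline. A minimal sketch of that computation follows; the alignment and baseline times used here are hypothetical placeholders for illustration (only the 1.9-minute SEAL preprocessing figure for D1 comes from the text):

```python
def total_time(align_minutes, preprocess_minutes=0.0):
    """Wall-clock time charged to a tool, including any preprocessing overhead."""
    return align_minutes + preprocess_minutes

def speedup(sequential_minutes, parallel_minutes):
    """Speedup relative to the sequential BWA baseline."""
    return sequential_minutes / parallel_minutes

# Hypothetical numbers (not the paper's measurements):
seq = 600.0                       # assumed sequential BWA time, minutes
seal = total_time(18.0, 1.9)      # SEAL: alignment + 1.9 min preprocessing (D1)
spark = total_time(10.0)          # SparkBWA: no preprocessing stage
print(round(speedup(seq, seal), 1))   # → 30.2
print(round(speedup(seq, spark), 1))  # → 60.0
```

This also shows why the preprocessing overhead matters most for the smaller dataset: it is a fixed cost, so its relative weight in `total_time` shrinks as alignment time grows.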
