These graphs summarise the performance of Data Parallel Haskell for three simple benchmarks on two different architectures, both of which have 8 cores. At the top, we have 8 cores with one hardware thread each, in the form of two Quad-Core 3GHz Xeon processors. At the bottom, we have 8 cores with 8 hardware threads each (for a total of 64 hardware threads), in the form of a single 1.4GHz UltraSPARC T2.

The benchmarks are the following: (1) sumsq computes the parallel sum of the squares from 1 to 10 million; (2) dotp computes the dot product of two dense vectors of 100 million double-precision floating-point numbers; and (3) smvm multiplies a sparse matrix, containing 10 million non-zero double-precision floating-point elements, by a dense vector.
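
For concreteness, here is a plain, sequential Haskell sketch of what each kernel computes. This is not the vectorised DPH code used in the measurements, and it makes no attempt at efficiency; it is only meant to pin down the three computations.

```haskell
-- Reference sketches of the three benchmark kernels (sequential, not DPH).

-- (1) sumsq: sum of the squares from 1 to n
sumsq :: Int -> Int
sumsq n = sum [x * x | x <- [1 .. n]]

-- (2) dotp: dot product of two dense vectors
dotp :: [Double] -> [Double] -> Double
dotp xs ys = sum (zipWith (*) xs ys)

-- (3) smvm: sparse matrix, given as rows of (column index, value) pairs,
--     multiplied by a dense vector
smvm :: [[(Int, Double)]] -> [Double] -> [Double]
smvm rows vec = [sum [x * (vec !! i) | (i, x) <- row] | row <- rows]
```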

The graphs show the speedup of Data Parallel Haskell with respect to a sequential C implementation of each benchmark: whenever a curve climbs above 1 on the y-axis, the parallel Haskell program beats the sequential C program in absolute runtime.
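
In other words, the plotted quantity is simply the sequential C wall-clock time divided by the Data Parallel Haskell wall-clock time. A minimal sketch (the timings in the comment are hypothetical, not measured values):

```haskell
-- Speedup of the DPH program relative to the sequential C baseline.
-- A value greater than 1 means the parallel Haskell program was faster
-- in absolute runtime.
speedup :: Double -> Double -> Double
speedup cSeqTime dphTime = cSeqTime / dphTime

-- For example, speedup 12.0 3.0 == 4.0 (hypothetical timings).
```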

The scalability of these three programs in Data Parallel Haskell is very good, although dotp is limited by memory bandwidth on the Quad-Core Xeon processors, as discussed in my previous post. The grey curve is a parallel C implementation of dotp that is bandwidth limited in the same manner.