These graphs summarise the performance of Data Parallel Haskell for three simple benchmarks on two different architectures, both of which have 8 cores. At the top, we have 8 cores with one hardware thread each, in the form of two Quad-Core 3GHz Xeon processors. At the bottom, we have 8 cores with 8 hardware threads each (for a total of 64 hardware threads), in the form of a single 1.4GHz UltraSPARC T2.

The benchmarks are the following: (1) `sumsq` computes the parallel sum of the squares from 1 to 10 million; (2) `dotp` computes the dot product of two dense vectors of 100 million double-precision floating-point numbers; and (3) `smvm` multiplies a sparse matrix with 10 million non-zero double-precision floating-point elements with a dense vector.
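To make the three computations concrete, here is a minimal sequential sketch on ordinary Haskell lists. It is not the Data Parallel Haskell code that was benchmarked. The `SparseRow` and `SparseMatrix` types are a hypothetical representation chosen for illustration, and `Integer` is used in `sumsq` only so that the sum cannot overflow.

```haskell
-- Sequential reference versions on ordinary lists.
-- Assumption: illustrative only, not the benchmarked sources.

-- | Sum of the squares of 1..n.
sumsq :: Integer -> Integer
sumsq n = sum [x * x | x <- [1 .. n]]

-- | Dot product of two dense vectors.
dotp :: [Double] -> [Double] -> Double
dotp xs ys = sum (zipWith (*) xs ys)

-- | Hypothetical sparse representation: each row lists only its
-- non-zero elements as (column index, value) pairs.
type SparseRow    = [(Int, Double)]
type SparseMatrix = [SparseRow]

-- | Sparse matrix times dense vector. List indexing with (!!) is
-- linear time, which is fine for a sketch but not for real workloads.
smvm :: SparseMatrix -> [Double] -> [Double]
smvm m v = [sum [x * (v !! i) | (i, x) <- row] | row <- m]
```

For example, `sumsq 3` adds 1 + 4 + 9, and `smvm [[(0, 2), (2, 1)], [(1, 3)]] [1, 2, 3]` multiplies a 2×3 matrix with three non-zero entries by a three-element vector.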

The graphs show the speedup of Data Parallel Haskell with respect to a *sequential C* implementation of each benchmark. Whenever a curve climbs above 1 on the y-axis, the parallel Haskell program beats the sequential C program in absolute runtime.
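The quantity on the y-axis can be written down in a couple of lines. This is only a sketch of the definition; the function names are mine, and any runtimes passed in are made-up inputs rather than measurements from the graphs.

```haskell
-- | Speedup relative to the sequential C baseline: the baseline runtime
-- divided by the parallel Haskell runtime (both in the same unit).
speedup :: Double -> Double -> Double
speedup tSeqC tParHaskell = tSeqC / tParHaskell

-- | True exactly when the curve is above 1 on the y-axis, i.e. the
-- parallel Haskell program is faster in absolute terms.
beatsSequentialC :: Double -> Double -> Bool
beatsSequentialC tSeqC tParHaskell = speedup tSeqC tParHaskell > 1
```

With hypothetical runtimes of 2.0 seconds for sequential C and 0.5 seconds for parallel Haskell, `speedup 2.0 0.5` is 4, so that point would sit at 4 on the y-axis.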

The scalability of these three programs in Data Parallel Haskell is very good, although `dotp` is limited by memory bandwidth on the Quad-Core Xeon processors, as discussed in my previous post. The grey curve is a parallel C implementation of `dotp` that is bandwidth limited in the same manner.
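A rough back-of-the-envelope calculation suggests why `dotp` is sensitive to memory bandwidth. The sketch below assumes that each of the two vectors holds 100 million elements and that a double occupies 8 bytes; the function name is hypothetical.

```haskell
-- | Size of one double-precision floating-point number in bytes.
bytesPerDouble :: Integer
bytesPerDouble = 8

-- | Bytes that a dot product of two length-n vectors must read:
-- every element of both vectors is touched exactly once.
dotpBytesRead :: Integer -> Integer
dotpBytesRead n = 2 * n * bytesPerDouble
```

Under these assumptions, `dotpBytesRead 100000000` comes to 1.6 GB of input for only one multiplication and one addition per pair of elements, so adding cores helps only as long as the memory system can keep them fed.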