Regular, shape-polymorphic, parallel arrays in Haskell

The paper Regular, shape-polymorphic, parallel arrays in Haskell describes a new Haskell library, named Repa, for multi-dimensional, regular arrays based on our work on Data Parallel Haskell and a delayed array representation avoiding unnecessary data structures. As the following table shows, the sequential performance already gets close to the performance of C programs. More importantly, the Haskell code transparently makes use of multicore hardware — something that is much more difficult in C.

The Haskell code is purely functional, using collective array operations.  Here is matrix-matrix multiplication:

mmMult :: (Num e, Elt e)
       => Array DIM2 e -> Array DIM2 e -> Array DIM2 e
mmMult arr brr = sum (zipWith (*) arrRepl brrRepl)
 where
  trr     = force (transpose2D brr)
  arrRepl = replicate (Z :.All :.colsB :.All)   arr 
  brrRepl = replicate (Z :.All :.All   :.rowsA) trr               
  (Z:. colsA:. rowsA) = extent arr
  (Z:. colsB:. rowsB) = extent brr

In addition to good performance, the library facilitates reuse through the use of shape polymorphism.

Filed under  //  haskell   multicore   parallelism  
Comments (17)
Posted

Multicore Haskell Now!

Don Stewart summarised the state of play of parallel programming in Haskell at "ACM Reflections | Projections 2009". He covers strategies, Concurrent Haskell, STM, and Data Parallel Haskell: http://donsbot.wordpress.com/2009/10/17/multicore-haskell-now-acm-reflections-projections-2009/

Filed under  //  dph   haskell   multicore   parallelism   stm  
Comments (0)
Posted

Implementing Data Parallel Haskell

Here is a video of Roman Leshchinskiy's talk on our work on implementing Data Parallel Haskell (DPH), which he presented at the Haskell Implementors' Workshop (co-located with ICFP'09):

 

Filed under  //  dph   haskell   multicore   parallelism  
Comments (0)
Posted

These graphs summarise the performance of Data Parallel Haskell for three simple benchmarks on two different architectures, both of which have 8 cores. At the top, we have 8 cores with one hardware thread each, in the form of two Quad-Core 3GHz Xeon processors. At the bottom, we have 8 cores with 8 hardware threads each (for a total of 64 hardware threads), in the form of a single 1.4GHz UltraSPARC T2.

The benchmarks are the following: (1) sumsq computes the parallel sum of the squares from 1 to 10 million; (2) dotp computes the dot product of two dense vectors of 100 million double-precision floating-point numbers; and (3) smvm multiplies a sparse matrix with 10 million non-zero double-precision floating-point elements with a dense vector.

The graphs show the speedup of Data Parallel Haskell with respect to a sequential C implementation of each benchmark – whenever, a curve climbs above 1 on the y-axis, the parallel Haskell program beats the sequential C program in absolute runtime.

The scalability of these three programs in Data Parallel Haskell is very good, although dotp is limited by memory bandwidth on the Quad-Core Xeon processors, as discussed in my previous post. The grey curve is a parallel C implementation of dotp that is bandwidth limited in the same manner.

Filed under  //  dph   haskell   multicore   parallelism  
Comments (0)
Posted

This is the performance of a dot product of two vectors of 10 million doubles each using Data Parallel Haskell. Both machines have 8 cores. Each core of the T2 has 8 hardware thread contexts. The performance is memory bound. The DPH code is

dotp' :: [:Double:] -> [:Double:] -> Double
dotp' v w = D.sumP (zipWithP (*) v w)

Interesting is that despite the much lower single-thread performance, the T2 makes it all up in excellent multi-threading performance. Interesting to Haskell folks is that the corresponding multi-threaded C code using pthreads is much harder to write and barely any faster when using all parallelism (the Sun C-Compiler still manages to reduce the runtime by about 30% with 64 threads, but on the Xeons, there is no significant difference).

NB: This uses the latest development version of GHC and DPH.  Thanks to Ben Lippmeier for his nice work on the SPARC backend of GHC.

Filed under  //  dph   haskell   multicore   parallelism  
Comments (0)
Posted