Video and slides of "Data Parallelism in Haskell" @ Brisbane FP Group

The Brisbane FP Group (BFPG) kindly invited me to give a talk about our work on data parallel programming in Haskell. The talk motivates the use of functional programming for parallel, and in particular, data parallel programming and explains the difference between regular and irregular (or nested) data parallelism. It also discusses the Repa library (regular data parallelism for multicore CPUs), the embedded language Accelerate (regular data parallelism for GPUs), and Data Parallel Haskell (nested data parallelism for multicore CPUs).  The slides of the presentation are available in two formats: HTML5 slideshow and PDF. Thank you to everybody who attended. Special thanks go to OJ Reeves and Tony Morris for organising everything, to Rob Manthey for producing the video, to Mincom for the venue, and to Functional IO for the sponsorship.

Filed under  //  accelerate   dph   haskell   parallelism   repa  
Posted

Data.Array.Accelerate now on GitHub

Prompted by GHC's move to Git and the unreliability of the community.haskell.org infrastructure, I decided to move Data.Array.Accelerate to GitHub: mchakravarty/accelerate. The GitHub repository is now the main public repository for the project. I will also move the tickets of the old Trac bug tracker over to GitHub Issues.

Filed under  //  edsl   gpgpu   haskell   parallelism  
Posted

Final version of the Accelerate paper (GPGPU in Haskell)

We will present our paper Accelerating Haskell Array Codes with Multicore GPUs at ACM SIGLAN Declarative Aspects of Multicore Programming (DAMP 2011), which is co-located with POPL'11 in Austin, TX, in January. The final version of our paper is now available, and we plan to soon release a significantly improved version of the Accelerate library (matching the API used in the paper), which enables high-level GPGPU programming in Haskell based on NVIDIA's CUDA environment.

Filed under  //  edsl   gpgpu   haskell   parallelism  
Posted

Quickly searching Hackage and Hoogle on Mac OS X

Haskellers on Macs should check out Alfred App.  It's a launcher application that also does web searches and a few other things.  In particular, it enables you to define custom searches that you can quickly access with a hot key from wherever you are.  I am using the two following custom searches to quickly search through Hackage and Hoogle:

The strings "hackage" and "hoogle" are the keywords, followed by the query strings.

Filed under  //  alfredapp   haskell   mac  
Posted

Accelerating Haskell Array Codes with Multicore GPUs

In the paper Accelerating Haskell Array Codes with Multicore GPUs, we describe Accelerate, embedded language for array computations in Haskell, as well as a dynamic code generator targeting NVIDIA's CUDA GPGPU programming environment.  Our CUDA code generator is based on the idea of algorithmic skeletons for parallel programming, minimises re-compilation overheads, and overlaps host-to-device data transfers, code generation, and kernel execution.

The paper outlines our embedding in Haskell, details the design and implementation of the dynamic code generator, and reports on initial benchmark results.  These results indicate that we can compete with moderately optimised native CUDA code, while enabling much simpler source programs.

Filed under  //  edsl   gpgpu   haskell   parallelism  
Posted

Second beta release for the CUDA backend of Accelerate

The second beta release of Data.Array.Accelerate (version 0.8.0.0) includes support for all array operations in the CUDA backend — except for a new stencil operation that is new in this release and currently only supported by the interpreter.  The CUDA backend requires CUDA 3.x and has been tested on consumer-grade GeForce cards, TESLA high-performance GPUs, and the new Fermi cards.

If you are interested in general-purpose GPU programming in Haskell, give this release a spin — it is available on Hackage.  While it is far from perfect, it should already prove useful for a considerable range of applications.  Together with the other Accelerate developers, I would be very interested to hear what works for you and what doesn't.  We are also interested in suggestions for the future development of the library.  Accelerate has a mailing list and a bug tracker on which we also accept feature requests.

Filed under  //  edsl   gpgpu   haskell   parallelism  
Posted

First release of the CUDA backend for Accelerate

During the first Australian Haskell Hackathon, AusHac2010, we finally managed to complete the integration of the CUDA backend, written by Sean Lee and Trevor McDonell, into the array EDSL Accelerate.  It still doesn't cover the complete functionality of the current version of the Accelerate EDSL, but already allows for interesting GPU computations.  The package source (version 0.7.1.0) can be obtained from Hackage as usual.  There is now also a bug tracker.

Filed under  //  edsl   gpgpu   haskell   parallelism  
Posted

Haskell 2010 is complete!

The final version of the Haskell 2010 edition has been released: http://www.haskell.org/pipermail/haskell/2010-July/022189.html This completes my stewardship of the Haskell foreign function interface specification. It is now part of the language standard proper. The next language feature I personally like to see being integrated are type families, but that still requires much work.

Filed under  //  ffi   haskell   haskell2010  
Posted

New paper: An LLVM Backend For GHC

In An LLVM Backend For GHC, we describe the design and implementation of an LLVM backend for the Glasgow Haskell Compiler (GHC).  We outline the conceptual obstacles that we had to overcome and give a detailed performance analysis of the new backend compared to the existing C backend and native code generator.

Filed under  //  codegen   compilers   ghc   haskell   llvm  
Posted

Regular, shape-polymorphic, parallel arrays in Haskell

The paper Regular, shape-polymorphic, parallel arrays in Haskell describes a new Haskell library, named Repa, for multi-dimensional, regular arrays based on our work on Data Parallel Haskell and a delayed array representation avoiding unnecessary data structures. As the following table shows, the sequential performance already gets close to the performance of C programs. More importantly, the Haskell code transparently makes use of multicore hardware — something that is much more difficult in C.

Pastedgraphic-2

The Haskell code is purely functional, using collective array operations.  Here is matrix-matrix multiplication:

mmMult :: (Num e, Elt e)
       => Array DIM2 e -> Array DIM2 e -> Array DIM2 e
mmMult arr brr = sum (zipWith (*) arrRepl brrRepl)
 where
  trr     = force (transpose2D brr)
  arrRepl = replicate (Z :.All :.colsB :.All)   arr 
  brrRepl = replicate (Z :.All :.All   :.rowsA) trr               
  (Z:. colsA:. rowsA) = extent arr
  (Z:. colsB:. rowsB) = extent brr

In addition to good performance, the library facilitates reuse through the use of shape polymorphism.

Filed under  //  haskell   multicore   parallelism  
Posted