Optimising Purely Functional GPU Programs

We have just completed a draft of a new paper, Optimising Purely Functional GPU Programs, which describes two crucial optimisations in Accelerate, our embedded array language for GPU programming in Haskell. The first is a novel, typed form of sharing recovery for embedded languages; the second is a new array fusion method for massively parallel SIMD programs. The paper reports on eight benchmark programs that demonstrate the effectiveness of both optimisations and compare Accelerate with competing frameworks as well as with CUDA C code.