More Fun Tracing Parallel Haskell Programs
I recently wrote about profiling sparse-matrix vector multiplication implemented with Data Parallel Haskell. Let's take this a step further and look at a program parallelised with the parallel package instead of Data Parallel Haskell. Specifically, we look at the dense matrix-matrix multiplication benchmark from the parallel section of the nofib benchmark suite. This benchmark uses a linewise version of the torus-based Gentleman algorithm, implemented with vanilla Haskell lists. As the following profile shows, the algorithm parallelises very well on two cores. It also has a low allocation rate; hence, there is little garbage collection.
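To give a feel for what list-based matrix multiplication parallelised with the parallel package can look like, here is a minimal sketch. It is not the nofib benchmark code and it does not implement the Gentleman algorithm; it is a much simpler row-parallel version, and the names `Matrix`, `mmult`, and `mmultPar` are made up for illustration. It only assumes that the parallel package (which provides `Control.Parallel.Strategies`) is installed.

```haskell
-- Simplified sketch: row-parallel dense matrix multiplication over lists.
-- NOT the nofib benchmark and NOT the Gentleman algorithm; all names
-- below are hypothetical and chosen for illustration only.
import Control.Parallel.Strategies (parList, rdeepseq, using)
import Data.List (transpose)

-- A dense matrix as a list of rows (vanilla Haskell lists).
type Matrix = [[Int]]

-- Sequential product: every result element is the dot product of a
-- row of the left matrix with a column of the right matrix.
mmult :: Matrix -> Matrix -> Matrix
mmult a b = [ [ sum (zipWith (*) row col) | col <- cols ] | row <- a ]
  where
    cols = transpose b

-- Parallel variant: spark the evaluation of each result row and
-- force it completely (rdeepseq), so the work happens in the spark.
mmultPar :: Matrix -> Matrix -> Matrix
mmultPar a b = mmult a b `using` parList rdeepseq

main :: IO ()
main = print (mmultPar [[1, 2], [3, 4]] [[5, 6], [7, 8]])
```

To actually run sparks on two cores, a program like this needs to be linked with GHC's `-threaded` flag and started with `+RTS -N2`. With matrices this small the sparks are far too fine-grained to pay off; the sketch only shows the shape of the code.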