Tracing Sparse Matrix-Vector Multiplication with DTrace

The following is an example trace of a Data Parallel Haskell program multiplying a sparse matrix (10,000 × 10,000, with 10% of its elements non-zero) by a dense vector.  The program runs on both cores of an Intel Core 2 Duo processor and uses GHC's new DTrace support on Mac OS X to gather the trace data, which I visualised in Instruments.  Of the three tracks, the topmost shows garbage collection activity in blue.  The other two show, in green, the activity of the two HECs (Haskell Execution Contexts) running the application code, one per core.  The program starts by loading the matrix from disk and generating a random vector; during this phase only HEC #1 does any work.  Then, during the actual matrix multiplication, both cores are utilised almost evenly.


There is little garbage collection during the parallel matrix multiplication: GHC successfully unboxes the numeric code, so the multiplication allocates very little on the heap.
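For readers unfamiliar with the computation being traced, here is a minimal sequential sketch of sparse matrix-vector multiplication. It is not the Data Parallel Haskell code that produced the trace: it uses ordinary lists and an unboxed array instead of parallel arrays, and the names `SparseRow`, `SparseMatrix`, and `smvm` are my own choices for illustration. Each row stores only its non-zero entries as (column index, value) pairs, so the work per row is proportional to the number of non-zeros.

```haskell
module Main where

import Data.Array.Unboxed (UArray, listArray, (!))

-- | One sparse row: (column index, value) pairs for the non-zero entries only.
--   (Hypothetical representation, chosen for this sketch.)
type SparseRow = [(Int, Double)]

-- | A sparse matrix as a list of sparse rows.
type SparseMatrix = [SparseRow]

-- | Multiply a sparse matrix by a dense vector.
--   For each row, sum the products of its non-zero entries with the
--   matching vector elements; an empty row yields 0.
smvm :: SparseMatrix -> UArray Int Double -> [Double]
smvm rows vec = [ sum [ x * (vec ! col) | (col, x) <- row ] | row <- rows ]

main :: IO ()
main = do
  -- A small 3 x 3 example; the middle row has no non-zero entries.
  let matrix = [ [(0, 2.0), (2, 1.0)], [], [(1, 3.0)] ]
      vec    = listArray (0, 2) [1.0, 2.0, 3.0] :: UArray Int Double
  print (smvm matrix vec)
```

The rows are independent of one another, which is what lets a data-parallel version spread them across both cores.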
