Heads Up: GHC devs on Macs - GHC's testsuite hangs the Spotlight indexer on SL

GHC's testsuite[1] contains a test, encoding001 (in the I/O library section), that generates files in various Unicode encodings, including one using UTF-8 with a byte order mark, namely encoding001.utf16.utf8_bom. On Mac OS X 10.6 (Snow Leopard), this file causes Spotlight's indexing process to hang. More precisely, the mds (metadata server) process appears to go into a loop, consuming all cycles of one processor core. It appears to hang in the library libmecab while trying to parse what it probably believes to be Japanese or Chinese text.

Interestingly, the file command reports the file as "Unicode text, UTF-32, big-endian".

1 comment

Sep 27, 2009
Actually, 'encoding001.utf16.utf8_bom' is in UTF-16BE. It starts with a funny sequence (namely, 0xfe 0xff 0x0 0x0), which could be a UTF-16BE BOM followed by a NUL character or a UTF-32BE BOM. The 'file' utility claims it's the latter.
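
To make the BOM ambiguity concrete, here is a minimal Python sketch that lists every standard BOM matching the start of a byte string. The table and the helper name bom_candidates are illustrative assumptions; this is not how 'file' or Spotlight detect encodings. One caveat: in the standard table the UTF-32BE BOM is 00 00 FE FF, so this sketch matches FE FF 00 00 only as UTF-16BE. The prefix that is ambiguous by the table alone is FF FE 00 00 (UTF-16LE BOM plus NUL, or UTF-32LE BOM). Whatever heuristic 'file' applied here is not reproduced.

```python
import codecs

# Standard BOMs from Python's codecs module. The 4-byte marks come first
# so they are reported before the 2-byte marks they share a prefix with.
BOMS = [
    ("UTF-32BE", codecs.BOM_UTF32_BE),  # 00 00 FE FF
    ("UTF-32LE", codecs.BOM_UTF32_LE),  # FF FE 00 00
    ("UTF-8", codecs.BOM_UTF8),         # EF BB BF
    ("UTF-16BE", codecs.BOM_UTF16_BE),  # FE FF
    ("UTF-16LE", codecs.BOM_UTF16_LE),  # FF FE
]


def bom_candidates(prefix):
    """Return the names of all encodings whose BOM starts `prefix`.

    Hypothetical helper for illustration only: more than one name in
    the result means the leading bytes alone cannot settle the encoding.
    """
    return [name for name, bom in BOMS if prefix.startswith(bom)]


if __name__ == "__main__":
    # The four leading bytes reported in the comment above.
    print(bom_candidates(bytes([0xFE, 0xFF, 0x00, 0x00])))  # ['UTF-16BE']
    # The little-endian analogue, where two BOMs really do overlap.
    print(bom_candidates(bytes([0xFF, 0xFE, 0x00, 0x00])))  # ['UTF-32LE', 'UTF-16LE']
```

A detector that wants a single answer has to pick a tie-breaking rule (for example, prefer the longest BOM), and that choice is where tools can disagree about the same file.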
