First a bug, then an exercise in assuming the common case.
We were parallelizing a single-threaded operation to increase throughput and hit a race condition that at first showed up as a crash only on Ubuntu. One thread read jobs from the disk and passed them to an existing worker pool for processing. Once all processing was complete, the reading thread exited. When a worker finished processing, it marked its job as complete and only then released its locks. The reading thread could therefore see the last job marked complete and exit while the worker still held a lock, so the worker could end up releasing a lock that had already been freed. Because the thread pool was also needed for other purposes, it wasn’t joined first. My best guess as to why it only showed up on Ubuntu (and Debian, it turned out) is that those distributions have different defaults for what we assume to be stack protection, though that wasn’t clear from listing the defaults with gcc -Q --help=target.
Though changing the order of lock releases solved the immediate problem, there were also deadlock issues due to dependencies involving the thread pool. We ended up adding a dedicated processing thread that is joined before exiting, which avoided the problem.
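The shape of that fix can be sketched in a few lines. This is a Python stand-in, not our actual code: run_jobs, processor, results_lock and the doubling "processing" step are all hypothetical names made up for illustration. Python won't free a lock out from under a thread, so the sketch only shows the ordering: the worker releases its lock before reporting the job as done, and the owner joins the dedicated thread before letting anything go away.

```python
import queue
import threading


def run_jobs(jobs):
    """Process jobs on a dedicated thread that is joined before returning."""
    work = queue.Queue()
    results = []
    results_lock = threading.Lock()  # owned by this call, like the reader's locks
    sentinel = object()              # tells the processor there is no more work

    def processor():
        while True:
            job = work.get()
            if job is sentinel:
                return
            value = job * 2  # stand-in for the real processing
            with results_lock:
                results.append(value)
            # The lock is already released here, before the job is reported done.
            work.task_done()

    thread = threading.Thread(target=processor)
    thread.start()
    for job in jobs:
        work.put(job)
    work.put(sentinel)
    # Join first: nothing owned by this function is torn down
    # until the processor thread has exited.
    thread.join()
    return results
```

Because the thread belongs to the reader alone, joining it can't deadlock against whatever else the shared pool is doing.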
While this parallelization does get its speed from doing previously single-threaded work in a thread pool, it has a twist. A final processing step must be done in disk order, but waiting for processing to finish slowed down reading. By reading ahead speculatively, as though processing had succeeded, we achieved a ~2x performance increase. Yay!
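Here is a minimal sketch of the read-ahead idea, again in Python and again with made-up names (process, run, and the rule that negative records fail are all assumptions for the example). It also assumes that a failed job means everything read after it gets thrown away, which may not match every use case. For brevity it submits every record before finalizing any, where a real reader would interleave the two.

```python
from concurrent.futures import ThreadPoolExecutor


def process(record):
    """Hypothetical processing step: negative records count as failures."""
    if record < 0:
        raise ValueError(record)
    return record * record


def run(records, workers=4):
    finalized = []
    with ThreadPoolExecutor(max_workers=workers) as pool:
        # Speculative read-ahead: hand each record to the pool right away,
        # without waiting to learn whether earlier records succeeded.
        pending = [pool.submit(process, record) for record in records]

        # The final step consumes results in submission (disk) order,
        # regardless of the order in which the workers finish.
        for future in pending:
            try:
                finalized.append(future.result())
            except ValueError:
                # The speculation was wrong: cancel work that has not
                # started yet and discard everything after the failure.
                for later in pending:
                    later.cancel()
                break
    return finalized
```

With these stand-ins, run([1, 2, 3, 4]) finalizes all four records in order, while run([1, 2, -1, 4]) finalizes the first two and drops the record that was read ahead of the failure.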