You're assuming that the work can be split into "smallish chunks", instead of a ...

You're assuming that the work can be split into "smallish chunks", instead of a single large CPU-bound computation. Sure, "injecting a stop package in its stream of work" is the best design, but only if you do have a stream of work in the first place. Otherwise, it's either signals, or constantly checking a shared "stop" variable in the middle of a very hot CPU-bound loop.