It doesn't remove sampling. And forcing a grammar by specifying allowed/prohibited bytes doesn't require running the decoder over and over: you just compute the softmax at the output layer over the allowed bytes only and sample from those, same as with BPE-based models.
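A minimal sketch of what that looks like for one decoding step, assuming the model emits one logit per byte value (a 256-entry list) and that some grammar checker, not shown and hypothetical here, has already produced the set of bytes allowed at the current position. The function names are illustrative, not any particular library's API. There's a single forward pass per step; the constraint only changes how the output distribution is normalized.

```python
import math
import random


def masked_softmax(logits, allowed):
    """Softmax over only the allowed byte values; every other byte gets probability 0."""
    allowed = sorted(set(allowed))
    if not allowed:
        raise ValueError("grammar allows no bytes at this step")
    # Subtract the max over the allowed logits for numerical stability.
    m = max(logits[b] for b in allowed)
    exps = {b: math.exp(logits[b] - m) for b in allowed}
    z = sum(exps.values())
    probs = [0.0] * len(logits)
    for b, e in exps.items():
        probs[b] = e / z
    return probs


def sample_constrained(logits, allowed, rng=random):
    """Draw one byte from the renormalized distribution: one decoder pass, no retries."""
    probs = masked_softmax(logits, allowed)
    return rng.choices(range(len(probs)), weights=probs, k=1)[0]


# Usage with made-up logits for a 256-way byte vocabulary (one decoder step).
logits = [0.0] * 256
logits[ord("a")] = 5.0  # the model strongly prefers "a" here...
logits[ord("1")] = 1.0
logits[ord("2")] = 1.0
digits = range(ord("0"), ord("9") + 1)  # ...but the (hypothetical) grammar only allows a digit
byte = sample_constrained(logits, digits, random.Random(0))
```

Setting the disallowed logits to negative infinity and then applying an ordinary softmax gives the same distribution, which is how it's usually done with tensor masks. For a BPE model the code is identical except the indices are token IDs instead of byte values.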