yorwba 47 days ago | on: Byte Latent Transformer: Patches Scale Better Than...

It doesn't remove sampling, and forcing a grammar by specifying allowed/prohibited bytes doesn't require running the decoder over and over. You just compute the softmax at the output layer over the allowed bytes only and sample from those, same as with BPE-based models.
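
To make that concrete, here is a minimal sketch of sampling from a softmax restricted to the allowed bytes. It assumes the output layer yields one logit per byte value (256 entries); the function name `sample_allowed_byte`, the toy logits, and the digits-only grammar are all made up for illustration and are not taken from the paper or any library.

```python
import math
import random

def sample_allowed_byte(logits, allowed, rng=random):
    """Sample one byte value from `logits`, restricted to the `allowed` set."""
    if not allowed:
        raise ValueError("grammar allows no byte at this position")
    allowed = sorted(allowed)
    # Softmax over the allowed entries only; subtract the max for numerical stability.
    peak = max(logits[b] for b in allowed)
    weights = [math.exp(logits[b] - peak) for b in allowed]
    # random.choices normalizes the weights, so this is a draw from the masked softmax.
    return rng.choices(allowed, weights=weights, k=1)[0]

# Hypothetical example: a 256-way output where the grammar permits only ASCII digits.
logits = [0.0] * 256
logits[ord("a")] = 10.0  # the model prefers "a", but the grammar prohibits it
logits[ord("7")] = 5.0
digits = {ord(c) for c in "0123456789"}
byte = sample_allowed_byte(logits, digits)
```

The logits come from a single forward pass; the mask only changes which entries take part in the normalization, so nothing has to be re-run. The same masking would apply to a BPE vocabulary, just with token ids in place of byte values.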