
It doesn't remove sampling, and forcing a grammar by specifying allowed/prohibited bytes doesn't require running the decoder over and over. You just compute the softmax at the output layer over the allowed bytes only and sample from those, same as with BPE-based models.
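
To make that concrete, here is a rough sketch of the masking step in plain Python. It assumes the model has already produced one vector of 256 byte logits from a single forward pass; the names `masked_softmax` and `sample_constrained` and the random logits are made up for illustration and aren't from any particular library.

```python
import math
import random

def masked_softmax(logits, allowed):
    """Softmax over the allowed byte values only; all other bytes get probability 0."""
    if not allowed:
        raise ValueError("grammar allows no bytes at this position")
    m = max(logits[b] for b in allowed)  # subtract the max for numerical stability
    exps = {b: math.exp(logits[b] - m) for b in allowed}
    total = sum(exps.values())
    return {b: e / total for b, e in exps.items()}

def sample_constrained(logits, allowed, rng=random):
    """Sample one byte from the renormalized distribution over allowed bytes."""
    probs = masked_softmax(logits, allowed)
    values, weights = zip(*sorted(probs.items()))
    return rng.choices(values, weights=weights, k=1)[0]

# Hypothetical inputs: stand-in logits for one decoding step, and a grammar
# that only permits ASCII digits at this position.
rng = random.Random(0)
logits = [rng.gauss(0.0, 1.0) for _ in range(256)]
digits = set(b"0123456789")
next_byte = sample_constrained(logits, digits, rng)
```

The logits are computed once per step; the grammar only changes which entries take part in the normalization, so no extra decoder passes are needed.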


