> The ability to trivially trick the model into thinking it said something it di...

alex_sf · on May 13, 2023

> It is definitely not an intended feature for the end user to be able to trick the model into believing it said something it didn't say. It also doesn't work with ChatGPT or Bing Chat, as far as I can tell. I was talking about the user, not about the developer.

Those aren't models, they are applications built on top of models.

> That can be done with special tokens also. The difference is that the user can't enter those tokens themselves.

Sure. But there are no open models that do that, and no indication of whether the various closed models do it either.

cubefox · on May 14, 2023

> Those aren't models, they are applications built on top of models.

The point holds about the underlying models.

> Sure. But there are no open models that do that, and no indication of whether the various closed models do it either.

An indication that they don't do it would be if they could be easily tricked by the user into assuming they said something which they didn't say. I know no such examples.