I've seen this theorized before, but so far to my knowledge there hasn't been a documented example of a prompt leak attack that actually resulted in a hallucination and not in the real prompt.
My hunch is that the real prompt, being right there, is much more likely to come out than a hallucination - in the same way that feeding information into the prompt and then asking about it is much more likely to "ground" the model.
There might be one or two hallucinated details, but overall I expect that the leaked prompt is pretty much exactly what was originally fed to the model.
> so far to my knowledge there hasn't been a documented example of a prompt leak attack that actually resulted in a hallucination and not in the real prompt.
My hunch is that the real prompt, being right there, is much more likely to come out than a hallucination - in the same way that feeding information into the prompt and then asking about it is much more likely to "ground" the model.
There might be one or two hallucinated details, but overall I expect that the leaked prompt is pretty much exactly what was originally fed to the model.