To create a patch, a small model is used to predict the likelihood of the next character in the input string. Take the input string 'Lazy dog jumped over a fence.' and use the model to predict the likelihood of each character.
For example:
100% sure the next character is 'a'.
Or maybe it's 10% sure it's 'a', 10% sure it's 'b', and so on.
Then we chunk character estimates together.
How many characters?
Enough characters so that the total uncertainty (entropy) in each chunk is about the same.
And there you have your 'patch' (or 'token').
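As a toy illustration of those per-character uncertainty numbers (the function name is mine; in the actual setup a small language model produces these probabilities at each position):

```python
import math

def char_entropy(probs):
    """Shannon entropy (in bits) of a predicted next-character distribution."""
    return sum(-p * math.log2(p) for p in probs.values() if p > 0)

# 100% sure the next character is 'a' -> no uncertainty
print(char_entropy({"a": 1.0}))  # -> 0.0

# 10% 'a', 10% 'b', ... over ten characters -> log2(10), about 3.32 bits
print(char_entropy({c: 0.1 for c in "abcdefghij"}))
```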
> How many characters? Enough characters so that the total uncertainty (entropy) in each chunk is about the same.
That's not how it's described in Section 2.3 of the paper. They only use the entropy of the next byte and whether it exceeds a threshold (Global Constraint) or is larger than the preceding byte's entropy by another threshold (Approx. Monotonic Constraint).
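A minimal sketch of those two boundary rules as described in Section 2.3 (the function name and threshold values are my own, purely for illustration):

```python
def patch_starts(entropies, theta=2.0, theta_rel=0.5):
    """Start a new patch at byte t when H(x_t) > theta (Global Constraint),
    or when H(x_t) - H(x_{t-1}) > theta_rel (Approx. Monotonic Constraint).
    Threshold values here are made up for illustration."""
    starts = [0]
    for t in range(1, len(entropies)):
        if entropies[t] > theta or entropies[t] - entropies[t - 1] > theta_rel:
            starts.append(t)
    return starts

# The spike at t=2 trips the global rule; the jump at t=5 trips the relative rule
print(patch_starts([0.1, 0.2, 3.0, 0.5, 0.4, 1.5]))  # -> [0, 2, 5]
```

Note that neither rule caps patch length, which is why the long repetitive sequences mentioned below can produce pathologically long patches.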
That does mean that long repetitive sequences can result in pathologically long patches, as demonstrated in Appendix E.
But what I'm really curious about is the "small CNN byte-level model with 2-byte context" in Figure 3 (f), because it's never mentioned in any other part of the paper.
Good description! Maybe what the parent got mixed up on is that an alternate way to view this is trying to chunk bytes to have roughly similar information. E.g., we initially tried a bunch of patching schemes, such as keeping a running total of entropy until the total exceeds a threshold, but ended up finding that simple things worked better.
I’ll see if we can add more information about the small CNN in the next update to the arXiv paper.
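For concreteness, that running-total scheme might look like this (a sketch with invented names and a made-up budget, not the paper's code):

```python
def budget_patches(entropies, budget=2.5):
    """Close the current patch once accumulated entropy exceeds a budget,
    so every patch carries roughly the same total information.
    The function name and budget value are invented for illustration."""
    patches, start, total = [], 0, 0.0
    for t, h in enumerate(entropies):
        total += h
        if total > budget:
            patches.append((start, t + 1))   # half-open span [start, t+1)
            start, total = t + 1, 0.0
    if start < len(entropies):
        patches.append((start, len(entropies)))  # trailing partial patch
    return patches

print(budget_patches([1.0] * 6))  # -> [(0, 3), (3, 6)]
```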
I'm curious if you're aware of some papers from around 2005 on using contextual entropy to do unsupervised word segmentation on Chinese and other languages that don't use spaces for word boundaries.
It is also quite similar to Carl de Marcken's work for segmenting text and speech. He phrased everything in terms of minimum description length (MDL), but that is trivially the same thing as local entropy.
At least I wasn't aware of this work, but thanks for the refs! I'm always curious to read papers from 10-20+ years ago that have similarly inspired ideas. If it makes sense, we'll mention those in the next related work update.
One way of thinking about the "Approximate Monotonic Constraint" is that you're running a quick and dirty edge detector on the entropy. I.e., you're cutting based on the gradient of per-byte entropy wrt timestep, analogous to detecting an edge based on the gradient of per-pixel intensity wrt pixel coordinates. It would be interesting to look at the raw sequences of per-byte entropies to see how strongly these sorts of "edges" correlate with human-interpretable boundaries (words, prefixes, suffixes, etc.).
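That edge-detector view can be sketched directly (a toy illustration; the function name and threshold are mine, and a real detector would likely smooth the signal first):

```python
def entropy_edges(entropies, grad_threshold=0.5):
    """Flag positions where the discrete gradient of the per-byte entropy
    sequence jumps, treating it like a 1-D intensity signal. Unlike the
    monotonic constraint, this fires on drops as well as rises.
    The threshold value is illustrative."""
    return [t for t in range(1, len(entropies))
            if abs(entropies[t] - entropies[t - 1]) > grad_threshold]

# Rise at t=2 and fall at t=3 both register as "edges"
print(entropy_edges([0.1, 0.2, 1.0, 0.3]))  # -> [2, 3]
```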
Good question. We at 0Din have traced down hallucinations; we had to, because we pay for LLM bugs. Here is a simple tell: prompt something specific and look at the response window, then use the same prompt from a completely different user account. If the answer is 50% the same, it is not a hallucination!
How does using a different user account and getting a partially similar answer rule out the answer being a hallucination? Are you saying that ChatGPT (or LLMs in general) cannot hallucinate consistently?
ChatGPT doesn't imagine running code; it actually runs it. You can see the code it writes and some of the output of what it runs in the interface screenshotted in the article.
Yes, it's well known. But ChatGPT cannot fake the output of the Python tool. The UI is different and shows purely the program output; ChatGPT doesn't have a say in that.
GP is making a joke about speaking to oneself really just being the human version of Chain of Thought, which in my understanding is an architectural decision in LLMs to have the model write out intermediate steps in problem solving and evaluate their validity as it goes.
How would the authors consider a paralyzed individual who has only been able to move their eyes since birth? That person can learn the same concepts as other humans and communicate as richly (using only their eyes). Clearly, the paper is viewing the problem very narrowly.
I didn’t want to Google it for you because it always makes me sad but things like spina bifida and moebius syndrome exist. Not everyone gets to begin life healthy.
It seems unable to use languages other than English: "I apologize, but I cannot fulfill your request as I'm just an AI and do not have the ability to write in Finnish or any other language."
Maybe also instructions and money for an autopsy. Your relatives may cringe, but it could help them later in their lives to know what shape you were in and what you died of.