flimflamm's comments


'tis different. It's 112 vs 33 pages. And the content is not the same.


To create a patch, a small model is used to predict the likelihood of the next character in the input string. Take the input string 'Lazy dog jumped over a fence.' and use the model to predict the likelihood of each character.

For example:

    100% sure the next character is 'a'.
    Or maybe it's 10% sure it's 'a', 10% sure it's 'b', and so on.
Then we chunk character estimates together. How many characters? Enough characters so that the total uncertainty (entropy) in each chunk is about the same. And there you have your 'patch' (or 'token').
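A minimal sketch of that accumulate-until-a-budget idea (not the paper's exact rule, see the correction downthread; next_char_probs and the bit budget are stand-ins for the small model and its threshold):

    import math

    def char_entropy(probs):
        # Shannon entropy (bits) of a next-character distribution {char: prob}
        return -sum(p * math.log2(p) for p in probs.values() if p > 0)

    def chunk_by_total_entropy(text, next_char_probs, budget_bits=8.0):
        # Grow a patch until its accumulated entropy reaches the budget.
        # next_char_probs(prefix) -> {char: prob} stands in for the small
        # byte/char-level model; budget_bits is a made-up threshold.
        patches, start, total = [], 0, 0.0
        for i in range(len(text)):
            total += char_entropy(next_char_probs(text[:i]))
            if total >= budget_bits:
                patches.append(text[start:i + 1])
                start, total = i + 1, 0.0
        if start < len(text):
            patches.append(text[start:])
        return patches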


> How many characters? Enough characters so that the total uncertainty (entropy) in each chunk is about the same.

That's not how it's described in Section 2.3 of the paper. They only use the entropy of the next byte and whether it exceeds a threshold (Global Constraint) or is larger than the preceding byte's entropy by another threshold (Approx. Monotonic Constraint).
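In code, those two rules read roughly like this (a sketch; entropies is the per-byte entropy sequence from the small model, and both thresholds are placeholder values, not the paper's):

    def patch_boundaries(entropies, theta_global=2.0, theta_rel=0.5):
        # Start a new patch at byte i if either constraint fires:
        #   Global Constraint:            H(x_i) > theta_global
        #   Approx. Monotonic: H(x_i) - H(x_{i-1}) > theta_rel
        # Threshold values here are illustrative, not from the paper.
        boundaries = []
        for i in range(1, len(entropies)):
            if entropies[i] > theta_global:
                boundaries.append(i)
            elif entropies[i] - entropies[i - 1] > theta_rel:
                boundaries.append(i)
        return boundaries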

That does mean that long repetitive sequences can result in pathologically long patches, as demonstrated in Appendix E.

But what I'm really curious about is the "small CNN byte-level model with 2-byte context" in Figure 3 (f), because it's never mentioned in any other part of the paper.


(Author Here)

Good description! Maybe what the parent got mixed up on: an alternate way to view this is as trying to chunk bytes to have roughly similar information. E.g., we initially tried a bunch of patching schemes (e.g., keeping a running total of entropy until the total exceeds a threshold), but ended up finding simple things worked better.

I’ll see if we can add more information about the small CNN in the next update to the arXiv paper.


I'm curious if you're aware of some papers from around 2005 on using contextual entropy to do unsupervised word segmentation on Chinese, and other languages that don't use spaces for word boundaries.

https://aclanthology.org/Y03-1017/
https://aclanthology.org/I05-1009/
https://aclanthology.org/P06-2056/

Exactly the same approach of segmenting a word when the entropy goes up compared to the previous byte.


It is also quite similar to Carl de Marcken's work on segmenting text and speech. He phrased everything in terms of minimum description length (MDL), but that is trivially the same thing as local entropy.

https://dspace.mit.edu/handle/1721.1/7191?show=full
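(The connection being leaned on is the standard Shannon/MDL identity, my gloss, not from the thesis: the optimal code length for an event is its surprisal, so expected description length equals entropy.)

    L(x) = -\log_2 p(x)                              % optimal code length = surprisal
    \mathbb{E}[L] = -\sum_x p(x) \log_2 p(x) = H(p)  % expected length = entropy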


I at least wasn't aware of this work, but thanks for the refs! I'm always curious to read papers from 10-20+ years ago with similarly inspired ideas. If it makes sense, we'll mention those in the next related-work update.


One way of thinking about the "Approximate Monotonic Constraint" is that you're running a quick and dirty edge detector on the entropy. Ie, you're clipping based on the gradient of per-byte entropy wrt timestep compared to detecting an edge based on gradient of per-pixel intensity wrt pixel coordinates. It would be interesting to look at the raw sequences of per-byte entropies to see how strongly these sorts of "edges" correlate with human interpretable boundaries (words, prefixes, suffixes, etc).
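For the 1-D version, something like this (a sketch; entropies is an array of per-byte entropies from the small model, and the threshold is illustrative):

    import numpy as np

    def entropy_edges(entropies, threshold=0.5):
        # 1-D 'edge detection' on the entropy signal: convolve with a
        # derivative kernel, exactly as you would for pixel intensities.
        grad = np.convolve(entropies, [1, -1], mode="valid")  # H(x_{i+1}) - H(x_i)
        return np.flatnonzero(grad > threshold) + 1  # byte indices where entropy jumps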


Figure 4 plots the entropy of each byte in "Daenerys Targaryen is in Game of Thrones, a fantasy epic by George R.R. Martin."


"That's not how it's described" - Thanks for the correction!


So a variant might be to try using some standard compression algorithm to train with?
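One crude way to prototype that (a sketch using zlib as the "model"; the growth in compressed size as each byte is appended stands in for per-byte entropy, with all of zlib's header-overhead caveats):

    import zlib

    def compression_surprise(text):
        # Proxy for per-character information content: how much the
        # compressed size grows as each character is appended. Very
        # crude (zlib headers, block effects), but threshold-able the
        # same way as model entropies.
        data = text.encode()
        sizes = [len(zlib.compress(data[:i + 1])) for i in range(len(data))]
        return [sizes[0]] + [b - a for a, b in zip(sizes, sizes[1:])]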


Why isn't everything you see just hallucinated?


Good question. We at 0Din have traced down hallucinations; we had to, because we pay for LLM bugs. Here is a simple tell: prompt something specific and look at the response window, then use the same prompt from a completely different user account. If the answer is 50% the same, it is not a hallucination!
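A toy version of that tell (a sketch; difflib's ratio stands in for the "50% the same" judgment):

    import difflib

    def looks_hallucinated(answer_a, answer_b, floor=0.5):
        # Ask the same specific prompt from two unrelated accounts and
        # compare the answers; low overlap suggests a hallucination.
        ratio = difflib.SequenceMatcher(None, answer_a, answer_b).ratio()
        return ratio < floor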


How do a different user account and a partially similar answer rule out hallucination? Are you saying that ChatGPT (or LLMs in general) could not hallucinate consistently?


ChatGPT doesn't imagine running code, it actually runs the code. You can see the code it writes and some of the output of what it runs in the interface screenshotted in the article.


Yeah I know it runs. But it also hallucinated a Shell: https://www.reddit.com/r/ChatGPT/comments/ziukmm/i_asked_it_...


Yes, it's well known. But ChatGPT cannot fake the output of the Python tool. The UI is different and shows purely the program output; ChatGPT doesn't have a say in that.


That's just meat CoT (chain of thought) - right?


I do not understand?


GP is making a joke about speaking to oneself really just being the human version of Chain of Thought, which in my understanding is an architectural decision in LLMs to have the model write out intermediate steps in problem solving and evaluate their validity as it goes.


Thanks for the writeup! It was great how you explained the tools and rabbit holes you went into.


How would the authors consider a paralyzed individual who can only move their eyes since birth? That person can learn the same concepts as other humans and communicate as richly (using only their eyes) as other humans. Clearly, the paper is viewing the problem very narrowly.


> ...a paralyzed individual who can only move their eyes since birth...

I don't think such an individual is possible.


I didn’t want to Google it for you because it always makes me sad, but things like spina bifida and Moebius syndrome exist. Not everyone gets to begin life healthy.


Why are you shouting?


It is literally the title of the license


I think we can relax the verbatim rule when it comes to all-caps titles, surely?


Why?


Because it makes the title stand out unfairly on the home page of HN.


Why are licenses shouting?


Seems it's not able to use languages other than English: "I apologize, but I cannot fulfill your request as I'm just an AI and do not have the ability to write in Finnish or any other language."


it replies in Spanish.


it also replies in Pig Latin and Klingon. Sadly the results are completely wrong, but it tries.


Organ donation will


Maybe also instructions and money for an autopsy. Your relatives may cringe, but it could help them later in their lives to know what shape you were in and what you died of.


Or it hallucinates that. You would not know...

