To create a patch, a small model is used to predict the likelihood of the next character in the input string. Take the input string 'Lazy dog jumped over a fence.' and use the model to predict the likelihood of each character.
For example:
100% sure the next character is 'a'.
Or maybe it's 10% sure it's 'a', 10% sure it's 'b', and so on.
Then we chunk character estimates together.
How many characters?
Enough characters so that the total uncertainty (entropy) in each chunk is about the same.
And there you have your 'patch' (or 'token').
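As a toy illustration of those per-character uncertainty numbers (the function name is mine; in the actual setup a small language model produces these probabilities at each position):

```python
import math

def char_entropy(probs):
    """Shannon entropy (in bits) of a predicted next-character distribution."""
    return sum(-p * math.log2(p) for p in probs.values() if p > 0)

# 100% sure the next character is 'a' -> no uncertainty
print(char_entropy({"a": 1.0}))  # -> 0.0

# 10% 'a', 10% 'b', ... over ten characters -> log2(10), about 3.32 bits
print(char_entropy({c: 0.1 for c in "abcdefghij"}))
```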
> How many characters? Enough characters so that the total uncertainty (entropy) in each chunk is about the same.
That's not how it's described in Section 2.3 of the paper. They only use the entropy of the next byte and whether it exceeds a threshold (Global Constraint) or is larger than the preceding byte's entropy by another threshold (Approx. Monotonic Constraint).
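A minimal sketch of those two boundary rules as described in Section 2.3 (the function name and threshold values are my own, purely for illustration):

```python
def patch_starts(entropies, theta=2.0, theta_rel=0.5):
    """Start a new patch at byte t when H(x_t) > theta (Global Constraint),
    or when H(x_t) - H(x_{t-1}) > theta_rel (Approx. Monotonic Constraint).
    Threshold values here are made up for illustration."""
    starts = [0]
    for t in range(1, len(entropies)):
        if entropies[t] > theta or entropies[t] - entropies[t - 1] > theta_rel:
            starts.append(t)
    return starts

# The spike at t=2 trips the global rule; the jump at t=5 trips the relative rule
print(patch_starts([0.1, 0.2, 3.0, 0.5, 0.4, 1.5]))  # -> [0, 2, 5]
```

Note that neither rule caps patch length, which is why the long repetitive sequences mentioned below can produce pathologically long patches.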
That does mean that long repetitive sequences can result in pathologically long patches, as demonstrated in Appendix E.
But what I'm really curious about is the "small CNN byte-level model with 2-byte context" in Figure 3 (f), because it's never mentioned in any other part of the paper.
Good description! Maybe what the parent got mixed up on is that an alternate way to view this is trying to chunk bytes to have roughly similar information. E.g., we initially tried a bunch of patching schemes, such as keeping a running total of entropy until the total exceeds a threshold, but ended up finding that simple things worked better.
I’ll see if we can add more information about the small CNN in the next update to the arXiv paper.
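For concreteness, that running-total scheme might look like this (a sketch with invented names and a made-up budget, not the paper's code):

```python
def budget_patches(entropies, budget=2.5):
    """Close the current patch once accumulated entropy exceeds a budget,
    so every patch carries roughly the same total information.
    The function name and budget value are invented for illustration."""
    patches, start, total = [], 0, 0.0
    for t, h in enumerate(entropies):
        total += h
        if total > budget:
            patches.append((start, t + 1))   # half-open span [start, t+1)
            start, total = t + 1, 0.0
    if start < len(entropies):
        patches.append((start, len(entropies)))  # trailing partial patch
    return patches

print(budget_patches([1.0] * 6))  # -> [(0, 3), (3, 6)]
```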
I'm curious if you're aware of some papers from around 2005 on using contextual entropy to do unsupervised word segmentation on Chinese and other languages that don't use spaces for word boundaries.
It is also quite similar to Carl de Marcken's work for segmenting text and speech. He phrased everything in terms of minimum description length (MDL), but that is trivially the same thing as local entropy.
At least I wasn't aware of this work, but thanks for the refs! I'm always curious to read papers from 10-20+ years ago that have similarly inspired ideas. If it makes sense, we'll mention those in the next related work update.
One way of thinking about the "Approximate Monotonic Constraint" is that you're running a quick and dirty edge detector on the entropy. I.e., you're cutting based on the gradient of per-byte entropy wrt timestep, analogous to detecting an edge based on the gradient of per-pixel intensity wrt pixel coordinates. It would be interesting to look at the raw sequences of per-byte entropies to see how strongly these sorts of "edges" correlate with human-interpretable boundaries (words, prefixes, suffixes, etc.).
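That edge-detector view can be sketched directly (a toy illustration; the function name and threshold are mine, and a real detector would likely smooth the signal first):

```python
def entropy_edges(entropies, grad_threshold=0.5):
    """Flag positions where the discrete gradient of the per-byte entropy
    sequence jumps, treating it like a 1-D intensity signal. Unlike the
    monotonic constraint, this fires on drops as well as rises.
    The threshold value is illustrative."""
    return [t for t in range(1, len(entropies))
            if abs(entropies[t] - entropies[t - 1]) > grad_threshold]

# Rise at t=2 and fall at t=3 both register as "edges"
print(entropy_edges([0.1, 0.2, 1.0, 0.3]))  # -> [2, 3]
```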
Good question. We at 0Din have traced down hallucinations; we had to, because we pay for LLM bugs. Here is a simple tell: prompt something specific and look at the response window, then use the same prompt from a completely different user account. If the answer is 50% the same, it is not a hallucination!
How does using a different user account and getting a partially similar answer rule out the answer being a hallucination? Are you saying that ChatGPT (or LLMs in general) cannot hallucinate consistently?
ChatGPT doesn't imagine running code; it actually runs it. You can see the code it writes and some of the output of what it runs in the interface screenshotted in the article.
Yes, it's well known. But ChatGPT cannot fake the output of the Python tool. The UI is different and shows purely the program output; ChatGPT doesn't have a say in that.
GP is making a joke about speaking to oneself really just being the human version of Chain of Thought, which in my understanding is an architectural decision in LLMs to have the model write out intermediate steps in problem solving and evaluate their validity as it goes.
How would the authors consider a paralyzed individual who has only been able to move their eyes since birth? That person can learn the same concepts as other humans and communicate as richly (using only their eyes). Clearly, the paper is viewing the problem very narrowly.
I didn’t want to Google it for you because it always makes me sad but things like spina bifida and moebius syndrome exist. Not everyone gets to begin life healthy.
It seems unable to use languages other than English: "I apologize, but I cannot fulfill your request as I'm just an AI and do not have the ability to write in Finnish or any other language."
Maybe also instructions and money for an autopsy. Your relatives may cringe, but it could help them later in their lives to know what shape you were in and what you died of.