Wave Network: An Ultra-Small Language Model

starlite-5008 · 2024-11-22T14:28:38 1732285718

The Wave Network is an ultra-small language model that uses a new token representation and update method. Compared to BERT base, it reduces video memory usage and training time by 77.34% and 85.62%, respectively, during wave modulation.

jerpint · 2024-11-21T13:20:52 1732195252

> In summary, we used a 2.4-million-parameter small language model to achieve accuracy comparable to a 100-million-parameter BERT model in text classification.

Neat, but the question will be how the scaling laws hold up

PaulHoule · 2024-11-21T14:59:39 1732201179

Doesn't have to.

I use models like the 100M parameter BERT model for text classification and they work great. I get a 0.78 AUC with one model; TikTok gets about 0.82 for a similar problem and I'm sure they spent at least 500x what I spent on mine. I could 10x my parameters and get a 0.79 AUC but I don't know if I'd feel the difference. (I got about 0.71 AUC with bag of words + logistic regression and perceive a big difference between the output of the SBERT model and that.)

My current model can do a complete training cycle, which involves training about 20 models and picking the best, in about 3 minutes. The process is highly reliable and can run unattended every day; I could run it every hour if I wanted. I worked on another classifier based on fine-tuning a larger model and it took about 30 minutes to train just one model and was not reliable at all.

If you can get 50x the speed of the BERT model with 1/50 the resources, that's a big boon that makes text classification more accessible; the only excuse people have now is that it is too hard to make a training set.
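
For readers unfamiliar with the workflow described in the comment above (train a batch of candidate models, then keep the one with the best AUC), here is a minimal sketch of just the selection step. It is not the commenter's actual pipeline: the names `roc_auc` and `pick_best` are hypothetical, the candidate scores are assumed to come from models already evaluated on a held-out validation set, and the AUC is computed with the plain pairwise definition (probability that a random positive outscores a random negative), which is fine for small validation sets but quadratic in their size.

```python
from typing import Dict, Sequence, Tuple


def roc_auc(labels: Sequence[int], scores: Sequence[float]) -> float:
    """ROC AUC via the pairwise definition; ties between a positive and a negative count as 0.5."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative example")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


def pick_best(
    candidates: Dict[str, Sequence[float]], labels: Sequence[int]
) -> Tuple[str, float]:
    """Return (name, auc) for the candidate with the highest validation AUC.

    `candidates` maps a hypothetical model name to that model's scores
    on the same validation examples that `labels` describes.
    """
    aucs = {name: roc_auc(labels, scores) for name, scores in candidates.items()}
    best = max(aucs, key=aucs.get)
    return best, aucs[best]


if __name__ == "__main__":
    # Toy validation set with two made-up candidate models.
    labels = [1, 0, 1, 0]
    candidates = {
        "model_a": [0.9, 0.1, 0.8, 0.2],
        "model_b": [0.6, 0.7, 0.8, 0.2],
    }
    print(pick_best(candidates, labels))
```

In a real run each entry in `candidates` would hold the validation scores of one freshly trained model, so the cost of the whole cycle is dominated by training, not by this comparison.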

jerpint · 2024-11-21T19:16:17 1732216577

Somewhat agreed for use cases of text classification, but for anything requiring more language understanding it is a desirable property

froonly · 2024-11-21T16:55:22 1732208122

Is there a GitHub for this?