
It's actually a perfect analogy, IMO. A tattoo is a form of (typically chosen) self-branding. A lot of companies make great products and then diminish them through their misuse of branding, usually in a tactless way. This is a prime example of that.


A lot of people buy products because of the branding. How many people would buy a YETI cooler or a Coach bag if it didn't have the branding to show off that they have a YETI cooler or a Coach bag? It's conspicuous consumption.


I think those two in particular might be poor examples, since they generally do quite a bit of de-branding on their products. I know what you mean, however. Supreme is a prime brand for this: nobody is buying a Supreme t-shirt for any reason other than the logo. Whatever your thoughts on people who buy things to impress others may be, those kinds of people exist, and to each their own. They at least made the choice to let everyone else know what they value.


> Enforcing traffic laws is good, actually. Automated enforcement is even better so that we don't need to use armed police and can enforce consistently.

We don't use armed police to enforce traffic laws. Police mainly monitor traffic as a revenue device. It's already been proven that monitoring traffic and automating fines in fact promotes reckless driving and causes more accidents than it stops.


> We don't use armed police to enforce traffic laws.

In what world? In the US "manual" traffic enforcement is almost exclusively done by armed police and sheriffs. Unarmed civilian traffic enforcement is only done in Berkeley, CA, and a town in Minnesota, afaik.

> It's already been proven that monitoring traffic and automating fines in fact promotes reckless driving and causes more accidents than it stops.

Do you have a citation for that? There are numerous studies that show significant drops in accident rates in areas with red light cameras. On the order of 10-23%!

- https://tti.tamu.edu/researcher/tti-study-underscores-safety... - https://information.auditor.ca.gov/pdfs/reports/2001-125.pdf


They are all armed in the US at least.


Do they really stop you at gunpoint to get you to pull over? Didn't think so. Of course police are armed. You're taking the comment too literally.


I took it exactly as anyone who understands English would take it. Try communicating better next time.


Nah. Stop taking everything you read on the internet so literally.


It's more something fascists tend to do: strict monitoring/policing of the populace to elicit desired behavior.


It is, but it's also a slippery slope to constantly monitor behavior in order to coerce a desired response.


What’s the difference between “constantly monitor behavior in order to coerce a desired response” and “enforcing the law”?

Are traffic cameras a slippery slope to cameras in your house to make sure you aren't doing drugs or building an unpermitted addition?


They definitely can be. That's the slope: where do you stop? Any totalitarian worth their salt can easily make that leap. Use monitoring to curb one type of crime and "undesirable" behavior, why not use it for other types, and before you know it, your entire existence is monitored in detail just to make sure you're acting exactly the way "they" want you to. That's how it works. "I have nothing to hide" is a long-debunked argument.


Then why enforce any laws? Any enforcement is on the same slippery slope.

I don’t think the slope is nearly as slippery as you claim. There are miles of high-friction slope between enforcing traffic laws on public roads and totalitarianism.


Edgar Friendly said it best: https://www.youtube.com/watch?v=mjoSQ-lCA58


Technically, it's both. Parents don't teach their kids because they didn't/don't really know themselves. Financial literacy isn't in the curriculum at all pre-college, and even then it's not a "core" competency in any degree that isn't finance-related. Even just ~10-15 years ago I wouldn't say the topic was discussed as widely as it is today.


COVID really pumped up the market. Lots of hiring at insane TCO, especially for unproven talent. A correction had to happen. As someone else said, if you have the experience and skills and fall into the comp range companies are willing to pay, you'll be fine. If you're missing any of that, it's going to be rough. My company is hiring like crazy, but only for very specific dev roles.


The answer is still no, and still for the above reason. Compute resources are only relevant to how fast it can answer, not the quality.


Then why does chain of thought work better than asking for short answers?


Because it’s a better prompt. Works better for people too.


That's not the only reason.

More tokens = more useful compute toward making a prediction. A query with more tokens before the question is literally giving the LLM more "thinking time".
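As a toy sketch of that intuition (not a real model, just counting work): in autoregressive decoding, each generated token costs one full forward pass, so emitting intermediate reasoning tokens buys extra passes of computation before the final answer appears. The function and token counts below are made up for illustration.

```python
# Toy illustration (not a real LLM): each generated token costs one full
# forward pass, so a chain-of-thought that emits many intermediate tokens
# spends more compute before committing to the final answer.

def generate(prompt_tokens, n_new_tokens):
    """Count forward passes for autoregressive decoding (toy model)."""
    forward_passes = 0
    tokens = list(prompt_tokens)
    for _ in range(n_new_tokens):
        forward_passes += 1          # one pass per generated token
        tokens.append("<tok>")       # placeholder for the sampled token
    return forward_passes

short_answer = generate(["Q"], n_new_tokens=3)    # terse reply
cot_answer   = generate(["Q"], n_new_tokens=50)   # reasoning steps first

print(short_answer, cot_answer)  # 3 50
```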


It correlates, but the intuition is a bit misleading. What's actually happening is that by asking a model to generate more tokens, you increase the amount of information present in its context block, which the model has learned to make use of.

It's why "RAG" techniques work, the models learn during training to make use of information in context.

At the core of self-attention is a dot-product similarity measure, which makes the model act like a search engine.

It's helpful to think about it in terms of search: the shape of the output looks like conversation, but we're actually prompting the model to surface information from the QKV matrices internally.

Does it feel familiar? When we brainstorm we usually chart graphs of related concepts e.g. blueberry -> pie -> apple.
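That search intuition can be made concrete with a minimal scaled dot-product attention in plain Python. All vectors here are toy values invented for illustration: the query is scored against every key by dot product, and softmax turns the scores into retrieval weights over the values.

```python
import math

# Minimal scaled dot-product attention over toy 2-d vectors: score each
# key against the query (dot product), normalize with softmax, then take
# a weighted sum of the values -- attention as soft retrieval.

def softmax(xs):
    m = max(xs)
    es = [math.exp(x - m) for x in xs]
    s = sum(es)
    return [e / s for e in es]

def attention(query, keys, values):
    d = len(query)
    scores = [sum(q * k for q, k in zip(query, key)) / math.sqrt(d)
              for key in keys]                 # similarity: query . key
    weights = softmax(scores)                  # normalized "search" scores
    return [sum(w * v[i] for w, v in zip(weights, values))
            for i in range(len(values[0]))]    # weighted sum of values

# The query is most similar to the second key, so the output is pulled
# toward the second value vector.
q = [1.0, 0.0]
K = [[0.0, 1.0], [1.0, 0.1], [-1.0, 0.0]]
V = [[10.0], [20.0], [30.0]]
print(attention(q, K, V))
```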


>What's actually happening is that by asking a model to generate more tokens, it increases the amount of information it has learnt to be present in its context block.

I'm not saying this isn't part of it, but even if it's just dummy tokens without any new information, it works.

https://arxiv.org/abs/2310.02226


It’s not clear that more tokens are better.


I think it's pretty clear

https://arxiv.org/abs/2310.02226

I mean, I can imagine you wouldn't always need the extra compute.


This paper is a great illustration of how little is understood about this question. They discovered that appending dummy tokens (ignored during both training and inference) improves performance somehow. Don’t confuse their guess as to why this might be happening with actual understanding. But in any case, this phenomenon has little to do with increasing the size of the prompt using meaningful tokens. We still have no clue if it helps or not.


I just found this paper I read a while ago. Doesn't this answer the question?

The Impact of Reasoning Step Length on Large Language Models - https://arxiv.org/abs/2401.04925

>They discovered that appending dummy tokens (ignored during both training and inference) improves performance somehow. Don’t confuse their guess as to why this might be happening with actual understanding.

More tokens means more compute time for the model to utilize; that much is completely true.

What they guess is that the model can utilize the extra compute for better predictions even if there's no extra information to accompany this extra "thinking time".


Yes, more tokens means doing more compute, that much is true. The question is whether this extra compute helps or hurts. This question is yet to be answered, as far as I know. I tend to make my GPT-4 questions quite verbose, hoping it helps.

This is completely orthogonal to CoT, which is simply a better prompt - it probably causes some sort of better pattern matching (again very poorly understood).


>The question is whether this extra compute helps or hurts.

I've linked 2 papers now that show very clearly the extra compute helps. I honestly don't understand what else it is you're looking for.

>This is completely orthogonal to CoT, which is simply a better prompt - it probably causes some sort of better pattern matching (again very poorly understood).

That paper specifically dives into the effect of the length of the CoT prompt. It makes little sense to say "oh, it's just the better prompt" when CoT prompts with more tokens perform better than shorter ones, even when the shorter ones contain the same information. There is also a clear correlation between task difficulty and length.


Yes, the CoT paper does provide some evidence that a more verbose prompt works better. Thank you for pointing me to it.

Though I still don’t quite understand what is going on in the dummy tokens paper - what is “computation width” and why would it provide any benefit?


So "compute" includes just having more data, data that can also be "ignored"/"skipped" for whatever reason (e.g. weights). OK.


I have a theory that the results are actually a side effect of having the information in a different area of the context block.

Models can be sensitive to the location of a needle in the haystack of its input block.

It's why there are models that are great at single-turn conversation but can't hold a conversation past that without multi-turn training.

You can even corrupt the outputs by pushing past the number of turns / show the model data in a form it hasn't really seen before.


> Models can be sensitive to the location of a needle in the haystack of its input block.

But only if we use some sort of attention optimization. For the quadratic attention algo it shouldn’t matter where the needle is, right?
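Right, a toy sketch of that point (assuming no positional encodings, which real models do add): raw dot-product attention is permutation-equivariant over keys/values, so moving the needle around the sequence leaves the output unchanged. Position sensitivity has to come from elsewhere (positional encodings, training distribution, or attention approximations). Vectors below are made up for illustration.

```python
import math

# Sketch: with full (quadratic) attention and NO positional encoding,
# the weight a query assigns to a key depends only on content, not on
# where the key sits in the sequence -- so relocating the "needle"
# leaves the output unchanged. Real models add positional information,
# which is one place position sensitivity can creep in.

def softmax(xs):
    m = max(xs)
    es = [math.exp(x - m) for x in xs]
    s = sum(es)
    return [e / s for e in es]

def attend(query, keys, values):
    scores = [sum(q * k for q, k in zip(query, key)) for key in keys]
    w = softmax(scores)
    return sum(wi * v for wi, v in zip(w, values))

q = [1.0, 0.0]
needle, hay = [1.0, 0.0], [0.0, 1.0]

early = attend(q, [needle, hay, hay], [1.0, 0.0, 0.0])  # needle first
late  = attend(q, [hay, hay, needle], [0.0, 0.0, 1.0])  # needle last

print(math.isclose(early, late))  # True: needle position doesn't matter here
```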


It's not a loss if they're still net positive from their unrealized gains. It's basically the home-ownership version of "I know what I've got," except they in fact don't.


> I’ve heard similar tactics being used at other companies–mostly large companies–and it’ll only continue in 2024 as they make decisions that drive short term profits over all else.

When you tie leadership incentives to short-term profits, that's the only type of decision making that will be done.


How can Amazon be guilty of incentivizing short term profit when their profit margin history looks like this?

https://www.macrotrends.net/stocks/charts/AMZN/amazon/profit...

Compare to Alphabet/Microsoft/Apple/Meta’s 20%+ profit margins.


* https://lmstudio.ai

LM Studio is far superior these days.


LM Studio uses llama.cpp under the hood, so if you don't need a fancy UI, you are probably better off running that directly.


Absolutely hilarious exchange that I hope highlights the truth of the original comment!

