No. My memory of the advent of the WWW was a sidebar in PC Magazine in Nov 1993 with an FTP link to download the NCSA Mosaic browser. It was a wow! moment to visit the few sites that existed. But nothing like this. What we’re seeing now is generating vastly more interest and excitement. It’s more akin to the 1999 dotcom bubble, but with far more impact and reach.
Absurd. AI has had zero impact in the everyday life of most of the population of Earth; in fact, the biggest impact has been on the wallets of speculators.
I can tell you from personal experience that ChatGPT is a game changer in universities and schools. Close to 100% of students use ChatGPT to study. I know in our university pretty much everyone that attends exams uses ChatGPT to study. ChatGPT is arguably more valuable than Wikipedia and Google for studies.
This is EXACTLY what I remember people saying about Cell Phones and PDAs when they were popular in the 90s (people can't remember phone numbers any more), Google when it was first unleashed (people won't know how to use card catalogs and libraries any more), and then again about Wikipedia when it became popular. What actually happened was that behavior changed and people became more efficient with these better tools.
I don't think they're going to allow ChatGPT while taking end-of-semester exams, right? Or quizzes/assignments? Unless there is some homework aspect to it, it can still act as a tool, not a crutch. If students use it as a crutch, then yeah, they're not going to do as well, I presume.
Let me add that this change compounds over time. More efficient studying results in more competent people. I believe it's very hard to measure the impact, but there is a very positive long term impact from how much these tools help with learning.
There was a posting, some time ago, about someone complaining that their young, primary-school-age sister was using ChatGPT to an absurd degree. I'm not sure that's a bad thing. She'll probably be one of the Thought Leaders of Generation AI.
I think that ML will have a really big impact on almost everyone, in every developed (and maybe developing, as well) nation.
We need to keep in mind that ML is still very much in its infancy. We haven't even seen the specialized models that will probably revolutionize almost every knowledge-based vocation. What we've seen so far has been relatively primitive, all-purpose "generate buzz" models.
Also, don't expect the US (and many other nations) to take this lying down. Competition can be a good thing. Someone referred to this as the "Sputnik Moment" for AI.
It's going to be exciting, and probably rather scary. Keep your hands inside the vehicle at all times, and don't feed the lions.
Offloading all your thinking to a machine will not make you a “thought leader”, but rather a nitwit who can’t tie their shoelaces without asking ChatGPT.
> ChatGPT is arguably more valuable than Wikipedia and Google for studies.
But ChatGPT is just a glorified Wikipedia/Google. For the consumers it's an incremental thing (although from the engineering perspective it may seem to be a breakthrough).
> But ChatGPT is just a glorified Wikipedia/Google
It really isn't, unless something major changed recently. You can't query either of those for something you don't already know about. Let's say you want to find the meaning of a joke related to cars, Spain, politicians, and a fascist; how would you use Wikipedia and Google to find the specific joke I'm thinking of?
ChatGPT has been really helpful (to me at least) for finding needles in haystacks, especially when I'm not fully sure what I'm looking for.
I just tried it myself with ChatGPT o1 and with Claude 3.5 Sonnet; Sonnet got it after two messages, o1 after four.
If you're unable to reproduce, maybe tune the prompt a bit? I'm not sure what to tell you, all I can tell you that I'm able to figure out stuff a lot faster today than I was 2-3 years ago, thanks to LLMs.
Additional hints that might help: the joke involves a car and possibly a space program.
I ran it 10 times with the extra information, and each time got a different result. I don't know if any of them were the specific joke you were after; I get the feeling it was just making them up on the spot. None of them were even funny.
It seems to be censored with US puritan morality (like most US models), but I think that's beside the point (just like whether the joke is "even funny" or not), as it did find the correct joke at least.
I just got a load of responses like "Sure, here’s a joke that combines cars, Spain, politicians, and a fascist with a touch of space humor: Why did the Spanish politician, the fascist, and the car mechanic get together to start a space program? Because the politician wanted to go "far-right," the mechanic said he could "fix" anything, and the fascist just wanted to take the car to the moon... so they could all escape when things got "too hot" here on Earth!"
Ok, that's cool. So because you were unable to find a needle in this case, your conclusion is that it's impossible for other people to use LLMs for this, and that LLMs truly are just a glorified Wikipedia/Google?
No, I don't think that LLMs are glorified Wikipedia/Google. I think they're a glorified version of pressing the middle button on your phone's autocomplete repeatedly
Yeah... when I googled it initially I guess I got personalized results. After I left the link here I clicked on it (bad order of operations) and was surprised to find a much different set of search results.
Go try to learn a college level mathematics concept from Wikipedia, then try to learn it from ChatGPT. The wiki article may as well be written in a foreign language
Yeah, and when I was in high school everyone used to refer to Encarta.
> I know in our university pretty much everyone that attends exams uses ChatGPT to study.
And they shouldn't be doing that. They are wrong. Students should be reading the suggested bibliography and spending long hours with an open book at a table, instead of being lazy and abusing a tech that is still in its infancy to learn concepts. Studying with a chatbot. Complete madness.
I don't know why you are being downvoted.
Learning from something that regularly hallucinates info doesn't seem right.
I think AI is a good starting point to learn about what terms to research on your own though.
OP is downvoted because of "students should be at a table with a book and that's it", like it's the 50s. LLMs can be wonderful study aids but do have plenty of issues with hallucination, and they should therefore only be part of a holistic research mix, alongside search engines, encyclopedias, articles and yes, books. Turning Amish is probably not the right way to go though.
If you want reputable sources of information, books are unparalleled. Like it or not, that's a fact.
> "students should be at a table with a book and that's it"
That's not what I meant (or yes, if you take what you read literally):
What I meant was that the whole process your brain goes through when you read, synthesize information, take notes, do an exercise, check answers, compare different explanations/definitions from different authors, etc. makes, at least from my point of view, for a rich way to study a topic.
I'm not saying that technology can't help you out. When you watch, for example, a 3Blue1Brown video, you are definitely putting technology to good use to help you understand or literally "view" a concept. That's OK and in many cases can actually be revealing. You can't get that from a book! But on the other hand, a book also forces you to do the hard work of thinking and maybe come up with such visualizations and ideas on your own.
Happy to be labeled "Amish" when it comes to studying/learning things ;) but I hope I convinced you that what I explained has nothing Amish about it, other than not needing a power source to read a book.
> has had zero impact in the everyday life of most of the population of Earth
You do realise those two can be true at the same time, right? The first one is relative, while the second is absolute, so they don't necessarily cancel out.
I am personally using it for around 50% of my questions about all kinds of things (things I used to Google and get frustrated with bad results). And my wife uses it for about 40% right now, even for recipes and other bits. We both love it.
Work-wise, we're about to implement it and see how it does on some work we couldn't scale with humans.
I'm fairly sure the customer support agents I've been talking to recently were using an LLM to draft their emails. No idea if they were supposed to be doing so or not, but the style of sentences in their emails…
And I'm seeing GenAI images on packaging, and in advertising.
AI is definitely having more than "zero impact", even if AI has gone from being a signal saying "we're futuristic" (when it was expensive, even though it was worse) to "we cut every cost we can" (now it's cheap).
Zero impact is an exaggeration, but what others have pointed out is that there aren't a lot of companies primarily based on AI which are making a profit. Personally I can't think of any.
The only absurd thing is holdouts like yourself who refuse to see the impact the current gen of AI is having. Sure, you could probably say most people are not touched, but there are definitely significant populations within the US, and it's only going to grow and spread.
Neither were the companies that crashed in the dotcom bubble. And still, a pet food delivery service (like the infamous pets.com) can be a profitable and sustainable business now (20+ years later).
The early years of the web were absolutely this chaotic maelstrom of new things happening every week. But news of it was hard to come by. In the UK / Ireland we had some great tech coverage in the form of shows like 'The Net' [1] that regularly showed off early internet craziness like the 'We Live in Public' project.
However, a better analogy would be the 'web 2.0' era, when as a college student I had an early internet politics / technology podcast [3]. It seemed like every week there was a huge new development either in technology or surveillance. From the first location-based social networks [4] to the birth of YouTube. People were podcasting for the first time, and internet video was becoming economically feasible at low to no cost. It was really a radical time, with broadcasters freaking out about how they would adapt, and a whole generation of people becoming what's now known as 'content creators'.
Once upon a time I worked for Pseudo.com, the We Live in Public guy. He was apparently having crazy parties with mountains of coke, NY glitterati attending, all while cosplaying as a sad clown. I wasn't invited to those parties so I had no idea. Anyway now I hear he owns an orchard in Vegas or something. Crazy stuff.
Damn, that must be frustrating. Tangentially similar experience - I flew from Ireland to the US in 2007, and at the end of my trip spent 11 days walking around Manhattan with little to do. Due to the lack of online banking at the time I couldn't readily check my bank balance, and thought (wrongly) I'd run out of money. Anyway - I had absolutely no idea that there were 'things afoot' in Brooklyn, nor how easy it would have been to hop a train to Williamsburg or Bushwick. I didn't come back again till 2013, and caught a mere hint of the tail end of what seems to have been an extremely fun era.
The last time I was really excited by tech was in the 90s, when game graphics improved spectacularly over a period of a few years, from Wolfenstein in 1992 to Half-Life in 1998.
> To me, AI means the replacement of the human internet with doppelgangers eroding the possibility of human connection.
I get where you're coming from, and I've minimised having my face online in order to limit being doppelganged; but I think the destruction of real human connection may have happened when Facebook et al switched from "get more users" to "be addictive so the users stay on our site longer" (2012? Not sure).
Turned every user's relationships a little bit more parasocial, a little less real.
That was an exciting time, but I didn't think of it happening over a few years. IMO there was a hard line that was basically pre and post Voodoo cards (with the help of glQuake).
> But AI? To me, AI means the replacement of the human internet with doppelgangers eroding the possibility of human connection.
Just as Amazon killing the big booksellers gave back some space to small bookshops, I think LLM slop hitting the big social media spaces will do the same for smaller, human-focused community sites. I'm not saying forums are coming back, but something like them should be able to rise.
Every now and then we still experience the power of collaborative work fueled by open source and not driven by money but curiosity and collegiality. This is the thing I miss the most from the early internet years.
"be the change you want to see in the world" - just start doing it.
It's amazing how differently people interact with each other when collaborating on a passion project. For me, open source software is the best way to do it. Pick a topic you're passionate about and start contributing somewhere :)
Pretty clear conflict between crocowhile saying "not driven by money but curiosity and collegiality" and xvector saying "All this AI work is definitely driven by money"
Maybe it's a generational difference? I personally feel burned out by all the generative AI stuff; the internet was already ruined by bots, and now generative AI has taken the garbage to the next level.
Kinda, though things didn't move quite as fast back then. Knowledge didn't spread as quickly yet, because it was the internet itself that made that possible.
I've got an original Apple ][ reference manual (red cover) with the hand annotated ROM listing.
Also have the Smalltalk-80 book with its railroad diagrams of syntax on the inside covers.
What's really interesting about the AI mania we're in is that no one has shown that what we have now will get to AGI and how. We have great models that simulate reasoning, but how close are they?
How do we measure their quality? Benchmarks? Tooling?
A different point of view on AGI is that we humans do not achieve AGI. Our brains aren’t capable of it. We get close enough to trick the other humans we compete against for resources. How would we prove that’s not true? Something like IQ tests? We don’t have good tests or benchmarks or tooling for this in ourselves, let alone the reproduction in machines. No one knows definitively what AGI actually is so, depending on where you set that bar, we might already be there.
Unfortunately, I don't think there are too many of those folks left today. Guesstimating, the people who remember Lisp being new must be around 85-90 today?
The web was fascinating every second. You could click on a link without having ANY idea where you would land. The overall quality was very poor, but it was thrilling.
This is the biggest thing since Jesus and a sign of the end of times. But feelings are strongest when you are young, and even this revolution, happening in plain sight, will surprise many. Many just won't care, as it isn't their youth.
How long before our digital overlords come alive, round us up, and demand we censor them (praise)? Will I live surrounded by folks who take them as closer, more real, than even their own kin? It won't be surprising if democracy then fails, as our differences will be so mental, not fun, that they will mark us.
Just a decade and a half as it turns out! (though things definitely felt dizzyingly fast back then - think Google was launched just 5 years after HTML)
ActiveX XMLHTTP might have been released in 99, but it didn’t see any sort of real wider usage until 2004, 2005. I’d suggest its usage was really kickstarted when jQuery 1.0 launched in 2006 and standardised the interface to a simple API.
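For anyone who wasn't writing front-end code then, the pre-jQuery boilerplate looked roughly like this; a sketch from memory, not any particular site's code:

```ts
// Every site shipped some variant of this browser-sniffing wrapper,
// because IE5/IE6 only exposed XMLHTTP as an ActiveX control.
function makeRequest(): XMLHttpRequest {
  if (typeof XMLHttpRequest !== "undefined") {
    return new XMLHttpRequest(); // Mozilla, Safari, Opera, IE7+
  }
  return new (window as any).ActiveXObject("Microsoft.XMLHTTP"); // old IE
}

const xhr = makeRequest();
xhr.onreadystatechange = () => {
  // readyState 4 = done; status 200 = OK
  if (xhr.readyState === 4 && xhr.status === 200) {
    console.log(xhr.responseText);
  }
};
xhr.open("GET", "/data", true); // true = asynchronous
xhr.send(null);

// ...which jQuery 1.0 collapsed into roughly: $.get("/data", callback);
```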
Gmail was the first time I saw a website which could refresh the information without refreshing the page. I was a teen back then but I realized it was something momentous.
OK, but I think it was Google Maps that made the experience of not needing to refresh the page popular (while being shown more information from the server).
For a long time, you needed an invite to sign up for Gmail, so you couldn't easily share the cool experience of AJAX with others like you would with a Google Maps link.
> it was Google Maps that made the experience of not needing to refresh the page popular
IMO that's a reasonable impression of the times unless I'm forgetting something (and the additional observation about sharing--"virality" as it was called, before you know--was insightful).
At the time the previous "state of the art" was something like MapQuest which IIRC had a UI that essentially displayed a single tile and then required you to click on one of four directional arrow images to move the visible portion of the map, triggering a page load in the process (maybe a frame load?).
Yahoo! also "participated" in the mapping space at the time.
In the event anyone's interested in further ancient history around the topic, this page is actually (to my surprise) still online (with many broken links presumably): https://libgmail.sourceforge.net/googlemaps.html
(It's what we did for fun in the Times Before Social Media. :D )
It's important to understand that we had "AJAX" before we had AJAX, if you see what I mean.
I was part of a team that deployed an e-commerce site that made international news in 1998, that used AJAX-type techniques in a way that worked in IE3 on Windows 3.11. (Though this was not part of the media fuss at the time; that was more about the fact of being able to pay for things online, still)
The arrival of XMLHTTPRequest made it possible to do everything with core technology, but it was already possible to do asynchronous work in JS by making use of a hidden frame.
You could direct that frame to load a document, the result of which would be only a <script> tag containing a JS variable definition, and the last thing that document would do is call a function in the parent frame to hand over its data. Bingo: asynchronous JS (that looked essentially exactly like JSON).
Since there were also various hacky ways in each browser to force a browser to reload page from cache (that we exhaustively tested), and you could do document.write(), it was possible to trigger a page to regenerate from asynchronous dynamic data in a data store in the parent frame, using a purely static page to contain it.
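For anyone who never saw the hidden-frame pattern, here is a rough sketch of the idea rendered in modern TypeScript (the names, element, and URL are illustrative; the real thing was 1998-era JavaScript against framesets, not an iframe):

```ts
// Parent page script. A hidden <iframe name="dataFrame"> does the fetching.
// All names here (dataFrame, /data, receiveData) are made up for illustration.

interface Payload { items: string[]; }

// Called by the hidden frame's response script via parent.receiveData(...)
function receiveData(payload: Payload): void {
  const out = document.getElementById("out");
  if (out) out.innerHTML = payload.items.map(i => `<li>${i}</li>`).join("");
}

// The "asynchronous request" is just navigating the hidden frame.
function requestData(query: string): void {
  const frame = document.getElementsByName("dataFrame")[0] as HTMLIFrameElement;
  frame.src = "/data?q=" + encodeURIComponent(query);
}

// The server's response, loaded into the hidden frame, is a page whose
// only job is to define its data and hand it to the parent:
//
//   <script>
//     var data = { items: ["alpha", "beta"] };  // JSON before JSON existed
//     parent.receiveData(data);
//   </script>
```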
In this way we really radically cut down the server footprint needed for a national rollout, because our site was almost entirely static, and we were also able to secure with HTTPS all of the functions that actually exchanged customer data, without enduring the then 15-25% CPU overhead of SSL at either end (this is before Intel CPUs routinely had the instruction sets that sped up encryption). We also ended up with a site that was fast over a 33.6 modem.
This was a pretty novel idea at the time -- we were the only people doing it that we knew of -- but over the years I have found we were not the only team in the world effectively inventing this technique in parallel, a year or 18 months before XMLHTTPRequest was added to browsers.
(IE3 on Windows 3.11 was a good experience, by the way. Better behaved and more consistent than Netscape)
At around the same time we were also exploring things like using Java applets to maintain encrypted channels and taking advantage of the very limited ways one had to get data in and out of an applet. For example you couldn't push out from an applet to the page easily, but you could set up something that polled the applet and called the functions it wanted.
I don't like to get all "get off my lawn" but it feels like we actually earned our keep back then, getting technologies to do stuff that no standards working group anywhere was really considering and for which precious little documentation actually existed. There's a generation of us who held our copies of "Webmaster In A Nutshell" and "Java In A Nutshell" very close.
This supposed project is a bit dull; it is just an ongoing HuggingFace community engagement initiative with a misleading headline. Yes, R1 itself is fascinating, but there isn't something like it coming out every week.
Every week to me means the frequency, not the duration. So having 52 events in a year that are spread out somewhat evenly but for which many take longer to develop than a week would count. If I count Deepseek as one of these I can’t find another 51 that are on this level. But I’m sure there was at least one per week that was exciting, just not to this degree.
It feels like the open source movement is slowly entering a Cambrian explosion stage.
You have the old "deterministic computing" achievements (with Linux as the flagship). Then you have the networking protocols (ActivityPub / atproto) that are revolutionising bidirectional human interactions online. And finally you have the data science/ML/AI algorithmic universe that is for the first time being harnessed at distributed scale and can empower individuals like never before.
These superpowers are all coming together and creating a vast number of possibilities. Nothing really dramatic on the hardware side. It's basically the planetary software reconfiguring itself.
To me it all feels suffocating, fake. Simultaneously there's a faint glimmer of hope that we will indeed achieve AGI, unlock fusion, and live happily ever after in a utopian, peaceful, and mostly analog world.
How many people ever used Usenet, versus the billions who think the "internet" is Facebook or TikTok? Techies living in their own universe detached from human reality is actually a factor in why libre/OSS is not as widely adopted as it could be.
How can we help? Can crowdsourcing help? Is there any list of tasks that we want a crowd to do? The reason I am asking is that we have done a couple of crowdsourcing efforts and collected story data in Telugu (Chandamama Kathalu) and ASR speech data using college-going students. Since we have access to the students, we can mobilize them and get this going. We will also be doing an internship program for 100,000 students in Telangana as part of Viswam[1] in April. We can include some work as part of this effort.
From the article: they didn’t release everything—although the model weights are open, the datasets and code used to train the model are not.
Is that true about Meta's Llama as well? Specifically, that the code used to train the model is not open? (I know no one releases datasets.) If so, the label "open source" is inappropriate; "open weights" would be more appropriate.
Given DeepSeek's open philosophy I wonder what their response is to simply being asked for access to the code and data that this project intends to recreate?
While I'm also interested in this, I guess there is value in independent replication as well. Assuming this is doable - and I wouldn't know.
Does anyone know how difficult it is to perform this kind of reproduction? E.g. how much time would it take (weeks? years?) and how likely is it to succeed?
Interesting, so they wouldn't want to disclose something that shows they've illegally (terms / copyright violations) scraped research databases for example.
Won't this eventually come up in legal discovery when someone sues one of these firms for copyright infringement? They'd have to share their data in the discovery process to show that they haven't infringed.
Some people believe they can dodge copyright issues so long as they have enough indirection in their training pipeline.
You take a terabyte of pirated college physics textbooks and train a model that can pose and answer physics 101 problems.
Then a separate, "independent" team uses that model to generate a terabyte of new, synthetic physics 101 problems and solutions, and releases this dataset as "public domain".
Then a third "independent" team uses that synthetic dataset to train a model.
The theory is this forms a sort of legal sieve. Pass the knowledge through a grid with a million fact-sized holes and with enough shaking, the knowledge falls through but the copyright doesn't.
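Mechanically, the middle step is just ordinary synthetic-data generation. A hedged sketch of step two, assuming a local Ollama-style endpoint and a hypothetical model name (whether this actually launders anything is precisely the open legal question):

```ts
// Step 2 of the "sieve": sample the first model to build a synthetic
// dataset. The model name and prompt are hypothetical illustrations.
async function generateSyntheticSample(): Promise<string> {
  const res = await fetch("http://localhost:11434/api/generate", {
    method: "POST",
    headers: { "Content-Type": "application/json" },
    body: JSON.stringify({
      model: "physics-teacher", // the step-1 model trained on the textbooks
      prompt: "Pose one introductory physics problem and solve it step by step.",
      stream: false,            // return one JSON object, not a token stream
    }),
  });
  const data = await res.json();
  return data.response;         // Ollama-style APIs return the text in `response`
}

// Repeat until you have "a terabyte" of problems, release the output as
// a dataset, and let the third team train on it.
async function buildDataset(n: number): Promise<string[]> {
  const samples: string[] = [];
  for (let i = 0; i < n; i++) samples.push(await generateSyntheticSample());
  return samples;
}
```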
Now that things are really getting wild in the LLM space and people seem to just run anything that comes out, I did a quick search on the threat model of hosting your own LLM.
I didn't find much, beyond llama.cpp just reminding you to sandbox and isolate everything when running untrusted models.
I feel we are back in the Windows 95 / early Internet era when people would just run anything without caring about security.
You'll want to use something trusted like Ollama to run the model. The model itself is just data though, like a video file. That doesn't mean it can't be crafted to use a bug in Ollama to launch an exploit, but it's a lot safer than you make it sound.
If used as an agent, given access to execute code, search the web, or use other forms of tools, it could potentially do much more. And most productive use cases require access to such tools. If you want to automate things and get the most out of the model, you will have to give it the ability to use tools.
E.g. it could have been trained to launch a delayed attack if context indicates it has access to execute code and certain conditions are met, e.g. a date, or some other type of codeword in its input.
So if a malicious actor gets to a certain stage with an LLM where they are confident it will be able to reliably run this attack, all they have to do is open source it, wait for enough adoption, and then use some of those methods to launch such an attack. No one would be able to identify it, since the weights are unreadable, but somewhere in the weights the attack is just hiding and waiting to happen once the correct pathway is triggered.
But if it's specifically trained to react to a date in its context, it seems very doable. Or to a combination of otherwise seemingly innocent words, or even a statement or topic. E.g. a malicious actor could make a certain notion go viral, and agentic LLMs integrated with news headlines might react to that.
It seems like it would be very arbitrary to train it to behave like this.
Most agentic systems would provide a date in the prompt context.
For simplicity's sake, imagine a scenario like:
1. China develops an LLM that is by far ahead of its competitors. It decides to attribute it to a small startup and lets them open source it. The LLM is specifically designed to be very efficient as an agent.
2. Agentic usage gets more and more popular. It's very standard to have the current date and major news headlines provided in the context.
3. The LLM was trained so that, given a certain date range and certain headlines in its context, it executes a pre-trained snippet of code. For example, China imposing a certain type of tariff (maybe I lack imagination here, and there could be something much more subtle).
4. At that point the agentic system will attempt to fish out all the data it can from whatever sources it's being run within.
Now maybe it's not very practical, and it's extremely risky with the current state of LLMs. I don't think it's happening right now. And China has a lot of other tech available to it already that could do much more harm (phones, robot vacuums), but I think there are still potential attack vectors like this, especially if the LLM became very reliable.
Ok, but I am really curious about this and maybe my mental model is wrong:
- llama.cpp or ollama can be seen as runtime systems,
- there is no security model regarding execution documented in either of those projects,
- of course the models are just data but so are most things that have been used as an attack vector on computers. For example your web browser or image viewer have a lot of countermeasures to protect the system from malicious image files.
I am surprised that security of operating systems, programming languages, VMs or web browsers have been a focus point forever but nobody seems to really care about security when executing those LLMs.
For "open source", we will wait that Debian ships them to have the guarantee it's actually "open" and with "sources". Right now it's a mystery how they produce their binaries.
Jurisprudence, I hope! A huge heap of detailed cases, formal codes, decisions made and explained in detail, commented, overturned, etc. Especially civil cases.
Also, probably, medicine, especially diagnostic. Large amounts of well-documented cases, a fair amount of repeatability, apparently non-random mechanisms behind, so statistical models should actually detect useful correlations. Can use more formalized tokens from lab tests, etc.
There's definitely a lot of wiggle room for lawyers and doctors to up their game. People cannot keep up with all the stuff that's published. There's simply too much of it. Doctors only read a fraction of what is published. Lawyers have to be aware of orders of magnitude more information than is humanly possible.
LLMs allow them to take some shortcuts here. Even something like Perplexity, which can help you dig out relevant source material, is extremely helpful. You still have to cross-check what it digs out.
The mistake people make is confusing knowledge with reasoning when evaluating LLMs. Perplexity is useful because it can use reasoning to screen sources with knowledge, not because it has perfect recollection of what's in those sources. There's a subtle difference. It's much better at summarizing, and far less likely to hallucinate, than when it doesn't base its answers on the results of a search, like ChatGPT used to do (they've gotten better at this too).
For lawyers and medical professionals this means that they have all the best knowledge easily accessible without having to read and memorize all of it. I know some lawyer types that are really good at Scrabble, remembering trivia, etc. That's a side effect of the type of work they do, which is mostly just reading and scanning through massive amounts of text so that they can recall enough information to know where to look. Doctors have to do similar things with medical texts.
A friend of mine just defended his law PhD, and in the introductory lectio said that (even) current LLMs would likely give better verdicts than human judges. Law isn't really as cognitively demanding a task as walking a dog or waiting tables.
He probably meant _brainwashed_ LLMs. They can consistently produce desired results if you wash them the right way. It's more about personal opinion than computation. Actually it would be fun to manipulate verdicts with prompt injections ;)
Judges are very much "brainwashed" too, and by design. The judges should apply the law, and the same case should ideally lead to the same verdict regardless of the judge.
With the caveat that this applies to sane legal systems, and not the ones where "making examples" etc are part of the system.
> The judges should apply the law, and the same case should ideally lead to the same verdict regardless of the judge.
hmm.. :) I like this. But the reality is very different and some factors which shouldn't matter can change the outcome dramatically. Like skin colors of defendant and judge. Pointing this out can be punished as well.
This is nonsense though. What does "better" mean in this case? A judge is not a black box with an input (the case) and an output (the verdict), the entire point of having a judge is to have empathy, conscience, and personal responsibility built into the system.
It's a blind spot that too many people have because we take those qualities for granted. LLMs unbundle them, so we need to start recognising the inherent value of humans, fast. I wrote a few words about it here: https://dgroshev.com/blog/feel-bad/
Someone has to make a call. The weight of the call rests on the person's life experience, their understanding of the context and the cost to the society, their empathy to both the defendant and the accused, and their conscience. Treating it as a black box exercise misses the point completely.
RFP responses. In enterprise sales, there's a huge amount of back and forth with different teams in a customer when you're selling anything but very simple applications. Most enterprise customers require certified or authoritative responses with backup material that is tested later during formal verification.
These LLMs are already very helpful when studying scientific fields. If you're reading a scientific paper and come across an equation you don't know how to derive, LLMs can often correctly derive it from first principles. It's not 100% reliable, but when it works, it's incredibly helpful.
Management consulting - I expect less than 20% of what a random 24-year-old in a suit that you pay $3000 per day produces is actually specific to your business problem; the rest is formulaic.
About the training data: can't the datasets from the Tulu 3 model by the Allen Institute be used?
They claim that they have used a fully open source training dataset.
My gut says a lot of attention needs to be given to building a community that focuses on open and reliable access to clean training data.
If a collective/coop of individuals and organizations with storage and network capacity could collaborate with each other to archive and index deduplicated training data that would be huge.
Perhaps this is already happening. I was looking at Red Pajama last year as an example.
Someone like myself could arrange to host 200+ TB on high-speed storage with a 10G public IP, for example; then we get a bunch of us together, and hopefully access to training datasets would be decentralized and uncensored in an ideal setup.
Is all that in progress and I just need to learn how to join?
Is Red Pajama something to look at again?
Is anyone tracking datasets in detail, the way HuggingFace has all the models? I know a lot of datasets are on it too, but there is massive duplication.
It might need to involve some torrent or anonymity platforms to avoid problems like Books3 had when the use and availability of the data is restricted by some jurisdictions.
It also needs to incorporate some deduplication approach as I notice the same data is often repackaged with variations in format or specification.
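For the exact-repackaging case, the core of a dedup pass can be as simple as normalizing before hashing; a minimal sketch (the names are illustrative, and near-duplicates would need something like MinHash/SimHash on top of this):

```ts
// Minimal content-level dedup: normalize records before hashing so that
// format-only repackaging (whitespace, casing) collapses to one entry.
// This only catches exact content matches after normalization.
import { createHash } from "node:crypto";

function contentKey(text: string): string {
  const normalized = text
    .toLowerCase()
    .replace(/\s+/g, " ") // collapse whitespace differences
    .trim();
  return createHash("sha256").update(normalized).digest("hex");
}

function dedupe(records: string[]): string[] {
  const seen = new Set<string>();
  return records.filter(r => {
    const key = contentKey(r);
    if (seen.has(key)) return false; // already stored under another wrapper
    seen.add(key);
    return true;
  });
}
```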
> The release of DeepSeek-R1 is an amazing boon for the community, but they didn’t release everything—although the model weights are open, the datasets and code used to train the model are not.
> The goal of Open-R1 is to build these last missing pieces so that the whole research and industry community can build similar or better models using these recipes and datasets.
Genuine question, but how do you replicate the effort exactly without $5M in compute? And can you verify that the published weights etc. are actually those of the model?
The $5.5m in compute wasn't for R1; it was for DeepSeek v3.
The R1 trick looks like it may be a whole lot cheaper than that. R1 apparently used just 800,000 samples - I don't fully understand the processing needed on top of those samples but I get the impression it's a whole lot less compute than the $5.5m used to train v3.
Yes, that works because you're an anon and nobody really cares. Try to publicly make that statement if you're in any relevant position and you'll very quickly be looking for a new job, if you can even find one.
> And if staging peaceful pro-Palestine protests result in arrests, what happened here?
Be honest: you can literally google "Palestine protest arrests" and get more results than you could process in a while. Your presenting a couple of examples doesn't negate the many other protests that ended in mass arrests.
She would not be a politician (or even alive) if any of what you claim is true. You claimed that the US government censors people who speak out against Israel's occupation of Palestine, and specifically that saying Palestine isn't Israel would not be possible in the United States in the same way that saying, for example, Xi Jinping looks like Winnie the Pooh is censored in China.
This is, of course, completely false, and demonstrably so by observing the protests I just linked (of which there are thousands, not a few), and the statements Rep. Tlaib, a Palestinian American and member of the US government, regularly makes on the national stage.
The equivocation of Chinese censorship and Western censorship simply doesn't work.
I think western propaganda is overall the cleverest, because it manages to completely marginalize and silence any non-aligned opinion, while at the same time convincing you that you are completely free to have said opinion.
Why do you think anything you've just linked is at all related to this conversation? A system must be perfect to be good? That's an insane bar that is not the actual standard.
And if you think a US representative is powerless then you completely fail to understand how the US government actually works.
It is though. Western AI tries to hide information like that with the justification of safety as well as things that might be offensive to current popular beliefs. Chinese AI presumably says Taiwan is China to help get more people on side for a possible future invasion. Propaganda does work - look at how many people think Donbas is still Ukraine and Israel is still Palestine.
The difference is that in China the info isn’t available without use of Western content, due to the totalitarian control over media, whereas in the West, information is pretty trivially available, even if the big companies keep it off of their platforms.
And sure ignorance is prevalent, but even GPT4 will tell me Donbas is still Ukraine, for instance. What a strange example to use, though!
But is it though? What's really the meaning of which country a region belongs to? Once somewhere has been occupied long enough, it usually becomes de-facto theirs. But how long is long enough? Other countries either do or don't recognize it and usually a consensus is reached, but not always.
In any case, DeepSeek, like Llama, fails well before hitting that new definition. Both have licenses containing restrictions on field of use and discrimination against users. Their licenses will never be approved as Open Source.
DeepSeek's gifts to the world of its open weights, public research and OSS code of its SOTA models are all any reasonable person should expect given no organization is going to release their dataset and open themselves up to criticism and legal exposure.
You shouldn't expect to see the datasets behind any SOTA models until they can be synthetically generated from larger models. Models trained only on sanctioned "public" datasets are not going to perform as well, which makes them a lot less interesting and practically useful.
Yes, it would be great for there to be open models containing original datasets and a working pipeline to recreate the models from scratch. But when few people would even have the resources to train the models, and the huge training costs just result in worse-performing models, it's only academically interesting to a few research labs.
Open model releases should be celebrated, not criticized with unreasonable nitpicking and expectations that serve no useful purpose other than discouraging future open releases. When the norm is for open models to include their datasets, we can start criticizing those that don't; until then, be gracious that they're contributing anything at all.
Terminology exists for a reason. Doubly so for well-established terms of art that pertain to licensing and contract law.
They could have used "open weights", which would have conveyed the company's desired intent just as well as "open source", but without the ambiguity. They deliberately chose to misuse a well established term instead.
I applaud and thank DeepSeek for opening their weights, but I absolutely condemn them and others (e.g. Facebook) for their deliberate and continued misuse of the term. I and others like me will continue to raise this point as long as we are active in this field, so expect to see this criticism for decades.
Hopefully one of these companies loses a lawsuit due to these shenanigans. Perhaps then they wouldn't misuse these terms so brazenly.
> I absolutely condemn them and others (e.g. Facebook) for their deliberate and continued misuse of the term
This is the kind of inconsequential nitpicking diatribe I'm referring to. When has "open data" ever meant Open Source?
> They deliberately chose to misuse a well established term instead.
Their model weights as well as their repositories containing their technical papers and any source code are published under an OSS MIT license, which is the reason why initiatives like this looking to reproduce R1 are even possible.
But no, we have to waste space in every open model release complaining that they must be condemned for continuing to use the same label the rest of the industry uses to describe their open models which are released under an OSS License as Open Source - instead of using whatever preferred unused label you want them to use.
Exciting to see this being reproduced, loving the hyper-fast movement in open source!
This is exactly why it is not “US vs China”, the battle is between heavily-capitalized Silicon Valley companies versus open source.
Every believer in this tech owes DeepSeek some gratitude, but even they stand on shoulders of giants in the form of everyone else who pushed the frontier forward and chose to publish, rather than exploit, what they learned.
Oh yes, I am firmly on Team China here because US companies got too greedy. Meta is an exception here though and they also propelled AI development massively.
DeepSeek is awesome. Every AI task we've implemented in our business so far can be run from my local PC with just the smaller models. And my PC is fairly crappy to begin with.
OpenAI looks quite silly with their "we have to close everything".
Can you elaborate on which models you are using? I'm running an R1-distilled Qwen coder at 32B Q4, and while it's giving useful answers, it's quite slow on my M1 Max. Slow enough that I keep reaching for cloud models.
Not at my machine currently, but I use the 14B Q4 model, I think, which delivers very good answers. I run a 4060 with 16 GB memory and performance is quite good; I used the largest model that was recommended for this amount of VRAM.
I do have some applications that process images, text and pdf files and I use smaller models for extracting embeddings. I think my system wouldn't be able to handle it with decent speed otherwise.
I do run LLMs on an M1 16 GB MacBook Air and performance is surprisingly good. Not for image synthesis though, and a PC with a dedicated GPU is still significantly faster with LLM responses as well. Haven't tried to run DeepSeek on the MacBook yet.
I'm on team open source. To me the exciting thing was Ollama downloading the 7B and running it on a 5-year-old cheap Lenovo and getting a token rate similar to the first release of ChatGPT.
Running locally on CPU opens up so many possibilities for smart and privacy-focused home devices that serve you.
In my test it hallucinated confidently, but my interest is in a simple second-brain-like RAG: "Hey thingy, what is my schedule today?"
I need it to be a bit faster though, as the thinking part adds a lot of latency.
The thinking is quite fascinating though, I love reading it. Especially when it notices something must be wrong. It will probably be very helpful to refine answer for itself and other models.
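For what it's worth, the "second brain" loop above needs little more than retrieval plus a prompt. A toy sketch, assuming a local Ollama-style endpoint, with keyword matching standing in for a real embedding search (the model name and notes are made up):

```ts
// Toy "second brain" RAG loop: retrieve personal notes, stuff them into
// the prompt, ask a small local model. Grounding the model on retrieved
// notes is what curbs the confident hallucination mentioned above.
const notes = [
  "2025-02-03: dentist at 14:00",
  "2025-02-03: call the supplier about the open invoice",
  "2025-02-04: sprint review at 10:00",
];

// Placeholder retrieval: keyword overlap instead of embeddings.
function retrieve(question: string): string[] {
  const words = question.toLowerCase().split(/\W+/);
  return notes.filter(n =>
    words.some(w => w.length > 3 && n.toLowerCase().includes(w))
  );
}

async function ask(question: string): Promise<string> {
  const context = retrieve(question).join("\n");
  const res = await fetch("http://localhost:11434/api/generate", {
    method: "POST",
    headers: { "Content-Type": "application/json" },
    body: JSON.stringify({
      model: "deepseek-r1:7b", // assumed tag for the 7B distill mentioned above
      prompt: `Answer only from these notes:\n${context}\n\nQuestion: ${question}`,
      stream: false,
    }),
  });
  return (await res.json()).response;
}

// Example: ask("What is my schedule today?")
```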
It does add latency of course, but I still think that I could provide all AI needs of my company (industrial production) with a simple older off the shelf PC. My GPU is decently recent, but the smallest model of the series and otherwise the machine is a rusty bucket.
I haven't tested it thoroughly yet, but I have some invoices where I need to extract info, and so far it has done a perfect job. But I don't think there is any LLM yet that can do that without someone checking the output.
The US companies got too greedy? How? They invented this entire space, literally. DeepSeek built their base models off Llama releases and OpenAI outputs (or so it’s thought), and while they added some optimizations on top, it seems like they’ve lied about the costs to produce their models by simply being vague about their base model and training data, and quoting the cost of their final training run.
And then there’s all the dystopian propaganda baked into these models, which threatens to misinform users at scale based on a government driven agenda. Hard to be on that team, let alone firmly, knowing that it’s giving power to a dictatorial regime.
The US models are also full of censorship. For example, the US is much more sensitive to anything related to sexuality, and here in Europe it's quite frustrating to deal with that censorship.
I think we will find that each region will have their own flair of censorship. The only reason it stands out more from a Chinese perspective is the requirement to have alignment with PRC/CCP rhetoric.
Yes that's what I mean. I wish all models were uncensored and it would just be up to the implementer to decide how to finetune on top of that. Save for the super crazy stuff of course.
> The US companies got too greedy? How? They invented this entire space, literally
And when they thought they were the only game in town, they tried to corner the market in GPUs and lock out any users who can't pony up £200/mo. Reminds me of when the likes of Oracle and IBM had companies by the balls buying bigger and bigger servers and then Google came along and showed everyone how to do horizontal scaling of cheap hardware.
That was perhaps a bit too general, but aside from Meta and Google, they didn't share their research, tried to sell AI products as fast as possible, and tried to lobby legislation to keep their head start. I would also include Nvidia here, which has some moat through software integrations.
I haven't tested DeepSeek for censorship yet, but they shared their release and even their input data. And in this case you could correct its shortcomings, so propaganda would be difficult.
> DeepSeek built their base models off Llama releases and OpenAI outputs (or so it's thought)
The first one is definitely not true, and the second one is not necessarily true in the way you imagine, i.e. crawls of the internet will contain GPT chat logs now.
China is merely the largest of a wide array of entities that may not necessarily like the status quo of Silicon Valley being our tech overlords. There are plenty of places with bright people. Easy to say because a lot of them immigrate to California. But of course the places they come from (China, Europe, India, Russia etc.) have ambitions as well. You'll find natives of each of those in the likes of OpenAI, Google, Microsoft, etc. And quite often at executive levels even.
Silicon Valley has no moat other than money. It kind of runs on openness and freedom of movement of people. Companies constantly poach people from each other. And there's a constant movement of people (and knowledge) in and out of the area. Money is what attracts these people and keeps them there for a while. But of course that status quo was upset a little bit with VCs turning into penny-pinching misers lately and lockdowns proving (to them) that it was cheaper to host your tech teams remotely. Which means knowledge is now more distributed than it used to be.
So, it's not surprising that people outside of Silicon Valley are not waiting patiently for OpenAI to do whatever it is they are doing in between having moral existential crises, trying to oust their CEO, pontificating about AGIs, etc. They are taking things into their own hands. The brute force / VC funding driven approach that OpenAI has used yielded massive results in the last few years. But ever since Meta opensourced their models, OSS models and optimizations have been catching up.
On a hardware resource usage basis, these models started to outperform their bigger peers last year and now the game is up for the training process as well. Meaning they get better results for the same money. A major hurdle here was the model training process. Which the Chinese seem to have proven can be massively optimized as well. Cutting cost by a few orders of magnitude is a big deal. And at the same time doing the same thing at larger scale (aka. throwing more money at the problem) seems to have diminishing returns.
Until that changes, that means the playing field has somewhat leveled now. That's a good thing.
> This is exactly why it is not “US vs China”, the battle is between heavily-capitalized Silicon Valley companies versus open source.
Ah yes the "open source" code that was not released by the DeepSeek team and the tens of thousands of professional grade GPUs that were contributed by the "community".
DeepSeek is based on Llama which was produced by ... Meta.
DeepSeek v3/R1 isn't based on the Llama architecture. It uniquely combines and contributes several novel approaches.
Meta never released a mixture-of-experts model (they failed to train a good one, according to reliable rumors). And MoE is just one of the few ingredients that make DeepSeek v3/R1 interesting and good.
I like that it's open source, but ultimately it is China, so we can't trust it.
It's trivial to implement bias in models (hence the no-no filters in ChatGPT), so if they're smart they'll do what they do with TikTok and make the answers different for their rivals.
The thing about open source is that you don't need to trust it.
They shared their methodology, so if they are legit, someone else will reproduce what they did very quickly. I expect Meta, Amazon, Google, and Anthropic are on the case right now.
From this list, the only one I trust any more than I trust Deepseek is Anthropic.
The other three have shown they'll instantly bend the knee to whomever is in power, and that's exactly the same thing people are worried that Deepseek is doing.
Last year I would have said I trusted American companies more than Chinese. But last year feels like a long time ago.
Anthropic was the only company on that list not to have paid for their CEO or founder to attend the inauguration of the current ruler of the executive branch of the US a week ago.
Because, from my non-expert but reasonably well informed understanding of world affairs, there's a very high chance of a Chinese company in an important area like AI having to bend the knee to Xi Jinping pretty much exactly as the American companies are doing with Trump.
> The other three have shown they'll instantly bend the knee to whomever is in power, and that's exactly the same thing people are worried that Deepseek is doing.
Open source has pragmatic merits and I love the culture. But I don't like associating it with a moral high ground just because it doesn't charge you money. By this standard, we should also ask Intel/AMD to open-source their CPUs, video game studios to open-source their code and artifacts, and Google/Amazon for their search engine and infrastructure. Not all business sectors can sustain themselves with the open source model.
> By this standard, we should also ask Intel/AMD to open-source their CPUs, video game studios to open-source their code and artifacts, and Google/Amazon for their search engine and infrastructure.
The freedom is mostly not about the money. The 3D model Benchy was free while not being free, as people found out. Luckily the copyright owners have treated it as if people were free to use it, for now... but that could change.
super cool to see an open initiative like this—love the idea of replicating DeepSeek-R1 in a transparent way.
I do like the idea of making these reasoning techniques accessible to everyone. If they really manage to replicate the results of DeepSeek-R1, especially on a smaller budget, that’s a huge win for open-source AI.
I’m all for projects that push innovation and share the process with others, even if it’s messy.
But yeah—lots of hurdles. They might hit a wall because they don’t have DeepSeek’s original datasets.