Most-mentioned links in Hacker News comments, 2006–2015

minimaxir · on Nov 12, 2020

A reminder that BigQuery (as used in the query in this link) is the best way to play with Hacker News data; don't scrape HN data manually!

The `bigquery-public-data.hacker_news.full` table appears to be up to date with the most recent HN data as well (table last updated today).

However, I'm not 100% sure the query is correct for unilaterally getting all links, as running the query on the full dataset returns the same results as running it from 2006-2015. And I value my sanity enough to not fuss around with the regex.
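
For illustration only, here is a minimal Python sketch of the counting idea, not the query from the link. The regex, the `normalize` rules, and the sample comments are all assumptions made for the example; real HN comment text is HTML-escaped and would need more care.

```python
import re
from collections import Counter

# Hypothetical pattern: a scheme followed by a run of characters that are
# not whitespace, angle brackets, or quotes.
URL_RE = re.compile(r"https?://[^\s<>\"']+")

def normalize(url):
    """Drop the scheme, a leading 'www.', and trailing punctuation or slashes."""
    url = re.sub(r"^https?://(www\.)?", "", url)
    return url.rstrip(".,;:!?)/")

def top_links(comments, n=10):
    """Return the n most-mentioned normalized links across all comments."""
    counts = Counter()
    for text in comments:
        # Count each link at most once per comment.
        counts.update({normalize(u) for u in URL_RE.findall(text)})
    return counts.most_common(n)

# Made-up comments, for demonstration only.
sample_comments = [
    "Obligatory: http://xkcd.com/927/",
    "See https://xkcd.com/927 and http://www.paulgraham.com/disagree.html.",
    "No links here.",
]
print(top_links(sample_comments))
```

Normalizing before counting is what merges `http://` and `https://` variants of the same link; how aggressive that normalization should be is a judgment call.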

bigdict · on Nov 12, 2020

What is the best way to download this dataset? Last time I messed with it I had to pay for a Google Cloud bucket and run through some awkward sequence of steps to eventually get a local copy.

I think I ended up following the advice here: https://stackoverflow.com/questions/18493533/how-to-download....

minimaxir · on Nov 12, 2020

That's essentially it (export the BQ table as a CSV to a Google Cloud Storage bucket, then download it from there), but you can do that entirely in the web UI, no CLI needed.

If you just want a subset of the data, run a query, then save the query as a table in your project and export from there.

wslh · on Nov 13, 2020

Does it cost money to do that, or is it only time consuming?

minimaxir · on Nov 13, 2020

It’s not free, but the only costs are storage and egress, which are trivial.

Running the query in BQ is free up to 1 TB and the 16000 row download is free so I recommend that if necessary.

wslh · on Nov 14, 2020

I think an alternative option could be to have a torrent (or other file sharing mechanism) with the public HN information. Am I missing something? 16k rows seems very tiny for doing an analysis of HN.

minimaxir · on Nov 14, 2020

The vast majority of potential HN analysis can be done in SQL alone within BigQuery, which is the proper way to handle a data warehouse regardless.

You mostly need the raw data for ML model training which is a very niche use case.

ZeroCool2u · on Nov 13, 2020

You can read it directly into a pandas DataFrame. [1]

https://pandas.pydata.org/pandas-docs/stable/reference/api/p...

minimaxir · on Nov 13, 2020

This pandas trick is for small result datasets, in which case the 16,000 row limit from BigQuery more than satisfies it.

I would not recommend using it for the full many-million-row dataset.

bigdict · on Nov 13, 2020

Oh no way, thank you!

simonsarris · on Nov 12, 2020

Funny that searchyc.com was so necessary for so long, coming in at #54 and #68 (it seems it should be higher as these should be combined). Now it just redirects to a spam/ad website, but before HN had a search bar it was very useful.

Also interesting that it contains at #55: https://news.ycombinator.com/best

But not the ostensibly more useful: https://news.ycombinator.com/active

The url "u.ly/73I" #63 is very interesting, it's not seen almost anywhere else on the web (at least on Google), and is apparently spam, now, and when you click on mentions for that matter you get:

> We found no comments matching u.ly/73I

What's the deal with that one? Was it spam comments that all got removed? It may be since some of the others were spam, like this one: https://goo.gl/l5v0b

It's impressive how little spam spam (as opposed to submarines, this is where I link to PG's essay) is on HN.

duckerude · on Nov 12, 2020

The u.ly link seems to have been spam for shoes. The wayback machine captured it: https://web.archive.org/web/20110319115346/http://u.ly/73I

pgt · on Nov 12, 2020

HN has a search bar? *scrolls down* Oh, wow! I always use hn.algolia.com.

flobosg · on Nov 12, 2020

That bar uses hn.algolia.com for search as well.

vlmutolo · on Nov 12, 2020

I’m pretty sure the search bar just redirects to that.

wombatmobile · on Nov 13, 2020

The HN guidelines deserve to be #1, not just in this list, but for the whole internet.

If conversation was as civil elsewhere as it is on HN, Americans might rediscover the value of community which they have lost to mindless bickering encouraged by commercial algorithms elsewhere.

ChrisMarshallNY · on Nov 13, 2020

I very much appreciate the urbane nature of HN, and strive to contribute in a way that brings light; not darkness.

My digital community experience harkens from the USENET days, which made the worst pissing matches on Faecesbook look like polite disagreements between scholars.

A day or two ago, someone made a real mild slap at me (I can come across as a bit tiresome, if you haven’t noticed. Take my word for it; it’s preferable to my USENET persona), using a very tired old troll technique, and someone else flagged it.

I was actually surprised it was flagged, but it does show that people take civility seriously, hereabouts.

surround · on Nov 13, 2020

Since we’re on the topic of civil conversation, I’m going to be a bit picky:

“American” is a generalization. Accusing them is not productive to discussion.

wombatmobile · on Nov 13, 2020

I was thinking of American social media companies with exploitative engagement algorithms that foment adversarial discourse. You are right if you are saying the users are all over the world, not just America.

Is that what you mean? Or is the term “American” problematic in some other way?

surround · on Nov 13, 2020

I thought you were referring to American users. Accusing American companies is ok.

dgritsko · on Nov 12, 2020

If you're like me, and immediately clicked on all the XKCD links:

#3. http://xkcd.com/927/

#11. http://xkcd.com/386/

#12. http://xkcd.com/538/

#46. http://xkcd.com/936/

#60. http://xkcd.com/327/

#64. http://xkcd.com/605/

#69. http://xkcd.com/378/

#78. http://xkcd.com/810/

#91. http://xkcd.com/1053/

ssl232 · on Nov 12, 2020

I'm quite surprised #3 has only been mentioned 197 times (well, a few more after this thread) in 14 years.

JacobAldridge · on Nov 12, 2020

Linked 197 times, possibly referenced more than that; plus it’s a neatly exaggerated example of a joke that I’d imagine has been made many more times again:

“I have a problem I’m trying to fix with X.”

“Ah, so now you have 2 problems.”

carapace · on Nov 13, 2020

I find it very hopeful that that particular one has been linked to so many times, maybe it's sinking in? :)

snowwrestler · on Nov 13, 2020

Surprised that 552 is not among these. One of my favorites:

https://xkcd.com/552/

progre · on Nov 13, 2020

My personal experience is that posting this leads to an instant surge of downvotes

KineticLensman · on Nov 13, 2020

The semi-inevitable downvotes may be because xkcd links are seen as cheap canned injects that although relevant and often funny don't really add anything novel to a debate. And perhaps - the horror! - they might not actually always be correct in a specific case.

Disclaimer: I really love xkcd.

goto11 · on Nov 13, 2020

Probably because someone posts "correlation does not imply causation" as a rote "I am very smart" reply to any published study, regardless of the actual contents and claims of the study.

deeg · on Nov 12, 2020

#11 literally changed my life. I used to debate people online and get irrationally upset when I couldn't change their opinions. Reading that xkcd was like getting a dope slap. I still debate but I rarely let myself get upset and if I do I try to hold that in my mind.

sq_ · on Nov 12, 2020

I'm honestly surprised that the Wisdom of the Ancients xkcd [0] isn't on the list...

[0] https://xkcd.com/979/

Exmoor · on Nov 13, 2020

The only thing worse than that is when they come back and say, "Oh nevermind, I figured it out." and don't post the fix.

mitchbob · on Nov 13, 2020

Surprised 2347, Dependency [1], didn't make the list. But it's pretty recent.

[1] https://xkcd.com/2347/

Impossible · on Nov 13, 2020

#3 makes sense (I believe I've posted it in a few comments) because HN has a lot of news on new standards

peterburkimsher · on Nov 13, 2020

Some XKCD comics have been translated to Chinese:

https://github.com/stevenliuyi/xkcd-cn/tree/master/pics

I've combined them side-by-side with the English:

https://mega.nz/folder/modDRIYB#uxRKw78Fcv6mpclovOTRbw

I'd also like to transcribe them for use with Pingtype, but didn't get around to that yet.

simonebrunozzi · on Nov 13, 2020

936 is my personal favorite.

Shared404 · on Nov 13, 2020

My understanding is that diceware is no longer considered secure though.

Aren't there wordlists that just consist of combining dictionary words, rendering diceware more dangerous than a password manager?

shakna · on Nov 13, 2020

There have always been dictionary attacks.

You're judging it from the wrong position. Diceware is an improvement upon the usual kinds of passwords people feel forced to come up with, and then re-use everywhere.

A password manager is better than diceware, and whilst it feels like less friction to people who use them, people can be unwilling to use one.

Just rejecting passwords found in HIBP's stolen passwords list had me receive death threats and long rants about how password security isn't my responsibility and I was putting up too much friction.
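
As a rough sketch of how such a check can work without sending the password anywhere: the Pwned Passwords range API takes the first five hex characters of the password's SHA-1 and returns matching hash suffixes with counts. The helper names below are hypothetical, and the response body is a fabricated stand-in so the example makes no network call.

```python
import hashlib

def split_sha1(password):
    """Return (first 5 hex chars, remaining 35) of the uppercase SHA-1 digest."""
    digest = hashlib.sha1(password.encode("utf-8")).hexdigest().upper()
    return digest[:5], digest[5:]

def breach_count(suffix, range_body):
    """Look up a hash suffix in a 'SUFFIX:COUNT' per-line response body."""
    for line in range_body.splitlines():
        candidate, _, count = line.partition(":")
        if candidate.strip().upper() == suffix:
            return int(count)
    return 0

prefix, suffix = split_sha1("password")

# Fabricated response body for illustration; a real one would come from
# requesting the range endpoint with `prefix`.
fake_body = f"{'0' * 35}:1\n{suffix}:42\n"

print(prefix, breach_count(suffix, fake_body))
```

Only the five-character prefix would leave the machine; the comparison against the suffix happens locally.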

Shared404 · on Nov 13, 2020

Ah, that makes sense. Thanks for clarifying for my slow self.

Edit: Also

> rejecting passwords found in HIBP's stolen passwords list had me receive death threats and long rants about how password security isn't my responsibility

That is amazing in a horrifying sort of way. I'm sorry you went through that.

surround · on Nov 13, 2020

The comic calculates entropy with the assumption that the attacker knows that your password is four common words. Even if they run a dictionary attack on your password, it will still be secure.

Diceware passwords are still recommended by security experts over any other method.

https://www.eff.org/dice
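
The arithmetic behind that claim can be sketched in a few lines. This assumes each word is picked uniformly and independently, that the comic's list has 2^11 = 2048 words, and that the EFF long list has 6^5 = 7776 words; the six-word length is an arbitrary choice for the example.

```python
import math

def passphrase_entropy_bits(wordlist_size, num_words):
    """Entropy in bits when each word is drawn uniformly and independently."""
    return num_words * math.log2(wordlist_size)

# xkcd 936 style: four words from a list of 2**11 = 2048 common words.
xkcd_bits = passphrase_entropy_bits(2048, 4)

# EFF long wordlist: 6**5 = 7776 words (five dice rolls per word).
# Six words is an assumed length for illustration.
eff_bits = passphrase_entropy_bits(6 ** 5, 6)

print(f"xkcd 936 style: {xkcd_bits:.1f} bits")
print(f"EFF diceware, 6 words: {eff_bits:.1f} bits")
```

The entropy depends only on the list size and the number of words, so it holds even when the attacker knows the word list and the scheme.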

naveen99 · on Nov 13, 2020

#78 -xkcd 810 broke my brain. Nice !

Will add to my list of things I’d look into if I had more time.

ellis0n · on Nov 12, 2020

Haha, just #927, nothing else. #927 XKCD receipt about everything

SippinLean · on Nov 13, 2020

I thought we'd see #91 (1053/"Ten Thousand") but it's lame having a 4-digit XKCD in the top 100

noobermin · on Nov 13, 2020

"The submarine" was nice for its time. The reality is today astroturfing is very much a real thing especially as print media has died and the same types of firms find ways to hawk their wares on the internet.

pg also had the right idea, but you could, if you desire, sit further back and ask how someone's personal self-interests affect what they write (or nowadays, post or whatever) about and how that can affect the sincerity of what they are writing.

joshu · on Nov 13, 2020

the submarine article was originally about me (but was wrong, we hadn’t hired PR)

oblio · on Nov 13, 2020

Well, maybe the target was wrong but the idea was still quite good.

rmorey · on Nov 13, 2020

browsing some of these, i'm reminded of "Cool URIs don't change": https://www.w3.org/Provider/Style/URI

saagarjha · on Nov 12, 2020

If I ever needed to summarize Hacker News into a single list, this would absolutely be it.

pbiggar · on Nov 13, 2020

Was kinda surprised to see https://circleci.com at #37. When I clicked through, turns out it was me repeatedly reminding anyone who'd listen that we existed.

zuhayeer · on Nov 12, 2020

Is it possible to do this for 2015 until now as well?

maciek · on Nov 12, 2020

Very cool! It's already useful, but you could make it even more so by allowing sorting by smaller time ranges (e.g. 1 year).

It would also be interesting to see a version of this list weighted by karma scores of users who posted the links.

Edit: even better, use your own h-index ranking https://github.com/antontarasenko/smq/blob/master/reports/ha...

mattmanser · on Nov 13, 2020

Looks like someone made this 5 years ago from a data dump, no updates to the repo in 5 years.

anonytrary · on Nov 12, 2020

Nice idea! Note the scraper seems to have produced a duplicate entry:

  1. en.wikipedia.org/wiki/Betteridges_Law_of_Headlines
  2. en.wikipedia.org/wiki/Betteridge%2527s_law_of_headlines

The second is a bad link, so I am curious how that link got shared so much.

krussell · on Nov 13, 2020

Interesting, looks like the wikipedia title spells it Betteridge's with an apostrophe. %27 is the url encoding for the apostrophe and %25 is the url encoding for the %. So it looks like the link somehow got url encoded twice before it was shared resulting in the apostrophe becoming %27 and then the %27 becoming %2527 and producing a bad link.

Some common method of people finding and sharing that url must have had an issue where it incorrectly encoded the url twice. I wonder what it was :) .

Or perhaps the scraper has a bug where it is double url encoding the en.wikipedia.org/wiki/Betteridge%27s_law_of_headlines link?
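
The double-encoding theory is easy to reproduce with Python's standard library. This is only a demonstration of the mechanism, not a claim about which tool actually produced the bad link.

```python
from urllib.parse import quote, unquote

title = "Betteridge's_law_of_headlines"

once = quote(title)    # the apostrophe becomes %27
twice = quote(once)    # the % in %27 becomes %25, giving %2527

print(once)
print(twice)

# Decoding twice recovers the original title.
assert unquote(unquote(twice)) == title
```

Any pipeline that percent-encodes an already-encoded URL would produce the `%2527` form.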

_28jh · on Nov 13, 2020

I watched https://www.youtube.com/watch?v=dBnniua6-oM a few days ago and was surprised to see it is the #1 youtube link. It does have 11M views, however, and it is a great talk.

didibus · on Nov 13, 2020

Had missed this Paul Graham one: http://www.paulgraham.com/disagree.html but it's really good.

dang · on Nov 13, 2020

If curious see also

2016 https://news.ycombinator.com/item?id=11561711

d13 · on Nov 14, 2020

Can someone explain why Paul Graham? I read a few of those links and don’t get it.

luigi23 · on Nov 14, 2020

He is the founder of Y Combinator.

jchw · on Nov 13, 2020

I’m so happy to see XKCD 927, because it was my immediate first thought. It is one of the few XKCD comics I know by number, and while witty I do feel it gets misused in thought-terminating ways sometimes. Still, I feel validated in my knee-jerk reaction that it would be the top XKCD comic on the list.

kilroy123 · on Nov 13, 2020

What about 2015 - now?

kalendos · on Nov 13, 2020

Very cool! Inspired me to get the most upvoted XKCD comics on Reddit for 2019.

https://gist.github.com/davidgasquez/3aeaac54c5a61216ffc8f7d...

djrockstar1 · on Nov 13, 2020

Impressive that the 17th most upvoted XKCD comic isn't even an XKCD comic!

kalendos · on Nov 13, 2020

Nice catch! I used the first regex I could find on SO.
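
A stricter pattern is one way to avoid that kind of false positive. The sketch below is hypothetical and is not the regex from the gist; it only accepts numbered comic URLs on `xkcd.com` or `www.xkcd.com`, and the sample text is made up.

```python
import re

# Scheme, optional "www.", the xkcd.com host, then a comic number that
# must end at a word boundary.
XKCD_RE = re.compile(r"https?://(?:www\.)?xkcd\.com/(\d+)\b")

def xkcd_ids(text):
    """Return the comic numbers of all xkcd comic links found in text."""
    return [int(n) for n in XKCD_RE.findall(text)]

sample = (
    "Relevant: https://xkcd.com/927/ and http://www.xkcd.com/386 "
    "but not https://imgs.xkcd.com/comics/standards.png "
    "or https://example.com/xkcd.com/1053"
)
print(xkcd_ids(sample))
```

Requiring the host to follow the scheme directly is what rules out other subdomains and URLs that merely contain `xkcd.com` in their path.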

BenGosub · on Nov 13, 2020

This is gold!

abnry · on Nov 13, 2020

It amuses me that the Dunning–Kruger effect shows up here. It has become such a cliche for people to reference it.

ben509 · on Nov 13, 2020

I link this[1] every time I see it mentioned, because the standard depiction of it bears no resemblance to what's in their paper.

[1]: https://www.talyarkoni.org/blog/2010/07/07/what-the-dunning-...

Shared404 · on Nov 13, 2020

I'm one of today's 10,000.

Thanks!

username90 · on Nov 13, 2020

It seems like most people greatly overestimate their understanding of the Dunning-Kruger effect.