The Discovery of Statistical Regression (priceonomics.com)
110 points by akg_67 on Nov 17, 2015 | 7 comments



If you ever have a free hour, check out the Wikipedia page for regression analysis.

https://en.wikipedia.org/wiki/Regression_analysis

It's by far the most approachable Wikipedia page/section I've encountered for explaining a mathematical topic.

Regression analysis has a few things going for it over other topics.

1) It uses simple math. We aren't solving Black-Scholes here or using Itô's lemma. All the math you need to do least-squares regression you learned in high school.

2) It's graphable. Some people are fine learning via numbers only; some people need to visualize the result. For the latter, being able to show a line going through a series of points makes analysis super easy.

A few weeks ago my 7-year-old daughter saw an R knitr doc with a grid of 25 plots on it, which I was using to determine co-integration of various ETFs for the purpose of pairs trading.

She obviously doesn't understand linear regression, but she was able to look at the plots and find the ones that had the best "fit".

2b) The results are dead simple in most cases to interpret.

I think one thing that trips people up with statistics is that, until they encounter it, most of the math they are introduced to is analytical: you take the numbers, apply the formula, and get a definitive answer.

Then you encounter statistics and realize there is no black and white. You can get an average, but what about its standard deviation? OK, now you have to analyze the standard deviation, and repeat to infinity.

You never really get a definitive, "this is the answer" style answer from statistics. Everything is "here is your result, but you should also apply this technique to analyze your result".

With linear regression, it's often very simple to interpret your results, which makes it one of the more approachable statistical techniques.

3) It has well-supported libraries in almost any programming language.
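To make points 1 and 2b concrete, here is a minimal sketch of a least-squares line fit using nothing beyond means, sums and squares. The data points and the helper names (`fit_line`, `r_squared`) are made up for illustration; they are not from the article or from any particular library.

```python
from statistics import mean

def fit_line(xs, ys):
    """Ordinary least squares for y = intercept + slope * x."""
    x_bar, y_bar = mean(xs), mean(ys)
    # Sum of cross-deviations and sum of squared x-deviations.
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    sxx = sum((x - x_bar) ** 2 for x in xs)
    slope = sxy / sxx
    intercept = y_bar - slope * x_bar
    return slope, intercept

def r_squared(xs, ys, slope, intercept):
    """Share of the variance in ys explained by the fitted line."""
    y_bar = mean(ys)
    ss_res = sum((y - (intercept + slope * x)) ** 2 for x, y in zip(xs, ys))
    ss_tot = sum((y - y_bar) ** 2 for y in ys)
    return 1 - ss_res / ss_tot

# Hypothetical data, roughly following y = 2x.
xs = [1, 2, 3, 4, 5]
ys = [2.1, 3.9, 6.2, 7.8, 10.1]

slope, intercept = fit_line(xs, ys)
r2 = r_squared(xs, ys, slope, intercept)
print(f"slope={slope:.3f} intercept={intercept:.3f} r2={r2:.4f}")
```

The slope reads as "each extra unit of x goes with about this much more y", and R² near 1 means the line fits well. In practice you would probably reach for a library routine instead, for example `numpy.polyfit(xs, ys, 1)` or, on Python 3.10+, `statistics.linear_regression(xs, ys)`.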


I am not sure the Wikipedia page on regression analysis is very understandable to everyone. I typically refer people to Chapter 11, "Regression Analysis: The Miracle Elixir", and Chapter 12, "Common Regression Mistakes: The Mandatory Warning Label", of Charles Wheelan's book "Naked Statistics: Stripping the Dread from the Data", and the response has been positive.


There is an important piece of historical trivia involving Galton and regression. Galton documented regression to the mean. But why did it happen? In an era before genetics was understood, the most natural explanation was that each species and race had "a natural type" that it was regressing to. This contradicted Darwin's theory of evolution, and led to widespread doubts about Darwin's theory until Fisher managed to demonstrate why statistics + Mendelian genetics predicted regression to the mean, and provided a possible mechanism for Darwin's theory of evolution.

It is natural to view history as a steady march of progress to our current scientific consensus. But history doesn't actually look like that. And this is an interesting example where it doesn't.


Was that sudden switch from Galton to Fisher conscious or a mistake?


Conscious. Sorry for not being clearer.

Galton documented the phenomenon of regression to the mean in inheritance of height in the 1870s. (The children of two tall parents tended to be tall, but shorter than the parents. The children of two short parents tended to be short, but taller than the parents.) Similar phenomena were documented in a wide variety of species, as well as more complex variations. All of this led to a belief that there was such a thing as a "natural type" which a species or race returned to, which cast doubt on Darwin's theories for several decades.

After Mendelian genetics was rediscovered in the early 1900s, Fisher explained these phenomena using population genetics. Once the results of the breeding experiments were explained by genetics and statistics, we could reconcile field observations about inheritance with Darwin's theory.
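For anyone who wants to see the effect without any "natural type" pulling on the population, here is a rough simulation sketch. It is a toy statistical model, not Fisher's population-genetics argument. The mean height, the spreads and the 0.5 inheritance factor are all assumptions picked for illustration.

```python
import random
from statistics import mean

random.seed(0)  # fixed seed so the run is repeatable

POP_MEAN = 170.0     # hypothetical population mean height (cm)
INHERITED = 0.5      # assumed share of a parent's deviation passed on
N_FAMILIES = 20000

def simulate_family():
    """Return one (parent, child) height pair under the toy model."""
    parent = random.gauss(POP_MEAN, 7.0)
    # The child gets part of the parent's deviation plus independent noise.
    child = POP_MEAN + INHERITED * (parent - POP_MEAN) + random.gauss(0.0, 6.0)
    return parent, child

families = [simulate_family() for _ in range(N_FAMILIES)]

# Families whose parent is well above or well below the population mean.
tall = [(p, c) for p, c in families if p > POP_MEAN + 7.0]
short = [(p, c) for p, c in families if p < POP_MEAN - 7.0]

tall_parent = mean(p for p, _ in tall)
tall_child = mean(c for _, c in tall)
short_parent = mean(p for p, _ in short)
short_child = mean(c for _, c in short)

print(f"tall parents:  {tall_parent:.1f} cm, their children: {tall_child:.1f} cm")
print(f"short parents: {short_parent:.1f} cm, their children: {short_child:.1f} cm")
```

Children of tall parents come out above average but below their parents, and the reverse holds for short parents, purely because the parent-child relationship is imperfect. The noise term keeps the children's overall spread about the same as the parents', so the population as a whole is not collapsing toward a "type".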


Thank you for the follow-up; I figured it was either a mistake and very simple, or not a mistake and with crucial information missing. The extra information cleared that up :)


I enjoy articles like this which trace the development of familiar concepts with unfamiliar origins. I find they're a strong aid to understanding, and interesting pieces of conversation in themselves. However, I take issue with the author's depiction of Gauss as possibly appropriating some credit from where it was due and the emphasis placed on the conflict with Legendre. The history of science is filled with such incidents where a discovery was made independently by two or more scientists close in time, and the usual convention is that the first demonstration is credited as the discovery in textbooks (even if the concept has gone on to bear the other scientist's name). Regardless, this is one kind of content I like to find online.



