On average, once a second a user comes to, call it, the home page of the site and is a "new" user in the sense of unique users per month. The ad people seem to want to count mostly just the unique users. At my site, if that user likes it at all, they stand to see several Web pages before they leave. Then, with more assumptions, the revenue adds up to the number I gave.
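To make that arithmetic explicit, here's a back-of-the-envelope sketch in VB.NET. The one-new-user-per-second rate is from above; the pages-per-visit and CPM figures are placeholder assumptions for illustration only, not the numbers behind the figure I gave.

    ' Back-of-the-envelope traffic and revenue arithmetic.
    ' The 1 new user/second rate is from the text above; pagesPerVisit and
    ' cpmDollars are placeholder assumptions, just for illustration.
    Module RevenueSketch
        Sub Main()
            Dim newUsersPerSecond As Double = 1.0
            Dim secondsPerMonth As Double = 60.0 * 60.0 * 24.0 * 30.0
            Dim uniqueUsersPerMonth As Double = newUsersPerSecond * secondsPerMonth ' ~2.6 million

            Dim pagesPerVisit As Double = 5.0   ' assumption: "several Web pages" per visit
            Dim cpmDollars As Double = 1.0      ' assumption: $1 per 1,000 ad-carrying page views

            Dim monthlyRevenue As Double = uniqueUsersPerMonth * pagesPerVisit * cpmDollars / 1000.0

            Console.WriteLine("Unique users/month: " & uniqueUsersPerMonth.ToString("N0"))
            Console.WriteLine("Rough monthly ad revenue: $" & monthlyRevenue.ToString("N0"))
        End Sub
    End Module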
At this point this is a Ramen noodle budget project. So, no racks for now. Instead, it's mid-tower cases.
One mid-tower case, kept busy, will get the project well into the black with no further worries about the cost of racks, whether Xeon processors are worth it, etc.
Then the first mid-tower case will become my development machine or some such.
This project, if successful, should go like the guy who did Plenty of Fish: just one guy, two old Dell servers, ads just via Google, and $10 million a year in revenue. He just sold it for $575 million in cash.
My project, if my reading of humans is at all correct, should be of interest, on average, say once a week, to 2+ billion Internet users.
So, as you know, it's a case of search. I'm not trying to beat Google, Bing, or Yahoo at their own game. But my guesstimate is that those keyword/phrase search engines are good for only about a third of the interesting (safe for work) content on the Internet, the searches people want to do, and the results they want to find.
Why? In part, as the old information-retrieval people knew well long ago, keyword/phrase search rests on three assumptions: (1) the user knows what content they want, e.g., a transcript of, say, Casablanca, (2) knows that the content exists, and (3) has some keywords/phrases that accurately characterize that content.
Then there's the other two thirds, and that's what I'm after.
My approach is wildly, radically different but still, for users, easy to use. So, there is nothing like PageRank or keywords/phrases. There is nothing like what the ad-targeting people use, say, Web browsing history, cookies, demographics, etc.
You mentioned probability. Right. In that subject there are random variables. So, we're supposed to do an experiment with trials: for some positive integer n, get results x(1), x(2), ..., x(n). Those trials are supposed to be independent, the data a simple random sample, and then those n values form a histogram that approximates a probability density.
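For the record, the textbook statement behind that claim -- nothing of mine, just the standard result -- with x(1), ..., x(n) an independent sample from a density f, bin width h, and B_h(x) the bin containing x, is

    \hat f_n(x) = \frac{1}{n h}\,\#\{\, i \le n : x(i) \in B_h(x) \,\} \;\longrightarrow\; f(x)

as n grows and h shrinks suitably.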
Could get all confused thinking that way!
The advanced approach is quite different. There, you walk into a lab, observe a number, call it X, and that's a random variable. And that's both the first and last you hear about random. Really, just f'get about random. Don't want it; don't need it. And those trials? There's only one, for all of this universe for all time. Sorry 'bout that.
Now we may also have a random variable Y. And it may be that X and Y are independent. The best way to know is to consider the sigma-algebras they generate -- that's much more powerful than what's in the elementary stuff. And we can go on and define expectation E[X], variance E[(X - E[X])^2], covariance E[(X - E[X])(Y - E[Y])], conditional expectation E[X|Y], and convergence of sequences of random variables -- in probability, in distribution, in mean square, almost surely, etc. We can define stochastic processes, etc.
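For reference, in clean notation, with probability space (Ω, F, P) and σ(X) the sigma-algebra generated by X, those are

    P(A \cap B) = P(A)\,P(B) \quad \text{for all } A \in \sigma(X),\ B \in \sigma(Y) \quad \text{(independence)}
    E[X] = \int_\Omega X \, dP
    \operatorname{Var}(X) = E\!\left[(X - E[X])^2\right]
    \operatorname{Cov}(X, Y) = E\!\left[(X - E[X])(Y - E[Y])\right]
    E[X \mid Y] = E[X \mid \sigma(Y)]
    X_n \to X \ \text{in probability iff}\ P(|X_n - X| > \epsilon) \to 0 \ \text{for every } \epsilon > 0
    X_n \to X \ \text{in mean square iff}\ E[(X_n - X)^2] \to 0

and similarly for almost-sure convergence and convergence in distribution.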
With this setup, a lot of derivations you wouldn't think of otherwise become easy.
Beyond that, there were some chuckholes in the road, but I patched up all of them.
Some of those were surprising: once I sat in the big auditorium at NIST with 2000 scientists struggling with the problem. They "were digging in the wrong place." Even L. Breiman missed this one. I got a solution.
Of course, users will only see the results, not the math!
Then I wrote the software. Here the main problem was digging through 5000+ Web pages of documentation. Otherwise, the software was fast, fun, and easy: no problems, no tricky debugging, just typing the code into my favorite text editor, just as I envisioned it. Learning to use Visual Studio looked like much, much more work than it was worth.
I was told that I'd have to use Visual Studio at least for the Web pages. Nope: What IIS and ASP.NET do is terrific.
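As a concrete, made-up illustration (the file and control names are just placeholders): a single .aspx text file like the one below is all IIS and ASP.NET need -- they compile it on the first request, with no Visual Studio project anywhere in sight.

    <%@ Page Language="VB" %>
    <script runat="server">
        ' ASP.NET compiles this page on its first request; no project file, no IDE.
        Protected Sub Page_Load(ByVal sender As Object, ByVal e As EventArgs)
            Greeting.Text = "Served by IIS and ASP.NET, written in a plain text editor."
        End Sub
    </script>
    <html>
    <body>
      <form id="MainForm" runat="server">
        <asp:Label ID="Greeting" runat="server" />
      </form>
    </body>
    </html>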
I was told that Visual Studio would be terrific for debugging. I wouldn't know since I didn't have any significant debugging problems.
For some issues where the documentation wasn't clear, I wrote some test code. Fine.
Code repository? Not worth it. I'm just making good use of the hierarchical file system -- one of my favorite things.
Some people laughed at my using Visual Basic .NET and said that C# would be much better. Eventually I learned that the two languages are nearly the same as ways to use the .NET Framework and get to the CLR; otherwise they are just different flavors of syntactic sugar. I find the C/C++/C# flavor bitter and greatly prefer the more verbose, traditional VB.
So, it's 18,000 statements of Visual Basic .NET with ASP.NET, ADO.NET, etc., in 80,000 lines of text of my typing.
But now, something real is in sight.
You are now on the alpha list.
That will be sooner if I post less at HN!