Too late to the game. Power should have done this way earlier. The Linux community was booming around PowerPC a few years back, and now all the brains have left to work on either ARM or x86. Without that community support (Power does not really run Windows), the chip is just a piece of cold hardware, even if it shines in spots on the spec sheet.
One of Power's biggest users is probably a licensee in China, who replaced its crypto logic and uses it for their own needs, but that's far from enough to compete against Xeon.
I remember the announcements back in May, but I still don't see an offering on their site. Can one really get a Linux instance on Power by the hour / month in SL? Even better in my case would be just a VPS, as a build node doesn't need a monster server.
At the same time, if they are "using it behind the scenes" yet not offering it to customers, that would be very bad for the brand image. Isn't the crux of the selling point server consolidation? Which of course is exactly what a "cloud provider" does.
I still haven't tried it, but siteox.com has it.
IBM is an enterprise behemoth. If you look at their sources of revenue, they focus a lot more on B2B, with a few selected spinoffs designed to generate some public interest. It's not surprising they don't publicize a lot of things they do.
Based on my work, Power processors are simply the backbone of AIX and iSeries machines and perform very well in those applications. The interesting part is that either platform can move off the Power processor to another; the least impacted from a business perspective would be iSeries, as the layer above the machine level is independent and would simply rebuild programs without customer intervention beyond upgrading the OS.
I hear you man! It's a shame that there's never been a reasonably priced ATX POWER mainboard available to individuals without a major service contract with IBM.
Even that barebones dev server offer through OpenPOWER was overpriced.
Question: I looked up the price of an 18-core Xeon processor, with a clock speed under 3.0 GHz, and got $4000+. So, why be willing to pay so much per core for such slow cores?

At first I guessed that having lots of cores per processor would reduce the number of processors needed and, then, maybe for some software reduce licensing costs, but the IBM Power processors likely are going to be running Linux. So, is the goal to reduce licensing costs for Oracle? Something else? Or do licensing costs have nothing to do with it?

As I plan my server farm for my startup, what am I missing about the value of paying $4000+ for 18 relatively slow cores?
Those cores are going to be sold to you largely on their memory bandwidth benefits for the top end of the Xeon range.
If you need memory bandwidth, you will gladly pay for them, and look seriously at IBM (something we just did).
If you don't know that you need memory bandwidth, just skip them. Also, I would suggest you check out OVH or Hetzner for a new startup -- it's (sadly) very unlikely that you will need enough servers for long enough to make buying your own a good plan.
> Those cores are going to be sold to you largely on their memory bandwidth benefits for the top end of the Xeon range.

That is, (A) bandwidth, bytes moved per second, or (B) total permitted memory size, say, 1/2 TB?

My guess is that memory sizes of 1/2 TB require registered memory, that is, a register in the memory to simplify timing, which can be challenging for such large memories, but the use of the register is an intermediate stop on the way to/from the processor and its cache(s), so, really, it reduces the bytes per second that might be achieved with, say, the simpler, consumer 1600 MHz DDR3?

Of course, other issues could include the number of electronically independent channels to/from memory, address-interleaved memory, etc.?

> look seriously at IBM

I looked at the article; IBM seems to be trying to sell hardware (again!). Okay.

So far my software is all written for Windows, and my guess is that Windows (7 Pro or Server) doesn't run on IBM's Power processors? And even if Windows does run, lots of other software that runs on Windows and Intel x86 likely won't run on IBM Power?
For where you sound like you're at, I wouldn't even worry about it. Usually when we say bandwidth, we mean bytes/second, to and from the caches and main system memory.
But, really, don't worry about it -- for Windows, OVH or Hetzner (or, if your workload varies a lot, Azure or AWS) are almost certainly what you want; put the time into product development until it's so successful that you _need_ the tech help to scale.
> Usually when we say bandwidth, we mean bytes/second, to and from the caches and main system memory.
I thought that the registered memory of high end server processors that can support 100+ GB of main memory was significantly slower in bandwidth than the DDR3/4 main memory of consumer processors.
These processors have a lot of advanced features for big-time computing users (warehouse scale, supercomputers, etc.). You can't just look at frequency or cache size or number of cores for this chip. There are reliability features, quality of service features, and overall capabilities that consumer grade processors don't have.
Also, I don't recommend you try to outfit your startup's server farm. I don't know your needs, but you should find someone who can figure out what you really need.
Initially all I "need" is a relatively high end consumer grade mid-tower case, e.g., an AMD 8 core processor, 32 GB of ECC main memory, several hard disks, on a current motherboard for less than $150 or so, running Windows Server or maybe, initially, just Windows 7 Pro. Keep that on average half busy 24 x 7 for a month, and I will be able to afford more.

If I get, say, two wire rack shelf units 18 x 48 x 72", fill them with a good router and mid tower cases, and keep all that on average half busy 24 x 7, then I will consider a colocation facility or the cloud. A few miles from me is a nicely big, fully serious colocation facility that offers dual 10 GbE Internet connections, etc.

No joke: one of those shelf units has room for about 12 mid tower cases. Keep two such shelf units busy, and that will max out what I'm willing to pursue without a lot more server farm expertise than I have (or want to get) and will also make cash not a problem.

So, one hope is that I will be able to pay some experts, two or three days at a time, to hold my hand into more -- reliable electrical power, HVAC, floor space, cabling, racks, servers for the racks, internals of the servers for the racks, e.g., maybe Xeon, automation for software installation, system management, system monitoring, farm performance analysis, fail-over, virtual machine exploitation, security, test systems, development systems, an organized code repository and testing, all relevant documentation, training, recruiting, HR, legal, real estate leasing, janitorial, physical security, etc. Or, right, just use a cloud!

But for the high end Xeon processors, I'm looking ahead. E.g., a motherboard with two Xeons with 18 cores each, all in just one full tower case, or two for failover, or three considering testing, might provide enough computing for my startup all the way to exit, so that I could avoid taking seriously what I'd have to do with 100 people and 50,000 square feet of server farm, dual optical fiber connections to an Internet backbone point of presence, etc. So, that's why looking into high end Xeons now is not totally wasted time.
I found a 14-core Xeon to give optimal price/performance for my usage. But I guess if that's simply not fast enough to run your app there's always the 18-core; the cores may be slow but the total "MIPS" is still higher.
Initially and no doubt for a long time, 14 cores will be plenty for my startup. I've done my software development on one core at 1.8 GHz. Going live, I'm considering 4 cores at about 3.2 GHz or 8 cores at 4.0 GHz -- for total cost down in the very low rent district. Or, the month my startup keeps 14 cores, or 8 cores, or even 4 cores, busy will be when I order a new, high end Corvette and go shopping for a really nice house!

My question was not really about being "fast enough" but just what the heck is the cost per core per clock Hz? E.g., $160 buys an 8 core AMD processor at 4.0 GHz, and a high end Xeon costs much, much more per core per clock Hz. I can see that an Intel processor would save on electrical power, but at least initially that will be, all other things considered, small potatoes.
Your questions prompted a bit of curious research on what you're pursuing.
Your post history is very rewarding to pore over, and I'm glad I did. You definitely should publish a book and set the record straight once and for all. I think it would bring you a lot of closure. And you should make more noise about the mathematical models you've developed so you get more recognition.
I really hope this search/probability venture of yours pans out, and I'm quite interested to learn more about it. In fact, I'd feel very privileged if I could hitchhike along for a while as a curious passerby - not directly involved, more just observing and learning.
I did a little digging, but I can't be certain since you're rather good at the anonymity thing - is this your current email address? Obfuscated for privacy:
put list ('73'x||'69'x||'67'x||'6d'x||'61'x||'77'x||'61'x||'69'x||
'74'x||'65'x||'40'x||'6f'x||'70'x||'74'x||'6f'x||'6e'x||
'6c'x||'69'x||'6e'x||'65'x||'2e'x||'6e'x||'65'x||'74'x);
I think the best way to answer your questions would be direct testing, both for the experience of trying out a bunch of different hardware, and also to work out how to get the best bang for your buck with the workload you're using. Throwing your existing code on various different EC2 configurations would probably be your best bet to start with.
On hardware, though, Xeon processors incorporate extra instructions appropriate to high-performance computing, whereas desktop/consumer-class CPUs include hardware acceleration and chipset support for theft prevention, basic media acceleration, wireless projector connectivity, etc.
In a server environment, workloads tend to need scalability more than processor power - it's generally significantly faster to run 100 tasks in parallel on slow hardware than in sequence on fast hardware. A desktop context is usually the opposite though - running a small number of applications, each of which generally needs to run as quickly as possible so the system feels snappy.
It also generally boils down to manufacturing. You don't get 24-core desktop-oriented i7 chips yet, because you can't yet pack 24 4GHz+ cores into a CPU die.
Oh, and that datacenter you mentioned, with dual 10GbE - is that $30k a month, or more?
> Oh, and that datacenter you mentioned, with dual 10GbE - is that $30k a month, or more?

I don't recall the details of their pricing. The colocation facility is at the old IBM Wappingers Falls, Myers Corners lab complex. There were several nice buildings, one with a lot of well done raised floor. IIRC, they were getting revenue from having a fiber roughly down the Hudson River for the 70 miles or so to Wall Street.

I don't plan to go for a $4000+ Xeon processor soon, but I wanted to understand what the value is and see what I was missing.
Or I can get an eight core AMD processor at 4.0 GHz for about $160. And, say, an 18 core Xeon at 3.2 GHz for $4000.

So, the AMD has price per core per GHz of
160 / ( 8 * 4.0 ) = 5
and the Xeon has
4000 / ( 18 * 3.2 ) = 69.44
for a ratio of
69.44 / 5 = 13.89
That's 1389%, and that's a biggie.
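A quick sketch of that comparison in code, using the same ballpark prices and clock speeds as above (they are rough figures from this thread, not quotes):

    // cost per core per GHz, using the thread's ballpark figures
    package main

    import "fmt"

    func perCoreGHz(price float64, cores int, ghz float64) float64 {
        return price / (float64(cores) * ghz)
    }

    func main() {
        amd := perCoreGHz(160, 8, 4.0)    // ~5.0
        xeon := perCoreGHz(4000, 18, 3.2) // ~69.4
        fmt.Printf("AMD %.2f, Xeon %.2f, ratio %.1fx\n", amd, xeon, xeon/amd)
    }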
For my startup, the software for at least early production appears to be ready. Now I'm loading some initial data and generally, e.g., for my server farm, doing some planning. Then I will do some testing, alpha, beta and possibly some revisions. Then I will go live and go for publicity, users, ads, revenue, earnings, and growth.

The basic core applied math and software really should work as I intend. The main question is, will lots of users like the site? If so, then the project should become a big thing.

From my software and server farm architecture and software timings, one 8 core AMD processor at 4.0 GHz should be able to support on average, 24 x 7, one new user a second. Then I should be able to get, ballpark, $207,360 dollars a month in revenue.
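As a rough reconstruction of that arithmetic (the roughly $0.08 of ad revenue per unique monthly user is the implied assumption behind the figure, not a number stated above):

    // one new user per second, 24 x 7, over a 30-day month
    package main

    import "fmt"

    func main() {
        usersPerMonth := 1.0 * 86400 * 30 // = 2,592,000 unique users
        revenuePerUser := 0.08            // assumed ad revenue per unique user
        fmt.Printf("%.0f users -> $%.0f per month\n",
            usersPerMonth, usersPerMonth*revenuePerUser)
    }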
In that case, growth should be fast, and I should consider Xeon processors if they have some significant advantages. E.g., some builders offer two Xeon processors, 18 cores each, on a motherboard in a full tower case with lots of room for hard disks. So, that would be 36 cores, and four of the 8 core AMD processors would be 32 cores in four cases. Then, sure, for more, get some standard racks and put in servers designed for racks.

I have a lot of flexibility in how many processors and motherboards I use because the software and server farm architecture is highly scalable just via simple sharding.

The architecture has just five boxes, Web server, Web session state server, SQL Server, and two specialized servers full of applied math. Each of these five can run in their own server(s), or, for a good start, all five can run in one server.
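A minimal sketch of what that simple sharding could look like -- route each user to one of N identical back-end servers by hashing a stable user key (the host names here are hypothetical):

    package main

    import (
        "fmt"
        "hash/fnv"
    )

    // hypothetical applied-math back ends; add more entries to scale out
    var shards = []string{"math-1:9001", "math-2:9001"}

    func shardFor(userID string) string {
        h := fnv.New32a()
        h.Write([]byte(userID))
        return shards[h.Sum32()%uint32(len(shards))]
    }

    func main() {
        fmt.Println(shardFor("user-42")) // always maps to the same shard
    }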
> I don't recall the details of their pricing. The colocation facility is at the old IBM Wappingers Falls, Myers Corners lab complex. There were several nice buildings, one with a lot of well done raised floor.
Cool.
> IIRC, they were getting revenue from having a fiber roughly down the Hudson River for the 70 miles or so to Wall Street.
Ooh, interesting. Very interesting. That means deliciously low latency from anything inside that building to the stock market. If they know what's good for them, they'll have slapped inflated prices on the pipes to Wall Street, and done reasonable pricing for everything else. In the ideal case, this means they have lower pricing for stuff that doesn't need those lines.
> I don't plan to go for a $4000+ Xeon processor soon, but I wanted to understand what the value is and see what I was missing.
Most definitely.
> Or I can get an eight core AMD processor at 4.0 GHz for about $160. And, say, an 18 core Xeon at 3.2 GHz for $4000.
> ...
> That's 1389%, and that's a biggie.
oooo.
I'd be interested to know the exact models you compared there, because GHz is never, ever an effective baseline measurement. How big is/are the on-die cache(s)? What chip architecture does it use? What are the memory timings? What's the system bus speed?
Let me give you a nice example.
I used to use a 2.66Ghz Pentium 4, back in the bad old days of Firefox 3 when Gecko was dog-slow and Chrome didn't exist.
I now use a laptop based on a 1.86GHz Pentium M. The chip is almost always running downclocked at 800MHz to conserve energy and produce less heat (this laptop could almost cook eggs).
Guess what? In practice, this laptop is noticeably faster.
It wasn't until a little while ago that I learned why: the Pentium 4 was using 100MHz SDRAM. This laptop's memory pushes 2000MB/s. I'd need to pop a cover to check the clock speed and memory type but I suspect it's 667MHz DDR2.
I also eventually figured out the other main cause of my woes: the chipset that computer was based on had an issue that made all IDE accesses synchronous, where the WHOLE SYSTEM would halt when the disk was waiting for data. Remember the old days when popping in a CD would make the system freeze for a few seconds? This was like that, but at the atomic level, in terms of fetching individual bytes from the disk. Whenever I'd need to request data the entire system would lock up for the few milliseconds it would take for that request to complete. If something was issuing tons of requests, the system could be brought to its knees pretty easily.
In practice, this meant that requesting even just a few MB/s from the disk could make my mouse pointer laggy and move across the screen like a slideshow. I can run "updatedb" - a program that iterates over my entire disk to build a quick-access index - on this laptop and only slightly notice it running in the background, whereas on the old system I had to walk away while it ran because I couldn't even move the mouse pointer smoothly. On this laptop it completes in about 3-4 minutes at the most, for a 60GB disk; the desktop had an 80GB disk and IIRC it took upwards of 10-15 minutes.
Other people could give you much more relevant examples, but these are some of my own experiences that I can share, that demonstrate that it's also the RAM, motherboard chipset - all the components put together - that contribute to a system's overall effectiveness.
Granted, few motherboards have serious issues like I experienced, and since most systems aim for maximum performance the differences are reasonably minor in the grand scheme of things; enough for people to nitpick, but ultimately equivalent, especially with server boards.
> For my startup, the software for at least early production appears to be ready. Now I'm loading some initial data and generally, e.g., for my server farm, doing some planning.
> Then I will do some testing, alpha, beta and possibly some revisions.
> Then I will go live and go for publicity, users, ads, revenue, earnings, and growth.
> The basic core applied math and software really should work as I intend.
Sounds awesome...
> The main question is, will lots of users like the site? If so, then the project should become a big thing.
Please add me to your list of potential alpha testers. I'd love to see what this is, but I'm not sure if I'm squarely in your target market; you say this is a Project X for everyone, and probability applied to search sounds like a very enticing field, but unless it's something as ubiquitous as Google (applies to literally the entire Web, has a multi-exabyte cache of the entire Internet held in RAM) I'm not sure how frequently I'd use it. I love WolframAlpha, for example, and yet I've used it less than 10 times, and that just to play with.
> From my software and server farm architecture and software timings, one 8 core AMD processor at 4.0 GHz should be able to support on average, 24 x 7, one new user a second.
The way you've worded that generates a lot of curiosity. What do you mean by "one new user a second"? O.o
> Then I should be able to get, ballpark, $207,360 dollars a month in revenue.
Okay that's definitely worth it. :D
> In that case, growth should be fast, and I should consider Xeon processors if they have some significant advantages.
They do. They definitely do, especially compared to the AMD you put next to it earlier.
> E.g., some builders offer two Xeon processors, 18 cores each, on a motherboard in a full tower case with lots of room for hard disks. So, that would be 36 cores, and four of the 8 core AMD processors would be 32 cores in four cases. Then, sure, for more, get some standard racks and put in servers designed for racks.
I would start with racks, unless you have standard cases just lying around, and can afford (financially) to be a bit inefficient to begin with. Datacenters are designed explicitly for rackmount servers, not tower cases; two immediate advantages that come to mind with racked servers are exponentially superior cooling and significantly higher computation density - and that last one will greatly impact your bottom line: tower cases are atrocious for packing lots of computational power into a small space, so you'll use more space at the datacenter, and likely get charged higher rent because of it.
> I have a lot of flexibility in how many processors and motherboards I use because the software and server farm architecture is highly scalable just via simple sharding.
That's good, you may need it in the future.
> The architecture has just five boxes, Web server, Web session state server, SQL Server, and two specialized servers full of applied math. Each of these five can run in their own server(s), or, for a good start, all five can run in one server.
Awesome.
Some of the people here have mentioned starting out using AWS nodes. I have to say, this may well work out to be significantly cheaper (in terms of time and energy, not just money) than renting space in a datacenter.
On average, once a second a user comes to the, call it, home page of the site, and is a "new" user in the sense of the number of unique users per month. The ad people seem to want to count mostly only the unique users. At my site, if that user likes it at all, then they stand to see several Web pages before they leave. Then, with more assumptions, the revenue adds up to the number I gave.
At this point this is a Ramen noodle budget project. So, no racks for now. Instead, it's mid-tower cases.

One mid-tower case, kept busy, will get the project well in the black with no further problems about costs of racks, Xeon processors (if they are worth it), etc. Then the first mid-tower case will become my development machine or some such.
This project, if successful, should go like the guy that did Plenty of Fish: just one guy, two old Dell servers, ads just via Google, and $10 million a year in revenue. He just sold out for $575 million in cash.

My project, if my reading of humans is at all correct, should be of interest, say, on average, once a week, for 2+ billion Internet users.
So, as you know, it's a case of search. I'm not trying to beat Google, Bing, Yahoo at their own game. But my guesstimate is that those keyword/phrase search engines are good for only about 1/3rd of the interesting (safe for work) content on the Internet, the searches people want to do, and the results they want to find.
Why? In part, as the people in old information retrieval knew well long ago, keyword/phrase search rests on three assumptions: (1) the user knows what content they want, e.g., a transcript of, say, Casablanca, (2) knows that that content exists, and (3) has some keywords/phrases that accurately characterize that content.

Then there's the other 2/3rds, and that's what I'm after.
My approach is wildly, radically different but, still, easy for users to use. So, there is nothing like page rank or keywords/phrases. There is nothing like what the ad targeting people use, say, Web browsing history, cookies, demographics, etc.
You mentioned probability. Right. In that subject there are random variables. So, we're supposed to do an experiment, with trials, for some positive integer n, and get results x(1), x(2), ..., x(n). Then those trials are supposed to be independent and the data a simple random sample, and then those n values form a histogram and approximate a probability density. Could get all confused thinking that way!

The advanced approach is quite different. There, walk into a lab, observe a number, call it X, and that's a random variable. And that's both the first and last you hear about random. Really, just f'get about random. Don't want it; don't need it. And as for those trials, there's only one, for all of this universe for all time. Sorry 'bout that.
Now we may also have a random variable Y. And it may be that X and Y are independent. The best way to know is to consider the sigma algebras they generate -- that's much more powerful than what's in the elementary stuff. And we can go on and define expectation E[X], variance E[(X - E[X])^2], covariance E[(X - E[X])(Y - E[Y])], conditional expectation E[X|Y], convergence of sequences of random variables, in probability, in distribution, in mean-square, almost surely, etc. We can define stochastic processes, etc.
With this setup, a lot of derivations one wouldn't think of otherwise become easy. Beyond that, there were some chuckholes in the road, but I patched up all of them. Some of those are surprising: once I sat in the big auditorium at NIST with 2000 scientists struggling with the problem. They "were digging in the wrong place". Even L. Breiman missed this one. I got a solution.
Of course, users will only see the results, not the math!

Then I wrote the software. Here the main problem was digging through 5000+ Web pages of documentation. Otherwise, all the software was fast, fun, easy, with no problems and no tricky debugging; I just typed the code into my favorite text editor, just as I envisioned it.

Learning to use Visual Studio looked like much, much more work than it was worth. I was told that I'd have to use Visual Studio at least for the Web pages. Nope: what IIS and ASP.NET do is terrific. I was told that Visual Studio would be terrific for debugging. I wouldn't know, since I didn't have any significant debugging problems.
For some issues where the documentation wasn't clear, I wrote some test code. Fine. Code repository? Not worth it. I'm just making good use of the hierarchical file system -- one of my favorite things.

Some people laughed at my using Visual Basic .NET and said that C# would be much better. Eventually I learned that the two languages are nearly the same as ways to use the .NET Framework and get to the CLR, and are otherwise just different flavors of syntactic sugar; I find the C, C++, C# flavor bitter and greatly prefer the more verbose and traditional VB.

So, it's 18,000 statements in Visual Basic .NET with ASP.NET, ADO.NET, etc., in 80,000 lines of typed text.
I just did this math for our company a couple months ago.
Power8 memory bandwidth is VERY appealing. And, it's not always just about cost per benchmark unit -- if you have some realtime requirements for your analysis tools, then scale-up speed can be really valuable as compared to dev time and management time.
In the end, we made the call for Intel because golang runs some key parts of our tools, and the golang power8 story isn't there. But, as I gaze at our servers where we paid what feels like thousands for extra megabytes of L3 cache, I wouldn't say I'm happy about the decision. A good go story from IBM would have likely tipped things the other way.
I think you just kind of proved my point that real work load benchmarks would have to be much more attractive to offset the cost of supporting an architecture that's not tier one in many languages / software libraries / projects.
Totally. That said, IBM isn't short on optimizing compiler folks. Open ecosystem support is a strategy to pick up small and mid-size buyers; it will be interesting to see if IBM gets there. I would look again next time we source hardware.
And, if IBM put someone internally on a properly vectorized go compilation pathway for Power8, I would buy in a heartbeat, provided it ran some sort of debian variant.
I don't think that go as people tend to use it really benefits from vectorized code. Most people who are using go that I interact with are not writing numerical processing code but network servers and business logic for high level web APIs. You might get minor speed ups in vectorized memcpy but I can't see much else.
I imagine the most important CPU features for most go code would be a good branch predictor and fast atomics / synchronization primitives.
If you're using go for numerical processing code I'd like to hear more about it. Mostly because it's kind of a PITA.
Well, there are almost no real vectorizable primitives or functions in the core library, so I'm not surprised that you don't run into people vectorizing much. And the go dev team's compiler focus has been elsewhere over the last year.
And, so far the go team hasn't seemed able to interest Intel in doing the heavy lifting that they might do for some other compilers.
So, branch predictions and faster sync primitives would be great, not least because they would speed up channels in many cases, which would be cool; it would be nice to widen the use cases for channel-based communication significantly, but they're just VERY slow if you want to use them at scale in a large application.
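As a rough illustration of the gap (a sketch, not a rigorous benchmark; numbers vary a lot by machine and workload), comparing a buffered channel send against a plain atomic increment:

    // save as chan_bench_test.go and run: go test -bench .
    package bench

    import (
        "sync/atomic"
        "testing"
    )

    func BenchmarkChannelSend(b *testing.B) {
        ch := make(chan int, 1024)
        go func() {
            for range ch { // drain
            }
        }()
        for i := 0; i < b.N; i++ {
            ch <- i
        }
        close(ch)
    }

    func BenchmarkAtomicAdd(b *testing.B) {
        var n int64
        for i := 0; i < b.N; i++ {
            atomic.AddInt64(&n, 1)
        }
    }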
I am using go for some large scale numerical processing, although it's the sort with lots of logic attached, not just a giant matrix with some glue around it. It's kind of a PITA. We are picking and choosing some outside libraries, and spend a lot of time massaging the go code for speed and bitching about the garbage collector. (Did you know that for i, _ := range is often 3 to 4x faster than for _, v := range? Do you know how awful code with four or five nested loops that uses indices looks?)

But the size of codebase our team can manage with go is pretty great. We wouldn't be nearly so productive in many other cool (or .. experienced) languages when you add up the full life cycle costs including innovation, enhancement, bug fixes, maintenance and deployment. It's a win. I'd do it again in a heartbeat.
> So, branch predictions and faster sync primitives would be great, not least because they would speed up channels in many cases, which would be cool; it would be nice to widen the use cases for channel-based communication significantly, but they're just VERY slow if you want to use them at scale in a large application.
These operations are already pretty good on IA* processors, at least in comparison to the less mainstream architectures. Other architectures focus on bandwidth, parallelism (but often without a great synchronization story), or optimizing power usage. So I doubt that Go would benefit from moving to Power.
Some choices the Go people made about how channels work limited their options for optimizing channels / increased the complexity of a lock-free implementation (I don't have the mailing list link handy). If you don't need all these guarantees you can pick an SPSC, SPMC, or MPMC implementation that might work better for your use case.
> (Did you know that for i, _ := range is often 3 to 4x faster than for _, v := range? Do you know how awful code with four or five nested loops that uses indices looks?)
Yes, in the second version you have to make a copy of v. How large v is determines how large the impact will be. The first version just references the array cell via a[i]; no copy is needed and it's one assembly instruction. Maybe the optimizer could become better here, but I'm guessing it might break some language contract.
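A small, self-contained illustration; the element size is exaggerated so the per-iteration copy cost is obvious:

    package main

    import "fmt"

    type item struct{ buf [4096]byte } // deliberately large element

    func sumByIndex(a []item) (n int) {
        for i := range a { // no per-element copy; read through a[i]
            n += int(a[i].buf[0])
        }
        return
    }

    func sumByValue(a []item) (n int) {
        for _, v := range a { // v is a 4 KB copy made on every iteration
            n += int(v.buf[0])
        }
        return
    }

    func main() {
        a := make([]item, 1000)
        fmt.Println(sumByIndex(a), sumByValue(a))
    }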
That was my thought too. The "pain in the ass" factor associated with supporting a non-mainstream architecture is missing from the charts, and moves the dial away from IBM. That's not to say things couldn't change, but Xeons are proven, and unless you've got a surplus of money to throw at trying a new architecture (e.g. Google), you're better off sticking with the known good unless the predicted gains are sizable.
Yes, if you need more memory bandwidth, POWER is where it's at. OTOH, as I've been saying, the base cores don't do as well on single threaded loopy code or problems with a decent L1/L2 cache hit rate. The specint_rate numbers look good because of the x8 threading; the single core results probably don't look that good.

So for many workloads it won't be that great. For the one I was working on, out of the box the POWER was 1/2 as fast. But that wasn't fair, because we had a couple of highly optimized x86 code paths. Doing some basic POWER optimizations brought the performance in line with the x86. But on some benchmarks it would win, and lose on others. So while we utilized a _LOT_ of memory/IO bandwidth, the fact that there was nearly 2.5x available in the POWER system over our E5 didn't give us enough of a boost to make it worth the higher price tag (nearly 3x in our case, because we were comparing with a Supermicro machine). Maybe this newer POWER machine changes that a little.
Until about 5 years ago the Power architecture dominated in the CPU performance domain. Saying "IBM is too incompetent" makes you sound like you don't really know this market.
Because I really don't. Yes, Power used to dominate performance. I have always thought that with every tick and tock Intel has closed the gap or leapt ahead.
Intel is still winning / dominating. There was talk of Google using POWER8, but I haven't seen anything more. And Intel has a clear roadmap coming up.

Does POWER8 have any place in cloud computing?

I, too, wish Intel had more competition, but I don't see a reason to choose it. Most of the distributed and in-memory processing frameworks aren't using it. But then I guess I am not the target audience.
In general, what's kept POWER where it is is that you need to be willing to pay far higher prices than Intel charge, and have far higher power consumption than high-end Intel parts, and have the cooling to cope with far higher exhaust temperatures.
Intel's only got the ability to develop one microarchitecture family. When they try for more than one, things fall through the cracks and you get the Pentium 4 dead-end or the Atom's lackluster performance.
Intel's primary microarchitecture is aimed at laptops. It scales reasonably down to large tablet power levels and up to workstation power levels. For the high-performance server market, they can only throw cores and cache at the problem, with some enterprise features bolted on.
IBM's always targeted the high-performance server market. For a while, their Power cores were also being used for workstations and even a desktop by Apple, but that's never been the focus. They include things like decimal arithmetic and SMT and hardware transactional memory and they've been selling the high-end parts at 4-5GHz speeds for a long time.
OK, what's Intel missing for high-performance servers? They've got respectable performance, they've got VT-x and all the other virtualization hardware, what's missing?
The x86 architecture was not amenable to a high performance pipelined implementation, so what Intel did since Pentium Pro is to JIT the x86 instructions into an internal RISCy instruction set that can be executed out-of-order with competitive performance.
The x86 architecture was limited to 32-bit, severely limiting the virtual address space as well as hampering OS implementation with hacks like PAE, so what AMD did is to extend it to 64-bit.
The x86 architecture was not designed for SMP scalability due to the rather strict memory ordering requirements, but most of the architectures with laxer memory models (in particular, Alpha, which was the most lax of all) are out of business today (in fact, of the commercially relevant server architectures today, only POWER has lax memory ordering; SPARC can in theory but is usually configured to run with TSO which is similar to x86).
The x86 architecture was not designed for OS virtualization because various instructions did not trap when executed in user mode, so what Intel and AMD did is define a new protection level ("ring -1") to run the hypervisor so this works efficiently now.
What actual problem do you see with x86 that cannot be solved or worked around by some creative engineering at Intel/AMD?
> The x86 architecture was not amenable to a high performance pipelined implementation, so what Intel did since Pentium Pro is to JIT the x86 instructions into an internal RISCy instruction set that can be executed out-of-order with competitive performance.
POWER does the exact same (I can't remember which revision).
It's probably worthwhile to point out that Itanium still, just about, exists, and has traditionally been Intel's competitor to POWER. Though certainly, in recent years, x86_64 has largely taken over that role.
Last I knew it was still close to a billion dollars of hardware per year—and even at the high prices Itanium is at, that's still a considerable number.