Too late to the game. Power should have done this way earlier. The Linux community was booming around PowerPC a few years back, and now all the brains have left to work on either ARM or x86. Without that community support (Power does not really run Windows), the chip is just a piece of cold hardware, even if it shines in spots on the spec sheet.
One of Power's biggest users is probably a licensee in China, who replaced its crypto logic and uses it for their own needs, but that's far from enough to compete against Xeon.
I remember the announcements back in May, but I still don't see an offering on their site. Can one really get a Linux instance on Power by the hour / month in SL? Even better in my case would be just a VPS, as a build node doesn't need a monster server.
At the same time, if they are "using it behind the scenes" yet not offering it to customers, that would be very bad for the brand image. Isn't the crux of the selling point server consolidation? Which of course is exactly what a "cloud provider" does.
I still haven't tried it, but siteox.com has it.
IBM is an enterprise behemoth. If you look at their sources of revenue, they focus a lot more on B2B, with a few selected spinoffs designed to generate some public interest. It's not surprising they don't publicize a lot of things they do.
Based on my work, Power processors are simply the backbone of AIX and iSeries machines and perform very well in those applications. The interesting part is that either platform can move off the Power processor to another; the least impacted from a business perspective would be iSeries, as the layer above the machine level is independent and would simply rebuild programs without customer intervention beyond upgrading the OS.
I hear you man! It's a shame that there's never been a reasonably priced ATX POWER mainboard available to individuals without a major service contract with IBM.
Even that barebones dev server offer through OpenPOWER was overpriced.
Question: I looked up the price of an 18-core Xeon processor, with a clock speed under 3.0 GHz, and got $4000+. So, why be willing to pay so much per core for such slow cores?

At first I guessed that having lots of cores per processor would reduce the number of processors needed and, then, maybe for some software reduce licensing costs, but the IBM Power processors likely are going to be running Linux. So, is the goal to reduce licensing costs for Oracle? Something else? Or do licensing costs have nothing to do with it?

As I plan my server farm for my startup, what am I missing about the value of paying $4000+ for 18 relatively slow cores?
Those cores are going to be sold to you largely on their memory bandwidth benefits for the top end of the Xeon range.
If you need memory bandwidth, you will gladly pay for them, and look seriously at IBM (something we just did).
If you don't know that you need memory bandwidth, just skip them. Also, I would suggest you check out OVH or Hetzner for a new startup -- it's (sadly) very unlikely that you will need enough servers for long enough to make buying your own a good plan.
> Those cores are going to be sold to you largely on their memory bandwidth benefits for the top end of the Xeon range.

That is, (A) bandwidth, bytes moved per second, or (B) total permitted memory size, say, 1/2 TB?

My guess is that memory sizes of 1/2 TB require registered memory, that is, a register in the memory to simplify timing, which can be challenging for such large memories, but the use of the register is an intermediate stop on the way to/from the processor and its cache(s), so, really, it reduces the bytes per second that might be achieved with, say, the simpler, consumer 1600 MHz DDR3?

Of course, other issues could include the number of electronically independent channels to/from memory, address-interleaved memory, etc.?

> look seriously at IBM

I looked at the article; IBM seems to be trying to sell hardware (again!). Okay.

So far my software is all written for Windows, and my guess is that Windows (7 Pro or Server) doesn't run on IBM's Power processors? And even if Windows does run, lots of other software that runs on Windows and Intel x86 likely won't run on IBM Power?
For where you sound like you're at, I wouldn't even worry about it. Usually when we say bandwidth, we mean bytes/second, to and from the caches and main system memory.
But, really, don't worry about it -- for Windows, OVH or Hetzner (or, if your workload varies a lot, Azure or AWS) are almost certainly what you want; put the time into product development until it's so successful that you _need_ the tech help to scale.
> Usually when we say bandwidth, we mean bytes/second, to and from the caches and main system memory.
I thought that the registered memory of high end server processors that can support 100+ GB of main memory was significantly slower in bandwidth than the DDR3/4 main memory of consumer processors.
These processors have a lot of advanced features for big-time computing users (warehouse scale, supercomputers, etc.). You can't just look at frequency or cache size or number of cores for this chip. There are reliability features, quality of service features, and overall capabilities that consumer grade processors don't have.
Also, I don't recommend you try to outfit your startup's server farm. I don't know your needs, but you should find someone who can figure out what you really need.
Initially all I "need" is a relatively high end consumer grade mid-tower case, e.g., an AMD 8 core processor, 32 GB of ECC main memory, several hard disks, on a current motherboard for less than $150 or so, running Windows Server or maybe, initially, just Windows 7 Pro. Keep that on average half busy 24 x 7 for a month, and I will be able to afford more.

If I get, say, two wire rack shelf units 18 x 48 x 72", fill them with a good router and mid tower cases, and keep all that on average half busy 24 x 7, then I will consider a colocation facility or the cloud. A few miles from me is a nicely big, fully serious colocation facility that offers dual 10 GbE Internet connections, etc.

No joke: one of those shelf units has room for about 12 mid tower cases. Keep two such shelf units busy, and that will max out what I'm willing to pursue without a lot more server farm expertise than I have (or want to get) and will also make cash not a problem.

So, one hope is that I will be able to pay some experts, two or three days at a time, to hold my hand into more -- reliable electrical power, HVAC, floor space, cabling, racks, servers for the racks, internals of the servers for the racks, e.g., maybe Xeon, automation for software installation, system management, system monitoring, farm performance analysis, fail-over, virtual machine exploitation, security, test systems, development systems, an organized code repository and testing, all relevant documentation, training, recruiting, HR, legal, real estate leasing, janitorial, physical security, etc. Or, right, just use a cloud!

But for the high end Xeon processors, I'm looking ahead. E.g., a motherboard with two Xeons with 18 cores each, all in just one full tower case, or two for failover, or three considering testing, might provide enough computing for my startup all the way to exit, so that I could avoid taking seriously what I'd have to do with 100 people and 50,000 square feet of server farm, dual optical fiber connections to an Internet backbone point of presence, etc. So, that's why looking into high end Xeons now is not totally wasted time.
I found a 14-core Xeon to give optimal price/performance for my usage. But I guess if that's simply not fast enough to run your app there's always the 18-core; the cores may be slow but the total "MIPS" is still higher.
Initially and no doubt for a long time, 14 cores will be plenty for my startup. I've done my software development on one core at 1.8 GHz. Going live, I'm considering 4 cores at about 3.2 GHz or 8 cores at 4.0 GHz -- for total cost down in the very low rent district. Or, the month my startup keeps 14 cores, or 8 cores, or even 4 cores, busy will be when I order a new, high end Corvette and go shopping for a really nice house!

My question was not really about being "fast enough" but just what the heck is the cost per core per clock Hz? E.g., $160 buys an 8 core AMD processor at 4.0 GHz, and a high end Xeon costs much, much more per core per clock Hz. I can see that an Intel processor would save on electrical power, but at least initially that will be, all other things considered, small potatoes.
Your questions prompted a bit of curious research on what you're pursuing.
Your post history is very rewarding to pore over, and I'm glad I did. You definitely should publish a book and set the record straight once and for all. I think it would bring you a lot of closure. And you should make more noise about the mathematical models you've developed so you get more recognition.
I really hope this search/probability venture of yours pans out, and I'm quite interested to learn more about it. In fact, I'd feel very privileged if I could hitchhike along for a while as a curious passerby - not directly involved, more just observing and learning.
I did a little digging, but I can't be certain since you're rather good at the anonymity thing - is this your current email address? Obfuscated for privacy:
put list ('73'x||'69'x||'67'x||'6d'x||'61'x||'77'x||'61'x||'69'x||
'74'x||'65'x||'40'x||'6f'x||'70'x||'74'x||'6f'x||'6e'x||
'6c'x||'69'x||'6e'x||'65'x||'2e'x||'6e'x||'65'x||'74'x);
I think the best way to answer your questions would be direct testing, both for the experience of trying out a bunch of different hardware, and also to work out how to get the best bang for your buck with the workload you're using. Throwing your existing code on various different EC2 configurations would probably be your best bet to start with.
On hardware, though, Xeon processors incorporate extra instructions appropriate to high-performance computing, whereas desktop/consumer-class CPUs include hardware acceleration and chipset support for theft prevention, basic media acceleration, wireless projector connectivity, etc.
In a server environment, workloads tend to need scalability more than processor power - it's generally significantly faster to run 100 tasks in parallel on slow hardware than in sequence on fast hardware. A desktop context is usually the opposite though - running a small number of applications, each of which generally needs to run as quickly as possible so the system feels snappy.
It also generally boils down to manufacturing. You don't get 24-core desktop-oriented i7 chips yet, because you can't yet pack 24 4GHz+ cores into a CPU die.
Oh, and that datacenter you mentioned, with dual 10GbE - is that $30k a month, or more?
> Oh, and that datacenter you mentioned, with dual 10GbE - is that $30k a month, or more?

I don't recall the details of their pricing. The colocation facility is at the old IBM Wappingers Falls, Myers Corners lab complex. There were several nice buildings, one with a lot of well done raised floor. IIRC, they were getting revenue from having a fiber roughly down the Hudson River for the 70 miles or so to Wall Street.

I don't plan to go for a $4000+ Xeon processor soon, but I wanted to understand what the value is and see what I was missing.
Or I can get an eight core AMD processor at 4.0 GHz for about $160. And, say, an 18 core Xeon at 3.2 GHz for $4000.

So, the AMD has price per core per GHz of
160 / ( 8 * 4.0 ) = 5
and the Xeon has
4000 / ( 18 * 3.2 ) = 69.44
for a ratio of
69.44 / 5 = 13.89
That's 1389%, and that's a biggie.
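A quick sketch of that comparison in code, using the same ballpark prices and clock speeds as above (they are rough figures from this thread, not quotes):

    // cost per core per GHz, using the thread's ballpark figures
    package main

    import "fmt"

    func perCoreGHz(price float64, cores int, ghz float64) float64 {
        return price / (float64(cores) * ghz)
    }

    func main() {
        amd := perCoreGHz(160, 8, 4.0)    // ~5.0
        xeon := perCoreGHz(4000, 18, 3.2) // ~69.4
        fmt.Printf("AMD %.2f, Xeon %.2f, ratio %.1fx\n", amd, xeon, xeon/amd)
    }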
For my startup, the software for at least early production appears to be ready. Now I'm loading some initial data and generally, e.g., for my server farm, doing some planning. Then I will do some testing, alpha, beta and possibly some revisions. Then I will go live and go for publicity, users, ads, revenue, earnings, and growth.

The basic core applied math and software really should work as I intend. The main question is, will lots of users like the site? If so, then the project should become a big thing.

From my software and server farm architecture and software timings, one 8 core AMD processor at 4.0 GHz should be able to support on average, 24 x 7, one new user a second. Then I should be able to get, ballpark, $207,360 dollars a month in revenue.
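As a rough reconstruction of that arithmetic (the roughly $0.08 of ad revenue per unique monthly user is the implied assumption behind the figure, not a number stated above):

    // one new user per second, 24 x 7, over a 30-day month
    package main

    import "fmt"

    func main() {
        usersPerMonth := 1.0 * 86400 * 30 // = 2,592,000 unique users
        revenuePerUser := 0.08            // assumed ad revenue per unique user
        fmt.Printf("%.0f users -> $%.0f per month\n",
            usersPerMonth, usersPerMonth*revenuePerUser)
    }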
In that case, growth should be fast, and I should consider Xeon processors if they have some significant advantages. E.g., some builders offer two Xeon processors, 18 cores each, on a motherboard in a full tower case with lots of room for hard disks. So, that would be 36 cores, and four of the 8 core AMD processors would be 32 cores in four cases. Then, sure, for more, get some standard racks and put in servers designed for racks.

I have a lot of flexibility in how many processors and motherboards I use because the software and server farm architecture is highly scalable just via simple sharding.

The architecture has just five boxes, Web server, Web session state server, SQL Server, and two specialized servers full of applied math. Each of these five can run in their own server(s), or, for a good start, all five can run in one server.
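A minimal sketch of what that simple sharding could look like -- route each user to one of N identical back-end servers by hashing a stable user key (the host names here are hypothetical):

    package main

    import (
        "fmt"
        "hash/fnv"
    )

    // hypothetical applied-math back ends; add more entries to scale out
    var shards = []string{"math-1:9001", "math-2:9001"}

    func shardFor(userID string) string {
        h := fnv.New32a()
        h.Write([]byte(userID))
        return shards[h.Sum32()%uint32(len(shards))]
    }

    func main() {
        fmt.Println(shardFor("user-42")) // always maps to the same shard
    }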
> I don't recall the details of their pricing. The colocation facility is at the old IBM Wappingers Falls, Myers Corners lab complex. There were several nice buildings, one with a lot of well done raised floor.
Cool.
> IIRC, they were getting revenue from having a fiber roughly down the Hudson River for the 70 miles or so to Wall Street.
Ooh, interesting. Very interesting. That means deliciously low latency from anything inside that building to the stock market. If they know what's good for them, they'll have slapped inflated prices on the pipes to Wall Street, and done reasonable pricing for everything else. In the ideal case, this means they have lower pricing for stuff that doesn't need those lines.
> I don't plan to go for a $4000+ Xeon processor soon, but I wanted to understand what the value is and see what I was missing.
Most definitely.
> Or I can get an eight core AMD processor at 4.0 GHz for about $160. And, say, an 18 core Xeon at 3.2 GHz for $4000.
> ...
> That's 1389%, and that's a biggie.
oooo.
I'd be interested to know the exact models you compared there, because GHz is never, ever an effective baseline measurement. How big is/are the on-die cache(s)? What chip architecture does it use? What are the memory timings? What's the system bus speed?
Let me give you a nice example.
I used to use a 2.66Ghz Pentium 4, back in the bad old days of Firefox 3 when Gecko was dog-slow and Chrome didn't exist.
I now use a laptop based on a 1.86GHz Pentium M. The chip is almost always running downclocked at 800MHz to conserve energy and produce less heat (this laptop could almost cook eggs).
Guess what? In practice, this laptop is noticeably faster.
It wasn't until a little while ago that I learned why: the Pentium 4 was using 100MHz SDRAM. This laptop's memory pushes 2000MB/s. I'd need to pop a cover to check the clock speed and memory type but I suspect it's 667MHz DDR2.
I also eventually figured out the other main cause of my woes: the chipset that computer was based on had an issue that made all IDE accesses synchronous, where the WHOLE SYSTEM would halt when the disk was waiting for data. Remember the old days when popping in a CD would make the system freeze for a few seconds? This was like that, but at the atomic level, in terms of fetching individual bytes from the disk. Whenever I'd need to request data the entire system would lock up for the few milliseconds it would take for that request to complete. If something was issuing tons of requests, the system could be brought to its knees pretty easily.
In practice, this meant that requesting even just a few MB/s from the disk could make my mouse pointer laggy and move across the screen like a slideshow. I can run "updatedb" - a program that iterates over my entire disk to build a quick-access index - on this laptop and only slightly notice it running in the background, whereas on the old system I had to walk away while it ran because I couldn't even move the mouse pointer smoothly. On this laptop it completes in about 3-4 minutes at the most, for a 60GB disk; the desktop had an 80GB disk and IIRC it took upwards of 10-15 minutes.
Other people could give you much more relevant examples, but these are some of my own experiences that I can share, that demonstrate that it's also the RAM, motherboard chipset - all the components put together - that contribute to a system's overall effectiveness.
Granted, few motherboards have serious issues like I experienced, and since most systems aim for maximum performance the differences are reasonably minor in the grand scheme of things; enough for people to nitpick, but ultimately equivalent, especially with server boards.
> For my startup, the software for at least early production appears to be ready. Now I'm loading some initial data and generally, e.g., for my server farm, doing some planning.
> Then I will do some testing, alpha, beta and possibly some revisions.
> Then I will go live and go for publicity, users, ads, revenue, earnings, and growth.
> The basic core applied math and software really should work as I intend.
Sounds awesome...
> The main question is, will lots of users like the site? If so, then the project should become a big thing.
Please add me to your list of potential alpha testers. I'd love to see what this is, but I'm not sure if I'm squarely in your target market; you say this is a Project X for everyone, and probability applied to search sounds like a very enticing field, but unless it's something as ubiquitous as Google (applies to literally the entire Web, has a multi-exabyte cache of the entire Internet held in RAM) I'm not sure how frequently I'd use it. I love WolframAlpha, for example, and yet I've used it less than 10 times, and that just to play with.
> From my software and server farm architecture and software timings, one 8 core AMD processor at 4.0 GHz should be able to support on average, 24 x 7, one new user a second.
The way you've worded that generates a lot of curiosity. What do you mean by "one new user a second"? O.o
> Then I should be able to get, ballpark, $207,360 dollars a month in revenue.
Okay that's definitely worth it. :D
> In that case, growth should be fast, and I should consider Xeon processors if they have some significant advantages.
They do. They definitely do, especially compared to the AMD you put next to it earlier.
> E.g., some builders offer two Xeon processors, 18 cores each, on a motherboard in a full tower case with lots of room for hard disks. So, that would be 36 cores, and four of the 8 core AMD processors would be 32 cores in four cases. Then, sure, for more, get some standard racks and put in servers designed for racks.
I would start with racks, unless you have standard cases just lying around, and can afford (financially) to be a bit inefficient to begin with. Datacenters are designed explicitly for rackmount servers, not tower cases; two immediate advantages that come to mind with racked servers are exponentially superior cooling and significantly higher computation density - and that last one will greatly impact your bottom line: tower cases are atrocious for packing lots of computational power into a small space, so you'll use more space at the datacenter, and likely get charged higher rent because of it.
> I have a lot of flexibility in how many processors and motherboards I use because the software and server farm architecture is highly scalable just via simple sharding.
That's good, you may need it in the future.
> The architecture has just five boxes, Web server, Web session state server, SQL Server, and two specialized servers full of applied math. Each of these five can run in their own server(s), or, for a good start, all five can run in one server.
Awesome.
Some of the people here have mentioned starting out using AWS nodes. I have to say, this may well work out to be significantly cheaper (in terms of time and energy, not just money) than renting space in a datacenter.
On average, once a second a user comes to the, call it, home page of the site, and is a "new" user in the sense of the number of unique users per month. The ad people seem to want to count mostly only the unique users. At my site, if that user likes it at all, then they stand to see several Web pages before they leave. Then, with more assumptions, the revenue adds up to the number I gave.
At this point this is a Ramen noodle budget project. So, no racks for now. Instead, it's mid-tower cases.

One mid-tower case, kept busy, will get the project well in the black with no further problems about costs of racks, Xeon processors (if they are worth it), etc. Then the first mid-tower case will become my development machine or some such.
This project, if successful, should go like the guy that did Plenty of Fish: just one guy, two old Dell servers, ads just via Google, and $10 million a year in revenue. He just sold out for $575 million in cash.

My project, if my reading of humans is at all correct, should be of interest, say, on average, once a week, for 2+ billion Internet users.
So, as you know, it's a case of search. I'm not trying to beat Google, Bing, Yahoo at their own game. But my guesstimate is that those keyword/phrase search engines are good for only about 1/3rd of the interesting (safe for work) content on the Internet, the searches people want to do, and the results they want to find.
Why? In part, as the people in old information retrieval knew well long ago, keyword/phrase search rests on three assumptions: (1) the user knows what content they want, e.g., a transcript of, say, Casablanca, (2) knows that that content exists, and (3) has some keywords/phrases that accurately characterize that content.

Then there's the other 2/3rds, and that's what I'm after.
My approach is wildly, radically different but, still, easy for users to use. So, there is nothing like page rank or keywords/phrases. There is nothing like what the ad targeting people use, say, Web browsing history, cookies, demographics, etc.
You mentioned probability. Right. In that subject there are random variables. So, we're supposed to do an experiment, with trials, for some positive integer n, and get results x(1), x(2), ..., x(n). Then those trials are supposed to be independent and the data a simple random sample, and then those n values form a histogram and approximate a probability density. Could get all confused thinking that way!

The advanced approach is quite different. There, walk into a lab, observe a number, call it X, and that's a random variable. And that's both the first and last you hear about random. Really, just f'get about random. Don't want it; don't need it. And as for those trials, there's only one, for all of this universe for all time. Sorry 'bout that.
Now we may also have a random variable Y. And it may be that X and Y are independent. The best way to know is to consider the sigma algebras they generate -- that's much more powerful than what's in the elementary stuff. And we can go on and define expectation E[X], variance E[(X - E[X])^2], covariance E[(X - E[X])(Y - E[Y])], conditional expectation E[X|Y], convergence of sequences of random variables, in probability, in distribution, in mean-square, almost surely, etc. We can define stochastic processes, etc.
With this setup, a lot of derivations one wouldn't think of otherwise become easy. Beyond that, there were some chuckholes in the road, but I patched up all of them. Some of those are surprising: once I sat in the big auditorium at NIST with 2000 scientists struggling with the problem. They "were digging in the wrong place". Even L. Breiman missed this one. I got a solution.
Of course, users will only see the results, not the math!

Then I wrote the software. Here the main problem was digging through 5000+ Web pages of documentation. Otherwise, all the software was fast, fun, easy, with no problems and no tricky debugging; I just typed the code into my favorite text editor, just as I envisioned it.

Learning to use Visual Studio looked like much, much more work than it was worth. I was told that I'd have to use Visual Studio at least for the Web pages. Nope: what IIS and ASP.NET do is terrific. I was told that Visual Studio would be terrific for debugging. I wouldn't know, since I didn't have any significant debugging problems.
For some issues where the documentation wasn't clear, I wrote some test code. Fine. Code repository? Not worth it. I'm just making good use of the hierarchical file system -- one of my favorite things.

Some people laughed at my using Visual Basic .NET and said that C# would be much better. Eventually I learned that the two languages are nearly the same as ways to use the .NET Framework and get to the CLR, and are otherwise just different flavors of syntactic sugar; I find the C, C++, C# flavor bitter and greatly prefer the more verbose and traditional VB.

So, it's 18,000 statements in Visual Basic .NET with ASP.NET, ADO.NET, etc., in 80,000 lines of typed text.
I just did this math for our company a couple months ago.
Power8 memory bandwidth is VERY appealing. And, it's not always just about cost per benchmark unit -- if you have some realtime requirements for your analysis tools, then scale-up speed can be really valuable as compared to dev time and management time.
In the end, we made the call for Intel because golang runs some key parts of our tools, and the golang power8 story isn't there. But, as I gaze at our servers where we paid what feels like thousands for extra megabytes of L3 cache, I wouldn't say I'm happy about the decision. A good go story from IBM would have likely tipped things the other way.
I think you just kind of proved my point that real work load benchmarks would have to be much more attractive to offset the cost of supporting an architecture that's not tier one in many languages / software libraries / projects.
Totally. That said, IBM isn't short on optimizing compiler folks. Open ecosystem support is a strategy to pick up small and mid-size buyers; it will be interesting to see if IBM gets there. I would look again next time we source hardware.
And, if IBM put someone internally on a properly vectorized go compilation pathway for Power8, I would buy in a heartbeat, provided it ran some sort of debian variant.
I don't think that go as people tend to use it really benefits from vectorized code. Most people who are using go that I interact with are not writing numerical processing code but network servers and business logic for high level web APIs. You might get minor speed ups in vectorized memcpy but I can't see much else.
I imagine the most important CPU features for most go code would be a good branch predictor and fast atomics / synchronization primitives.
If you're using go for numerical processing code I'd like to hear more about it. Mostly because it's kind of a PITA.
Well, there are almost no real vectorizable primitives or functions in the core library, so I'm not surprised that you don't run into people vectorizing much. And the go dev team's compiler focus has been elsewhere over the last year.
And, so far the go team hasn't seemed able to interest Intel in doing the heavy lifting that they might do for some other compilers.
So, branch predictions and faster sync primitives would be great, not least because they would speed up channels in many cases, which would be cool; it would be nice to widen the use cases for channel-based communication significantly, but they're just VERY slow if you want to use them at scale in a large application.
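As a rough illustration of the gap (a sketch, not a rigorous benchmark; numbers vary a lot by machine and workload), comparing a buffered channel send against a plain atomic increment:

    // save as chan_bench_test.go and run: go test -bench .
    package bench

    import (
        "sync/atomic"
        "testing"
    )

    func BenchmarkChannelSend(b *testing.B) {
        ch := make(chan int, 1024)
        go func() {
            for range ch { // drain
            }
        }()
        for i := 0; i < b.N; i++ {
            ch <- i
        }
        close(ch)
    }

    func BenchmarkAtomicAdd(b *testing.B) {
        var n int64
        for i := 0; i < b.N; i++ {
            atomic.AddInt64(&n, 1)
        }
    }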
I am using go for some large scale numerical processing, although it's the sort with lots of logic attached, not just a giant matrix with some glue around it. It's kind of a PITA. We are picking and choosing some outside libraries, and spend a lot of time massaging the go code for speed and bitching about the garbage collector. (Did you know that for i, _ := range is often 3 to 4x faster than for _, v := range? Do you know how awful code with four or five nested loops that uses indices looks?)

But the size of codebase our team can manage with go is pretty great. We wouldn't be nearly so productive in many other cool (or .. experienced) languages when you add up the full life cycle costs including innovation, enhancement, bug fixes, maintenance and deployment. It's a win. I'd do it again in a heartbeat.
> So, branch predictions and faster sync primitives would be great, not least because they would speed up channels in many cases, which would be cool; it would be nice to widen the use cases for channel-based communication significantly, but they're just VERY slow if you want to use them at scale in a large application.
These operations are already pretty good on IA* processors, at least in comparison to the less mainstream architectures. Other architectures focus on bandwidth, parallelism (but often without a great synchronization story), or optimizing power usage. So I doubt that Go would benefit from moving to Power.
Some choices the Go people made about how channels work limited their options for optimizing channels / increased the complexity of a lock-free implementation (I don't have the mailing list link handy). If you don't need all these guarantees you can pick an SPSC, SPMC, or MPMC implementation that might work better for your use case.
> (Did you know that for i, _ := range is often 3 to 4x faster than for _, v := range? Do you know how awful code with four or five nested loops that uses indices looks?)
Yes, in the second version you have to make a copy of v. How large v is determines how large the impact will be. The first version just references the array cell via a[i]; no copy is needed and it's one assembly instruction. Maybe the optimizer could become better here, but I'm guessing it might break some language contract.
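A small, self-contained illustration; the element size is exaggerated so the per-iteration copy cost is obvious:

    package main

    import "fmt"

    type item struct{ buf [4096]byte } // deliberately large element

    func sumByIndex(a []item) (n int) {
        for i := range a { // no per-element copy; read through a[i]
            n += int(a[i].buf[0])
        }
        return
    }

    func sumByValue(a []item) (n int) {
        for _, v := range a { // v is a 4 KB copy made on every iteration
            n += int(v.buf[0])
        }
        return
    }

    func main() {
        a := make([]item, 1000)
        fmt.Println(sumByIndex(a), sumByValue(a))
    }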
That was my thought too. The "pain in the ass" factor associated with supporting a non-mainstream architecture is missing from the charts, and moves the dial away from IBM. That's not to say things couldn't change, but Xeons are proven, and unless you've got a surplus of money to throw at trying a new architecture (e.g. Google), you're better off sticking with the known good unless the predicted gains are sizable.
Yes, if you need more memory bandwidth, POWER is where it's at. OTOH, as I've been saying, the base cores don't do as well on single threaded loopy code or problems with a decent L1/L2 cache hit rate. The specint_rate numbers look good because of the x8 threading; the single core results probably don't look that good.

So for many workloads it won't be that great. For the one I was working on, out of the box the POWER was 1/2 as fast. But that wasn't fair, because we had a couple of highly optimized x86 code paths. Doing some basic POWER optimizations brought the performance in line with the x86. But on some benchmarks it would win, and lose on others. So while we utilized a _LOT_ of memory/IO bandwidth, the fact that there was nearly 2.5x available in the POWER system over our E5 didn't give us enough of a boost to make it worth the higher price tag (nearly 3x in our case, because we were comparing with a Supermicro machine). Maybe this newer POWER machine changes that a little.
Until about 5 years ago the Power architecture dominated in the CPU performance domain. Saying "IBM is too incompetent" makes you sound like you don't really know this market.
Because I really don't. Yes, Power used to dominate performance. I have always thought that with every tick and tock Intel has closed the gap or leapt ahead.
Intel is still winning / dominating. There was talk of Google using POWER8, but I haven't seen anything more. And Intel has a clear roadmap coming up.

Does POWER8 have any place in cloud computing?

I, too, wish Intel had more competition, but I don't see a reason to choose it. Most of the distributed and in-memory processing frameworks aren't using it. But then I guess I am not the target audience.
In general, what's kept POWER where it is is that you need to be willing to pay far higher prices than Intel charge, and have far higher power consumption than high-end Intel parts, and have the cooling to cope with far higher exhaust temperatures.
Intel's only got the ability to develop one microarchitecture family. When they try for more than one, things fall through the cracks and you get the Pentium 4 dead-end or the Atom's lackluster performance.
Intel's primary microarchitecture is aimed at laptops. It scales reasonably down to large tablet power levels and up to workstation power levels. For the high-performance server market, they can only throw cores and cache at the problem, with some enterprise features bolted on.
IBM's always targeted the high-performance server market. For a while, their Power cores were also being used for workstations and even a desktop by Apple, but that's never been the focus. They include things like decimal arithmetic and SMT and hardware transactional memory and they've been selling the high-end parts at 4-5GHz speeds for a long time.
OK, what's Intel missing for high-performance servers? They've got respectable performance, they've got VT-x and all the other virtualization hardware, what's missing?
The x86 architecture was not amenable to a high performance pipelined implementation, so what Intel did since Pentium Pro is to JIT the x86 instructions into an internal RISCy instruction set that can be executed out-of-order with competitive performance.
The x86 architecture was limited to 32-bit, severely limiting the virtual address space as well as hampering OS implementation with hacks like PAE, so what AMD did is to extend it to 64-bit.
The x86 architecture was not designed for SMP scalability due to the rather strict memory ordering requirements, but most of the architectures with laxer memory models (in particular, Alpha, which was the most lax of all) are out of business today (in fact, of the commercially relevant server architectures today, only POWER has lax memory ordering; SPARC can in theory but is usually configured to run with TSO which is similar to x86).
The x86 architecture was not designed for OS virtualization because various instructions did not trap when executed in user mode, so what Intel and AMD did is define a new protection level ("ring -1") to run the hypervisor so this works efficiently now.
What actual problem do you see with x86 that cannot be solved or worked around by some creative engineering at Intel/AMD?
> The x86 architecture was not amenable to a high performance pipelined implementation, so what Intel did since Pentium Pro is to JIT the x86 instructions into an internal RISCy instruction set that can be executed out-of-order with competitive performance.
POWER does the exact same (I can't remember which revision).
It's probably worthwhile to point out that Itanium still, just about, exists, and has traditionally been Intel's competitor to POWER. Though certainly, in recent years, x86_64 has largely taken over that role.
Last I knew it was still close to a billion dollars of hardware per year—and even at the high prices Itanium is at, that's still a considerable number.