#1
Old 10-28-2010, 06:02 AM
Guest
Join Date: Nov 2002
Location: Copenhagen, Denmark
Posts: 4,731
1990 Cray supercomputer vs. my grandmother's Dell

How would a 1990 Cray supercomputer stand up against a 2010 $2000 Dell PC? How about a 1980 Cray supercomputer?
#2
Old 10-28-2010, 06:37 AM
Guest
Join Date: Aug 2000
Location: Location: Location:
Posts: 10,544
Desktop PCs have generally been about 15-20 years behind supercomputers, but I believe the gap is now 15 years or less.

It's not apples to apples, though. Someone can explain it better than I can, but supercomputers place an emphasis on some things that PCs do not.

But generally, for pure power, the lag is about 15 years. So, within the past few years, consumers could get power on the scale of a supercomputer from the late 80's to early 90's.

What to measure and how to make the apples-to-apples comparison will spur debate, but it is safe to say that the lag is about 15 years and no more than 20, with the lag narrowing further every year.


Last edited by Philster; 10-28-2010 at 06:37 AM.
#3
Old 10-28-2010, 06:42 AM
Guest
Join Date: Jan 2007
Posts: 189
It's difficult to compare directly, because they're designed for different tasks, but if we just measure FLOPS then a 1990 Cray ran at about 10^9 FLOPS, or 1 GFLOP (G for Giga).

From wikipedia:
Quote:
As of 2010, the fastest six-core PC processor has a theoretical peak performance of 107.55 GFLOPS (Intel Core i7 980 XE) in double precision calculations. GPUs are considerably more powerful. For example, NVIDIA Tesla C2050 GPU computing processors perform around 515 GFLOPS.
#4
Old 10-28-2010, 06:51 AM
Charter Member
Join Date: Nov 2000
Location: Southeast Michigan, USA
Posts: 10,555
Quote:
Originally Posted by Philster View Post
What to measure and how to make the apples-to-apples comparison will spur debate[.]
Indeed.

Quote:
Originally Posted by phaemon View Post
It's difficult to compare directly, because they're designed for different tasks, but if we just measure FLOPS then a 1990 Cray ran at about 109 FLOPS, or 1 GFLOP (G for Giga).
I'd not consider this the gold standard for comparing, say, missile targeting calculations with running the fastest possible version of Firefox. FLOPS -- floating point operations per second -- tests just that, exactly. Modern operating systems don't depend on a whole lot of floating point math for most of their functions. Big machines targeted at calculating sciency stuff are working with a lot of floats.

Heck, it's only relatively recently that floating point co-processors stopped being an optional add-on in consumer CPU and computer builds.
#5
Old 10-28-2010, 07:24 AM
Guest
Join Date: Mar 1999
Location: Miskatonic University
Posts: 12,180
That is rather like asking how my Nissan Versa compares to an Abrams tank. They are superficially similar but designed to meet different needs.
#6
Old 10-28-2010, 07:37 AM
Guest
Join Date: Nov 2002
Location: Copenhagen, Denmark
Posts: 4,731
OK then. How about this: is a 2010 Dell better able to carry out the tasks for which a 1990 Cray was designed? Or a 1980 Cray?
#7
Old 10-28-2010, 07:56 AM
Robot Mod in Beta Testing
Moderator
Join Date: Mar 2001
Location: Pennsylvania
Posts: 21,561
A supercomputer like the Cray was designed as a number cruncher. Crays were vector processors, meaning that they took large arrays of data and performed operations on them all at once. This made them very good for number crunching, but not so good for more general tasks.

A modern top of the line desktop PC can outperform a Cray of 20 years ago in things like handling multiple users and serving files and that sort of thing. If you are doing number intensive calculations like the air flow around the space shuttle (one of the tasks a Cray actually did) the Cray will still outperform the desktop PC, but not by a huge margin.

But basically, you can have a "Cray light" sitting on your desk doing almost the same calculations, with almost the same performance, as a supercomputer of a couple of decades ago.

The special effects for "The Last Starfighter" (a movie from 1984) were rendered on a Cray X-MP supercomputer. You can get better graphics in most desktop PC games these days, and those are rendered in real time.
#8
Old 10-28-2010, 08:02 AM
Guest
Join Date: Jan 2007
Posts: 189
Quote:
Originally Posted by Balthisar View Post
I'd not consider this the gold standard for comparing, say, missile targeting calculations with running the fastest possible version of Firefox. FLOPS -- floating point operations per second -- tests just that, exactly.
It's the standard for comparing supercomputers which is why I chose it.

And so, to follow up Rune's question, yes, it would be able to complete the same program more quickly even though it's not optimised for such tasks.

Actually, GPUs (the processors on graphics cards) perform similar types of computations, so in real life you might well offload the calculations onto one. And indeed, modern high performance computing uses GPUs.
#9
Old 10-28-2010, 08:15 AM
Guest
Join Date: Jan 2007
Posts: 189
Quote:
Originally Posted by engineer_comp_geek View Post
If you are doing number intensive calculations like the air flow around the space shuttle (one of the tasks a Cray actually did) the Cray will still outperform the desktop PC, but not by a huge margin.
Well, I hesitate to disagree with you, but I think you're wrong. A modern CPU will outperform a 1990 Cray even on number intensive calcs. A modern GPU will absolutely trounce it. But I'm prepared to be proved wrong: can you give me any figures that will change my mind?
#10
Old 10-28-2010, 08:15 AM
Guest
Join Date: May 2009
Posts: 3,736
Quote:
Originally Posted by DrFidelius View Post
That is rather like asking how my Nissan Versa compares to an Abrams tank. They are superficially similar but designed to meet different needs.
That is misleading. A modern PC can be programmed to perform the same tasks as a twenty year old Cray (both are Turing complete), and can complete them within comparable timescales.
#11
Old 10-28-2010, 08:30 AM
Guest
Join Date: Feb 2007
Location: Oh-hiya-Maude
Posts: 4,356
Quote:
Originally Posted by Alka Seltzer View Post
That is misleading. A modern PC can be programmed to perform the same tasks as a twenty year old Cray (both are Turing complete), and can complete them within comparable timescales.
This is a good point. I think the bigger difference is that crunching numbers on the Cray would have required specialized assembly programmers, whereas on a modern PC any chucklehead from a local university should be qualified to knock out roughly the same program in C++. If the Cray is still faster at that point, you can bring in better programmers and/or optimized compilers, and if the Cray is still faster, you can buy a couple more $500 PCs. The real advantage of the Dell is that it's cheap and flexible. While it might be sexy to compare FLOPS, in terms of real-world practicality the Cray is next to useless.
#12
Old 10-28-2010, 08:31 AM
Guest
Join Date: Aug 2000
Location: Location: Location:
Posts: 10,544
A desktop PC has to be bundled to complete a wide variety of tasks (graphics, connections, interface, settings, virus protection, usability, etc.) whereas a supercomputer might place an emphasis on a narrower band of user needs.

Instead of comparing a Nissan Versa of 2010 to an older tank, think of this analogy instead: It's like trying to compare a modern family sedan with an older race car. On one hand it has many capabilities of older race cars, but probably can't do the one thing the race car could: Race.
#13
Old 10-28-2010, 08:42 AM
Robot Mod in Beta Testing
Moderator
Join Date: Mar 2001
Location: Pennsylvania
Posts: 21,561
Quote:
Originally Posted by phaemon View Post
Well, I hesitate to disagree with you, but I think you're wrong. A modern CPU will outperform a 1990 Cray even on number intensive calcs. A modern GPU will absolutely trounce it. But I'm prepared to be proved wrong: can you give me any figures that will change my mind?
A Cray T-90's vector processor could handle about 1.8 Gflops, which is less than a modern CPU, but you could have up to 32 of these vector processors in a T-90 for a total of 57.6 Gflops.

A modern PC is going to run somewhere under 10 Gflops for a typical Intel or AMD dual core system, if the numbers I looked up on google are accurate.

I wasn't really thinking about the GPU, and you are right that a GPU can outperform a 1990s era Cray.
#14
Old 10-28-2010, 09:12 AM
Guest
Join Date: May 2009
Posts: 3,736
Quote:
Originally Posted by Philster View Post
Instead of comparing a Nissan Versa of 2010 to an older tank, think of this analogy instead: It's like trying to compare a modern family sedan with an older race car. On one hand it has many capabilities of older race cars, but probably can't do the one thing the race car could: Race.
Sorry, that's still a bad analogy, as the modern PC can perform identical tasks to a supercomputer of the early 90s.

I think engineer_comp_geek's figures are correct, but bear in mind an old supercomputer will only perform well when working on highly parallel computations. A PC would be much faster than a Cray T-90 in many cases.
#15
Old 10-28-2010, 09:15 AM
Guest
Join Date: Jan 2007
Posts: 189
I'll repeat my wikipedia quote:
Quote:
As of 2010, the fastest six-core PC processor has a theoretical peak performance of 107.55 GFLOPS (Intel Core i7 980 XE) in double precision calculations.
Now, a Dell with that processor does go for $2,399 so it's $400 over budget, but I think that even a slightly slower one would work.

You're [engineer_comp_geek] right that probably most currently used desktop PCs aren't that powerful, but new ones you buy should be.

Last edited by phaemon; 10-28-2010 at 09:16 AM. Reason: Make it clear who I was responding to
#16
Old 10-28-2010, 09:22 AM
Guest
Join Date: Sep 2009
Location: Adelaide, Australia
Posts: 4,877
There are a lot of differences between a Cray vector machine and an x86 processor, but many of them have been narrowed over time. The true Cray really was designed for a very specific purpose, and was very good at it.

Crays were always native 64 bit, and were designed around a floating point pipeline. Their big trick was being a vector machine, which meant that they were specifically designed to work well on regular arrays of floating point numbers (basically being good at matrix arithmetic). Once the pipeline was loaded up they could emit two 64 bit floating point results each clock cycle, and keep doing it. Modern processors can't do this within the core instruction set. However the advent of SSE and similar add-on instructions (and essentially add-on arithmetic units) allows them to get closer. (A modern x86 can retire 2 ops per clock in some cases, but in practice it is very hard to feed it in such a way that it keeps doing it.)
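
To make that concrete, here is a rough C sketch (my own illustration, not anything from Cray or Intel) of the same multiply-add loop written plainly and then with SSE2 intrinsics that work on two doubles at a time -- a faint echo of what the Cray's vector unit did natively on its 64-element vector registers. Whether it actually sustains two results per clock depends entirely on whether memory can keep feeding it, which is the point above.

Code:
#include <emmintrin.h>   /* SSE2 intrinsics: two doubles per register */

/* y[i] = a*x[i] + y[i] -- plain scalar loop, one result per iteration */
void axpy_scalar(double a, const double *x, double *y, int n)
{
    for (int i = 0; i < n; i++)
        y[i] = a * x[i] + y[i];
}

/* Same loop with SSE2: each intrinsic works on a pair of doubles.
   Assumes n is even and x, y are 16-byte aligned. */
void axpy_sse2(double a, const double *x, double *y, int n)
{
    __m128d va = _mm_set1_pd(a);
    for (int i = 0; i < n; i += 2) {
        __m128d vx = _mm_load_pd(&x[i]);
        __m128d vy = _mm_load_pd(&y[i]);
        _mm_store_pd(&y[i], _mm_add_pd(_mm_mul_pd(va, vx), vy));
    }
}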

However there are other important differences. Modern processors rely very heavily on cache to keep often used data ready at hand, so that the processor actually has something useful to work on. Cache memory is expensive in all sorts of ways, and the balance of cache design is tricky at best. Cray machines didn't have cache. The vector architecture, plus very expensive memory, allowed them to essentially have the operands needed for a calculation always ready. Because of the regularity of the operations the machines typically undertook, the memory could be requested to deliver operands needed for calculations quite some time in the future. The memory controllers could manage up to hundreds of outstanding requests for data.

The big feature of these machines was really that if you measured peak floating point performance, and also sustained performance, they were very close. Most machines can't manage this. They can get quite high peak rates, but they can't keep it up. The caches start to thrash, the memory bandwidth starts to choke. The big vector machines were very well balanced, and just kept steaming on.

Early machines were very difficult to program. To get the potential performance a programmer needed to hand craft the code in many arcane ways. One of the critical advances was the development of high quality compilers. Supercomputers were, and in many cases still are, programmed in Fortran. Fortran has changed over the years, and in some ways modern Fortran variants have some very neat features. For big regular data and matrix algebra it is hard to beat. Nobody programs supercomputers in C++ unless they have a gun held to their head. It is bad enough programming normal code in that wretched trainwreck of a language.

The other aspect of the Cray supercomputers, and true supercomputers in general, is IO performance. There is no value in being able to rip the head off a problem if you can't get the data in and out of the system quickly. Very large, expensive, and seriously quick IO controllers were the mark of these machines. Parallel disk arrays, and specially crafted OS mechanisms to allow proper parallel access to striped disks.

The big change from a Cray of 1980 to a Cray of 1990 was really adding processors. You could get a 16 processor machine, whereas the 1980 Cray was a single processor. The late 1980's saw the introduction of data parallel machines that challenged the traditional vector machine's pre-eminence.

So, could your modern PC beat a 1980 Cray? 250 MFLOPS 64 bit peak, 140 MFLOPS sustained? 4 million 64 bit words of memory - for a grand total of 32MB of memory? But memory with a 25ns cycle time. Clearly a modern machine will beat it, but the difference is closer than you might think. A 1990 Cray? 8 processors, up to 2GB memory, 8 x 333 MFLOPS of 64 bit? Ballpark performance, but I would bet on the Cray. The big wins will still be sustained performance. Your PC will be OK for a short while on a small problem, but will choke on a large problem. The Cray will hardly notice.
#17
Old 10-28-2010, 09:24 AM
Guest
Join Date: Feb 2003
Location: Washington, DC
Posts: 5,989
Quote:
Originally Posted by Balthisar View Post
Heck, it's still relatively recently that floating point co-processors aren't an optional feature of consumer CPU and computer builds.
Yup. Back in high school, I scored an old Compaq running on a 486 SX - who needs floating point math if all you're going to do is slap DR-DOS and Netwars on it, right?

Good times, good times.
#18
Old 10-28-2010, 09:31 AM
Guest
Join Date: Aug 2000
Location: Location: Location:
Posts: 10,544
Quote:
Originally Posted by Alka Seltzer View Post
Sorry, that's still a bad analogy, as the modern PC can perform identical tasks to a supercomputer of the early 90s.

I think engineer_comp_geek's figures are correct, but bear in mind an old supercomputer will only perform well when working on highly parallel computations. A PC would be much faster than a Cray T-90 in many cases.
And a modern family sedan can perform identical tasks to race cars of 20 years ago.
#19
Old 10-28-2010, 09:55 AM
Charter Member
Join Date: Sep 1999
Location: Raiderville, TX
Posts: 10,661
Hijack, just to brag: I've sat on the Cray 1 (serial number: 1). At the time I worked at the Lab in Los Alamos, the old Cray 1 had long been surpassed by Thinking Machines and Cray YMPs; the old girl drew far too much power for the results it produced. Therefore, it was in the lobby of the security-fenced building where I worked, performing in its last days of service as a bench.
#20
Old 10-28-2010, 10:03 AM
BANNED
Join Date: Jan 2010
Location: NY, USA
Posts: 4,545
A few years ago, I needed some double precision floating point numbers crunched for a few billion iterations. I programmed it in a compiled BASIC, somewhat similar to a typical Fortran. It ran in 127 seconds on my 1.?? GHz AMD. Out of curiosity, I went looking to see how fast that would have run on early Crays programmed in Fortran. To the best of my ability to determine, the nearly modern AMD would have equaled or slightly beaten an X-MP, and been easily trounced by a Y-MP. That was for a specific problem. Different problems using different math would, of course, give different results, so YMMV.
#21
Old 10-28-2010, 10:11 AM
Guest
Join Date: May 2009
Posts: 3,736
Quote:
Originally Posted by Philster View Post
And a modern family sedan can perform identical tasks to race cars of 20 years ago.
You ended your analogy by saying "On one hand it has many capabilities of older race cars, but probably can't do the one thing the race car could: Race.", which was the misleading part.
#22
Old 10-28-2010, 05:36 PM
Guest
Join Date: Jun 2008
Posts: 96
And in a circular kind of way, the fastest supercomputer today is powered by Intel processors and Nvidia graphics cards (admittedly, there are about 7,000 of them).

http://bbc.co.uk/news/technology-11644252
#23
Old 10-28-2010, 05:46 PM
BANNED
Join Date: May 2006
Location: michigan
Posts: 26,307
http://theatlantic.com/technolog...e-world/65326/ The fastest computer is made in China.
#24
Old 10-28-2010, 06:19 PM
Guest
Join Date: May 2003
Location: Manor Farm
Posts: 17,902
Quote:
Originally Posted by Deflagration View Post
And in a circular kind of way, the fastest supercomputer today is powered by Intel processors and Nvidia graphics cards (admittedly, there are about 7,000 of them).
Which gets back to the fundamental difference that Francis Vaughan described eloquently in his post. Modern computing clusters, which are really just a compact rack of individual (albeit often multi-core) computers arranged in a master-slave hierarchy (sometimes in multiple levels), devour a problem by killing it with a million nibbles; the great supercomputers of yesteryear were able to crunch whole solution sets in large chunks. In terms of the number of logic operations required to perform a specific calculation effort, vector supercomputers were far more efficient, whereas breaking a problem up among a bunch of smaller, less capable computers is less computationally efficient. However, the cost of building individual processing units and volatile memory is now so cheap that this measure of efficiency is no longer very useful; it is much, much cheaper, faster, and easier to abstract the problem to a parallel virtual machine that breaks a problem into discrete computational chunks, passes them to individual children, waits for an answer, and then quilts the results back into a unified solution.
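
As a concrete illustration of that chunk-out-and-quilt-back pattern, here is a bare-bones C/MPI sketch (purely illustrative; the chunk size and the toy sum-of-squares workload are made up): rank 0 scatters equal slices of the data, every rank crunches its slice, and the partial answers are stitched back together with a reduce.

Code:
#include <mpi.h>
#include <stdlib.h>

int main(int argc, char **argv)
{
    MPI_Init(&argc, &argv);

    int rank, nprocs;
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);
    MPI_Comm_size(MPI_COMM_WORLD, &nprocs);

    const int chunk = 1000;                    /* slice size -- made up */
    double *whole = NULL;
    double *mine  = malloc(chunk * sizeof(double));

    if (rank == 0) {                           /* the "master" owns the whole problem */
        whole = malloc((size_t)chunk * nprocs * sizeof(double));
        for (int i = 0; i < chunk * nprocs; i++)
            whole[i] = i;
    }

    /* hand every rank its slice */
    MPI_Scatter(whole, chunk, MPI_DOUBLE, mine, chunk, MPI_DOUBLE, 0, MPI_COMM_WORLD);

    double partial = 0.0;                      /* each rank crunches its own slice */
    for (int i = 0; i < chunk; i++)
        partial += mine[i] * mine[i];

    /* quilt the partial answers back together on rank 0 */
    double total = 0.0;
    MPI_Reduce(&partial, &total, 1, MPI_DOUBLE, MPI_SUM, 0, MPI_COMM_WORLD);

    MPI_Finalize();
    return 0;
}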

Note that this isn't all just hardware; the computational methodology for creating abstracted computation environments, and the message passing interfaces that transparently support it have advanced significantly in the past two decades. In theory you could take a bunch of Apple IIes and gang them together to solve a large computational problem, but the reality is that neither the network interface nor the tools for breaking a problem up into manageable parts existed at that time. It is also the case that basic algorithms for solving large sparse matrices have evolutionarily improved, so it takes less computation to solve the same system to an acceptable degree of precision.

The other problem is alluded to by Francis Vaughan's post; it has been only in the last ten years or so that commodity computing hardware has been capable of 64 bit computation, allowing it to carry out high precision floating point computation that is necessary when accurately solving large matrices. You could break up a large problem to the aforementioned cluster of Apple IIes, but they would only be able to provide a limited degree of precision in the answer, which may not be sufficient to give a high fidelity answer.

To address the question of the o.p. in a more general way: your grandmother's Dell is "more powerful" for most general computing purposes than the Cray C90 could hope to be. But for certain types of operations or metrics, the vector supercomputers remain technically faster, though not in ways that make them comparable to modern computing clusters for any practical purpose.

Stranger
#25
Old 10-28-2010, 08:23 PM
Guest
Join Date: Sep 2009
Location: Adelaide, Australia
Posts: 4,877
Something worth mentioning is the converse. If your problem wasn't a large highly regular one, and more like common garden variety program code, vector machines had no advantage, and indeed all that special extra capability really just got in the way. Crays were little faster than other contemporary machines with similar clock rates at running general purpose code.

Another difference: the Cray vector architecture did not support virtual memory, so many modern operating system mechanisms didn't exist. These early machines used Cray's homegrown OS; later Cray adopted Unix. They also used separate minicomputers to manage a lot of the more mundane OS tasks, leaving the serious work of number crunching to the vector processor. In many ways you could regard the vector processor(s) as the add-on processors.
#26
Old 10-28-2010, 09:49 PM
Charter Member
Join Date: Jul 2000
Location: The Middle of Puget Sound
Posts: 21,819
That's interesting, and makes it even more directly equivalent to a modern desktop system with a high end graphics card.
#27
Old 10-28-2010, 09:50 PM
Guest
Join Date: Aug 2005
Location: California
Posts: 38,498
Quote:
Originally Posted by Max Torque View Post
Hijack, just to brag: I've sat on the Cray 1 (serial number: 1). At the time I worked at the Lab in Los Alamos, the old Cray 1 had long been surpassed by Thinking Machines and Cray YMPs; the old girl drew far too much power for the results it produced. Therefore, it was in the lobby of the security-fenced building where I worked, performing in its last days of service as a bench.
So; it really was a benchmark machine for its time, eh?
#28
Old 10-28-2010, 11:24 PM
Guest
Join Date: Apr 2004
Location: San Jose, CA
Posts: 3,514
For certain types of problems, a high end FPGA board can be equivalent to a desktop supercomputer of today. For a few thousand dollars you can get teraflops of highly parallel computing power.
#29
Old 10-28-2010, 11:49 PM
Guest
Join Date: Sep 2009
Location: Adelaide, Australia
Posts: 4,877
Quote:
Originally Posted by Max Torque View Post
Hijack, just to brag: I've sat on the Cray 1 (serial number: 1).
I've got two cards out of a Cray 1. One of the other government labs was selling them off through their museum about 15 years ago. They are astoundingly heavy. The entire structure is built on a thick copper plate, used to get the heat out.
#30
Old 10-29-2010, 04:14 AM
Guest
Join Date: Mar 2001
Location: London
Posts: 730
Can you get supercomputers to run Windows? If I was super wealthy could I buy this:
http://bbc.co.uk/news/technology-11644252

tinker around with it a bit and run my games super fast?

Obviously it wouldn't be the optimal way to do it, as it's not designed for that, but if money's no object is this the way to get games running fastest?
#31
Old 10-29-2010, 05:13 AM
Guest
Join Date: May 2009
Posts: 3,736
Quote:
Originally Posted by ColdPhoenix View Post
Can you get supercomputers to run Windows? If I was super wealthy could I buy this:
http://bbc.co.uk/news/technology-11644252

tinker around with it a bit and run my games super fast?
No, it won't be running Windows.

Any games would require some serious re-writing to realise any performance gains. Most PC games on the market benefit little going from a dual-core to a quad core processor.

Last edited by Alka Seltzer; 10-29-2010 at 05:13 AM.
#32
Old 10-29-2010, 06:08 AM
Guest
Join Date: Nov 2008
Location: Chicago
Posts: 748
Quote:
Originally Posted by ColdPhoenix View Post
Can you get supercomputers to run Windows? If I was super wealthy could I buy this:
http://bbc.co.uk/news/technology-11644252

tinker around with it a bit and run my games super fast?

Obviously it wouldn't be the optimal way to do it, as it's not designed for that, but if money's no object is this the way to get games running fastest?
Nope. Even if you could wave a magic wand and get Windows installed on it (and got around the licensing problem—Windows licenses are limited by number of CPUs) and recognizing all that hardware, you've still hit the problem that games are not trivially parallelizable. On the other hand, you'd rocket up to the #1 spot in [email protected], I'd wager.
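
To put a rough number on "not trivially parallelizable", here's a quick back-of-the-envelope in C using Amdahl's law; the assumption that 80% of a game's per-frame work can be parallelized is made up purely for illustration.

Code:
#include <stdio.h>

/* Amdahl's law: best-case speedup on n processors when a fraction p of
   the work can run in parallel and the rest stays serial. */
static double amdahl(double p, double n)
{
    return 1.0 / ((1.0 - p) + p / n);
}

int main(void)
{
    double p = 0.80;  /* assumed parallel fraction -- made up for illustration */
    printf("2 cores:    %.2fx\n", amdahl(p, 2.0));     /* about 1.7x */
    printf("4 cores:    %.2fx\n", amdahl(p, 4.0));     /* about 2.5x */
    printf("7168 nodes: %.2fx\n", amdahl(p, 7168.0));  /* about 5x -- capped by the serial 20% */
    return 0;
}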
#33
Old 10-29-2010, 06:40 AM
Guest
Join Date: Mar 2001
Location: London
Posts: 730
I'm disappointed. Maybe I'll just get an i7 instead.
#34
Old 10-29-2010, 11:14 AM
Member
Join Date: Aug 1999
Location: Alabama
Posts: 14,543
Quote:
Originally Posted by BorgHunter View Post
Nope. Even if you could wave a magic wand and get Windows installed on it (and got around the licensing problem—Windows licenses are limited by number of CPUs) and recognizing all that hardware, you've still hit the problem that games are not trivially parallelizable. On the other hand, you'd rocket up to the #1 spot in [email protected], I'd wager.
The supercomputer in question has 7,168 NVIDIA Tesla M2050 GPUs and 14,336 Intel Xeon CPUs. So it seems it's basically a room full of blade servers. Most likely each blade has 2 CPUs and one GPU.

It should be easy to get Windows installed and running on each blade. The hard part is writing the software to break down a problem and distribute the workload onto all these individual computers. Also you need to spend something like $1 million on Windows licenses.

Last edited by scr4; 10-29-2010 at 11:14 AM.
#35
Old 10-29-2010, 01:14 PM
Voodoo Adult (Slight Return)
Charter Member
Join Date: Jul 2000
Location: Charlotte, NC, USA
Posts: 25,088
Quote:
Originally Posted by Der Trihs View Post
So; it really was a benchmark machine for its time, eh?
And even more so now, it would seem.
#36
Old 10-29-2010, 04:09 PM
Guest
Join Date: Jan 2003
Location: 7-Eleven
Posts: 6,264
Tangent question:
When programming supers - how are the individual CPUs/GPUs addressed from the controlling program? At some low level either the IP or MAC address is being used, but I'm wondering if there is typically some logical identifier at a higher level, like CPU number 1 or CPU number 2,347, etc.

Does each of the CPUs typically have a communications program running that sends and receives work over the network in addition to the program actually performing computations? Or is there special hardware to keep the CPU (which I assume doles out work to the attached GPUs) from having to handle that? In other words - how much does it look and act like a cluster/network of PCs, and in what ways is it significantly different other than scale?
#37
Old 10-29-2010, 06:44 PM
Guest
Join Date: Sep 2009
Location: Adelaide, Australia
Posts: 4,877
Depends upon your supercomputer - real supercomputers use special interconnects that provide for a range of possible programming paradigms, and very fast speeds.

At one end you have the SGI UV systems that are a single system image with a single shared memory space. Just like a multi-core desktop but much much bigger. Here however locality issues mean that you still need to be aware of where your data is, and there is an underlying notion of CPU identity and some ability to direct where data and processing occurs. Everything after the CPU itself is custom hardware, so these machines are quite a bit more expensive than more conventional clusters. However for the right job they are pretty hard to beat.

More common supercomputer interconnects - Myrinet, Infiniband, and the like - actually do usually provide an IP connection, and even a virtualised Ethernet connection, but these are not usually used for the computational effort, being used instead for control duties. IP places a large and unwelcome burden on the communication. In general you will see the actual code written to use a general purpose message passing library - such as MPI - which abstracts over the actual interconnects and protocols. The specialised interconnect manufacturers provide customised implementations of MPI that work efficiently with their hardware. MPI just identifies the target CPU as a number. (The hardware itself does of course have a MAC address.) Most programs are written in such a way that they dynamically configure themselves to work on the number of CPUs that are made available at the time. So you make calls to the MPI library asking how many CPUs there are (or you have been given), and each instance of the program will ask for the ident of the CPU it is running on. MPI provides a whole raft of ways of creating parallel communication between all the CPUs, so that all the common paradigms are directly supported, and any specialised hardware tweaks can be used. For instance global synchronisation is something that is really worth adding hardware in the interconnect to effect.
The communications hardware will handle all of the work of moving the data from the memory of one machine to another. The local CPU only needs to issue a request to the hardware. An important optimisation in any code is to identify where it is possible to overlap communication and computation, so getting the data you need for the next bit of work before sending out the work just done is important. (This is of course not trivial to effect for the entire program, so paradigms like red-black interleaving are used to allow this to work.)
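
A minimal C sketch of those two ideas -- asking MPI for the number of CPUs and your own ident, then overlapping communication with computation -- might look like this (the halo-exchange shape and all the names are illustrative, not any particular production code):

Code:
#include <mpi.h>

/* Each rank trades a boundary strip with its neighbours while it keeps
   computing on interior points that don't depend on the incoming data. */
void step(double *send_buf, double *recv_buf, int n, double *interior, int m)
{
    int rank, nprocs;
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);     /* "what is my ident?"        */
    MPI_Comm_size(MPI_COMM_WORLD, &nprocs);   /* "how many CPUs do I have?" */

    int right = (rank + 1) % nprocs;
    int left  = (rank + nprocs - 1) % nprocs;

    MPI_Request reqs[2];
    /* post the transfers first... */
    MPI_Isend(send_buf, n, MPI_DOUBLE, right, 0, MPI_COMM_WORLD, &reqs[0]);
    MPI_Irecv(recv_buf, n, MPI_DOUBLE, left,  0, MPI_COMM_WORLD, &reqs[1]);

    /* ...then do useful work while the interconnect moves the data --
       this is the communication/computation overlap */
    for (int i = 0; i < m; i++)
        interior[i] *= 0.5;

    /* only now block until the boundary data has actually arrived */
    MPI_Waitall(2, reqs, MPI_STATUSES_IGNORE);
}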


Another thing - the interconnects usually work directly in user space. Myrinet for instance maps the control registers of the controller card into the user program's memory. Requests to the hardware are therefore made directly from user code and do not require a system call. This significantly improves performance. However there is now essentially zero security. This is perhaps one of the more obvious differences between a supercomputer (even if it is little more than a stack of blade servers) and a computational cluster. Clusters tend to be managed to provide a full security model. Once you are past the management front end of a real supercomputer, anything that gets in the way of performance is thrown out.

Last edited by Francis Vaughan; 10-29-2010 at 06:47 PM.
#38
Old 10-29-2010, 07:47 PM
Guest
Join Date: Jan 2003
Location: 7-Eleven
Posts: 6,264
Thanks Francis Vaughan, exactly what I was looking for.
#39
Old 10-29-2010, 08:33 PM
Guest
Join Date: Mar 2003
Location: Minneapolis, MN
Posts: 14,221
Interestingly, we seem to have moved backwards in the volume of supercomputers.

The premier Cray computer, the Cray-1 (from 1975, not 1990) was physically a very compact machine (see photo) -- 2 people probably could have joined hands around it. But the current supercomputers are moving back toward ENIAC -- room-sized monstrosities.
#40
Old 10-23-2012, 06:57 PM
Guest
Join Date: Oct 2012
Posts: 2
The Crays from the early 1990s can beat today's desktops on certain tasks

When comparing the performance of your average desktop to the Crays of the early 1990s or even 80s, you must take into consideration not the "peak MFLOPS" ratings but the sustained MFLOPS ratings. Peak MFLOPS mean little if the machine never reaches such performance levels. The Army actually analyzed the cost/benefit of having a cluster of P4 2.8 GHz machines versus going with a Cray solution. They discovered that although the P4 2.8 GHz had a high peak rating of 5.6 GFLOPS, in practice it only reached 2% of its peak performance due to bandwidth limitations! Please see the results of the Army's study right here https://cug.org/5-publications/proce...zio_slides.pdf. Needless to say their conclusion was that the Cray solution was more cost effective and easier to program and maintain.
NASA's high performance computing lab also did a comparison between their old Cray X-MP/12 (one processor, 2 megawords of memory) and a dual Pentium II 366 running Windows NT. They had to redesign the space shuttle's solid rocket boosters back in the late 80s after the Challenger disaster, and the Cray X-MP was used to model air flow and stresses on the new design. Some years later the code was ported to a Windows NT workstation and the simulation rerun for comparison. The result is that a single processor Cray X-MP was able to compute the simulation in 6.1 hours versus 17.9 hours on the dual Pentium II. The Cray X-MP could have up to four processors with an aggregate bandwidth of over 10 GB/sec to main memory; this kind of SUSTAINED bandwidth between CPU (not GPU) and main memory was not matched on the desktop until about 4 years ago. The Pentium IIs had either a 66 MHz or 100 MHz bus speed, so we are talking a maximum bandwidth of only 800 MB/sec (528 MB/sec with the 66 MHz bus) and around 330 MB/sec sustained transfer rates (remember PCs use DRAM and the Crays mostly used very expensive SRAM memory). The importance of bandwidth to real world number crunching performance can be seen in the STREAM benchmark. Please go to http://streambench.org/ to see exactly what I mean.
In 1990 the Cray C90 was the baddest supercomputer on the planet, and at $30 million fully configured it was also by far the costliest. Here's a photo of it: http://cisl.ucar.edu/zine/96/fall/images/c90.gif. The Cray C90 could have up to 16 processors, with 16GB of memory, and could achieve a maximum performance of around 16 GFLOPS. "Well gee, my cheapo Phenom X6 can do well over 16 GFLOPS because that's what it says on my SiSoft Sandra score, so I have a Cray C90 sitting under my desk blah blah..." You are completely wrong if you think this. The SiSoft Sandra benchmark tests everything in cache, which is easy for the CPU to access. Real world problems, the kind that Crays are built to solve, can't fit into a little 4MB cache, and thus we come to sustained bandwidth problems. The C90 can fetch 5 words per clock cycle (for each processor) from main memory and has a real world bandwidth of 105 GB/sec; compare this to a relatively modern quad-processor system (4 processors and 16 cores) of Core i7 2600s that gets a measly 12 GB/sec sustained bandwidth. "But the Core i7 2600 is clocked much higher than the C90, which only operates at 244 MHz per processor." Ahhh, but if the data is not available for the processor to operate on, then it just sits there, wasting cycles, waiting for the memory controller to deliver data to it. Without getting into too much detail (if you want a lot of detail, read my analysis of the Cray 1A versus Pentium II below), the real world performance of the C90, working on data sets too large for a typical PC's small cache, works out to roughly 8.6 GFLOPS, while the Intel Core i7 2600 will achieve only about 1 GFLOPS sustained on problems out of cache. So far there are no desktops, and won't be for quite a few years, that come EVEN close to the real world sustained bandwidth (and thus sustained performance) of a C90. Now for problems that do fit into the tiny cache and can be mostly pre-fetched, of course the desktop will be superior to the old Crays. Here is a rough comparison I made between a Cray 1A and a Pentium II 400; read on only if you want to be bored to death:

The Cray 1A had a clock cycle time of 12.5 ns, or an operating frequency of 80 MHz. It had three vector functional units and three floating point units that were shared between vector and scalar operands, in addition to four scalar units. For floating point operations it could perform 2 adds and a multiply operation per clock cycle. It had a maximum memory configuration of 1 million 64-bit words, or 8 megabytes, at 50ns access time, interleaved into 16 banks. This interleaving had the effect of allowing a maximum bandwidth of 320 million words per second into the instruction buffers, or 2560 MB/sec. Bandwidth to the 8 vector registers of the Cray 1A could occur at a maximum rate of 640 MB/sec. The Cray 1A possessed up to eight disk controllers, each with one to four disks, and each disk having a capacity of 2.424 x 10^9 bits, for a maximum total hard disk capacity of 9.7 gigabytes. There were also 12 input/output channels for peripheral devices and the master control unit. It cost over $7 million in 1976 dollars and weighed in at 10,500 lbs with a power requirement of 115 kilowatts. So how does this beast compare with my old clunker of a PC with 384 MB of SD100 RAM and a Pentium II 400 MHz CPU?

Well, let's take a simple triad operation, with V representing a vector register and S representing a scalar register.

S*V0[i] + V1[i] = V2[i]

Without getting into too much detail, this equation requires moving 24 bytes of data each time it is performed. There are two floating point operations going on here: the multiplication of the scalar value with the vector, then the addition of the second vector. Thus, assuming a problem too large to just loop in the Cray 1A registers, and a bandwidth of 640 MB/s, the maximum performance of a Cray 1A would equal (640/24) * 2 = 53 MFLOPS on large problems containing data which could not be reused. This figure correlates well with the reported performance of the Cray 1A on real world problems:

http://ecmwf.int/services/comput...r_history.html.
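
For what it's worth, that triad is essentially the kernel the STREAM benchmark measures, and the estimate above can be written out directly. A rough C rendering, using the same 24-bytes-per-two-flops accounting and the 640 MB/s figure from the post (this is my own sketch, not official STREAM code):

Code:
/* Triad from the post: V2[i] = S*V0[i] + V1[i]
   Per element: two 8-byte loads plus one 8-byte store = 24 bytes moved,
   for 2 floating point operations (one multiply, one add). */
void triad(double s, const double *v0, const double *v1, double *v2, long n)
{
    for (long i = 0; i < n; i++)
        v2[i] = s * v0[i] + v1[i];
}

/* Bandwidth-limited estimate, as in the post:
   MFLOPS ~= (bandwidth in MB/s / 24 bytes per element) * 2 flops per element
   e.g. 640 MB/s -> (640 / 24) * 2 ~= 53 MFLOPS */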

True bandwidth on a Cray 1A would also have to take into account bank conflicts plus access latency, so about 533 MB/sec sustained is a more realistic figure. On smaller problems with reusable data the Cray 1A could achieve up to 240 MFLOPS by utilizing two addition functional units and one multiplication functional unit simultaneously through a process called chaining. So you see the Cray 1A could be severely bandwidth limited when dealing with larger heterogeneous data sets.

My Pentium II 400 has 512 KB of L2 cache, 384 megabytes of SD100 RAM, and a 160 GB 7200 RPM hard drive. Theoretically it can achieve a maximum of 400 MFLOPS when operating on data contained in its L1 cache, although benchmarks like BLAS place its maximum performance at 240 MFLOPS for double precision operations, which is what we are interested in here. Interestingly this is about the same as what a Cray 1A can do on small vectorizable code. However, once we get out to problem sizes of 128 KB or 256 KB or even 512 KB, my Pentium II would beat the Cray 1A even in its greatest strength, double precision floating point operations, due to the bandwidth advantage of the L2 cache over the Cray's memory. At 1600 MB/s bandwidth my computer can do up to 133 MFLOPS for problems under 512 KB in size but greater than the L1 cache.

Once we get beyond 512 kilobytes the situation shifts, as data would then need to be transferred from the SD100 RAM. The theoretical bandwidth of SD100 RAM is 800 MB/sec, still greater than the Cray 1A, but here we run into some issues. The Cray 1A had memory comprised of much more expensive SRAM, while my memory is el crapo DRAM, which requires refresh cycles. So with these taken into account my DRAM actually has a theoretical maximum bandwidth of about 533 MB/s and a real world maximum sustained bandwidth of a little over 300 MB/s. This means that for problems out of cache, my Pentium II gets slowed to a measly 315/12 = 26 MFLOPS. In this special situation where the problem is vectorizable, the Cray 1A is still faster than my Pentium II, not bad for a computer that is 30 years old.

Once we get to problems greater than 8 megabytes, the advantage shifts completely back to my Pentium II, as the Cray 1A must then stream data from its hard disks (which were slower than Ultra ATA/100) while my computer can go right on fetching data from RAM. The Cray 1A could not realize its full potential as it was hampered by bandwidth and memory size issues, yet in certain situations it could outperform a desktop computer from 1998. Solid state disks, more memory ports, and larger memories were utilized in the subsequent Cray X-MP to address these problems.

A desktop like the Core 2 Duo E6700 can do over 12 GFLOPS, BUT only on problems that are small and fit into its cache. Once the data gets out of cache, today's modern computers get their butts kicked by the old school Crays from the 80s. Just visit http://streambench.org/ to see what I mean.
#41
Old 10-23-2012, 10:26 PM
Guest
Join Date: Sep 2011
Location: Sunny California
Posts: 14,840
Quote:
Originally Posted by Max Torque View Post
Hijack, just to brag: I've sat on the Cray 1 (serial number: 1). At the time I worked at the Lab in Los Alamos, the old Cray 1 had long been surpassed by Thinking Machines and Cray YMPs; the old girl drew far too much power for the results it produced. Therefore, it was in the lobby of the security-fenced building where I worked, performing in its last days of service as a bench.
Let's see if I can trump that. I sat on (and was a computer operator of) the Cray 1 (serial number 1) in its original days at Livermore, in the late 1970's. I wrote a program for it using Livermore's enhanced version of Fortran, which allowed the programmer to write vector operations. I don't know what became of that machine in later years. It went to Los Alamos? Are we talking about the same machine?

There is one (or maybe two) Cray computers -- possibly including this very unit -- on display now at the Computer History Museum in Mountain View. Photo of someone (who isn't me) sitting on a Cray computer.

Links to everything you ever wanted to know about Cray computers from the Museum archives. (ETA: Lots of pictures and several technical reference manuals in PDF format.)

Last edited by Senegoid; 10-23-2012 at 10:27 PM.
#42
Old 10-23-2012, 11:16 PM
Guest
Join Date: Sep 2009
Location: Adelaide, Australia
Posts: 4,877
I can't trump sitting on, or writing for one, but I do own two circuit boards out of one. They are astoundingly heavy, basically being solid copper with two PCBs, one each side bolted to the copper core. The core inserted into rails that were liquid cooled. That was back when they really engineered computers. (My cards came from a machine at a US lab, but unless I dig deep, I can't remember which one.)
#43
Old 10-23-2012, 11:26 PM
Guest
Join Date: Mar 2010
Posts: 1,010
How are zombies at doing vector operations? Can they perform matrix calculations well?
#44
Old 10-23-2012, 11:29 PM
Member
Join Date: May 2001
Location: Scottsdale, more-or-less
Posts: 15,609
Quote:
Originally Posted by Francis Vaughan View Post
I can't trump sitting on, or writing for one, but I do own two circuit boards out of one. They are astoundingly heavy, basically being solid copper with two PCBs, one each side bolted to the copper core. The core inserted into rails that were liquid cooled. That was back when they really engineered computers. (My cards came from a machine at a US lab, but unless I dig deep, I can't remember which one.)
Well, they must not be memory cards, since you mentioned that before - post #29!
#45
Old 10-23-2012, 11:50 PM
Guest
Join Date: Sep 2009
Location: Adelaide, Australia
Posts: 4,877
Quote:
Originally Posted by beowulff View Post
Well, they must not be memory cards, since you mentioned that before - post #29!
#46
Old 10-24-2012, 02:17 AM
Member
Join Date: Feb 2007
Posts: 2,890
Quote:
Originally Posted by Francis Vaughan View Post
Something worth mentioning is the converse. If your problem wasn't a large highly regular one, and more like common garden variety program code, vector machines had no advantage, and indeed all that special extra capability really just got in the way. Crays were little faster than other contemporary machines with similar clock rates at running general purpose code.
One of the many reasons I hated Crichton is that in Jurassic Park he had the real-time control of the park done by a Cray computer; probably the worst possible choice. The movie showed a Thinking Machines supercomputer in the background.

Last edited by fumster; 10-24-2012 at 02:17 AM.
#47
Old 10-24-2012, 04:39 AM
Guest
Join Date: Sep 2009
Location: Adelaide, Australia
Posts: 4,877
I thought the Cray was used for genome sequencing rather than control, but it is a loooong time ago. Thinking Machines loaned four CM-5 Scale-4 cabinets for the making of the movie. The CM-5 LEDs were a standalone system and you could put them into a couple of standard blinking pattern modes, even if the rest of the cabinet was empty. (Unlike the CM-2, where the LEDs were on the matrix cards and would only blink when the system was running.) What was annoying was that in the movie the cabinets were set out side by side, not in the proper zigzag pattern of connected cabinets.

Genome sequencing is a good example of where conventional supercomputer architectures help not at all. The critical element is simply memory. Some colleagues of mine have recently commissioned a dedicated sequencing system. Not much compute, but a terabyte of RAM. OTOH, if you can't get enough memory, a really fast parallel IO system helps a lot. Something that is often overlooked in all the supercomputer talk, and something Cray were really good at, was fabulous IO. Thinking Machines were pretty good too. The CM-2 had one of the first really serious RAID systems (the Data Vault - 144 SCSI disks in RAID 6 for a massive 9GB) and the CM-5 offered the Scalable Disk Array, where the RAID system was directly coupled into the data network of the machine. Fine machines.

Not only do I have the Cray cards, but I have eight CM-5 LED panels in a box. There were four per cabinet. One day I'm going to get them going again.
#48
Old 10-24-2012, 07:25 AM
Member
Join Date: Mar 2002
Location: Trantor
Posts: 11,768
What's a flop? For the record, the singular of flops is flops (floating point operations per second).

In 1976, it took 1200 hours on the Illiac IV, a supercomputer of its day, to check around 1800 "minimal irreducible" map configurations to prove the 4 color theorem. Eighteen years later it took overnight on a PC to check around 600 such configurations (the mathematical analysis had reduced the number needed by 2/3).

In the early days of PCs, even older generations of mainframes had much greater I/O capacity than even high end PCs. The first PC hard drive I saw was just 10MB. Two days ago at Costco, I saw a 1TB drive on a PC!
#49
Old 10-24-2012, 07:52 AM
Guest
Join Date: Jan 2012
Posts: 1,616
If you count the GPU, the latest PCs can run at more than a teraflop, and software can use the GPU for non-graphical tasks, albeit with limitations, but the CPU can be used for what the GPU can't easily do.

Also, how fast are supercomputers when measured by instructions per second (not the same as flops, which are floating-point)? In this case, many programs can run without using any floating-point. PC CPUs are also designed to handle a wide diversity of tasks as efficiently as possible, so they'd likely easily outperform an equivalent-speed supercomputer in general purpose computing (i.e. what most people do on their computers).
#50
Old 10-24-2012, 08:02 AM
Guest
Join Date: Jan 2007
Location: Los Angeles
Posts: 2,713
I'm curious about questions like does my cell phone have more computing power than the Apollo 13 capsule?