|Re: OT: The five trillionth digit of Pi is...... 2!|
Message #54 Posted by Egan Ford on 13 Aug 2010, 7:40 p.m.,
in response to message #53 by Pal G.
Dunno. The last system we built has over 30,000 cores, over 30 terabytes of RAM, and a multi-petabyte parallel file system. It also requires about a megawatt of power and has an MTBF of 24 hours. When I benchmarked it for the top500.org list last summer, it was the 16th fastest system in the world (#1 in Canada).
If you wanted to, could you not own that record in a few hours??
However, systems like this need applications and data sets that can be broken up and run in parallel to realize any speed advantage. Although many of the current PC Pi programs make good use of multiple cores in a shared-memory system, I doubt that they can scale up to a 30K-core distributed-memory system. Something new would have to be written, or something old used in a different way. E.g. BBP.
BBP is a great algorithm that can compute any run of Pi digits without the overhead of computing all of Pi before them. So, I should be able to assign each of the 30K cores its own slice of Pi. Furthermore, since computing all digits this way takes O(n^2) time (the cost of extracting a digit grows with its position), you can speed up the computation by giving the cores that represent the beginning of Pi more digits and the cores representing the end of Pi fewer digits (IOW, equal time slices). BBP is not the best algorithm for computing all the digits of Pi, but it may be one of the simplest to run on a distributed-memory machine. IOW, it's a great algorithm to throw hardware at.
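To make the slicing idea concrete, here is a minimal single-core sketch of BBP hex-digit extraction in Python (my illustration, not anyone's production code). The three-argument pow() does the modular exponentiation that lets a core start at its own offset d without computing any earlier digits; the function names are my own.

```python
def bbp_series(j, d):
    """Fractional part of sum over k of 16^(d-k) / (8k+j)."""
    s = 0.0
    for k in range(d + 1):
        # modular exponentiation keeps the head terms small
        s = (s + pow(16, d - k, 8 * k + j) / (8 * k + j)) % 1.0
    k, term = d + 1, 1.0
    while term > 1e-17:          # a few rapidly shrinking tail terms
        term = 16.0 ** (d - k) / (8 * k + j)
        s += term
        k += 1
    return s % 1.0

def pi_hex_digits(d, n=8):
    """n hex digits of Pi starting d places after the hexadecimal point."""
    x = (4 * bbp_series(1, d) - 2 * bbp_series(4, d)
         - bbp_series(5, d) - bbp_series(6, d)) % 1.0
    out = []
    for _ in range(n):
        x *= 16.0
        digit = int(x)
        out.append("0123456789abcdef"[digit])
        x -= digit
    return "".join(out)

print(pi_hex_digits(0))   # 243f6a88 -- Pi in hex is 3.243f6a88...
```

Each core would just call pi_hex_digits with its assigned offset; double precision limits how many digits one call can safely emit, so a real run would use a higher-precision tail or shorter slices.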
Another option would be to take any series that computes Pi, split it up, and use a binary tree to sum it all up. That would probably be much faster. However, it would slowly bottleneck on the sum and would require a shared file system. It would be fun to see and try. Google for "distributed Pi". One such project, PiHex, used BBP as described above. Can this be done in hours on a 30K core system? I do not know. I'd have to run a few benchmarks first.
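A hedged sketch of that binary-tree summation idea (my own toy illustration, not an existing distributed code): hand each "worker" one term of a Pi series, then reduce pairwise. Each level of the tree could run its pair-sums in parallel, so n terms need only about log2(n) sequential steps; the bottleneck the post mentions is those final few levels, where fewer and fewer sums remain.

```python
import math

def bbp_term(k):
    """k-th term of the BBP series, which sums to Pi."""
    return (1.0 / 16 ** k) * (4.0 / (8 * k + 1) - 2.0 / (8 * k + 4)
                              - 1.0 / (8 * k + 5) - 1.0 / (8 * k + 6))

def tree_sum(values):
    """Pairwise (binary-tree) reduction: log2(n) sequential levels."""
    while len(values) > 1:
        paired = [values[i] + values[i + 1]
                  for i in range(0, len(values) - 1, 2)]
        if len(values) % 2:          # odd element rides along to the next level
            paired.append(values[-1])
        values = paired
    return values[0]

terms = [bbp_term(k) for k in range(12)]   # 12 "workers", one term each
print(abs(tree_sum(terms) - math.pi))      # residual is tiny
```

On a real machine this is what MPI-style reductions do under the hood; the serial loop here just stands in for the per-level parallel step.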
As for the keys, well, after we finish building a system and benchmarking it, we turn the keys over. In some cases we get access to run a few benchmarks, but real work takes priority and someone has to pay the power bill. On this most recent system we automatically power off unused resources to save power. So, cycle harvesting is not an option.
These beasts take a long time to architect, sell, build, and test. This cycle takes about two years. Perhaps next year I'll have a bigger system to play with and give it a shot.