Computing Reviews
Today's Issue Hot Topics Search Browse Recommended My Account Log In
Review Help
Search
Parallel implementation of multiple-precision arithmetic and 2,576,980,370,000 decimal digits of π calculation
Takahashi D. Parallel Computing36 (8):439-448,2010.Type:Article
Date Reviewed: Oct 25 2011

In this paper, Takahashi describes an apparently impressive computation, of 2.576 x 1012 decimal digits of &pgr;, in 29 hours on a 10240-core parallel processor with 16384 GB of memory. Note that, at its most efficient, the storage of that many digits of &pgr; requires 996 GB of memory, and 1200 GB as the author stores it (eight decimal digits in a 32-bit word). The computation was done with a modification of the Gauss-Legendre (arithmetic/geometric mean) algorithm due to [1], and was completely checked using the Borweins’ method [2], another quartically convergent method, in 44 hours. Both algorithms run in time O(M(n)log n) for n digits, where M(n) is the time taken to multiply n-digit numbers.

I say “apparently impressive” since Bellard [3] describes the computation of 2.7 x 1012 decimal digits of &pgr; in 131 days on a single four-core 6 GB computer. The author of this paper, therefore, took 24 times the core-hours and 25 times the GB-hours. Another computation [4] (too late to be cited in the paper) computed 5 x 1012 decimal digits of &pgr; in 90 days on a 12-core 96 GB processor. Linear scaling by the number of digits shows the paper under review to have used 22 times the core-hours and six times the GB-hours of this computation. Both of these used another algorithm [5], whose running time is O(log2(n)M(n)). It is noteworthy that both these computations were manipulating individual numbers that were larger, by a factor of 160 and 10 respectively, than their main memory.

Hence, the interesting question is why the computation under review uses so many more resources than the ones mentioned above, despite apparently using a more efficient algorithm. In answer to this, note what Brent and Zimmermann [6] say of the O(M(n)log n) Gauss-Legendre method: “The implied constant here can be quite large, so other methods are better for small n.”

Another clue is the fact that the computation used base 108, while the others used binary bases. Bellard [3] quotes 12 days of the 130 going into the binary-decimal conversion. This is a smaller overhead than the increase in number of words of memory used, caused by the decimal base in this paper.

Perhaps the most significant point is in section 5.2 of this paper, where it explains that during the fast Fourier transform (FFT), a 108 digit is split into two 104 digits, each stored in 64 bits. This increases the storage requirements to 4798 GB for a single number, and makes FFT-based multiplication impossible in the 17.5 TB of main memory available. Hence, the author was forced to manipulate numbers of one-fourth the desired size, and to use two layers of Karatsuba multiplication (that is, nine multiplications of ¼-sized numbers) to combine them. This alone costs a factor of 2.145.

In short, this paper is an eloquent testimony to the fact that constants do matter, and O isn’t everything.

Reviewer:  J. H. Davenport Review #: CR139524 (1202-0185)
1) Ooura, T. Improvement of the Pi calculation algorithm and implementation of fast multiple-precision computation. Trans. Jpn. Soc. Indust. Appl. Math. 9, (1999), 162–172.
2) Borwein, J. M.; Borwein, P. B. Pi and the AGM. John Wiley, New York, NY, 1987.
3) Bellard,F. Computation of 2700 billion decimal digits of Pi using a desktop computer http://bellard.org/pi/pi2700e9/pipcrecord.pdf (03/26/2011).
4) Yee,A. J. Kondo,S. 5 trillion digits of Pi - new world record: Pushing the limits of personal computing... How much further can we go? http://www.numberworld.org/misc_runs/pi-5t/details.html (03/26/2011).
5) Chudnovsky, D. V.; Chudnovsky, G. V. Ramanujan revisited. Academic Press, Boston, MA, 1988.
6) Brent, R. P.; Zimmermann, P. Modern computer arithmetic. Cambridge Univ. Press, New York, NY, 2010.
Bookmark and Share
  Editor Recommended
Featured Reviewer
 
 
Parallel And Vector Implementations (G.4 ... )
 
 
Computation Of Transforms (F.2.1 ... )
 
Would you recommend this review?
yes
no
Other reviews under "Parallel And Vector Implementations": Date
Automatic SIMD vectorization of chains of recurrences
Shou Y., van Engelen R.  Supercomputing (Proceedings of the 22nd Annual International Conference on Supercomputing, Island of Kos, Greece, Jun 7-12, 2008)245-255, 2008. Type: Proceedings
Nov 4 2008

E-Mail This Printer-Friendly
Send Your Comments
Contact Us
Reproduction in whole or in part without permission is prohibited.   Copyright 1999-2024 ThinkLoud®
Terms of Use
| Privacy Policy