Computing Reviews
Today's Issue Hot Topics Search Browse Recommended My Account Log In
Review Help
Search
Speeding up CRC32C computations with Intel CRC32 instruction
Gueron S. Information Processing Letters112 (5):179-185,2012.Type:Article
Date Reviewed: Jun 11 2012

Gueron introduces an algorithm to speed up the CRC32C implementation using the corresponding central processing unit (CPU) CRC32 instruction from Intel to compute cyclic redundancy checks of an arbitrary length buffer with inputs ranging from eight to 64 bits (powers of two).

The paper is a perfect illustration of classical pipelining techniques and throughput-oriented speedup, which are still pertinent today. Figure 3 is a classic. These techniques are taught to undergraduate students in computer science, software, and computer engineering during their system hardware, system software, and assembly programming courses. Thus, the paper has a lot of educational value where classical techniques are used in actual industrial practice. I strongly believe the content of this paper should become a part of a textbook on assembly and computer organization.

The paper discusses how to speed up CRC32C computations by a factor of at least three with the Intel instruction, a significant improvement over sequential and previous results. A speed increase like this can be applied to remote direct memory access (RDMA) and the Internet small computer system interface (iSCSI) protocol stack, where CRC32C computation can be a bottleneck when doing integrity checks on communicated data transfers.

The author further improves speed using the hardware CRC32 instruction (capable of processing up to eight bytes with a latency of three CPU cycles) instead of a serial instruction, by logically splitting the input data buffer into three disjoint chunks, processing them with CRC32 in a pipelined manner, and then recombining the results. This fully utilizes the CPU’s pipelined hardware, which Gueron makes throughput oriented rather than latency bound.

The author provides very good, easy-to-read background information, bit arithmetic, and theoretical foundations, and a detailed illustration of the instruction and algorithms. The paper concludes with performance results.

Reviewer:  Serguei A. Mokhov Review #: CR140251 (1210-1067)
Bookmark and Share
  Editor Recommended
Featured Reviewer
 
 
Algorithms (I.1.2 )
 
 
Algorithm Design And Analysis (G.4 ... )
 
 
Software Engineering (D.2 )
 
Would you recommend this review?
yes
no
Other reviews under "Algorithms": Date
Standard bases and some computations in rings of power series
Becker T. Journal of Symbolic Computation 10(2): 165-178, 1990. Type: Article
Dec 1 1992
Real zeroes of polynomials
Collins G. (ed), Loos R., Springer-Verlag New York, Inc., New York, NY, 1983. Type: Book (9780387817767)
Jun 1 1985
Fast parallel absolute irreducibility testing
Kaltofen E. (ed) Journal of Symbolic Computation 1(1): 57-68, 1985. Type: Article
Apr 1 1989
more...

E-Mail This Printer-Friendly
Send Your Comments
Contact Us
Reproduction in whole or in part without permission is prohibited.   Copyright 1999-2024 ThinkLoud®
Terms of Use
| Privacy Policy