Computing Reviews, the leading online review service for computing literature.

Search

Speeding up CRC32C computations with Intel CRC32 instruction
Gueron S. Information Processing Letters112 (5):179-185,2012.Type:Article

Date Reviewed: Jun 11 2012

Gueron introduces an algorithm to speed up the CRC32C implementation using the corresponding central processing unit (CPU) CRC32 instruction from Intel to compute cyclic redundancy checks of an arbitrary length buffer with inputs ranging from eight to 64 bits (powers of two). The paper is a perfect illustration of classical pipelining techniques and throughput-oriented speedup, which are still pertinent today. Figure 3 is a classic. These techniques are taught to undergraduate students in computer science, software, and computer engineering during their system hardware, system software, and assembly programming courses. Thus, the paper has a lot of educational value where classical techniques are used in actual industrial practice. I strongly believe the content of this paper should become a part of a textbook on assembly and computer organization. The paper discusses how to speed up CRC32C computations by a factor of at least three with the Intel instruction, a significant improvement over sequential and previous results. A speed increase like this can be applied to remote direct memory access (RDMA) and the Internet small computer system interface (iSCSI) protocol stack, where CRC32C computation can be a bottleneck when doing integrity checks on communicated data transfers. The author further improves speed using the hardware CRC32 instruction (capable of processing up to eight bytes with a latency of three CPU cycles) instead of a serial instruction, by logically splitting the input data buffer into three disjoint chunks, processing them with CRC32 in a pipelined manner, and then recombining the results. This fully utilizes the CPU’s pipelined hardware, which Gueron makes throughput oriented rather than latency bound. The author provides very good, easy-to-read background information, bit arithmetic, and theoretical foundations, and a detailed illustration of the instruction and algorithms. The paper concludes with performance results.

Reviewer: Serguei A. Mokhov	Review #: CR140251 (1210-1067)

Algorithms (I.1.2 )

Algorithm Design And Analysis (G.4 ... )

Software Engineering (D.2 )

Would you recommend this review?

yes

Other reviews under "Algorithms":	Date

Standard bases and some computations in rings of power series Becker T. Journal of Symbolic Computation 10(2): 165-178, 1990. Type: Article	Dec 1 1992

Real zeroes of polynomials Collins G. (ed), Loos R., Springer-Verlag New York, Inc., New York, NY, 1983. Type: Book (9780387817767)	Jun 1 1985

Fast parallel absolute irreducibility testing Kaltofen E. (ed) Journal of Symbolic Computation 1(1): 57-68, 1985. Type: Article	Apr 1 1989

more...

Reproduction in whole or in part without permission is prohibited. Copyright 1999-2024 ThinkLoud^®
Terms of Use | Privacy Policy