From the computer implementation point of view, polynomial arithmetic poses a challenge, though the underlying theory is fairly well understood. In this paper, the authors have presented several algorithms for doing modular polynomial multiplication in a single machine word. They have used a combination of standard techniques, such as Kronecker substitution, delayed reduction, simultaneous modular reduction, and Euclidean division by floating-point routines.
The objects of study are polynomials over small prime fields; the elementary operations to be performed on these polynomials, or on matrices based on these, are no more than addition, products, and scalar multiplication.
The strategy is to first pack the polynomials into integers using Kronecker substitution. The next step is to perform several modular operations simultaneously, either using machine integers or through floating-point arithmetic, exploiting the cache effects to improve speed. Since the coefficients are pre-prepared for the conversion, the technique helps to recover them efficiently.
Although it remains to be seen if the strategy works well even for larger precision, I found the paper to be interesting and useful for researchers and post-graduates in the mathematics of computing field.