This paper draws on an interplay of three disciplines: the classical theory of regular languages and finite automata; their generating series treated by analysis; and statistical quantities that can be obtained from these series. The paper discusses an application of this interplay to problems of string matching, especially those problems arising from molecular biology. The text includes an appendix that illustrates actual computations of some of the statistics in the computer algebra system MAPLE.
Under Markov models of the text string, the chief statistical quantities studied are the expected time to first match in the text for a given pattern, the corresponding standard deviation, and conditional variants for these. The basic conditional case involves the expected time for first match of pattern P1, given that pattern P2 occurs in the text.
Although the treatment is elementary, the paper is not self-contained. Readers would benefit from a review of concepts in generating series-based statistical calculations [1,2], and of appropriate standard reference material from automaton theory. One issue not taken up in the paper, but undoubtedly considered in implementations such as in MAPLE, is the sheer combinatorial explosion arising from product and subset constructions for automata.