Computing Reviews, the leading online review service for computing literature.

Search

How good is the factor concept in estimating the average number of character comparisons per item in string sorting?
Mustafa S., Masoud F. Journal of Systems and Software57 (3):221-226,2001.Type:Article

Date Reviewed: Oct 29 2002

For decades, sorting has been a popular topic. Sorting strings, however, has attracted less attention than sorting numbers. In particular, there have not been many metrics proposed to assist in the analysis of string sorting. Previous efforts mainly provided simplistic, average-case formulas. This paper reports the results of an empirical study to determine the average number of character comparisons during string sorting, as well as the required run times, assuming that strings (in English and Arabic) are sorted using the quicksort algorithm. The paper uses the concept of critical factor (CF), which is defined as the position number where two strings disagree in a character-by-character comparison. The paper also introduces the notion of average CF, which is defined over a set of sorted strings, and the notion of run-time average CF. Data strings were extracted from bibliographic databases: the data was not simple words, but rather sentences. Using the above metrics, the authors performed an extensive experiment to investigate differences between the English and Arabic languages. Interestingly, the average CF ranged from 1 to 7 for about 50 percent of English cases and about 60 percent of Arabic cases, whereas 24 percent of the Arabic strings and 16 percent of the English strings disagreed in the first, second, or third letter. Several similar remarks are extracted by interpreting the results. The paper is an interesting first step towards the empirical analysis of string sorting.

Reviewer: Yannis Manolopoulos	Review #: CR126577 (0301-0069)

Sorting And Searching (F.2.2 ... )

Would you recommend this review?

yes

Other reviews under "Sorting And Searching":	Date

Binary search trees of almost optimal height Anderson A., Icking C., Klein R., Ottmann T. (ed) Acta Informatica 28(2): 165-178, 1990. Type: Article	May 1 1992

More nearly optimal algorithms for unbounded searching, part II Reingold E., Shen X. SIAM Journal on Computing 20(1): 184-208, 1991. Type: Article	Apr 1 1992

A simple linear-time algorithm for in situ merging Mannila H., Ukkonen E. Information Processing Letters 18(4): 203-208, 1984. Type: Article	Feb 1 1985

more...

Reproduction in whole or in part without permission is prohibited. Copyright 1999-2024 ThinkLoud^®
Terms of Use | Privacy Policy