Computing Reviews
Who’s bigger? : Where historical figures really rank
Skiena P., Ward C., Cambridge University Press, New York, NY, 2014. 408 pp. Type: Book (978-1-107041-37-0)
Date Reviewed: Apr 8 2014

To assign each person a numerical rank, called “historical significance,” the authors of this book use quantitative analysis: they analyze the contents of all English Wikipedia pages, and they use Google’s analyses of all the English-language books it has scanned. In both cases, they search only for proper names. The results are normalized and combined according to a formula that is never fully explained. From this, we get a ranking: an ordered list of all the historically significant people, including Jesus (ranked 1), Adolf Hitler (ranked 7), baseball players (Babe Ruth - 434), US presidents (Richard Nixon - 82), philosophers (Immanuel Kant - 59), and so on, ad nauseam.

The first question, of course, is, “What is the point of any such ranking?” At least the Pantheon project of MIT (http://pantheon.media.mit.edu/) states on its first page: “Small differences in ranking (i.e., who is first, second, or tenth) are not statistically meaningful and should not be used to draw strong conclusions about the popularity of similarly ranked individuals, or for press headlines.” In this book, such a caveat does not appear. In fact, the authors seem to think that this is an intrinsically useful exercise. Moreover, when their rankings more or less agree with other rankings, they use this fact as a confirmation that their method is correct.

The method has many deficiencies. The first, acknowledged from time to time by the authors, is that the data corpus is very strongly English-biased, or more specifically, US-biased. Readers whose mother tongue is Hindi, Mandarin, or Spanish, just to mention a few of the most-spoken languages in the world, will learn about rankings of US presidents, baseball players, or popular writers in the English language, but will learn almost nothing about their own countries’ cultures.

Another problem is that a name is an ambiguous way to designate a specific person. The combination of a first name and a last name is slightly better, but it can occur in many different ways in a document, for example, Mozart (which can be Mozart’s father, Leopold), Wolfgang Mozart, Wolfgang Amadeus Mozart, W. A. Mozart, and so on. Some significant people share the same name, and moreover it’s difficult to distinguish a name like David or Augustus, referring to the historical person of that name, from the use of these as the first names of people with the last names of Smith or Adams.

A third deficiency, perhaps the most important one, is that the text analysis counts the number of occurrences of a name, but not the number of times it is read. Admittedly, the PageRank algorithm used by Google tries to get around this problem by following the internal links of hypertexts, but this algorithm was designed to extract the most relevant pages for a given query, not to rank the historical significance of people. Yet the authors use this algorithm for extracting data from Wikipedia pages.
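To make concrete what a link-based score measures, here is a minimal sketch of the textbook PageRank power iteration on a toy link graph. It is not the authors' pipeline: the page names, the damping factor of 0.85, and the iteration count are all assumptions chosen for illustration.

```python
def pagerank(links, damping=0.85, iterations=50):
    """Textbook PageRank by power iteration.

    links maps each page to the list of pages it links to.
    damping and iterations are conventional values, assumed here.
    """
    pages = sorted(set(links) | {t for targets in links.values() for t in targets})
    n = len(pages)
    rank = {p: 1.0 / n for p in pages}
    for _ in range(iterations):
        # Every page receives a small baseline share.
        new_rank = {p: (1.0 - damping) / n for p in pages}
        for p in pages:
            targets = links.get(p, [])
            if targets:
                # A page passes its score equally along its outgoing links.
                share = damping * rank[p] / len(targets)
                for t in targets:
                    new_rank[t] += share
            else:
                # A page with no outgoing links spreads its score over all pages.
                share = damping * rank[p] / n
                for q in pages:
                    new_rank[q] += share
        rank = new_rank
    return rank


# Hypothetical four-page graph: three pages link to "C", and "C" links to "A".
toy_links = {"A": ["C"], "B": ["C"], "C": ["A"], "D": ["C"]}
scores = pagerank(toy_links)
for page in sorted(scores, key=scores.get, reverse=True):
    print(page, round(scores[page], 3))
```

In this toy graph, "C" scores highest because most pages link to it, and "A" scores well only because "C" links to it. The score reflects how the pages are linked, which says nothing by itself about how often anyone reads them.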

The use of Wikipedia (English-language pages) as a source of universal knowledge is yet another important deficiency. All of the statistics used by the authors tell us interesting things about the contents of these pages and the frequency of occurrence of some proper names in them, but the rationale that transforms that information into a ranking of historical significance is misleading.

The large database of scanned books built by Google may be a better corpus of knowledge. However, in addition to the fatal limitation of considering only English books (or books translated into English), it gives the same weight to a book with 3,000 printed copies, each read only once by its owner, as to a book with millions of printed copies that are frequently read by their owners.

Another methodological deficiency is the way the authors try to validate their rankings by comparing them with others. The rationale is that if there is a good correlation between their rankings and those of the Hall of Fame for Great Americans in the Bronx, both rankings are validated. First, this correlation is not very strong. Second, it does not validate anything, since both rankings can be wrong. And the method used for choosing great Americans in the Bronx Hall of Fame is a typical example of a process that can lead to any result: so-called important people were chosen as voters, the voting process did not limit the number of ballots given to any voter, and strong lobbying biased the results.
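The point that agreement is not validation can be shown with a small worked example. The sketch below computes Spearman's rank correlation (the standard formula for rankings without ties) on entirely made-up data: five hypothetical people, an assumed "true" order, and two rankings that agree with each other while both contradict it. None of these numbers come from the book.

```python
def spearman(rank_a, rank_b):
    """Spearman rank correlation for two rankings of the same items, no ties."""
    names = sorted(rank_a)
    n = len(names)
    d_squared = sum((rank_a[x] - rank_b[x]) ** 2 for x in names)
    return 1.0 - 6.0 * d_squared / (n * (n * n - 1))


# All three rankings are hypothetical, invented for illustration only.
assumed_truth = {"P1": 1, "P2": 2, "P3": 3, "P4": 4, "P5": 5}
ranking_x = {"P1": 5, "P2": 4, "P3": 3, "P4": 2, "P5": 1}
ranking_y = {"P1": 5, "P2": 4, "P3": 3, "P4": 1, "P5": 2}

print("x vs y:    ", spearman(ranking_x, ranking_y))      # strongly positive
print("x vs truth:", spearman(ranking_x, assumed_truth))  # strongly negative
print("y vs truth:", spearman(ranking_y, assumed_truth))  # strongly negative
```

Here the two rankings correlate at about 0.9 with each other, yet each is almost the reverse of the assumed true order. A high correlation between two rankings shows only that they share whatever produced them, including shared biases.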

This can be said about many voting systems, unfortunately. Consider the choice of the electing committee, of the way ballots are collected, and of the innumerable possibilities of distorting the result by lobbying or deviating from the voting process. The election of a US president should be a model of the perfect way to choose the best candidate, but in fact the chosen president can receive fewer votes than his opponent. The rankings in musical competitions are decided by a system that does not require all of the members of the jury to agree about the top competitor, although unanimity is the only way to get an irrefutable result.

Unfortunately, when the authors move away from their own field of knowledge, they accumulate factual errors, including the attribution of inventions, the main characteristics of an important person, or the founding dates of European countries. The United Kingdom is supposed to have been founded in 1762, not too far from the actual year, which is 1707 (the year of the Acts of Union). But Italians will be surprised to learn that their country was founded in 1946 instead of 1860, and Germans will be astonished to learn that their country was founded in 1919 instead of 1871. The most surprised will be the French, who still think their country was founded in 987 by Hugues Capet, and are told that the correct year is in fact 1959. Another surprising view of geography is the one that considers Morocco, Algeria, and in fact all Muslim African countries to be a part of the Middle East.

The book is divided into two parts. In Part 1, “Quantitative History,” we learn about the technicalities of the ranking method, but some chapters are more surprising. One full chapter is devoted to comparing the rankings with the data found in the US history book of Bonnie, the fifth-grade daughter of the first author.

Part 2, “Historical Rankings,” includes eight chapters. Within these chapters, American readers may find rankings of people they know something about, but other readers will be more disappointed. The advantage of these chapters is that they avoid comparing a US mayor with a popular writer, or an Olympic sports star with a celebrated killer. But what is the purpose of all that?

The combination of factual errors, poorly designed methodology, and analytical sloppiness does not allow me to recommend this book.


Reviewer:  O. Lecarme Review #: CR142146 (1406-0412)
Content Analysis And Indexing (H.3.1)
Value of Information (H.1.1 ...)
General (A.0)