Comic books (now more pompously named “graphic novels”) contain complex visual structures that are harder to recognize automatically than conventional text. Rigaud et al. design a multi-level method to analyze them, treating a typical image as a sequence of panels, each containing some people or animals and possibly balloons of text. They use existing techniques for the low-level recognition of panel boundaries, text, balloons, and the like; the one method they have to add recognizes the “tail” that points from a balloon of text to the character who is speaking. They also use rules such as “each line of text must be in only one balloon.” After identifying the low-level features, they apply constraints to infer a high-level description that states which characters are where, and which are saying what. As the system learns about the comic, it refines its recognition by posing hypotheses, validating them, and then moving on to further inferences.
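To make the constraint step concrete, here is a minimal sketch of how two such rules might be applied to regions that have already been detected. It is not the authors' implementation: the `Box` class, the function names, the sample coordinates, and the nearest-character heuristic for tails are all assumptions made for illustration.

```python
from dataclasses import dataclass
from math import hypot


@dataclass(frozen=True)
class Box:
    """Axis-aligned bounding box; a hypothetical stand-in for a detected region."""
    x0: float
    y0: float
    x1: float
    y1: float

    def contains(self, other):
        # True when `other` lies entirely inside this box.
        return (self.x0 <= other.x0 and self.y0 <= other.y0
                and other.x1 <= self.x1 and other.y1 <= self.y1)

    def contains_point(self, x, y):
        return self.x0 <= x <= self.x1 and self.y0 <= y <= self.y1

    def center(self):
        return ((self.x0 + self.x1) / 2, (self.y0 + self.y1) / 2)


def assign_text_to_balloons(text_lines, balloons):
    """Rule: each line of text must be in only one balloon.

    A text line is kept only if exactly one balloon contains it;
    lines inside no balloon, or inside several, are left unassigned.
    """
    assignment = {}
    for text_id, text_box in text_lines.items():
        holders = [b_id for b_id, b_box in balloons.items()
                   if b_box.contains(text_box)]
        if len(holders) == 1:
            assignment[text_id] = holders[0]
    return assignment


def infer_speakers(tail_tips, characters, panels):
    """Assumed heuristic: the speaker is the character nearest the tail tip,
    restricted to characters in the same panel as that tip."""
    speakers = {}
    for balloon_id, (tx, ty) in tail_tips.items():
        # First panel that contains the tail tip, if any.
        panel = next((p for p in panels.values()
                      if p.contains_point(tx, ty)), None)
        if panel is None:
            continue
        candidates = {c_id: c_box for c_id, c_box in characters.items()
                      if panel.contains(c_box)}
        if not candidates:
            continue
        speakers[balloon_id] = min(
            candidates,
            key=lambda c_id: hypot(candidates[c_id].center()[0] - tx,
                                   candidates[c_id].center()[1] - ty),
        )
    return speakers


# Invented sample data: two panels, two balloons, three characters.
panels = {"p1": Box(0, 0, 100, 100), "p2": Box(100, 0, 200, 100)}
balloons = {"b1": Box(10, 10, 50, 30), "b2": Box(110, 10, 150, 30)}
text_lines = {
    "t1": Box(12, 12, 48, 20),
    "t2": Box(112, 12, 148, 20),
    "t3": Box(40, 40, 60, 50),  # lies outside every balloon
}
characters = {
    "c1": Box(20, 50, 40, 90),
    "c2": Box(60, 50, 80, 90),
    "c3": Box(120, 50, 140, 90),
}
tail_tips = {"b1": (30, 45), "b2": (130, 45)}

text_to_balloon = assign_text_to_balloons(text_lines, balloons)
speakers = infer_speakers(tail_tips, characters, panels)
```

Chaining the two results gives a "who says what" description: each text line maps to a balloon, and each balloon maps to a character. A hypothesize-and-validate loop of the kind the paper describes would presumably revisit these assignments as more of the page is understood; this sketch makes a single pass only.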
This is an interesting application of graph grammars to a difficult and practical problem, and a use of knowledge representations to connect separately recognized elements in the images. The paper includes an evaluation based on a public database from France; text recognition accuracy is low, but balloon and character recognition are fairly good. The paper is worth reading as an example of an overall strategy for an ambitious problem.