Optical character recognition (OCR) technologies are advancing fields like pattern recognition, machine learning, and artificial intelligence. However, recognition accuracy depends heavily on line and/or character segmentation. Segmentation is challenging for online or offline cursive script, and even more challenging when characters touch. This paper addresses this issue.
The authors consider the segmentation of touched characters and categorize the methods into two classes: explicit and implicit character segmentation. Further, they observe six different types of problems related to touched character segmentation and emphasize the recognition-based segmentation technique. They find that overdetection, underdetection, heavily touching characters, and multiple detected valleys affect the segmentation performance of explicit methods. In addition, “two character candidates absent in touched character pattern,” “single character as touched characters,” and “touching character indifferent as two candidate characters” are the remaining problems in implicit segmentation methods.
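To make the failure modes of explicit methods concrete, the following is a minimal sketch of valley-based cutting on a vertical projection profile, a common textbook form of explicit segmentation. It is not taken from the paper: the function names, the toy images, and the `max_ink` threshold are illustrative assumptions.

```python
def vertical_projection(rows):
    """Count ink pixels ('#') in each column of a binary image given as strings."""
    width = len(rows[0])
    return [sum(row[x] == "#" for row in rows) for x in range(width)]


def candidate_cuts(profile, max_ink=0):
    """Return the centre column of every interior valley in the profile.

    A valley is a run of columns whose ink count is <= max_ink.
    Runs touching the left or right image border are treated as margins
    and skipped.
    """
    cuts = []
    start = None
    for x, ink in enumerate(profile):
        if ink <= max_ink and start is None:
            start = x                      # a low-ink run begins
        elif ink > max_ink and start is not None:
            if start > 0:                  # ignore the left margin
                cuts.append((start + x - 1) // 2)
            start = None                   # the run has ended
    return cuts                            # a run still open here is the right margin


# Two characters joined by a one-pixel ligature in column 4 (hypothetical data).
touching = [
    ".###.###.",
    ".###.###.",
    ".#######.",
    ".###.###.",
    ".###.###.",
]

# A single "u"-like character whose middle column is thin (hypothetical data).
single_u = [
    ".#.#.",
    ".#.#.",
    ".#.#.",
    ".#.#.",
    ".###.",
]

strict = candidate_cuts(vertical_projection(touching), max_ink=0)
loose = candidate_cuts(vertical_projection(touching), max_ink=1)
split_u = candidate_cuts(vertical_projection(single_u), max_ink=1)
print(strict, loose, split_u)
```

With a strict threshold the touching pair yields no cut (underdetection). Loosening the threshold finds the ligature, but the same setting also cuts through the single character (overdetection). This trade-off is one reason a recognition-based approach, which scores candidate cuts with a classifier, is attractive.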
The authors compare the methods and weigh the pros and cons of each. They claim to have calculated segmentation rates, and they discuss the test data and the methods involved. They specifically highlight applications like bank check recognition, postal services, and pharmaceutical services, which have relatively small data sets. They also speculate on the possibility of systems that can process data faster and produce precise results in a reasonable amount of time.
This paper focuses on theoretical concepts of character segmentation and their real-time implementation in OCR, making it worth reading.