Even though large language models (LLMs) continue to fuel innovations in the field of artificial intelligence (AI), researchers like Professor LeCun have highlighted a significant limitation: “There is absolutely no way ... we will ever reach human-level AI without getting machines to learn from high-bandwidth sensory inputs, such as vision” [1]. Simultaneously, efforts are underway to develop robots capable of zero-shot learning, exemplified by Tesla’s Optimus. This book is meticulously crafted to inspire and equip graduate and undergraduate students, researchers, and practitioners in the domains of computer vision and robotics. It enhances their knowledge with essential concepts from electrical engineering, mechanical engineering, physics, mathematics, statistics, and probability. Moreover, it provides a solid foundation for understanding the mathematical principles underpinning recent advancements in machine learning.
Selected topics can also be useful for high school and middle school students engaged in robotics projects. Images of diverse types of robots and of influential mathematicians and physicists who have contributed to the field, along with captivating facts about robotic components, the Global Positioning System (GPS), and more, serve to motivate younger students as well.
The book’s first nine chapters thoughtfully separate the mathematical and physical aspects of robotics, which are likely to remain pertinent despite the field’s rapid evolution. Chapter 2, which introduces special orthogonal groups and various representations, is identified as critical by the authors. They give a clear overview of other prerequisites in a chapter titled “Supplementary Information” at the book’s end, covering essential topics such as linear algebra, geometry, Lie groups and algebras, Jacobians, Hessians, Kalman filtering, and more. Each chapter also includes a curated list for further reading. The book concludes with an extensive index of functions, classes, methods, apps, and models, serving as a model for future authors on how to simplify the learning of complex topics.
Engaging with the accompanying MATLAB code, experimenting with it, and utilizing Simulink can further enhance understanding and facilitate the simulation of robots before actual construction. The inclusion of Simulink block diagrams in the book is also helpful. However, engaging with these tools is not a prerequisite for grasping the new concepts presented, as the authors systematically build upon fundamental principles throughout the text.
Chapter 11, dedicated to images and image processing, meticulously explains the journey from photons to pixel values (page 442). This is just one of the numerous instances where the book constructs complex concepts from fundamental principles.
Chapter 12 focuses on image feature extraction and makes clear that neural networks and their training fall outside the scope of the book. This decision likely stems from a belief that a strong grounding in classical computer vision equips readers with the essential foundation needed to navigate the evolving landscape of the field. Despite this, the book incorporates a segment on object detection using deep learning (YOLO, page 519), acknowledging the rapid, almost daily advances within the domain. Capturing the field’s accelerated development in a single volume presents challenges, but future editions could benefit from discussing how zero-shot learning robots utilize simulated data and deep reinforcement learning. Additionally, revising the sections on navigation, localization, mapping, dynamics, and control to reflect how robots learn from one another or from humans could provide valuable insights.
Moreover, there is potential to expand Chapter 14, which deals with image processing using multiple images, by introducing concepts such as 3D Gaussian splatting, various diffusion models for generating images and videos from text, and mature image classifiers like vision transformers and convolutional neural networks (CNNs).
The thoughtful presentation and depth of information on every page signal the authors’ genuine commitment to fostering readers’ understanding and learning. Their dedication ensures that this work will serve as a timeless resource for future generations of students, educators, researchers, and practitioners in robotics and computer vision.