Computing Reviews
Multimodal scene understanding
Ying Yang M., Rosenhahn B., Murino V., ACADEMIC PRESS, London, UK, 2019. 412 pp. Type: Book (978-0-12-817358-9)
Date Reviewed: Aug 3 2021

This edited book on multimodal scene understanding focuses on algorithms, applications, and deep learning. Multimodal scene understanding is a central topic in computer vision. The book’s 12 chapters are contributed by authors from universities across the globe.

The introductory chapter describes what multimodal scene understanding is all about. It provides a summary of the remaining 11 chapters. The authors of this chapter are in fact the editors of this book. They emphasize that multimodal scene understanding is crucial for many applications: “surveillance, autonomous driving, traffic safety, robot navigation, vision-guided mobile navigation systems, [and] activity recognition.”

The second chapter is on deep learning for multimodal data fusion. This chapter looks into “multimodal encoder–decoder networks to [tackle] the multimodal nature of multitask scene recognition.” The authors assess their method on two public datasets, and the experimental results demonstrate the effectiveness of the proposed approach.
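To make the encoder–decoder fusion idea concrete, here is a minimal numpy sketch, not the authors' actual architecture: each modality gets its own toy "encoder," the latent codes are fused by concatenation, and two task heads read the shared representation (the dimensions and linear maps are all illustrative assumptions).

```python
import numpy as np

rng = np.random.default_rng(0)

def encoder(x, w):
    # Toy stand-in for a modality-specific encoder: linear map + ReLU.
    return np.maximum(x @ w, 0.0)

# Hypothetical dimensions: RGB features (128-d), depth features (64-d),
# each encoded to a 32-d latent; two task heads (e.g., 10-class labels
# and a scalar regression target) share the fused representation.
w_rgb, w_depth = rng.normal(size=(128, 32)), rng.normal(size=(64, 32))
w_task1, w_task2 = rng.normal(size=(64, 10)), rng.normal(size=(64, 1))

x_rgb, x_depth = rng.normal(size=(5, 128)), rng.normal(size=(5, 64))

# Encode each modality separately, then fuse by concatenation.
z = np.concatenate([encoder(x_rgb, w_rgb), encoder(x_depth, w_depth)], axis=1)

# Each task decoder reads the same fused code (the multitask part).
task1_out, task2_out = z @ w_task1, z @ w_task2
print(task1_out.shape, task2_out.shape)  # (5, 10) (5, 1)
```

In a real network the encoders and decoders would be trained jointly so that the shared latent carries information useful to both tasks.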

Chapter 3, “Multimodal Semantic Segmentation: Fusion of RGB and Depth Data in Convolutional Neural Networks,” looks at the merger of “optical multispectral data (red-green-blue or near infrared-red-green) with 3D (and [particularly] depth [data]) information within a deep learning [convolutional neural network, CNN] framework.”
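The two standard ways of merging RGB and depth inside a CNN can be sketched in a few lines of numpy; this is a generic illustration of early versus late fusion, not the chapter's specific network, and the toy feature extractor and dimensions are assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)
rgb = rng.random((8, 8, 3))    # toy RGB patch
depth = rng.random((8, 8, 1))  # aligned depth map for the same patch

# Early fusion: stack depth as an extra input channel (an RGB-D tensor),
# so the very first convolution sees both modalities.
early = np.concatenate([rgb, depth], axis=-1)
print(early.shape)  # (8, 8, 4)

def toy_features(img, w):
    # Stand-in for a per-modality CNN branch: flatten, linear map, ReLU.
    return np.maximum(img.reshape(-1) @ w, 0.0)

# Late fusion: run each modality through its own branch, then merge
# the resulting feature vectors before the classifier.
w_rgb = rng.normal(size=(8 * 8 * 3, 16))
w_d = rng.normal(size=(8 * 8 * 1, 16))
late = np.concatenate([toy_features(rgb, w_rgb), toy_features(depth, w_d)])
print(late.shape)  # (32,)
```

Early fusion lets the network learn cross-modal filters from the start; late fusion keeps modality-specific branches, which helps when one modality is noisier than the other.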

The fourth chapter, “Learning [CNNs] for Object Detection with Very Little Training Data,” addresses “the problem of learning with very few labels.” The strengths of CNNs and random forests are combined to learn a patch-wise classifier.
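The CNN-plus-forest idea can be illustrated with a toy numpy example: frozen features (standing in for a pretrained CNN) are fed to a hand-rolled ensemble of decision stumps with majority voting. This is a deliberately simplified sketch of the general technique, not the chapter's method; the feature dimensions, labels, and stump-based "forest" are all assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)

# Stand-in for features from a pretrained CNN: each image patch is a
# 16-d vector. With very few labels (12 patches here), fitting a forest
# on frozen features is cheap and less prone to overfitting than
# training a deep network from scratch.
X = rng.normal(size=(12, 16))
y = (X[:, 0] + X[:, 3] > 0).astype(int)  # synthetic binary patch labels

def fit_stump(X, y, feat, rng):
    # A decision stump: threshold one feature at a random training value,
    # flipping the polarity if that improves training accuracy.
    thr = rng.choice(X[:, feat])
    pred = (X[:, feat] > thr).astype(int)
    flip = (pred == y).mean() < 0.5
    return feat, thr, flip

stumps = [fit_stump(X, y, rng.integers(16), rng) for _ in range(25)]

def forest_predict(X, stumps):
    # Majority vote over all stumps, one vote per stump per patch.
    votes = np.stack([
        1 - (X[:, f] > t).astype(int) if flip else (X[:, f] > t).astype(int)
        for f, t, flip in stumps
    ])
    return (votes.mean(axis=0) > 0.5).astype(int)

preds = forest_predict(X, stumps)
print((preds == y).mean())  # training accuracy of the toy forest
```

A production version would use a real random forest (bootstrapped trees over many features), but the division of labor is the same: the CNN supplies representations, the forest supplies a label-efficient classifier.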

Chapter 5, “Multimodal Fusion Architectures for Pedestrian Detection,” presents a systematic evaluation of the performance of several multimodal feature fusion architectures in an effort to identify “the optimal solutions for pedestrian detection.”

The sixth chapter is about multispectral person re-identification (Re-Id) using a generative adversarial network (GAN) for color-to-thermal image translation. It discusses color–thermal cross-modality person Re-Id. The chief takeaway is that thermal cameras paired with the GAN-based Re-Id framework can improve Re-Id performance in low-light conditions.

The seventh chapter, “A Review and Quantitative Evaluation of Direct Visual–Inertial Odometry,” blends visual and inertial sensor measurements to solve the “direct sparse visual–inertial odometry problem in the field of simultaneous localization and mapping (SLAM).”

Chapter 8, “Multimodal Localization for Embedded Systems,” showcases a study of “systems, sensors, methods, and application domains of multimodal localization.” Furthermore, several advances and “hardware configurations for specific practical applications ([such as] autonomous mobile robots)” and real-life products are reported.

Chapter 9, “Self-Supervised Learning from Web Data for Multimodal Retrieval,” deals with “the problem of self-supervised learning from image and text data.” Specifically, it deals with data that is freely available. The chapter presents a thorough analysis and performance evaluation of “five different text embeddings in three different benchmarks.”

Chapter 10, “3D Urban Scene Reconstruction and Interpretation from Multisensor Imagery,” looks at “3D urban scene reconstruction based on the fusion of airborne and terrestrial images.”

Chapter 11, “Decision Fusion of Remote Sensing Data for Land Cover Classification,” includes two use cases that establish the effectiveness of the work and the limits of the suggested methods.

In the last chapter (12), “Cross-Modal Learning by Hallucinating Missing Modalities in RGB-D Vision,” “hallucinating” refers to inferring what is not present. The chapter addresses the challenging problem of how to learn “robust representations [taking advantage of] multimodal data in the training stage, while considering limitations at test time, such as noisy or missing modalities.”
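A common way to realize this idea is feature distillation: at training time, when both modalities are present, a "hallucination" branch learns to predict the missing modality's features from the available one. The numpy sketch below uses a least-squares fit as a stand-in for the distillation loss; the data, dimensions, and linear hallucination branch are illustrative assumptions, not the chapter's model.

```python
import numpy as np

rng = np.random.default_rng(0)

# Training time: both RGB and depth features are available.
rgb_feats = rng.normal(size=(20, 8))
depth_feats = rgb_feats @ rng.normal(size=(8, 8)) * 0.5  # correlated modality

# Hallucination branch: a linear map from RGB features fitted to mimic
# the depth features (least squares stands in for a distillation loss).
W, *_ = np.linalg.lstsq(rgb_feats, depth_feats, rcond=None)

# Test time: the depth sensor is missing or noisy, so the hallucinated
# depth features are substituted before fusion.
test_rgb = rng.normal(size=(4, 8))
hallucinated_depth = test_rgb @ W
fused = np.concatenate([test_rgb, hallucinated_depth], axis=1)
print(fused.shape)  # (4, 16)
```

The downstream model thus sees the same fused representation at training and test time, even when one modality is absent.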

The book presents recent developments in the field of multimodal scene understanding. The emphasis is on algorithms, applications, and techniques from deep learning, with a focus on the use of multiple sources of information. All chapters include adequate references, and the index is helpful. Readers working in computer vision, robotics, remote sensing, and photogrammetry, as well as those handling data from multiple sources, will find the book useful.

Reviewer: S. V. Nagaraj
Review #: CR147324 (2112-0285)
Computer Vision (I.5.4 ...)
General (F.2.0)
Learning (I.2.6)
Vision And Scene Understanding (I.2.10)
General (J.0)
Other reviews under "Computer Vision":
- Machine vision. Vernon D., Prentice-Hall, Inc., Upper Saddle River, NJ, 1991. Type: Book (9780135433980). Reviewed: Oct 1 1992
- The perception of multiple objects. Mozer M., MIT Press, Cambridge, MA, 1991. Type: Book (9780262132701). Reviewed: Mar 1 1993
- Computer vision, models and inspection. Marshall A., Martin R., World Scientific Publishing Co., Inc., River Edge, NJ, 1992. Type: Book (9789810207724). Reviewed: Jun 1 1993

Reproduction in whole or in part without permission is prohibited.   Copyright 1999-2024 ThinkLoud®