Computing Reviews
Cross-modality feature learning via convolutional autoencoder
Liu X., Wang M., Zha Z., Hong R. ACM Transactions on Multimedia Computing, Communications, and Applications 15(1s): 1-20, 2019. Type: Article
Date Reviewed: Dec 10 2020

This paper contributes to a hot research area that is the focus of many scientists, developers, and large corporations. The interest stems from the fact that many important systems, for instance, those for social media or data collection, produce large-scale multimedia datasets. Analysis based on so-called “handcrafted features” becomes unsuitable for many non-numeric data types, such as text or images.

For many non-numeric data types, useful features can be learned from the data itself. Different kinds of cross-modal feature learning are used in the analysis of heterogeneous datasets and data streams. Deep learning methods, among others, have been developed both for autoencoding a single data type (aiming at feature learning) and for the coordinated analysis of the learned component features of heterogeneous data.

For this purpose, the authors develop a sophisticated convolutional neural network (CNN), called the multimodal convolutional autoencoder (MUCAE), which further develops some existing architectures. They evaluate the method by learning representative features from two modalities: images represented by pixels and text represented by characters. To exploit the correlation between the hidden representations of the two modalities, the unified framework integrates an autoencoder with an objective function that jointly minimizes the representation learning error of each modality and the correlation divergence between the modalities.
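To make the general idea concrete, here is a minimal sketch in PyTorch; it is not the authors' 11-layer MUCAE. It uses fully connected layers instead of convolutions, illustrative input and code dimensions, and a simple cosine-similarity term standing in for the paper's correlation-divergence objective; all names and values are assumptions for illustration.

```python
# Minimal sketch of joint multimodal autoencoding (NOT the authors' MUCAE):
# two autoencoders, one per modality, trained with a combined loss of
# per-modality reconstruction error plus a cross-modality correlation term.
import torch
import torch.nn as nn

class ModalityAutoencoder(nn.Module):
    """One fully connected autoencoder branch for a single modality."""
    def __init__(self, in_dim, code_dim):
        super().__init__()
        self.encoder = nn.Sequential(nn.Linear(in_dim, 256), nn.ReLU(),
                                     nn.Linear(256, code_dim))
        self.decoder = nn.Sequential(nn.Linear(code_dim, 256), nn.ReLU(),
                                     nn.Linear(256, in_dim))

    def forward(self, x):
        code = self.encoder(x)
        return code, self.decoder(code)

def correlation_divergence(a, b):
    # Stand-in for the paper's correlation objective: negative mean cosine
    # similarity, which shrinks as paired hidden codes become aligned.
    return -nn.functional.cosine_similarity(a, b, dim=1).mean()

# Illustrative dimensions: flattened image pixels and a text feature vector.
image_ae = ModalityAutoencoder(in_dim=1024, code_dim=64)
text_ae = ModalityAutoencoder(in_dim=300, code_dim=64)
opt = torch.optim.Adam(list(image_ae.parameters()) +
                       list(text_ae.parameters()), lr=1e-3)
mse = nn.MSELoss()

images = torch.randn(32, 1024)  # stand-in batch of paired image samples
texts = torch.randn(32, 300)    # stand-in batch of paired text samples

# One joint training step over a batch of paired image-text samples.
img_code, img_rec = image_ae(images)
txt_code, txt_rec = text_ae(texts)
loss = (mse(img_rec, images) + mse(txt_rec, texts)
        + 0.1 * correlation_divergence(img_code, txt_code))  # weight is illustrative
opt.zero_grad()
loss.backward()
opt.step()
```

The key design point is that both branches are optimized together, so the weight on the correlation term trades reconstruction fidelity within each modality against agreement between the two hidden representations.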

The authors define the problem and describe the solution at an abstract level, presenting the mathematical reasoning and the 11 layers of their CNN. There is no reference to the implementation environment; one can only presume that some of the powerful and popular tools and packages were used.

Some related work on multimodal, supervised, and unsupervised deep feature learning is enumerated. The paper contains precise figures on the efficiency of the implementation on two datasets: MIRFlickr and a subset of NUS-WIDE. These results are compared to those of five earlier systems developed over the past decade. According to these results, MUCAE outperforms the others by two to ten percent for joint text-image data analysis. The main parameters used in the algorithm are discussed. How the method behaves as the size of the input dataset varies is not investigated.

Concretizing the essence of the abstract somewhat, the paper’s conclusion summarizes the approach, the method, the experiments, and the results. Neither intended further developments of the method nor future directions are discussed.

I recommend the paper only for active specialists in the area.

Reviewer: K. Balogh | Review #: CR147134 (2104-0085)
Content Analysis And Indexing (H.3.1)
Neural Nets (I.5.1 ...)
Other reviews under "Content Analysis And Indexing":
Personal bibliographic indexes and their computerisation
Heeks R., Taylor Graham Publishing, London, UK, 1986. Type: Book (9789780947568115)
Sep 1 1987
Development of a term association interface for browsing bibliographic data bases based on end users’ word associations
Pejtersen A., Olsen S., Zunde P., Taylor Graham Publishing, London, UK, 1987. Type: Book (9780947568306)
Nov 1 1989
Transforming text into hypertext for a compact disc encyclopedia
Glushko R. ACM SIGCHI Bulletin 20(SI): 293-298, 1989. Type: Article
May 1 1990
more...
