Deep learning (DL) algorithms, characterized by mapping from input to output (labels or classes) through multiple hidden layers in between, have revived excitement about artificial intelligence (AI), bringing the field closer to its initial vision of building intelligent machines. The major DL network configurations include recursive neural networks (RvNNs), recurrent neural networks (RNNs), convolutional neural networks (CNNs), and deep generative networks such as deep belief networks (DBNs), deep Boltzmann machines (DBMs), generative adversarial networks (GANs), and variational autoencoders (VAEs).
Many variations of RNNs and CNNs have been applied to natural language processing (NLP) tasks such as sentiment analysis, translation, paraphrase identification, text summarization, and question answering (QA). CNNs have been dominant in visual data processing tasks such as image classification, resulting in many variations, for example, LeNet-5, VGGNet, GoogLeNet, ResNet, ResNeXt, AlexNet, and so on, with varying numbers of inner network layers to enhance accuracy. In addition, CNN variations such as R-CNN, Fast R-CNN, and "you only look once" (YOLO) are used for object recognition in images. For autonomous driving and medical applications, fully convolutional networks (FCNs) or Mask R-CNNs are used for semantic segmentation tasks to achieve pixel-level understanding of images, while recurrent convolutional networks (RCNs) show better performance for video processing.
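The operation all of these CNN architectures share is the 2-D convolution, which slides a small learned filter over an image and takes a dot product at each position. A minimal sketch of that core operation (the toy image and edge-detector kernel here are illustrative, not taken from any of the named networks):

```python
import numpy as np

def conv2d(image, kernel):
    """Valid 2-D cross-correlation: slide the kernel over the image
    and take a dot product at each position (the core CNN operation)."""
    kh, kw = kernel.shape
    oh = image.shape[0] - kh + 1
    ow = image.shape[1] - kw + 1
    out = np.empty((oh, ow))
    for i in range(oh):
        for j in range(ow):
            out[i, j] = np.sum(image[i:i+kh, j:j+kw] * kernel)
    return out

# A vertical-edge detector applied to a toy image whose right half is bright:
# the response peaks exactly at the left/right boundary.
image = np.array([[0, 0, 1, 1],
                  [0, 0, 1, 1],
                  [0, 0, 1, 1],
                  [0, 0, 1, 1]], dtype=float)
kernel = np.array([[-1, 1],
                   [-1, 1]], dtype=float)
print(conv2d(image, kernel))  # each row: [0. 2. 0.]
```

In a trained CNN, the kernel values are learned from data rather than hand-designed, and deep stacks of such layers (plus pooling and nonlinearities) are what distinguish the architectures listed above.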
Automatic speech recognition (ASR) technologies that apply to speech recognition (or speech transcription), speech enhancement, and phone and music classification tasks have advanced through DBNs, deep RNNs, deep LSTMs, and hybrid models combining RNNs and CNNs. Applications include speech sentiment analysis and speech enhancement.
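The deep LSTM models mentioned above are built from LSTM cells, which use learned sigmoid gates to control what information flows through a sequence. A minimal forward-step sketch in NumPy (weights are random and the sizes are arbitrary, purely for illustration):

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def lstm_step(x, h_prev, c_prev, W, b):
    """One LSTM time step. W maps [x; h_prev] to the four stacked
    gate pre-activations (input, forget, candidate, output)."""
    z = np.concatenate([x, h_prev]) @ W + b
    H = h_prev.shape[0]
    i = sigmoid(z[0:H])            # input gate
    f = sigmoid(z[H:2*H])          # forget gate
    g = np.tanh(z[2*H:3*H])        # candidate cell state
    o = sigmoid(z[3*H:4*H])        # output gate
    c = f * c_prev + i * g         # new cell state
    h = o * np.tanh(c)             # new hidden state
    return h, c

rng = np.random.default_rng(0)
X, H = 3, 4                        # input and hidden sizes (arbitrary)
W = rng.normal(size=(X + H, 4 * H)) * 0.1
b = np.zeros(4 * H)
h, c = np.zeros(H), np.zeros(H)
for x in rng.normal(size=(5, X)):  # run over a 5-step toy sequence
    h, c = lstm_step(x, h, c, W, b)
print(h.shape)
```

A "deep LSTM" for ASR stacks several such layers, feeding each layer's hidden state sequence to the next; the gating is what lets the network retain acoustic context over long utterances.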
Although deep neural networks (DNNs) "improve the learning performance, broaden the scopes of applications, and simplify the calculation process," they require extremely long training times and are sensitive to training data size and model parameters. "To simplify the implementation process and boost the system-level development," efforts have focused on advanced techniques and frameworks that "combine the implementation of modularized DL algorithms, optimization techniques, distribution techniques, and support to infrastructures." These frameworks include TensorFlow, Theano, MXNet, Torch, Caffe, DL4J, CNTK, and Neon.
The survey serves as a good summary for AI model developers who are considering DNN solutions for given tasks in different domains. However, it remains uncertain whether these trained DNNs can be reused for their given tasks. What should be the ultimate goal of DNN research and development? Is improving accuracy and reducing errors through experiments and competition the only goal? One goal should be a catalog of trained DL networks that can be reused for similar tasks, regardless of the domain, as long as the datasets are of the same type. So far, most problem-solving effort in DNNs is devoted to finding the proper hyperparameters and fine-tuning the networks (that is, optimizing the number of layers, reducing errors, and shortening the training time), but there is no good guide other than trial-and-error experiments.
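In practice, the trial-and-error tuning described above often amounts to an explicit search over a hyperparameter grid. A minimal sketch, in which `evaluate` is a hypothetical stand-in for training a network and measuring validation error (a real run would train the DNN at each setting):

```python
import itertools

def evaluate(lr, num_layers):
    """Stand-in for training a network and returning its validation error.
    This toy error surface simply penalizes settings far from a
    hypothetical sweet spot (lr=0.01, num_layers=8)."""
    return (lr - 0.01) ** 2 + 0.001 * (num_layers - 8) ** 2

# Trial-and-error tuning made explicit: try every combination on the grid
# and keep the configuration with the lowest validation error.
grid = {"lr": [0.001, 0.01, 0.1], "num_layers": [4, 8, 16]}
best = min(
    (dict(zip(grid, values)) for values in itertools.product(*grid.values())),
    key=lambda cfg: evaluate(**cfg),
)
print(best)  # {'lr': 0.01, 'num_layers': 8}
```

The cost of such a search grows multiplicatively with each added hyperparameter, which is why the lack of a principled guide, as the review notes, remains a real limitation.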