This paper applies a supervised dictionary learning method to music genre classification. It shows how a text-like representation captures relevant musical information. The main results demonstrate the method's high accuracy and are set against other state-of-the-art developments.
The method builds on online dictionary learning (ODL), a “first-order stochastic gradient descent algorithm” that scans the training set and processes one element at a time, alternating a sparse coding step, which computes the codeword decomposition, with a dictionary update step. The main algorithm proposed in the paper, supervised dictionary learning (SDL), incorporates ground truth labels in the dictionary learning phase to enlarge the differences between the learned codewords.
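The alternation at the heart of ODL can be illustrated with a minimal sketch. It is not the authors' implementation: the function names, the ISTA solver used for the sparse coding step, the plain gradient step on the dictionary, and all parameter values (`lam`, `lr`, `n_codewords`) are assumptions chosen for illustration, and the supervised label term of SDL is omitted.

```python
import numpy as np

def soft_threshold(v, t):
    """Element-wise soft-thresholding, the proximal operator of the L1 norm."""
    return np.sign(v) * np.maximum(np.abs(v) - t, 0.0)

def sparse_code(D, x, lam=0.1, n_iter=50):
    """Sparse coding step: ISTA for min_a 0.5*||x - D a||^2 + lam*||a||_1."""
    L = np.linalg.norm(D, 2) ** 2  # Lipschitz constant of the smooth term's gradient
    a = np.zeros(D.shape[1])
    for _ in range(n_iter):
        grad = D.T @ (D @ a - x)
        a = soft_threshold(a - grad / L, lam / L)
    return a

def online_dictionary_learning(X, n_codewords=8, lam=0.1, lr=0.1, seed=0):
    """Scan the rows of X once, alternating sparse coding and a dictionary update.

    Assumption: the dictionary update is a single stochastic gradient step
    followed by renormalizing each codeword (column) to unit length.
    """
    rng = np.random.default_rng(seed)
    D = rng.standard_normal((X.shape[1], n_codewords))
    D /= np.linalg.norm(D, axis=0, keepdims=True)
    for x in X:
        a = sparse_code(D, x, lam)            # 1) decompose x over current codewords
        D -= lr * np.outer(D @ a - x, a)      # 2) gradient step on 0.5*||x - D a||^2
        D /= np.maximum(np.linalg.norm(D, axis=0, keepdims=True), 1e-12)
    return D
```

In this sketch each row of `X` stands for one feature vector from the training set, and the columns of the returned `D` play the role of codewords.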
Using SDL, the authors report 84.7 percent genre classification accuracy on the GTZAN dataset (“composed of 1,000 30-second clips covering ten genres”) and 90.8 percent on the ISMIR2004Genre dataset (“1,458 full-length songs covering six genres”). Overall, the methodology improves on the bag-of-frames (BOF) model, which represents each song “as a histogram over a dictionary of music ‘codewords’ selected or learned from a music collection,” by applying techniques from dictionary learning and sparse coding to music information retrieval.
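To make the BOF representation concrete, the following sketch builds a song-level histogram from frame-level features. It assumes the simplest encoding, hard assignment of each frame to its nearest codeword (vector quantization); the paper benchmarks several encodings, and the function name and array layout here are hypothetical.

```python
import numpy as np

def bof_histogram(frames, codebook):
    """Represent a song as a normalized histogram over codewords.

    frames:   (n_frames, dim) array of frame-level feature vectors
    codebook: (n_codewords, dim) array, one codeword per row
    Assumption: each frame votes for its single nearest codeword.
    """
    # Squared Euclidean distance from every frame to every codeword
    d2 = ((frames[:, None, :] - codebook[None, :, :]) ** 2).sum(axis=2)
    nearest = d2.argmin(axis=1)
    counts = np.bincount(nearest, minlength=codebook.shape[0])
    return counts / counts.sum()
```

For example, with two codewords at `[0, 0]` and `[10, 10]`, a song whose four frames are `[0, 1]`, `[1, 0]`, `[9, 9]`, and `[0, 0]` maps to the histogram `[0.75, 0.25]`. A sparse coding encoder would replace the hard assignment with the frame's sparse coefficients.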
The main contributions of the paper lie in the results section, where the authors benchmark multiple encoding and dictionary construction techniques and show that the sparsity-enforced dictionary learning method achieves the highest accuracy. Most importantly, the authors note that the entire framework can be easily applied to other multimedia retrieval problems.