By Gael Richard
Telecom ParisTech, France
Gael.Richard@telecom-paristech.fr
1 Motivation
The enormous amount of unstructured audio data available nowadays, and its growing use as a data source in many applications, are introducing new challenges for researchers in information and multimedia signal processing. Automatic analysis of audio documents (music, radio broadcast audio streams, etc.) spans several research directions, including audio indexing and transcription (extraction of informative features leading to audio content recognition or to the estimation of high-level concepts such as melody, rhythm, instrumentation or harmony), audio classification (grouping by similarity, by music genre or by audio event category) and content-based retrieval (such as query-by-example or query-by-humming approaches).
In this context, the general field of music signal processing is receiving growing interest and is becoming more relevant and more visible in the audio community. Nevertheless, although much work is carried out in audio and music signal processing, it is often presented only at specialized music or audio signal processing conferences. In the multimedia community, the focus of interest is often on the image or video signal, with less emphasis on the audio signal and its potential for analyzing or interpreting a multimedia scene.
The aim of the proposed tutorial is therefore to provide a general introduction to audio signal processing that should be of broad interest to the multimedia community, to review the state of the art in music signal processing (this will be largely based on [1]) and to highlight, through some examples, the potential of music signal processing for multimedia streams.
2 Intended Audience and Benefits
The tutorial will mostly target an intermediate audience that has some knowledge of multimedia but may not be familiar with audio and music signals. The tutorial will nevertheless include some more advanced concepts, which should also be of interest to students, researchers and engineers who are more knowledgeable in audio but who are not familiar with decomposition models or audio source separation principles.
The expected benefits for the multimedia community include:
- a better understanding of audio processing basics and of their potential for multimedia stream indexing. The tutorial will also include a brief presentation of existing open source tools that make it possible to rapidly design an audio indexing module.
- a better understanding of decomposition models for music signals and how they can be efficiently used to represent the signal as a set of objects or sources (with application to audio source separation). This will be illustrated using a number of sound examples in the context of karaoke applications or other audio remixing applications.
- a better understanding of the potential of multimodality through multimedia music examples. The tutorial will highlight this aspect using two specific examples (a multimedia drum transcription system and a cross-modal music video search).
3 Tutorial Content
As outlined above, the objective of the tutorial is first to introduce some basics of music signal processing, then to provide more insight into decomposition models, which are at the heart of a number of audio signal processing methods, and finally to illustrate, through some well-chosen examples, why audio processing tools are particularly useful for processing music multimedia streams.
The tutorial is scheduled for half a day and is structured in four main parts:
- Introduction: this section will provide a general introduction to the field of audio and music signal processing and will illustrate the interest of this field through a number of recent applications [1]. A typical architecture of an audio classification system will also be given and further discussed using an illustrative music indexing task (e.g. automatic musical instrument recognition).
- Signal representations and decomposition models: this section will start with the traditional Fourier representation of audio signals and related transformations that are particularly well suited to music signal analysis (constant-Q transform, Mel-frequency transform, chromagrams, etc.). Decomposition models will then be briefly presented and will include, in particular, greedy decomposition models and factorization models, which are becoming popular for a wide variety of problems.
- Application: the signal representations and decomposition models will then be applied to music source separation, with examples of singing voice extraction, drum track separation and bass line separation.
- Multimodality: the potential of multimodality in music signal processing will then be highlighted through two specific examples: a multimodal drum track separation and audio-based video search for music videos. This part will also be an opportunity to discuss some very early results of experiments conducted on the fully multimodal, multi-sensor database released for the ACM Multimedia 2011 Grand Challenge sponsored by Huawei/3Dlife [1].
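As an illustration of the factorization models mentioned in the second part, the following is a minimal sketch, not the tutorial's own material. It assumes a synthetic two-note signal, a Hann-windowed short-time Fourier magnitude as the signal representation, and non-negative matrix factorization (NMF) with the standard multiplicative updates for the Euclidean cost as the factorization model. The function names (`stft_magnitude`, `nmf`) and all parameter values are hypothetical choices made for this example.

```python
import numpy as np


def stft_magnitude(x, frame_len=512, hop=256):
    """Magnitude of a Hann-windowed short-time Fourier transform.

    Returns an array of shape (frame_len // 2 + 1, n_frames).
    """
    window = np.hanning(frame_len)
    n_frames = 1 + (len(x) - frame_len) // hop
    frames = np.stack(
        [x[i * hop:i * hop + frame_len] * window for i in range(n_frames)],
        axis=1,
    )
    return np.abs(np.fft.rfft(frames, axis=0))


def nmf(V, n_components, n_iter=200, seed=0, eps=1e-9):
    """Approximate V by W @ H with non-negative W and H.

    Uses multiplicative updates for the Euclidean cost ||V - W H||^2.
    """
    rng = np.random.default_rng(seed)
    W = rng.random((V.shape[0], n_components)) + eps
    H = rng.random((n_components, V.shape[1])) + eps
    for _ in range(n_iter):
        H *= (W.T @ V) / (W.T @ W @ H + eps)
        W *= (V @ H.T) / (W @ H @ H.T + eps)
    return W, H


# Synthetic "music" signal (an assumption for this sketch):
# a 440 Hz note during the first half second, then a 660 Hz note.
sr = 8000
t = np.arange(sr) / sr
x = np.sin(2 * np.pi * 440 * t) * (t < 0.5) \
    + np.sin(2 * np.pi * 660 * t) * (t >= 0.5)

V = stft_magnitude(x)             # non-negative time-frequency representation
W, H = nmf(V, n_components=2)     # W: spectral templates, H: activations over time
rel_err = np.linalg.norm(V - W @ H) / np.linalg.norm(V)
print(f"relative reconstruction error: {rel_err:.3f}")
```

In this sketch, each column of `W` can be read as the spectrum of one "object" (here, ideally, one note) and each row of `H` as its activation over time; source separation methods of the kind mentioned in the third part typically build on such a decomposition, for instance by filtering the mixture with the contribution of selected components.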
Biography
Prof. Gaël Richard received the State Engineering degree from Télécom ParisTech in 1990, and the Ph.D. and Habilitation à Diriger des Recherches degrees from University of Paris-XI in 1994 and 2001, respectively. He then spent two years at the CAIP Center, Rutgers University (USA), in the speech processing group of Prof. J. Flanagan, where he explored innovative approaches for speech production. Between 1997 and 2001, he successively worked for Matra Nortel Communications and Philips Consumer Communications. In particular, he was the project manager of several large-scale European projects in the field of multimodal speaker verification and audio processing. He then joined Télécom ParisTech, where he is now a full Professor and Head of the Audio, Acoustics and Waves research group. Co-author of over 80 papers and inventor on a number of patents, he is also an expert for the European Commission in audio and multimedia signal processing. Prof. Richard is a member of EURASIP, a senior member of the IEEE, an Associate Editor of the IEEE Transactions on Audio, Speech and Language Processing and a member of the Audio and Acoustics Signal Processing Technical Committee of the IEEE. http://www.telecom-paristech.fr/~grichard/
References
[1] M. Mueller, D. Ellis, A. Klapuri, and G. Richard. Signal processing for music analysis. IEEE Journal of Selected Topics in Signal Processing, 2011, to appear.