By Alan Hanjalic and Martha Larson
Multimedia Information Retrieval Lab
Delft University of Technology
A.Hanjalic@tudelft.nl
M.A.Larson@tudelft.nl
Multimedia that cannot be found is, in a certain sense, useless. It is lost in a huge collection, or worse, in a back alley of the Internet, never viewed and impossible to reuse. Research in multimedia retrieval is directed at developing techniques that bring image and video content together with users by matching multimedia content to user needs. The aim of this tutorial is to provide insight into the most recent developments in multimedia retrieval and to identify the issues and bottlenecks likely to shape research in the coming years. We present an overview of new algorithms and techniques, concentrating on approaches informed by neighboring fields, including information retrieval, speech and language processing, and network analysis. We also discuss the evaluation of new algorithms, in particular the use of crowdsourcing to develop the necessary data sets.
The tutorial targets new scientists in the field of multimedia retrieval, providing instruction on how best to approach the multimedia retrieval problem along with examples of promising research directions. It is also designed to benefit active multimedia retrieval researchers who are looking for new challenges or a change of direction. The material is relevant to participants from both academia and industry. It covers issues pertaining to the development of modern multimedia retrieval systems and highlights emerging challenges and techniques expected to be important for the future of the field.
The tutorial begins with an overview of “multimedia search in the wild”, covering how and when we use multimedia access and retrieval technologies in our personal and professional lives. These considerations inform the selection of research challenges facing multimedia retrieval as the field continues to grow. The main body of the presentation focuses on ways to exploit and combine available information resources so that multimedia search results become more useful to the user. We concentrate on three complementary information sources:
- User: Exploiting the user's interaction with the search system, either to enhance the query so that it better reflects the user's information need and search intent, or to enrich the collection with implicit or explicit metadata. Approaches discussed include transaction log analysis, context modeling in multimedia search, (visual) query suggestion and user-supported query expansion.
- Collection: Exploiting the information inherent in the relationships that exist in the collection and in the search environment, for example, similarities between documents and connections among users. Two categories of approaches and techniques working in this direction will be discussed:
  - Maximizing the quality of the top-ranked search results using IR concepts and cross-modal analysis, for example through (visual) search reranking, query-class-dependent search and query performance prediction;
  - Integrating social information from networked communities, including the use of community-contributed metadata and techniques for exploiting social networks. A case study will address non-trivial collaborative recommendation, extending the scope of recommendation beyond the classical collaborative filtering concept.
- Content: Exploiting all information channels in the content collection itself. Automatic indexing systems (e.g., speech recognition, audio event detection, semantic concept detection) are known to be imperfect. The key to making multimedia search more useful is building systems that handle their own shortcomings gracefully. Rather than endlessly fighting indexing errors, we need multimedia search paradigms that deal robustly with noise and present the user with results of the highest possible utility. Approaches discussed include:
  - Making use of confidence scores (e.g., making effective use of imperfect results, informing the user when a result may be less than satisfactory);
  - Exploiting characteristics of multimedia items that are revealed by simple structural analysis;
  - Integrating information from external sources to reduce the influence of indexing noise.
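The first Content approach, making use of confidence scores, can be illustrated with a small sketch. It ranks spoken documents by confidence-weighted word matches instead of hard matches. It also flags results whose matches are mostly uncertain. The sketch assumes a hypothetical transcript format of (word, confidence) pairs and an arbitrary warning threshold of 0.5. The names `search` and `Hit` are illustrative only. This is not the method presented in the tutorial.

```python
from dataclasses import dataclass

# Hypothetical transcript format: a spoken document is a list of
# (word, confidence) pairs, as a speech recognizer might emit them.
Transcript = list[tuple[str, float]]


@dataclass
class Hit:
    doc_id: str
    score: float          # expected number of query-term occurrences
    mean_conf: float      # average confidence of the matched words
    low_confidence: bool  # True if the user should be warned about this hit


def search(query: str, docs: dict[str, Transcript],
           warn_below: float = 0.5) -> list[Hit]:
    """Rank spoken documents by confidence-weighted query-term matches.

    Each recognized occurrence of a query term contributes its confidence
    instead of a full count of 1, so uncertain words still help retrieval
    but weigh less than words the recognizer is sure about.
    """
    terms = set(query.lower().split())
    hits = []
    for doc_id, words in docs.items():
        matched = [conf for word, conf in words if word.lower() in terms]
        if not matched:
            continue  # no recognized query term in this document
        score = sum(matched)
        mean_conf = score / len(matched)
        hits.append(Hit(doc_id, score, mean_conf, mean_conf < warn_below))
    # Highest score first; doc_id breaks ties so the order is deterministic.
    hits.sort(key=lambda h: (-h.score, h.doc_id))
    return hits
```

Consider the query "Delft". A document containing that word twice, at confidences 0.9 and 0.4, outranks a document containing it once at 0.3. The second document is also flagged, because its average match confidence falls below the warning threshold. A practical system would typically combine such expected counts with standard IR weighting, such as idf and length normalization. The sketch omits this.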
The final section of the tutorial examines opportunities to formulate research topics closely tied to user needs and to pursue work within the newly emerging multimedia search paradigms. A critical aspect of tackling new tasks is developing the data sets needed to evaluate them. We introduce the practice of crowdsourcing to generate data sets suited to evaluating new multimedia retrieval algorithms, including best practices for task design and quality control. Benchmarking initiatives also offer tasks and data sets to the multimedia retrieval research community. We conclude the tutorial with a short presentation of the MediaEval benchmark, which offers multimedia retrieval tasks focused on the social and contextual challenges of multimedia, including geo-coordinate prediction, genre detection and prediction of viewer affective response.
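Quality control in crowdsourcing is often built around gold-standard questions, whose correct answers are known in advance. The following sketch is a simple illustration, not the procedure taught in the tutorial. It works in two steps:

1. It discards workers whose accuracy on gold questions falls below an assumed threshold (`min_gold_accuracy=0.7`).
2. It takes a majority vote over the labels from the remaining workers.

The tuple input format, the function name `aggregate_labels` and the tie-breaking rule are all hypothetical choices.

```python
from collections import Counter, defaultdict


def aggregate_labels(judgments, gold, min_gold_accuracy=0.7):
    """Majority-vote crowd labels after discarding unreliable workers.

    judgments: iterable of (worker_id, item_id, label) tuples.
    gold: dict item_id -> known correct label, used only to screen workers.
    Returns dict item_id -> aggregated label for the non-gold items.
    """
    judgments = list(judgments)

    # 1. Score each worker on the gold items they answered. Workers who
    #    answered no gold items are never trusted in this sketch.
    correct = Counter()
    answered = Counter()
    for worker, item, label in judgments:
        if item in gold:
            answered[worker] += 1
            correct[worker] += label == gold[item]
    trusted = {w for w in answered
               if correct[w] / answered[w] >= min_gold_accuracy}

    # 2. Collect votes from trusted workers on the non-gold items.
    votes = defaultdict(Counter)
    for worker, item, label in judgments:
        if worker in trusted and item not in gold:
            votes[item][label] += 1

    # 3. Majority vote; ties go to the label that sorts first,
    #    an arbitrary but deterministic choice.
    result = {}
    for item, counter in votes.items():
        best = max(counter.values())
        result[item] = min(lbl for lbl, n in counter.items() if n == best)
    return result
```

Suppose a worker fails the gold question. That worker's votes are then ignored, which can change the outcome on items where the worker would otherwise have tipped the balance. For example, a genre label backed by one trusted worker and one untrusted worker no longer wins outright.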
Presenter’s biography: Dr. Alan Hanjalic is Associate Professor and Coordinator of the Delft Multimedia Information Retrieval Lab at Delft University of Technology, Netherlands. His research interests and expertise lie in the broad area of multimedia computing, with a focus on multimedia information retrieval and personalized multimedia content delivery. In these areas, Dr. Hanjalic has (co-)authored more than 100 publications, including the books Image and Video Databases: Restoration, Watermarking and Retrieval (Elsevier, 2000), Content-Based Analysis of Digital Video (Kluwer Academic Publishers, 2004) and Online Multimedia Advertising (IGI Global, 2010). He was a visiting scientist at Hewlett-Packard Labs, British Telecom Labs, Philips Research Labs and Microsoft Research Asia. Dr. Hanjalic has served on the Editorial Boards of the IEEE Transactions on Multimedia, IEEE Transactions on Affective Computing, Journal of Multimedia, Advances in Multimedia and the Image and Vision Computing journal. He was also a Guest Editor of special issues in a number of journals, including the Proceedings of the IEEE (2008), IEEE Transactions on Multimedia (2009) and Journal of Visual Communication and Image Representation (2009). He has also served on the organizing committees of the major conferences in the multimedia field, including the ACM Multimedia Conference (General Chair 2009, Program Chair 2007), ACM CIVR conference (Program Chair 2008), ACM ICMR conference (Program Chair 2011), the WWW conference (Track Chair 2008), Multimedia Modeling Conference (Area Chair 2007), Pacific Rim Conference on Multimedia (Track Chair 2007), IEEE ICME (Track Chair 2007) and the IEEE ICIP conference (Track Chair 2010 and 2011). Dr. Hanjalic was a Keynote Speaker at the Pacific-Rim Conference on Multimedia, Hong Kong, December 2007, and is an elected member of the IEEE TC on Multimedia Signal Processing.
Presenter’s biography: Dr. Martha Larson is a senior researcher in the Multimedia Information Retrieval Lab at Delft University of Technology. Her expertise is in speech and language technology for multimedia search, with a focus on networked communities. Before joining the group at Delft, she researched and lectured in the area of audio-visual retrieval in the NetMedia group at Fraunhofer IAIS and at the University of Amsterdam. Martha Larson holds an MA and a PhD in theoretical linguistics from Cornell University and a BS in Mathematics from the University of Wisconsin. She is an organizer of MediaEval, a multimedia retrieval benchmark campaign that emphasizes spoken content and social media. She is a guest editor of the upcoming ACM TOIS special issue on searching spontaneous conversational speech. Recently, much of her research has focused on deriving and exploiting information from multimedia that is “orthogonal to topic”, that is, not directly related to subject matter. Examples include information on user-perceived quality, affective impact and social trust. Such information can be used to improve the quality of multimedia search. Her research interests also include user-generated multimedia content, cultural heritage archives, indexing approaches exploiting multiple modalities, techniques for the semantic structuring of spoken content and methods for reducing the impact of speech recognition errors on speech-based retrieval. She has participated as both researcher and research coordinator in a number of projects, including the EU projects PetaMedia, MultiMatch and SHARE.