MediaEval 2011
June 2011
MediaEval (http://www.multimediaeval.org/) is a benchmarking initiative that offers tasks promoting research and innovation on multimodal approaches to multimedia annotation and retrieval. Its focus is on speech, language, context and social aspects of multimedia, in addition to visual content. The MediaEval 2011 benchmarking season culminates with the MediaEval 2011 workshop, held on 1-2 September 2011 at Santa Croce in Fossabanda, Pisa, Italy. The workshop is an official satellite event of Interspeech 2011 (http://www.interspeech2011.org), the 12th Annual Conference of the International Speech Communication Association (ISCA).
Currently, the 2011 season of MediaEval is under way. For each task, participants receive a task definition, task data and accompanying resources (dependent on task) such as shot boundaries, keyframes, visual features, speech transcripts and social metadata. Participants are tackling the following tasks in the 2011 MediaEval season:

Genre Tagging: Given a set of genre tags (how-to, interview, review, etc.) and a video collection, participants are required to automatically assign genre tags to each video based on a combination of modalities, i.e., speech, metadata, audio and visual. (Data: Creative Commons internet video, multiple languages, mostly English)

Rich Speech Retrieval: Given a set of queries and a video collection, participants are required to automatically identify relevant jump-in points into the videos based on a combination of modalities, i.e., speech, metadata, audio and visual. The task can be approached as a multimodal task, but also strictly as a speech search task. (Data: Creative Commons internet video, multiple languages, mostly English)

Spoken Web Search: This task involves searching for audio content within audio content using an audio content query. It is particularly interesting for speech researchers in the area of spoken term detection. (Data: Audio in four Indian languages -- English, Hindi, Gujarati and Telugu. Each of the ca. 400 data items is an 8 kHz audio file 4-30 seconds in length)

Affect Task: Violent Scenes Detection: This task requires participants to deploy multimodal features to automatically detect portions of movies containing violent material. Any features automatically extracted from the video, including the subtitles, can be used by participants. (Data: A set of ca. 15 Hollywood movies that must be purchased by the participants)

Social Event Detection: This task requires participants to discover events and detect media items related to either a specific social event or an event class of interest. By social events we mean events that are planned by people, attended by people, and whose social media are captured by people. (Data: A large set of URLs of videos and images together with their associated metadata)

Placing Task: This task involves automatically assigning geo-coordinates to Flickr videos using one or more of: Flickr metadata, visual content, audio content and social information. (Data: Creative Commons Flickr data, predominantly English language)

MediaEval is coordinated by the EU FP7 PetaMedia Network of Excellence (http://www.petamedia.eu) and by the ICT Labs of the EIT (http://eit.ictlabs.eu/), and is made possible by the many projects, institutions and researchers that contribute to the organization of the individual tasks. For more information on MediaEval, please contact Martha Larson (m.a.larson@tudelft.nl).