SIGMM Inaugural Workshop on Multimedia Frontiers

- with Invited Presentations by Multimedia Rising Stars

Brisbane, Australia
October 26, 2015, 9am – 5pm
In celebration of the rising leadership of the multimedia community, we are launching a new SIGMM Workshop on Multimedia Frontiers, a prestigious event exclusively reserved to highlight invited talks by rising stars who have received their PhD degrees within the last 10 years and demonstrated exceptional potential in multimedia research. The workshop will feature oral presentations by the rising stars, each 20-30 minutes long, followed by comments from senior leaders in the related fields. Our goal is to use this event to recognize the outstanding research achievements made by the rising members of SIGMM and, at the same time, to let them share their exciting vision with the broad community. We expect that the dialogs ignited among the rising stars, the senior members, and the community at large will help shape the direction of the multimedia community and inspire new ideas within it.
The workshop will be held on October 26, 2015, one day prior to the SIGMM flagship conference, the ACM Multimedia Conference. This event will be freely available to all participants in the ACM Multimedia Conference. We sincerely invite everyone interested in the emerging trends and frontiers of multimedia to join us in this inaugural endeavor!
9:00 am – 9:10 am
Opening Remarks
Shih-Fu Chang, Columbia University (SIGMM Chair)
9:10 am – 10:40 am
1. Cees Snoek, University of Amsterdam and Qualcomm
Title: What objects tell about actions
This talk considers the problem of automatic classification and localization of human actions in video. Whereas motion is the key ingredient in modern approaches, we assess the benefits of having objects in the video representation. Rather than considering a handful of carefully selected and localized objects, we study the benefit of encoding 15,000 object categories for action recognition. We show that objects matter for actions, and are often semantically relevant as well. We establish that actions have object preferences and that the object-action relations are generic. We show how objects can be leveraged to recognize and localize actions in video without the need for any video or action example.
Moderated by Shuicheng Yan, National University of Singapore
2. Cheng-Hsin Hsu, National Tsing Hua University
Title: Minions in the Cloud and Crowd: Fog Computing for Multimedia
Minions in the cloud hide in far-away data centers and are unfriendly to multimedia applications. The fog-computing paradigm pushes minions toward edge networks; we adopt a generalized definition, where minions get into heterogeneous devices owned by the crowd. Managing the minions in the fog is challenging: for example, end devices may move into wireless dead zones, run out of battery, or be turned off, leading to severe uncertainty. In this talk, we share our experience in utilizing resources from the crowd to optimize multimedia applications. The lessons learned shed some light on the optimal design of a fog platform for multimedia applications.
Moderated by Klara Nahrstedt, University of Illinois Urbana-Champaign
Title: Deriving Knowledge from Audio and Multimedia Data
Today's world is filled not only with cameras, but also with microphones, listening to us -- and to the environment we live in. This talk presents results and lessons learned from my research on extracting information from environmental audio and video data using scalable acoustic recognition methods. The research I will present is mainly focused on multimedia retrieval, but the underlying environmental audio recognition methods are being applied to robotics, autonomous vehicles and cell phones.
Moderated by Abdulmotaleb El Saddik, University of Ottawa
10:40 am – 11:00 am
11:00 am – 12:30 pm
4. Hervé Jégou, Facebook AI Research
Title: Indexing huge collections of media descriptors: a practitioner point of view
In this talk, I discuss the past, present and future of similarity search for indexing large collections of media content. I will first discuss the trends that have paved the way to our modern query-by-content search engines. Then I will give a personal view of the current trends on this topic. Finally, I will give some guidelines to other practitioners and attempt to anticipate the future of this field of research.
Moderated by Shih-Fu Chang, Columbia University
5. Kuan-Ta Chen, Academia Sinica
Title: Games on Demand: Are We There Yet?
Games on demand, a.k.a. cloud gaming, refers to a new way to deliver computer games to users, where computationally complex games are executed and rendered on powerful cloud servers rather than local computing devices. In this talk, I will give an overview of the challenges in developing cloud gaming systems, what we have done, and what remains to be done. I will start with GamingAnywhere, an open-source cloud gaming system, followed by a number of studies based on the system. Finally, I will conclude the talk with open issues in providing a highly real-time multimedia experience with high-definition audio/visual quality (e.g., in the form of gaming and virtual reality).
Moderated by Wei Tsang Ooi, National University of Singapore
6. Peng Cui, Tsinghua University
Title: Social-Sensed Multimedia Computing
The ultimate goal of multimedia computing is to deliver multimedia content to users according to their information needs (intentions). However, how to bridge multimedia content with end users, the last-mile technology for multimedia services, is rarely researched. This neglect directly causes an obvious Intention Gap between multimedia data and the real information needs of users, which has become a bottleneck in advancing intelligent multimedia computing technologies for use in real applications. Here, we propose a new multimedia computing paradigm, social-sensed multimedia computing, to glue together all the recent works that bring social media, a valuable source for sensing user needs and social knowledge, into the loop of multimedia computing. This talk aims at: 1) reviewing and summarizing recent high-quality research works on social-sensed multimedia computing, including basic technologies and applicable systems, and 2) presenting insight into the challenges and future directions in this emerging and promising area.
Moderated by Ramesh Jain, University of California Irvine
12:30 pm – 2:00 pm
Lunch Break
2:00 pm – 3:30 pm
7. Lexing Xie, Australian National University
Title: An Anatomy of Social Media Popularity: from large-scale measurements to Hawkes intensity processes
How did a video go viral? Or will it go viral, and when? These are some of the most intriguing yet difficult questions in social media analysis. This talk will cover a few recent results from my group on understanding and predicting popularity, especially for YouTube videos. I will start by describing a unique longitudinal measurement study on video popularity history, and introduce popularity phases, a novel way to describe the evolution of popularity over time. I will then discuss a physics-inspired stochastic model that connects exogenous stimuli and endogenous responses to explain and forecast popularity. With such novel representation and new models, we can correlate video content type to popularity patterns, make better predictions, describe the endo-exo factors driving popularity, and forecast the effects of promotion campaigns.
Moderated by Nicu Sebe, University of Trento
8. Pradeep Atrey, University at Albany
Title: Security and Privacy Issues in Multimedia Systems
In this talk, I will first highlight the security and privacy issues in multimedia systems used in various applications such as homeland security surveillance, social media, and medical imaging, and then discuss some of my recent research contributions related to secure cloud-based multimedia analytics and privacy-aware surveillance and social networking. Finally, I will present the open challenges in this area.
Moderated by Ralf Steinmetz, Technische Universität Darmstadt
Title: Utilizing human signals in interactive and AI applications
My research focuses on Human-centered computing, which aims to integrate the human/user within the computational loop of Artificial Intelligence (AI) systems. I am interested in developing interactive/AI applications that utilize human signals or behavioural cues. Examples include (i) analysing pose, speech and proxemics-related cues in interactive settings to predict F-formations (geometric formations produced by interacting persons) and personality traits of interactors, (ii) using eye movements for scene understanding and object recognition, and (iii) using physiological signals (heart rate, skin conductance, EEG and facial movements) for affect and personality recognition. I will briefly describe my research related to each of the above during my talk.
Moderated by Alberto Del Bimbo, University of Florence
3:30 pm – 3:50 pm
3:50 pm – 5:20 pm
10. Vivek Singh, Rutgers University
Title: Sensing, Understanding, and Shaping Human Behavior
Today, more than a trillion multimodal data points are mediating human interactions in various social settings. These data can be used to understand social behavior at scales and resolutions not possible before, and at the same time bring up critical privacy challenges. For example, I discuss how multimodal interaction data (call logs, Bluetooth, SMS, apps, surveys) can be used to automatically detect ‘trusted’ ties in social networks, which in turn mediate behavior change in well-being scenarios. At the same time, such data necessitate newer ways to measure privacy, and pave the way for social mechanisms to ‘nudge’ user privacy behavior.
Moderated by Tat-Seng Chua, National University of Singapore
11. Xavier Alameda-Pineda, University of Trento
Title: Multimodal Automatic Analysis of Group Behavior
Studying free-standing conversational groups (FCGs) in unstructured social settings (e.g., a cocktail party) is gratifying due to the wealth of information available at the group (mining social networks) and individual (recognizing native behavioral and personality traits) levels. However, analyzing social scenes involving FCGs is also highly challenging due to the difficulty in extracting behavioral cues. To this end, we introduce SALSA, a novel dataset facilitating multimodal and Synergetic sociAL Scene Analysis. SALSA records social interactions among 18 participants in a natural, indoor environment for over 60 minutes, characterized by low-resolution images, lighting variations, numerous occlusions, reverberations and interfering sound sources. The social interplay was recorded using four static surveillance cameras and sociometric badges worn by each participant, comprising microphone, accelerometer, Bluetooth and infrared sensors.
Moderated by Dick Bulterman, FxPal
12. Yu-Gang Jiang, Fudan University
Title: Content Recognition and Copy Detection in Big Video Data
With the explosive growth of videos on the Web, there is a strong need for techniques that automatically analyze big video data. In this talk, I will discuss two important problems in this area: (1) high-level video event recognition and (2) video copy detection. I will first survey a few representative benchmarks and algorithms for both problems, and then present several benchmarks and algorithms developed in my group. Finally, I will show a few demos and share views about promising future directions.
Moderated by Alan Smeaton, Dublin City University