NOSSDAV '20: Proceedings of the 30th ACM Workshop on Network and Operating Systems Support for Digital Audio and Video

SR360: boosting 360-degree video streaming with super-resolution

  • Jiawen Chen
  • Miao Hu
  • Zhenxiao Luo
  • Zelong Wang
  • Di Wu

360-degree videos have gained increasing popularity due to their capability to provide
users with an immersive viewing experience. Given the limited network bandwidth, a
common approach is to stream only the video tiles in the user's Field-of-View (FoV)
at high quality. However, it is difficult to perform accurate FoV prediction due to diverse
user behaviors and time-varying network conditions. In this paper, we redesign
360-degree video streaming systems by leveraging the technique of super-resolution
(SR). The basic idea of our proposed SR360 framework is to utilize the abundant computation resources on user devices to trade
computation for a reduction in network bandwidth. In the SR360 framework, a low-resolution
video tile can be boosted to high resolution using SR techniques
at the client side. We adopt deep reinforcement learning (DRL) to make
a set of decisions jointly, including user FoV prediction, bitrate allocation, and
SR enhancement. By conducting extensive trace-driven evaluations, we compare the performance
of our proposed SR360 with other state-of-the-art methods, and the results show that
SR360 significantly outperforms them by at least 30% on average under different
QoE metrics.
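
The joint decision described above can be illustrated with a minimal rule-based stand-in for the learned policy. Everything below (the two-level bitrate ladder, the probability thresholds, the SR compute budget, and the `decide_tiles` helper) is a hypothetical sketch, not SR360's actual DRL agent, which learns these trade-offs from experience.

```python
# Hypothetical sketch of the per-chunk decision an SR360-style agent makes.
from dataclasses import dataclass

@dataclass
class TileDecision:
    tile_id: int
    bitrate_kbps: int      # bitrate requested from the server
    apply_sr: bool         # upscale the low-res tile on the client

def decide_tiles(fov_probs, bandwidth_kbps, sr_budget):
    """Rule-based stand-in for the learned policy: spend bandwidth on
    tiles likely to be viewed, and spend client-side SR compute on
    likely tiles that had to be fetched at low resolution."""
    HIGH, LOW = 8000, 2000                    # example bitrate ladder (kbps)
    decisions, spent, sr_used = [], 0, 0
    # Consider the most likely tiles first.
    for tile_id, p in sorted(enumerate(fov_probs), key=lambda t: -t[1]):
        if p > 0.5 and spent + HIGH <= bandwidth_kbps:
            decisions.append(TileDecision(tile_id, HIGH, False))
            spent += HIGH
        else:
            # Fetch low resolution; recover quality with SR if budget allows.
            use_sr = p > 0.3 and sr_used < sr_budget
            decisions.append(TileDecision(tile_id, LOW, use_sr))
            spent += LOW
            sr_used += use_sr
    return decisions

print(decide_tiles([0.9, 0.7, 0.2, 0.05], bandwidth_kbps=12000, sr_budget=1))
```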

Self-play reinforcement learning for video transmission

  • Tianchi Huang
  • Rui-Xiao Zhang
  • Lifeng Sun

Video transmission services adopt adaptive algorithms to meet users' demands. Existing
techniques are often optimized and evaluated with a function that linearly combines
several weighted metrics. Nevertheless, we observe that such a function fails to
describe the requirement accurately, so the resulting methods may eventually
violate the original needs. To eliminate this concern, we propose Zwei, a self-play reinforcement learning algorithm for video transmission tasks. Zwei
aims to update the policy by directly utilizing the actual requirement. Technically,
Zwei samples a number of trajectories from the same starting point and instantly
estimates the win rate w.r.t. the competition outcome, where the competition result
represents which trajectory is closer to the assigned requirement. Subsequently, Zwei
optimizes the strategy by maximizing the win rate. To build Zwei, we develop simulation
environments, design adequate neural network models, and devise training methods for
dealing with different requirements in various video transmission scenarios. Trace-driven
analysis over two representative tasks demonstrates that Zwei faithfully optimizes itself
according to the assigned requirement, outperforming the state-of-the-art methods
under all considered scenarios.
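
Zwei's central signal, the win rate among trajectories sampled from the same starting point, can be sketched in a few lines of Python. The rollout, the toy scoring, and the duel rule below are invented placeholders for illustration; the paper trains neural policies in full video transmission environments.

```python
import random

# Illustrative reduction of the self-play win-rate signal.
def rollout(policy_noise):
    # Toy trajectory score: pretend higher means closer to the requirement.
    return random.gauss(0.0, policy_noise)

def beats(traj_a, traj_b):
    """Competition outcome: which trajectory better satisfies the
    assigned requirement. Here: the larger toy score wins."""
    return traj_a > traj_b

def win_rates(trajs):
    n = len(trajs)
    return [sum(beats(t, o) for o in trajs if o is not t) / (n - 1)
            for t in trajs]

# Sample several trajectories from the same starting point and score them.
trajectories = [rollout(1.0) for _ in range(8)]
rates = win_rates(trajectories)
# A policy-gradient learner would now reinforce actions from trajectories
# with high win rate, instead of maximizing a hand-weighted QoE sum.
best = max(range(len(rates)), key=rates.__getitem__)
print(f"trajectory {best} wins {rates[best]:.0%} of its duels")
```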

FALCON: joint fair airtime allocation and rate control for DASH video streaming in software
defined wireless networks

  • Miguel Catalan-Cid
  • Daniel Camps-Mur
  • Mario Montagud
  • August Betzler

Software Defined Wireless Networks offer an opportunity to enhance the performance
of specific services by applying centralized mechanisms that make use of a global
view of the network resources. This paper presents FALCON, a novel solution that jointly
optimizes fair airtime allocation and rate recommendations for Server and Network
Assisted DASH video streaming, providing proportional fairness among the clients.
Since this problem is NP-hard, FALCON introduces a novel heuristic algorithm that
is shown to achieve near-optimal results in a practical amount of time. The performance
of FALCON is evaluated in conjunction with three reference Adaptive Bit Rate
strategies (PANDA, BOLA, and RobustMPC) in a simulated ultra-dense In-flight Entertainment
System scenario. The results show that FALCON provides significant benefits
by minimizing instability and buffer underruns while obtaining a fair video rate
and airtime allocation among clients, thus contributing to an enhanced Quality of
Experience.
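
As a rough illustration of the flavor of allocation FALCON targets, the sketch below computes proportional-fair airtime shares together with a per-client DASH rate recommendation. When per-client throughput is modeled as airtime share times PHY rate, equal shares maximize the sum of log-throughputs, hence the simple closed form. The bitrate ladder and the `falcon_like_allocation` helper are assumptions for illustration; FALCON's actual heuristic tackles a harder, NP-hard joint problem.

```python
import math

# Simplified, hypothetical joint airtime + DASH-rate allocation.
LADDER_KBPS = [1000, 2500, 5000, 8000]   # example DASH bitrate ladder

def falcon_like_allocation(phy_rates_kbps):
    n = len(phy_rates_kbps)
    # With throughput_i = share_i * phy_rate_i, maximizing
    # sum_i log(throughput_i) subject to sum_i share_i = 1 yields equal shares.
    shares = [1.0 / n] * n
    recs = []
    for share, phy in zip(shares, phy_rates_kbps):
        capacity = share * phy
        # Recommend the highest ladder rate the airtime share sustains.
        feasible = [r for r in LADDER_KBPS if r <= capacity]
        recs.append(feasible[-1] if feasible else LADDER_KBPS[0])
    pf_utility = sum(math.log(s * p) for s, p in zip(shares, phy_rates_kbps))
    return shares, recs, pf_utility

print(falcon_like_allocation([20000, 6000, 12000]))
```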

Evaluation of CMAF in live streaming scenarios

  • Tomasz Lyko
  • Matthew Broadbent
  • Nicholas Race
  • Mike Nilsson
  • Paul Farrow
  • Steve Appleby

HTTP Adaptive Streaming (HAS) technologies such as MPEG DASH are now used extensively
to deliver television services to large numbers of viewers. In HAS, the client requests
segments of content using HTTP, with an ABR algorithm selecting the quality at which
to request each segment, trading off video quality against the avoidance of stalling.
This introduces significant end-to-end latency compared to traditional broadcast,
because the client requires a large enough buffer for the ABR algorithm to react
to changes in network conditions in a timely manner. The recently standardised Common
Media Application Format (CMAF) has helped address the issue of latency by defining
segments as composed of independently transferable chunks. In this paper, we describe
a simulation model we have developed to evaluate the performance of four popular ABR
algorithms using DASH and CMAF in various low latency live streaming scenarios. Realistic
network conditions are used for the evaluation, which are based on throughput data
taken from the CDN logs of a commercial live TV service. We quantify the performance
of the ABR algorithms using a selection of QoE metrics, and show that CMAF can significantly
improve ABR performance in low delay scenarios.
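
The latency benefit of chunking can be seen with back-of-the-envelope arithmetic: a client cannot fetch media before it is packaged, so segment-based delivery sits at least one full segment behind live, while chunked CMAF delivery needs to wait for only one chunk (and can sustain a smaller buffer, since throughput is observable per chunk). The numbers below are illustrative, not taken from the paper's CDN traces.

```python
# Back-of-the-envelope model of live latency for segment vs. chunk delivery.
def min_latency_behind_live(segment_s, buffer_target_s, chunk_s=None):
    """A client cannot fetch media before it is packaged: plain DASH
    waits for a whole segment, CMAF only for one chunk. The client's
    buffer target adds on top in both cases."""
    packaging_wait = chunk_s if chunk_s is not None else segment_s
    return packaging_wait + buffer_target_s

# 4 s segments: classic DASH with a 3-segment buffer vs. CMAF with
# 0.5 s chunks and a 1 s buffer made possible by chunked transfer.
print(min_latency_behind_live(4.0, buffer_target_s=12.0))              # 16.0 s
print(min_latency_behind_live(4.0, buffer_target_s=1.0, chunk_s=0.5))  # 1.5 s
```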

Low-latency cloud-based volumetric video streaming using head motion prediction

  • Serhan Gül
  • Dimitri Podborski
  • Thomas Buchholz
  • Thomas Schierl
  • Cornelius Hellge

Volumetric video is an emerging key technology for immersive representation of 3D
spaces and objects. Rendering volumetric video requires substantial computational power,
which is especially challenging for mobile devices. To mitigate this, we developed
a streaming system that renders a 2D view from the volumetric video at a cloud server
and streams the resulting 2D video to the client. However, such network-based processing
increases the motion-to-photon (M2P) latency due to the additional network and processing
delays. To compensate for the added latency, prediction of the future user pose
is necessary. We developed a head motion prediction model and investigated its potential
to reduce the M2P latency for different look-ahead times. Our results show that the
presented model reduces the rendering errors caused by the M2P latency compared to
a baseline system in which no prediction is performed.
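
A common baseline for such look-ahead prediction is constant-rate extrapolation of the head orientation. The sketch below implements that baseline; the function name and input shapes are assumptions, and the authors' model is more sophisticated than a linear fit.

```python
import numpy as np

# Constant-angular-velocity baseline for head motion prediction.
def predict_pose(yaw_pitch_roll_hist, timestamps, lookahead_s):
    """Linearly extrapolate each Euler angle lookahead_s seconds ahead
    from a short window of past (timestamp, angle) samples."""
    t = np.asarray(timestamps)
    hist = np.asarray(yaw_pitch_roll_hist)        # shape (N, 3), degrees
    t_future = t[-1] + lookahead_s
    pred = []
    for k in range(3):
        slope, intercept = np.polyfit(t, hist[:, k], deg=1)
        pred.append(slope * t_future + intercept)
    return np.array(pred)

hist = [[0.0, 0.0, 0.0], [2.0, 0.5, 0.0], [4.1, 1.0, 0.1]]
print(predict_pose(hist, [0.0, 0.05, 0.10], lookahead_s=0.3))
```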

Viewport prediction for 360° videos: a clustering approach

  • Afshin Taghavi Nasrabadi
  • Aliehsan Samiei
  • Ravi Prakash

An important component of viewport-adaptive streaming of 360° videos is viewport
prediction. Increasing the viewport prediction horizon enables the client to prefetch
more chunks into the playback buffer, and a longer buffer results in less rebuffering
under fluctuating network conditions. We analyzed the recorded viewport traces of
viewers who watched various 360° videos. We propose a clustering-based viewport prediction
method that incorporates viewport pattern information from previous video streaming
sessions. For several videos, specifically those with a well-defined region of interest,
the proposed approach increases the viewport prediction horizon and/or prediction
accuracy.
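
The core mechanism can be sketched as follows: match the viewer's observed trajectory prefix against cluster centroids built from past sessions, then return the matched centroid's continuation as the prediction. The array shapes, the Euclidean matching, and the yaw-only traces below are simplifying assumptions.

```python
import numpy as np

# Sketch of cluster-based viewport prediction over yaw-only traces.
def predict_from_clusters(centroids, observed_prefix, horizon):
    """centroids: (C, T) mean yaw trajectories from past sessions.
    Match the viewer's observed prefix to the nearest centroid and
    return that centroid's continuation as the prediction."""
    p = len(observed_prefix)
    dists = np.linalg.norm(centroids[:, :p] - observed_prefix, axis=1)
    best = int(np.argmin(dists))
    return centroids[best, p:p + horizon]

centroids = np.array([np.linspace(0, 90, 20),      # pan-right cluster
                      np.zeros(20)])               # static-ROI cluster
print(predict_from_clusters(centroids, np.array([0.0, 4.5, 9.5]), horizon=5))
```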

Sensing multimedia contexts on mobile devices

  • Mohammad A. Hoque
  • Ashwin Rao
  • Abhishek Kumar
  • Mostafa Ammar
  • Pan Hui
  • Sasu Tarkoma

We use various multimedia applications on smart devices to consume multimedia content,
to communicate with our peers, and to broadcast our events live. This paper investigates
the utilization of different media input/output devices, e.g., camera, microphone,
and speaker, by different types of multimedia applications, and introduces the notion
of multimedia context. Our measurements lead to a sensing algorithm called MediaSense, which senses the
states of multiple I/O devices and identifies eleven multimedia contexts of a mobile device in real time. The algorithm distinguishes
stored content playback from streaming, live broadcasting from local recording, and
conversational multimedia sessions from GSM/VoLTE calls on mobile devices.
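
A heavily simplified illustration of the idea is a rule table mapping sensed I/O and network states to a context label. The rules below are invented for illustration only; MediaSense senses richer state and reliably distinguishes eleven contexts in real time.

```python
# Toy mapping from sensed I/O device states to a coarse multimedia context.
def classify_context(mic_active, speaker_active, camera_active,
                     network_up, network_down):
    if mic_active and speaker_active and network_up and network_down:
        return "conversational session (e.g., VoIP call)"
    if camera_active and mic_active and network_up:
        return "live broadcasting"
    if camera_active and mic_active:
        return "local recording"
    if speaker_active and network_down:
        return "streaming playback"
    if speaker_active:
        return "stored content playback"
    return "idle"

print(classify_context(mic_active=False, speaker_active=True,
                       camera_active=False, network_up=False,
                       network_down=True))
```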

PC-MCU: point cloud multipoint control unit for multi-user holoconferencing systems

  • Gianluca Cernigliaro
  • Marc Martos
  • Mario Montagud
  • Amir Ansari
  • Sergi Fernandez

This paper introduces the Point Cloud Multipoint Control Unit (PC-MCU): a key component
for multi-user holoconferencing systems, where remote participants are represented
as Point Clouds. The presented solution redefines the idea of MCU, broadly used to
optimize connections and communications between users in traditional videoconferencing,
and introduces a set of key features for the optimization of holoconferencing services
where multiple users can be remotely connected. The PC-MCU is a virtualized cloud-based
component that aims to reduce end-user computational resource and bandwidth
usage by providing the following key features: fusion of volumetric videos, Level of
Detail (LoD) adjustment, and non-visible data removal. The results obtained for a scenario
with two remote users show that the introduction of the PC-MCU provides significant
benefits in terms of computational resource and bandwidth savings, thus alleviating
the requirements at the client side in holoconferencing services compared to
a baseline condition without it. These improvements open the door to further
research in this area to enable scalable and adaptive holoconferencing services using
lightweight devices.
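
Two of the listed features admit compact sketches: LoD adjustment via voxel downsampling, and a crude form of non-visible data removal that drops points behind the camera plane. Both functions below are illustrative stand-ins; the actual component also fuses multiple users' volumetric videos and uses proper visibility tests.

```python
import numpy as np

def voxel_downsample(points, voxel_size):
    """Keep one point per occupied voxel to adjust the level of detail."""
    keys = np.floor(points / voxel_size).astype(np.int64)
    _, first_idx = np.unique(keys, axis=0, return_index=True)
    return points[np.sort(first_idx)]

def cull_non_visible(points, cam_pos, cam_forward):
    """Drop points behind the camera plane (crude visibility removal)."""
    fwd = cam_forward / np.linalg.norm(cam_forward)
    visible = (points - cam_pos) @ fwd > 0.0
    return points[visible]

pts = np.random.rand(10000, 3)
lod = voxel_downsample(pts, voxel_size=0.1)
vis = cull_non_visible(lod, cam_pos=np.array([0.5, 0.5, -1.0]),
                       cam_forward=np.array([0.0, 0.0, 1.0]))
print(len(pts), "->", len(lod), "->", len(vis))
```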

LiveClip: towards intelligent mobile short-form video streaming with deep reinforcement learning

  • Jianchao He
  • Miao Hu
  • Yipeng Zhou
  • Di Wu

Recent years have witnessed the great success of mobile short-form video apps. However,
most current video streaming strategies are designed for long-form videos and cannot
be directly applied to short-form videos. In particular, short-form videos differ in
many aspects, such as shorter video length, mobile friendliness, and sharp popularity
dynamics. Facing these challenges, in this paper we perform an in-depth
measurement study on Douyin, one of the most popular mobile short-form video platforms in China. The measurement
study reveals that Douyin adopts a rather simple strategy (called the Next-One strategy) based on HTTP progressive download, which uses a sliding window with a stop-and-wait
protocol. Such a strategy performs poorly when the network connection is slow and user
scrolling is fast. The results motivate us to design an intelligent adaptive streaming
scheme for mobile short-form videos. We formulate the short-form video streaming problem
and propose an adaptive short-form video streaming strategy called LiveClip using a deep reinforcement learning (DRL) approach. Trace-driven experimental results
prove that LiveClip outperforms existing state-of-the-art approaches by around 10%-40%
under various scenarios.
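
As reconstructed from the abstract's description, the Next-One strategy behaves roughly like the toy model below: progressive download with a stop-and-wait window that prefetches at most the single next video. The timings and the stall count are illustrative, not measured values.

```python
# Toy model of the "Next-One" baseline observed in the measurement study.
def next_one_player(feed, watch_time_s, download_time_s):
    """Count how often the viewer must wait because the video on screen
    has not been downloaded yet."""
    downloaded, stalls = set(), 0
    for i, video in enumerate(feed):
        if video not in downloaded:
            stalls += 1                     # viewer waits for the download
            downloaded.add(video)
        # While video i plays, at most the next one is prefetched, and
        # only if it finishes within the watch time (stop-and-wait).
        if watch_time_s >= download_time_s and i + 1 < len(feed):
            downloaded.add(feed[i + 1])
    return stalls

# Fast scrolling (1 s watch, 3 s download) defeats single-video prefetch.
print(next_one_player(["a", "b", "c", "d"], watch_time_s=1, download_time_s=3))
```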

What you see is what you get: measure ABR video streaming QoE via on-device screen recording

  • Shichang Xu
  • Eric Petajan
  • Subhabrata Sen
  • Z. Morley Mao

Analyzing delivered QoE for Adaptive Bitrate (ABR) streaming over cellular networks
is critical for a host of entities including content providers and mobile network
providers. However, existing approaches mostly rely on network traffic analysis. In
addition to potential accuracy issues, they are challenged by the increasing use of
end-to-end network traffic encryption. In this paper, we explore a very different
approach to QoE measurement --- utilizing the screen recording capability widely available
on commodity devices to record the video displayed on the mobile device screen, and
analyzing the recorded video to measure the delivered QoE. We design a novel system
VideoEye to conduct such screen-recording-based QoE analysis. We identify the various technical
challenges involved, including distortions introduced by the screen recording process
that can make such analysis difficult. We develop techniques to accurately measure
video QoE from the screen recordings even in the presence of recording distortions.
Our evaluations demonstrate that VideoEye accurately detects important QoE indicators
including the track played at different points in time and stall statistics. The
maximum error in detected stall duration is 0.5 s, and the accuracy of detecting
the displayed tracks exceeds 97%.
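
One of the reported QoE indicators, stalls, can be approximated from a screen recording by flagging long runs of near-identical frames. The sketch below is a toy version under that assumption; the real system must additionally reject distortions introduced by the recording process itself.

```python
import numpy as np

# Toy stall detector: flag runs of (near-)identical consecutive frames.
def detect_stalls(frames, fps, diff_thresh=1.0, min_stall_s=0.5):
    """frames: (N, H, W) grayscale screen-recording frames."""
    stalls, run = [], 0
    for i in range(1, len(frames)):
        diff = np.abs(frames[i].astype(float) - frames[i - 1]).mean()
        run = run + 1 if diff < diff_thresh else 0
        # Report a stall once the frozen run first reaches the threshold.
        if run / fps >= min_stall_s and (run - 1) / fps < min_stall_s:
            stalls.append(i / fps)
    return stalls

video = np.random.randint(0, 255, (90, 8, 8))
video[30:60] = video[30]                    # inject a 1-second freeze
print(detect_stalls(video, fps=30))
```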