Over the last several years, IP Multicast has been incrementally deployed in the Internet by building a virtual network, embedded in the unicast Internet, called the Multicast Backbone or ``MBone''. In parallel with the deployment of this new multicast network, the MBone research community developed a number of applications for multimedia conferencing [23, 8, 19, 16, 13, 12] that each exploit the simple, efficient, and elegant IP multicast service model and its underlying routing mechanism. These tools, collectively known as the ``MBone tools,'' encapsulate real-time, digital media streams into packets using the Real-time Transport Protocol (RTP) and multicast these packets on the network by simply sending them to a multicast group address. Receivers interested in a particular transmission simply ``tune in'' by subscribing to the multicast group in question. This loosely coupled, light-weight, real-time multimedia communication model is known as the Light-weight Sessions architecture [11].
Although the MBone tools can be used for a variety of communication styles (e.g., point-to-point internet phone calls, small group design collaborations or research meetings, or seminar distribution to large groups with tightly coupled feedback from the remote audience), a non-interactive, passive style of communication has emerged as the dominant model for MBone sessions. Here a single source ``broadcasts'' a signal to the MBone in a one-to-many distribution where interaction or feedback from the session participants rarely, if ever, occurs. For example, the regular, ongoing MBone broadcasts of university seminars, conference talks, and the NASA shuttle missions almost never entail multimedia flows from any but the primary source. While this style of transmission is adequate for much of the MBone content, it does not meet the requirements of close-knit collaboration, where the focus of interest may shift rapidly from site to site as the discussion moves about the distributed conference.
We believe this phenomenon -- the rarity of large-scale interactive MBone sessions -- is due not to the inherent nature of the MBone and the audio/video conferencing tools, but instead results from the lack of adequate tools and control protocols to make tightly knit collaborations effective across the wide area. In this paper, we focus on one particularly critical barrier to richer collaborations: because session bandwidth is currently allocated in a fixed and inflexible fashion, and because users must manually configure, enable, and disable transmission of their video signals, collaborations using the MBone tools tend to consist of a small number of video streams running at a constant transmission rate (typically the full amount of advertised session bandwidth). This is exactly the wrong model for effective human-to-human communication -- people interact by shifting their focus from individual to individual, not by simultaneously listening to and looking at a large number of other individuals. We believe that applications should reflect this model by automatically and dynamically shifting the allocation of bandwidth among session participants as the focus in a distributed conference moves about. This dynamic model not only supports effective collaboration but also saves network resources, since not every source needs to generate and transmit a continuous video stream.
In essence, we propose that media sources adapt their transmission rates to meet the collective preferences of the receivers in a multicast session. But this style of adaptation is only one piece of the rate adaptation problem; another important goal of adaptation is to accommodate and avoid network congestion by controlling the aggregate rate of traffic injected into the network across all senders. While several solutions for rate-adaptive audio/video have been proposed [3, 4, 17, 5, 24], none explicitly accounts for receiver interest in the adaptation process. In fact, existing algorithms adjust the rate of each media source independently and treat the impact of competing flows on a given flow's control algorithm simply as background measurement noise.
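We claim that these existing adaptation schemes can be greatly improved by augmenting them with receiver feedback to constrain the media adaptation process. To this end, we decompose the media rate control process into two complementary and orthogonal mechanisms: (a) a rate-adaptation mechanism that adjusts transmission rates to accommodate network congestion, and (b) a feedback mechanism that reflects receiver interest back to the media sources.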
Although the research community has proposed a wide variety of rate-adaptation mechanisms, to our knowledge, no scheme for reflecting receiver interest has yet been proposed, and thus only part (a) of our two-part problem has been solved. To address part (b), we have developed a scalable, light-weight, and tunable feedback protocol that reflects receiver interest back to the media sources. We term the combination of rate adaptation with receiver interest ``intelligent adaptation'' and we call our overall scheme Scalable ConsensUs-based Bandwidth Allocation or SCUBA because, roughly speaking, it employs receiver ``consensus'' to allocate session bandwidth among sources.
We created two variants of SCUBA -- a ``flat delivery'' variant to complement sender-based adaptation and a ``layered delivery'' variant to complement receiver-based adaptation. In the flat delivery model, we use receiver feedback to constrain the rate chosen by the sender-based adaptation algorithm, while in the layered delivery model, we use the feedback to control the manner in which source signal layers are mapped onto network channels. Since the layers in a hierarchical distribution are ordered according to their relative importance, we can prioritize the ensemble of sources by controlling where each source falls within the hierarchy. In both the flat and layered delivery models, the feedback signal is identical and thus a single protocol can be shared across the two schemes. Even in the simple case where applications do not adapt (as is commonly the case for MBone broadcasts), SCUBA is still useful -- its flat-delivery variant will effectively manage the session's fixed bandwidth by dynamically adjusting each source's rate to reflect receiver interest.
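To make the flat-delivery idea concrete, the following is a minimal sketch of how a fixed session bandwidth might be divided among sources according to receiver interest. It is an illustration under stated assumptions, not the SCUBA algorithm itself: the function name `allocate_flat`, the per-receiver normalization of votes, the proportional split, and the `floor_frac` reservation that keeps every source minimally visible are all hypothetical choices made for this example.

```python
def allocate_flat(session_bw_kbps, sources, receiver_votes, floor_frac=0.1):
    """Split a fixed session bandwidth among sources by receiver interest.

    Hypothetical sketch: each receiver's votes are normalized so that every
    receiver contributes one unit of interest in total, a small fraction of
    the bandwidth is reserved and shared equally, and the remainder is
    divided in proportion to the summed interest.
    """
    if not sources:
        return {}

    # Accumulate normalized interest per source across all receivers.
    interest = {src: 0.0 for src in sources}
    for weights in receiver_votes.values():
        known = {s: w for s, w in weights.items() if s in interest and w > 0}
        total = sum(known.values())
        if total == 0:
            continue  # this receiver expressed no usable preference
        for s, w in known.items():
            interest[s] += w / total

    total_interest = sum(interest.values())
    if total_interest == 0:
        # No feedback at all: fall back to an equal split.
        return {s: session_bw_kbps / len(sources) for s in sources}

    reserved = floor_frac * session_bw_kbps   # shared equally by all sources
    floor = reserved / len(sources)
    shared = session_bw_kbps - reserved       # divided by interest
    return {s: floor + shared * interest[s] / total_interest for s in sources}


if __name__ == "__main__":
    votes = {
        "r1": {"A": 1},
        "r2": {"A": 1, "B": 1},
        "r3": {"A": 2},
    }
    for src, rate in allocate_flat(128, ["A", "B", "C"], votes).items():
        print(f"{src}: {rate:.1f} kb/s")
```

In this sketch the source that most receivers are watching receives most of the session bandwidth, while a source that nobody has selected still keeps a small share, so the per-source rates always sum to the fixed session bandwidth.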
One of the key advantages of receiver-based multicast over its sender-based counterpart is that the burden of adaptation is moved from the source to the receivers, thus enhancing the scalability of the system. One could argue that we could apply this same technique to the receiver-interest problem and eliminate SCUBA altogether. That is, receivers could explicitly tell the network which sources they are interested in (via multiple multicast groups or source-based prunes) and the network would automatically eliminate unwanted flows from the corresponding regions of the network. If no receiver expressed interest in a given source, the flow would be pruned back all the way to the originating host. While simple and elegant, this approach suffers from two drawbacks. First, the level of interest must be binary (i.e., a receiver must ask for a source either at full quality or not at all). Second, the scheme would share bandwidth inefficiently because it could not account for the constraints of shared paths. SCUBA instead enables sources to share bandwidth intelligently because the receivers convey their interest back to the sources and thus effectively agree upon the best way to share the limited bottleneck capacities. We therefore believe a protocol like SCUBA is a necessary and important mechanism for rate-adaptive multimedia applications.
The rest of this paper describes SCUBA in detail, presents analysis to show that the protocol is viable and scalable, and demonstrates the utility of SCUBA in the context of several real applications. In the next section, we introduce and describe the basic operation of the SCUBA protocol. We then analyze the scalability of SCUBA and, in particular, derive confidence bounds on the algorithm's convergence time. Next we describe the deployment of SCUBA in the context of several applications. Finally, we summarize the status of SCUBA, present our plans for future work, and conclude.