We analyze the delivery of an object-oriented multimedia content model, namely MHEG-5 (Multimedia and Hypermedia Experts Group), for interactive multimedia applications in a DAVIC-compliant ADSL access network using low-cost, memory-constrained set top units. We present detailed latency budgets for MPEG-2 DSM-CC-based transactions including STU configuration, engine download, and application and scene activation. We use simulation and analysis to assess the tradeoffs between memory management and application response time. We describe our implementation of a subset of DSM-CC and MHEG-5, and report latency measurements for presenting MHEG objects. The results are formulated as graphs which can be used by an application designer when encoding an MHEG application.
Key words: MHEG; DAVIC; MPEG-2 DSM-CC; interactive television; video-dial-tone.
We analyze the delivery of an object-oriented multimedia content
model, ISO MHEG-5 (hereafter simply "MHEG"), for an
interactive television environment (Cossmann et al. 1996, Furht
et al. 1995) using an ADSL (Saarela 1995) (Asymmetric Digital
Subscriber Line) access network and conforming to the recently
defined DAVIC (DAVIC 1995) (Digital Audio Visual Interactive Council)
1.0 specification. Under the DAVIC 1.0 specification, each Set
Top Unit (STU) provides either a built-in or downloaded MHEG engine
to interpret and execute interactive multimedia applications.
MHEG objects are stored at the service provider system and are
accessed by the STU using the MPEG-2 DSM-CC (DSM-CC 1995) protocol.
MHEG objects are transferred from the service provider to the
STU using MPEG-2 transport system packets.
The object-oriented representation of MHEG-5 provides for incremental delivery of interactive applications, an important capability in networked environments. A distinctive feature of the MHEG object model is its inclusion of an object state definition that an application designer can use to control the pre-fetching of MHEG objects. Using a two-phase activate-run sequence, each MHEG application has some control over when an object is loaded into STU memory prior to its presentation to the user. Additionally, using other state controls, an MHEG application has some control over when an object that is no longer active is flushed from STU memory. However, these features must be statically specified in a given encoding and cannot be adjusted at run time based on environmental factors such as the STU memory size; thus MHEG applications optimized for different memory sizes must be separately encoded. Despite their importance, the performance implications of the MHEG object state controls have not previously been evaluated, nor have any performance characterizations of MHEG delivery in DAVIC networks been published.
In this paper we present an analysis of the MHEG object state controls for networked delivery of MHEG applications in a particular configuration of the DAVIC end-to-end model. We assume an ATM service network and an ADSL access network. This configuration is considered to be practical for network providers with significant amounts of twisted pair physical wiring in the local loop. Since the control channel in ADSL is relatively low bandwidth, this configuration is also interesting because it provides a lower bound on what can be expected for MHEG application performance in different DAVIC network environments.
The analysis is presented in two parts. In the first part we show a detailed breakdown of the delays for the basic STU transactions (configuration, engine download, application activation, and scene activation). We provide tables which show, for each DSM-CC message in a given transaction, the associated delay at each step, exclusive of application processing. Application processing is more difficult to predict since functions such as accounting, service authorization, and security may be involved and are system and implementation dependent. Nevertheless, these delays provide a lower bound on response time at the STU, and collectively form a latency budget for the corresponding transaction. The transactions we analyze include user-to-network configuration, user-to-network session setup, user-to-user directory service, and user-to-user download.
In the second part we use the application and scene activation latency budget to analyze the tradeoff between STU memory size and application response time. Understanding this tradeoff is important for application designers who want to minimize response time for a given memory size. The tradeoff depends on various assumptions about the object composition of MHEG applications, and we show how variations in specific assumptions affect the behavior. In the conclusion of the paper we provide summary points that should be useful to an application designer in tailoring the objects in a presentation to meet response time requirements for a given STU memory configuration. Although DSM-CC supports transactions for stream content, we do not consider these in our latency budgets because we are interested in performance issues related to user interaction, which causes dynamic object activation.
The paper is organized as follows. Section two provides background information on the important standards used here and reviews related work, including recent interoperability experiments for DAVIC systems. Section three specifies the system parameters assumed in our analysis. Section four describes the end-to-end architecture and the protocol stacks assumed in our analysis. Section five presents the latency budgets and the analytical results evaluating application response time versus STU memory size. Section six describes the STU memory model, followed by a description of the implementation work done by the authors. Section seven presents our evaluation, followed by a section describing potential uses of these results. Finally, section ten concludes the paper.
DAVIC (Digital Audio Visual Interactive Council) is an industry consortium of about 250 companies formed to develop internationally adopted specifications for systems supporting applications such as interactive television and video on demand. DAVIC published its first specification in December 1995. This specification provides an end-to-end definition that is built on a number of existing network and coding standards, including MPEG-2 Video, Systems, and DSM-CC, as well as MHEG.
MHEG (Multimedia and Hypermedia Experts Group: Coding of Multimedia and Hypermedia Information) (Effelsberg 1995, Gopal 1995, MHEG 1995, Price 1993) was developed for the delivery of interactive multimedia applications in a client-server architecture. MHEG uses an object composition model with two types of compositions: the application and the scene. In addition to specifying the object model, it also specifies a life-cycle model for the preparation, activation, deactivation, and destruction of objects. This life cycle plays a critical role in scene and application object activation latencies.
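To make the life-cycle model concrete, the following minimal Java sketch is our own simplified illustration of the four transitions named above; the class, state, and method names are hypothetical and are not taken from the MHEG-5 standard.

```java
// Simplified illustration (not standard MHEG-5 terminology) of the object
// life cycle: an object is prepared (fetched and decoded), activated
// (presented), deactivated, and finally destroyed (its storage released).
enum LifeCycleState { NOT_READY, PREPARED, ACTIVE, DESTROYED }

abstract class MhegObject {
    private LifeCycleState state = LifeCycleState.NOT_READY;

    // Preparation: fetch and decode the object so it can be activated quickly.
    void prepare() {
        if (state == LifeCycleState.NOT_READY) {
            load();                       // e.g. download and ASN.1-decode
            state = LifeCycleState.PREPARED;
        }
    }

    // Activation: present the object. An unprepared object must be prepared
    // first, which is why activation latency depends on what was pre-fetched.
    void activate() {
        prepare();
        present();
        state = LifeCycleState.ACTIVE;
    }

    void deactivate() {
        if (state == LifeCycleState.ACTIVE) state = LifeCycleState.PREPARED;
    }

    // Destruction: the object may now be flushed from STU memory.
    void destroy() { state = LifeCycleState.DESTROYED; }

    protected abstract void load();
    protected abstract void present();
}
```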
An application object is a container for ingredients such as visual, audible, interaction, link and action objects. These objects are shared by scenes and are activated based on scene behavior. A scene is an object which groups ingredients such as visual, audible, interaction, link and action objects for their coordinated presentation. In DAVIC, an application, and consequently all its scenes and related objects are delivered to the STU over various possible networks. The objects are downloaded to the client as requested by the engine, and the engine is responsible for decoding, interpreting, presenting and managing these objects.
Since MHEG engines are intended to be deployed on set top units, the MHEG object model provides for the specification of information such as caching and scene priorities. In addition, the MHEG model supports two mechanisms for addressing presentable content in scenes and applications: 1) content inclusion, and 2) content reference. In the first case media objects are embedded inside the MHEG scene or application object, and the lifetime of these media objects is the same as that of their container object. In the latter case a reference to the media object is maintained, and the lifetime of media objects depends on the memory model of the STU and the caching strategy employed by the MHEG engine.
MPEG-2 DSM-CC is a set of protocol specifications for managing and controlling MPEG-1 and MPEG-2 bit streams. These protocols allow an application to set up, tear down, and manage a network connection using User-Network (U-N) primitives, and to communicate between a client and a server using User-User (U-U) primitives. U-N primitives are defined as a series of messages to be exchanged among the client, network, and server. U-U primitives may use a Remote Procedure Call (RPC) protocol and may involve U-N messaging. DSM-CC may be carried as a stream within an MPEG-2 Transport Stream; alternatively, it may be carried over other delivery mechanisms, such as TCP or UDP.
Development of an interoperable end-to-end ITV system has been reported in Cossmann et al. (1996). This system, called Globally Accessible Services (GLASS), consists of clients, an application server, a video server, system management functionality, and gateways to services such as the World Wide Web (WWW), e-mail, and FAX. It is based on a non-DAVIC model with an MHEG-1 run-time engine on the STB. The results obtained in this experiment have influenced the standardization of MHEG-1, MHEG, and DAVIC. Their current system incorporates the DAVIC protocol suite and an MHEG run-time engine, and has been used in a recent DAVIC interoperability test at Columbia University (Columbia 1996). Both MHEG and media objects are transported in MPEG-2 Transport streams.
Applications Retrieving Multimedia Information Distributed over ATM (ARMIDA) (Columbia 1996) is another DAVIC test bed, developed at Centro Studi E Laboratori Telecomunicazioni, Torino, Italy. It implements some of the core components of the DAVIC architecture. The STB is a PC with an MPEG-2 transport packet decoder and MPEG-2 elementary stream decoders. The developers have also produced a visual MHEG editor for authoring MHEG applications.
Graphics Communication Laboratories (GCL) has developed DSM-CC and MHEG engine software compliant with DSM-CC, MHEG, and DAVIC. The DSM-CC implementation includes the Base, Access, Directory, File, and Stream interfaces. The MHEG engine implements a number of features but does not support streaming video or graphic objects. This software is available for evaluation on SunOS and Linux platforms.
None of these systems has reported performance data as yet. Some of these systems have been demonstrated primarily in ATM networks, where network bandwidth has not been a determining factor.
In this section we present network processing and MHEG object processing estimates based on previously published work and our own experimental work. These estimates are used in the next section, where we present a detailed breakdown of the processing steps in sample DSM-CC transactions from the transport layer down.
Clark et al. (1989) analyze TCP processing, and their results are the basis for our TCP/IP processing latencies at each stage of the architecture. They measured TCP processing overheads using logic analyzers and by instrumenting the UNIX kernel. Their measurements divide the overheads into two groups: costs incurred per byte and costs incurred per packet. The byte-level processing involves buffer copies and TCP checksum computation, while the packet-level costs include Ethernet driver processing, TCP+IP+ARP header processing, and operating system overhead. Their results are given in Table 1. These costs were computed on a 2 MIPS Sun-3/60.
Table 1 gives the instruction-count equivalents for their measurements; Clark et al. (1989) also report that the instruction costs for TCP/IP were similar across the different machines tested. Given an instruction count, we can therefore estimate TCP/IP processing times on the different machines involved in an end-to-end transaction of the DAVIC model.
Table 1. TCP/IP processing overheads measured by Clark et al. (1989) on a Sun-3/60, with equivalent instruction counts
Operation | Measured time | Instructions
Per byte: | |
User-system copy | 200us | 400
TCP checksum | 185us | 370
Network-memory copy | 386us | 772
Per packet: | |
TCP + IP + ARP protocols | 100us | 200
OS overhead | 240us | 480
Ethernet driver | 100us | 200
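As a rough illustration of how these instruction counts are used, the Java sketch below (our own, with hypothetical names) scales the per-MTU TCP/IP instruction counts listed in Tables 1 and 2 by a machine's MIPS rating to obtain a protocol-processing time estimate; the particular combination of counts summed here is an assumption for illustration, not the exact breakdown used in our tables.

```java
// Rough estimate of TCP/IP protocol-processing time for one MTU-sized packet,
// using the per-MTU instruction counts of Tables 1 and 2 and the MIPS rating
// of the machine doing the processing. Names and structure are illustrative.
public class TcpIpCostEstimate {

    // Instruction counts per MTU (from Tables 1 and 2).
    static final long USER_SYSTEM_COPY    = 400;
    static final long TCP_CHECKSUM        = 370;
    static final long NETWORK_MEMORY_COPY = 772;
    static final long TCP_IP_ARP          = 200;

    // Seconds to push one MTU through the TCP/IP path on a machine rated at
    // 'mips' million instructions per second.
    static double perMtuSeconds(double mips) {
        long instructions = USER_SYSTEM_COPY + TCP_CHECKSUM
                          + NETWORK_MEMORY_COPY + TCP_IP_ARP;
        return instructions / (mips * 1e6);
    }

    public static void main(String[] args) {
        System.out.printf("50 MIPS server: %.6f s per MTU%n", perMtuSeconds(50));
        System.out.printf("10 MIPS STU:    %.6f s per MTU%n", perMtuSeconds(10));
    }
}
```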
Lazarou et al. (1996) validate simulation models of TCP/IP over ATM, in which TCP/IP packets are broken into AAL5 segments before being broken into ATM cells. We used their processing measurements to compute AAL5 segmentation and re-assembly costs.
Since our aim is to provide lower bounds on end-to-end latency, we do not consider congestion and retransmission delays. In the ADSL-ATM network environment of interest in this paper, congestion could occur in the ATM network due to simultaneous service to many STUs, much of which could be streaming video. The ADSL network is point-to-point and would be dedicated for use by a single STU running a single application. The transactions we characterize occur when an MHEG application is active at an STU. During this period, all the traffic over the ADSL connection is application-related and is included in our latency estimates. Characterization of ATM network congestion during simultaneous streaming video and MHEG application sessions from many ADSL links involves many assumptions about applications and network configuration and is outside the scope of this paper.
Table 2 shows the estimated overhead for the different network components of the DAVIC architecture, including the service provider, the network access point, and the STU; these times do not include application-level processing. For the service provider we assume a server system of at least 50 MIPS; it should be a high-end machine since it will be servicing many simultaneous application requests. The TCP/IP protocol estimates in Table 2 are derived from those in Table 1, scaled for the higher MIPS rating of the server. The MPEG-2 transport packet encoding time in Table 2 is the time taken by the MPEG-2 transport packet encoder to encapsulate incoming data and produce a transport stream packet; it does not include MPEG-2 compression time, which is not of concern here. This time is based on the DiviCom MPEG-2 transport encoder (DiviCom 1996).
Table 2 also shows the estimated overhead for the network access host, which is responsible for session setup and management. For generality we assume that the network access host is a separate system from the service provider, an assumption consistent with other video dial-tone architectures; we again assume a high-end server system for this role. These estimated times relate to network processing and do not include application processing.
Finally, Table 2 also shows the estimated overhead for packet decoding and network processing at the STU. The STU is expected to be a low-cost machine, so we assume a lower MIPS rating. The MPEG-2 transport packet decode time is based on the LSI L64007 MPEG-2 Transport Decoder (LSI 1996).
MHEG applications are composed of clusters of objects called scenes. Each scene has a root scene object. All of the scene objects have a parent object called the application object. MHEG introduces a number of constraints about when the objects are to be delivered to the end system. One important constraint is that a component object for a given scene cannot be delivered without the corresponding scene object. An additional point is that the composition object model is quite flexible in that many different interactive presentations are available for a scene. Therefore, the complexity of the scene depends on the design of the presentation and there can be many variations in the number and type of MHEG objects needed to produce an interactive application.
Consequently, we have concluded that the questions regarding MHEG application response time versus MHEG memory cache size are highly dependent on the design of the presentation objects and the intentions of the presentation designer. In our previous experimental work we have created a number of interactive applications including the front end for an experimental ITV interface to retrieve video on demand and other related content. Depending on the media that are selected and the authoring tools that are used, different visual effects with a wide range of object encodings are possible. As a result we believe that the most useful information from the simulations that we have done can be communicated to the designers of MHEG applications as a comparison of MHEG object size versus response time versus caching. The MHEG application designer can then design a presentation based upon the object count implied by the target response time and available memory. These points are discussed further in a later section.
For simulating application, scene, and ingredient activation, we used a composition model in which the application object contains an initial set of ingredients that are shared across all scenes, and these ingredients contain references to media objects. In addition to the ingredients contained in the application object, every scene contains a set of ingredients which are activated upon scene activation. The assumptions made about the sizes of these objects are listed in Table 3. These sizes are not arbitrary; they are based on our experience in developing MHEG-1 authoring and conversion systems as well as the experiences reported by others during MHEG interoperability testing.
Table 2. Estimated per-stage network processing parameters
Parameter | Value
Service Provider |
System rating | 50 MIPS
TCP Max. Transfer Unit (MTU) size | 1460 bytes
Application to system buffer copy (per MTU) | 400 instructions
TCP checksum (per MTU) | 370 instructions
Network to system buffer copy (per MTU) | 772 instructions
TCP/IP/ARP protocol processing (per MTU) | 200 instructions
MPEG-2 Transport packet encoding | 0.00023 seconds
Network Access |
System rating | 50 MIPS
TCP MTU size | 1460 bytes
Application to system buffer copy (per MTU) | 400 instructions
TCP checksum (per MTU) | 370 instructions
Network to system buffer copy (per MTU) | 772 instructions
TCP/IP/ARP protocol processing (per MTU) | 200 instructions
Set Top Box |
System rating | 10 MIPS
TCP MTU size | 1460 bytes
Application to system buffer copy (per MTU) | 400 instructions
TCP checksum (per MTU) | 370 instructions
Network to system buffer copy (per MTU) | 772 instructions
TCP/IP/ARP protocol processing (per MTU) | 200 instructions
MPEG-2 Transport packet decoding | 0.000038 seconds
The MHEG class hierarchy contains 34 classes, of which 7 are abstract. All objects are encoded in ASN.1. Some objects contain additional variable-size data, for example to hold media such as color tables, image/graphics data, or table structures. In estimating object sizes we do not consider included ingredients or nesting of objects, and we do not include estimates for the Token Manager and its related classes because of their complexity. String data in the objects is assumed to be 256 bytes. Table 3 gives the object sizes used in our analysis.
Table 3. Estimated encoded object sizes for MHEG classes
Class | Size (bytes)
Root | 50
Group | 100 |
Application | 100 |
Scene | 1200 |
Ingredient | 300 |
Link | 400 |
Procedure | 300 |
Palette | 300 |
Font | 300 |
Cursor Shape | 300 |
Variable | 300 |
Presentable | 300 |
Visible | 310 |
Bitmap | 310 |
Line Art | 375 |
Rectangle | 375 |
Text | 400 |
Stream | 320 |
Audio | 300 |
Video | 300 |
RT-Graphics | 300 |
Interactible | 25 |
Slider | 385 |
Entry Field | 430 |
Hypertext | 425 |
Button | 375 |
Hotspot | 375 |
Push Button | 400 |
Switch Button | 400 |
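As a worked example of how the Table 3 sizes feed the simulation, the sketch below (illustrative only, not our simulator) estimates the bytes that must be fetched on a scene cache miss and the number of DSM-CC download transactions involved, under the content-inclusion and content-reference models described earlier; the particular ingredient mix in the example is an assumption.

```java
// Illustrative helper showing how the Table 3 object sizes translate into the
// data volume and the number of DSM-CC download transactions needed to
// activate a scene that is not in the cache.
public class SceneActivationCost {

    // Encoded sizes from Table 3 (bytes).
    static final int SCENE = 1200, TEXT = 400, BITMAP = 310, LINK = 400;

    // Encoded size of the scene composition itself, without media data.
    static long sceneObjectBytes(int texts, int bitmaps, int links) {
        return SCENE + (long) texts * TEXT + (long) bitmaps * BITMAP
                     + (long) links * LINK;
    }

    // Content inclusion: media travels inside the scene object, so a single
    // download transaction fetches everything.
    static long[] includedCost(int texts, int bitmaps, int links, long mediaBytes) {
        return new long[] { sceneObjectBytes(texts, bitmaps, links) + mediaBytes, 1 };
    }

    // Content reference: each uncached media object needs its own download
    // request, which is why many small referenced objects cost more than one
    // large object of the same total size (cf. Figures 9-14).
    static long[] referencedCost(int texts, int bitmaps, int links,
                                 long uncachedMediaBytes, int uncachedMediaObjects) {
        return new long[] { sceneObjectBytes(texts, bitmaps, links) + uncachedMediaBytes,
                            1 + uncachedMediaObjects };
    }

    public static void main(String[] args) {
        // Example scene: 4 text objects, 3 bitmaps, 5 links, 200 KB of image
        // data, none of it cached (referenced media split into 10 KB objects).
        long[] a = includedCost(4, 3, 5, 200_000);
        long[] b = referencedCost(4, 3, 5, 200_000, 20);
        System.out.println("Included:   " + a[0] + " bytes, " + a[1] + " transaction(s)");
        System.out.println("Referenced: " + b[0] + " bytes, " + b[1] + " transaction(s)");
    }
}
```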
Figure 1 shows the DAVIC end-to-end architecture analyzed in this paper. According to the DSM-CC specification, all U-U (User-to-User) signaling from the STU to the service provider takes place over RPC, while content and data download is carried in MPEG-2 transport packets. The access network in our discussion is an ADSL network consisting of an upstream 640 Kb/s signaling channel and a downstream 6.144 Mb/s data channel.
To pass a message the STU needs to do a number of processing steps including message fragmentation and protocol encapsulation. Figure 2 shows the protocol stacks used for S1 (content and data delivery), S2 (U-U messaging) and S3 (U-N messaging) information flows between STU and Network Access.
The delivery sub-system provides the service consumer access to the ATM network and service providers. It provides for proper routing of messages to and from the STU. Figure 3 shows the protocol stack for delivery of messages between the service provider to the access network. All message and data flows are broken down into ATM Adaptation layer 5 (AAL5) segments before transmission over the ATM network.
The DSM-CC specification comprises eleven protocols. All of the protocols are based on message passing except for the U-U RPC stub library. Each of these protocols consists of a series of message transactions between the client and the server. We have chosen a subset of the protocols that are relevant to set top delivery and are basic to any delivery environment; our approach can easily be extended to other transaction scenarios.
The architecture shown in Figure 4 represents the end-to-end model used in our discussion. Latency budgets for various functions in this model are discussed for each processing stage in later sections.
In this section we analyze five transactions. The formula used to compute these latency budgets is given in Appendix 1.
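The exact formula appears in Appendix 1 and is not reproduced here; the sketch below is our own simplified rendering of the idea: for each DSM-CC message, sum the protocol-processing time at every stage it traverses (instruction counts divided by that stage's MIPS rating, per Tables 1 and 2) and the transmission time on the link(s) it crosses, then total the messages in the transaction. The example values in main() are illustrative and are not intended to reproduce the table entries exactly.

```java
// Simplified sketch (our own rendering of the idea behind Appendix 1, not the
// exact formula). Each message in a transaction accumulates protocol
// processing at every stage it traverses plus transmission time on the links
// it crosses; application-level processing is excluded, so the result is a
// lower bound on response time.
public class LatencyBudget {

    // Protocol processing at one stage: instructions executed / MIPS rating.
    static double processing(long instructions, double mips) {
        return instructions / (mips * 1e6);
    }

    // Transmission time of a message over a link of the given bit rate.
    static double transmission(long bytes, double bitsPerSecond) {
        return bytes * 8.0 / bitsPerSecond;
    }

    public static void main(String[] args) {
        // Example: a 27-byte UN-CONFIG-REQUEST sent from the STU (10 MIPS)
        // over the 640 kb/s ADSL upstream channel to the network access host
        // (50 MIPS). 1742 is the per-MTU TCP/IP instruction total of Table 2.
        double t = processing(1742, 10)        // STU protocol processing
                 + transmission(27, 640_000)   // ADSL upstream transfer
                 + processing(1742, 50);       // network access processing
        System.out.printf("UN-CONFIG-REQUEST (one way): %.6f s%n", t);
    }
}
```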
Configuring the STU involves passing messages between the STU and the Network Access (NA) host to obtain a network identifier and attach to the service gateway. The message flow between the STU and network access, and between network access and the service provider, is shown in Figure 5. These information flows are based on the DAVIC and DSM-CC specifications.
Table 5 shows the messages passed between the various entities. Column 1 gives the DSM-CC message, column 2 its size in bytes, and the remaining columns the protocol processing latencies at each stage; zero values indicate no message processing at that stage. The end-to-end latency is about 60 ms.
To play an MHEG application, an MHEG engine must be loaded from the service provider using the STU profile, which identifies the appropriate MHEG engine for the STU configuration. In general, the STU may load a different engine for each application; hence the download time is an important factor. Downloading the MHEG engine consists of a series of message flows between the STU and the service provider using the DSM-CC download protocol, as shown in Figure 7. Table 6 gives the latency budget for this scenario; the latency for downloading a 300 KB engine is about 600 ms. Table 4 shows latency versus engine size.
The download protocol transfers data between the server and the client in blocks. Because the download crosses two separate networks, the access and delivery networks, the transfer of download data blocks over the two networks overlaps (see Figure 6); the x-axis in that figure denotes the latency for data blocks across the two networks. Figure 8 shows the effect of download block size on download latency for various MHEG engine sizes. For small block sizes the overlap is large, but the number of acknowledgements between the client and the server is also large, so download latencies are high (the region close to the y-axis). As the block size increases, download latency initially drops and then begins to increase again as the overlap decreases. The last column in Table 6 indicates the overlap.
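The qualitative shape of Figure 8 can be reproduced with a very simple two-stage pipeline model; the sketch below is our own approximation, not the model used to generate the figure, and the delivery-network rate and per-block overhead are assumed values chosen only for illustration.

```java
// Very rough two-stage pipeline model (our own approximation): the engine is
// downloaded in blocks that cross the delivery network and then the access
// network, with the two transfers overlapping block by block. Small blocks
// mean many per-block acknowledgements; large blocks mean little overlap
// between the two networks, so latency rises again at both extremes.
public class DownloadBlockModel {

    static double downloadSeconds(long engineBytes, long blockBytes,
                                  double deliveryBps, double accessBps,
                                  double perBlockOverheadSeconds) {
        long blocks = (engineBytes + blockBytes - 1) / blockBytes;
        double slow = Math.min(deliveryBps, accessBps);
        double fast = Math.max(deliveryBps, accessBps);
        // The first block must cross the faster stage before the pipeline
        // fills; thereafter the slower stage dominates. Each block also pays
        // a fixed acknowledgement/processing overhead.
        return blockBytes * 8.0 / fast
             + engineBytes * 8.0 / slow
             + blocks * perBlockOverheadSeconds;
    }

    public static void main(String[] args) {
        long engine = 300_000;   // 300 KB engine, as in Table 6
        // Assumed rates: 25 Mb/s delivery path, 6.144 Mb/s ADSL downstream,
        // 5 ms fixed overhead per block (illustrative values only).
        for (long block = 1_000; block <= 100_000; block *= 10) {
            System.out.printf("block %6d B -> %.3f s%n", block,
                downloadSeconds(engine, block, 25_000_000, 6_144_000, 0.005));
        }
    }
}
```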
Table 4. MHEG engine download latency versus engine size
Engine size (bytes) | Download latency (seconds)
100000 | 0.272116
300000 | 0.600205
500000 | 0.912166
700000 | 1.240256
900000 | 1.568577
Table 5. Latency budget for STU configuration (message sizes in bytes; all times in seconds)
UN-CONFIG-REQUEST | 27 | 0.00023 | 0.000837 | 0.000005 | 0 | 0
UN-CONFIG-CONFIRM | 310 | 0.000528 | 0.005375 | 0.000011 | 0 | 0
UN-CLIENT-SESSION-SETUP-REQUEST | 1172 | 0.00144 | 0.02015 | 0.000029 | 0 | 0
UN-SERVER-SESSION-SETUP-INDICATION | 1172 | 0 | 0 | 0.000035 | 0.000037 | 0.000029
UN-SERVER-SESSION-SETUP-RESPONSE | 1134 | 0 | 0 | 0.000034 | 0.000036 | 0.00004
UN-CLIENT-SESSION-SETUP-RESPONSE | 1134 | 0.001399 | 0.030675 | 0.000028 | 0 | 0
Totals | | 0.003597 | 0.057038 | 0.000142 | 0.000072 | 0.000069
Total STU configuration time = 0.060917
Table 6. Latency budget for MHEG engine download (300 KB engine; message sizes in bytes; all times in seconds)
UU-DIR-OPEN-REQUEST | 1024 | 0.001283 | 0.0283 | 0.000057 | 0.000030 | 0.000026
UU-DIR-OPEN-RESPONSE | 1024 | 0.001283 | 0.0273 | 0.000057 | 0.000030 | 0.000037
DOWNLOAD-INFO-REQUEST | 296 | 0.000515 | 0.0182 | 0.00001 | 0 | 0
DOWNLOAD-INFO-REQUEST | 296 | 0 | 0 | 0.000012 | 0.000008 | 0.00001
DOWNLOAD-INFO-RESPONSE | 296 | 0 | 0 | 0.000012 | 0.000008 | 0.000013
DOWNLOAD-INFO-RESPONSE | 296 | 0.000515 | 0.0092 | 0.00001 | 0 | 0
DOWNLOAD-DATA-REQUEST | 17 | 0.000219 | 0.005713 | 0.000004 | 0 | 0
DOWNLOAD-DATA-REQUEST | 17 | 0 | 0 | 0.000004 | 0.000001 | 0.000004
DOWNLOAD-DATABLOCK(30) | 300000 | 0 | 0 | 0.000892 | 0.015797 | 0.381147
DOWNLOAD-DATABLOCK(30) | 300000 | 0.087336 | 0.403906 | 0.000892 | 0 | 0
DOWNLOAD-DATA-REQUEST(1) | 17 | 0.000219 | 0.001713 | 0.000004 | 0 | 0
DOWNLOAD-DATA-REQUEST(1) | 17 | 0 | 0 | 0.000004 | 0.000001 | 0.000004
Totals | | 0.09137 | 0.494331 | 0.001961 | 0.015877 | 0.381241
MHEG engine download time = 0.600205
Once the MHEG engine is activated, the initial application object must be downloaded and activated. This involves the STU requesting from the service provider the identity of the application object, obtaining information about it using the DSM-CC directory service open command, and then downloading it using the DSM-CC download protocol. Once the object is downloaded, it is activated. If ingredients are referenced, they are first downloaded and then activated followed by the activation of the first scene. The application object activation time in this case includes time to download all ingredients but does not include scene activation latency.
After activation of the application object, the startup action is fired, which activates the first scene. Initially no scenes are active or cached. When a scene object is not in the cache, it is retrieved from the service provider, prepared, and then activated. Upon activation, all ingredients in the scene are activated as well; the activation time of ingredients not in the cache increases the overall scene activation time. Media can either be included in the scene objects or be referenced; in the latter case, a separate request for the media object must be made to the server. In the following graphs, cumulative media size includes the MHEG objects and the media data. The size of a given scene is at the discretion of the application designer, but larger scenes lead to longer activation times.
Activation times for application objects and scene objects for various media memory sizes, with media contained and referenced, are shown in Figure 9 and Figure 10. For these estimates we have assumed that there is no sharing of media across scenes or applications. Also, to minimize DSM-CC acknowledgements, the buffers allocated for download are the same size as the object. For each of these plots, the x-axis denotes the size of media content contained in a scene or application object and the y-axis denotes download latency in seconds. In the content-reference case, the plots show application objects with a number of small media objects and with a single large media object, respectively. The latency for activating an application object with a large number of small media objects is larger because of the increase in the number of download transactions.
MHEG provides an abstract root class which is inherited by all classes. This root class provides a mechanism to specify whether the object can be cached or not. Accordingly, the MHEG engine may or may not cache the object. Both the application object and scene object inherit these properties.
Using this feature we estimate the average activation time of an application (see Figure 11 and Figure 12). When ingredients are all included and the application is found in the cache, no downloads are necessary. When ingredients are referenced, the referenced ingredients may not be cached even though the application object is; these ingredients then need to be downloaded and activated. Also, applications are activated much less frequently than scenes, so sharing of content in application objects has much less effect on application activation latencies than on scene activation latencies, as discussed in the following section. When the application is composed of small media objects that are referenced, latencies are larger due to the overhead of the download protocol and caching strategy, as seen in Figure 12.
In Figure 12 the initial portion of the graph is skewed because of smaller overlap of ingredients across scenes contained in this application.
To characterize the scene-to-scene transition cases we partition the memory of the STU so that 0.5 MB is allocated to MHEG objects, 2 MB to media objects, and 1 MB to resource objects. This partitioning is static. This relatively simple cache organization allows us to more easily characterize the effects of limited cache space for different types of objects, whatever the cause.
Also, when activation time with respect to multiple content reference is discussed, a 10 KB ingredient is assumed. This number was chosen to show the effect of having a number of small ingredients versus a single large ingredient.
Using the caching feature of the root object we estimate the activation time of a scene. Scenes contain ingredients which may overlap with ingredients in other scenes and in the application. When ingredients are all included, on a scene cache miss the entire scene with all included ingredients needs to be downloaded. When ingredients are all referenced, only the ingredients that are not cached need to be downloaded before activation. Figure 13 and Figure 14 show the activation time of a scene object with caching.
When ingredients are referenced and every scene has multiple content objects, activation time initially increases because the overlap is small; as the overlap grows significantly, activation time starts to decrease, as seen in Figure 14.
Another factor in scene-to-scene transitions is when the current scene involves streaming video. During MPEG-2 video playout, the entire downstream bandwidth of the ADSL line would be used. During this interval there is insufficient network resource to prepare the next scene. So while MHEG permits pre-caching of next scenes to reduce activation delay, lack of excess network resources in scenes with streaming media would preclude pre-caching. In this case, preparation and activation of the next scene would begin at the termination of the current scene.
In this section we address the question of how STU memory size affects the response time for MHEG applications. In MHEG, scenes can either be retrieved on demand or pre-buffered. Pre-fetching of scenes can improve response time for the user but requires that the STU have sufficient memory to hold the scene objects as well as the current active scene. Increasing the STU memory makes the device more expensive; consequently, understanding the relationship between STU memory and MHEG application response time is an important practical matter.
We make the following assumptions. The retrieval of a scene includes the associated media (i.e., content inclusion), except that for continuous media it includes only what is needed for pre-buffering. At the conclusion of any scene there may be an arbitrary number of next-scene choices. We transfer to the new scene and keep all the ingredients of the current scene in the cache.
We use a Least Frequently Used (LFU) policy for managing the memory model. MHEG has a prioritization scheme, as described in the overview, so that when a new scene is loaded into memory the scene with the lowest priority is flushed out first. In this context LFU is the prioritization scheme in which the frequency of user access is the weighting. This is an idealization, since it is difficult to know in advance how users will access scenes; however, as discussed in the evaluation section, it is possible for the system to keep statistics of user access patterns.
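A minimal sketch of the LFU policy, applied within one of the static partitions described earlier, is shown below. It is illustrative code rather than our simulator, and the class and field names are our own.

```java
// Minimal illustration of the LFU eviction policy: each cached object carries
// an access count, and when an incoming object does not fit in its partition,
// the least frequently used objects are flushed until enough space is free.
import java.util.HashMap;
import java.util.Map;

public class LfuCache {
    private static class Entry { long size; long hits; Entry(long s) { size = s; } }

    private final long capacityBytes;   // e.g. the 0.5 MB MHEG-object partition
    private long usedBytes = 0;
    private final Map<String, Entry> entries = new HashMap<>();

    public LfuCache(long capacityBytes) { this.capacityBytes = capacityBytes; }

    // Returns true on a cache hit and bumps the access frequency.
    public boolean access(String objectId) {
        Entry e = entries.get(objectId);
        if (e == null) return false;
        e.hits++;
        return true;
    }

    // Insert a newly downloaded object, evicting least-frequently-used objects
    // until it fits (objects larger than the partition are simply not cached).
    public void insert(String objectId, long sizeBytes) {
        if (sizeBytes > capacityBytes) return;
        while (usedBytes + sizeBytes > capacityBytes) {
            String victim = null;
            long fewestHits = Long.MAX_VALUE;
            for (Map.Entry<String, Entry> c : entries.entrySet()) {
                if (c.getValue().hits < fewestHits) {
                    fewestHits = c.getValue().hits;
                    victim = c.getKey();
                }
            }
            usedBytes -= entries.remove(victim).size;
        }
        entries.put(objectId, new Entry(sizeBytes));
        usedBytes += sizeBytes;
    }
}
```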
In our simulation we have devised a static memory partitioning at the STU which is related to the different types of MHEG objects and their application life times. For example, objects such as font tables and color tables may have a longer persistence than graphic or text objects.
There are at least three categories to consider: 1) MHEG objects (application, scene, and ingredient objects), 2) media objects (the content data included in or referenced by ingredients), and 3) resource objects (such as fonts and color tables).
In general, resources have a longer life time than MHEG and media objects. Also, media objects can be larger than other objects and most of the cache will be used by media objects. Hence the partitioning need not necessarily be a static one.
In this section we discuss our implementation. The purpose of this implementation was to assess 1) the feasibility of the end-to-end protocol and integration issues, and 2) the MHEG engine size. We implemented a subset of the MHEG engine sufficient to present MHEG objects generated from our IconAuthor-to-MHEG converter, and a portion of the DSM-CC U-U protocol suite covering services such as Directory and Download. We used RMI (RMI 1996) as the underlying RPC protocol, and the engine and DSM-CC layers were implemented in Java (Java 1995). A snapshot of one scene is shown in Figure 15; the corresponding MHEG composition is omitted for space. In our implementation the MHEG engine is approximately 300 KB and the DSM-CC module about 30 KB of Java byte code. No U-N protocol was implemented.
The MHEG engine (see Figure 16) is a Java application which is downloaded to the STU. The engine is composed of three components: 1) scheduler, 2) link manager, and 3) download and decode module.
The scheduler coordinates all events, processes actions, and orchestrates the presentation of the scene. When the initial application object is retrieved, decoded, and instantiated, the startup action associated with the application object is processed, which results in activation of the first scene. When a scene is activated, it is first retrieved, decoded, and instantiated; all ingredients in the scene are then activated based on their behavior, i.e., an ingredient is presented based on its initial condition, or registered in the link table if it is a link. The cache is checked before any requested object is retrieved.
Once a scene is active and running, the user may interact with it. These external events are mapped to MHEG events and the link manager is activated along with the object from which the event originated.
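The following fragment sketches, in simplified form and with hypothetical names, how user events mapped to MHEG events are matched against the link table maintained by the link manager; it is not taken from our engine source.

```java
// Simplified sketch of the link manager's role: external UI events are mapped
// to MHEG events, and any link whose trigger matches the (source object,
// event) pair has its action fired.
import java.util.ArrayList;
import java.util.List;

class LinkManager {
    // A registered link: when 'event' occurs on 'sourceId', run 'action'.
    record Link(String sourceId, String event, Runnable action) {}

    private final List<Link> linkTable = new ArrayList<>();

    // Called by the scheduler when a link object is activated in a scene.
    void register(String sourceId, String event, Runnable action) {
        linkTable.add(new Link(sourceId, event, action));
    }

    // Called when a user interaction has been mapped to an MHEG event.
    void dispatch(String sourceId, String event) {
        for (Link link : linkTable) {
            if (link.sourceId().equals(sourceId) && link.event().equals(event)) {
                link.action().run();   // e.g. a transition to a new scene
            }
        }
    }
}
```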
The DSM-CC interface provides an API for the MHEG engine to communicate with the DSM-CC U-U services like Directory and Download. These services must be registered with the RMI registry before being accessed by the MHEG engine. RMI ensures a uniform name space for all services.
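For illustration, the remote interfaces exposed to the engine might look like the sketch below; the interface and method names are our own and are not taken from the DSM-CC specification or from our implementation.

```java
// Hypothetical Java RMI remote interfaces for the DSM-CC U-U Directory and
// Download services used by the engine; names are illustrative only.
import java.rmi.Remote;
import java.rmi.RemoteException;

interface DsmccDirectory extends Remote {
    // Resolve an object name to a locator the download service understands.
    String open(String objectName) throws RemoteException;
}

interface DsmccDownload extends Remote {
    // Fetch the encoded MHEG or media object identified by the locator.
    byte[] download(String objectLocator) throws RemoteException;
}
```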
The service provider system consists of two services: a download service and a directory service. The download service uses the DSM-CC download protocol to download media objects; in our implementation it was not integrated with the MHEG engine due to limitations of the browser. The directory service maintains directory information for all MHEG and media objects and transfers this information to the service consumer as defined by the DSM-CC directory service specification.
Keeping the MHEG engine small means that the STU memory can be small and download times are reduced. Further, if the engine can be downloaded incrementally (as in our Java implementation), the run-time footprint of the engine can be smaller than that of a monolithic implementation, though with possibly increased access costs due to retrieval of engine components. We have not evaluated the performance of an incrementally loaded engine.
MHEG provides a composition model. Since downloading media involves a great deal of signaling, it is important that application and content developers prepare compositions in a way that reduces signaling latencies. One way to do this is to put frequently used ingredients in the application object rather than in scene objects, since the lifetime of a scene is much shorter than that of the application. Figure 17 shows the average response time for activating a scene. Here the percentage of overlap is the ratio of the number of ingredients in a scene to the total number of ingredients in the entire application. As the overlap increases, response times drop. The non-zero response time at 100% ingredient overlap is due to retrieval of media content as a result of cache misses.
In this section we list ways in which the results of this work can be used.
DAVIC specifies a complex end-to-end model for ITV delivery. Many of the trials performed so far have validated interoperability but have not reported performance measurements. Many issues determine the performance of these systems. In this paper we have provided a systematic characterization of lower bounds on end-to-end latencies for a specific case of the DAVIC architecture, selecting application transactions that are of general interest in application delivery.
Based on this analysis, we see that STU memory, the composition model of applications, signaling between the STU and service providers, and network latencies all play a key role in delivery performance. In the previous section we provided some general conclusions for content and application developers about system requirements and constraints. The quantitative results could be built into post-production tools for automatic analysis of delivery performance for specific compositions.