We analyze the delivery of an object-oriented multimedia content model, namely MHEG-5 (Multimedia and Hypermedia Experts Group), for interactive multimedia applications in a DAVIC-compliant ADSL access network using low-cost, memory-constrained set top units. We present detailed latency budgets for MPEG-2 DSM-CC-based transactions including STU configuration, engine download, and application and scene activation. We use simulation and analysis to assess the tradeoffs between memory management and application response time. We describe our implementation of a subset of DSM-CC and MHEG-5, and report latency measurements for presenting MHEG objects. The results are formulated as graphs which can be used by an application designer when encoding an MHEG application.
Key words: MHEG; DAVIC; MPEG-2 DSM-CC; interactive television; video-dial-tone.
We analyze the delivery of an object-oriented multimedia content
model, ISO MHEG-5 (hereafter simply "MHEG"), for an
interactive television environment (Cossmann et al. 1996, Furht
et al. 1995) using an ADSL (Saarela 1995) (Asymmetric Digital
Subscriber Line) access network and conforming to the recently
defined DAVIC (DAVIC 1995) (Digital Audio Visual Interactive Council)
1.0 specification. Under the DAVIC 1.0 specification, each Set
Top Unit (STU) provides either a built-in or downloaded MHEG engine
to interpret and execute interactive multimedia applications.
MHEG objects are stored at the service provider system and are
accessed by the STU using the MPEG-2 DSM-CC (DSM-CC 1995) protocol.
MHEG objects are transferred from the service provider to the
STU using MPEG-2 transport system packets.
The object-oriented representation of MHEG-5 provides for incremental delivery of interactive applications, an important capability in networked environments. A distinctive feature of the MHEG object model is its inclusion of an object state definition that an application designer can use to control the pre-fetching of MHEG objects. Using a two-phase activate-run sequence, each MHEG application has some control over when an object is loaded into STU memory prior to its presentation to the user. Additionally, using other state controls, an MHEG application has some control over when an object that is no longer active is flushed from STU memory. However, these features must be statically specified in a given encoding and cannot be adjusted at run time based on environmental factors such as the STU memory size; thus MHEG applications optimized for different memory sizes must be separately encoded. Despite their importance, the performance implications of the MHEG object state controls have not previously been evaluated, nor have any performance characterizations of MHEG delivery in DAVIC networks been published.
In this paper we present an analysis of the MHEG object state controls for networked delivery of MHEG applications in a particular configuration of the DAVIC end-to-end model. We assume an ATM service network and an ADSL access network. This configuration is considered to be practical for network providers with significant amounts of twisted pair physical wiring in the local loop. Since the control channel in ADSL is relatively low bandwidth, this configuration is also interesting because it provides a lower bound on what can be expected for MHEG application performance in different DAVIC network environments.
The analysis is presented in two parts. In the first part we show a detailed breakdown of the delays for the basic STU transactions (configuration, engine download, application activation, and scene activation). We provide tables which show, for each DSM-CC message in a given transaction, the associated delay at each step, exclusive of application processing. Application processing is more difficult to predict since functions such as accounting, service authorization, and security may be involved and are system and implementation dependent. Nevertheless, these delays provide a lower bound on response time at the STU, and collectively form a latency budget for the corresponding transaction. The transactions we analyze include user-to-network configuration, user-to-network session setup, user-to-user directory service, and user-to-user download.
In the second part we use the application and scene activation latency budget to analyze the tradeoff between STU memory size and application response time. Understanding this tradeoff is important for application designers who want to minimize response time for a given memory size. The tradeoff depends on various assumptions about the object composition of MHEG applications, and we show how variations in specific assumptions affect the behavior. In the conclusion of the paper we provide summary points that should be useful to an application designer in tailoring the objects in a presentation to meet response time requirements for a given STU memory configuration. Although DSM-CC supports transactions for stream content, we do not consider these in our latency budgets because we are interested in performance issues related to user interaction, which causes dynamic object activation.
The paper is organized as follows. Section two provides background information on the important standards used here and reviews related work, including recent interoperability experiments for DAVIC systems. Section three specifies the system parameters assumed in our analysis. Section four describes the end-to-end architecture and the protocol stacks assumed in our analysis. Section five presents the latency budgets and the analytical results evaluating application response time versus STU memory size. Section six describes the STU memory model, followed by a description of the implementation work done by the authors. Section seven presents our evaluation, followed by a section describing potential uses of these results. Finally, section ten concludes the paper.
DAVIC (Digital Audio Visual Interactive Council) is an industry consortium of about 250 companies formed to develop internationally adopted specifications for systems supporting applications such as interactive television and video on demand. DAVIC published its first specification in December 1995. This specification provides an end-to-end definition that is built on a number of existing network and coding standards, including MPEG-2 Video, Systems, and DSM-CC, as well as MHEG.
MHEG (Multimedia and Hypermedia Experts Group: Coding of Multimedia and Hypermedia Information) (Effelsberg 1995, Gopal 1995, MHEG 1995, Price 1993) was developed for the delivery of interactive multimedia applications in a client-server architecture. MHEG uses an object composition model with two types of compositions: the application and the scene. In addition to specifying the object model, it also specifies a life-cycle model for the preparation, activation, deactivation, and destruction of objects. This life cycle plays a critical role in scene and application object activation latencies.
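To make the life-cycle model concrete, the following minimal Java sketch is our own simplified illustration of the four transitions named above; the class, state, and method names are hypothetical and are not taken from the MHEG-5 standard.

```java
// Simplified illustration (not standard MHEG-5 terminology) of the object
// life cycle: an object is prepared (fetched and decoded), activated
// (presented), deactivated, and finally destroyed (its storage released).
enum LifeCycleState { NOT_READY, PREPARED, ACTIVE, DESTROYED }

abstract class MhegObject {
    private LifeCycleState state = LifeCycleState.NOT_READY;

    // Preparation: fetch and decode the object so it can be activated quickly.
    void prepare() {
        if (state == LifeCycleState.NOT_READY) {
            load();                       // e.g. download and ASN.1-decode
            state = LifeCycleState.PREPARED;
        }
    }

    // Activation: present the object. An unprepared object must be prepared
    // first, which is why activation latency depends on what was pre-fetched.
    void activate() {
        prepare();
        present();
        state = LifeCycleState.ACTIVE;
    }

    void deactivate() {
        if (state == LifeCycleState.ACTIVE) state = LifeCycleState.PREPARED;
    }

    // Destruction: the object may now be flushed from STU memory.
    void destroy() { state = LifeCycleState.DESTROYED; }

    protected abstract void load();
    protected abstract void present();
}
```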
An application object is a container for ingredients such as visual, audible, interaction, link and action objects. These objects are shared by scenes and are activated based on scene behavior. A scene is an object which groups ingredients such as visual, audible, interaction, link and action objects for their coordinated presentation. In DAVIC, an application, and consequently all its scenes and related objects are delivered to the STU over various possible networks. The objects are downloaded to the client as requested by the engine, and the engine is responsible for decoding, interpreting, presenting and managing these objects.
Since MHEG engines are intended to be deployed on set top units, the MHEG object model provides for the specification of information such as caching and scene priorities. In addition, the MHEG model supports two mechanisms for addressing presentable content in scenes and applications: 1) content inclusion, and 2) content reference. In the first case media objects are embedded inside the MHEG scene or application object, and the lifetime of these media objects is the same as that of their container object. In the latter case a reference to the media object is maintained, and the lifetime of media objects depends on the memory model of the STU and the caching strategy employed by the MHEG engine.
MPEG-2 DSM-CC is a set of protocol specifications for managing and controlling MPEG-1 and MPEG-2 bit streams. These protocols allow an application to set up, tear down, and manage a network connection using User-Network (U-N) primitives, and to communicate between a client and a server using User-User (U-U) primitives. U-N primitives are defined as a series of messages to be exchanged among the client, network, and server. U-U primitives may use a Remote Procedure Call (RPC) protocol and may involve U-N messaging. DSM-CC may be carried as a stream within an MPEG-2 Transport Stream; alternatively, it may be carried over other delivery mechanisms, such as TCP or UDP.
Development of an interoperable end-to-end ITV system has been reported in Cossmann et al. (1996). This system, called Globally Accessible Services (GLASS), consists of clients, an application server, a video server, system management functionality, and gateways to services such as the World Wide Web (WWW), e-mail, and FAX. It is based on a non-DAVIC model with an MHEG-1 run-time engine on the STB. The results obtained in this experiment have influenced the standardization of MHEG-1, MHEG, and DAVIC. Their current system incorporates the DAVIC protocol suite and an MHEG run-time engine, and has been used in a recent DAVIC interoperability test at Columbia University (Columbia 1996). Both MHEG and media objects are transported in MPEG-2 Transport streams.
Applications Retrieving Multimedia Information Distributed over ATM (ARMIDA) (Columbia 1996) is another DAVIC test bed, developed at Centro Studi E Laboratori Telecomunicazioni, Torino, Italy. It implements some of the core components of the DAVIC architecture. The STB is a PC with an MPEG-2 transport packet decoder and MPEG-2 elementary stream decoders. The developers have also produced a visual MHEG editor for authoring MHEG applications.
Graphics Communication Laboratories (GCL) has developed DSM-CC and MHEG engine software compliant with DSM-CC, MHEG, and DAVIC. The DSM-CC implementation includes the Base, Access, Directory, File, and Stream interfaces. The MHEG engine implements a number of features but does not support streaming video or graphic objects. This software is available for evaluation on SunOS and Linux platforms.
None of these systems has reported performance data as yet. Some of these systems have been demonstrated primarily in ATM networks, where network bandwidth has not been a determining factor.
In this section we present network processing and MHEG object processing estimates based on previously published work and our own experimental work. These estimates are used in the next section, where we present a detailed breakdown of the processing steps in sample DSM-CC transactions from the transport layer down.
Clark et al. (1989) analyze TCP processing, and their results are the basis for our TCP/IP processing latencies at each stage of the architecture. They measured TCP processing overheads using logic analyzers and by instrumenting the UNIX kernel. Their measurements divide the overheads into two groups: costs incurred per byte and costs incurred per packet. The byte-level processing involves buffer copies and TCP checksum computation, while the packet-level costs include Ethernet driver processing, TCP+IP+ARP header processing, and operating system overhead. Their results are given in Table 1. These costs were computed on a 2 MIPS Sun-3/60.
Table 1 gives the instruction-count equivalents for their measurements; Clark et al. (1989) also report that the instruction costs for TCP/IP were similar across the different machines tested. Given an instruction count, we can therefore estimate TCP/IP processing times on the different machines involved in an end-to-end transaction of the DAVIC model.
Table 1. TCP/IP processing overheads measured by Clark et al. (1989) on a Sun-3/60, with equivalent instruction counts
Operation | Measured time | Instructions
Per byte: | |
User-system copy | 200us | 400
TCP checksum | 185us | 370
Network-memory copy | 386us | 772
Per packet: | |
TCP + IP + ARP protocols | 100us | 200
OS overhead | 240us | 480
Ethernet driver | 100us | 200
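As a rough illustration of how these instruction counts are used, the Java sketch below (our own, with hypothetical names) scales the per-MTU TCP/IP instruction counts listed in Tables 1 and 2 by a machine's MIPS rating to obtain a protocol-processing time estimate; the particular combination of counts summed here is an assumption for illustration, not the exact breakdown used in our tables.

```java
// Rough estimate of TCP/IP protocol-processing time for one MTU-sized packet,
// using the per-MTU instruction counts of Tables 1 and 2 and the MIPS rating
// of the machine doing the processing. Names and structure are illustrative.
public class TcpIpCostEstimate {

    // Instruction counts per MTU (from Tables 1 and 2).
    static final long USER_SYSTEM_COPY    = 400;
    static final long TCP_CHECKSUM        = 370;
    static final long NETWORK_MEMORY_COPY = 772;
    static final long TCP_IP_ARP          = 200;

    // Seconds to push one MTU through the TCP/IP path on a machine rated at
    // 'mips' million instructions per second.
    static double perMtuSeconds(double mips) {
        long instructions = USER_SYSTEM_COPY + TCP_CHECKSUM
                          + NETWORK_MEMORY_COPY + TCP_IP_ARP;
        return instructions / (mips * 1e6);
    }

    public static void main(String[] args) {
        System.out.printf("50 MIPS server: %.6f s per MTU%n", perMtuSeconds(50));
        System.out.printf("10 MIPS STU:    %.6f s per MTU%n", perMtuSeconds(10));
    }
}
```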
Lazarou et al. (1996) validate simulation models of TCP/IP over ATM, in which TCP/IP packets are broken into AAL5 segments before being broken into ATM cells. We used their processing measurements to compute AAL5 segmentation and re-assembly costs.
Since our aim is to provide lower bounds on end-to-end latency, we do not consider congestion and retransmission delays. In the ADSL-ATM network environment of interest in this paper, congestion could occur in the ATM network due to simultaneous service to many STUs, much of which could be streaming video. The ADSL network is point-to-point and would be dedicated for use by a single STU running a single application. The transactions we characterize occur when an MHEG application is active at an STU. During this period, all the traffic over the ADSL connection is application-related and is included in our latency estimates. Characterization of ATM network congestion during simultaneous streaming video and MHEG application sessions from many ADSL links involves many assumptions about applications and network configuration and is outside the scope of this paper.
Table 2 shows the estimated overhead for the different network components of the DAVIC architecture, including the service provider, the network access point, and the STU; these times do not include application-level processing. For the service provider we assume a server system of at least 50 MIPS; it should be a high-end machine since it will be servicing many simultaneous application requests. The TCP/IP protocol estimates in Table 2 are derived from those in Table 1, scaled for the higher MIPS rating of the server. The MPEG-2 transport packet encoding time in Table 2 is the time taken by the MPEG-2 transport packet encoder to encapsulate incoming data and produce a transport stream packet; it does not include MPEG-2 compression time, which is not of concern here. This time is based on the DiviCom MPEG-2 transport encoder (DiviCom 1996).
Table 2 also shows the estimated overhead for the network access host, which is responsible for session setup and management. For generality we assume that the network access host is a separate system from the service provider, an assumption consistent with other video dial-tone architectures; we again assume a high-end server system for this role. These estimated times relate to network processing and do not include application processing.
Finally, Table 2 also shows the estimated overhead for packet decoding and network processing at the STU. The STU is expected to be a low-cost machine, so we assume a lower MIPS rating. The MPEG-2 transport packet decode time is based on the LSI L64007 MPEG-2 Transport Decoder (LSI 1996).
MHEG applications are composed of clusters of objects called scenes. Each scene has a root scene object. All of the scene objects have a parent object called the application object. MHEG introduces a number of constraints about when the objects are to be delivered to the end system. One important constraint is that a component object for a given scene cannot be delivered without the corresponding scene object. An additional point is that the composition object model is quite flexible in that many different interactive presentations are available for a scene. Therefore, the complexity of the scene depends on the design of the presentation and there can be many variations in the number and type of MHEG objects needed to produce an interactive application.
Consequently, we have concluded that the questions regarding MHEG application response time versus MHEG memory cache size are highly dependent on the design of the presentation objects and the intentions of the presentation designer. In our previous experimental work we have created a number of interactive applications including the front end for an experimental ITV interface to retrieve video on demand and other related content. Depending on the media that are selected and the authoring tools that are used, different visual effects with a wide range of object encodings are possible. As a result we believe that the most useful information from the simulations that we have done can be communicated to the designers of MHEG applications as a comparison of MHEG object size versus response time versus caching. The MHEG application designer can then design a presentation based upon the object count implied by the target response time and available memory. These points are discussed further in a later section.
For simulating application, scene, and ingredient activation, we used a composition model in which the application object contains an initial set of ingredients that are shared across all scenes, and these ingredients contain references to media objects. In addition to the ingredients contained in the application object, every scene contains a set of ingredients which are activated upon scene activation. The assumptions made about the sizes of these objects are listed in Table 3. These sizes are not arbitrary; they are based on our experience in developing MHEG-1 authoring and conversion systems as well as the experiences reported by others during MHEG interoperability testing.
Table 2. Estimated per-stage network processing parameters
Parameter | Value
Service Provider |
System rating | 50 MIPS
TCP Max. Transfer Unit (MTU) size | 1460 bytes
Application to system buffer copy (per MTU) | 400 instructions
TCP checksum (per MTU) | 370 instructions
Network to system buffer copy (per MTU) | 772 instructions
TCP/IP/ARP protocol processing (per MTU) | 200 instructions
MPEG-2 Transport packet encoding | 0.00023 seconds
Network Access |
System rating | 50 MIPS
TCP MTU size | 1460 bytes
Application to system buffer copy (per MTU) | 400 instructions
TCP checksum (per MTU) | 370 instructions
Network to system buffer copy (per MTU) | 772 instructions
TCP/IP/ARP protocol processing (per MTU) | 200 instructions
Set Top Box |
System rating | 10 MIPS
TCP MTU size | 1460 bytes
Application to system buffer copy (per MTU) | 400 instructions
TCP checksum (per MTU) | 370 instructions
Network to system buffer copy (per MTU) | 772 instructions
TCP/IP/ARP protocol processing (per MTU) | 200 instructions
MPEG-2 Transport packet decoding | 0.000038 seconds
The MHEG class hierarchy contains 34 classes, of which 7 are abstract. All objects are encoded in ASN.1. Some objects contain additional variable-size data, for example to hold media such as color tables, image/graphics data, or table structures. In estimating object sizes we do not consider included ingredients or nesting of objects, and we do not include estimates for the Token Manager and its related classes because of their complexity. String data in the objects is assumed to be 256 bytes. Table 3 gives the object sizes used in our analysis.
Table 3. Estimated encoded object sizes for MHEG classes
Class | Size (bytes)
Root | 50
Group | 100 |
Application | 100 |
Scene | 1200 |
Ingredient | 300 |
Link | 400 |
Procedure | 300 |
Palette | 300 |
Font | 300 |
Cursor Shape | 300 |
Variable | 300 |
Presentable | 300 |
Visible | 310 |
Bitmap | 310 |
Line Art | 375 |
Rectangle | 375 |
Text | 400 |
Stream | 320 |
Audio | 300 |
Video | 300 |
RT-Graphics | 300 |
Interactible | 25 |
Slider | 385 |
Entry Field | 430 |
Hypertext | 425 |
Button | 375 |
Hotspot | 375 |
Push Button | 400 |
Switch Button | 400 |
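As a worked example of how the Table 3 sizes feed the simulation, the sketch below (illustrative only, not our simulator) estimates the bytes that must be fetched on a scene cache miss and the number of DSM-CC download transactions involved, under the content-inclusion and content-reference models described earlier; the particular ingredient mix in the example is an assumption.

```java
// Illustrative helper showing how the Table 3 object sizes translate into the
// data volume and the number of DSM-CC download transactions needed to
// activate a scene that is not in the cache.
public class SceneActivationCost {

    // Encoded sizes from Table 3 (bytes).
    static final int SCENE = 1200, TEXT = 400, BITMAP = 310, LINK = 400;

    // Encoded size of the scene composition itself, without media data.
    static long sceneObjectBytes(int texts, int bitmaps, int links) {
        return SCENE + (long) texts * TEXT + (long) bitmaps * BITMAP
                     + (long) links * LINK;
    }

    // Content inclusion: media travels inside the scene object, so a single
    // download transaction fetches everything.
    static long[] includedCost(int texts, int bitmaps, int links, long mediaBytes) {
        return new long[] { sceneObjectBytes(texts, bitmaps, links) + mediaBytes, 1 };
    }

    // Content reference: each uncached media object needs its own download
    // request, which is why many small referenced objects cost more than one
    // large object of the same total size (cf. Figures 9-14).
    static long[] referencedCost(int texts, int bitmaps, int links,
                                 long uncachedMediaBytes, int uncachedMediaObjects) {
        return new long[] { sceneObjectBytes(texts, bitmaps, links) + uncachedMediaBytes,
                            1 + uncachedMediaObjects };
    }

    public static void main(String[] args) {
        // Example scene: 4 text objects, 3 bitmaps, 5 links, 200 KB of image
        // data, none of it cached (referenced media split into 10 KB objects).
        long[] a = includedCost(4, 3, 5, 200_000);
        long[] b = referencedCost(4, 3, 5, 200_000, 20);
        System.out.println("Included:   " + a[0] + " bytes, " + a[1] + " transaction(s)");
        System.out.println("Referenced: " + b[0] + " bytes, " + b[1] + " transaction(s)");
    }
}
```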
Figure 1 shows the DAVIC end-to-end architecture analyzed in this paper. According to the DSM-CC specification, all U-U (User-to-User) signaling from the STU to the service provider takes place over RPC, while content and data download is carried in MPEG-2 transport packets. The access network in our discussion is an ADSL network consisting of an upstream 640 Kb/s signaling channel and a downstream 6.144 Mb/s data channel.
To pass a message the STU needs to do a number of processing steps including message fragmentation and protocol encapsulation. Figure 2 shows the protocol stacks used for S1 (content and data delivery), S2 (U-U messaging) and S3 (U-N messaging) information flows between STU and Network Access.
The delivery sub-system provides the service consumer access to the ATM network and service providers. It provides for proper routing of messages to and from the STU. Figure 3 shows the protocol stack for delivery of messages between the service provider to the access network. All message and data flows are broken down into ATM Adaptation layer 5 (AAL5) segments before transmission over the ATM network.
The DSM-CC specification comprises eleven protocols. All of the protocols are based on message passing except for the U-U RPC stub library. Each of these protocols consists of a series of message transactions between the client and the server. We have chosen a subset of the protocols that are relevant to set top delivery and are basic to any delivery environment; our approach can easily be extended to other transaction scenarios.
The architecture shown in Figure 4 represents the end-to-end model used in our discussion. Latency budgets for various functions in this model are discussed for each processing stage in later sections.
In this section we analyze five transactions. The formula used to compute these latency budgets is given in Appendix 1.
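The exact formula appears in Appendix 1 and is not reproduced here; the sketch below is our own simplified rendering of the idea: for each DSM-CC message, sum the protocol-processing time at every stage it traverses (instruction counts divided by that stage's MIPS rating, per Tables 1 and 2) and the transmission time on the link(s) it crosses, then total the messages in the transaction. The example values in main() are illustrative and are not intended to reproduce the table entries exactly.

```java
// Simplified sketch (our own rendering of the idea behind Appendix 1, not the
// exact formula). Each message in a transaction accumulates protocol
// processing at every stage it traverses plus transmission time on the links
// it crosses; application-level processing is excluded, so the result is a
// lower bound on response time.
public class LatencyBudget {

    // Protocol processing at one stage: instructions executed / MIPS rating.
    static double processing(long instructions, double mips) {
        return instructions / (mips * 1e6);
    }

    // Transmission time of a message over a link of the given bit rate.
    static double transmission(long bytes, double bitsPerSecond) {
        return bytes * 8.0 / bitsPerSecond;
    }

    public static void main(String[] args) {
        // Example: a 27-byte UN-CONFIG-REQUEST sent from the STU (10 MIPS)
        // over the 640 kb/s ADSL upstream channel to the network access host
        // (50 MIPS). 1742 is the per-MTU TCP/IP instruction total of Table 2.
        double t = processing(1742, 10)        // STU protocol processing
                 + transmission(27, 640_000)   // ADSL upstream transfer
                 + processing(1742, 50);       // network access processing
        System.out.printf("UN-CONFIG-REQUEST (one way): %.6f s%n", t);
    }
}
```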
Configuring the STU involves passing messages between the STU and the Network Access (NA) host to obtain a network identifier and attach to the service gateway. The message flow between the STU and network access, and between network access and the service provider, is shown in Figure 5. These information flows are based on the DAVIC and DSM-CC specifications.
Table 5 shows the messages passed between the various entities. Column 1 gives the DSM-CC message, column 2 its size in bytes, and the remaining columns the protocol processing latencies at each stage; zero values indicate no message processing at that stage. The end-to-end latency is about 60 ms.
To play an MHEG application, an MHEG engine must be loaded from the service provider using the STU profile, which identifies the appropriate MHEG engine for the STU configuration. In general, the STU may load a different engine for each application; hence the download time is an important factor. Downloading the MHEG engine consists of a series of message flows between the STU and the service provider using the DSM-CC download protocol, as shown in Figure 7. Table 6 gives the latency budget for this scenario; the latency for downloading a 300 KB engine is about 600 ms. Table 4 shows latency versus engine size.
The download protocol transfers data between the server and the client in blocks. Because the download crosses two separate networks, the access and delivery networks, the transfer of download data blocks over the two networks overlaps (see Figure 6); the x-axis in that figure denotes the latency for data blocks across the two networks. Figure 8 shows the effect of download block size on download latency for various MHEG engine sizes. For small block sizes the overlap is large, but the number of acknowledgements between the client and the server is also large, so download latencies are high (the region close to the y-axis). As the block size increases, download latency initially drops and then begins to increase again as the overlap decreases. The last column in Table 6 indicates the overlap.
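The qualitative shape of Figure 8 can be reproduced with a very simple two-stage pipeline model; the sketch below is our own approximation, not the model used to generate the figure, and the delivery-network rate and per-block overhead are assumed values chosen only for illustration.

```java
// Very rough two-stage pipeline model (our own approximation): the engine is
// downloaded in blocks that cross the delivery network and then the access
// network, with the two transfers overlapping block by block. Small blocks
// mean many per-block acknowledgements; large blocks mean little overlap
// between the two networks, so latency rises again at both extremes.
public class DownloadBlockModel {

    static double downloadSeconds(long engineBytes, long blockBytes,
                                  double deliveryBps, double accessBps,
                                  double perBlockOverheadSeconds) {
        long blocks = (engineBytes + blockBytes - 1) / blockBytes;
        double slow = Math.min(deliveryBps, accessBps);
        double fast = Math.max(deliveryBps, accessBps);
        // The first block must cross the faster stage before the pipeline
        // fills; thereafter the slower stage dominates. Each block also pays
        // a fixed acknowledgement/processing overhead.
        return blockBytes * 8.0 / fast
             + engineBytes * 8.0 / slow
             + blocks * perBlockOverheadSeconds;
    }

    public static void main(String[] args) {
        long engine = 300_000;   // 300 KB engine, as in Table 6
        // Assumed rates: 25 Mb/s delivery path, 6.144 Mb/s ADSL downstream,
        // 5 ms fixed overhead per block (illustrative values only).
        for (long block = 1_000; block <= 100_000; block *= 10) {
            System.out.printf("block %6d B -> %.3f s%n", block,
                downloadSeconds(engine, block, 25_000_000, 6_144_000, 0.005));
        }
    }
}
```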
Table 4. MHEG engine download latency versus engine size
Engine size (bytes) | Download latency (seconds)
100000 | 0.272116
300000 | 0.600205
500000 | 0.912166
700000 | 1.240256
900000 | 1.568577
Table 5. Latency budget for STU configuration (message sizes in bytes; all times in seconds)
UN-CONFIG-REQUEST | 27 | 0.00023 | 0.000837 | 0.000005 | 0 | 0
UN-CONFIG-CONFIRM | 310 | 0.000528 | 0.005375 | 0.000011 | 0 | 0
UN-CLIENT-SESSION-SETUP-REQUEST | 1172 | 0.00144 | 0.02015 | 0.000029 | 0 | 0
UN-SERVER-SESSION-SETUP-INDICATION | 1172 | 0 | 0 | 0.000035 | 0.000037 | 0.000029
UN-SERVER-SESSION-SETUP-RESPONSE | 1134 | 0 | 0 | 0.000034 | 0.000036 | 0.00004
UN-CLIENT-SESSION-SETUP-RESPONSE | 1134 | 0.001399 | 0.030675 | 0.000028 | 0 | 0
Totals | | 0.003597 | 0.057038 | 0.000142 | 0.000072 | 0.000069
Total STU configuration time = 0.060917
Table 6. Latency budget for MHEG engine download (300 KB engine; message sizes in bytes; all times in seconds)
UU-DIR-OPEN-REQUEST | 1024 | 0.001283 | 0.0283 | 0.000057 | 0.000030 | 0.000026
UU-DIR-OPEN-RESPONSE | 1024 | 0.001283 | 0.0273 | 0.000057 | 0.000030 | 0.000037
DOWNLOAD-INFO-REQUEST | 296 | 0.000515 | 0.0182 | 0.00001 | 0 | 0
DOWNLOAD-INFO-REQUEST | 296 | 0 | 0 | 0.000012 | 0.000008 | 0.00001
DOWNLOAD-INFO-RESPONSE | 296 | 0 | 0 | 0.000012 | 0.000008 | 0.000013
DOWNLOAD-INFO-RESPONSE | 296 | 0.000515 | 0.0092 | 0.00001 | 0 | 0
DOWNLOAD-DATA-REQUEST | 17 | 0.000219 | 0.005713 | 0.000004 | 0 | 0
DOWNLOAD-DATA-REQUEST | 17 | 0 | 0 | 0.000004 | 0.000001 | 0.000004
DOWNLOAD-DATABLOCK(30) | 300000 | 0 | 0 | 0.000892 | 0.015797 | 0.381147
DOWNLOAD-DATABLOCK(30) | 300000 | 0.087336 | 0.403906 | 0.000892 | 0 | 0
DOWNLOAD-DATA-REQUEST(1) | 17 | 0.000219 | 0.001713 | 0.000004 | 0 | 0
DOWNLOAD-DATA-REQUEST(1) | 17 | 0 | 0 | 0.000004 | 0.000001 | 0.000004
Totals | | 0.09137 | 0.494331 | 0.001961 | 0.015877 | 0.381241
MHEG engine download time = 0.600205
Once the MHEG engine is activated, the initial application object must be downloaded and activated. This involves the STU requesting from the service provider the identity of the application object, obtaining information about it using the DSM-CC directory service open command, and then downloading it using the DSM-CC download protocol. Once the object is downloaded, it is activated. If ingredients are referenced, they are first downloaded and then activated followed by the activation of the first scene. The application object activation time in this case includes time to download all ingredients but does not include scene activation latency.
After activation of the application object, the startup action is fired, which activates the first scene. Initially no scenes are active or cached. When a scene object is not in the cache, it is retrieved from the service provider, prepared, and then activated. Upon activation, all ingredients in the scene are activated as well; the activation time of ingredients not in the cache increases the overall scene activation time. Media can either be included in the scene objects or be referenced; in the latter case, a separate request for the media object must be made to the server. In the following graphs, cumulative media size includes the MHEG objects and the media data. The size of a given scene is at the discretion of the application designer, but larger scenes lead to longer activation times.
Activation times for application objects and scene objects for various media memory sizes, with media contained and referenced, are shown in Figure 9 and Figure 10. For these estimates we have assumed that there is no sharing of media across scenes or applications. Also, to minimize DSM-CC acknowledgements, the buffers allocated for download are the same size as the object. For each of these plots, the x-axis denotes the size of media content contained in a scene or application object and the y-axis denotes download latency in seconds. In the content-reference case, the plots show application objects with a number of small media objects and with a single large media object, respectively. The latency for activating an application object with a large number of small media objects is larger because of the increase in the number of download transactions.
MHEG provides an abstract root class which is inherited by all classes. This root class provides a mechanism to specify whether the object can be cached or not. Accordingly, the MHEG engine may or may not cache the object. Both the application object and scene object inherit these properties.
Using this feature we estimate the average activation time of an application (see Figure 11 and Figure 12). When ingredients are all included and the application is found in the cache, no downloads are necessary. When ingredients are referenced, the referenced ingredients may not be cached even though the application object is; these ingredients then need to be downloaded and activated. Also, applications are activated much less frequently than scenes, so sharing of content in application objects has much less effect on application activation latencies than on scene activation latencies, as discussed in the following section. When the application is composed of small media objects that are referenced, latencies are larger due to the overhead of the download protocol and caching strategy, as seen in Figure 12.
In Figure 12 the initial portion of the graph is skewed because of smaller overlap of ingredients across scenes contained in this application.
To characterize the scene-to-scene transition cases we partition the memory of the STU so that 0.5 MB is allocated to MHEG objects, 2 MB to media objects, and 1 MB to resource objects. This partitioning is static. This relatively simple cache organization allows us to more easily characterize the effects of limited cache space for different types of objects, whatever the cause.
Also, when activation time with respect to multiple content reference is discussed, a 10 KB ingredient is assumed. This number was chosen to show the effect of having a number of small ingredients versus a single large ingredient.
Using the caching feature of the root object we estimate the activation time of a scene. Scenes contain ingredients which may overlap with ingredients in other scenes and in the application. When ingredients are all included, on a scene cache miss the entire scene with all included ingredients needs to be downloaded. When ingredients are all referenced, only the ingredients that are not cached need to be downloaded before activation. Figure 13 and Figure 14 show the activation time of a scene object with caching.
When ingredients are referenced and every scene has multiple content objects, activation time initially increases because the overlap is small; as the overlap grows significantly, activation time starts to decrease, as seen in Figure 14.
Another factor in scene-to-scene transitions is when the current scene involves streaming video. During MPEG-2 video playout, the entire downstream bandwidth of the ADSL line would be used. During this interval there is insufficient network resource to prepare the next scene. So while MHEG permits pre-caching of next scenes to reduce activation delay, lack of excess network resources in scenes with streaming media would preclude pre-caching. In this case, preparation and activation of the next scene would begin at the termination of the current scene.
In this section we address the question of how STU memory size affects the response time for MHEG applications. In MHEG, scenes can either be retrieved on demand or pre-buffered. Pre-fetching of scenes can improve response time for the user but requires that the STU have sufficient memory to hold the scene objects as well as the current active scene. Increasing the STU memory makes the device more expensive; consequently, understanding the relationship between STU memory and MHEG application response time is an important practical matter.
We make the following assumptions. The retrieval of a scene includes the associated media (i.e., content inclusion), except that for continuous media it includes only what is needed for pre-buffering. At the conclusion of any scene there may be an arbitrary number of next-scene choices. We transfer to the new scene and keep all the ingredients of the current scene in the cache.
We use a Least Frequently Used (LFU) policy for managing the memory model. MHEG has a prioritization scheme, as described in the overview, so that when a new scene is loaded into memory the scene with the lowest priority is flushed out first. In this context LFU is the prioritization scheme in which the frequency of user access is the weighting. This is an idealization, since it is difficult to know in advance how users will access scenes; however, as discussed in the evaluation section, it is possible for the system to keep statistics of user access patterns.
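A minimal sketch of the LFU policy, applied within one of the static partitions described earlier, is shown below. It is illustrative code rather than our simulator, and the class and field names are our own.

```java
// Minimal illustration of the LFU eviction policy: each cached object carries
// an access count, and when an incoming object does not fit in its partition,
// the least frequently used objects are flushed until enough space is free.
import java.util.HashMap;
import java.util.Map;

public class LfuCache {
    private static class Entry { long size; long hits; Entry(long s) { size = s; } }

    private final long capacityBytes;   // e.g. the 0.5 MB MHEG-object partition
    private long usedBytes = 0;
    private final Map<String, Entry> entries = new HashMap<>();

    public LfuCache(long capacityBytes) { this.capacityBytes = capacityBytes; }

    // Returns true on a cache hit and bumps the access frequency.
    public boolean access(String objectId) {
        Entry e = entries.get(objectId);
        if (e == null) return false;
        e.hits++;
        return true;
    }

    // Insert a newly downloaded object, evicting least-frequently-used objects
    // until it fits (objects larger than the partition are simply not cached).
    public void insert(String objectId, long sizeBytes) {
        if (sizeBytes > capacityBytes) return;
        while (usedBytes + sizeBytes > capacityBytes) {
            String victim = null;
            long fewestHits = Long.MAX_VALUE;
            for (Map.Entry<String, Entry> c : entries.entrySet()) {
                if (c.getValue().hits < fewestHits) {
                    fewestHits = c.getValue().hits;
                    victim = c.getKey();
                }
            }
            usedBytes -= entries.remove(victim).size;
        }
        entries.put(objectId, new Entry(sizeBytes));
        usedBytes += sizeBytes;
    }
}
```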
In our simulation we have devised a static memory partitioning at the STU which is related to the different types of MHEG objects and their application life times. For example, objects such as font tables and color tables may have a longer persistence than graphic or text objects.
There are at least three categories to consider: 1) MHEG objects (application, scene, and ingredient objects), 2) media objects (the content data included in or referenced by ingredients), and 3) resource objects (such as fonts and color tables).
In general, resources have a longer life time than MHEG and media objects. Also, media objects can be larger than other objects and most of the cache will be used by media objects. Hence the partitioning need not necessarily be a static one.
In this section we discuss our implementation. The purpose of this implementation was to assess 1) the feasibility of the end-to-end protocol and integration issues, and 2) the MHEG engine size. We implemented a subset of the MHEG engine sufficient to present MHEG objects generated from our IconAuthor-to-MHEG converter, and a portion of the DSM-CC U-U protocol suite covering services such as Directory and Download. We used RMI (RMI 1996) as the underlying RPC protocol, and the engine and DSM-CC layers were implemented in Java (Java 1995). A snapshot of one scene is shown in Figure 15; the corresponding MHEG composition is omitted for space. In our implementation the MHEG engine is approximately 300 KB and the DSM-CC module about 30 KB of Java byte code. No U-N protocol was implemented.
The MHEG engine (see Figure 16) is a Java application which is downloaded to the STU. The engine is composed of three components: 1) scheduler, 2) link manager, and 3) download and decode module.
The scheduler coordinates all events, processes actions, and orchestrates the presentation of the scene. When the initial application object is retrieved, decoded, and instantiated, the startup action associated with the application object is processed, which results in activation of the first scene. When a scene is activated, it is first retrieved, decoded, and instantiated; all ingredients in the scene are then activated based on their behavior, i.e., an ingredient is presented based on its initial condition, or registered in the link table if it is a link. The cache is checked before any requested object is retrieved.
Once a scene is active and running, the user may interact with it. These external events are mapped to MHEG events and the link manager is activated along with the object from which the event originated.
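The following fragment sketches, in simplified form and with hypothetical names, how user events mapped to MHEG events are matched against the link table maintained by the link manager; it is not taken from our engine source.

```java
// Simplified sketch of the link manager's role: external UI events are mapped
// to MHEG events, and any link whose trigger matches the (source object,
// event) pair has its action fired.
import java.util.ArrayList;
import java.util.List;

class LinkManager {
    // A registered link: when 'event' occurs on 'sourceId', run 'action'.
    record Link(String sourceId, String event, Runnable action) {}

    private final List<Link> linkTable = new ArrayList<>();

    // Called by the scheduler when a link object is activated in a scene.
    void register(String sourceId, String event, Runnable action) {
        linkTable.add(new Link(sourceId, event, action));
    }

    // Called when a user interaction has been mapped to an MHEG event.
    void dispatch(String sourceId, String event) {
        for (Link link : linkTable) {
            if (link.sourceId().equals(sourceId) && link.event().equals(event)) {
                link.action().run();   // e.g. a transition to a new scene
            }
        }
    }
}
```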
The DSM-CC interface provides an API for the MHEG engine to communicate with the DSM-CC U-U services like Directory and Download. These services must be registered with the RMI registry before being accessed by the MHEG engine. RMI ensures a uniform name space for all services.
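For illustration, the remote interfaces exposed to the engine might look like the sketch below; the interface and method names are our own and are not taken from the DSM-CC specification or from our implementation.

```java
// Hypothetical Java RMI remote interfaces for the DSM-CC U-U Directory and
// Download services used by the engine; names are illustrative only.
import java.rmi.Remote;
import java.rmi.RemoteException;

interface DsmccDirectory extends Remote {
    // Resolve an object name to a locator the download service understands.
    String open(String objectName) throws RemoteException;
}

interface DsmccDownload extends Remote {
    // Fetch the encoded MHEG or media object identified by the locator.
    byte[] download(String objectLocator) throws RemoteException;
}
```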
The service provider system consists of two services: a download service and a directory service. The download service uses the DSM-CC download protocol to download media objects; in our implementation it was not integrated with the MHEG engine due to limitations of the browser. The directory service maintains directory information for all MHEG and media objects and transfers this information to the service consumer as defined by the DSM-CC directory service specification.
Keeping the MHEG engine small means that the STU memory can be small and download times are reduced. Further, if the engine can be downloaded incrementally (as in our Java implementation), the run-time footprint of the engine can be smaller than that of a monolithic implementation, though with possibly increased access costs due to retrieval of engine components. We have not evaluated the performance of an incrementally loaded engine.
MHEG provides a composition model. Since downloading media involves a great deal of signaling, it is important that application and content developers prepare compositions in a way that reduces signaling latencies. One way to do this is to put frequently used ingredients in the application object rather than in scene objects, since the lifetime of a scene is much shorter than that of the application. Figure 17 shows the average response time for activating a scene. Here the percentage of overlap is the ratio of the number of ingredients in a scene to the total number of ingredients in the entire application. As the overlap increases, response times drop. The non-zero response time at 100% ingredient overlap is due to retrieval of media content as a result of cache misses.
In this section we list ways in which the results of this work can be used.
DAVIC specifies a complex end-to-end model for ITV delivery. Many of the trials performed so far have validated interoperability but have not reported performance measurements. Many issues determine the performance of these systems. In this paper we have provided a systematic characterization of lower bounds on end-to-end latencies for a specific case of the DAVIC architecture, selecting application transactions that are of general interest in application delivery.
Based on this analysis, we see that STU memory, the composition model of applications, signaling between the STU and service providers, and network latencies all play a key role in delivery performance. In the previous section we provided some general conclusions for content and application developers about system requirements and constraints. The quantitative results could be built into post-production tools for automatic analysis of delivery performance for specific compositions.