Crowne Plaza Hotel, Seattle, USA
The paper reviews studies of attention and recall of expository MM presentations, and summarises the findings as key guidelines for attentional design. An expert-system-based design advisor tool is then described, which uses the guidelines to analyse MM presentations.
Presentation Design, Guidelines, Tool support
One of the problems in the design of expository Multimedia (MM) interfaces is knowing whether the presentation will successfully deliver its content to its audience. In a review of studies of MM used for instructional design, [5] criticises many approaches to MM design: 'Until now, the selection of media has been a macrolevel decision. That is the decision - should video be used or is audiotape sufficient ? This meta-question that has driven research on media for the past thirty years and has resulted in little understanding of learning with media.' Instead, [5] proposes that a 'micro-level' approach will be required to study multimedia: '...media decisions for integrated multimedia will be micro-level decisions... The moment by moment selection of appropriate media can respond to specific cognitive needs and task requirements.'
The conclusions of [5] suggest that, for a user, understanding a multimedia presentation requires a series of cognitive processes: visual and auditory attention, and the integration of media across the presentation. The key design problem is thus: what can the user extract from a presentation at a ‘moment by moment’ level? Our work has attempted to address this question by empirically investigating the cognitive processes of attention to multimedia and by providing design advice and tool support.
The paper has two main sections. First we review empirical studies of visual attention and recall of MM presentations. We were interested in the attentional processing of different visual presentation elements, such as text captions, labels and still and moving objects; and in how the serial nature of visual attention would cope with processing a complex and rapidly changing MM presentation. These studies used eye tracking equipment to analyse what was fixated by subjects viewing an MM presentation.
The results of the studies are taken as a basis to justify the importance of moment by moment design. The problems found in attentional design are used to inform a set of guidelines for presentation design. These are then extended to form a method for analysing MM presentations.
The second part of the paper describes a prototype authoring tool which uses the guidelines. The tool embeds our guidelines as rules within an expert system, which is linked to an authoring front end. Unlike several existing MM design tools built using AI methods [1],[3], it seeks to include the human designer by providing analysis of, and advice on, potential design problems as the presentation is authored.
The existing literature on MM design can be divided into two categories: empirical studies of user performance, and guidelines or heuristics for successful design. Few empirical studies have addressed the effects of varying multimedia design on performance. [7] investigated the comprehension of a text and animation illustrating the operation of a piston. They suggest that the addition of text enhanced understanding of the presentation as a whole, and they argued that to be effective the presentation must sequence pictures and words together into a unified whole. However, little analysis is given as to how this sequence should be designed within the media themselves, or how such design decisions would have affected their results.
A similar study of MM design was conducted by [6]. They investigated the use of text, still image with text, and animation with text in providing instruction for a descriptive presentation identifying parts of the human heart. The study again only considered all-or-nothing changes and failed to address how the media themselves were designed. It is concluded that 'a difficult text is not necessarily enhanced by an animation even when the animation is well-designed and highly relevant. Complexity is not removed by the addition of more media.' The studies of [6],[7] offer high level results without considering how media can be combined or designed, or how media should be sequenced.
Several authors have suggested MM design guidelines. A set of principles for MM design is provided by [9]. They summarise a range of educational issues in the use of MM. Examples of the type of advice offered include to ‘employ screen design and procedural conventions that require minimal cognitive resources, are familiar or can be readily understood, and are consonant with learning requirements’ and to ‘structure presentations and interactions to complement cognitive processes and reduce the complexity of the processing task’.
Whilst the principles of [9] clearly state that MM design should take into account the cognitive processes of the viewer, they offer little practical advice on what a design requiring ‘minimal cognitive resources’ would actually entail.
A checklist for MM design is provided by [4]. It focuses on the need to combine media effectively and provides a questionnaire to rate how different media are used in the presentation. An example question would be to rate the ‘Use of text with sound’ on a scale from ‘ineffective’ to ‘effective’. Again, the checklist says little about the design of media and gives no details as to how media should be synchronised together. [4] notes that 'what this taxonomy [of design issues] does not address is the design process and the order of use of different media'.
In summary, little existing work addresses the ‘moment by moment’ design of MM presentation. Studies generally address gross design decisions such as which media to use, and tend to be too high level to be useful.
In order to investigate the ‘moment by moment’ cognitive processes used in viewing an MM presentation, we performed a set of eye tracking studies on sequences from a commercial MM presentation, ‘The Etiology of Cancer’. The studies used eight novice subjects. An example analysis is given below on part of a sequence explaining the spread of tumours via the blood system. Figure 1 shows a set of screen shots from the sequence and single subject traces produced from the eye track data. The average fixation location for the group of subjects is shown as numbered locations on the screen shots (for more details of the analysis see [2]).
Figure 1.
'Etiology of Cancer' Sequence (speech track is shown beneath), and single subject eye track trace
The first frame of the sequence showed the primary tumour and label, blood vessels with labels ‘blood to heart’ and ‘blood from heart’, and arrows in the vessels (figure 1, top). Eye tracking results showed that subjects initially fixated upon the large primary tumour object at the centre of the screen (1). Subjects then shifted from the tumour object to its label (2a), and to the blood vessels and their labels (2b),(2c). The primary tumour label (2a) received many more fixations than (2b) or (2c). This may be because (2a) was referred to in the speech track 'Metastasis resembles primary tumour development...', drawing attention from (2b),(2c).
The second frame showed the tumour cell within the primary tumour being highlighted (figure 1, bottom); subjects made a strong shift to fixate the highlighted area (3). The tumour cell then began to move across the primary tumour and up the right vessel. All subjects tracked its path into the blood vessel to the top right of the screen (4),(5a). Finally, the tumour cell label was revealed. The reveal produced a shift of fixations between the tumour cell and its label (5b). Two of the subjects then returned to refixate the blood vessel labels before the presentation ended.
The attentional results suggested that speech cueing, highlights, reveals and motion have strong attentional effects, and showed that the tumour cell, its path and its label were well attended to by all the subjects. They also suggested a potential problem in the design of the first frame: the labels ‘primary tumour’, ‘blood from heart’ and ‘blood to heart’ were all shown at once, and the speech cue for the ‘primary tumour’ label may have drawn focus away from the other labels.
To check that the attentional results were accurate, we also tested recall. If we found a design problem, would it show up as a failing in the subjects’ subsequent memory of the content? A structured recall test bore out the eye track findings. Whilst all eight subjects had a general understanding of the motion of the tumour cell away from the primary tumour, only three out of eight could correctly recall that the tumour cell moved toward the heart rather than away from it. The direction to the heart was given in the labels on the blood vessels. The recall test provided some evidence that problems in attentional design could lead to difficulties in subsequent recall.
The results of our studies on sequences from the ‘Etiology of Cancer’ were used to inform a set of guidelines for attentional design. The guidelines aimed to predict what would be attended to in a presentation and to flag any potential design problems. A subset of the guidelines is given below (see [2] for more details), followed by examples of their application to the cancer sequence.
IA1 : Scenes are not attended to uniformly. Objects which are larger, brighter, shown in more detail, away from other objects, or shown away from rest will be focused on in preference to other objects. This was the case for the primary tumour in the cancer sequence, which was larger and shown in more detail.
IA2 : Use highlighting techniques (change colour, add symbol or label) to draw attention to important objects. By default only the scene level will be focused upon unless the user is motivated to inspect more closely. The use of highlighting in the cancer presentation was highly effective, with subjects fixating upon the tumour cell when it was highlighted.
IA3 : Allow viewing time. Avoid showing an object in motion or using a highlighting technique when the user is extracting information from an image; allow at least a second before changing the image. In the cancer sequence subjects fixated the image for a second or more before they shifted to labels.
IA4 : Beware of using too many highlighting techniques within an image at once. Sequence highlights to move attention from one object to another. In the cancer sequence, the initial frame had three objects highlighted with labels. This seemed to cause confusion, with viewing order moving back and forth between them.
IA5 : Gradually reveal objects and symbols to control viewing order. The reveal effectively attracts attention. There is some indication in the cancer sequence that the reveal of the tumour cell label moved attention to the label and object.
AA1 : Onset of motion will attract attention; but it will also shift focus away from static objects in the scene. The motion of the tumour cell had a strong effect on all subjects. None of the subjects broke away from the path of the motion to read a label or fixate other static objects.
AA2 : Animation requires attention. If attention is already focused (such as reading a text or label, or attending to another animation) motion onset may not automatically shift attention.
AA3 : Use animation with care. Tracking an object's motion will maintain focus but may prevent focus shifting elsewhere. When tracking motion, avoid revealing other objects, symbols or labels during the animation, and avoid displaying more than one animation.
TA1 : Generally an image will be focused on before a text. If focus is required on the text prior to the image, the text should be displayed before the image or the text area should be larger than the image. In the cancer sequence, attention was given to the image regions prior to the text.
TA2 : Revealing captions and labels is particularly effective and can be used to direct the user's reading sequence. When the tumour cell label was revealed at the end of the cancer sequence, it was given longer fixations than other labels, such as those for the blood vessels.
TA3 : Ensure that a label and object appear together to improve identification. Displaying a label and an object produces fixation shifts between the object and the label. These can be seen in the cancer sequence between the blood vessels and their labels, and between the tumour cell and its label.
TA4 : Allow reading time for text. When users are reading text, focus will not be able to shift until a break is encountered (end of a sentence or paragraph). The more complex the text is, the longer the reading time required.
SA1 : Multiple strands of speech or sound will interfere with each other and distract focus. Speech can be focused upon concurrently with visuals, but no more than one strand of speech should be presented at once.
SA2 : Reveal objects and labels when cued in the speech track. Cueing labels within the speech track will produce a shift of attention to the object and its label. This may have been the case for the ‘primary tumour’ label, which is cued in the speech track and was generally given more fixations than the ‘blood from heart’ and ‘blood to heart’ labels.
SA3 : Allow reading time after cueing a text. Avoid reveals or animation for the duration of the speech segment which cues a label. If the label is complex, the reading time required will be similar to the time the speech track takes to pronounce it.
In order to use the guidelines in a design method, the presentation must be analysed over time and the guidelines applied to decide which part is likely to be focused upon, and where any contention might occur. We perform this analysis by constructing a time line and then marking on it which media are available and which are predicted to be in focus. We term this an ‘attention graph’. It shows the different media on the y axis, with time on the x axis.
An attention graph for the original cancer presentation is given in figure 2. Black lines indicate that a part of the presentation is in focus, whilst grey lines show that the part of the presentation is available for focus, but is not currently focused upon. Exclamation marks show potential problems.
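An attention graph is essentially a set of per-media rows over a shared time axis, each row carrying spans where the media is available, spans where it is predicted to be in focus, and warning marks. The Python sketch below renders such a graph as text, assuming times in seconds; the `Row` class, the `render` function and the demo timings are hypothetical illustrations, not the tool's actual data format.

```python
from __future__ import annotations
from dataclasses import dataclass, field


@dataclass
class Row:
    """One row of an attention graph: a part of the presentation (y axis)."""
    name: str
    available: list[tuple[float, float]] = field(default_factory=list)  # grey: shown, not in focus
    focused: list[tuple[float, float]] = field(default_factory=list)    # black: predicted focus
    warnings: list[float] = field(default_factory=list)                 # '!': potential problem


def render(rows: list[Row], end: float, step: float = 0.5) -> list[str]:
    """Draw each row along a time axis (x) sampled every `step` seconds:
    '!' warning, '#' in focus, '-' available only, ' ' not shown."""
    width = max(len(r.name) for r in rows)
    lines = []
    for r in rows:
        cells = []
        for i in range(int(end / step)):
            t = i * step
            if any(t <= w < t + step for w in r.warnings):
                cells.append("!")
            elif any(s <= t < e for s, e in r.focused):
                cells.append("#")
            elif any(s <= t < e for s, e in r.available):
                cells.append("-")
            else:
                cells.append(" ")
        lines.append(f"{r.name:<{width}} |{''.join(cells)}")
    return lines


if __name__ == "__main__":
    # Hypothetical timings, loosely modelled on the first frame of the cancer sequence.
    graph = [
        Row("speech", available=[(0, 4)], focused=[(0, 4)]),
        Row("primary tumour", available=[(0, 6)], focused=[(0, 1)]),
        Row("tumour label", available=[(0, 6)], focused=[(1, 4)]),
        Row("vessel labels", available=[(0, 6)], focused=[(4, 6)], warnings=[0.0]),
    ]
    print("\n".join(render(graph, end=6.0)))
```

Speech is kept on its own row because, under SA1, it can be in focus at the same time as a visual element.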
Figure 2.
Example Attention Graph for 'Etiology of Cancer' sequence
In the first frame the speech track and primary tumour are marked initially in focus, as the primary tumour is the largest object in the scene (IA1). Next, the primary tumour and left and right vessel images have label effects, making them all candidates for focus (IA2). The guidelines suggest that the speech track will set focus on the primary tumour label, as it is referenced (SA2). They warn that each of the label and image effects should be revealed in sequence (IA5, IA6), and that the speech track does not refer to the blood vessel labels (SA2). The guidelines also note that reading time is required if labels are focused upon (TA4). Once focus is released from the tumour label, it would move arbitrarily to one of the other blood vessels.
In the second frame, the guidelines set focus onto the tumour cell by the highlight effect (IA1). A possible problem with the tumour cell label is noted, which should be shown when the tumour cell is initially focused upon (TA3). The guidelines then give focus to the motion of the cell (AA1). When the cell stops moving, focus switches to the cell itself due to its label (IA1). A problem is noted with the speech track, which does not reference the tumour cell at the point when the label is shown (SA3).
A significant difficulty with using the attention graph is the time it takes to perform the analysis. Authors of MM presentations are used to being able to rapidly change a presentation, delete and add media, and move media around in time. As the previous example shows, predicting a problem leads to the need to change the presentation, which in turn requires a further pass through the guidelines to confirm that the new sequence is free of problems. The time and effort consumed would make the method unwieldy for real world use.
Our answer to these problems has been to embed the guidelines within a tool, enabling the attention graph to be generated and potential problems to be annotated on the fly. The aim was to build a tool that would allow designers to enter their presentation and receive immediate feedback on potential design problems. Whilst the key issue for our method lies in the provision of guidelines, the implementation had to be scoped to support both the guidelines and the mocking up of the presentation within the tool. The general requirements are thus to:
a) specify an environment which is rich enough to allow realistic MM presentations to be developed and which are comparable to the type of presentations which would be built in other authoring tools
b) specify a guideline advisor which has sufficient power to both support the guidelines at authoring time, and to produce the attention graphs and critique the completed design
c) ensure that the guidelines are separate from the authoring tool, and are inspectable and modifiable without rebuilding the tool itself.
The prototype tool was built in Visual C++ and AMZI Prolog under Windows 95. It has two main components, shown in figure 3. The authoring component is used to build a task / media model. This represents the task which is to be presented, the media which are to be shown, and their sequencing in time. The expert system component is able to interrogate the task / media model formed by the authoring tool and apply a set of design rules based on our guidelines. These are used to produce the attention graph and provide design advice. The following sections address key issues in the design of the authoring tool and expert system.
Figure 3.
Overview of tool architecture
To begin entering the presentation, the designer constructs a hierarchical model of the task to be presented, the media required, and their sequencing in time. The first step is to build a task model. This is made up of a set of task nodes representing procedures, actions and objects. The cancer sequence is composed of a procedure containing several objects, eg the left blood vessel and the primary tumour, and actions, eg the movement of the tumour cell.
Figure 4.
Screen shot of Task / Media model and sketch input for 'move cell' task action
The task model is then populated with media, such as labels, text, image, speech and animation; eg for the ‘move cell’ action, an image, motion, speech and text label are required, as shown in figure 4 (left). Since the tool was aimed at early stages of design, it was decided to use a sketch based representation for image and animation. This allows the user to rapidly prototype the design. An example is shown in figure 4 (right). Having selected the required media, each media item is next annotated with presentation effects. The presentation effects are used to set focus on a particular media item; eg in the ‘move cell’ action, the image of the cell is given a highlight effect. The demonstrator tool allows a limited range of effects, such as highlight and zoom on images. The presentation effect nodes are placed under the media to which they are related (see figure 4, left).
The end result of the task / media model is a hierarchy of task type, media type and presentation effect. In order to realise a presentation, the designer must now set time relations between the presentation effects, eg when should an image be shown, when should it be highlighted, etc. This is done on a set of timeline bars. The presentation effects are shown as bars, which can be dragged to particular points in time, with their length giving duration (see figure 5). Having set the times at which each effect starts and ends, the finished presentation can then be played back.
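The task / media model can be thought of as a three-level tree whose leaves, the presentation effects, carry the start and end times set on the timeline bars. The sketch below uses Python dataclasses with hypothetical names (`TaskNode`, `Media`, `Effect`, `timeline_bars`) and hypothetical timings to show the hierarchy and how it flattens into timeline bars ordered by start time; it illustrates the idea only and is not the prototype's Visual C++ representation.

```python
from __future__ import annotations
from dataclasses import dataclass, field


@dataclass
class Effect:
    """A presentation effect; start and end are set by dragging its timeline bar."""
    kind: str            # e.g. "show", "highlight", "zoom", "label", "motion"
    start: float = 0.0   # seconds
    end: float = 0.0


@dataclass
class Media:
    kind: str            # e.g. "image", "label", "text", "speech", "animation"
    name: str
    effects: list[Effect] = field(default_factory=list)


@dataclass
class TaskNode:
    kind: str            # "procedure", "action" or "object"
    name: str
    media: list[Media] = field(default_factory=list)
    children: list[TaskNode] = field(default_factory=list)


def timeline_bars(node: TaskNode) -> list[tuple[str, str, Effect]]:
    """Flatten task -> media -> effect into (task, media, effect) bars, ordered by start.
    Python's sort is stable, so bars with equal start times keep their model order."""
    bars = [(node.name, m.name, e) for m in node.media for e in m.effects]
    for child in node.children:
        bars.extend(timeline_bars(child))
    return sorted(bars, key=lambda bar: bar[2].start)


if __name__ == "__main__":
    # Hypothetical 'move cell' action inside the metastasis procedure.
    move_cell = TaskNode("action", "move cell", media=[
        Media("image", "tumour cell", [Effect("show", 3, 12), Effect("highlight", 3, 5)]),
        Media("animation", "cell path", [Effect("motion", 5, 9)]),
        Media("label", "tumour cell label", [Effect("label", 9, 12)]),
    ])
    procedure = TaskNode("procedure", "metastasis", children=[move_cell],
                         media=[Media("image", "primary tumour", [Effect("show", 0, 12)])])
    for task, media, effect in timeline_bars(procedure):
        print(f"{effect.start:>5} {effect.end:>5}  {task} / {media} / {effect.kind}")
```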
Figure 5.
Screen shot of timeline bars for the 'Etiology of Cancer' presentation
The aim of the design advisor was to critique the emerging presentation as the user built up the task / media model and positioned the timeline bars. The notion of a 'critic' is similar to that used by [10], who defines critic-based advisors as systems that do not automate design, but attempt to spot problems in a design and alert the user, who can then fix them. [11] suggests a range of levels of support a system can offer to the user, from manual to fully automated. We chose to provide critic-based support for two reasons. First, whilst our guidelines were useful for spotting problems, they did not constrain the solutions: for a given problem, many orderings of the presentation elements could solve it, and the choice of final ordering would be left to the designer. Second, we believe that producing the presentation without designer intervention would require a far more detailed encoding of the task, the media content and the designer's needs. The cost of inputting such models may be greater than the value of the advice offered. A critic system, on the other hand, can be based on weaker models, since the designer can always qualify the advice based on these factors.
The expert system uses the task / media representation constructed in authoring to offer design advice and produce the attention graph. It is built in Prolog and provides a forward chaining inference engine, based on a Prolog-in-Prolog style interpreter which can load and interpret a set of rules from a text file. Prolog was chosen as it allows custom expert system shells to be easily prototyped [8]. Each rule is made up of a left hand side, which contains the conditions that must be met in working storage for the rule to fire, and a set of actions which the rule causes. Since the rules are separate from the authoring tool, they can be easily edited and customised.
The expert system uses two knowledge bases of rules based on our guidelines. One set handles menu-based interaction with the user. These rules give advice on the selection of media and presentation effects, and prompt for additional information required for the attention graph. The example rule in figure 6 displays a menu which prompts the user for information concerning the visual appearance of an image. When the author selects one of the menu options, either image(emphasised) or addtech(image) is asserted, causing further rules to fire to help the author choose a presentation effect.
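To illustrate the forward chaining cycle, here is a minimal Python sketch of a production-rule interpreter that reads rules from text and fires them against working storage until nothing new is added. It is not the authors' Prolog-in-Prolog shell: the rule syntax (`name: cond, cond => fact`), the function names and the example rules are hypothetical, and facts are ground atoms with no variable binding.

```python
from __future__ import annotations

# Hypothetical rules in a hypothetical syntax; in practice they would live in a text file.
RULES = """
% name: condition, condition => asserted fact
r1: addtech(image) => suggest(highlight)
r2: suggest(highlight), image(shown) => needs_focus(highlight)
"""


def parse_rules(text: str) -> list[tuple[str, list[str], list[str]]]:
    """Parse 'name: cond, cond => fact, fact' lines; '%' starts a comment line.
    Facts are ground atoms containing no commas (a simplification of this sketch)."""
    rules = []
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("%"):
            continue
        name, body = line.split(":", 1)
        lhs, rhs = body.split("=>", 1)
        conds = [c.strip() for c in lhs.split(",") if c.strip()]
        actions = [a.strip() for a in rhs.split(",") if a.strip()]
        rules.append((name.strip(), conds, actions))
    return rules


def forward_chain(rules, facts):
    """Fire every rule whose conditions are all in working storage and whose actions
    would add something new; repeat until a fixed point is reached."""
    memory = set(facts)
    fired = []
    changed = True
    while changed:
        changed = False
        for name, conds, actions in rules:
            if all(c in memory for c in conds) and not set(actions) <= memory:
                memory.update(actions)
                fired.append(name)
                changed = True
    return memory, fired


if __name__ == "__main__":
    memory, fired = forward_chain(parse_rules(RULES), {"addtech(image)", "image(shown)"})
    print(fired, sorted(memory))
```

Because the rules are plain text, they can be inspected and modified without rebuilding the host program, which is the property requirement c) asks for.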
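A menu rule can be pictured as a rule whose action is a question: it fires once for an image whose appearance is not yet known, and the chosen option asserts a fact that later rules match on. In the Python sketch below, the menu wording, the `media(image)` trigger fact and the function names are hypothetical; only the asserted facts `image(emphasised)` and `addtech(image)` come from the text, and the `choose` callback stands in for the dialog shown in figure 6.

```python
from __future__ import annotations
from typing import Callable

# Hypothetical menu wording mapped to the facts named in the text.
MENU = {
    "The image is large, bright or detailed": "image(emphasised)",
    "The image needs a technique to make it stand out": "addtech(image)",
}


def menu_rule(memory: set[str], choose: Callable[[list[str]], str]) -> bool:
    """Ask about an image's appearance once; assert the fact for the chosen option."""
    if "media(image)" not in memory or memory & set(MENU.values()):
        return False                      # no image yet, or the question is already answered
    memory.add(MENU[choose(list(MENU))])  # `choose` stands in for the menu dialog
    return True


def effect_advice(memory: set[str]) -> list[str]:
    """Follow-on rule: once addtech(image) is asserted, offer image effects."""
    return ["highlight", "label", "zoom"] if "addtech(image)" in memory else []


if __name__ == "__main__":
    storage = {"media(image)"}
    menu_rule(storage, lambda options: options[1])  # simulate picking the second option
    print(sorted(storage), effect_advice(storage))
```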
Figure 6.
Example of menu rule and screen shot of menu produced.
The second set of rules are applied when a timeline bar is added to the presentation, or the user re-arranges the existing timeline bars. These rules are used to interrogate the start and end times of presentation effects set by the user. A sample rule is shown in figure 7. It checks if a timeline bar needing focus exists in working storage which is not a speech effect and not blocked, and which has a start time equal to the current end of focus (CE); if it finds one it adds priority(Start) to working storage.
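In Python, the rule described for figure 7 might look like the sketch below, where the tuple `("priority", start)` plays the role of `priority(Start)`. The `Bar` fields and the function name are hypothetical, and comparing times with a small tolerance rather than exact equality is an assumption of this sketch.

```python
from __future__ import annotations
import math
from dataclasses import dataclass


@dataclass(frozen=True)
class Bar:
    """A timeline bar as seen from working storage (hypothetical fields)."""
    name: str
    kind: str               # e.g. "image_show", "label", "speech"
    start: float
    end: float
    needs_focus: bool = True
    blocked: bool = False


def focus_start_rule(bars: list[Bar], ce: float, memory: set) -> bool:
    """If a non-speech, unblocked bar needing focus starts at the current end of
    focus CE, add ('priority', start) to working storage and report that it fired."""
    for bar in bars:
        if (bar.needs_focus and bar.kind != "speech" and not bar.blocked
                and math.isclose(bar.start, ce, abs_tol=1e-6)):
            memory.add(("priority", bar.start))
            return True
    return False


if __name__ == "__main__":
    storage: set = set()
    bars = [Bar("speech 1", "speech", 2.0, 5.0),
            Bar("cell highlight", "highlight", 2.0, 4.0, blocked=True),
            Bar("cell label", "label", 2.0, 6.0)]
    print(focus_start_rule(bars, 2.0, storage), storage)
```

Here the speech bar and the blocked highlight are skipped, and the label bar satisfies the rule.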
Figure 7.
Example of Attention Graph rule
The rules used to produce the attention graph will now be described in more detail. They are divided into 6 phases :
a) Initialise : The first set of rules is used to initialise the focus search. These rules block focus on effects which require focus on another effect before they themselves can be given focus. These include highlights and animations, which require their image show effect to be focused on first. The rules also separate the focus search on visuals from that on speech.
b) Find a visual focus start time : The next set of rules performs a search through the start and end times of the timeline bars, seeking bars which can potentially be in focus. A start time for focus is chosen :
i) Animation show effects are given focus by default if they overlap with the end of current focus. These rules recognise that animations have a very high attentional value and will gain focus by default as soon as focus is free (AA1).
ii) Effects which are revealed just after the end of current focus are given focus. These rules reward effects which start directly after focus has ended, since focus will then be free to move to a new effect (IA5, TA2). Thus a label effect revealed directly after an image show effect releases focus will gain focus over a caption effect which started while focus was engaged on the image. Effects which are related to the current focus are favoured; this ensures that focus will move, eg, from an image show effect to the image's highlight or label effect (IA6, TA2).
iii) Focus is given to effects which have started but which have not yet gained focus and are still active at the end of current focus. Previous effects are chosen based on recency, the effect closest to the focus end time is given focus. These rules work back over the timelines searching for effects which did not gain focus earlier. Again effects are favoured which are related to the current focus.
iv) Focus is given to the next effect which occurs after the end of current focus. These rules allow for gaps in the presentation in which no effects requiring focus are active. They move focus on to the next available effect, again favouring effects which are related to the current focus.
c) Check visual focus priority : Having found a new start time for focus, a series of rules are now used to apply priority if more than one timeline bar competes for focus at the same time :
i) Focus is given to animation show effects (AA1). These rules favour animations over other effects. They advance the end of focus to the end of the animation effect, since attention will track animations and hold focus until the end of the effect (AA3).
ii) Focus is given to image show effects. These rules recognise that images will gain attention over labels or captions (TA1), and favour images with highlights or labels (IA1), or images which the user has selected as being perceptually prominent (IA1). They advance the end of focus to allow viewing time for the image (IA3). They also unblock any effects which require prior focus on the image show effect. This allows an associated highlight or animation effect to compete for focus.
iii) Focus is given to highlight or label effects, favouring highlights (IA2), then labels which are also referenced by speech (SA3). These rules favour effects which draw attention to images. They advance the end of focus to allow reading time for the label (TA4) or viewing time for the highlighted image (IA3).
iv) Focus is given to caption effects, favouring captions with a bold font effect. These rules advance the end of focus to allow reading time for the caption (TA4).
d) Add visual focus warnings : The start and end time of focus are used to produce warnings.
i) A search is made for any effects which have start times that overlap the new focus period. These rules find any effects which have been prevented from getting focus by the effect focused upon. The rules then add warnings and design advice. Thus, if an image is revealed during an animation, a warning will be given because the animation will block the image getting focus.
ii) A search is made for any dependencies between effects. Thus if a label is shown, the image it is related to is checked to ensure that it is in focus prior to the label.
iii) Sanity checks are applied to spot authoring errors. These include mistakes such as showing a highlight effect when the image is not being shown.
e) Check speech focus : focus on speech is determined separately from focus on visuals. The rules are simpler than for visual focus, since only one strand of speech can ever be in focus at once (SA1). They ensure focus is given for the entire duration of a speech effect before it can switch to another speech effect. Focus is given to the first speech effect found, and held on it for its duration.
f) Add speech warnings : Warnings are given for overlapping speech effects (SA1), and checks are made that any labels and captions that are in focus when the speech effect starts are related (SA3).
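The six phases can be condensed into one loop. The Python sketch below does so under several simplifying assumptions: phases a) to d) handle visuals and e) to f) handle speech; dwell times are 1 s for images and 3 s for labels (the values given for figure 8), with highlight and caption times assumed; relatedness is modelled only as the `requires` link from an effect to its image; ties are broken by recency; and the bold-font and SA3 checks are omitted. All names are hypothetical, and this is not the authors' rule base.

```python
from __future__ import annotations
from dataclasses import dataclass

# Dwell times: ~1 s for images and ~3 s for labels; highlight/caption values are assumed.
DWELL = {"image": 1.0, "highlight": 1.0, "label": 3.0, "caption": 3.0}
# Phase c) ranking below animations: image show, then highlight/label, then caption.
RANK = {"animation": 0, "image": 1, "highlight": 2, "label": 2, "caption": 3}


@dataclass
class Effect:
    name: str
    kind: str                     # "animation", "image", "highlight", "label", "caption", "speech"
    start: float
    end: float
    requires: str | None = None   # effect that must hold focus first (eg its image show)


def visual_focus(effects: list[Effect]):
    visuals = [e for e in effects if e.kind != "speech"]
    focused: set[str] = set()
    timeline: list[tuple[str, float, float]] = []
    warnings: list[str] = []
    ce, last = 0.0, None          # current end of focus, and the effect that held it
    while True:
        # a) Blocking: an effect is not a candidate until its prerequisite has had focus.
        ready = [e for e in visuals if e.name not in focused
                 and (e.requires is None or e.requires in focused)]
        # b) Start time: effects still active at CE, otherwise the next reveal.
        active = [e for e in ready if e.start <= ce < e.end]
        later = [e for e in ready if e.start > ce]
        if active:
            start, pool = ce, active
        elif later:
            start = min(e.start for e in later)
            pool = [e for e in later if e.start == start]
        else:
            break
        # c) Priority: animations, then effects related to the last focus, then RANK,
        #    then recency (latest start wins).
        winner = min(pool, key=lambda e: (e.kind != "animation", e.requires != last,
                                          RANK[e.kind], -e.start))
        end = winner.end if winner.kind == "animation" else start + DWELL[winner.kind]
        # d) Warnings: other candidates revealed while this focus period is running.
        for e in ready:
            if e is not winner and start <= e.start < end:
                warnings.append(f"{e.name} revealed while focus is on {winner.name}")
        focused.add(winner.name)
        timeline.append((winner.name, start, end))
        ce, last = end, winner.name
    return timeline, warnings


def speech_focus(effects: list[Effect]):
    """e) and f): one speech strand at a time, held for its duration; warn on overlap."""
    speech = sorted((e for e in effects if e.kind == "speech"), key=lambda e: e.start)
    timeline, warnings = [], []
    busy_until = float("-inf")
    for s in speech:
        if s.start < busy_until:
            warnings.append(f"{s.name} overlaps another speech effect (SA1)")
        if s.end > busy_until:
            timeline.append((s.name, max(s.start, busy_until), s.end))
            busy_until = s.end
    return timeline, warnings


if __name__ == "__main__":
    frame = [
        Effect("tumour image", "image", 0, 10),
        Effect("tumour label", "label", 0, 10, requires="tumour image"),
        Effect("vessel image", "image", 0, 10),
        Effect("speech 1", "speech", 0, 4),
        Effect("speech 2", "speech", 3, 6),
    ]
    print(visual_focus(frame))
    print(speech_focus(frame))
```

On this hypothetical frame, focus moves from the tumour image to its related label and then to the vessel image, with a warning that the vessel image was revealed together with the tumour image; the second speech effect is flagged as overlapping the first.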
The screen shots in figures 8 and 9 show an example of how the rules produce an attention graph for the first frame of the cancer presentation. The white lines show the prediction of which part of the presentation will be in focus, and for how long (images require at least one second of viewing time, whilst labels require three seconds). Buttons denote potential problems and provide warnings.
Figure 8.
Screen shot of Attention Graph for first frame of 'Etiology of Cancer' presentation
Figure 8 shows that focus is initially given to the primary tumour image, because the speech track draws attention to its label, giving it a higher attentional value than the right or left vessel. Focus is held on the image for a short time to allow for viewing. Warnings are added to show that potential problems may exist on the other timeline bars which overlap with the image bar (figure 10).
Figure 10.
Warning dialog for revealing images together
The rules then try to find other effects which are related to the image, giving focus to its label. They extend the end of focus to allow reading time. Following this, focus is given arbitrarily to the left vessel, since both it and the right vessel have the same effects. Warnings are given at the end of the timeline bars, as they are too short to allow reading time for the left vessel label, whose focus can be seen to run over the end of the label bar. Warnings are also given because the vessel labels are in focus while the speech track is referring to the primary tumour.
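The 'bar too short' warnings amount to comparing the predicted end of focus with the end of the timeline bar. A minimal Python sketch, assuming the one-second and three-second dwell times quoted above; the function name, the dictionary layout and the bar timings are hypothetical:

```python
from __future__ import annotations

# Minimum dwell times given in the text: about 1 s for images, 3 s for labels.
MIN_DWELL = {"image": 1.0, "label": 3.0}


def short_bar_warnings(bars: dict[str, tuple[str, float, float]],
                       focus_start: dict[str, float]) -> list[tuple[str, float]]:
    """bars maps a name to (kind, start, end); focus_start maps a name to the time the
    attention graph predicts it gains focus. Returns (name, overrun in seconds) for each
    bar that ends before its minimum viewing or reading time has elapsed."""
    warnings = []
    for name, start in focus_start.items():
        kind, _bar_start, bar_end = bars[name]
        needed_until = start + MIN_DWELL[kind]
        if needed_until > bar_end:
            warnings.append((name, round(needed_until - bar_end, 2)))
    return warnings


if __name__ == "__main__":
    # Hypothetical timings: the left vessel label gains focus at 4 s but its bar ends at 6 s.
    bars = {"tumour image": ("image", 0.0, 6.0), "left vessel label": ("label", 0.0, 6.0)}
    print(short_bar_warnings(bars, {"tumour image": 0.0, "left vessel label": 4.0}))
```

With these timings the left vessel label is flagged as overrunning its bar by one second, while the tumour image has ample viewing time.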
By adjusting the timeline bars, the designer can correct the problems; when all the warning buttons have disappeared, the presentation has no focus problems. An example arrangement is shown in figure 9, in which the labels and images are revealed one by one to shift focus around the presentation. Each effect is spaced in time to allow for reading and viewing images. The left vessel label effect is not shown until the speech for the primary tumour has ended, allowing focus to remain on the primary tumour for the duration of the speech.
Figure 9.
Screen shot of Attention Graph for revised first frame of 'Etiology of Cancer' presentation
We believe that by taking into account the cognitive processes required to view an MM presentation we can improve presentation design. This paper has shown how empirical studies of these processes can be used to inform guidelines which are then wrapped up into a tool to support design. Tool support is vital if guidelines are actually to be used by authors in the real world. The guidelines must fit in with the type of tool used by the designer, and must be explicit and modifiable.
The current tool is a prototype to allow us to explore how guidelines can be used to assist authors. To this end the tool must present advice which can be readily understood by the author, and allow the author to try what-if style exploration. The attention graph is proposed as a representation which is highly fluid : the author can simply move the timeline bar and check quickly if any potential attentional problems will arise.
Our current work is to support our claims for the tool's utility by performing studies using the tool with novice MM designers. By analysing whether the tool improves their presentations, we should be able to justify our arguments for guideline support in authoring. We also feel that it is vital to take note of how designers react to the advice, eg whether it might reduce their creativity or conflict with their instincts.
Finally, by stating explicit guidelines and modelling them in rules, the tool pushes us to perform further empirical studies to validate how the advice offered compares against what subjects actually do. These results would then refine our guidelines.
The Etiology of Cancer CD-ROM is produced by and copyright of American Medical Television and Silver Platter.