ACM Multimedia 95 - Electronic Proceedings
November 5-9, 1995
San Francisco, California

Surfing the Web by Voice (Demonstration Summary)

Charles T. Hemphill
Corporate Research and Development
Texas Instruments
PO Box 655474, MS 238
Dallas, Texas 75265, USA
214-995-0393
hemphill@csc.ti.com

Philip R. Thrift
Corporate Research and Development
Texas Instruments
PO Box 655474, MS 238
Dallas, Texas 75265, USA
214-995-7906
thrift@csc.ti.com

ACM Copyright Notice


Abstract

We demonstrate a speaker-independent, continuous, real-time, flexible-vocabulary, dynamic-grammar speech interface to the World Wide Web. In addition to speakable control words (e.g., ``scroll down'' and ``back''), we have made NCSA Mosaic speech-aware in three novel ways. First, the interface implements the idea of a speakable hotlist --- the user can associate any grammar or ``language'' with any URL (e.g., saying ``What's the weather today?'' brings up the URL <http://wxweb.cl.msu.edu/weather/interactive.html>). Second, the interface includes speakable links --- the user can speak any underlined hypertext link on any page. This involves some lexical challenges (e.g., ``DOW DOWN 1.68 AT 11''), on-the-fly pronunciation generation, and dynamic grammar modification. Finally, we have implemented the concept of smart pages, making it possible to associate a grammar with any Web page. In this way, the interface knows the language for that page, recognizes sentences using that language, and passes the result back to the page for interpretation. To avoid coverage issues, each smart page can briefly describe the language to the user. The demonstration will illustrate the usefulness of these mechanisms for voice input to a multimedia application, show how the system can be customized, and allow the audience to surf the Web by voice.

Table of Contents


Summary

A separate paper [Paper95] describes the work in more detail, along with the original motivation for this work --- intelligent agents. A separate video summary [Video95] describes a recorded demonstration of the work. However, with the dynamic nature of the Web and multimedia in general, a live demonstration provides the best opportunity to illustrate the system.

The live demonstration will begin much like the recorded demonstration: We will illustrate the utility of the speakable hotlist by quickly accessing an interesting page by voice. We will then navigate from there by speaking one of the links on that page and subsequent pages. Again using the speakable hotlist, we will visit a smart page to illustrate the power of associating a spoken language with a page.

Table 1 includes some sample speakable hotlist queries to try. They illustrate several points about the speakable hotlist: users can define any language they choose to access a URL, these languages can contain a wide variety of syntactic variation, and this variety often makes items in the speakable hotlist easier to remember when compared with the single title approach. In all cases, using the speakable hotlist proves faster and more convenient than the regular hotlist. We can discover more items to try in the speakable hotlist by speaking the query ``Show me my speakable hotlist''. The speakable hotlist defines this query using the following grammar in Backus-Naur form (BNF) and the `url' predicate to associate the URL with the grammar:

  start(speakable_hotlist).
  url("file://localhost/$SAM_HOTLIST").
  speakable_hotlist --->
      [bring up | go to | (give | show) me]
        [the | my] speakable hotlist.

Table 1. Sets of Sample Speakable Hotlist Sentences, the Related URL, and the URL Title.
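
To make the coverage of this small grammar concrete, the following sketch lists every sentence it accepts. It assumes that square brackets mark an optional choice and parentheses a required one, as the bracket usage elsewhere in this summary suggests. The function name expand_hotlist_grammar is hypothetical and not part of the system.

```shell
#!/bin/sh
# Sketch: enumerate the sentences accepted by the speakable_hotlist rule.
# Assumption: [a | b] is an optional choice, (a | b) a required choice.

expand_hotlist_grammar() {
  # "-" stands for the empty alternative of an optional group.
  for lead in - "bring up" "go to" "give me" "show me"; do
    for det in - the my; do
      sentence="speakable hotlist"
      [ "$det" != "-" ] && sentence="$det $sentence"
      [ "$lead" != "-" ] && sentence="$lead $sentence"
      echo "$sentence"
    done
  done
}

expand_hotlist_grammar
```

Under these assumptions, five choices for the leading phrase and three for the determiner give fifteen sentences, including ``show me my speakable hotlist''.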

To illustrate the flexible vocabulary of the system, we will take page requests from the audience (assuming an Internet connection). Our system is not a dictation system, so for most requests we will begin by typing the requested URL. Once we reach the page, however, we can speak any of the links and the links on subsequent pages. In the unlikely event that no one requests a page, we can ``Bring up the Yahoo Yellow Pages'' and surf from there. The following sentences represent one such sequence from that page (ignoring intermediate ``scroll down'' commands to the browser):

We can straightforwardly speak the first link. The second requires a little tokenization on the part of the system to let us say ``P C's''. The third link illustrates two points: we can speak the token `WWW' in many ways (e.g., ``W W W'' and ``triple W'') and we can stop speaking after N words (default 3) for long links. The system therefore lets us say ``triple W personal computing'' to select the link. In the final link, we can simply say ``prehistoric computers'' since bracketed text becomes optional. The system supports many other tokenization rules to simplify link speaking.
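
Three of the rules just described (alternative spoken forms of `WWW', stopping after N words, and optional bracketed text) can be sketched as a simple string comparison. This is only an illustration under our own assumptions, not the recognizer's actual mechanism, which works with grammars and pronunciations rather than text. The function names and the sample link texts are hypothetical.

```shell
#!/bin/sh
# Sketch: decide whether a spoken phrase selects a link, using three rules.
# All names and sample link texts here are hypothetical.

normalize() {
  # Lowercase, map "triple w" and "w w w" to the token "www",
  # then squeeze and trim spaces.
  printf ' %s \n' "$1" | tr 'A-Z' 'a-z' |
    sed -e 's/ triple w / www /g' -e 's/ w w w / www /g' \
        -e 's/  */ /g' -e 's/^ //' -e 's/ $//'
}

link_matches() {
  spoken=$(normalize "$1")
  min_words=${3:-3}            # N, default 3
  # Bracketed text is optional: try the link with it kept and with it removed.
  full=$(normalize "$(printf '%s' "$2" | sed 's/[][]//g')")
  short=$(normalize "$(printf '%s' "$2" | sed 's/\[[^]]*]//g')")
  for link in "$full" "$short"; do
    [ "$spoken" = "$link" ] && return 0
    # A leading run of at least N words also selects a longer link.
    set -- $spoken
    if [ $# -ge "$min_words" ]; then
      case "$link" in "$spoken "*) return 0 ;; esac
    fi
  done
  return 1
}

if link_matches "triple W personal computing" \
                "WWW Personal Computing and Technology News"; then
  echo "link selected"
fi
```

The sketch omits the tokenization that lets us say ``P C's'' and does not handle numbers or other symbols.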

To illustrate the speaker-independence and continuous speech aspects of the system we will pass the microphone to willing participants. We will also entertain additions and modifications to the speakable hotlist to illustrate the ease of customizing this important aspect of the interface. To add a URL to the speakable hotlist, we simply visit the page and say ``Add this page to my speakable hotlist''. This adds the title of the page as the default grammar and automatically associates that grammar with the URL. Speaking the phrase ``Edit the speakable hotlist'' allows us to manually add more syntactic flexibility in retrieving the page by voice.
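
As a rough picture of what ``Add this page to my speakable hotlist'' might record, the sketch below builds a default entry from a page title and URL, in the same notation as the grammar shown earlier. The on-disk format of the hotlist is an assumption on our part, and the function hotlist_entry and the sample title are hypothetical.

```shell
#!/bin/sh
# Sketch: build a default speakable hotlist entry from a title and a URL.
# Assumption: entries use the start/url/rule notation shown in this summary.

hotlist_entry() {
  title=$1
  url=$2
  # Rule name: lowercased title, runs of other characters replaced by "_".
  name=$(printf '%s' "$title" | tr 'A-Z' 'a-z' |
         sed -e 's/[^a-z0-9][^a-z0-9]*/_/g' -e 's/^_//' -e 's/_$//')
  # Default grammar: the words of the title, exactly as written.
  printf 'start(%s).\nurl("%s").\n%s ---> %s.\n' \
    "$name" "$url" "$name" "$title"
}

hotlist_entry "Texas Instruments" "http://www.ti.com/"
```

Editing the resulting rule by hand, for example by adding optional words or alternatives, corresponds to the extra syntactic flexibility mentioned above.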

As time permits, we will build a smart page of interest to the audience. Writing a grammar and associating it with a page is straightforward, but deciding what the page should do in response to a query requires some thought on the part of the page designer. The following very simple, self-contained smart page and associated grammar might serve as a starting point. It supports commands such as ``Show me the home page for T I''. To install it, place the shell script in the cgi-bin directory and the grammar in the docs/grammars subdirectory. The full paper describes the smart page mechanism in detail [Paper95].

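  #!/bin/sh
  # With no arguments, emit the smart page itself; with arguments,
  # interpret them as the recognized sentence.

  if [ $# -eq 0 ]; then   # produce HTML
    echo Content-type: text/html; echo
    # The X-GRAMMAR link below is what makes this a smart page.
    cat << PAGE
  <HTML><HEAD>
  <TITLE>Home Pages</TITLE>
  <LINK REL="X-GRAMMAR"
   HREF="/grammars/homepages.cfg">
  </HEAD><BODY>
  Ask for home pages. For further help,
  <A HREF="/grammars/homepages.cfg">
  look at the smart page grammar.
  </A>
  </BODY></HTML>
  PAGE
  else   # interpret recognized sentence
    case "$*" in
      *Texas*) URL="http://www.ti.com/" ;;
      *T\ I*)  URL="http://www.ti.com/" ;;
      *Sun*)   URL="http://www.sun.com/" ;;
      *)       # no known company: report it rather than redirect nowhere
               echo Content-type: text/plain; echo
               echo "No home page known for: $*"
               exit 0 ;;
    esac

    echo "Location: $URL"; echo
  fi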

  start(home_pages).
  home_pages --->
   (give | show) [me] the home page for CO_NAME.
  CO_NAME ---> Texas Instruments | T I |
               Sun [Microsystems].
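
The interpretation step can be tried without a browser. The sketch below copies the pattern matching into a function and feeds it two sentences from the grammar. It assumes that the recognized sentence reaches the script as command-line arguments, one word each, as the script's use of $# and $* suggests. The function name home_page_url is hypothetical.

```shell
#!/bin/sh
# Sketch: exercise the sentence-to-URL mapping of the smart page by itself.
# Assumption: the recognized sentence arrives as one argument per word.

home_page_url() {
  case "$*" in
    *Texas*) echo "http://www.ti.com/" ;;
    *T\ I*)  echo "http://www.ti.com/" ;;
    *Sun*)   echo "http://www.sun.com/" ;;
    *)       return 1 ;;   # no company name recognized
  esac
}

for sentence in "show me the home page for T I" \
                "give the home page for Sun Microsystems"; do
  # Unquoted on purpose: each word becomes a separate argument.
  if url=$(home_page_url $sentence); then
    echo "$sentence -> $url"
  else
    echo "$sentence -> (no match)"
  fi
done
```

The patterns are case-sensitive, so the mapping relies on the recognizer returning company names capitalized as they appear in the grammar.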

References

[Paper95]
Hemphill, Charles T. and Thrift, Philip R., Surfing the Web by Voice, Online document, 1995.
[Video95]
Hemphill, Charles T. and Thrift, Philip R., Surfing the Web by Voice (Video Summary), Online document, 1995.