ACM Multimedia 95 - Electronic Proceedings
November 5-9, 1995
San Francisco, California

SpeechActs:
A Conversational Speech System

Nicole Yankelovich
Speech Applications Group
Sun Microsystems Laboratories
Two Elizabeth Drive
Chelmsford, MA, USA 01824
508-442-0441
nicole.yankelovich@east.sun.com

INTRODUCTION

This video provides a live demonstration of the SpeechActs system developed by the Speech Applications Group at Sun Microsystems Laboratories.

SpeechActs has been designed for traveling professionals who need access to on-line information while they are away from their computers. While a portable computer can empower the nomad, the logistics of connecting by modem often make doing so impractical or impossible. Telephone-based interaction is a simpler, lighter-weight means of staying in touch, and conversational speech offers an attractive alternative to keypad input: the telephone is ubiquitous, and conversational speech is familiar, leaves hands and eyes free, and opens up the possibility of a virtually unlimited set of commands.

Implementing a usable conversational interface with error-prone speech recognizers, however, is a challenging task. Despite the difficulties, current speech technologies are now good enough to make experimentation worthwhile. The SpeechActs system is the result of this experimentation.

THE SpeechActs SYSTEM

SpeechActs is a research prototype that integrates third-party speech recognition and synthesis with telephony, natural language processing capabilities, and other tools for creating speech applications. For more information, see [1, 2, 3].

The video shows SpeechActs being used in the home, in the airport, and at a hotel. While the video focuses primarily on the end-user aspects of the system, it also touches briefly on some of the natural language underpinnings.

To date, SpeechActs includes speech-only interfaces to a suite of integrated applications. Two of these are speech interfaces to Sun's graphical deskset tools, Mail Tool and Calendar Manager. Another two applications provide speech access to dynamic data feeds for weather forecasts and stock quotations. A fifth application, called Notify, allows the user to set up reminder telephone calls.

Mail

With the mail application, which uses Sun's Mail Tool backend, users can hear their messages, skip forward or backward from one header to the next, fax a message, reply to a message, or initiate a new message to a person on their short list of names known to the speech recognizer. Based on a set of rules defined by each user, mail messages are grouped and prioritized. The groups allow users to handle batches of related messages (e.g., "skip those," or "delete that group") and the priorities cause the most important messages to be presented to the user first.
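
To give a flavor of how such rules might work, here is a minimal sketch in Python (not the actual SpeechActs implementation); the rule predicates, addresses, and field names are illustrative assumptions:

   # Each user-defined rule pairs a predicate over a message with a
   # group label and a priority; lower priorities are presented first.
   from dataclasses import dataclass

   @dataclass
   class Message:
       sender: str
       subject: str

   RULES = [
       (lambda m: m.sender.endswith("@labs.example.com"), "the Labs", 1),
       (lambda m: "urgent" in m.subject.lower(), "urgent", 1),
       (lambda m: True, "other", 2),          # catch-all group
   ]

   def classify(message):
       """Return (group, priority) from the first matching rule."""
       for predicate, group, priority in RULES:
           if predicate(message):
               return group, priority

   inbox = [Message("stuart@eng.example.com", "Can we meet at 3:00 today?"),
            Message("eric@labs.example.com", "Draft Report")]
   inbox.sort(key=lambda m: classify(m)[1])   # priority messages first

Grouping falls out of the same rules: messages that share a group label can be acted on as a batch ("delete that group").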

Following is an example of a typical interaction with the mail application:

   SpeechActs:	Switching to mail. You have 10 old messages and 63 
		new messages, of which 14 are priority. Your first
		three messages are from people in the Labs.

   User:	Let me hear them.

   SpeechActs: 	Message 1 from Eric Baatz, subject: "Draft Report."

   User:	Read it to me.

   SpeechActs:	"The first draft is ready for your comments. Eric."	

   User:	Reply and include the current message.

   SpeechActs:	Preparing message to Eric Baatz. Begin recording after 
		the tone. When finished, pause for several seconds.

   User:	Eric, I'll get those to you this afternoon. 

   SpeechActs:	Say cancel, send, or review. 

   User:	Send.

   SpeechActs:	Message sent. What now?

   User:	Next message.

   SpeechActs:	Message 2 from Stuart Adams, subject "Can we meet at 
		3:00 today?" 

   User:	Switch to calendar...

Calendar

The SpeechActs calendar interface, based on Sun's Calendar Manager application, allows users to browse their own calendars as well as the calendars of other users on their short list. When the user requests information, the application reads aloud all the events on the selected day. Typical calendar queries include:

   What do I have tomorrow?

   What about Bob?

   What did he have last Wednesday?

   And next Thursday?

   What was Paul doing three days after Labor Day?

Based on user studies, the speech version of the calendar was designed to support extensive use of pronominal references, which are common in conversational speech, as well as use of relative dates, which are essential if no graphical calendar is available.
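
Relative-date handling of this sort can be illustrated with a small Python sketch (an assumption for illustration, not the SpeechActs code), resolving phrases like "next Thursday" and "last Wednesday" against today's date:

   import datetime

   WEEKDAYS = ["monday", "tuesday", "wednesday", "thursday",
               "friday", "saturday", "sunday"]

   def next_weekday(today, name):
       """Date of the next occurrence of the named weekday."""
       delta = (WEEKDAYS.index(name) - today.weekday() - 1) % 7 + 1
       return today + datetime.timedelta(days=delta)

   def last_weekday(today, name):
       """Date of the most recent occurrence of the named weekday."""
       delta = (today.weekday() - WEEKDAYS.index(name) - 1) % 7 + 1
       return today - datetime.timedelta(days=delta)

   today = datetime.date(1995, 11, 6)       # a Monday
   print(next_weekday(today, "thursday"))   # 1995-11-09
   print(last_weekday(today, "wednesday"))  # 1995-11-01

Offsets from holidays ("three days after Labor Day") resolve the same way once the holiday itself has been mapped to a date.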

Weather

The weather application provides an interface to the University of Michigan's on-line Weather Underground forecasts. Users can call in and ask for the weather in any state or in major cities around the country. For example, the user can say:

   What's the weather in Seattle?

   How about Texas?

   I'd like the extended forecast for Boston.
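
Elliptical follow-ups such as "How about Texas?" inherit the rest of the query from what came before. Here is a minimal Python sketch of that carryover (the feature names and canned forecasts are illustrative assumptions, not the live Weather Underground feed):

   # Stand-in forecast data; the real application reads a dynamic feed.
   FORECASTS = {"seattle": "Rain, high 52.",
                "texas":   "Sunny, high 78.",
                "boston":  "Snow flurries, high 34."}

   last_query = {}

   def answer(features):
       """Merge an elliptical query into the previous one, then look up."""
       global last_query
       last_query = {**last_query, **features}   # carry over missing slots
       place = last_query.get("location")
       return FORECASTS.get(place, "I have no forecast for that.")

   print(answer({"action": "weather", "location": "seattle"}))
   print(answer({"location": "texas"}))   # "How about Texas?"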

Stock Quotes

Like the weather application, the stock quotes application provides a speech interface to a dynamic data feed. The user is able to ask for the prices of selected stocks, or ask about their highs, lows, and volume. Sample queries include:

   What's the price of Sun?

   What was the volume?

   Tell me about IBM.
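
Follow-ups like "What was the volume?" work the same way, reusing the stock currently in focus. A small sketch, again with illustrative data rather than the real feed:

   QUOTES = {"sun": {"price": 23.125, "high": 23.5,
                     "low": 22.875, "volume": 1400000}}

   current_ticker = None

   def quote(features):
       """Answer price/high/low/volume queries, remembering the ticker."""
       global current_ticker
       current_ticker = features.get("company", current_ticker)
       return QUOTES[current_ticker][features.get("attribute", "price")]

   print(quote({"company": "sun", "attribute": "price"}))  # 23.125
   print(quote({"attribute": "volume"}))                   # 1400000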

Notify

Notify is somewhat different from the other applications: it allows users to ask SpeechActs to call them at a pre-defined location or at a telephone number entered using the keypad. This is useful for wake-up calls or reminder messages. When setting up the call, users may either record a message or select an application. When the telephone rings at the appointed time, SpeechActs asks to speak to the user who initiated the call. Once that person provides his or her password, the recorded message is played or the selected application is started up.
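
The scheduling side of such a service can be sketched with Python's standard library (a simplified assumption; a real system would drive the telephony layer rather than print, and the number here is fictional):

   import sched, time

   scheduler = sched.scheduler(time.time, time.sleep)

   def place_call(number, payload):
       """Stand-in for dialing out through the telephony layer."""
       print(f"Calling {number}: verifying user, then playing {payload!r}")

   def schedule_reminder(delay_seconds, number, payload):
       scheduler.enter(delay_seconds, 1, place_call, (number, payload))

   schedule_reminder(2, "555-0100", "recorded wake-up message")
   scheduler.run()   # blocks until the reminder fires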

Common Features

As with multiple graphical applications running in the same environment, SpeechActs supports a standard set of functions that is always available, no matter which application the user is in. For example, the user may always switch to a different application, ask for help, transfer the call to another number, or end a session by saying "good bye."
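
One way to layer such always-available commands over each application's own vocabulary is a two-level dispatch, sketched here in Python with illustrative handler names (not the SpeechActs internals):

   def speak(text): print("SpeechActs:", text)
   def switch_app(name): speak(f"Switching to {name}.")
   def hang_up(): speak("Good bye.")

   GLOBAL_COMMANDS = {
       "switch to mail": lambda: switch_app("mail"),
       "help":           lambda: speak("You can say..."),
       "good bye":       hang_up,
   }

   def dispatch(utterance, app_commands):
       """Try the current application's commands, then the global layer."""
       handler = app_commands.get(utterance) or GLOBAL_COMMANDS.get(utterance)
       if handler is None:
           return speak("I didn't understand. Please rephrase.")
       return handler()

   dispatch("good bye", {})   # available in any application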

UNDERLYING TECHNOLOGY

The SpeechActs Application Framework supports the use of speech recognizers and speech synthesizers as plug-in components. All speech recognizers that are currently supported by the framework are grammar-based, speaker-independent, continuous speech recognizers.

To simplify the software developer's task of writing an application that can be used with speech recognizers from different vendors, the SpeechActs framework includes a Unified Grammar language. This language allows developers to write speech recognition grammars in a recognizer-independent manner. In addition, the Unified Grammars can be augmented with tests and actions for our Swiftus natural language processor. Swiftus is responsible for translating the conversational speech input into sets of feature/value pairs that are easier for a backend application to parse than English-language sentences.
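
The net effect, independent of any vendor's grammar notation, is that a recognized sentence comes out as feature/value pairs. A toy Python sketch of that mapping (the patterns and feature names are illustrative, not the Unified Grammar or Swiftus notation):

   import re

   # Each rule: a pattern plus an action emitting feature/value pairs.
   RULES = [
       (re.compile(r"what's the weather in (?P<city>\w+)", re.I),
        lambda m: {"action": "weather", "location": m["city"].lower()}),
       (re.compile(r"read it to me", re.I),
        lambda m: {"action": "read", "object": "current-message"}),
   ]

   def parse(sentence):
       """Return feature/value pairs for the first matching rule."""
       for pattern, action in RULES:
           match = pattern.search(sentence)
           if match:
               return action(match)
       return {"action": "unknown"}

   print(parse("What's the weather in Seattle?"))
   # {'action': 'weather', 'location': 'seattle'}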

At run time, a Discourse Manager keeps track of state and other information necessary for successful communication with the user. The Discourse Manager interprets the meaning of pronouns, translates relative dates into specific ones, disambiguates ambiguous user names, and stores common information so that it can be shared by the applications.
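
A minimal sketch of that shared state, assuming pronouns resolve to the most recently mentioned person and missing dates carry over from the previous query (the field names are illustrative):

   discourse = {"person": None, "date": None}

   def resolve(features):
       """Fill pronoun and missing-date slots from discourse state."""
       if features.get("person") in ("he", "she"):
           features["person"] = discourse["person"]    # e.g. "Bob"
       if features.get("person"):
           discourse["person"] = features["person"]    # update focus
       if features.get("date") is None:
           features["date"] = discourse["date"]        # "And next Thursday?"
       discourse["date"] = features["date"]
       return features

   resolve({"action": "calendar", "person": "Bob", "date": "tomorrow"})
   resolve({"action": "calendar", "person": "he"})   # -> Bob, tomorrow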

ABOUT THE VIDEO

The SpeechActs software demonstrated in the video runs on any SPARCstation and uses Sun's XTL telephony software. For the demonstration, SpeechActs was configured with the HARK speech recognizer from BBN and the TruVoice synthesizer from Centigram. The speech versions of the applications, the Unified Grammar language, the Swiftus natural language processor, and the Discourse Manager were all developed by the Speech Applications Group at Sun Microsystems Laboratories.

REFERENCES

1. Martin, Paul and Andrew Kehler. "SpeechActs: A Testbed for Continuous Speech Applications," AAAI-94 Workshop on the Integration of Natural Language and Speech Processing, 12th National Conference on AI, Seattle, WA, July 31-August 1, 1994.

2. Yankelovich, Nicole, Gina-Anne Levow, and Matt Marx. "Designing SpeechActs: Issues in Speech User Interfaces," SIGCHI '95, Human Factors in Computing Systems Proceedings, Denver, CO, May 7-11, 1995.

3. Yankelovich, Nicole and Eric Baatz. "SpeechActs: A Framework for Building Speech Applications," AVIOS '94 Conference Proceedings, San Jose, CA, September 20-23, 1994.