Obviousness — US Patent 8355484

Analysis of Obviousness for US Patent 8,355,484

Introduction

This analysis examines the obviousness of the independent claims of U.S. Patent No. 8,355,484 (the '484 patent) under 35 U.S.C. § 103. It relies on prior art references cited in the patent itself, each published before the patent's priority date of January 8, 2007. An invention is considered obvious if the differences between the claimed invention and the prior art are such that the subject matter as a whole would have been obvious, at the time the invention was made, to a person having ordinary skill in the art (POSITA).

The core inventive concept of the '484 patent is to mask processing latency in an automatic dialog system by playing "transitional messages"—such as paralinguistic sounds ("um," "uh") or short phrases ("let's see...")—to make the delay feel more natural to the user, as opposed to using silence or non-human sounds like music or beeps.

The following combination of prior art references renders the independent claims (Claims 1, 3, and 6) of the '484 patent obvious.

Primary Obviousness Combination: US6546097B1 in view of US20050273338A1

A POSITA would have been motivated to combine the teachings of US Patent 6,546,097B1 (hereafter "'097") with the teachings of US Patent Application Publication 2005/0273338A1 (hereafter "'338") to arrive at the claimed invention. The '097 patent teaches the foundational concept of filling a user's wait time with audio, and the '338 publication teaches the specific type of natural, human-like audio content needed to make that filler less annoying and more conversational.

1. US Patent 6,546,097B1 (Rockwell, Pub. Date: April 8, 2003)

The '097 patent discloses an automatic call distribution (ACD) system that addresses the issue of callers waiting on hold. This waiting period is analogous to the processing latency in the '484 patent.

What '097 Teaches:
- Masking Latency: The patent explicitly teaches a solution for masking latency. It describes a "signal generator for providing a signal to a caller while the caller is on hold" ('097 Abstract). This directly corresponds to the '484 patent's concept of providing a message "while processing the communication."
- Providing Audio "Filler" Content: The '097 patent teaches that the signal provided to the user during the delay can be audio content, such as "music, advertisements, announcements, or customized messages" ('097 Abstract). This establishes the general principle of using audio to fill a processing delay in an automated communication system.

The '097 patent provides the basic framework of an automated system that fills a processing delay with audible content for the user. However, it does not specify the use of paralinguistic or natural hesitation sounds.

2. US Publication 2005/0273338A1 (IBM, Pub. Date: Dec. 8, 2005)

The '338 publication teaches a method for making synthesized speech from a Text-to-Speech (TTS) system sound more natural and less robotic by incorporating human-like non-word sounds.

What '338 Teaches:
- Synthesizing Paralinguistic Events: The '338 publication explicitly discloses generating "paralinguistic phenomena... in a text-to-speech (TTS) synthesis system" ('338 Abstract). This directly teaches the use of a "speech synthesis system" as required by the '484 claims.
- Specific Filler Content: The publication identifies the exact types of "transitional messages" claimed in the '484 patent, namely "filled pauses (e.g., 'um', 'uh'), hesitations, laughter, etc." ('338, ¶). Claim 11 of the '484 patent lists similar events, such as "a cough, a breath, an utterance 'uh,' an utterance 'um,' and/or an utterance 'hmmm.'"
- Motivation for Use: The stated purpose of generating these sounds is to solve the problem of unnatural-sounding automated systems. The publication notes that without these events, the speech output from conventional systems "sounds unnatural and robotic" ('338, ¶).

3. Motivation to Combine '097 and '338

A person of ordinary skill in the art developing interactive voice systems prior to 2007 would have been well aware of the user frustration caused by latency. The '484 patent itself acknowledges in its background that conventional latency-masking techniques, like playing "earcons" (e.g., music), can be "annoying or unnatural."

The motivation to combine the teachings of '097 and '338 would have been to improve the user experience during processing delays.

- A POSITA would start with the established practice taught by '097: fill the latency with audio.
- Recognizing the "annoying or unnatural" problem with the content suggested by '097 (music, ads), the POSITA would seek a better, more natural-sounding type of filler content.
- The '338 publication provides a direct and explicit solution. It teaches not only the desirability of paralinguistic sounds like "um" and "uh" but also the technical means to create them within a speech synthesis system, for the express purpose of making an automated system sound more natural and human-like.

It would have been an obvious step to replace the unnatural "announcements" or "music" from the '097 system with the natural-sounding, synthesized "paralinguistic events" from the '338 system. This combination would be aimed at the predictable goal of masking latency in a manner that is less irritating and more closely mimics human conversational patterns, thereby directly arriving at the invention claimed in the '484 patent.

4. Mapping the Combination to Independent Claims

- Claim 1 (System) & Claim 6 (Method): The combination teaches a system/method that receives a communication ('097), processes it while the user waits ('097), and masks the resulting latency ('097). It further teaches modifying this system to provide transitional messages comprising paralinguistic events ("um," "uh") or phrases ('338), generated using a speech synthesis system ('338), until the final response is ready. The "signal generator" of '097 corresponds to the claimed "filler generator," and playing a plurality of short sounds such as "um" during a continued delay would have been an obvious implementation.
- Claim 3 (Apparatus): This claim adds that the transitional message is randomly selected from a database. This feature is not patentably distinct: for a system designer aiming to create a more natural experience and avoid predictable, repetitive sounds, randomly selecting from a small set of filler sounds (e.g., "uh," "um," "hmmm") would have been a well-known and obvious design choice to enhance variety.
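To make the claim mapping concrete, the following is a minimal, hypothetical Python sketch of the combined system as characterized above: a back-end task processes the communication while a loop plays randomly selected transitional messages until the response is ready. It is not drawn from '097, '338, or the '484 specification. The names FILLER_DB, process_communication, play, and respond_with_fillers, along with the timing parameters, are illustrative assumptions, and play() only logs text where a real system would hand it to a speech synthesizer.

```python
import asyncio
import random

# Hypothetical "database" of transitional messages. The entries echo the
# examples quoted from '338 and '484 Claim 11; the list itself is illustrative.
FILLER_DB = ["uh", "um", "hmmm", "let's see..."]


async def process_communication(query: str, latency_s: float) -> str:
    """Stand-in for the slow back-end step (recognition, lookup, etc.)."""
    await asyncio.sleep(latency_s)
    return f"Response to: {query}"


def play(text: str, log: list[str]) -> None:
    """Placeholder for speech synthesis/playback; here the text is just logged."""
    log.append(text)


async def respond_with_fillers(query: str, latency_s: float,
                               filler_interval_s: float,
                               rng: random.Random) -> list[str]:
    log: list[str] = []
    # Start processing in the background (the "processing" step).
    task = asyncio.create_task(process_communication(query, latency_s))
    # While processing is still running, play a randomly selected
    # transitional message, then wait up to filler_interval_s for the result.
    while not task.done():
        play(rng.choice(FILLER_DB), log)
        await asyncio.wait({task}, timeout=filler_interval_s)
    # Processing finished: play the final response.
    play(task.result(), log)
    return log


if __name__ == "__main__":
    output = asyncio.run(
        respond_with_fillers("weather in Boston", 0.35, 0.1, random.Random(0))
    )
    print(output)
```

Run as a script, this prints a few fillers followed by the response; the exact number of fillers depends on event-loop timing. In this sketch, the random selection recited in Claim 3 amounts to a single `rng.choice` call over the list, which illustrates why the analysis treats it as a routine design choice.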