Obviousness — US Patent 8515765

The obviousness of US patent 8515765 under 35 U.S.C. § 103 can be analyzed by considering combinations of prior art references explicitly cited within the patent itself. A person having ordinary skill in the art (PHOSITA) in the field of speech recognition, natural language processing, and human-computer interaction would likely possess a relevant technical degree and practical experience in developing such systems. The PHOSITA would be aware of the challenges in creating intuitive voice user interfaces and the desire for more natural human-machine interactions.

The core of US8515765 describes a "cooperative conversational voice user interface" that understands free-form human utterances, leverages short-term and long-term shared knowledge to generate hypotheses about user intent, ranks these hypotheses by certainty, and produces adaptive responses. Key features include free-form voice search, noise tolerance, and context determination.
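
The hypothesis-ranking idea summarized above can be illustrated with a minimal sketch. This is not the patent's implementation: the names `Hypothesis`, `rank_hypotheses`, and `respond`, the score boosts, and the 0.6 threshold are all hypothetical values chosen only to show how short-term and long-term shared knowledge might adjust certainty and shape the response.

```python
from dataclasses import dataclass


@dataclass
class Hypothesis:
    intent: str
    certainty: float  # illustrative score in [0.0, 1.0]


def rank_hypotheses(candidates, short_term, long_term):
    """Adjust each candidate's base score using shared knowledge, then sort.

    candidates: dict mapping intent name -> base score (assumed input).
    short_term: intents seen in the current conversation.
    long_term:  intents matching the user's historical preferences.
    """
    ranked = []
    for intent, base in candidates.items():
        score = base
        if intent in short_term:
            score += 0.2  # hypothetical boost for conversational context
        if intent in long_term:
            score += 0.1  # hypothetical boost for user history
        ranked.append(Hypothesis(intent, min(score, 1.0)))
    return sorted(ranked, key=lambda h: h.certainty, reverse=True)


def respond(ranked, threshold=0.6):
    """Adapt the response to how certain the top hypothesis is."""
    if not ranked:
        return "Sorry, I did not catch that."
    top = ranked[0]
    if top.certainty >= threshold:
        return f"OK: {top.intent}"
    return f"Did you mean {top.intent}?"
```

In this sketch a confident top hypothesis is acted on directly, while a weak one triggers a clarifying question, which is one simple way a response could be made "adaptive."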

Identified Prior Art References:

U.S. Pat. No. 7,634,409 (the '409 patent): Titled "Dynamic Speech Sharpening," this patent describes an enhanced system for speech interpretation. It receives user verbalizations and generates preliminary interpretations by identifying phonemes, which are then mapped to syllables or words using an acoustic grammar.
U.S. Pat. No. 7,640,160 (the '160 patent): Titled "Systems and Methods for Responding to Natural Language Speech Utterance," this patent teaches systems and methods for processing natural language speech and non-speech communications. It transcribes these inputs into textual messages and executes commands by applying context, prior information, domain knowledge, and user-specific profile data to determine context and present expected results.
U.S. Pat. No. 7,949,529 (the '529 patent): Titled "Mobile Systems and Methods of Supporting Natural Language Human-Machine Interactions," this patent is cited in US8515765 as relating to systems and methods for supporting natural language interactions, particularly in a mobile context. US8515765 cites it in describing how context domain agents compete to determine the most appropriate domain for an utterance. The full disclosure of the '529 patent was not available for this analysis, so its teachings are characterized here only as US8515765 describes them.
"Enhancing the VUE™ (Voice-User-Experience) Through Conversational Speech" by Tom Freeman and Larry Baldwin (the Freeman & Baldwin paper): This technical paper, published October 16, 2006, argues that the widespread adoption of speech technology depends on "Conversational Speech" models. It aims to define conversational speech and its role in human-machine interfaces (HMI), emphasizing that such models make interactions feel "normal" and eliminate the need for users to learn new HMIs.

Obviousness Analysis

Given the state of the art around the priority date of US8515765 (October 16, 2006), a PHOSITA would have been motivated to combine these references to arrive at the claimed invention.

Combination 1: The '409 patent + the '160 patent + the Freeman & Baldwin paper

The Problem Recognized by the Art: The Freeman & Baldwin paper clearly articulates the need for "Conversational Speech" models to make human-to-machine interactions intuitive and widely adopted. It states that existing HMIs fail to provide the same level of intuitive interaction as human-to-human conversation and that "widespread implementation of speech is dependent on Conversational Speech models like those described herein." This paper, co-authored by inventors of US8515765, provides a direct motivation for developing a cooperative conversational voice user interface.
Existing Components: The '409 patent teaches the fundamental speech recognition engine for generating preliminary interpretations from verbalizations. The '160 patent provides a robust framework for natural language understanding, using context, prior information, domain knowledge, and user profiles to interpret requests across multiple domains. US8515765 explicitly refers to these patents for its ASR component and its context determination process, respectively.
Motivation for Combination: A PHOSITA, aiming to address the need for a more natural "Conversational Speech" interface (as defined and motivated by Freeman & Baldwin), would find it obvious to combine an existing speech recognition system (like the one taught by the '409 patent) with an advanced natural language understanding and context management system (like the one taught by the '160 patent). The '409 patent handles the initial audio-to-text conversion, while the '160 patent enables the semantic understanding and contextual disambiguation crucial for a "conversational" experience. The Freeman & Baldwin paper provides the goal and high-level architectural guidance, making the combination a logical step to achieve a more natural and cooperative voice user interface.

Elements of US8515765 Rendered Obvious:

Cooperative conversational voice user interface: Directly envisioned and motivated by the Freeman & Baldwin paper.
Understanding free-form human utterances: The '160 patent explicitly describes handling "natural language speech utterance", and "free-form" is an obvious characteristic of "conversational speech" (Freeman & Baldwin).
Generating preliminary interpretations via speech recognition: Directly taught by the '409 patent and a prerequisite for any voice interface.
Context determination based on previous utterances, user profiles, and domain knowledge: Explicitly taught by the '160 patent, which emphasizes improving reliability through "user specific profile information" and "determining the context". US8515765 itself states that its context determination process uses competing context domain agents "as described in U.S. patent application Ser. No. 11/197,504" (which became the '160 patent).
Generating adaptive conversational responses: The '160 patent teaches "presenting the expected results" based on context. Making these responses "adaptive" to maintain a "conversational feel" (Freeman & Baldwin) would be an obvious refinement for a PHOSITA.
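
The notion of competing context domain agents mentioned in the list above can be sketched as follows. This is an assumption-laden toy, not the mechanism of the '160 patent or of US8515765: the keyword sets in `DOMAIN_KEYWORDS`, the scoring rule, and the 0.5 bonus for the previous domain are all hypothetical.

```python
import re

# Hypothetical domains and trigger words, for illustration only.
DOMAIN_KEYWORDS = {
    "music": {"play", "song", "album"},
    "navigation": {"route", "directions", "drive"},
    "weather": {"rain", "forecast", "temperature"},
}


def determine_context(utterance, previous_domain=None):
    """Let each domain 'agent' bid on the utterance; the highest bid wins.

    An agent's bid is the number of its keywords found in the utterance,
    plus a small bonus if it handled the previous utterance.
    Returns None when no agent produces a positive bid.
    """
    words = set(re.findall(r"[a-z']+", utterance.lower()))
    scores = {}
    for domain, keywords in DOMAIN_KEYWORDS.items():
        score = len(keywords & words)
        if domain == previous_domain:
            score += 0.5  # hypothetical bonus for conversational continuity
        scores[domain] = score
    best = max(scores, key=scores.get)
    return best if scores[best] > 0 else None
```

The continuity bonus shows how a follow-up such as "what about tomorrow" could stay in the weather domain even though it contains no weather keyword.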

Combination 2: Combination 1 + the '529 patent

Adding Mobile Context: The '529 patent, titled "Mobile Systems and Methods of Supporting Natural Language Human-Machine Interactions," directly addresses the application of natural language interaction in mobile environments.
Motivation for Combination: Given the increasing ubiquity of mobile devices at the time, a PHOSITA would have been motivated to extend the benefits of a cooperative conversational interface (achieved by combining the '409 patent, the '160 patent, and the Freeman & Baldwin paper) to mobile platforms. This would have been a straightforward application of known principles in a new but highly relevant computing environment. The '529 patent is directed to adapting such natural language systems for mobile use, which supports the obviousness of this integration.

Further Obvious Features and Motivations:

Many of the specific features elaborated in US8515765 would likewise have been obvious to a PHOSITA, including:

Free-form voice search enhancements: Features like tolerating jargon, slang, variations in word order, and verbalized pauses, handling compound requests, inferring intent from incomplete requests, recognizing alternative expressions, and managing imperfect speech (starts, stops, stutters, mid-utterance changes) all address recognized challenges in natural language understanding. A PHOSITA, armed with the goal of creating a "natural, intuitive, free-form manner of expression" as desired for conversational speech (Freeman & Baldwin), would have been motivated to incorporate known techniques to address these imperfections and enhance robustness. For example, using "models of human understanding" to infer intent from imperfect speech, such as relying on the "last criterion" or intonation, would have been an obvious approach informed by linguistic research.
Noise tolerance module: The ability to discard meaningless words or filter environmental/non-human noise (potentially using multi-microphone collation) is fundamental to improving the accuracy of any speech recognition system, including those described in the '409 patent. Techniques for noise reduction and multi-microphone processing were well-known in the signal processing and acoustics fields prior to the priority date. Defining performance benchmarks based on human criteria is an obvious way to measure success for a human-like interface.
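
As a rough illustration of the text-level side of the two features above (discarding meaningless words and tolerating stutters), the sketch below strips verbalized pauses and collapses immediately repeated words. The `FILLERS` set and the function name `clean_utterance` are hypothetical, and this is not how US8515765 or the '409 patent implements noise tolerance; acoustic noise filtering and multi-microphone processing are outside its scope.

```python
# Hypothetical list of verbalized pauses; a real system would need a
# richer, language-specific inventory.
FILLERS = {"um", "uh", "er", "hmm"}


def clean_utterance(text):
    """Drop filler words and collapse immediate word repetitions."""
    cleaned = []
    for token in text.lower().split():
        word = token.strip(",.?!")
        if not word or word in FILLERS:
            continue  # discard verbalized pauses and bare punctuation
        if cleaned and cleaned[-1] == word:
            continue  # treat an immediate repeat as a stutter
        cleaned.append(word)
    return " ".join(cleaned)
```

A known limitation of this simple rule is that it also removes legitimate repetitions (for example "that that"), so it should be read as a sketch of the idea rather than a usable filter.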

In summary, the concepts of robust speech recognition, natural language understanding, contextual awareness, user profiling, and the explicit goal of achieving a natural, conversational human-machine interface were well-established in the prior art. The combinations of the '409, '160, and '529 patents, guided by the vision articulated in the Freeman & Baldwin paper, would have made the system and method of US8515765 obvious to a PHOSITA.