Obviousness — US Patent 8073681

Obviousness Analysis under 35 U.S.C. § 103 for US Patent 8073681

To assess the obviousness of US Patent 8073681, titled "System and method for a cooperative conversational voice user interface," under 35 U.S.C. § 103, we must consider whether the differences between the claimed invention and the prior art, at the time the invention was made (priority date: October 16, 2006), would have been obvious to a person having ordinary skill in the art (POSITA).

The '681 patent describes a sophisticated voice user interface aimed at achieving "cooperative conversations" analogous to human-to-human interaction, overcoming the limitations of earlier "Command and Control" systems. Key aspects include understanding free-form speech, tolerating noise and imperfect utterances, determining context through competing agents, building hypotheses based on shared knowledge (short-term and long-term), and generating adaptive, conversational responses that guide the user and correct misrecognitions.

The patent itself identifies several relevant prior art references:

US Patent 7,634,409 (formerly Ser. No. 11/513,269): "Dynamic Speech Sharpening," filed August 31, 2006, issued December 15, 2009. This reference describes techniques for a speech recognition engine (ASR) to interpret utterances using phonetic dictation to recognize a phoneme stream.
"Enhancing the VUE™ (Voice-User-Experience) Through Conversational Speech" by Tom Freeman and Larry Baldwin: This publication is directly cited in the '681 patent in relation to the conversational speech engine, context domain agents, and vocabularies. Notably, Tom Freeman and Larry Baldwin are also inventors of US 8073681.
US Patent 7,640,160 (formerly Ser. No. 11/197,504): "Systems and Methods for Responding to Natural Language Speech Utterance," filed August 5, 2005, issued December 29, 2009. This patent explicitly teaches context determination using competing context domain agents and inferring intended operations/context based on previous utterances, along with updating short-term and long-term shared knowledge.
US Patent Application Ser. No. 11/212,693: "Mobile Systems and Methods of Supporting Natural Language Human-Machine Interactions," filed August (incomplete citation in the provided text). Although the citation is incomplete, the title indicates relevance to natural language human-machine interaction, particularly in mobile contexts.

Combination of Prior Art and Motivation for Obviousness

A POSITA in speech recognition, natural language processing (NLP), and human-computer interaction, at the time of the invention (2006), would have been motivated to combine the teachings of US 7,634,409, US 7,640,160, and the "Enhancing the VUE™" publication to arrive at the claimed invention of US 8073681.

The primary motivation stems directly from the problems articulated in the Background of the Invention section of US 8073681 itself: existing Human-to-Machine interfaces failed to provide intuitive, cooperative interaction; speech interfaces required significant user learning; users were forced to "dumb down" requests; and there was no genuine dialogue capability. A POSITA would seek to address these known deficiencies to create a more natural and effective voice user interface.

The sections below examine how the combination would render the independent claims (Claims 1, 10, and 19) obvious:

1. Foundational Speech Recognition

US 7,634,409 (Dynamic Speech Sharpening) serves as the foundation for the speech recognition engine (ASR) in US 8073681. This patent teaches interpreting utterances using "phonetic dictation to recognize a phoneme stream." A POSITA would readily integrate a robust ASR system like that described in '409 into any advanced voice user interface to convert spoken input into preliminary interpretations.

2. Contextual Understanding and Shared Knowledge

US 7,640,160 (Systems and Methods for Responding to Natural Language Speech Utterance) provides core elements for contextual understanding. It explicitly teaches that "one or more context domain agents compete to determine a most appropriate domain for a given utterance." Furthermore, it discusses inferring "intended operations and/or context based on previous utterances and/or requests" and updating "short-term and long-term shared knowledge."
- This directly addresses the '681 patent's context determination process, including the use of competing context domain agents, inferring intent from previous utterances, and the concept of accumulating "short-term and long-term shared knowledge." The motivation for a POSITA to incorporate these features would be to allow the system to maintain conversational memory, avoid repeating errors, and establish meaning within an ongoing dialogue, thereby making the interaction more natural and less prone to misinterpretations.
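The competition among context domain agents described above can be illustrated with a minimal sketch. It is not taken from the '160 or '681 patents: the class names, the keyword-overlap scoring, and the 0.25 bonus for the most recent domain are all assumptions chosen only to show how short-term shared knowledge could tip the outcome for an otherwise ambiguous utterance.

```python
from dataclasses import dataclass, field


@dataclass
class DomainAgent:
    """Hypothetical context domain agent; scoring by keyword overlap is an assumption."""
    name: str
    vocabulary: set

    def score(self, words, recent_domains):
        # Base score: fraction of the utterance's words found in this agent's vocabulary.
        overlap = sum(1 for w in words if w in self.vocabulary)
        base = overlap / len(words) if words else 0.0
        # Assumed bonus: favor the domain of the immediately preceding utterance.
        bonus = 0.25 if recent_domains and recent_domains[-1] == self.name else 0.0
        return base + bonus


@dataclass
class SharedKnowledge:
    """Hypothetical store for short-term (this dialogue) and long-term (cumulative) knowledge."""
    short_term: list = field(default_factory=list)
    long_term: dict = field(default_factory=dict)

    def update(self, domain):
        self.short_term.append(domain)
        self.long_term[domain] = self.long_term.get(domain, 0) + 1


def determine_context(utterance, agents, knowledge):
    """Let every agent score the utterance; the highest score wins the context."""
    words = utterance.lower().split()
    winner = max(agents, key=lambda agent: agent.score(words, knowledge.short_term))
    knowledge.update(winner.name)
    return winner.name


agents = [
    DomainAgent("music", {"play", "song", "album", "artist"}),
    DomainAgent("weather", {"weather", "rain", "forecast", "tomorrow"}),
]
knowledge = SharedKnowledge()
first = determine_context("play the new album", agents, knowledge)
second = determine_context("the next one", agents, knowledge)
```

In this sketch the second utterance contains no domain vocabulary at all, so only the short-term history decides it in favor of the music domain.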

3. Free-Form Voice Search and Enhanced Conversationality

"Enhancing the VUE™ (Voice-User-Experience) Through Conversational Speech" by Freeman and Baldwin: This publication, explicitly referenced in US 8073681, relates directly to the conversational speech engine, context domain agents, and vocabularies used for interpreting utterances and generating responses. Given its title and its authorship by two inventors of the '681 patent, it would plausibly disclose or strongly suggest techniques for handling "free form human utterances," including specialized jargon, slang, variations in word order, and verbalized pauses or stuttered speech, all of which are central to the '681 patent's free-form voice search module.
- A POSITA, aware of the need to move beyond restrictive command-and-control interfaces (as highlighted in '681), would be motivated to integrate these advanced NLP techniques (from the VUE publication and the general art) with the robust ASR ('409) and contextual framework ('160). The goal is to allow users to express themselves in natural, day-to-day language.
- The concept of inferring requests from "incomplete or ambiguous requests" or "contradictory or otherwise inaccurate information" is a natural extension of a context-aware NLU system. Given that '160 teaches inferring intent, a POSITA would readily apply known NLU techniques to handle these imperfections in speech, such as using heuristics like "a last criterion is most likely to be correct" for mid-utterance changes.
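The "last criterion is most likely to be correct" heuristic can be sketched as follows. This is an illustrative assumption about one way to apply the heuristic, not the patent's implementation: the function name and the (slot, value) representation of extracted criteria are hypothetical.

```python
def resolve_criteria(mentions):
    """Keep only the last value mentioned for each slot.

    `mentions` is a list of (slot, value) pairs in the order they were spoken;
    this representation is an assumption made for the sketch.
    """
    resolved = {}
    for slot, value in mentions:
        # A later mention of the same slot overwrites the earlier one.
        resolved[slot] = value
    return resolved


# "Find Italian ... no, Thai restaurants in Seattle"
mentions = [("cuisine", "italian"), ("cuisine", "thai"), ("city", "seattle")]
criteria = resolve_criteria(mentions)
```

Here the mid-utterance correction leaves the request with a single cuisine, the one spoken last, while the unrelated city criterion is untouched.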

4. Robustness to Noise and Adaptive Responses

The noise tolerance module of US 8073681, which discards meaningless words and noise and filters environmental and non-human noise (including from multiple microphones), relies on techniques that were well known in robust speech recognition. A POSITA would understand that combining these standard noise reduction methods with the context determination process (from '160) would enhance accuracy by filtering out words that do not fit into the identified context, thus reducing confusion. The idea of defining "performance benchmarks based on human criteria" for such modules, while a specific implementation detail, represents an obvious goal for improving user experience in challenging acoustic environments.
The adaptive response building in US 8073681, which generates "syntactically, grammatically, and contextually sensitive 'intelligent responses'" that adapt to a user's speaking manner, frame responses to influence replies, and correct misrecognitions, would also be obvious. The "Enhancing the VUE" publication, focused on improving the "Voice-User-Experience" through conversational speech, would strongly motivate the development of such adaptive and natural-sounding responses. A POSITA would aim to overcome the "incongruous input and output" problem, where input is conversational but output is "computerese," by using techniques like statistically rating and randomizing responses for "natural variation" and modeling misrecognition responses as "clarifications, rather than errors."
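The idea of statistically rating and randomizing responses for "natural variation" can be sketched with a weighted random choice. The ratings, the response texts, and the function name below are hypothetical; the sketch only assumes that each candidate response carries a non-negative rating and that higher-rated responses should be chosen more often.

```python
import random


def pick_response(candidates, rng=None):
    """Choose one response at random, weighted by its rating.

    `candidates` is a list of (text, rating) pairs; a rating of 0 means the
    response is never selected. Both the format and the ratings are assumptions.
    """
    rng = rng or random.Random()
    texts = [text for text, _ in candidates]
    ratings = [rating for _, rating in candidates]
    return rng.choices(texts, weights=ratings, k=1)[0]


# Misrecognition is phrased as a clarification rather than an error message.
clarifications = [
    ("Did you mean the restaurant on Main Street?", 3),
    ("Just to check, was that Main Street?", 1),
    ("ERROR: input not recognized.", 0),
]
rng = random.Random(7)  # seeded only so the sketch is repeatable
samples = [pick_response(clarifications, rng) for _ in range(200)]
```

Repeated calls vary the wording between the two conversational clarifications, while the zero-rated "computerese" reply is never produced.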

Conclusion

Based on the foregoing, a POSITA, motivated by the clear need for more intuitive and cooperative human-to-machine voice interfaces, would have been logically driven to combine the teachings of US Patent 7,634,409, US Patent 7,640,160, and the "Enhancing the VUE™ (Voice-User-Experience) Through Conversational Speech" publication. This combination would encompass all the essential elements of the independent claims of US 8073681: a robust speech recognition front-end, a system for determining and maintaining conversational context using shared knowledge and competing domain agents, advanced natural language processing for understanding free-form and imperfect speech, and methods for generating adaptive, human-like responses that facilitate cooperative dialogue and correct conversational missteps. Therefore, the claimed invention in US Patent 8073681 would be rendered obvious under 35 U.S.C. § 103.