Obviousness — US Patent 10755699

To analyze the obviousness of US patent 10,755,699 under 35 U.S.C. § 103, we will consider the independent claims (1, 12, 22, and 29) and the prior art references cited or incorporated by reference in the patent's description. The analysis identifies combinations of prior art that would render the claims obvious and explains the motivation for a person having ordinary skill in the art (PHOSITA) to combine them.

Identified Prior Art References from US10755699's Description:

U.S. Pat. No. 7,634,409 (filed Aug. 31, 2006): Entitled “Dynamic Speech Sharpening,” this patent is cited for the speech recognition engine 110 interpreting utterances using phonetic dictation to recognize a phoneme stream. [cite: US10755699, Description, The speech recognition engine 110 may interpret the utterance using techniques of phonetic dictation to recognize a phoneme stream, as described in U.S. patent application Ser. No. 11/513,269, entitled “Dynamic Speech Sharpening,” filed Aug. 31, 2006, which issued as U.S. Pat. No. 7,634,409 on Dec.]
"Enhancing the VUE™ (Voce-User-Experience) [sic] Through Conversational Speech" by Tom Freeman and Larry Baldwin: This publication is explicitly incorporated by reference and is cited for describing how modules of the conversational speech engine (free form voice search module 245, noise tolerance module 250, and context determination process 255) communicate with a voice search engine 225, including context domain agents 230 and vocabularies 235, to aid in interpreting utterances and generating responses. [cite: US10755699, Description, modules 245 - 255 may communicate with a voice search engine 225 that includes one or more context domain agents 230 and/or one or more vocabularies 235 to aid in interpreting utterances and generating responses, as described in “Enhancing the VUE™ (Voce-User-Experience) Through Conversational Speech,” by Tom Freeman and Larry Baldwin, which is herein incorporated by reference in its entirety.] Notably, Tom Freeman and Larry Baldwin are also named inventors of US10755699.
U.S. Pat. No. 7,640,160 (filed Aug. 5, 2005): Entitled “Systems and Methods for Responding to Natural Language Speech Utterance,” this patent is cited for the context determination process 255, where one or more context domain agents compete to determine the most appropriate domain for a given utterance. [cite: US10755699, Description, The one or more contexts may be determined by having one or more context domain agents compete to determine a most appropriate domain for a given utterance, as described in U.S. patent application Ser. No. 11/197,504, entitled “Systems and Methods for Responding to Natural Language Speech Utterance,” filed Aug. 5, 2005, which issued as U.S. Pat. No. 7,640,160 on Dec. 29, 2009]
U.S. Pat. No. 7,949,529 (filed Aug. 29, 2005): Entitled “Mobile Systems and Methods of Supporting Natural Language Human-Machine Interactions,” this patent is also cited for the context determination process 255 and its use of competing context domain agents. [cite: US10755699, Description, U.S. patent application Ser. No. 11/212,693, entitled “Mobile Systems and Methods of Supporting Natural Language Human-Machine Interactions,” filed Aug. 29, 2005, which issued as U.S. Pat. No. 7,949,529 on May 24, 2011]

These references, particularly the Freeman and Baldwin paper and U.S. Pat. Nos. 7,640,160 and 7,949,529, are key because they describe core components and concepts of US10755699, largely directed at improving natural language human-machine interaction.

Obviousness Analysis under 35 U.S.C. § 103

A PHOSITA, seeking to advance voice user interface technology, would be motivated to combine the known elements of speech recognition with natural language understanding and context determination systems to create a more intuitive and cooperative conversational experience. The patent itself identifies the problem in its background, noting that "existing Human-to-Machine interfaces fail to provide the same level of intuitive interaction" and "force users to dumb down their requests." The goal of making human-machine interaction analogous to human-to-human conversation is a recognized industry need.

Combination of Prior Art:

A compelling combination of prior art for an obviousness challenge would include:

U.S. Pat. No. 7,634,409: To provide the foundational speech recognition capabilities, including receiving human utterances and generating preliminary interpretations. This is a standard front-end for any voice-controlled system.
U.S. Pat. No. 7,640,160 and/or U.S. Pat. No. 7,949,529: To provide the context determination capabilities, specifically the use of competing context domain agents to infer the most appropriate domain and intended operations based on utterances and previous requests. [cite: US10755699, Description, The one or more contexts may be determined by having one or more context domain agents compete to determine a most appropriate domain for a given utterance, as described in U.S. patent application Ser. No. 11/197,504, entitled “Systems and Methods for Responding to Natural Language Speech Utterance,” filed Aug. 5, 2005, which issued as U.S. Pat. No. 7,640,160 on Dec. 29, 2009 and U.S. patent application Ser. No. 11/212,693, entitled “Mobile Systems and Methods of Supporting Natural Language Human-Machine Interactions,” filed Aug. 29, 2005, which issued as U.S. Pat. No. 7,949,529 on May 24, 2011] These patents teach using conversational history as context, which overlaps with the concept of short-term shared knowledge.
"Enhancing the VUE™ (Voce-User-Experience) Through Conversational Speech" by Tom Freeman and Larry Baldwin: This publication, co-authored by two of the '699 inventors and incorporated by reference, serves as a strong roadmap for a PHOSITA. It describes the conversational speech engine 215 and its core modules (free form voice search 245, noise tolerance 250, context determination 255), along with context domain agents 230 and vocabularies 235. [cite: US10755699, Description, modules 245 - 255 may communicate with a voice search engine 225 that includes one or more context domain agents 230 and/or one or more vocabularies 235 to aid in interpreting utterances and generating responses, as described in “Enhancing the VUE™ (Voce-User-Experience) Through Conversational Speech,” by Tom Freeman and Larry Baldwin, which is herein incorporated by reference in its entirety.] Crucially, the '699 patent states that this conversational speech engine "may generate an adaptive conversational response to one or more requests, where the requests may depend on unspoken assumptions, incomplete information, context established by previous utterances, user profiles, historical profiles, environmental profiles, or other information." [cite: US10755699, Description, Conversational speech engine 215 may generate an adaptive conversational response to one or more requests, where the requests may depend on unspoken assumptions, incomplete information, context established by previous utterances, user profiles, historical profiles, environmental profiles, or other information.] To the extent the Freeman and Baldwin paper itself discloses this behavior (the quoted language comes from the '699 description, not from the paper), it teaches the use of both short-term (previous utterances, context) and long-term (user, historical, and environmental profiles) knowledge to inform adaptive responses and infer intent.

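The competing-agent idea the analysis attributes to the '160 and '529 patents can be pictured with a minimal Python sketch. It is not the method of any cited reference: the agent class, the keyword vocabularies, the overlap-based score, and the fixed bonus for the domain established by the previous turn (standing in for short-term conversational context) are all hypothetical choices made for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ContextDomainAgent:
    """Hypothetical agent that bids on how well an utterance fits its domain."""
    domain: str
    keywords: frozenset

    def bid(self, words, previous_domain=None):
        # Base score: fraction of the utterance's words found in this domain's vocabulary.
        overlap = sum(1 for w in words if w in self.keywords)
        score = overlap / max(len(words), 1)
        # Assumed context bonus: favor the domain established by the previous turn.
        if self.domain == previous_domain:
            score += 0.25
        return score


def select_domain(utterance, agents, previous_domain=None):
    """Every agent bids on the utterance; the highest bid wins (ties go to the first agent)."""
    words = utterance.lower().split()
    bids = {agent.domain: agent.bid(words, previous_domain) for agent in agents}
    winner = max(bids, key=bids.get)
    return winner, bids


# Hypothetical agents and vocabularies, for illustration only.
AGENTS = [
    ContextDomainAgent("music", frozenset({"play", "song", "album", "artist"})),
    ContextDomainAgent("navigation", frozenset({"route", "directions", "traffic", "drive"})),
]

if __name__ == "__main__":
    print(select_domain("take me there", AGENTS)[0])                # music (tie at 0.0, first agent wins)
    print(select_domain("take me there", AGENTS, "navigation")[0])  # navigation
```

With no prior context, "take me there" matches neither vocabulary, so the bids tie at zero; once the previous turn has established navigation, the context bonus resolves the ambiguity. That is the sense in which conversational history supplies short-term knowledge in this sketch.
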
Motivation to Combine:

The motivation for a PHOSITA to combine these references is articulated directly by the problem statement in US10755699's own background: overcoming the limitations of existing voice user interfaces that are not intuitive, cooperative, or capable of natural, free-form conversation. The title of the Freeman and Baldwin paper, "Enhancing the VUE™ (Voce-User-Experience) Through Conversational Speech," points to the same motivation: improving the user experience by making speech interaction more conversational.

A PHOSITA would logically integrate:

An existing, robust speech recognition system (such as that of '409) to recognize utterances accurately.
Sophisticated context determination methods (such as those of '160 and '529) to understand the meaning of utterances within a conversation.
The principles laid out in the Freeman and Baldwin paper to build a "conversational speech engine" that leverages "shared knowledge" (including short-term conversation history and long-term user/historical profiles) to infer user intent, handle free-form speech, and generate "adaptive conversational responses." The paper, co-authored by two of the named inventors, suggests that such integration was a known and desired path to advance the technology.

Analysis of Independent Claims:

Independent Claims 1 (Method) and 12 (Device):
These claims cover receiving an utterance, generating interpretations, determining a conversation type, building hypotheses based on short-term and long-term shared knowledge (leveraging conversation type), ranking hypotheses, and generating an adaptive conversational response.

Receiving an utterance and generating preliminary interpretations: Taught by '409. [cite: US10755699, Description, The utterance component of input 105 may be processed by a speech recognition engine 110 (which may alternatively be referred to herein as Automatic Speech Recognizer 110 , or as shown in FIG. 1 , ASR 110 ) to generate one or more preliminary interpretations of the utterance.]
Context determination and intent inference based on prior interactions (shared knowledge): Taught by '160 and '529, which describe inferring context from previous utterances, and by the Freeman and Baldwin paper, which describes the conversational speech engine using "context established by previous utterances, user profiles, historical profiles" (i.e., short-term and long-term shared knowledge) to generate adaptive responses. [cite: US10755699, Description, Conversational speech engine 215 may generate an adaptive conversational response to one or more requests, where the requests may depend on unspoken assumptions, incomplete information, context established by previous utterances, user profiles, historical profiles, environmental profiles, or other information.] The patent's explicit definitions of short-term and long-term shared knowledge, and its mechanisms for accumulating them, may be argued to be novel, but the underlying concepts of using conversation history and user profiles were known goals in the art, especially given the direction provided by the Freeman and Baldwin paper.
Building intelligent hypotheses and ranking them by certainty: The '699 patent states that "Context domain agents 230 may also be self-aware, assigning degrees of certainty to one or more generated hypotheses." [cite: US10755699, Description, Context domain agents 230 may also be self-aware, assigning degrees of certainty to one or more generated hypotheses] The competing context domain agents themselves are taught by '160 and '529. Classifying their hypotheses by degree of certainty would be an obvious refinement for a PHOSITA trying to make a more robust conversational system, especially given the goals outlined in the Freeman and Baldwin paper.
Determining a conversation type by considering conversational goals, participant roles, and allocation of information: While this specific tripartite classification is described in US10755699 (within "Intelligent Hypothesis Builder 310" of the cooperative conversational model), the general concept of understanding conversation dynamics to improve interaction is inherent in building "conversational speech" systems. A PHOSITA, building a system based on the Freeman and Baldwin paper, would naturally consider such factors to enhance the "VUE."
Generating adaptive responses: The conversational speech engine 215, described in the Freeman and Baldwin paper, explicitly generates "adaptive conversational responses." [cite: US10755699, Description, Conversational speech engine 215 may generate an adaptive conversational response to one or more requests, where the requests may depend on unspoken assumptions, incomplete information, context established by previous utterances, user profiles, historical profiles, environmental profiles, or other information.]
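
The hypothesis-building and certainty-ranking elements of claims 1 and 12 can likewise be sketched. In this hypothetical Python example, each candidate intent carries a recognizer confidence, short-term knowledge is a set of intents from recent turns, and long-term knowledge is a per-user count of past intents. The weights, thresholds, and "high/medium/low" labels are assumptions for illustration, not values taken from the '699 patent or the cited art.

```python
from dataclasses import dataclass


@dataclass
class Hypothesis:
    intent: str
    asr_confidence: float  # assumed recognizer score in [0, 1]


def rank_hypotheses(hypotheses, recent_intents, profile_counts,
                    w_asr=0.6, w_short=0.25, w_long=0.15):
    """Score each hypothesis from recognizer confidence plus short-term
    (recent turns) and long-term (user history) evidence, best first."""
    total = sum(profile_counts.values()) or 1  # avoid dividing by zero
    scored = []
    for h in hypotheses:
        short_term = 1.0 if h.intent in recent_intents else 0.0
        long_term = profile_counts.get(h.intent, 0) / total
        score = w_asr * h.asr_confidence + w_short * short_term + w_long * long_term
        scored.append((score, h))
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return scored


def certainty_label(scored, high=0.7, margin=0.15):
    """Label the top hypothesis by its score and its lead over the runner-up."""
    if not scored:
        return "none"
    top = scored[0][0]
    runner_up = scored[1][0] if len(scored) > 1 else 0.0
    if top >= high and top - runner_up >= margin:
        return "high"
    if top - runner_up >= margin:
        return "medium"
    return "low"


if __name__ == "__main__":
    hyps = [Hypothesis("call_contact", 0.60), Hypothesis("play_artist", 0.65)]
    ranked = rank_hypotheses(hyps, {"call_contact"}, {"call_contact": 8, "play_artist": 2})
    print(ranked[0][1].intent, certainty_label(ranked))  # call_contact high
```

In the example, recent context and the user's history lift the calling intent above a music intent that the recognizer scored slightly higher. Stripped of that knowledge, the two candidates are nearly tied and the top hypothesis is labeled low certainty.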

Independent Claims 22 (Method) and 29 (Device):
These claims further emphasize the "cooperative" nature, specifically by generating responses designed to "guide the user to reply in a manner favorable for recognition."

Guiding user reply for easier recognition: US10755699 states, "The intelligent responses may frame responses to influence a user reply utterance for easy recognition." [cite: US10755699, Description, Adaptive Response Builder 315] This is presented as a feature of the Adaptive Response Builder 315, which is part of the overall "cooperative conversational model 300" that builds upon the conversational speech engine elements (free form voice search, noise tolerance, context determination) taught by the Freeman and Baldwin paper. A PHOSITA attempting to "enhance the VUE" and make speech interfaces more "conversational" would find it obvious to design responses that steer the conversation for improved recognition, as this directly addresses the practical challenges of speech system accuracy and user frustration highlighted in the background.
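
One way a response could be framed to steer the reply toward easy recognition is to turn an uncertain result into a question with a small, closed set of expected answers, which the system can then match against instead of open-ended speech. The Python sketch below illustrates that idea under stated assumptions: the function names, prompt wording, three-option limit, and substring matching are hypothetical and are not drawn from the '699 patent or the cited art.

```python
def frame_clarifying_prompt(candidates, max_options=3):
    """Turn ranked candidates into a question whose expected answers form a small closed set."""
    options = list(candidates)[:max_options]
    if not options:
        # Nothing to offer: fall back to an open request.
        return "Sorry, could you say that again?", []
    if len(options) == 1:
        # A single candidate becomes a yes/no confirmation.
        return f"Did you mean {options[0]}? Please say yes or no.", ["yes", "no"]
    if len(options) == 2:
        listed = f"{options[0]} or {options[1]}"
    else:
        listed = ", ".join(options[:-1]) + f", or {options[-1]}"
    return f"Did you mean {listed}? You can just say the name.", [o.lower() for o in options]


def match_reply(reply, expected):
    """Naive closed-set matching: return the first expected answer found in the reply."""
    text = reply.lower()
    for answer in expected:
        if answer in text:
            return answer
    return None


if __name__ == "__main__":
    prompt, expected = frame_clarifying_prompt(["Jane Smith", "Jane's Diner"])
    print(prompt)  # Did you mean Jane Smith or Jane's Diner? You can just say the name.
    print(match_reply("Jane's Diner, please", expected))  # jane's diner
```

The design choice being illustrated is that a narrow set of expected replies is easier to recognize reliably than an unconstrained one; a practical system would likely use a recognition grammar or a proper matcher rather than substring search.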

Conclusion:

The combination of U.S. Pat. No. 7,634,409, U.S. Pat. No. 7,640,160, U.S. Pat. No. 7,949,529, and especially "Enhancing the VUE™ (Voce-User-Experience) Through Conversational Speech" by Tom Freeman and Larry Baldwin would render claims 1, 12, 22, and 29 of US10755699 obvious to a PHOSITA. The motivation to combine these references stems from the well-understood problems in existing voice user interfaces and the explicit goal, articulated by the inventors themselves in their prior publication, of achieving a more natural, intuitive, and cooperative human-machine conversational experience. The individual components for speech recognition, context determination, and using historical/profile data for adaptive responses are present in the prior art, and their integration to achieve a "conversational" feel and improve recognition (even by guiding user input) would be an obvious step for a PHOSITA.