Obviousness — US Patent 10510341

Obviousness Analysis for US Patent 10,510,341 under 35 U.S.C. § 103

This analysis evaluates whether the independent claims of US Patent 10,510,341 B1 ("the '341 patent") would have been obvious to a person having ordinary skill in the art (PHOSITA) as of the patent's priority date (October 16, 2006), given the cited prior art references. Obviousness under 35 U.S.C. § 103 requires showing that the differences between the claimed invention and the prior art are such that the subject matter as a whole would have been obvious to a PHOSITA. This typically involves identifying a combination of prior art references and a reason a PHOSITA would have combined them.

Core Inventive Elements of US 10,510,341

The '341 patent's independent claims (Claims 1, 12, and 19) generally describe a method, computer-readable medium, and system, respectively, for a cooperative conversational voice user interface. The key innovative aspects, particularly of Claim 1, include:

Generating a plurality of hypotheses about a user's intent.
Basing these hypotheses on shared knowledge, which explicitly includes both short-term knowledge (from the current conversation) and long-term knowledge (accumulated over time, such as user profiles/history).
Assigning a degree of certainty to each hypothesis.
Generating an adaptive response based on this degree of certainty.
The adaptive response being specifically configured to frame a domain for a subsequent utterance and influence that subsequent utterance towards one more likely to result in a completed request.

Proposed Combination of Prior Art for Obviousness

A PHOSITA, seeking to improve the robustness and cooperativeness of voice user interfaces, would have been motivated to combine elements from the following prior art references and common general knowledge:

US 7,640,160 B2 (Yankelovich et al.): "Systems and methods for responding to natural language speech utterance."
US 2005/0288924 A1 (Bennett): "Spoken language interface for enterprise applications."
Common general knowledge in the fields of Automatic Speech Recognition (ASR), Natural Language Understanding (NLU), and dialogue management systems, prevalent around the 2006 priority date.

Note: US 8,073,681 B2 and US 7,634,409 B2 are considered family members or related patents with the same inventors and priority claims, and thus are not considered prior art for an obviousness analysis against US 10,510,341.

Motivation to Combine

At the time of the invention (priority date 2006), a known challenge in voice user interfaces was the difficulty in accurately interpreting ambiguous or incomplete natural language utterances. Existing systems often failed on "average conversational missteps" and struggled to maintain context across a conversation, as acknowledged by the '341 patent itself.

A PHOSITA would be motivated to combine the context determination and disambiguation techniques from Yankelovich and Bennett with well-known principles of NLU and dialogue management to address these challenges and create a more intuitive and cooperative human-to-machine interface. The goal would be to build systems that better infer user intent, reduce recognition errors, and guide users towards successful task completion, mirroring human conversational behavior.

Specifically:

Improving Contextual Understanding: A PHOSITA would recognize that Yankelovich's competitive domain agents provide a strong mechanism for determining context, especially when a single utterance might have multiple meanings (e.g., "traffic" in music vs. navigation). However, to further refine this, they would look to incorporate more explicit forms of knowledge, such as those described by Bennett.
Leveraging Short-term and Long-term Knowledge: Bennett explicitly teaches using "dialog history" (akin to short-term knowledge) and "user preferences" (akin to long-term knowledge) to resolve ambiguities in spoken language. The '341 patent itself, when describing its context determination process, links directly to the methods in Yankelovich and states that the "winning agent may be responsible for... updating short-term and long-term shared knowledge." This indicates that the concept of accumulating different types of knowledge was already present, implicitly or explicitly, in the relevant art or in the context in which Yankelovich was understood. A PHOSITA would naturally combine these sources of knowledge to generate more robust interpretations.
Handling Uncertainty and Disambiguation: Faced with inherent ambiguities in natural language, a PHOSITA would know that ASR and NLU systems often generate multiple potential interpretations (N-best lists), each with an associated confidence score or probability. When the system's confidence is low or when multiple interpretations are plausible (e.g., a "deadlock" between Yankelovich's competing agents), the known solution in dialogue management is to seek clarification from the user. The '341 patent explicitly describes this scenario: "If there is a deadlock between context domain agents, an adaptive conversational response may prompt the user to assist in disambiguating between the deadlocked agents." This directly suggests an obvious solution to a problem existing in systems like Yankelovich.
Guiding User Interaction: It would be obvious to a PHOSITA that a clarifying question or a response that narrows down options (e.g., "Did you mean Portland, Maine or Portland, Oregon?") serves to "frame a domain" for the user's next response and "influence" the user to provide more structured or easily recognizable input. This is a fundamental principle in designing effective interactive systems to reduce errors and facilitate task completion. The '341 patent mentions that responses "may be modeled to illicit utterances from the user that may be more likely to result in a completed request" and conform to a "natural human tendency to 'parrot' what was just heard." This describes a desirable outcome of well-designed clarifying prompts.
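
The disambiguation logic described in the points above can be illustrated with a minimal sketch. It is not taken from Yankelovich, Bennett, or the '341 patent: the `Hypothesis` type, the `choose_response` function, the confidence values, and the 0.15 deadlock margin are all hypothetical, chosen only to show how an N-best list with confidence scores could lead to a clarifying prompt.

```python
from dataclasses import dataclass


@dataclass
class Hypothesis:
    """One candidate interpretation from a (hypothetical) N-best list."""
    domain: str        # e.g. "music" or "navigation"
    intent: str        # human-readable reading of the utterance
    confidence: float  # score in [0, 1]; higher means more certain


def choose_response(hypotheses, deadlock_margin=0.15):
    """Act on the top hypothesis, or ask a clarifying question on a near-tie.

    deadlock_margin is an assumed tuning value, not drawn from any reference.
    """
    if not hypotheses:
        raise ValueError("at least one hypothesis is required")
    ranked = sorted(hypotheses, key=lambda h: h.confidence, reverse=True)
    best = ranked[0]
    if len(ranked) > 1 and best.confidence - ranked[1].confidence < deadlock_margin:
        runner_up = ranked[1]
        # Naming both options frames the domain of the user's next utterance.
        return ("clarify", f"Did you mean {best.intent} or {runner_up.intent}?")
    return ("act", best.intent)


# Illustrative input: two competing readings of the word "traffic".
n_best = [
    Hypothesis("music", "the band Traffic", 0.48),
    Hypothesis("navigation", "traffic conditions", 0.45),
]
print(choose_response(n_best))
```

In this sketch a near-tie between the two top-ranked readings stands in for a "deadlock" between competing agents, and the prompt lists the candidate readings so that the user can answer by repeating one of them.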

Mapping Claim 1 Elements to the Combination

Each element of Claim 1 of US 10,510,341 is rendered obvious by the proposed combination as follows:

"receiving, by at least one processor, an utterance from a user;"
- Both Yankelovich and Bennett clearly teach receiving spoken utterances from a user in a voice user interface context.
"generating, by the at least one processor, a plurality of hypotheses about an intent of the user, based on the utterance and a shared knowledge, wherein the shared knowledge includes short-term knowledge associated with a current conversation and long-term knowledge accumulated over time;"
- Plurality of Hypotheses: Yankelovich's system, in which "one or more context domain agents compete to determine a most appropriate domain for a given utterance," inherently involves considering multiple potential interpretations (hypotheses) of the user's intent.
- Shared Knowledge (Short-term & Long-term): The '341 patent itself states that when a domain agent "wins" in a system like Yankelovich's, it "may be responsible for... updating short-term and long-term shared knowledge." Furthermore, Bennett explicitly uses "dialog history" (short-term knowledge) and "user preferences" (long-term knowledge) to improve disambiguation. A PHOSITA would be motivated to integrate and leverage these different types of accumulated knowledge (conversational history, user habits/preferences) to inform the generation and evaluation of multiple hypotheses about user intent.
"assigning, by the at least one processor, a degree of certainty to each hypothesis of the plurality of hypotheses;"
- Yankelovich's "competition" among agents implies a ranking and selection of a "most appropriate" domain, suggesting varying levels of confidence or certainty. A "deadlock" between agents explicitly represents a low degree of certainty. Furthermore, it was common general knowledge in ASR and NLU fields at the time to associate confidence scores, probabilities, or rankings with N-best hypotheses generated during speech and language processing. A PHOSITA would routinely assign a degree of certainty (e.g., derived from confidence scores or the outcome of agent competition) to each of the competing hypotheses.
"generating, by the at least one processor, an adaptive response to the user based on the degree of certainty of at least one hypothesis of the plurality of hypotheses, wherein the adaptive response is configured to: frame a domain for a subsequent utterance from the user; and influence the subsequent utterance from the user toward one or more utterances that are more likely to result in a completed request."
- Given the problem of ambiguity in systems like Yankelovich and Bennett, and the common knowledge of N-best lists and confidence scores, a PHOSITA would find it obvious to use the "degree of certainty" to determine the type of response. When certainty is low (e.g., in a "deadlock" between Yankelovich's agents), an "adaptive response" in the form of a clarifying question is a standard dialogue management technique. The '341 patent itself explicitly states that in such a deadlock, "an adaptive conversational response may prompt the user to assist in disambiguating." Such clarifying prompts inherently:
  - Frame a domain: By offering specific choices or asking for particular information, the system defines the scope for the user's next input (e.g., "Did you mean 'traffic' the band or 'traffic' the road conditions?").
  - Influence the subsequent utterance: The structured nature of the clarifying question guides the user to respond in a way that is easier for the system to process, making it "more likely to result in a completed request." The concept of "parroting" described in the '341 patent as an aid to recognition is an advanced manifestation of this fundamental guidance.
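
As an illustration of the hypothesis-generation and certainty-assignment elements mapped above, the following sketch shows one way short-term and long-term knowledge might adjust recognizer scores. It is an assumption-laden example, not the method of any cited reference or of the '341 patent: the `rescore` function, the two boost weights, and the sample data are hypothetical.

```python
from collections import Counter


def rescore(candidates, short_term, long_term, recency_boost=0.2, habit_boost=0.1):
    """Return (domain, certainty) pairs, highest first, normalized to sum to 1.

    candidates: dict mapping domain -> raw recognizer score (assumed > 0).
    short_term: list of domains used earlier in the current conversation.
    long_term:  Counter of domains the user has requested historically.
    Both boost weights are assumed values chosen for illustration.
    """
    total_history = sum(long_term.values()) or 1
    adjusted = {}
    for domain, score in candidates.items():
        bonus = 0.0
        # Short-term knowledge: favor the domain of the most recent turn.
        if short_term and short_term[-1] == domain:
            bonus += recency_boost
        # Long-term knowledge: favor domains the user has used often.
        bonus += habit_boost * long_term[domain] / total_history
        adjusted[domain] = score + bonus
    norm = sum(adjusted.values())
    pairs = ((d, s / norm) for d, s in adjusted.items())
    return sorted(pairs, key=lambda p: p[1], reverse=True)


# Illustrative input: the recognizer alone cannot separate the two domains.
ranked = rescore(
    {"music": 0.5, "navigation": 0.5},
    short_term=["navigation"],
    long_term=Counter({"navigation": 8, "music": 2}),
)
print(ranked)
```

Here the raw scores are tied, and the conversation history and usage history together move "navigation" to the top of the ranking. The normalized values play the role of a degree of certainty for each hypothesis.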

Conclusion

Based on this analysis, the independent claims of US Patent 10,510,341 B1 would have been obvious to a person having ordinary skill in the art at the time of the invention. The combination of Yankelovich's competitive domain agents for context, Bennett's use of dialog history and user preferences for disambiguation, and the well-established practices of generating N-best hypotheses with confidence scores and employing clarifying dialogue strategies supplies every element of Claim 1, together with a clear motivation to combine them. A PHOSITA would have been driven by the need to create more robust and user-friendly voice interfaces that handle the inherent ambiguities of natural language by inferring intent more accurately and guiding users efficiently toward task completion.