Obviousness — US Patent 7917367

Obviousness Analysis of US Patent 7917367 under 35 U.S.C. § 103

This analysis identifies combinations of prior art references that would render the claims of US Patent 7917367 obvious to a person having ordinary skill in the art (PHOSITA) at the time of the invention (priority date: August 5, 2005). The motivation to combine these references stems from known goals in the field of natural language speech processing, such as improving the reliability, naturalness, and robustness of human-computer interaction, as explicitly acknowledged in the background of US7917367 itself.

The independent claims of US7917367 broadly cover a system, method, and computer-readable storage medium for responding to natural language speech utterances by processing input to determine approximate meaning, generating machine-processable queries/commands, sending them to domain agents, evaluating results (even if ambiguous/conflicting), and presenting a natural language response considering user preferences and system personality.

Combination I: Natural Language Speech Interaction with Context and User Personalization

A PHOSITA would have been motivated to combine the teachings of US7702508B2, US20040235530A1, and US9620117B1 to create a natural language speech system that leverages context and user-specific information for improved interaction.

US7702508B2 (Priority Date: March 31, 2004) discloses a system for natural language processing of query answers, featuring a Natural Language Engine (NLE) to structure queries and lexically analyze user questions. The system includes speech recognition and a text-to-speech engine for "real-time, interactive, human-like dialog." It uses "context-specific grammars and dictionaries" and "natural language processing routines" to analyze user questions. This reference provides the core elements of receiving natural language speech, processing it, retrieving information, and generating a speech response in a human-like dialogue.
US20040235530A1 (Priority Date: May 23, 2003) is titled "Context specific speaker adaptation user interface." This reference explicitly teaches adapting user interfaces based on context specific to the speaker, which inherently involves considering user characteristics or preferences.
US9620117B1 (Priority Date: March 31, 2004) describes a natural language spoken dialog system that includes a Dialog Management (DM) module. This DM module manages the "dialogue state" and records user requests and system responses in a "logged interaction database." It also suggests that the DM module can perform actions based on semantic labels and adapt to a changing environment.

Motivation to Combine: A PHOSITA, recognizing the goal of achieving "human-like dialog" in systems like US7702508B2, would understand that improving the reliability and naturalness of interaction requires personalized and context-aware processing. US20040235530A1 provides the concept of "context specific speaker adaptation", which directly addresses tailoring the system to individual users. Furthermore, US9620117B1 offers a "Dialog Management module" that maintains conversational history and dialogue state. Combining these elements would be an obvious step to enhance a basic natural language query system. By integrating user-specific context and dialogue history, the combined system would more reliably interpret user utterances, formulate appropriate queries, and generate responses that are both accurate and "natural" to the user, directly addressing the deficiencies that US7917367 itself attributes to the prior art regarding complete and natural query/response environments. The integration of such modular components (ASR, NLE, DM, TTS) to achieve known improvements in dialogue systems is a matter of routine engineering.
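
For illustration only, the kind of dialogue-state tracking and user-specific adaptation discussed in this combination can be sketched in a few lines of Python. This is a hypothetical sketch, not code from any cited reference: the class name DialogManager, the keyword lists, and the "home_city" preference are all assumptions made for the example, and simple keyword matching stands in for real speech recognition and language understanding.

```python
from dataclasses import dataclass, field


@dataclass
class DialogManager:
    """Hypothetical dialog manager: tracks state and logs each turn."""
    user_prefs: dict = field(default_factory=dict)  # per-user adaptation data
    state: dict = field(default_factory=dict)       # current dialogue state
    log: list = field(default_factory=list)         # logged interaction history

    def handle(self, utterance: str) -> str:
        words = utterance.lower().rstrip("?.!").split()
        # Carry the topic over from the previous turn when the new
        # utterance does not name one (simple context resolution).
        topic = next((w for w in words if w in ("weather", "traffic")),
                     self.state.get("topic"))
        # Use the city from the dialogue state, then the stored user
        # preference, when no city is spoken.
        city = next((w for w in words if w in ("boston", "seattle")),
                    self.state.get("city") or self.user_prefs.get("home_city"))
        self.state.update(topic=topic, city=city)
        if topic is None or city is None:
            response = "Could you tell me more about what you need?"
        else:
            response = f"Here is the {topic} for {city.title()}."
        # Record the request and the response, as a logged interaction.
        self.log.append((utterance, response))
        return response


dm = DialogManager(user_prefs={"home_city": "boston"})
first = dm.handle("What is the weather?")   # city comes from the user preference
second = dm.handle("And in Seattle?")       # topic comes from the dialogue state
```

In this sketch the second utterance never mentions weather, yet the stored dialogue state supplies the topic, which is the sort of context carry-over the combination relies on.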

Combination II: Integrating Probabilistic and Fuzzy Reasoning for Robustness and Ambiguity

A PHOSITA would have been motivated to combine the teachings of Sun et al. and the general knowledge of statistical NLP models to address the inherent ambiguity and incompleteness of natural language input.

Sun et al., "Fuzzy Logic-Based Natural Language Processing and Its Application to Speech Recognition" (2002) describes a method that applies fuzzy logic to speech recognition and natural language processing. The aim is to learn fuzzy semantic relations between words, predict unrecognized words, and thereby increase the accuracy of speech recognition systems. This approach explicitly addresses handling "language internal fuzzy phenomena" and "disfluency" such as repeated words or mispronounced words.
History of Natural Language Processing (NLP) indicates that by the late 1980s and 1990s, NLP research had shifted towards machine learning algorithms, and that "statistical models" capable of making "soft, probabilistic decisions" had become increasingly popular.

Motivation to Combine: The problem of reliably processing natural language, especially with imperfect information (e.g., incomplete thoughts, slang, repeated words, synonyms), was a known challenge in speech recognition systems prior to 2005. Sun et al. explicitly provide a solution using fuzzy logic to handle the "nuances of semantic meaning due to the ambiguity and variability inherent in human languages". A PHOSITA, aware of the advancements in statistical and probabilistic models in NLP, would find it obvious to apply these probabilistic and fuzzy reasoning techniques to a natural language speech system. This combination directly addresses the need to "deal with inconsistent, ambiguous, conflicting and incomplete information or responses," a key aspect highlighted in US7917367 for achieving robustness and natural responses. The integration of such known methods to improve the accuracy and robustness of language processing is a predictable extension.
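
As a rough illustration of how fuzzy and probabilistic evidence might be combined, the sketch below ranks candidate words for a poorly recognized slot. It is not the method of Sun et al.: the relation table, its membership values, the acoustic confidences, and the function names are all invented for the example. The semantic fit of each candidate is taken as the fuzzy AND (minimum) of its relation to every context word and is then multiplied by a recognizer confidence.

```python
# Hypothetical fuzzy relation: degree (0..1) to which two words are
# semantically related. The values are made up for illustration.
RELATION = {
    ("book", "flight"): 0.9,
    ("book", "fright"): 0.1,
    ("boston", "flight"): 0.8,
    ("boston", "fright"): 0.2,
}


def membership(context_word, candidate):
    """Return the fuzzy relatedness of two words, 0.0 if unknown."""
    return RELATION.get((context_word, candidate), 0.0)


def rank_candidates(context, candidates, acoustic):
    """Rank candidate words by semantic fit times acoustic confidence."""
    scored = []
    for word in candidates:
        # Fuzzy AND: a candidate fits only as well as its weakest relation
        # to the context. With no context, the fit is neutral (1.0).
        fit = min((membership(c, word) for c in context), default=1.0)
        scored.append((word, fit * acoustic[word]))
    return sorted(scored, key=lambda pair: pair[1], reverse=True)


# The recognizer slightly prefers "fright", but the context favors "flight".
ranking = rank_candidates(
    context=["book", "boston"],
    candidates=["flight", "fright"],
    acoustic={"flight": 0.4, "fright": 0.6},
)
best_word = ranking[0][0]
```

Here the semantic evidence overrides a weak acoustic preference, which is the general behavior that soft, graded decisions are meant to provide when the input is ambiguous.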

Combination III: Multimodal Interaction with Context Management

A PHOSITA would have been motivated to combine prior art demonstrating multimodal input with known dialog management techniques to provide a coherent user experience.

Cheyer and Julia, "Multimodal Maps: An Agent-Based Approach" (1995) explicitly discloses a system (MMAP) that allows users to provide input using "drawn gestures and written natural language statements" combined with "natural language spoken input." This reference clearly establishes the concept of multimodal input, including both speech and non-speech modalities.
US9620117B1 (Priority Date: March 31, 2004), as discussed above, includes a "Dialog Management (DM) module" that manages conversational memory and tracks the "dialogue state" to decide the next action of a dialogue system.

Motivation to Combine: Recognizing that "human social interaction involves the intertwined cooperation of different modalities", a PHOSITA developing multimodal systems like Cheyer and Julia's MMAP would face the challenge of maintaining coherence across different input types. It would be an obvious design choice to integrate a dialogue management system, such as the one described in US9620117B1, to manage the conversational context and history for multimodal interactions. This combination would predictably ensure that the system's understanding and responses remain consistent regardless of the input modality (speech or non-speech), thereby providing a more integrated and natural user experience in a multimodal environment, which US7917367 claims as an improvement.

Combination IV: Enhancing User Experience with System "Personality"

A PHOSITA would have been motivated to integrate known techniques for simulating personality into a natural language speech system to improve user engagement.

US7266499B2 (Priority Date: September 13, 2000) discloses a "Voice user interface with personality." This system teaches selecting a personality and controlling the voice user interface to provide varied responses, based on criteria such as prior prompts and user experience. It mentions that social psychology experts determine suitable personality types for voice user interfaces.
US20080177685A1 (Priority Date: May 24, 2003) references even earlier prior art, including US5367454A (1994, "Interactive man-machine interface for simulating human emotions"), US5696981A (1997, "Personality analyzer"), and US6185534B1 (2001, "Modeling emotion and personality in a computer user interface"). These references demonstrate that the concept of simulating personality and emotions in computer interfaces was well-established.

Motivation to Combine: The background of US7917367 acknowledges that "rigid, highly formatted, or structured presentation of results may be deemed unnatural by many users." [Patent text, Definitions] A PHOSITA seeking to create a more "natural" and engaging human-computer interaction, as taught by US7266499B2, would be motivated to integrate simulated personality characteristics into a natural language speech system. The existence of prior art that explicitly teaches simulating emotions and personality in user interfaces (as cited in US20080177685A1) makes this integration a matter of applying known techniques for enhancing user experience. By doing so, the combined system would predictably "randomize aspects of responses, just as a real human would do" [Patent text, Definitions], and present information in a more sympathetic or varied manner, thereby overcoming the "mechanical" feel of prior systems, which is a clear objective of US7917367.

Combination V: Robust Speech Recognition in Noisy Environments for Device Control

A PHOSITA would have been motivated to combine a voice-controlled device system with known noise reduction technologies to ensure reliable operation in real-world environments.

"Process for automatic control of one or more devices by voice commands or by real-time voice dialog" (Publication Date: January 4, 2005) describes a speech dialog system for "automatic control of one or more devices by speech control or by real-time speech dialog." This system is designed for "operation in a noise-encumbered environment" and is "failure-tolerant," capable of extracting valid commands even from imperfect speech input.
US7003099B1 (Priority Date: August 1, 2003) discloses a "Small array microphone for acoustic echo cancellation and noise suppression." This patent teaches an arrangement in which "at least two microphones form an array microphone" and applies signal processing techniques to "suppress noise and to improve communication quality and voice recognition performance" in noisy settings such as cars or mobile environments.
"Microphone Array Post-filter based on Spatially-Correlated Noise Measurements for Distant Speech Recognition" (2004) further supports the use of microphone array processing, including beamforming and post-filtering, to improve word error rates in noisy conditions for distant speech recognition.

Motivation to Combine: A PHOSITA developing a voice control system for devices, such as the one described in "Process for automatic control", would recognize that robust operation in noisy, real-world environments is crucial. The existing challenges of noise degrading speech recognition performance were well-known. The teachings of US7003099B1 and "Microphone Array Post-filter" directly address these challenges by providing array microphone technologies and associated signal processing techniques for effective noise suppression and echo cancellation. It would be an obvious engineering choice to integrate these proven noise reduction methods into a voice command system to ensure more reliable and accurate speech input, thereby enhancing the system's overall performance and user experience in its intended operational environment, which is a key feature for enabling the control of devices in US7917367.
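
To make the array-processing idea concrete, the following is a minimal sketch of a delay-and-sum beamformer, the simplest form of microphone array beamforming. It is a stand-in chosen for brevity and is not the processing disclosed in US7003099B1 or in the post-filter paper. The function name, the assumption of known integer sample delays, and the toy signals are all hypothetical.

```python
def delay_and_sum(channels, delays):
    """Align each channel by its known integer delay and average them.

    channels: list of equal-length sample lists, one per microphone.
    delays:   number of samples by which the target speech arrives
              late at each microphone (assumed known in this sketch).
    """
    max_delay = max(delays)
    # Only samples available on every channel after alignment are used.
    usable = len(channels[0]) - max_delay
    output = []
    for n in range(usable):
        total = sum(ch[n + d] for ch, d in zip(channels, delays))
        output.append(total / len(channels))
    return output


# Toy example: the same speech reaches the second microphone 2 samples late.
speech = [0, 1, 0, -1, 0, 1, 0, -1]
mic0 = speech + [0, 0]
mic1 = [0, 0] + speech
aligned = delay_and_sum([mic0, mic1], delays=[0, 2])
```

After alignment the speech adds coherently across microphones, while noise that differs between microphones tends to average down. That is the basic reason an array front end can help a recognizer in a noisy environment.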