
Obviousness Analysis of US Patent 7634409 under 35 U.S.C. § 103

This analysis identifies combinations of prior art elements, as described in the patent's own text, that would render the claims of US Patent 7634409 obvious to a person having ordinary skill in the art (PHOSITA) at the time of the invention (priority date August 31, 2005). Because the analysis is limited to the patent's own account of the prior art, it relies primarily on the "BACKGROUND OF THE INVENTION" and on explicit statements in the "DETAILED DESCRIPTION" about what was already known or in existence.

Understanding the PHOSITA:
A PHOSITA in the field of automated speech interpretation in 2005 would be familiar with:

Standard speech engines and Automatic Speech Recognition (ASR) systems.
The use of grammars in ASR, often leading to large grammar sizes.
The challenges of large grammars in terms of compile time, load time, execution time, and response time, especially for embedded applications.
The problem of "out-of-vocabulary (OOV)" words and the general lack of accuracy in interpreting natural human speech due to various factors like noise, unclear speech, or accents.
Basic concepts of phoneme recognition and acoustic grammars.

Obviousness of Independent Claim 1 (Method)

Claim 1: A method for providing out-of-vocabulary interpretation capabilities and for tolerating noise when interpreting natural language speech utterances, the method comprising:

receiving an utterance from a user;
recognizing a stream of phonemes contained in the utterance on an electronic device;
mapping the recognized stream of phonemes to an acoustic grammar that phonemically represents one or more syllables, the recognized stream of phonemes mapped to a series of one or more of the phonemically represented syllables; and
generating at least one interpretation of the utterance, wherein the generated interpretation includes the series of syllables mapped to the recognized stream of phonemes.

Combination of Prior Art Elements:
A PHOSITA would combine:

A "Standard ASR System" (HPA1): Known to receive utterances and generate interpretations, but suffering from OOV limitations and large grammar problems.
"Phoneme Recognition" (HPA2): The patent itself explicitly states that "Phoneme recognition may disregard the notion of words, instead interpreting a verbalization as a series of phonemes, which may provide out-of-vocabulary (OOV) capabilities, such as when a user misspeaks or an electronic capture devices drops part of a speech signal, or for large-list applications, such as city and street names or song titles, for example."
"Acoustic Grammars for Syllables and Phonotactics" (HPA3): The patent notes that "the English language may be broken down into a detailed grammar of the phonotactic rules of the English language. Portions of a word may be represented by a syllable, which may be further broken down into core components of an onset, a nucleus, and a coda." This indicates that decomposing words into syllables (onset, nucleus, coda) governed by phonotactic rules was understood, which supports treating an acoustic grammar that maps phonemes onto such syllable structures as within the PHOSITA's knowledge.
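To make the claimed mapping step concrete, the following Python sketch groups a recognized phoneme stream into a series of onset-nucleus-coda syllables. It is a minimal illustration under stated assumptions, not the patent's acoustic grammar: the ARPAbet-style symbols, the vowel set, the deliberately small table of permitted onsets, and the function name `syllabify` are all hypothetical, and a simple maximal-onset rule stands in for a full set of English phonotactic rules.

```python
from typing import List, Tuple

# Hypothetical vowel inventory (ARPAbet-style symbols, stress marks omitted).
VOWELS = {"AA", "AE", "AH", "AO", "AW", "AY", "EH", "ER", "EY",
          "IH", "IY", "OW", "OY", "UH", "UW"}

# Hypothetical, deliberately small set of permitted English onsets.
LEGAL_ONSETS = {
    (), ("B",), ("D",), ("F",), ("G",), ("K",), ("L",), ("M",), ("N",),
    ("P",), ("R",), ("S",), ("T",), ("V",),
    ("B", "R"), ("K", "L"), ("P", "L"), ("S", "T"), ("T", "R"), ("S", "T", "R"),
}

# A syllable represented as (onset, nucleus, coda).
Syllable = Tuple[Tuple[str, ...], str, Tuple[str, ...]]


def syllabify(phonemes: List[str]) -> List[Syllable]:
    """Map a recognized phoneme stream to a series of (onset, nucleus, coda) syllables."""
    nuclei = [i for i, p in enumerate(phonemes) if p in VOWELS]
    if not nuclei:
        return []  # no vowel, so no syllable can be formed
    syllables: List[Syllable] = []
    start = 0  # index where the current syllable's onset begins
    for k, n in enumerate(nuclei):
        onset = tuple(phonemes[start:n])
        if k + 1 < len(nuclei):
            # Consonants between this nucleus and the next one.
            cluster = phonemes[n + 1:nuclei[k + 1]]
            # Maximal onset: the next syllable takes the longest legal suffix of the cluster.
            # The empty tuple is always legal, so next() always finds a split point.
            split = next(j for j in range(len(cluster) + 1)
                         if tuple(cluster[j:]) in LEGAL_ONSETS)
            coda = tuple(cluster[:split])
            start = n + 1 + split
        else:
            coda = tuple(phonemes[n + 1:])  # trailing consonants close the last syllable
        syllables.append((onset, phonemes[n], coda))
    return syllables


print(syllabify(["D", "EH", "N", "V", "ER"]))
# [(('D',), 'EH', ('N',)), (('V',), 'ER', ())]
```

The returned list is the kind of "series of syllables" that Claim 1's interpretation would include. Because no word dictionary is consulted, an out-of-vocabulary name still yields a syllable series rather than a recognition failure.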

Motivation for Combination:
The "BACKGROUND OF THE INVENTION" clearly identifies the problems with existing ASR systems, including "large grammars that include a large number of items, nodes, and transitions" and significant issues with "accuracy and interpreting words that are not defined in a predetermined vocabulary or grammar context (OOV)." The patent itself highlights that "phoneme recognition provides several benefits, particularly in the embedded space, such as offering out-of-vocabulary (OOV) capabilities, improving processing performance by reducing the size of a grammar, and eliminating the need to train Statistic Language Models (SLMs)."

Given these known problems and the acknowledged benefits of phoneme recognition, a PHOSITA would be strongly motivated to modify a standard ASR system (HPA1) to incorporate phoneme recognition (HPA2). To implement phoneme recognition effectively for natural language, a PHOSITA would naturally employ known acoustic grammars structured around phonemes and syllables (HPA3), as described in the patent's background. Because phoneme recognition "may disregard the notion of words," the output of such a system would be the series of syllables matched to the recognized phonemes, which corresponds to the final step of Claim 1. This combination addresses the identified problems of OOV interpretation and large grammar sizes in a predictable manner, making the steps of Claim 1 obvious.

Obviousness of Independent Claim 9 (System)

Claim 9: A system for providing out-of-vocabulary interpretation capabilities and for tolerating noise when interpreting natural language speech utterances, the system comprising:

at least one input device that receives an utterance from a user and generates an electronic signal corresponding to the utterance; and
a speech interpretation engine that receives the electronic signal corresponding to the utterance, the speech interpretation engine operable to:
- recognize a stream of phonemes contained in the utterance;
- map the recognized stream of phonemes to an acoustic grammar that phonemically represents one or more syllables, the recognized stream of phonemes mapped to a series of one or more of the phonemically represented syllables; and
- generate at least one interpretation of the utterance, wherein the generated interpretation includes the series of syllables mapped to the recognized stream of phonemes.

Combination of Prior Art Elements:
The system claim mirrors the method claim. The prior art elements would be a system embodiment of:

A "Standard ASR System" (HPA1-System): Comprising an input device and a speech interpretation engine.
A "Phoneme Recognition System" (HPA2-System): An ASR system or component configured for phoneme recognition.
"Acoustic Grammars for Syllables and Phonotactics" (HPA3-System): The knowledge of how to structure acoustic grammars to represent phonemes and syllables.

Motivation for Combination:
As with Claim 1, the motivation stems from the known deficiencies of standard ASR systems ("BACKGROUND OF THE INVENTION") in handling OOV words and the inefficiency of large grammars. The patent itself teaches that "Phoneme recognition provides several benefits... such as offering out-of-vocabulary (OOV) capabilities, improving processing performance by reducing the size of a grammar". A PHOSITA, aiming to build a more robust and efficient ASR system, would be motivated to configure an existing system (HPA1-System) with an input device to receive utterances and a speech interpretation engine capable of implementing phoneme recognition (HPA2-System). This engine would utilize known acoustic grammars that represent syllables and phonotactic rules (HPA3-System) to achieve the stated benefits. The design of such a system would be a straightforward engineering implementation of the known functional advantages of phoneme-based processing.

Obviousness of Dependent Claims

The dependent claims build upon the independent claims by adding refinements related to acoustic grammar structure (linking elements) and post-processing techniques (candidate generation, scoring, domain agents, and phonetic fuzzy matching with M-Trees). As argued below, each refinement is motivated by problems the patent identifies as known, and several rely on techniques the patent itself describes as known (for example, M-Trees).

Claim 4 (Method) and Claim 12 (System): Linking Elements (e.g., Schwa)

Elements: Building on Claim 1/9's acoustic grammar, these claims add using an "unstressed central vowel" (like schwa) as a "linking element between sequential phonemic elements" to reduce grammar transitions.
Motivation: The patent explicitly states that using a linking element "may reduce the number of grammar transitions, thereby speeding up the process of compiling, loading, and executing the speech engine" and "reduce both grammar size and response time." The "BACKGROUND OF THE INVENTION" directly identifies the problem of degraded response time due to the need to "parse through a large number of transition states." A PHOSITA would be highly motivated to adopt known techniques that improve efficiency and reduce grammar size. The patent describes the phonetic characteristics of schwa that make it "ideal" for this role, suggesting that these characteristics were generally understood. Therefore, applying a known phonetic feature (schwa) as a linking element in an acoustic grammar to achieve the known benefits of grammar reduction and performance improvement would be obvious.
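The efficiency argument can be illustrated with simple counting. Assume, for illustration only, a grammar in which each of N syllable-final elements may be followed by any of M syllable-initial elements. Connecting them directly needs N × M transitions, whereas routing every junction through one shared linking node (standing in for the schwa) needs only N + M while still leaving every final-to-initial pairing reachable. This is a hypothetical counting model, not the patent's grammar format; the node name `<schwa>` and the phoneme lists are invented for the example.

```python
from itertools import product
from typing import Iterable, Set, Tuple

Edge = Tuple[str, str]
LINK = "<schwa>"  # hypothetical node name for the unstressed central linking vowel


def direct_edges(finals: Iterable[str], initials: Iterable[str]) -> Set[Edge]:
    """Connect every syllable-final element straight to every syllable-initial element."""
    return set(product(finals, initials))


def linked_edges(finals: Iterable[str], initials: Iterable[str]) -> Set[Edge]:
    """Route every junction through one shared linking node instead."""
    return {(f, LINK) for f in finals} | {(LINK, i) for i in initials}


finals = ["N", "T", "S", "K"]          # hypothetical syllable codas
initials = ["B", "D", "G", "M", "P"]   # hypothetical syllable onsets
print(len(direct_edges(finals, initials)))  # 20
print(len(linked_edges(finals, initials)))  # 9
```

The gap widens as the grammar grows: with 200 codas and 300 onsets the same model gives 60,000 direct transitions versus 500 through the linking node, which is the kind of reduction in transitions the patent associates with faster compiling, loading, and execution.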

Claim 6 (Method) and Claim 14 (System): Generating Candidates and Scoring

Elements: These claims introduce generating a "plurality of candidate interpretations," assigning a "score" to each, and "selecting a candidate interpretation having a highest assigned score as being a probable interpretation."
Motivation: The "BACKGROUND OF THE INVENTION" states that "speech interpretation engines still have substantial problems with accuracy." A PHOSITA seeking to improve the accuracy of a speech interpretation system (as described in Claim 1/9) would naturally turn to established ASR post-processing techniques. Generating multiple candidate interpretations and ranking them by confidence scores (e.g., N-best lists and re-ranking) is a fundamental and well-known approach in ASR to disambiguate and select the most probable interpretation, thereby addressing accuracy problems. The patent describes this as a "sharpening" step.
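The generate, score, and select pattern can be sketched in a few lines of Python. The candidate strings and score values are invented for illustration, and the scoring function is a hypothetical weighted sum of an acoustic confidence and a domain-context confidence; the patent's actual scoring criteria are not reproduced here.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Candidate:
    text: str
    acoustic: float  # hypothetical acoustic-match confidence in [0, 1]
    context: float   # hypothetical fit with the active domain in [0, 1]


def score(c: Candidate, w_acoustic: float = 0.6, w_context: float = 0.4) -> float:
    """Hypothetical linear combination of the two confidence signals."""
    return w_acoustic * c.acoustic + w_context * c.context


def most_probable(candidates: List[Candidate]) -> Candidate:
    """Select the candidate interpretation with the highest assigned score."""
    if not candidates:
        raise ValueError("no candidate interpretations")
    return max(candidates, key=score)


n_best = [
    Candidate("directions to main street", 0.70, 0.90),
    Candidate("directions to mane street", 0.74, 0.20),
    Candidate("direction stew main street", 0.72, 0.10),
]
print(most_probable(n_best).text)  # directions to main street
```

The example shows why scoring aids disambiguation: the second candidate has the best acoustic match, but the first wins once domain context is weighed in (0.78 versus 0.524).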

Claim 7 (Method) and Claim 15 (System): Domain Agents, Phonetic Fuzzy Matching, and M-Trees

Elements: These claims refine the candidate generation by using a "plurality of domain agents" to identify "suspect words or phrases," then identifying "closest phonetic matches... using a closest-distance metric associated with an M-Tree," and "substituting the identified closest phonetic matches."
Motivation: The "BACKGROUND OF THE INVENTION" highlights accuracy issues, especially with "words that are not defined in a predetermined vocabulary or grammar context" and problems caused by "poor quality microphones, extraneous noises, unclear or grammatically incorrect speech by the user, or an accent of the user." To further improve accuracy and OOV capabilities, a PHOSITA would be motivated to integrate domain-specific knowledge and robust phonetic matching.
- Domain Agents: Organizing vocabularies and language models into specific "domains of knowledge" (e.g., song titles, city names) is a known technique to constrain speech recognition, especially for "large-list applications" mentioned in the background.
- Phonetic Fuzzy Matching (PFM) and M-Trees: The patent states that "M-Trees are known to those skilled in the art" and describes them as "an index structure that resolves similarity queries between phonemes using a closest-distance metric based on relative weightings of phoneme misrecognition, phoneme addition, and phoneme deletion." Given the inherent challenges of misrecognized or noisy phonemes in speech, a PHOSITA would be motivated to employ known robust phonetic matching techniques, such as PFM utilizing M-Trees, to find the closest phonetic matches for uncertain parts of an utterance, thereby improving the accuracy of the final interpretation.
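The closest-distance idea can be illustrated with a weighted edit distance that charges separate costs for phoneme misrecognition (substitution), phoneme addition, and phoneme deletion, mirroring the three weightings the patent names. The weight values, the city vocabulary, and all function names are hypothetical. For brevity the sketch finds the nearest entry by scanning the whole list; this linear scan is a plain stand-in for an M-Tree, which would index the same kind of distance so that a similarity query need not compare against every entry. The domain agent's flagging step is also assumed: the index of the suspect word is simply passed in.

```python
from typing import List, Sequence, Tuple

# Hypothetical relative weights; the patent does not give values.
W_SUBSTITUTE = 1.0  # one phoneme misrecognized as another
W_ADD = 0.7         # extra phoneme heard (e.g., noise)
W_DELETE = 0.7      # phoneme dropped (e.g., lost audio)

Vocabulary = List[Tuple[str, List[str]]]


def phonetic_distance(heard: Sequence[str], entry: Sequence[str]) -> float:
    """Weighted edit distance between a heard phoneme string and a vocabulary entry."""
    rows, cols = len(heard) + 1, len(entry) + 1
    d = [[0.0] * cols for _ in range(rows)]
    for i in range(1, rows):
        d[i][0] = i * W_ADD      # heard phonemes absent from the entry were added
    for j in range(1, cols):
        d[0][j] = j * W_DELETE   # entry phonemes absent from what was heard were deleted
    for i in range(1, rows):
        for j in range(1, cols):
            sub = 0.0 if heard[i - 1] == entry[j - 1] else W_SUBSTITUTE
            d[i][j] = min(d[i - 1][j - 1] + sub,   # match or misrecognition
                          d[i - 1][j] + W_ADD,     # heard[i - 1] was added
                          d[i][j - 1] + W_DELETE)  # entry[j - 1] was deleted
    return d[-1][-1]


def closest_match(heard: Sequence[str], vocabulary: Vocabulary) -> Tuple[str, float]:
    """Linear-scan stand-in for an M-Tree nearest-neighbor query."""
    return min(((word, phonetic_distance(heard, phones)) for word, phones in vocabulary),
               key=lambda pair: pair[1])


def substitute_suspect(words: List[str], suspect_index: int,
                       heard: Sequence[str], vocabulary: Vocabulary) -> List[str]:
    """Replace the word a (hypothetical) domain agent flagged as suspect."""
    best, _ = closest_match(heard, vocabulary)
    return words[:suspect_index] + [best] + words[suspect_index + 1:]


CITY_VOCAB: Vocabulary = [  # hypothetical domain vocabulary, ARPAbet-style
    ("boston", ["B", "AO", "S", "T", "AH", "N"]),
    ("austin", ["AO", "S", "T", "AH", "N"]),
    ("denver", ["D", "EH", "N", "V", "ER"]),
]

heard = ["B", "AO", "S", "AH", "N"]  # the "T" was dropped by a noisy capture
print(closest_match(heard, CITY_VOCAB))  # ('boston', 0.7)
print(substitute_suspect(["directions", "to", "bossin"], 2, heard, CITY_VOCAB))
# ['directions', 'to', 'boston']
```

A single dropped phoneme costs only one deletion against "boston", while "austin" needs both an addition and a deletion (1.4), so the noisy utterance still resolves to the intended domain entry.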

Conclusion:
Many of the core features of the claimed invention in US7634409, and the motivations behind them, are described in the patent's "BACKGROUND OF THE INVENTION" and "DETAILED DESCRIPTION" as addressing known problems of existing ASR systems or as using techniques known in the art. A PHOSITA would be motivated to combine these known elements (standard ASR with phoneme recognition, syllable-based acoustic grammars, linking elements for efficiency, and post-processing with scoring, domain agents, and phonetic fuzzy matching using M-Trees for accuracy) to overcome the acknowledged deficiencies of prior art speech interpretation systems. Because no specific prior art documents are cited in the material relied on here, this analysis rests on the patent's own description of the state of the art and the problems it sought to solve. Statements in the DETAILED DESCRIPTION are not necessarily admissions of prior art, so a complete analysis would corroborate each element with dated references.