Obviousness: US Patent 8,306,815

Based on the provided prior art, an analysis of the obviousness of US patent 8,306,815 under 35 U.S.C. § 103 is as follows.

Obviousness Analysis of US Patent 8,306,815

A determination of obviousness under 35 U.S.C. § 103 requires assessing whether the differences between the claimed invention and the prior art are such that the invention as a whole would have been obvious at the time the invention was made to a person having ordinary skill in the art (PHOSITA). In this context, a PHOSITA would be an engineer or computer scientist with expertise in digital signal processing and speech recognition systems.

The core inventive concept of the '815 patent, as articulated in the independent claims (1, 20, 22, and 23), is a speech dialog system with a dual-feedback control architecture. This architecture involves two distinct control loops:

Analysis-to-Output Loop: Using non-semantic information from a speech input (the analysis signal) to control the system's speech output.
Recognition-to-Input Loop: Using the semantic meaning of the recognized words (the recognition result) to control the system's front-end signal pre-processing.

An argument for obviousness can be constructed by combining the teachings of the cited prior art references, as each reference discloses a key piece of the overall system. A PHOSITA would have been motivated to combine these teachings to solve the well-known problem of improving the robustness and usability of speech dialog systems in noisy and dynamic environments, such as inside a vehicle.

Combination Rendering Independent Claims 1, 22, and 23 Obvious

A combination of US 2004/0064315 A1 (Deisher) and US 5,524,169 A (IBM), supplemented with common knowledge in audio engineering and user interface design, would render the subject matter of claims 1, 22, and 23 obvious.

1. The Analysis-to-Output Loop:

What the Prior Art Teaches: Deisher teaches the generation of an "acoustic confidence measure" based on the signal-to-noise ratio (SNR) of an input signal. This is directly analogous to the analysis signal of the '815 patent containing information on the noise component (claim 4). Deisher, however, uses this measure to control the pre-processor rather than the speech output, so the remaining difference from claim 1 is where the analysis result is applied.
Motivation to Combine/Modify: A primary challenge in noisy environments is ensuring the user can hear the system's feedback. If the system analyzes the background noise and determines it to be high (as taught by Deisher), a PHOSITA would be motivated to ensure the dialog is not broken by the user's inability to hear the system's response. The most direct and obvious solution to this problem is to increase the volume of the speech output unit. This is a basic principle of audio engineering and user interface design. Therefore, a PHOSITA would find it obvious to take the noise analysis taught by Deisher and use it to control the output volume, thereby creating the Analysis-to-Output Loop. The motivation is to improve the reliability of the human-machine interaction, a predictable outcome. This would render the feature of controlling the speech output unit based on the analysis signal (as required by claim 1) obvious.
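The modification argued above can be illustrated with a minimal sketch. It is not taken from Deisher or from the '815 patent; the function name, the SNR thresholds, and the 12 dB boost ceiling are all hypothetical values chosen only to show how a noise estimate could drive output volume.

```python
def output_gain_db(
    snr_db: float,
    base_gain_db: float = 0.0,
    max_boost_db: float = 12.0,   # hypothetical ceiling on the volume boost
    quiet_snr_db: float = 20.0,   # hypothetical SNR at or above which no boost is applied
    noisy_snr_db: float = 0.0,    # hypothetical SNR at or below which the full boost is applied
) -> float:
    """Map an estimated input SNR to a speech-output gain (illustrative only)."""
    # Clamp the SNR estimate into the [noisy, quiet] range.
    clamped = min(max(snr_db, noisy_snr_db), quiet_snr_db)
    # 0.0 means a quiet cabin, 1.0 means the noisiest condition considered.
    noisiness = (quiet_snr_db - clamped) / (quiet_snr_db - noisy_snr_db)
    # Raise the output volume linearly with the noisiness.
    return base_gain_db + noisiness * max_boost_db
```

Under these assumed thresholds, a lower SNR estimate produces a louder prompt, which is the Analysis-to-Output behavior the argument treats as a routine engineering choice.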

2. The Recognition-to-Input Loop:

What the Prior Art Teaches: The prior art, including the IBM and Pioneer references, teaches using context to modify the speech recognition process. IBM specifically teaches using speaker location to select appropriate acoustic models. While this is not a direct Recognition-to-Input Loop, it establishes the principle of using contextual information to adapt the system.
Motivation to Combine/Modify: A PHOSITA would understand that the semantic content of a recognized command can itself provide powerful context about future acoustic conditions. For example, if a user in a car says, "Open the driver's side window," the recognition result carries an implicit prediction that the noise environment is about to change dramatically. To maintain the system's robustness for the next command, a PHOSITA would be motivated to use this predictive information. Deisher teaches an adaptive pre-processor. It would have been an obvious step to feed the semantic context from the recognized words into the control logic for that pre-processor. For instance, upon recognizing "open window," the system could proactively adjust the noise cancellation filter coefficients in anticipation of new wind noise. This creates the Recognition-to-Input Loop. The motivation is to improve future recognition performance in a dynamically changing environment, which is a predictable improvement, not an inventive leap.
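A minimal sketch of this Recognition-to-Input Loop follows. It is an illustration under stated assumptions, not an implementation from Deisher, IBM, or the '815 patent: the settings class, the phrase table, and every numeric value are hypothetical, and a simple word match stands in for whatever semantic interpretation a real system would use.

```python
from dataclasses import dataclass, replace


@dataclass(frozen=True)
class PreprocessorSettings:
    """Hypothetical front-end parameters; names and defaults are assumptions."""
    noise_suppression_db: float = 6.0
    echo_cancellation: bool = False


# Hypothetical table: recognized phrase -> anticipated pre-processor change.
ANTICIPATED_CHANGES = {
    "open window": {"noise_suppression_db": 18.0},     # expect wind noise
    "turn on the fan": {"noise_suppression_db": 12.0}, # expect fan noise
    "speakerphone": {"echo_cancellation": True},       # expect a new echo path
}


def adjust_for_recognition(
    settings: PreprocessorSettings, recognized_text: str
) -> PreprocessorSettings:
    """Return settings adapted in advance, based on the recognition result."""
    words = set(recognized_text.lower().split())
    for phrase, changes in ANTICIPATED_CHANGES.items():
        # A phrase matches when all of its words appear in the utterance.
        if all(word in words for word in phrase.split()):
            settings = replace(settings, **changes)
    return settings
```

With this table, the utterance "Open the driver's side window" raises the assumed noise suppression level before any wind noise reaches the microphone, while an unrelated command leaves the settings unchanged.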

Combining these two motivated modifications (using noise analysis to control output volume, and using semantic recognition to control the input pre-processor) yields all the key elements of independent claims 1, 22, and 23.

Combination Rendering Independent Claim 20 Obvious

Independent claim 20 is directed only to the Recognition-to-Input Loop: "a speech dialog control unit configured to receive the recognition result, the speech dialog control unit configured to control the signal pre-processor unit based upon the recognition result."

This claim is rendered obvious by the combination of a standard speech recognition system with the adaptive pre-processor taught by Deisher (US 2004/0064315 A1).

Deisher teaches a signal pre-processor with adjustable parameters that are controlled based on an "acoustic confidence measure."
A PHOSITA would be motivated to find additional sources of information to improve the control of this pre-processor, especially in dynamic environments. As described above, the semantic content of a recognized phrase (e.g., "turn on the fan," "call mom on speakerphone") provides a clear prediction of an impending change to the acoustic environment (fan noise, echo path changes).
It would have been obvious to a PHOSITA to use the output of the speech recognizer as an additional input to the control logic for the pre-processor. The motivation is clear: to preemptively adapt the noise and echo cancellation filters before the acoustic environment changes, thus improving the recognition accuracy of the next utterance. This is a predictable optimization, not an invention. The combination thus yields the control loop recited in claim 20.

Conclusion:

While no single prior art reference discloses the specific dual-feedback architecture of US 8,306,815, the constituent elements existed in the art. A person of ordinary skill in the art, faced with the well-known problem of making speech dialog systems work better in cars and other noisy environments, would have been motivated to combine these known elements. Using noise analysis to control output volume and using recognized commands to predictively adjust input filtering are obvious engineering solutions that yield predictable improvements in system usability and robustness. Therefore, the independent claims of US patent 8,306,815 would be found obvious under 35 U.S.C. § 103.