Obviousness — US Patent 11069337

Obviousness Analysis of US Patent 11069337 under 35 U.S.C. § 103

To establish obviousness under 35 U.S.C. § 103, the challenger must show that the claimed invention as a whole would have been obvious to a person having ordinary skill in the art (PHOSITA) before the effective filing date of the claimed invention. The inquiry rests on three factual determinations: (1) the scope and content of the prior art, (2) the differences between the prior art and the claims at issue, and (3) the level of ordinary skill in the pertinent art. There must also be an articulated reason why a PHOSITA would have combined or modified the prior art references to arrive at the claimed invention, with a reasonable expectation of success.

1. Scope and Content of Prior Art

The patent itself identifies "Japanese Examined Patent Publication No. H07-109560" as relevant prior art.

Japanese Examined Patent Publication No. H07-109560 (H07-109560): As characterized in US11069337, this reference discloses a "voice control device that analyzes detected voice of a user and performs processing according to the user's intention." The same discussion also describes a voice control device that "outputs, via voice, that processing intended by a user has been performed, or outputs, via voice, content of a user's inquiry."

2. Differences Between Prior Art and Claims at Issue

The core innovation of US11069337, as defined in its independent claims (Claims 1, 6, and 9), lies in classifying a user's input voice as either a "first voice" (e.g., normal speech) or a "second voice" (e.g., a whisper) and then generating the output sentence accordingly: a "first output sentence" for a first voice, and a "second output sentence", in which information is omitted relative to the first output sentence, for a second voice. The aim is to limit the effect of the output voice on people other than the user while keeping the content comprehensible to the user.

H07-109560 describes a general voice control device that analyzes user voice and outputs voice in response. However, the provided description of H07-109560 does not explicitly mention:

- Classifying the input voice into different types (e.g., normal speech vs. whisper).
- Adjusting the content of the output sentence (by omitting information) based on this classification. The background discussion instead notes only that simply decreasing the sound volume of the output voice may make the output hard for the user to understand.

Therefore, the key differences reside in the voice classification and the conditional content omission in the generated output sentence based on that classification.
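
As a rough illustration of the first difference, the classification step could be sketched as follows. This is only a minimal sketch under stated assumptions: it uses input volume (RMS amplitude) as the sole distinguishing feature, and the names classify_voice, VoiceClass, and WHISPER_RMS_THRESHOLD, as well as the threshold value and the synthetic test tone, are hypothetical and do not come from US11069337 or H07-109560.

```python
import math
from enum import Enum


class VoiceClass(Enum):
    FIRST = "first voice"    # e.g., normal speech
    SECOND = "second voice"  # e.g., a whisper


# Hypothetical threshold on RMS amplitude for samples scaled to [-1.0, 1.0].
WHISPER_RMS_THRESHOLD = 0.05


def rms(samples):
    """Root-mean-square amplitude of a sequence of audio samples."""
    if not samples:
        raise ValueError("samples must not be empty")
    return math.sqrt(sum(s * s for s in samples) / len(samples))


def classify_voice(samples, threshold=WHISPER_RMS_THRESHOLD):
    """Label quiet input as the second voice and everything else as the first."""
    return VoiceClass.SECOND if rms(samples) < threshold else VoiceClass.FIRST


def sine(amplitude, freq_hz=200.0, rate_hz=16000, n=1600):
    """Synthetic tone standing in for recorded speech (illustration only)."""
    return [amplitude * math.sin(2 * math.pi * freq_hz * i / rate_hz)
            for i in range(n)]


if __name__ == "__main__":
    print(classify_voice(sine(0.5)))   # loud tone
    print(classify_voice(sine(0.01)))  # quiet tone
```

A single volume threshold is the simplest possible classifier and is used here only to make the two-class structure concrete; a real system would likely combine several features.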

3. Level of Ordinary Skill in the Art

A person having ordinary skill in the art (PHOSITA) in this field would likely hold a bachelor's degree in computer science, electrical engineering, or a related field, and have several years of experience in speech recognition, natural language processing, or voice user interface design. Such a person would be familiar with existing voice control systems, speech analysis techniques (e.g., Fourier transforms and mel-frequency cepstral coefficients), and methods for generating natural language responses.

4. Obviousness Combinations and Motivation to Combine

One combination that could render the claims of US11069337 obvious pairs H07-109560 with other references, not identified here, that teach voice classification and dynamic content adjustment.

Combination: H07-109560 + Prior Art Teaching Voice Classification + Prior Art Teaching Dynamic Content Omission in Voice Interfaces

Argument for Obviousness:

H07-109560 as a Base: H07-109560 provides the fundamental "voice control device that analyzes detected voice of a user and performs processing according to the user's intention" and "outputs, via voice... processing intended by a user has been performed, or outputs, via voice, content of a user's inquiry." This establishes the basic voice control system on which the combination builds.
Adding Voice Classification: Classifying a user's voice by its characteristics (e.g., whisper versus normal speech, volume, speaking speed) was known in the art. The specification of US11069337 itself describes various ways to classify a voice: analyzing the peak frequency in a spectrum, the slope of the peak, volume, speaking speed, the volume ratio between speech and wind noise, proximity to the device, or Mel-frequency cepstrum coefficients. These techniques were generally understood by a PHOSITA in speech analysis, although the patent's own specification is not itself prior art, so a challenge would need independent references showing that they were already known.
- Motivation: A PHOSITA starting from the system of H07-109560 would have been motivated to incorporate voice classification. The background discussion of devices such as that of H07-109560 identifies the problem that "the output voice may be heard by a person who is not the user... and may be an annoyance," and that simply decreasing the volume makes the content hard to understand. Recognizing this problem, a PHOSITA would have looked for ways to adapt the system's output to the context signaled by the user's input. Classifying the input voice (e.g., detecting a whisper) is a logical way to infer that the user wants a more discreet response. This is the "known technique" rationale for combining references: a known technique (voice classification) is used to improve a similar device (the voice control device of H07-109560) in the same way (adapting output to user context).
Adding Conditional Content Omission: Once the input voice is classified (e.g., as a whisper, indicating a desire for discretion), a PHOSITA would have been motivated to adjust the content of the output so that it disturbs others less while still conveying the essential information to the user. Omitting non-critical information from the output sentence is a predictable way to shorten the response and reduce its impact on others, especially when the user has signaled a need for discretion by whispering. This is a straightforward design choice given the stated problem. Under KSR International Co. v. Teleflex Inc., 550 U.S. 398 (2007), when there is a design need to solve a known problem and a finite number of identified, predictable solutions exist, a person of ordinary skill has good reason to pursue the known options.
- Motivation: The motivation to omit information from the output sentence when a "second voice" (e.g., a whisper) is detected stems directly from the problem, described in the background art, that the output voice can annoy others. A user who whispers implicitly signals a desire for a discreet interaction, and merely lowering the volume might hinder the user's comprehension. Shortening the output by omitting less critical information (e.g., polite expressions, redundant details, or low-priority information, as described in US11069337) while retaining the core message would therefore have been an obvious design choice for a PHOSITA seeking to improve the user experience and address the privacy and discretion concern. The result combines reduced output length (less disturbance) with retained comprehensibility (through careful selection of what is omitted).
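
The content-omission step discussed above can be illustrated with a short, hedged sketch. It assumes each reply is already split into segments tagged as essential or not; the Segment type, the essential flag, the build_output_sentence function, and the example reply are all hypothetical and are not taken from US11069337.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Segment:
    text: str
    essential: bool  # hypothetical flag: True if the segment carries the core message


def build_output_sentence(segments, voice_class):
    """Return the full sentence for a first voice, or a shortened one for a second voice."""
    if voice_class == "first":
        kept = [s.text for s in segments]
    elif voice_class == "second":
        # Omit polite expressions and low-priority details; keep the core message.
        kept = [s.text for s in segments if s.essential]
    else:
        raise ValueError(f"unknown voice class: {voice_class!r}")
    return " ".join(kept)


# Hypothetical reply used only for illustration.
REPLY = [
    Segment("Thank you for asking.", essential=False),     # polite expression
    Segment("Your meeting is at 3 pm.", essential=True),   # core message
    Segment("It is in the usual room.", essential=False),  # low-priority detail
]

first_sentence = build_output_sentence(REPLY, "first")
second_sentence = build_output_sentence(REPLY, "second")

if __name__ == "__main__":
    print(first_sentence)
    print(second_sentence)
```

In this sketch the second output sentence is always a subset of the first, which mirrors the idea that information is omitted rather than rephrased; how segments would actually be prioritized is left open.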

Conclusion on Obviousness:

Given the general knowledge in the art regarding voice analysis and speech interfaces, a PHOSITA, seeking to improve a voice control device like that in H07-109560 to be less obtrusive in certain environments, would have found it obvious to:

- Integrate voice classification techniques (e.g., whisper detection) to infer the user's desire for discretion.
- Based on this classification, dynamically adjust the content of the generated voice output by omitting information to reduce the length and potential intrusiveness of the response, while still ensuring the user understands the core message.

This combination of known elements and predictable results, driven by a clear motivation to solve the problem of voice output annoyance mentioned in the background art, suggests that the claims of US11069337 are vulnerable to an obviousness challenge under 35 U.S.C. § 103.