Obviousness — US Patent 9031259

To analyze the obviousness of US Patent 9,031,259 under 35 U.S.C. § 103, we will consider the knowledge available to a person having ordinary skill in the art (POSITA) prior to the patent's priority date of September 15, 2011. The patent itself provides a basis for identifying relevant prior art by describing "known" techniques and the problems they sought to overcome.

Claims for Analysis

We will focus on Independent Claim 1 (Noise Reduction Apparatus) as representative. The Audio Input Apparatus (Claim 10) and Noise Reduction Method (Claim 19) claims recite the same core features, either placed in a specific apparatus context or expressed as method steps, so the analysis of Claim 1 applies to them as well.

Independent Claim 1:
A noise reduction apparatus comprising:

a speech segment determiner configured to determine whether or not a sound picked up by at least either a first microphone or a second microphone is a speech segment and to output speech segment information when it is determined that the sound picked up by the first or the second microphone is the speech segment;
a voice direction detector configured, when receiving the speech segment information, to detect a voice incoming direction indicating from which direction a voice sound travels, based on a first sound pick-up signal obtained based on a sound picked up by the first microphone and a second sound pick-up signal obtained based on a sound picked up by the second microphone and to output voice incoming-direction information when the voice incoming direction is detected; and
an adaptive filter configured to perform a noise reduction process using the first and second sound pick-up signals based on the speech segment information and the voice incoming-direction information.

Identified Prior Art and Known Techniques

The patent's specification acknowledges several conventional techniques and problems, which serve as a basis for understanding what a POSITA would have known:

Conventional Two-Microphone Adaptive Noise Reduction: The patent states that "a noise cancelling function (a noise reduction apparatus) is known for reducing noise components carried by a voice signal so that a voice sound can be clearly listened." It further describes that "a noise signal obtained based on a sound picked up by a sub-microphone for use in picking up mainly noise sounds is subtracted from a voice signal obtained based on a sound picked up by a main microphone for use in picking up mainly voice sounds, thereby reducing noise components carried by the voice signal." This describes the well-established two-microphone noise cancellation arrangement, which is conventionally implemented with an adaptive filter.
- Problem with known systems: The patent notes that "the known noise cancelling function does not work well in an environment of high noise level" and "does not satisfy a demand for high quality of a voice sound."
Speech Segment Determination (Speech Activity Detection - SAD): The patent refers to "speech segment determination techniques" for accurately detecting human voices, even at high noise levels. It specifically mentions "a speech segment determination technique I described in U.S. patent application Ser. No. 13/302,040 or a speech segment determination technique II described in U.S. patent application Ser. No. 13/364,016 can be used." The cited applications themselves may not be statutory prior art against US 9,031,259, given their priority and filing dates relative to its priority date. Even so, the specification presents these techniques as interchangeable, ready-to-use building blocks, which supports the position that robust speech segment determination was known or obvious to a POSITA at the time of the invention.
Voice Direction Detection (Sound Source Localization - SSL): The patent states, "There are several techniques for voice direction detection. One technique is to detect a voice incoming direction based on a phase difference between the sound pick-up signals 21 and 22. Another technique is to detect a voice incoming direction based on the difference or ratio between the magnitudes of a sound... (power information)." This confirms that methods for detecting a voice incoming direction with two microphones, based either on phase differences (time difference of arrival) or on amplitude/power differences, were known in the art before the patent's priority date.
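
To make the two direction detection techniques concrete, the following is a minimal Python sketch, not code from the patent. It estimates the arrival-time difference with a brute-force cross-correlation (a time-domain stand-in for the phase-difference technique) and compares frame power in dB (the power-difference technique). The function names, the 8-sample lag range, the 3 dB threshold, the labels "first", "second", and "center", and the rule of using power only to break a zero-lag tie are all illustrative assumptions.

```python
import math

def power_db(frame):
    """Mean power of a frame in dB (a small floor avoids log of zero)."""
    mean_square = sum(s * s for s in frame) / len(frame)
    return 10.0 * math.log10(mean_square + 1e-12)

def best_lag(first, second, max_lag):
    """Lag (in samples) that maximizes the cross-correlation of two frames.

    A positive result means `second` is a delayed copy of `first`,
    i.e. the sound reached the first microphone earlier.
    """
    def corr(lag):
        return sum(first[n] * second[n + lag]
                   for n in range(len(first))
                   if 0 <= n + lag < len(second))
    return max(range(-max_lag, max_lag + 1), key=corr)

def detect_direction(first, second, max_lag=8, min_level_diff_db=3.0):
    """Classify the voice direction as 'first', 'second', or 'center'.

    Assumption: the arrival-time difference decides when it is nonzero;
    the power difference is used only when both frames are time-aligned.
    """
    lag = best_lag(first, second, max_lag)
    level_diff = power_db(first) - power_db(second)
    if lag > 0 or (lag == 0 and level_diff >= min_level_diff_db):
        return "first"
    if lag < 0 or (lag == 0 and level_diff <= -min_level_diff_db):
        return "second"
    return "center"
```

In this sketch, a frame pair in which the second signal is a delayed copy of the first is classified as "first", while two identical frames fall into the "center" category that matters for the disabling logic discussed later in this analysis.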

Obviousness Analysis under 35 U.S.C. § 103

A POSITA in the field of audio signal processing and noise reduction, motivated to overcome the acknowledged shortcomings of conventional two-microphone adaptive noise reduction in high-noise environments, would have found the combination of elements in US9031259 obvious.

Combination of Known Techniques:

A POSITA would begin with Prior Art 1: A conventional two-microphone adaptive noise reduction system. This system provides the basic framework of using a main microphone for speech+noise and a sub-microphone for noise reference, feeding into an adaptive filter.

To address the problem of poor performance in high-noise environments and the potential for speech cancellation (a known issue with uncontrolled adaptive filters), a POSITA would be motivated to integrate Prior Art 2: Robust speech segment determination. It was a well-known practice in adaptive noise reduction to use speech activity detection (SAD) to control the adaptive filter's behavior, specifically by freezing or slowing coefficient adaptation during speech segments and allowing adaptation during noise-only segments. This prevents the adaptive filter from mistakenly learning and canceling the desired speech signal.
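
The gating described above can be sketched as follows. This is a minimal illustration using a normalized LMS (NLMS) update, chosen here only because it is a common textbook adaptive filter; neither the patent excerpts quoted in this analysis nor the analysis itself specifies the adaptation algorithm, and the function name, tap count, and step size are assumptions.

```python
def gated_nlms(main, reference, speech_flags, taps=4, step=0.5, eps=1e-8):
    """Two-microphone noise canceller whose adaptation is frozen during speech.

    main         -- samples from the main microphone (voice plus noise)
    reference    -- samples from the sub-microphone (mainly noise)
    speech_flags -- per-sample booleans from a speech segment determiner
    Returns the noise-reduced output samples.
    """
    weights = [0.0] * taps
    history = [0.0] * taps          # recent reference samples, newest first
    output = []
    for d, x, is_speech in zip(main, reference, speech_flags):
        history = [x] + history[:-1]
        noise_estimate = sum(w * h for w, h in zip(weights, history))
        e = d - noise_estimate      # error signal doubles as the cleaned output
        output.append(e)
        if not is_speech:           # adapt only on noise-only samples
            norm = sum(h * h for h in history) + eps
            weights = [w + step * e * h / norm
                       for w, h in zip(weights, history)]
    return output
```

Because the weights change only while `is_speech` is false, the filter learns the noise path during pauses and cannot adapt toward cancelling the voice while it is present, which is the behavior the SAD-controlled prior art is relied on for here.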

To further enhance the adaptive noise reduction, especially in dynamic environments where the relative positions of the voice source and microphones might change, or where the designated "main" and "sub" microphone roles might become inappropriate, a POSITA would be motivated to incorporate Prior Art 3: Voice direction detection using two microphones. Understanding the spatial relationship of the voice source to the microphones provides valuable information for optimizing noise reduction.

Motivation for Combining and Specific Control Logic:

Speech Segment Determiner and Triggering Voice Direction Detection:
- Motivation: It would be obvious to a POSITA that performing computationally intensive voice direction detection only when speech is actually present (i.e., triggered by speech segment information) would conserve processing resources and improve the accuracy of direction detection by avoiding attempts to localize non-speech sounds. This is a logical and efficient design choice.
Adaptive Filter Controlled by Both Speech Segment Information and Voice Incoming-Direction Information:
- Using Speech Segment Information for Adaptive Filter Control: As noted, controlling adaptive filter coefficient updates (e.g., holding updates during speech, adapting during noise) using speech segment information was a conventional and obvious technique to prevent speech cancellation and improve noise reduction performance in noisy conditions.
- Using Voice Incoming-Direction Information for Adaptive Filter Control:
  - Switching Microphone Roles: A POSITA would readily recognize that if the detected voice incoming direction indicates that the primary voice source is closer to, or predominantly picked up by, the microphone conventionally designated as the "sub-microphone" (which typically provides the noise reference), then the roles of the main and sub microphones should be dynamically switched. This ensures that the adaptive filter receives the optimal voice-plus-noise signal and noise-reference signal for effective noise reduction. The patent itself describes this scenario, for example when the phase or power difference indicates that the voice component in the sub-microphone signal leads in phase or is stronger.
  - Disabling/Limiting Noise Reduction: When the voice source is detected to be roughly equidistant from both microphones (e.g., in a central area between them), or when the voice arrives from an otherwise "inappropriate direction," the noise reference signal from the sub-microphone would inevitably contain a significant, coherent portion of the desired speech signal. In such a scenario, applying adaptive noise cancellation would cancel part of the speech itself. Therefore, it would be an obvious design choice for a POSITA to disable or significantly limit the noise reduction process under these conditions to preserve speech quality, as acknowledged by the patent.
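
The control logic discussed in this section can be summarized in a short Python sketch. It is an illustration of the reasoning above under stated assumptions, not the patent's actual implementation: the names `FilterControl`, `control_adaptive_filter`, and `process_frame`, the direction labels "first", "second", and "center", and the choice to freeze (rather than slow) updates during speech are all hypothetical.

```python
from dataclasses import dataclass

@dataclass
class FilterControl:
    run_noise_reduction: bool   # False: pass the voice signal through unchanged
    swap_roles: bool            # True: treat the second microphone as the main one
    adapt_coefficients: bool    # True: coefficient updates are allowed

def control_adaptive_filter(is_speech, direction, previous_swap=False):
    """Map speech segment and direction information to adaptive filter settings.

    direction is None when no direction was detected, otherwise one of
    'first', 'second', or 'center' (assumed labels).
    """
    if not is_speech or direction is None:
        # Noise-only frame: keep the previous roles and let the filter adapt.
        return FilterControl(True, previous_swap, True)
    if direction == "center":
        # Both microphones pick up the voice about equally, so the reference
        # would contain speech; skip cancellation to avoid removing the voice.
        return FilterControl(False, False, False)
    # Speech from a usable direction: freeze updates, swap roles if needed.
    return FilterControl(True, direction == "second", False)

def process_frame(is_speech, detect_direction, previous_swap=False):
    """Run the direction detector only when a speech segment is reported."""
    direction = detect_direction() if is_speech else None
    return control_adaptive_filter(is_speech, direction, previous_swap)
```

Passing the detector as a callable makes the triggering relationship explicit: `detect_direction` is never invoked for a frame that the speech segment determiner classifies as noise.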

Conclusion

Given the admitted state of the art, a POSITA would be motivated to combine the known elements of two-microphone adaptive noise reduction, robust speech segment determination, and two-microphone voice direction detection. The specific control logic has two parts: speech segment determination triggers voice direction detection, and both the speech segment information and the voice incoming-direction information control the adaptive filter (e.g., for coefficient updates, dynamic microphone role switching, or disabling noise reduction). Each part represents an obvious engineering solution to the acknowledged problems of noise and speech cancellation in high-noise environments. Therefore, the independent claims of US Patent 9,031,259 would likely have been obvious under 35 U.S.C. § 103 as of its priority date.