Patent 9031259
Derivative works
Defensive disclosure: derivative variations of each claim designed to render future incremental improvements obvious or non-novel.
Defensive Disclosure for US Patent 9,031,259
Patent Title: Noise reduction apparatus, audio input apparatus, wireless communication apparatus, and noise reduction method
Current Assignee: Soundclear Technologies LLC
Original Assignee: JVCKenwood Corp
Inventor: Takaaki Yamabe
Filing Date: September 14, 2012
Issue Date: May 12, 2015
Adjusted Expiration: November 14, 2033
This Defensive Disclosure document aims to preempt future patenting efforts by competitors on incremental improvements related to US Patent 9,031,259. By technically detailing various derivative implementations and combinations with open-source standards, we assert these concepts as obvious or non-novel to a person having ordinary skill in the art, thereby establishing prior art.
Derivative Variations
The following derivatives expand upon the core concepts of US Patent 9,031,259, which include a speech segment determiner, a voice direction detector, and an adaptive filter operating on signals from at least two microphones.
1. Material & Component Substitution: Multi-Element MEMS Microphone Array with Digital Beamforming
Enabling Description:
This derivative replaces the discrete main microphone (11) and sub-microphone (12) of US9031259 with a spatially distributed array of micro-electro-mechanical systems (MEMS) microphones, specifically a uniform linear array (ULA) or a uniform circular array (UCA) comprising N (where N > 2) MEMS elements. Each MEMS microphone integrates its own analog-to-digital converter (ADC) and outputs either a pulse-density modulation (PDM) bitstream, which is decimated downstream to 24-bit PCM at 48 kHz, or an I²S stream that carries 24-bit PCM at 48 kHz directly. A Field-Programmable Gate Array (FPGA) or a high-performance Digital Signal Processor (DSP) (e.g., Texas Instruments C66x series or Analog Devices SHARC series) is employed for signal aggregation and digital beamforming.
The speech segment determiner (15) and voice direction detector (16) functionalities are integrated within this DSP/FPGA. The voice direction detection is achieved not merely by phase or power difference between two discrete microphones, but by implementing advanced digital beamforming algorithms such as Minimum Variance Distortionless Response (MVDR) or Generalized Sidelobe Canceller (GSC) on the N-element array. This allows for real-time estimation of the Angle of Arrival (AoA) of speech signals with sub-degree precision. The adaptive filter (18) then receives not just two raw microphone signals, but a focused main lobe signal (derived from beamforming towards the detected AoA) and one or more spatially filtered noise reference signals (derived from nulling beamformers or side lobes directed away from the AoA). The adaptive coefficient adjuster (74) within the adaptive filter (18) uses an improved Normalized Least Mean Squares (NLMS) or Recursive Least Squares (RLS) algorithm, which adapts its coefficients based on the cleaner, beamformed noise reference, thereby enhancing noise reduction in complex multi-source noise environments. The system could further incorporate an ultrasonic transducer for detecting gestures (e.g., pointing) that can inform the beamforming directionality, effectively making the microphone 'steerable' by user input in a non-contact manner.
graph TD
M_MEMS1["MEMS Mic 1 (with ADC)"] --> FPGA_DSP
M_MEMSN["MEMS Mic N (with ADC)"] --> FPGA_DSP
FPGA_DSP -- "Digital Beamforming (MVDR/GSC)" --> BF_OUTPUT(Beamformed Signals)
BF_OUTPUT -- "Main Lobe (Voice)" --> Adaptive_Filter
BF_OUTPUT -- "Nulling Beamformers (Noise References)" --> Adaptive_Filter
FPGA_DSP -- AoA Estimation --> VDD(Voice Direction Detector)
FPGA_DSP -- Speech Features --> SSD(Speech Segment Determiner)
SSD -- Speech Segment Info --> VDD
VDD -- Voice Incoming-Direction Info --> AFC(Adaptive Filter Controller)
SSD -- Speech Segment Info --> AFC
AFC -- Control Signal --> Adaptive_Filter
Adaptive_Filter -- Low-Noise Signal --> Output
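As a rough illustration of the adaptive coefficient adjustment described in this derivative, the following Python sketch runs a normalized LMS (NLMS) update on a primary signal and a single noise reference. It is a minimal sketch under stated assumptions, not the patent's implementation: the function name `nlms_cancel`, the tap count, and the step size `mu` are hypothetical choices, and the two input lists merely stand in for the beamformed main-lobe signal and a null-steered noise reference.

```python
import math


def nlms_cancel(primary, reference, taps=8, mu=0.5, eps=1e-8):
    """Adaptive noise cancellation with a normalized LMS (NLMS) FIR filter.

    primary   -- samples of the main-lobe signal (voice plus noise)
    reference -- samples of the noise reference (e.g. a null-steered beam)
    Returns the error signal, i.e. primary minus the estimated noise.
    """
    weights = [0.0] * taps
    history = [0.0] * taps          # recent reference samples, newest first
    output = []
    for d, x in zip(primary, reference):
        history = [x] + history[:-1]
        noise_estimate = sum(w * h for w, h in zip(weights, history))
        error = d - noise_estimate  # cleaned output sample
        power = sum(h * h for h in history) + eps
        step = mu * error / power   # NLMS: step normalized by input power
        weights = [w + step * h for w, h in zip(weights, history)]
        output.append(error)
    return output


if __name__ == "__main__":
    n = 2000
    noise = [math.sin(0.3 * k) + 0.5 * math.sin(1.1 * k) for k in range(n)]
    # Primary channel: the reference noise, scaled and delayed by one sample.
    primary = [0.8 * noise[k - 1] if k > 0 else 0.0 for k in range(n)]
    cleaned = nlms_cancel(primary, noise)
    before = sum(p * p for p in primary[-200:]) / 200
    after = sum(e * e for e in cleaned[-200:]) / 200
    print(f"noise power before: {before:.4f}, after: {after:.6f}")
```

In a real device the coefficient update would typically be frozen or slowed while the speech segment determiner reports active speech, so that the filter does not adapt to the voice itself; that gating is omitted here for brevity.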
2. Operational Parameter Expansion: Ultra-Low Latency Noise Reduction for Safety-Critical High-Noise Environments
Enabling Description:
This derivative of the noise reduction apparatus (1) is optimized for ultra-low latency operation in extremely high-noise, safety-critical industrial environments, such as active factory floors or machinery operation zones. The system employs main and sub-microphones (11, 12) featuring custom-designed piezoelectric transducers with a flat frequency response from 20 Hz to 20 kHz, coupled to ultra-fast 32-bit ADCs sampling at 192 kHz. The entire signal processing chain, including the speech segment determiner (15), voice direction detector (16), and adaptive filter (18), is implemented on a dedicated Application-Specific Integrated Circuit (ASIC) with parallel processing units and hard-wired logic, bypassing general-purpose DSPs to minimize instruction cycles.
The speech segment determiner (15) utilizes a spectral flux and energy-based Voice Activity Detection (VAD) algorithm, coupled with a pre-trained Support Vector Machine (SVM) classifier for human speech, achieving detection within 5 milliseconds. The voice direction detector (16) employs a generalized cross-correlation with phase transform (GCC-PHAT) algorithm running on sub-millisecond windows to detect voice direction, aiming for a total latency of less than 10 milliseconds from sound capture to voice incoming-direction information (25) output. The adaptive filter (18) is an optimized FIR filter (as shown in FIG. 6) with a very short impulse response (e.g., 64 taps or less) and uses a fast-converging adaptive algorithm like fast block LMS or Kalman filtering variants, tuned for low computational overhead per sample. Crucially, the adaptive filter controller (17) is programmed to not only adjust filter coefficients based on speech segment and direction but also to integrate real-time acoustic event detection (e.g., emergency alarms, shouts) within the noise-dominated signal (82). If an emergency event is detected, the system prioritizes the transmission of the raw, unprocessed voice signal (81) or a minimally filtered signal, ensuring critical safety alerts are not attenuated, irrespective of ongoing noise reduction, with an emergency bypass latency of less than 2 milliseconds. This architecture allows for reliable voice communication and critical alert identification in environments with sustained broadband noise exceeding 90 dB SPL.
sequenceDiagram
participant Mic1 as Main Mic (11)
participant Mic2 as Sub Mic (12)
participant ADCs as Ultra-Fast ADCs (13, 14)
participant ASIC as ASIC (SSD, VDD, AF)
participant AF_Control as Adaptive Filter Controller (17)
participant Output as Low-Noise Output (27)
Mic1->>ADCs: Analog Signal (20 Hz-20 kHz)
Mic2->>ADCs: Analog Signal (20 Hz-20 kHz)
ADCs->>ASIC: Digital Signals (192 kHz, 32-bit)
ASIC->>ASIC: Speech Segment Determination (<5ms)
ASIC->>ASIC: Voice Direction Detection (GCC-PHAT) (<10ms total)
ASIC->>AF_Control: Speech Segment Info (24)
ASIC->>AF_Control: Voice Direction Info (25)
AF_Control->>ASIC: Control Signal (26)
ASIC->>ASIC: Adaptive Filtering (FIR, Fast Block LMS/Kalman)
ASIC-->>ASIC: Emergency Event Detection
ASIC->>Output: Low-Noise Signal (27) (<10ms latency)
ASIC->>Output: Emergency Bypass (raw/min. filtered) (<2ms latency if emergency)
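The GCC-PHAT step named in this derivative can be sketched in a few lines of Python. This is an illustrative sketch only, assuming short frames and a naive O(N²) DFT written with the standard library; an ASIC or DSP would use a hardware FFT instead. The function names `dft`, `idft`, and `gcc_phat_delay` are hypothetical and do not come from the patent.

```python
import cmath
import math


def dft(x):
    """Naive O(N^2) discrete Fourier transform (adequate for short frames)."""
    n = len(x)
    return [sum(x[t] * cmath.exp(-2j * math.pi * k * t / n) for t in range(n))
            for k in range(n)]


def idft(spectrum):
    """Naive inverse DFT matching dft() above."""
    n = len(spectrum)
    return [sum(spectrum[k] * cmath.exp(2j * math.pi * k * t / n)
                for k in range(n)) / n
            for t in range(n)]


def gcc_phat_delay(sig_a, sig_b, max_lag):
    """Estimate by how many samples sig_b lags sig_a (negative: sig_b leads)."""
    n = len(sig_a) + len(sig_b)     # zero-pad to avoid circular wrap-around
    a = list(sig_a) + [0.0] * (n - len(sig_a))
    b = list(sig_b) + [0.0] * (n - len(sig_b))
    spec_a, spec_b = dft(a), dft(b)
    cross = [sb * sa.conjugate() for sa, sb in zip(spec_a, spec_b)]
    # PHAT weighting: discard magnitude, keep only the cross-spectrum phase.
    whitened = [c / (abs(c) + 1e-12) for c in cross]
    corr = [v.real for v in idft(whitened)]
    lags = range(-max_lag, max_lag + 1)
    return max(lags, key=lambda lag: corr[lag % n])


if __name__ == "__main__":
    base = [math.sin(0.37 * k * k) for k in range(61)]  # broadband test signal
    main_mic = base + [0.0] * 3
    sub_mic = [0.0] * 3 + base                          # arrives 3 samples later
    print("estimated lag (samples):", gcc_phat_delay(main_mic, sub_mic, 8))
```

At the 192 kHz sampling rate mentioned above, one sample of lag corresponds to roughly 5.2 microseconds, so `max_lag` would be set from the microphone spacing divided by the speed of sound.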
3. Cross-Domain Application: Noise Reduction for Submersible Acoustic Monitoring Systems
Enabling Description:
This derivative applies the noise reduction principles of US9031259 to an underwater acoustic monitoring system deployed on an autonomous underwater vehicle (AUV) or a fixed seabed observatory. The "first microphone" (11) and "second microphone" (12) are replaced with a pair of spatially separated hydrophones (e.g., RESON TC4032), tuned for underwater sound capture (typically 1 Hz to 100 kHz). The hydrophones are housed in pressure-resistant, acoustically transparent casings and mounted with a precise baseline separation of 0.5 to 2 meters for optimal phase difference detection. The A/D converters (13, 14) are specialized for high-resolution (32-bit) underwater acoustic data, sampling at 250 kHz to capture a broad range of marine sounds, including marine mammal vocalizations and anthropogenic noise.
The "speech segment determiner" (15) is reconfigured as an "acoustic event determiner," trained to identify specific underwater acoustic events, such as cetacean clicks and whistles, ship engine noise, or active sonar pings, based on their unique spectral and temporal characteristics. This uses a combination of Mel-frequency cepstral coefficients (MFCCs) for feature extraction and a Gaussian Mixture Model (GMM) or a Convolutional Neural Network (CNN) for classification. When a target acoustic event (e.g., specific marine mammal vocalization) is detected, the "voice direction detector" (16) becomes an "acoustic event direction detector," employing time difference of arrival (TDOA) or beamforming techniques (e.g., steered response power-phase transform, SRP-PHAT) on the hydrophone signals to determine the bearing of the acoustic source. A single hydrophone pair does not resolve range; range estimation additionally requires more than two hydrophones or successive bearings taken from a moving platform. The "adaptive filter" (18) performs noise reduction to enhance the target acoustic event against background ocean noise (e.g., wave action, current noise, biological noise from other species). The adaptive filter controller (17) adjusts the filter coefficients to dynamically suppress identified noise sources while preserving or enhancing the target acoustic event signal, enabling clearer detection, tracking, and analysis of marine activity.
classDiagram
class UnderwaterSystem {
+Hydrophone_1: object
+Hydrophone_2: object
+SpecializedADCs: object
+AcousticEventDeterminer: object
+EventDirectionDetector: object
+AdaptiveFilter: object
+FilterController: object
+OutputSignal: object
}
class Hydrophone_1 {
+captureSound(signal)
}
class Hydrophone_2 {
+captureSound(signal)
}
class SpecializedADCs {
+convertAnalogToDigital(analog_signal): digital_signal
}
class AcousticEventDeterminer {
-MFCC_FeatureExtractor: object
-CNN_Classifier: object
+determineEvent(digital_signal): event_info
}
class EventDirectionDetector {
-TDOA_Algorithm: object
-SRP_PHAT_Algorithm: object
+detectDirection(digital_signal_1, digital_signal_2): direction_info
}
class AdaptiveFilter {
+processSignals(signal_1, signal_2, control_info): low_noise_signal
}
class FilterController {
+generateControl(event_info, direction_info): control_signal
}
Hydrophone_1 "1" -- "1" UnderwaterSystem
Hydrophone_2 "1" -- "1" UnderwaterSystem
SpecializedADCs "1" -- "1" UnderwaterSystem
AcousticEventDeterminer "1" -- "1" UnderwaterSystem
EventDirectionDetector "1" -- "1" UnderwaterSystem
AdaptiveFilter "1" -- "1" UnderwaterSystem
FilterController "1" -- "1" UnderwaterSystem
Hydrophone_1 --> SpecializedADCs
Hydrophone_2 --> SpecializedADCs
SpecializedADCs --> AcousticEventDeterminer
SpecializedADCs --> EventDirectionDetector
AcousticEventDeterminer --> EventDirectionDetector : event_info
AcousticEventDeterminer --> FilterController : event_info
EventDirectionDetector --> FilterController : direction_info
FilterController --> AdaptiveFilter : control_signal
SpecializedADCs --> AdaptiveFilter : digital_signals
AdaptiveFilter --> UnderwaterSystem : output_signal
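The TDOA-to-bearing conversion implied by this derivative can be written out as a short Python sketch. It assumes a far-field (plane-wave) source and a nominal sound speed of 1500 m/s in seawater; the real value varies with temperature, salinity, and depth. The function name and constant are hypothetical and used only for illustration.

```python
import math

# Nominal value only; actual sound speed depends on temperature, salinity, depth.
SOUND_SPEED_SEAWATER_M_S = 1500.0


def bearing_from_tdoa(delay_s, baseline_m,
                      sound_speed_m_s=SOUND_SPEED_SEAWATER_M_S):
    """Far-field bearing in degrees from broadside for a two-hydrophone pair.

    delay_s    -- time difference of arrival between the hydrophones (seconds)
    baseline_m -- hydrophone separation (meters)
    """
    ratio = sound_speed_m_s * delay_s / baseline_m
    ratio = max(-1.0, min(1.0, ratio))   # clamp against measurement noise
    return math.degrees(math.asin(ratio))


if __name__ == "__main__":
    # A delay of half the maximum travel time across a 1 m baseline.
    print(bearing_from_tdoa(delay_s=0.5 / 1500.0, baseline_m=1.0))
```

With a 1 m baseline and the 250 kHz sampling rate given above, the largest possible inter-hydrophone delay is about 0.67 ms, or roughly 167 samples, which bounds the lag search. A single pair yields the bearing only up to a cone of ambiguity around the baseline axis.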
4. Integration with Emerging Tech: AI-Driven Contextual Noise Reduction with IoT Sensor Fusion
Enabling Description:
This derivative integrates US9031259's noise reduction principles with an AI-driven system that leverages IoT sensor fusion for dynamic, contextual noise reduction. The apparatus (1) includes the main microphone (11) and sub-microphone (12) but augments them with a suite of IoT environmental sensors, including an accelerometer (for device movement/vibration), a barometer (for atmospheric pressure changes indicating weather or indoor/outdoor shifts), a thermistor (temperature), a hygrometer (humidity), and an ambient light sensor. All sensor data, alongside audio signals, are fed into a central processing unit (CPU) equipped with a Neural Processing Unit (NPU) for machine learning inference.
The speech segment determiner (15) is implemented as a Deep Neural Network (DNN) that not only analyzes audio features (like MFCCs and spectral contrast) but also incorporates contextual features from the IoT sensors (e.g., high accelerometer readings might suggest device movement, affecting noise profiles). This DNN is continuously updated via federated learning from a distributed network of similar devices, allowing it to adapt to diverse acoustic and environmental contexts for more accurate speech detection. The voice direction detector (16) operates as described in US9031259, but its output is also fed into a secondary AI module. This module, based on a Reinforcement Learning (RL) agent, dynamically optimizes the adaptive filter's (18) coefficients and algorithms (e.g., switching between NLMS, RLS, or a deep learning-based filter) based on the detected speech segment, voice incoming direction, and the full suite of environmental context data. For instance, if the barometer registers rapid pressure fluctuations consistent with wind and the accelerometer detects rapid movement, the RL agent might prioritize a wind noise cancellation algorithm. Conversely, in a quiet, stable indoor environment, it might select an algorithm optimized for subtle reverberation reduction. The system further employs edge computing for immediate processing and offloads only complex model retraining or aggregated, anonymized data to a cloud-based server.
graph LR
Mic1["Main Mic (11)"] --> CPU_NPU
Mic2["Sub Mic (12)"] --> CPU_NPU
IoT_Sensors[IoT Environmental Sensors] --> CPU_NPU
CPU_NPU -- "Audio + Sensor Data" --> DNN_SSD(Deep Neural Network Speech Segment Determiner)
CPU_NPU -- Audio Signals --> VDD(Voice Direction Detector)
DNN_SSD -- "Speech Segment Info (24)" --> RL_Agent(Reinforcement Learning Agent)
VDD -- "Voice Incoming-Direction Info (25)" --> RL_Agent
IoT_Sensors -- Contextual Data --> RL_Agent
RL_Agent -- "Adaptive Filter Control (26)" --> Adaptive_Filter(Adaptive Filter 18)
CPU_NPU -- Audio Signals --> Adaptive_Filter
Adaptive_Filter -- "Low-Noise Output (27)" --> Output
RL_Agent -- "Anonymized Data (for retraining)" --> Cloud_Federated_Learning
Cloud_Federated_Learning -- Model Updates --> CPU_NPU
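One simple way to picture the algorithm-selection behavior of the RL agent is a tabular epsilon-greedy contextual bandit, sketched below in Python. This is a deliberately simplified stand-in for a full reinforcement learning agent, not the method of the patent or of this derivative: the class name `FilterSelector`, the context labels, the sensor thresholds, and the reward values are all hypothetical.

```python
import random


class FilterSelector:
    """Tabular epsilon-greedy agent choosing a filter algorithm per context."""

    def __init__(self, actions, epsilon=0.1, seed=0):
        self.actions = list(actions)
        self.epsilon = epsilon
        self.rng = random.Random(seed)
        self.value = {}   # (context, action) -> running mean reward
        self.count = {}   # (context, action) -> number of updates

    def select(self, context):
        if self.rng.random() < self.epsilon:
            return self.rng.choice(self.actions)    # explore
        # Exploit: highest estimated reward; ties resolve to the first action.
        return max(self.actions,
                   key=lambda a: self.value.get((context, a), 0.0))

    def update(self, context, action, reward):
        key = (context, action)
        self.count[key] = self.count.get(key, 0) + 1
        old = self.value.get(key, 0.0)
        self.value[key] = old + (reward - old) / self.count[key]


def context_from_sensors(accel_rms_g, pressure_var_hpa):
    """Quantize sensor readings into a coarse context (illustrative thresholds)."""
    motion = "moving" if accel_rms_g > 0.2 else "still"
    wind = "gusty" if pressure_var_hpa > 0.5 else "calm"
    return (motion, wind)


if __name__ == "__main__":
    agent = FilterSelector(["nlms", "rls", "wind_suppressor"], epsilon=0.0)
    ctx = context_from_sensors(accel_rms_g=0.5, pressure_var_hpa=1.0)
    agent.update(ctx, "wind_suppressor", reward=1.0)  # e.g. an SNR-gain proxy
    agent.update(ctx, "nlms", reward=0.2)
    print(ctx, "->", agent.select(ctx))
```

The reward would in practice be some measurable proxy for output quality, such as estimated SNR improvement during detected speech segments; how that proxy is defined is left open here.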
5. The "Inverse" or Failure Mode: Privacy-Preserving Low-Power Obfuscation Mode
Enabling Description:
This derivative implements a "privacy-preserving low-power obfuscation mode" for the noise reduction apparatus (1) of US9031259. In this mode, the primary goal is to indicate the presence of speech and its direction while intentionally rendering the speech content non-intelligible, all while minimizing power consumption. The main microphone (11) and sub-microphone (12) are utilized, but the A/D converters (13, 14) operate at a reduced sampling rate (e.g., 4 kHz) and lower bit depth (e.g., 8-bit) to conserve energy.
When the speech segment determiner (15) detects speech (which may use a simplified, lower-computation VAD algorithm in this mode), the voice direction detector (16) still attempts to determine the incoming direction. However, the adaptive filter (18) is controlled by the adaptive filter controller (17) to perform a modified "noise reduction" process. Instead of improving clarity, it applies a real-time speech obfuscation algorithm. This algorithm involves dynamic pitch shifting (e.g., randomizing pitch by +/- 2 semitones every 100 ms), formant blurring (e.g., applying a variable-Q comb filter to smear formants), and a time-domain scrambling technique (e.g., segmenting the speech into 50 ms blocks and randomly reordering them within a short 200 ms buffer). This renders the speech incomprehensible to a human listener while preserving enough information (e.g., prosody, rhythm, presence of voiced/unvoiced segments, and source directionality) for an external system to infer conversational activity or source location without exposing content. If the voice incoming-direction information (25) indicates speech from an "unauthorized" direction (e.g., not facing the device), the obfuscation level may be further increased. The system operates on a low-power microcontroller with specialized audio codecs to execute these functions with minimal energy draw, suitable for always-on, privacy-sensitive applications.
stateDiagram
[*] --> Standby: Power On
Standby --> Low_Power_Listen: User/System Command
Low_Power_Listen --> Speech_Detected: VAD detects speech (low-power)
Speech_Detected --> Obfuscation_Mode: Speech Segment Info + Direction Info
Obfuscation_Mode --> Low_Power_Listen: Speech Ends or Timer Expired
Obfuscation_Mode --> Enhanced_Obfuscation: Unauthorized Direction/High Sensitivity
Enhanced_Obfuscation --> Obfuscation_Mode: Direction Clears
Low_Power_Listen --> Standby: User/System Command
state Speech_Detected {
Speech_Detected: Simplified VAD
Speech_Detected: Voice Direction Detection
}
state Obfuscation_Mode {
Obfuscation_Mode: Reduced Sample Rate/Bit Depth
Obfuscation_Mode: Dynamic Pitch Shifting
Obfuscation_Mode: Formant Blurring
Obfuscation_Mode: Time-Domain Scrambling
}
state Enhanced_Obfuscation {
Enhanced_Obfuscation: Increased Randomization
Enhanced_Obfuscation: Additional Noise Insertion
}
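The time-domain scrambling step of this mode can be sketched in Python as a block shuffle within fixed windows. This is a minimal sketch under stated assumptions: the function name `scramble_blocks` and its parameters are hypothetical, and at the 4 kHz sampling rate mentioned above a 50 ms block is 200 samples and a 200 ms buffer holds 4 blocks. Pitch shifting and formant blurring are not shown.

```python
import random


def scramble_blocks(samples, block_len, blocks_per_window, seed=None):
    """Randomly reorder fixed-length blocks within consecutive windows.

    samples           -- list of audio samples
    block_len         -- samples per block (e.g. 200 for 50 ms at 4 kHz)
    blocks_per_window -- blocks per shuffle window (e.g. 4 for 200 ms)
    """
    rng = random.Random(seed)
    window_len = block_len * blocks_per_window
    out = []
    for start in range(0, len(samples), window_len):
        window = samples[start:start + window_len]
        blocks = [window[i:i + block_len]
                  for i in range(0, len(window), block_len)]
        rng.shuffle(blocks)             # reorder blocks inside this window only
        for block in blocks:
            out.extend(block)
    return out


if __name__ == "__main__":
    audio = list(range(1000))           # stand-in for 250 ms of 4 kHz audio
    scrambled = scramble_blocks(audio, block_len=200, blocks_per_window=4, seed=1)
    print(len(scrambled), scrambled[:3])
```

Because blocks never leave their window, the added delay is bounded by one window length. Block shuffling by itself is a weak obfuscation, which is presumably why the description combines it with pitch and formant manipulation.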
Combination Prior Art Scenarios
Here are three scenarios combining the core inventive concepts of US Patent 9,031,259 with existing open-source standards, demonstrating obviousness for certain applications:
US9031259 + Opus Audio Codec for Real-time Communication:
- Description: The noise reduction apparatus (1) processes audio input from its microphones (11, 12) through the speech segment determiner (15), voice direction detector (16), and adaptive filter (18) to produce a low-noise output signal (27). This low-noise signal is then directly fed into an encoder implementing the Opus interactive audio codec (RFC 6716). Opus is an open, royalty-free audio codec standardized by the IETF and widely used for real-time applications like VoIP, videoconferencing, and in-game communication, known for its low latency and good quality across a wide range of bitrates. Combining the noise reduction capabilities of US9031259 (especially the contextual adaptation based on speech segment and direction) with Opus's efficient compression and error concealment would be an obvious choice for any modern real-time communication system, providing clear voice transmission even in noisy environments with minimal bandwidth overhead. The speech segment information (24) could also inform the Opus encoder's voice activity detection and discontinuous transmission (DTX) decisions, and the voice incoming-direction information (25) could be used for spatial audio rendering at the receiving end, enhancing immersion and intelligibility.
US9031259 + MQTT Protocol for IoT Acoustic Sensing:
- Description: An audio input apparatus (500) incorporating the noise reduction apparatus (1) (as shown in FIG. 9 or FIG. 10) is deployed as an IoT acoustic sensor in environments such as smart homes, industrial monitoring, or public spaces. After the adaptive filter (18) produces the low-noise output signal (27), instead of direct transmission, a metadata extraction module processes this signal. This module extracts key features like the presence of speech, the detected voice incoming direction, and the estimated signal-to-noise ratio (SNR) of the speech segment. These metadata, potentially along with highly compressed (e.g., using a low-bitrate Opus stream) or anonymized audio snippets, are then formatted into JSON payloads and published via the Message Queuing Telemetry Transport (MQTT) protocol (an OASIS standard, also published as ISO/IEC 20922:2016) to a central MQTT broker. MQTT is an open, lightweight publish/subscribe messaging protocol well suited to IoT devices with limited resources and unreliable networks. The combination allows for efficient, event-driven transmission of critical acoustic information (e.g., "speech detected from North, moderate SNR," or "emergency shout detected from East") without continuously streaming high-bandwidth audio, enabling scalable and context-aware acoustic monitoring systems.
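A metadata payload of the kind described in this scenario could be assembled as in the Python sketch below. The topic layout `acoustic/<device_id>/events` and the JSON field names are assumptions made for illustration; neither the patent nor the MQTT specification prescribes them. Only the payload construction is shown; publishing would be handled by whichever MQTT client library the device uses.

```python
import json


def build_acoustic_event(device_id, speech_detected, direction_deg, snr_db,
                         timestamp_s):
    """Return (topic, payload) for one acoustic metadata event.

    The topic scheme and field names are hypothetical examples.
    """
    topic = f"acoustic/{device_id}/events"
    payload = json.dumps(
        {
            "device_id": device_id,
            "timestamp_s": timestamp_s,
            "speech_detected": bool(speech_detected),
            "direction_deg": round(float(direction_deg), 1),
            "snr_db": round(float(snr_db), 1),
        },
        sort_keys=True,
        separators=(",", ":"),   # compact encoding keeps the message small
    )
    return topic, payload


if __name__ == "__main__":
    topic, payload = build_acoustic_event(
        "sensor-01", True, direction_deg=12.34, snr_db=9.87,
        timestamp_s=1700000000)
    print(topic)
    print(payload)
```

Keeping the message to a few dozen bytes of metadata is what makes the event-driven design cheaper than streaming audio continuously.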
US9031259 + WebRTC for Enhanced Browser-Based Communication:
- Description: A wireless communication apparatus (600) (as shown in FIG. 10) or an audio input apparatus (500) integrates the noise reduction apparatus (1) and serves as an audio capture front-end for a browser-based WebRTC (Web Real-Time Communication) application. WebRTC is an open-source project that enables real-time communication (voice, video, data) directly within web browsers and mobile applications via simple APIs. The low-noise output signal (27) from the adaptive filter (18) of US9031259 is exposed to the browser as an audio input device, so that the media stream obtained through the getUserMedia() API already carries the noise-reduced audio. This means that any browser-based communication using WebRTC would inherently benefit from the advanced noise reduction and voice-direction-aware processing of the apparatus. Furthermore, the speech segment information (24) and voice incoming-direction information (25) could be exposed via custom WebRTC data channels or metadata tracks, allowing the remote WebRTC client or server to implement further enhancements like spatial audio rendering, intelligent speaker detection, or adaptive user interface elements, making the in-browser communication significantly clearer and more context-aware, particularly in noisy user environments.
Generated 5/16/2026, 6:49:30 AM