Patent 10783899
Derivative works
Defensive disclosure: derivative variations of each claim designed to render future incremental improvements obvious or non-novel.
DEFENSIVE DISCLOSURE
Title: Derivative Methods and Systems for Probabilistic, Dynamically-Controlled Signal Suppression
Reference Patent: US 10783899 B2 ("Babble noise suppression")
Publication Date: May 8, 2026
This document discloses a series of derivative works, applications, and enhancements based on the core principles described in US patent 10783899. The purpose of this disclosure is to place these variations in the public domain, thereby establishing prior art against future patent claims for similar or incremental improvements.
Axis 1: Material & Component Substitution
Derivative 1.1: Neural Network-Based Soft Speech Detector
Enabling Description: The soft speech detector, which calculates a speech likelihood score, is implemented using a lightweight, time-delay neural network (TDNN) or a 1D convolutional neural network (CNN) instead of the kurtosis/cepstral feature combination. The network is trained on a large corpus of audio with and without speech in various babble noise conditions. The input to the network is a short-time Fourier transform (STFT) frame or a mel-spectrogram, and its output is a single scalar value between 0 and 1 representing the probability of foreground speech presence. This allows the detector to learn more complex and robust features of speech versus babble noise, improving accuracy over the heuristic statistical methods. The trained model weights are stored in non-volatile memory on the device.
Mermaid Diagram:
flowchart TD
    A[Audio Input Frame] --> B{1D-CNN/TDNN Model}
    B -->|Inference| C["Speech Probability Score (0-1)"]
    C --> D{Noise Suppressor}
    D --> E[Processed Audio Frame]
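As an illustration of the detector head, the following pure-Python sketch runs a tiny 1D convolution over a log-mel frame and squashes the pooled activation into a probability. All weights and the example frame are hypothetical placeholders, not trained parameters; a deployed detector would load its weights from non-volatile memory as described above.

```python
import math

def conv1d(x, kernel, bias):
    """Valid-mode 1D convolution over a feature vector."""
    k = len(kernel)
    return [sum(kernel[j] * x[i + j] for j in range(k)) + bias
            for i in range(len(x) - k + 1)]

def relu(v):
    return [max(0.0, u) for u in v]

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def speech_probability(log_mel_frame, kernel, bias, w_out, b_out):
    """Tiny 1D-CNN head: conv -> ReLU -> global average pool -> sigmoid.
    Returns a scalar in (0, 1), the foreground-speech probability."""
    h = relu(conv1d(log_mel_frame, kernel, bias))
    pooled = sum(h) / len(h)
    return sigmoid(w_out * pooled + b_out)

# Example with made-up weights: the output is always a value in (0, 1).
frame = [0.2, 1.5, 3.1, 2.8, 0.9, 0.1, 0.0, 0.4]
p = speech_probability(frame, kernel=[0.5, -0.25, 0.5], bias=0.1,
                       w_out=1.2, b_out=-0.8)
print(0.0 < p < 1.0)  # True
```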
Derivative 1.2: Log-MMSE Noise Suppressor
Enabling Description: The Wiener filter-based noise suppressor is replaced with a Logarithmic Minimum Mean-Square Error (Log-MMSE) spectral amplitude estimator. The dynamic noise overestimation factor β_oe(l) from the soft speech detector is used to directly modulate the a priori signal-to-noise ratio (SNR) estimate within the Log-MMSE algorithm. When speech likelihood is low, β_oe(l) is high, which artificially inflates the estimated noise power, causing the Log-MMSE estimator to apply more aggressive suppression. This substitution produces fewer "musical noise" artifacts, a common problem with simple Wiener filters, resulting in a more natural-sounding residual background.
Mermaid Diagram:
sequenceDiagram
    participant SSD as Soft Speech Detector
    participant LMMSE as Log-MMSE Estimator
    participant Audio as Audio Stream
    Audio->>SSD: Input Frame
    SSD->>LMMSE: Speech Likelihood
    LMMSE->>LMMSE: Calculate a priori SNR
    LMMSE->>LMMSE: Modulate SNR with Likelihood^-1
    LMMSE->>Audio: Apply Log-MMSE Gain
    Audio-->>Audio: Output Denoised Frame
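A minimal pure-Python sketch of this modulation, assuming a maximum-likelihood a priori SNR estimate in place of a decision-directed estimator and a series approximation of the exponential integral; function names and clamp values are illustrative choices, not details from the reference patent.

```python
import math

def expint_e1(x, terms=60):
    """Series approximation of the exponential integral E1(x);
    adequate for the moderate arguments that occur in gain computation."""
    x = min(max(x, 1e-4), 30.0)
    s = -0.5772156649015329 - math.log(x)   # Euler-Mascheroni constant
    sign = 1.0
    for n in range(1, terms + 1):
        s += sign * x**n / (n * math.factorial(n))
        sign = -sign
    return s

def log_mmse_gain(xi, gamma):
    """Ephraim-Malah Log-MMSE spectral amplitude gain."""
    v = xi / (1.0 + xi) * gamma
    return xi / (1.0 + xi) * math.exp(0.5 * expint_e1(v))

def suppress_bin(noisy_power, noise_power, beta_oe):
    """beta_oe > 1 inflates the noise estimate, lowering both SNR
    estimates and yielding a smaller (more aggressive) gain."""
    inflated = beta_oe * noise_power
    gamma = max(noisy_power / inflated, 1e-3)   # a posteriori SNR
    xi = max(gamma - 1.0, 1e-3)                 # ML a priori SNR estimate
    return log_mmse_gain(xi, gamma)

# Higher overestimation (low speech likelihood) -> stronger suppression.
g_low = suppress_bin(noisy_power=4.0, noise_power=1.0, beta_oe=1.0)
g_high = suppress_bin(noisy_power=4.0, noise_power=1.0, beta_oe=3.0)
print(g_high < g_low)  # True
```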
Derivative 1.3: FPGA-Based Hardware Implementation
Enabling Description: The entire system, including the soft speech detector and noise suppressor, is implemented on a Field-Programmable Gate Array (FPGA) or a custom Application-Specific Integrated Circuit (ASIC). The STFT, feature calculation (or neural net inference), dynamic factor computation, and spectral weighting application are all pipelined in the hardware logic. This enables real-time processing with microsecond-level latency, far exceeding the capabilities of a general-purpose CPU or DSP. This is critical for applications like in-ear communication devices for pilots or first responders where any delay is unacceptable. Power consumption is also significantly reduced, enabling battery-powered operation for extended periods.
Mermaid Diagram:
graph LR
    subgraph FPGA Fabric
        A(ADC) --> B[STFT Engine]
        B --> C[Feature Extractor]
        C --> D[Probabilistic Classifier]
        D --> E[Overestimation Factor LUT]
        B --> F[Spectral Weighting Multiplier]
        E --> F
        F --> G[ISTFT Engine]
    end
    G --> H(DAC)
    style A fill:#f9f,stroke:#333,stroke-width:2px
    style H fill:#f9f,stroke:#333,stroke-width:2px
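The "Overestimation Factor LUT" stage can be modeled in software before synthesis. The sketch below builds a hypothetical 256-entry table in Q4.12 fixed point with a linear probability-to-β mapping; the β range, entry count, and fixed-point format are assumptions, not values from the reference patent.

```python
def build_beta_lut(beta_min=1.0, beta_max=4.0, entries=256, frac_bits=12):
    """Map an 8-bit speech-probability index (0..255) to an
    overestimation factor stored as a Q4.12 fixed-point integer.
    A monotone non-linear curve could be burned into the same block RAM."""
    scale = 1 << frac_bits
    lut = []
    for i in range(entries):
        p = i / (entries - 1)                       # speech probability
        beta = beta_max - (beta_max - beta_min) * p  # low prob -> high beta
        lut.append(int(round(beta * scale)))
    return lut

lut = build_beta_lut()
# Index 0 (no speech) -> beta_max; index 255 (certain speech) -> beta_min.
print(lut[0] / 4096, lut[255] / 4096)  # 4.0 1.0
```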
Axis 2: Operational Parameter Expansion
Derivative 2.1: Ultrasonic Bioacoustic Filtering
Enabling Description: The system is adapted for the ultrasonic frequency range (e.g., 20 kHz to 200 kHz) to study animal vocalizations. The "foreground speech" is the specific call of a target species (e.g., a bat's echolocation pulse), while the "babble noise" is the cacophony of other animal calls, insects, and environmental noise in the same frequency band. The STFT window size and feature extraction parameters are scaled accordingly. The soft detector is trained to identify the unique spectro-temporal signature of the target species' call. This allows researchers to isolate specific animal communications from dense acoustic environments for population studies or behavioral analysis.
Mermaid Diagram:
stateDiagram-v2
    [*] --> Idle
    Idle --> Listening: High-frequency audio stream starts
    Listening --> Processing: Signal energy exceeds threshold
    Processing --> Listening: Target species call probability < 0.5
    Processing --> Target_Call_Isolated: Target species call probability >= 0.5
    Target_Call_Isolated --> Listening: Call ends
    Target_Call_Isolated: Dynamic suppression of non-target ultrasonic noise
    Listening: Low-power monitoring state
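The parameter scaling can be sketched as a small helper that keeps the analysis window at a fixed duration as the sample rate grows; the 4 ms window and the power-of-two FFT rounding are illustrative assumptions suited to short echolocation pulses, not figures from the reference patent.

```python
def stft_params(sample_rate_hz, window_ms=4.0, hop_fraction=0.5):
    """Scale STFT window/hop with sample rate so the analysis window
    keeps a constant duration across audio and ultrasonic capture rates."""
    win = int(sample_rate_hz * window_ms / 1000.0)
    # Round the window up to the next power of two for the FFT.
    n_fft = 1 << (win - 1).bit_length()
    hop = int(n_fft * hop_fraction)
    return n_fft, hop

# 384 kHz capture keeps Nyquist comfortably above the 200 kHz band edge.
print(stft_params(384_000))  # (2048, 1024)
print(stft_params(16_000))   # (64, 32)
```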
Derivative 2.2: Cryogenic Sensor Denoising
Enabling Description: The technology is applied to denoise signals from scientific instruments operating at cryogenic temperatures (e.g., below 77 Kelvin), such as superconducting quantum interference devices (SQUIDs) or radio astronomy receivers. The "foreground speech" is the faint, transient signal of interest (e.g., a single photon detection event), while the "babble noise" is a combination of thermal noise and interference from control electronics. The soft detector uses features sensitive to the expected quantum signal signature (e.g., specific rise times and energy distributions) to differentiate it from random thermal fluctuations. The dynamic suppressor aggressively filters the baseline noise between events, significantly improving the instrument's signal-to-noise ratio.
Mermaid Diagram:
flowchart TD
    A[Cryogenic Sensor Signal] --> B{STFT}
    B --> C{"Quantum Signature Detector (Soft)"}
    C -- Probability --> D{Dynamic Noise Floor Adjuster}
    D -- Adjusted Noise Profile --> E{MMSE Suppressor}
    B -- Noisy Spectrum --> E
    E --> F{ISTFT}
    F --> G[Cleaned Event Data]
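One way to realize the "Dynamic Noise Floor Adjuster" is an exponential smoother whose adaptation is frozen by the event probability, so transient events never leak into the baseline estimate. The probability-weighted blend below is an illustrative policy, not the patented method.

```python
def update_noise_floor(noise_floor, frame_power, event_prob, alpha=0.95):
    """Exponentially smooth the baseline noise estimate, but freeze the
    update as the soft detector's event probability approaches 1."""
    effective_alpha = alpha + (1.0 - alpha) * event_prob
    return effective_alpha * noise_floor + (1.0 - effective_alpha) * frame_power

nf = 1.0
nf_quiet = update_noise_floor(nf, 5.0, event_prob=0.0)  # adapts toward 5.0
nf_event = update_noise_floor(nf, 5.0, event_prob=1.0)  # frozen at 1.0
print(round(nf_quiet, 6), nf_event)  # 1.2 1.0
```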
Axis 3: Cross-Domain Application
Derivative 3.1: Aerospace Cockpit Communication Enhancement
Enabling Description: The system is integrated into an active noise-cancellation headset for aircraft pilots. The "foreground speech" is the pilot's own voice into the microphone and critical ATC communications. The "babble noise" is the high-decibel, wide-spectrum noise of the engines, wind, and non-critical radio chatter. The soft speech detector uses the speech probability score to control both the noise suppression on the outgoing microphone signal and the active noise cancellation profile of the earpieces. During speech pauses, suppression is maximized to reduce fatigue. When speech is detected, suppression is relaxed to ensure clarity and prevent distortion of critical commands.
Mermaid Diagram:
classDiagram
    class PilotHeadset {
        +Microphone mic
        +Speaker speaker
        +Processor dsp
        +processAudio()
    }
    class SoftSpeechDetector {
        +getSpeechProbability(frame) float
    }
    class ANCEngine {
        -suppressionLevel: float
        +setAggressiveness(level)
        +generateAntiNoise(frame)
    }
    class CommFilter {
        -overestimationFactor: float
        +setAggressiveness(factor)
        +filterOutgoingSignal(frame)
    }
    PilotHeadset --> SoftSpeechDetector : uses
    PilotHeadset --> ANCEngine : controls
    PilotHeadset --> CommFilter : controls
Derivative 3.2: AgTech Livestock Health Monitoring
Enabling Description: A low-power acoustic sensor network is deployed in a large-scale poultry or swine facility. Each sensor runs the babble suppression algorithm at the edge. The "foreground speech" is a specific set of acoustic biomarkers for disease, such as a particular type of cough or wheeze indicative of respiratory illness. The "babble noise" is the standard background noise of thousands of healthy animals. The soft detector is trained on audio signatures of these biomarkers. When the probability of a biomarker sound exceeds a threshold, the system sends a high-fidelity, denoised audio snippet and an alert to a central management system for veterinary analysis, enabling early disease detection across a large population.
Mermaid Diagram:
sequenceDiagram
    participant Sensor
    participant Cloud
    participant Veterinarian
    loop Continuous Monitoring
        Sensor->>Sensor: Capture audio from barn
        Sensor->>Sensor: Apply babble suppression
        Sensor->>Sensor: Calculate health biomarker probability
    end
    alt Biomarker probability > Threshold
        Sensor->>Cloud: Upload denoised audio & alert
        Cloud->>Veterinarian: Push notification
    end
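The edge-side alerting loop can be sketched as follows; the class name, payload field names, threshold, and snippet length are hypothetical choices for illustration.

```python
from collections import deque

class BiomarkerMonitor:
    """Keep a rolling buffer of denoised frames and emit an alert
    payload when the biomarker probability crosses a threshold."""

    def __init__(self, sensor_id, threshold=0.8, snippet_frames=200):
        self.sensor_id = sensor_id
        self.threshold = threshold
        self.buffer = deque(maxlen=snippet_frames)  # rolling audio snippet

    def process(self, denoised_frame, biomarker_prob):
        self.buffer.append(denoised_frame)
        if biomarker_prob >= self.threshold:
            return {"sensor": self.sensor_id,
                    "probability": biomarker_prob,
                    "snippet": list(self.buffer)}
        return None  # healthy background: nothing to upload

mon = BiomarkerMonitor("barn3-row7")
assert mon.process([0.0] * 4, 0.2) is None   # healthy background noise
alert = mon.process([0.1] * 4, 0.93)         # cough-like biomarker detected
print(alert["sensor"], alert["probability"])  # barn3-row7 0.93
```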
Derivative 3.3: FinTech Trading Floor Compliance
Enabling Description: The system is deployed in a real-time compliance analysis platform for financial trading floors. It processes audio from turret microphones that capture multiple traders speaking simultaneously. For a given trader's channel, their voice is the "foreground speech," and the voices of all other traders, together with background noise, constitute the "babble." The system isolates the primary trader's speech with high fidelity by aggressively suppressing the surrounding babble, especially during their speech pauses. The cleaned audio stream is fed into a speech-to-text engine that flags keywords related to policy violations or market manipulation, creating a reliable record for regulatory compliance.
Mermaid Diagram:
flowchart LR
    A[Trading Floor Audio] --> B{Multi-Channel Input}
    B -- Channel 1 --> C1["Babble Suppression (Trader 1)"]
    B -- Channel 2 --> C2["Babble Suppression (Trader 2)"]
    B -- Channel N --> CN["Babble Suppression (Trader N)"]
    C1 --> D1{Speech-to-Text}
    C2 --> D2{Speech-to-Text}
    CN --> DN{Speech-to-Text}
    D1 & D2 & DN --> E{Compliance Keyword Analysis}
    E --> F[Alerting & Logging]
Axis 4: Integration with Emerging Tech
Derivative 4.1: Reinforcement Learning for Suppression Policy
Enabling Description: The fixed mapping between speech probability and the noise overestimation factor is replaced by a policy network optimized via Reinforcement Learning (RL). The RL agent's state is a vector of audio features (including speech probability, SNR, noise type). Its action is to select a value for the overestimation factor from a discrete set. The reward function is a weighted sum of an objective speech quality metric (e.g., PESQ) and a noise reduction score, calculated against a "clean speech" reference during training. Over millions of iterations, the agent learns a sophisticated policy that adapts the suppression aggressiveness not just to speech presence, but to the specific type and level of background noise, outperforming any static, hand-tuned function.
Mermaid Diagram:
graph TD
    subgraph RL_Training_Loop
        A["State: Audio Features"] --> B{"RL Agent (Policy Network)"}
        B -- "Action: Set β_oe" --> C[Noise Suppressor]
        C -- Processed Audio --> D{Reward Calculation}
        A -- Original Audio --> D
        D -- Reward Signal --> B
    end
    B -- Export --> E[Optimized Policy Model]
    subgraph Deployment
        F[Live Audio Features] --> E
        E -- Optimal β_oe --> G[Live Noise Suppressor]
    end
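A sketch of the reward computation and a greedy action selection over a hypothetical discrete β_oe set; the PESQ normalization range, the 20 dB noise-reduction normalizer, and the reward weights are assumed values, not details from the reference patent.

```python
# Hypothetical discrete action set for the overestimation factor.
BETA_ACTIONS = [1.0, 1.5, 2.0, 3.0, 4.0]

def reward(pesq_score, noise_reduction_db, w_quality=0.7, w_nr=0.3):
    """Weighted training reward: PESQ is normalized from its nominal
    [-0.5, 4.5] range, noise reduction is capped at an assumed 20 dB."""
    q = (pesq_score + 0.5) / 5.0
    nr = min(noise_reduction_db / 20.0, 1.0)
    return w_quality * q + w_nr * nr

def greedy_action(q_values):
    """Pick the beta_oe whose estimated action value is highest."""
    best = max(range(len(BETA_ACTIONS)), key=lambda i: q_values[i])
    return BETA_ACTIONS[best]

print(round(reward(pesq_score=3.0, noise_reduction_db=10.0), 4))  # 0.64
print(greedy_action([0.1, 0.4, 0.9, 0.3, 0.2]))                   # 2.0
```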
Derivative 4.2: IoT-Enabled Adaptive Denoising
Enabling Description: The system is embedded in a network of IoT smart speakers. Each speaker uses the soft speech detector to classify the acoustic environment. When babble noise is detected, it not only activates the local suppression algorithm but also signals this environmental state to neighboring IoT devices over a mesh network (e.g., Zigbee, Thread). A central IoT hub can then use this information to create a real-time "noise map" of a building, and other devices can pre-emptively adjust their microphone gain or suppression parameters before a user starts talking, leading to a more seamless interaction. The speech probability score itself is transmitted as lightweight metadata.
Mermaid Diagram:
erDiagram
    IOT_DEVICE ||--o{ SENSOR_READING : has
    IOT_DEVICE {
        string deviceId PK
        string location
    }
    SENSOR_READING {
        datetime timestamp PK
        string deviceId FK
        float speech_probability
        string noise_type
    }
    IOT_HUB ||--|{ IOT_DEVICE : manages
    IOT_HUB {
        string hubId PK
        object noise_map
    }
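The lightweight metadata can be sketched as a one-byte quantized probability followed by a compact JSON header; this payload layout is an illustrative assumption, not part of any Zigbee or Thread standard.

```python
import json
import struct

def pack_reading(device_id, speech_probability, noise_type):
    """Pack a sensor reading for the mesh network: one quantized
    probability byte plus a minified UTF-8 JSON header."""
    header = json.dumps({"id": device_id, "noise": noise_type},
                        separators=(",", ":")).encode()
    q = int(round(speech_probability * 255))  # quantize to 8 bits
    return struct.pack("B", q) + header

def unpack_probability(payload):
    """Recover the quantized probability from the first byte."""
    return struct.unpack("B", payload[:1])[0] / 255

msg = pack_reading("kitchen-01", 0.8, "babble")
print(round(unpack_probability(msg), 2))  # 0.8
```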
Axis 5: The "Inverse" or Failure Mode
Derivative 5.1: Graceful Degradation Mode
Enabling Description: The system includes a real-time CPU load monitor. If the processing load exceeds a predefined threshold (e.g., 85% for more than 500ms), the system enters a "graceful degradation" mode. In this mode, the complex Log-MMSE or Wiener filter is replaced by a simple spectral gate with a fixed threshold. The soft speech detector is also simplified, using only a single, computationally cheap feature like frame energy. This ensures that the audio stream is never dropped and a basic level of noise reduction is maintained, even under heavy system load, preventing catastrophic failure in mission-critical applications. The system returns to high-fidelity mode once the load subsides.
Mermaid Diagram:
stateDiagram-v2
    state High_Fidelity {
        [*] --> HF_Active
        HF_Active: Full soft-detection & spectral suppression
    }
    state Low_Fidelity {
        [*] --> LF_Active
        LF_Active: Simple energy VAD & spectral gate
    }
    [*] --> High_Fidelity
    High_Fidelity --> Low_Fidelity: CPU Load > 85%
    Low_Fidelity --> High_Fidelity: CPU Load < 70%
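The hysteretic mode switch can be sketched directly from the thresholds in the description: enter low-fidelity mode after the load stays above 85% for 500 ms, return once it falls below 70%. Class and mode names are illustrative.

```python
class DegradationController:
    """Hysteretic graceful-degradation switch driven by CPU load samples."""

    def __init__(self, enter_pct=85.0, exit_pct=70.0, hold_ms=500):
        self.enter_pct, self.exit_pct, self.hold_ms = enter_pct, exit_pct, hold_ms
        self.mode = "high_fidelity"
        self.over_since_ms = None  # when the overload episode began

    def update(self, now_ms, cpu_pct):
        if self.mode == "high_fidelity":
            if cpu_pct > self.enter_pct:
                if self.over_since_ms is None:
                    self.over_since_ms = now_ms
                elif now_ms - self.over_since_ms >= self.hold_ms:
                    self.mode = "low_fidelity"  # sustained overload
            else:
                self.over_since_ms = None       # overload episode ended
        elif cpu_pct < self.exit_pct:
            self.mode = "high_fidelity"         # load subsided
            self.over_since_ms = None
        return self.mode

ctl = DegradationController()
print(ctl.update(0, 90))    # high_fidelity (overload just started)
print(ctl.update(600, 90))  # low_fidelity (sustained > 500 ms)
print(ctl.update(700, 60))  # high_fidelity (load subsided)
```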
Derivative 5.2: Active Speech Obfuscation for Privacy
Enabling Description: An "inverse" implementation for user privacy in smart devices. When the device is not activated by its wake-word, the system runs in a privacy-preserving mode. The soft speech detector continually monitors for human speech. If speech is detected (i.e., the user is having a conversation nearby), the noise suppressor is reconfigured to act as a speech obfuscator. Instead of suppressing the background, it uses the speech probability score to control a filter that aggressively distorts or removes the spectral components identified as speech, while leaving the background noise intact. This ensures that any inadvertently buffered audio contains no intelligible human speech, providing a strong guarantee of privacy.
Mermaid Diagram:
flowchart TD
    A[Ambient Audio] --> B{Soft Speech Detector}
    B -- "Speech Prob. > 0.7" --> C{Speech Obfuscation Module}
    C -- Distorted Speech Spectrum --> E{Buffer}
    B -- "Speech Prob. <= 0.7" --> D[No Action]
    D --> E
    E -.-> F((Wake Word Engine))
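The inverted gain rule can be sketched per frequency bin: when the frame-level probability exceeds the trigger, bins flagged as speech are crushed while the background passes through. The 0.7 trigger follows the diagram; the per-bin speech mask and the 0.05 gain floor are illustrative assumptions.

```python
def obfuscation_gains(speech_mask, speech_prob, threshold=0.7, floor=0.05):
    """Inverse suppression: attenuate bins with high speech evidence,
    leave everything else intact. speech_mask holds per-bin speech
    evidence in [0, 1]; returns a per-bin gain vector."""
    if speech_prob <= threshold:
        return [1.0] * len(speech_mask)  # no speech detected: pass through
    return [max(floor, 1.0 - m) for m in speech_mask]

# Frame judged to contain speech: speech-dominant bins are crushed,
# background-dominant bins pass through nearly unchanged.
gains = obfuscation_gains([0.9, 0.1, 1.0], speech_prob=0.95)
print([round(g, 2) for g in gains])  # [0.1, 0.9, 0.05]
```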
Combination Prior Art with Open-Source Standards
WebRTC Integration with RTCP Metadata: The babble suppression system is defined as a standard processing block within the WebRTC audio pipeline. The frame-by-frame speech probability score from the soft detector is embedded into custom RTCP packets (e.g., using the APP packet type). The receiving client can use this metadata to understand the acoustic conditions at the sender's end, dynamically adjust its jitter buffer, or inform the user interface that the other party is in a noisy environment.
Kaldi ASR Toolkit Pre-processor: The system is integrated as a front-end processing script for the Kaldi open-source speech recognition toolkit. The script takes raw audio, applies the dynamic babble suppression, and pipes the cleaned audio to Kaldi's feature extraction (compute-mfcc-feats). The speech probability score is written to a separate file, which is then used by Kaldi's VAD tools (compute-vad) to generate more reliable speech/non-speech labels, improving endpointing and overall recognition accuracy in noisy conditions.
VST/LV2 Plugin for Digital Audio Workstations: The entire method is packaged as an open-source LV2 audio plugin, compatible with digital audio workstations such as Audacity, Ardour, and Reaper. The plugin exposes the core parameters to the user via a graphical interface: β_max (maximum overestimation), the sensitivity of the probability-to-overestimation mapping, and a choice of soft-detector features (e.g., kurtosis-based, cepstrum-based, or a lightweight NN model). This allows audio engineers and forensic analysts to apply and fine-tune babble suppression on any audio track.
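For the WebRTC integration, the speech probability can be carried in an RTCP APP packet (payload type 204 per RFC 3550). The sketch below builds such a packet; the four-character name "BBLS" and the big-endian float payload are illustrative choices, not a registered format.

```python
import struct

def rtcp_app_packet(ssrc, speech_probability, name=b"BBLS"):
    """Build an RTCP APP packet (RFC 3550, PT=204) whose
    application-dependent data is the frame's speech probability."""
    payload = struct.pack("!f", speech_probability)
    # Length field = packet length in 32-bit words, minus one.
    length_words = (8 + len(name) + len(payload)) // 4 - 1
    header = struct.pack("!BBH", 0x80, 204, length_words)  # V=2, subtype 0
    return header + struct.pack("!I", ssrc) + name + payload

pkt = rtcp_app_packet(0x1234, 0.62)
print(len(pkt), struct.unpack("!f", pkt[-4:])[0] > 0.61)  # 16 True
```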