Patent 10783899

Derivative works

Defensive disclosure: derivative variations of each claim designed to render future incremental improvements obvious or non-novel.

DEFENSIVE DISCLOSURE

Title: Derivative Methods and Systems for Probabilistic, Dynamically-Controlled Signal Suppression
Reference Patent: US 10783899 B2 ("Babble noise suppression")
Publication Date: May 8, 2026

This document discloses a series of derivative works, applications, and enhancements based on the core principles described in US patent 10783899. The purpose of this disclosure is to place these variations in the public domain, thereby establishing prior art against future patent claims for similar or incremental improvements.


Axis 1: Material & Component Substitution

Derivative 1.1: Neural Network-Based Soft Speech Detector

  • Enabling Description: The soft speech detector, which calculates a speech likelihood score, is implemented using a lightweight, time-delay neural network (TDNN) or a 1D convolutional neural network (CNN) instead of the kurtosis/cepstral feature combination. The network is trained on a large corpus of audio with and without speech in various babble noise conditions. The input to the network is a short-time Fourier transform (STFT) frame or a mel-spectrogram, and its output is a single scalar value between 0 and 1 representing the probability of foreground speech presence. This allows the detector to learn more complex and robust features of speech versus babble noise, improving accuracy over the heuristic statistical methods. The trained model weights are stored in non-volatile memory on the device.

  • Mermaid Diagram:

    flowchart TD
        A[Audio Input Frame] --> B{1D-CNN/TDNN Model};
        B -->|Inference| C["Speech Probability Score (0-1)"];
        C --> D{Noise Suppressor};
        D --> E[Processed Audio Frame];
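
  • Illustrative Code Sketch (Python): A minimal PyTorch sketch of the detector described above, assuming a log-mel-spectrogram input of shape (batch, n_mels, frames); the class name BabbleSoftDetector and all layer sizes are illustrative assumptions, not taken from the reference patent.

    # Minimal sketch (assumption: log-mel input; layer sizes are illustrative).
    import torch
    import torch.nn as nn

    class BabbleSoftDetector(nn.Module):
        """Lightweight 1D-CNN mapping a log-mel frame context to a speech probability."""
        def __init__(self, n_mels: int = 40):
            super().__init__()
            self.net = nn.Sequential(
                nn.Conv1d(n_mels, 32, kernel_size=5, padding=2),  # convolve across time
                nn.ReLU(),
                nn.Conv1d(32, 16, kernel_size=3, padding=1),
                nn.ReLU(),
                nn.AdaptiveAvgPool1d(1),                          # pool over the frame context
                nn.Flatten(),
                nn.Linear(16, 1),
            )

        def forward(self, log_mel: torch.Tensor) -> torch.Tensor:
            # log_mel: (batch, n_mels, frames) -> speech probability in [0, 1]
            return torch.sigmoid(self.net(log_mel)).squeeze(-1)

    # Example: one 40-mel, 9-frame context window
    detector = BabbleSoftDetector(n_mels=40)
    p_speech = detector(torch.randn(1, 40, 9))   # tensor of shape (1,)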
    

Derivative 1.2: Log-MMSE Noise Suppressor

  • Enabling Description: The Wiener filter-based noise suppressor is replaced with a Logarithmic Minimum Mean-Square Error (Log-MMSE) spectral amplitude estimator. The dynamic noise overestimation factor β_oe(l) from the soft speech detector is used to directly modulate the a priori Signal-to-Noise Ratio (SNR) estimate within the Log-MMSE algorithm. When speech likelihood is low, β_oe(l) is high, which artificially inflates the estimated noise power, causing the Log-MMSE estimator to apply more aggressive suppression. This substitution reduces the "musical noise" artifacts that are a common problem with simple Wiener filters, resulting in a more natural-sounding residual background.

  • Mermaid Diagram:

    sequenceDiagram
        participant SSD as Soft Speech Detector
        participant LMMSE as Log-MMSE Estimator
        participant Audio as Audio Stream
    
        Audio->>SSD: Input Frame
        SSD->>LMMSE: Speech Likelihood
        LMMSE->>LMMSE: Calculate a priori SNR
        LMMSE->>LMMSE: Modulate SNR with Likelihood^-1
        LMMSE->>Audio: Apply Log-MMSE Gain
        Audio-->>Audio: Output Denoised Frame
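
  • Illustrative Code Sketch (Python): A minimal NumPy/SciPy sketch of the Log-MMSE (log-spectral amplitude) gain with the overestimation factor folded into the noise estimate. A simple maximum-likelihood a priori SNR stands in for a decision-directed tracker, and the β_oe mapping and constants are illustrative assumptions.

    # Sketch of the Log-MMSE gain with a dynamic noise overestimation factor.
    # The mapping of speech likelihood to beta_oe is an illustrative assumption.
    import numpy as np
    from scipy.special import exp1  # exponential integral E1

    def log_mmse_gain(noisy_power, noise_power, speech_likelihood,
                      beta_max=3.0, xi_min=10**(-25 / 10)):
        """Per-bin spectral gain for one STFT frame."""
        # Low speech likelihood -> large overestimation -> more aggressive suppression
        beta_oe = 1.0 + (beta_max - 1.0) * (1.0 - speech_likelihood)
        noise_over = beta_oe * noise_power

        gamma = noisy_power / np.maximum(noise_over, 1e-12)   # a posteriori SNR
        xi = np.maximum(gamma - 1.0, xi_min)                   # simple a priori SNR estimate
        v = xi / (1.0 + xi) * gamma
        return xi / (1.0 + xi) * np.exp(0.5 * exp1(np.maximum(v, 1e-12)))

    # Usage on one frame of |X|^2 and a tracked noise PSD:
    gain = log_mmse_gain(noisy_power=np.array([2.0, 0.5, 1.2]),
                         noise_power=np.array([1.0, 1.0, 1.0]),
                         speech_likelihood=0.2)
    # enhanced_spectrum = gain * noisy_stft_frame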
    

Derivative 1.3: FPGA-Based Hardware Implementation

  • Enabling Description: The entire system, including the soft speech detector and noise suppressor, is implemented on a Field-Programmable Gate Array (FPGA) or a custom Application-Specific Integrated Circuit (ASIC). The STFT, feature calculation (or neural net inference), dynamic factor computation, and spectral weighting application are all pipelined in the hardware logic. This enables real-time processing with microsecond-level latency, far exceeding the capabilities of a general-purpose CPU or DSP. This is critical for applications like in-ear communication devices for pilots or first responders where any delay is unacceptable. Power consumption is also significantly reduced, enabling battery-powered operation for extended periods.

  • Mermaid Diagram:

    graph LR
    subgraph FPGA Fabric
        A(ADC) --> B[STFT Engine];
        B --> C[Feature Extractor];
        C --> D[Probabilistic Classifier];
        D --> E[Overestimation Factor LUT];
        B --> F[Spectral Weighting Multiplier];
        E --> F;
        F --> G[ISTFT Engine];
    end
    G --> H(DAC);
    
    style A fill:#f9f,stroke:#333,stroke-width:2px
    style H fill:#f9f,stroke:#333,stroke-width:2px
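
  • Illustrative Code Sketch (Python): One plausible way to pre-compute the contents of the "Overestimation Factor LUT" block for synthesis; the table depth, Q4.12 fixed-point format, and β mapping are illustrative assumptions.

    # Sketch: generate an 8-bit-indexed, Q4.12 fixed-point table mapping
    # quantized speech probability -> overestimation factor, as one plausible
    # content for the "Overestimation Factor LUT" block. Depth/format are assumptions.
    BETA_MAX = 3.0
    Q_FRAC_BITS = 12

    def build_beta_lut(depth=256):
        lut = []
        for i in range(depth):
            p = i / (depth - 1)                        # quantized speech probability
            beta = 1.0 + (BETA_MAX - 1.0) * (1.0 - p)  # same mapping as the soft path
            lut.append(int(round(beta * (1 << Q_FRAC_BITS))))
        return lut

    # Emit a memory-initialization style hex listing for synthesis tooling
    for addr, word in enumerate(build_beta_lut()):
        print(f"{addr:02X}: {word:04X}")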
    

Axis 2: Operational Parameter Expansion

Derivative 2.1: Ultrasonic Bioacoustic Filtering

  • Enabling Description: The system is adapted for the ultrasonic frequency range (e.g., 20 kHz to 200 kHz) to study animal vocalizations. The "foreground speech" is the specific call of a target species (e.g., a bat's echolocation pulse), while the "babble noise" is the cacophony of other animal calls, insects, and environmental noise in the same frequency band. The STFT window size and feature extraction parameters are scaled accordingly. The soft detector is trained to identify the unique spectro-temporal signature of the target species' call. This allows researchers to isolate specific animal communications from dense acoustic environments for population studies or behavioral analysis.

  • Mermaid Diagram:

    stateDiagram-v2
        [*] --> Idle
        Idle --> Listening: High-frequency audio stream starts
        Listening --> Processing: Signal energy exceeds threshold
        Processing --> Listening: Target species call probability < 0.5
        Processing --> Target_Call_Isolated: Target species call probability >= 0.5
        Target_Call_Isolated --> Listening: Call ends
        Target_Call_Isolated: Dynamic suppression of non-target ultrasonic noise
        Listening: Low-power monitoring state
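
  • Illustrative Code Sketch (Python): A small sketch of how the STFT analysis parameters might be rescaled from a speech-band configuration to an ultrasonic capture rate; the sample rates and frame durations are illustrative assumptions.

    # Sketch: scale STFT analysis parameters to the capture rate and to the
    # duration of the target call. All numbers below are assumptions.
    def stft_params(sample_rate_hz, frame_ms, hop_ms):
        frame = int(sample_rate_hz * frame_ms / 1000.0)
        n_fft = 1 << (frame - 1).bit_length()          # round up to a power of two
        hop = max(1, int(sample_rate_hz * hop_ms / 1000.0))
        return n_fft, hop

    # Speech-band reference: ~25 ms frames at 16 kHz
    print(stft_params(16_000, frame_ms=25, hop_ms=10))    # (512, 160)
    # Ultrasonic bat recorder: ~2 ms frames at 384 kHz to resolve short pulses
    print(stft_params(384_000, frame_ms=2, hop_ms=0.5))   # (1024, 192)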
    

Derivative 2.2: Cryogenic Sensor Denoising

  • Enabling Description: The technology is applied to denoise signals from scientific instruments operating at cryogenic temperatures (e.g., below 77 Kelvin), such as superconducting quantum interference devices (SQUIDs) or radio astronomy receivers. The "foreground speech" is the faint, transient signal of interest (e.g., a single photon detection event), while the "babble noise" is a combination of thermal noise and interference from control electronics. The soft detector uses features sensitive to the expected quantum signal signature (e.g., specific rise times and energy distributions) to differentiate it from random thermal fluctuations. The dynamic suppressor aggressively filters the baseline noise between events, significantly improving the instrument's signal-to-noise ratio.

  • Mermaid Diagram:

    flowchart TD
        A[Cryogenic Sensor Signal] --> B{STFT};
        B --> C{"Quantum Signature Detector (Soft)"};
        C -- Probability --> D{Dynamic Noise Floor Adjuster};
        D -- Adjusted Noise Profile --> E{MMSE Suppressor};
        B -- Noisy Spectrum --> E;
        E --> F{ISTFT};
        F --> G[Cleaned Event Data];
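
  • Illustrative Code Sketch (Python): One way the soft detector's signature score could be formed, as a normalized correlation against an expected rise/decay pulse template; the template shape, time constants, and scoring rule are illustrative assumptions.

    # Sketch: soft "signature" score for a transient sensor pulse via
    # normalized correlation against an expected rise/decay template.
    import numpy as np

    def pulse_template(n=64, rise=3.0, decay=20.0):
        t = np.arange(n, dtype=float)
        tpl = (1.0 - np.exp(-t / rise)) * np.exp(-t / decay)
        return tpl / np.linalg.norm(tpl)

    def signature_score(segment, template):
        """Return a 0..1 score: how pulse-like this segment is."""
        seg = segment - segment.mean()
        denom = np.linalg.norm(seg) * np.linalg.norm(template)
        if denom == 0.0:
            return 0.0
        rho = float(np.dot(seg, template) / denom)   # in [-1, 1]
        return max(0.0, rho)                          # clip anti-correlation to 0

    tpl = pulse_template()
    noise_only = np.random.default_rng(0).normal(size=64)
    print(signature_score(noise_only, tpl))                   # near 0
    print(signature_score(5 * tpl + 0.1 * noise_only, tpl))   # near 1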
    

Axis 3: Cross-Domain Application

Derivative 3.1: Aerospace Cockpit Communication Enhancement

  • Enabling Description: The system is integrated into an active noise-cancellation headset for aircraft pilots. The "foreground speech" is the pilot's own voice into the microphone and critical ATC communications. The "babble noise" is the high-decibel, wide-spectrum noise of the engines, wind, and non-critical radio chatter. The speech probability score from the soft speech detector controls both the noise suppression applied to the outgoing microphone signal and the active noise cancellation profile of the earpieces. During speech pauses, suppression is maximized to reduce fatigue. When speech is detected, suppression is relaxed to ensure clarity and prevent distortion of critical commands.

  • Mermaid Diagram:

    classDiagram
        class PilotHeadset {
            +Microphone mic
            +Speaker speaker
            +Processor dsp
            +processAudio()
        }
        class SoftSpeechDetector {
            +getSpeechProbability(frame) float
        }
        class ANCEngine {
            -suppressionLevel: float
            +setAggressiveness(level)
            +generateAntiNoise(frame)
        }
        class CommFilter {
            -overestimationFactor: float
            +setAggressiveness(factor)
            +filterOutgoingSignal(frame)
        }
        PilotHeadset --> SoftSpeechDetector : uses
        PilotHeadset --> ANCEngine : controls
        PilotHeadset --> CommFilter : controls
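
  • Illustrative Code Sketch (Python): A sketch of how a single speech probability could drive both controls from the class diagram above; the mapping constants are illustrative assumptions.

    # Sketch: one probability drives both the ANC aggressiveness and the
    # mic-path overestimation factor. The constants are assumptions.
    ANC_MAX, ANC_MIN = 1.0, 0.4        # normalized anti-noise aggressiveness
    BETA_MAX, BETA_MIN = 3.0, 1.0      # mic-path noise overestimation factor

    def control_settings(speech_probability: float) -> tuple[float, float]:
        p = min(max(speech_probability, 0.0), 1.0)
        anc_level = ANC_MAX - (ANC_MAX - ANC_MIN) * p   # relax ANC while talking
        beta_oe = BETA_MAX - (BETA_MAX - BETA_MIN) * p  # relax suppression while talking
        return anc_level, beta_oe

    # During a pause (p ~ 0.05): maximum suppression to reduce fatigue
    print(control_settings(0.05))   # -> (0.97, 2.9)
    # During speech (p ~ 0.9): back off to preserve clarity
    print(control_settings(0.9))    # -> (0.46, 1.2)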
    

Derivative 3.2: AgTech Livestock Health Monitoring

  • Enabling Description: A low-power acoustic sensor network is deployed in a large-scale poultry or swine facility. Each sensor runs the babble suppression algorithm at the edge. The "foreground speech" is a specific set of acoustic biomarkers for disease, such as a particular type of cough or wheeze indicative of respiratory illness. The "babble noise" is the standard background noise of thousands of healthy animals. The soft detector is trained on audio signatures of these biomarkers. When the probability of a biomarker sound exceeds a threshold, the system sends a high-fidelity, denoised audio snippet and an alert to a central management system for veterinary analysis, enabling early disease detection across a large population.

  • Mermaid Diagram:

    sequenceDiagram
        participant Sensor
        participant Cloud
        participant Veterinarian
    
        loop Continuous Monitoring
            Sensor->>Sensor: Capture audio from barn
            Sensor->>Sensor: Apply babble suppression
            Sensor->>Sensor: Calculate health biomarker probability
        end
    
        alt Biomarker probability > Threshold
            Sensor->>Cloud: Upload denoised audio & alert
            Cloud->>Veterinarian: Push notification
        end
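
  • Illustrative Code Sketch (Python): A sketch of the threshold-and-hold trigger that uploads a denoised snippet only on a sustained biomarker detection; the threshold, hold length, and the send_alert() transport hook are illustrative assumptions.

    # Sketch: debounce the biomarker probability and ship a denoised snippet
    # only on a sustained exceedance. Constants and transport are assumptions.
    from collections import deque

    THRESHOLD = 0.8
    HOLD_FRAMES = 5            # require 5 consecutive frames above threshold

    class BiomarkerTrigger:
        def __init__(self, send_alert):
            self.recent = deque(maxlen=HOLD_FRAMES)
            self.send_alert = send_alert          # e.g. MQTT/HTTPS uploader

        def push(self, probability: float, denoised_frame: bytes) -> None:
            self.recent.append(probability)
            if (len(self.recent) == HOLD_FRAMES
                    and min(self.recent) >= THRESHOLD):
                self.send_alert({"event": "respiratory_biomarker",
                                 "confidence": max(self.recent),
                                 "audio": denoised_frame})
                self.recent.clear()               # re-arm after reporting

    trigger = BiomarkerTrigger(send_alert=print)
    for p in [0.2, 0.85, 0.9, 0.92, 0.88, 0.91]:
        trigger.push(p, denoised_frame=b"...")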
    

Derivative 3.3: FinTech Trading Floor Compliance

  • Enabling Description: The system is deployed in a real-time compliance analysis platform for financial trading floors. It processes audio from turret microphones that capture multiple traders speaking simultaneously. For a given trader's channel, their voice is the "foreground speech" and the voices of all other traders, together with background noise, are the "babble." The system isolates the primary trader's speech with high fidelity by aggressively suppressing the surrounding babble, especially during that trader's speech pauses. The cleaned audio stream is fed into a speech-to-text engine that flags keywords related to policy violations or market manipulation, creating a reliable record for regulatory compliance.

  • Mermaid Diagram:

    flowchart LR
        A[Trading Floor Audio] --> B{Multi-Channel Input};
        B -- Channel 1 --> C1["Babble Suppression (Trader 1)"];
        B -- Channel 2 --> C2["Babble Suppression (Trader 2)"];
        B -- Channel N --> CN["Babble Suppression (Trader N)"];
        C1 --> D1{Speech-to-Text};
        C2 --> D2{Speech-to-Text};
        CN --> DN{Speech-to-Text};
        D1 & D2 & DN --> E{Compliance Keyword Analysis};
        E --> F[Alerting & Logging];
    

Axis 4: Integration with Emerging Tech

Derivative 4.1: Reinforcement Learning for Suppression Policy

  • Enabling Description: The fixed mapping between speech probability and the noise overestimation factor is replaced by a policy network optimized via Reinforcement Learning (RL). The RL agent's state is a vector of audio features (including speech probability, SNR, and noise type). Its action is to select a value for the overestimation factor from a discrete set. The reward function is a weighted sum of an objective speech quality metric (e.g., PESQ) and a noise reduction score, calculated against a "clean speech" reference during training. Over millions of iterations, the agent learns a policy that adapts the suppression aggressiveness not just to speech presence but to the specific type and level of background noise, and can outperform a static, hand-tuned mapping.

  • Mermaid Diagram:

    graph TD
        subgraph RL_Training_Loop
            A[State: Audio Features] --> B{"RL Agent (Policy Network)"};
            B -- Action: Set β_oe --> C[Noise Suppressor];
            C -- Processed Audio --> D{Reward Calculation};
            A -- Original Audio --> D;
            D -- Reward Signal --> B;
        end
        B -- Export --> E[Optimized Policy Model];
        subgraph Deployment
            F[Live Audio Features] --> E;
            E -- Optimal β_oe --> G[Live Noise Suppressor];
        end
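
  • Illustrative Code Sketch (Python): A sketch of the reward function and the discrete action set described above; speech_quality() and residual_noise() are placeholders for the PESQ-style and noise-reduction metrics, and the weights and β values are assumptions.

    # Sketch of the reward and discrete action set for the RL-tuned policy.
    import numpy as np

    BETA_ACTIONS = np.array([1.0, 1.5, 2.0, 2.5, 3.0])   # discrete overestimation factors
    W_QUALITY, W_NOISE = 0.7, 0.3                          # reward weights (assumptions)

    def reward(processed, clean, noisy, speech_quality, residual_noise):
        q = speech_quality(clean, processed)                     # e.g. PESQ-style score
        nr = residual_noise(noisy) - residual_noise(processed)   # noise removed, in dB
        return W_QUALITY * q + W_NOISE * nr

    def select_action(q_values, epsilon=0.1, rng=np.random.default_rng()):
        """Epsilon-greedy choice of beta_oe from the discrete action set."""
        if rng.random() < epsilon:
            return rng.choice(BETA_ACTIONS)                 # explore
        return BETA_ACTIONS[int(np.argmax(q_values))]       # exploit learned policy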
    

Derivative 4.2: IoT-Enabled Adaptive Denoising

  • Enabling Description: The system is embedded in a network of IoT smart speakers. Each speaker uses the soft speech detector to classify the acoustic environment. When babble noise is detected, it not only activates the local suppression algorithm but also signals this environmental state to neighboring IoT devices over a mesh network (e.g., Zigbee, Thread). A central IoT hub can then use this information to create a real-time "noise map" of a building, and other devices can pre-emptively adjust their microphone gain or suppression parameters before a user starts talking, leading to a more seamless interaction. The speech probability score itself is transmitted as lightweight metadata.

  • Mermaid Diagram:

    erDiagram
        IOT_DEVICE ||--o{ SENSOR_READING : has
        IOT_DEVICE {
            string deviceId PK
            string location
        }
        SENSOR_READING {
            datetime timestamp PK
            string deviceId FK
            float speech_probability
            string noise_type
        }
        IOT_HUB ||--|{ IOT_DEVICE : manages
        IOT_HUB {
            string hubId PK
            object noise_map
        }
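
  • Illustrative Code Sketch (Python): A sketch of the lightweight metadata record corresponding to the SENSOR_READING entity above; JSON is used here for readability, and the device ID, topic, and field values are illustrative assumptions.

    # Sketch: the metadata record matching SENSOR_READING, serialized for the
    # mesh. JSON is shown for clarity; a binary encoding (e.g. CBOR) would be
    # the likelier on-wire choice.
    import json
    from dataclasses import dataclass, asdict
    from datetime import datetime, timezone

    @dataclass
    class SensorReading:
        deviceId: str
        timestamp: str
        speech_probability: float
        noise_type: str

    reading = SensorReading(
        deviceId="kitchen-speaker-01",                      # illustrative ID
        timestamp=datetime.now(timezone.utc).isoformat(),
        speech_probability=0.12,
        noise_type="babble",
    )
    payload = json.dumps(asdict(reading)).encode("utf-8")   # ~100 bytes of metadata
    # mesh_network.publish("noise-map/kitchen", payload)    # transport is out of scope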
    

Axis 5: The "Inverse" or Failure Mode

Derivative 5.1: Graceful Degradation Mode

  • Enabling Description: The system includes a real-time CPU load monitor. If the processing load exceeds a predefined threshold (e.g., 85% for more than 500ms), the system enters a "graceful degradation" mode. In this mode, the complex Log-MMSE or Wiener filter is replaced by a simple spectral gate with a fixed threshold. The soft speech detector is also simplified, using only a single, computationally cheap feature like frame energy. This ensures that the audio stream is never dropped and a basic level of noise reduction is maintained, even under heavy system load, preventing catastrophic failure in mission-critical applications. The system returns to high-fidelity mode once the load subsides.

  • Mermaid Diagram:

    stateDiagram-v2
        state High_Fidelity {
            [*] --> HF_Active
            HF_Active: Full soft-detection & spectral suppression
        }
        state Low_Fidelity {
            [*] --> LF_Active
            LF_Active: Simple energy VAD & spectral gate
        }
    
        [*] --> High_Fidelity
        High_Fidelity --> Low_Fidelity: CPU Load > 85%
        Low_Fidelity --> High_Fidelity: CPU Load < 70%
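
  • Illustrative Code Sketch (Python): A sketch of the load-hysteresis controller implied by the state diagram; the thresholds and the 500 ms hold come from the description, while the measurement of cpu_load is platform-specific and left abstract.

    # Sketch: hysteresis controller matching the state diagram above.
    import time

    ENTER_LOW_FIDELITY = 0.85     # sustained load above this degrades
    EXIT_LOW_FIDELITY = 0.70      # load below this restores high fidelity
    HOLD_SECONDS = 0.5

    class FidelityController:
        def __init__(self):
            self.low_fidelity = False
            self._over_since = None

        def update(self, cpu_load: float, now: float | None = None) -> str:
            now = time.monotonic() if now is None else now
            if not self.low_fidelity:
                if cpu_load > ENTER_LOW_FIDELITY:
                    self._over_since = self._over_since or now
                    if now - self._over_since >= HOLD_SECONDS:
                        self.low_fidelity = True
                else:
                    self._over_since = None       # exceedance was not sustained
            elif cpu_load < EXIT_LOW_FIDELITY:
                self.low_fidelity = False
                self._over_since = None
            return "spectral_gate" if self.low_fidelity else "log_mmse"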
    

Derivative 5.2: Active Speech Obfuscation for Privacy

  • Enabling Description: An "inverse" implementation for user privacy in smart devices. When the device is not activated by its wake-word, the system runs in a privacy-preserving mode. The soft speech detector continually monitors for human speech. If speech is detected (i.e., the user is having a conversation nearby), the noise suppressor is reconfigured to act as a speech obfuscator. Instead of suppressing the background, it uses the speech probability score to control a filter that aggressively distorts or removes the spectral components identified as speech, while leaving the background noise intact. This ensures that any inadvertently buffered audio contains no intelligible human speech, providing a strong guarantee of privacy.

  • Mermaid Diagram:

    flowchart TD
        A[Ambient Audio] --> B{Soft Speech Detector};
        B -- "Speech Prob. > 0.7" --> C{Speech Obfuscation Module};
        C -- Distorted Speech Spectrum --> E{Buffer};
        B -- "Speech Prob. <= 0.7" --> D[No Action];
        D --> E;
        E -.-> F((Wake Word Engine));
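
  • Illustrative Code Sketch (Python): A sketch of the "inverse" gain rule: attenuate the frame in proportion to speech probability rather than suppressing the background. A per-frame scalar gain is used for brevity (a per-bin speech mask would be the finer-grained variant); the threshold and floor are assumptions.

    # Sketch: attenuate speech-dominant frames so buffered audio stays unintelligible.
    import numpy as np

    OBFUSCATE_ABOVE = 0.7      # matches the diagram's decision threshold
    FLOOR = 0.05               # residual level left in speech-dominant frames

    def obfuscation_gain(speech_probability: float, n_bins: int) -> np.ndarray:
        if speech_probability <= OBFUSCATE_ABOVE:
            return np.ones(n_bins)                    # leave background intact
        g = 1.0 - speech_probability                  # more speech-like -> stronger attenuation
        return np.full(n_bins, max(g, FLOOR))

    frame = np.random.default_rng(1).normal(size=257)  # one STFT magnitude frame
    buffered = obfuscation_gain(0.93, frame.size) * frame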
    

Combination Prior Art with Open-Source Standards

  1. WebRTC Integration with RTCP Metadata: The babble suppression system is defined as a standard processing block within the WebRTC audio pipeline. The frame-by-frame speech probability score from the soft detector is embedded into custom RTCP packets (e.g., using the APP packet type). The receiving client can use this metadata to understand the acoustic conditions at the sender's end, dynamically adjust its jitter buffer, or inform the user interface that the other party is in a noisy environment.

  2. Kaldi ASR Toolkit Pre-processor: The system is integrated as a front-end processing script for the Kaldi open-source speech recognition toolkit. The script takes raw audio, applies the dynamic babble suppression, and pipes the cleaned audio to Kaldi's feature extraction (compute-mfcc-feats). The speech probability score is written to a separate file, which is then used by Kaldi's VAD tools (compute-vad) to generate more reliable speech/non-speech labels, improving endpointing and overall recognition accuracy in noisy conditions.

  3. VST/LV2 Plugin for Digital Audio Workstations: The entire method is packaged as an open-source LV2 audio plugin, compatible with digital audio workstations like Audacity, Ardour, and Reaper. The plugin exposes the core parameters to the user via a graphical interface: β_max (maximum overestimation), sensitivity of the probability-to-overestimation mapping, and a choice of soft-detector features (e.g., Kurtosis-based, Cepstrum-based, or a lightweight NN model). This allows audio engineers and forensic analysts to apply and fine-tune babble suppression on any audio track.
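
  • Illustrative Code Sketch (Python), for item 1 above: packing the per-frame speech probability into an RTCP APP packet as defined by RFC 3550; the four-character name "BBLS", subtype 0, and the 4-byte float payload layout are illustrative assumptions.

    # Sketch: RTCP APP packet (RFC 3550) carrying one speech-probability value.
    import struct

    def rtcp_app_speech_prob(ssrc: int, probability: float,
                             name: bytes = b"BBLS", subtype: int = 0) -> bytes:
        payload = struct.pack("!f", probability)        # 4 bytes, keeps 32-bit alignment
        length_words = (8 + len(name) + len(payload)) // 4 - 1
        header = struct.pack("!BBH",
                             (2 << 6) | subtype,         # V=2, P=0, subtype
                             204,                        # PT=APP
                             length_words)
        return header + struct.pack("!I", ssrc) + name + payload

    packet = rtcp_app_speech_prob(ssrc=0x1234ABCD, probability=0.87)
    assert len(packet) == 16 and packet[1] == 204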
