Patent 8320575

Derivative works

Defensive disclosure: derivative variations of each claim designed to render future incremental improvements obvious or non-novel.



Defensive Disclosure and Prior Art Derivations for US 8,320,575

Publication Date: May 8, 2026
Subject: Derivatives and extensions of methods for efficient audio signal processing in the sub-band regime. This document is intended to enter the public domain and serve as prior art for future inventions in this field.


Derivatives of Core Method (Claim 1)

The core method involves dividing an audio signal into sub-bands, excising a subset for computational efficiency, processing the remaining subset, reconstructing the excised subset, and synthesizing a final enhanced signal. The following are derivative works based on this core concept.

Axis 1: Component & Algorithm Substitution

Derivative 1.1: Wavelet Packet Decomposition for Non-Uniform Sub-Bands
  • Enabling Description: Instead of using uniform-bandwidth filter banks like DFT or FFT, the initial "dividing" step is performed using a Wavelet Packet Decomposition (WPD). This allows for a non-uniform division of the frequency spectrum, providing higher frequency resolution at lower frequencies and higher temporal resolution at higher frequencies, which better matches human auditory perception. A 5-level WPD with a Daubechies 8 (db8) mother wavelet is used to decompose the signal. The "excising" step then selectively prunes entire branches of the wavelet packet tree based on a perceptual entropy calculation, discarding sub-bands with minimal contribution to speech intelligibility. Reconstruction of the pruned branches is achieved using polyphase interpolation from adjacent parent and child nodes in the tree before the inverse WPD synthesis.
  • Mermaid.js Diagram:
    graph TD
        A["Audio Input y(n)"] --> B{Wavelet Packet Decomposition};
        B --> C1[Sub-band 1];
        B --> C2[Sub-band 2];
        B --> Cx["..."];
        B --> CN[Sub-band N];
        subgraph Excision
            C1 --> D1{Process};
            C2 --> E2[Excise];
            Cx --> Dx{Process};
            CN --> EN[Excise];
        end
        subgraph PR["Processing & Reconstruction"]
            D1 --> F1[Enhanced Sub-band 1];
            Dx --> Fx[Enhanced Sub-band ...];
            F1 & Fx --> G{Reconstruct Excised Bands};
            G --> H1[Reconstructed Sub-band 2];
            G --> HN[Reconstructed Sub-band N];
        end
        F1 & Fx & H1 & HN --> I{Inverse Wavelet Packet Synthesis};
        I --> J["Enhanced Output s(n)"];
    
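The divide / excise / reconstruct / synthesize loop above can be sketched in a few lines of Python. As a simplification, this sketch uses a hand-rolled Haar wavelet packet in place of the db8 wavelet, and prunes leaves by raw energy rather than the perceptual-entropy test described above (which would require a psychoacoustic model); the structure of the tree and the pruning step are the point.

```python
import numpy as np

def haar_step(x):
    # One analysis step: split a node into approximation (low) and detail (high).
    a = (x[0::2] + x[1::2]) / np.sqrt(2)
    d = (x[0::2] - x[1::2]) / np.sqrt(2)
    return a, d

def wpd(x, levels):
    # Full wavelet packet tree: every node is split at every level.
    nodes = [x]
    for _ in range(levels):
        nxt = []
        for n in nodes:
            a, d = haar_step(n)
            nxt += [a, d]
        nodes = nxt
    return nodes  # 2**levels leaf sub-bands

def prune(leaves, keep_ratio=0.5):
    # "Excise" the lowest-energy leaves (stand-in for perceptual entropy).
    energies = np.array([np.sum(l ** 2) for l in leaves])
    keep = set(np.argsort(energies)[::-1][: int(len(leaves) * keep_ratio)])
    return [l if i in keep else np.zeros_like(l) for i, l in enumerate(leaves)]

def inverse_step(a, d):
    x = np.empty(a.size * 2)
    x[0::2] = (a + d) / np.sqrt(2)
    x[1::2] = (a - d) / np.sqrt(2)
    return x

def iwpd(leaves):
    # Inverse synthesis: merge sibling pairs back up the tree.
    nodes = list(leaves)
    while len(nodes) > 1:
        nodes = [inverse_step(nodes[i], nodes[i + 1])
                 for i in range(0, len(nodes), 2)]
    return nodes[0]

x = np.sin(2 * np.pi * 440 / 8000 * np.arange(1024))
leaves = wpd(x, levels=3)
kept = prune(leaves, keep_ratio=0.5)
y = iwpd(kept)
```

With no pruning the transform is orthogonal, so `iwpd(wpd(x))` reconstructs the input exactly; the excision step trades that exactness for computation.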
Derivative 1.2: Neural Network-Based Spectral Reconstruction
  • Enabling Description: After excising a subset of sub-bands (e.g., every odd-indexed frequency bin from an STFT), the remaining sub-bands are processed for noise reduction. The reconstruction of the excised bands is performed by a Generative Adversarial Network (GAN). The generator network is a lightweight convolutional neural network (CNN) that takes the processed, sparse sub-band vector as input and outputs a fully dense, reconstructed vector. The discriminator network is trained to distinguish between original, complete sub-band vectors and the generator's reconstructed vectors. This system learns the statistical relationships between frequency bands, allowing it to generate highly accurate reconstructions that preserve harmonic structures, significantly outperforming linear or spline interpolation. The model is trained on a large corpus of speech data, such as the LibriSpeech dataset.
  • Mermaid.js Diagram:
    sequenceDiagram
        participant A as Audio Input
        participant B as STFT
        participant C as Exciser
        participant D as Processor
        participant G as GAN Reconstructor
        participant S as iSTFT
    
        A->>B: Process Frame
        B->>C: Full Spectrum
        C->>D: Remaining Bands (Even)
        C-->>G: Excision Pattern (Odd Indices)
        D->>G: Enhanced Bands (Even)
        G->>S: Reconstructed Full Spectrum
        S->>A: Enhanced Audio Frame
    
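A minimal sketch of the excise-and-reconstruct interface follows. Linear interpolation of magnitude and unwrapped phase stands in for the learned GAN generator (training a CNN is out of scope here); the window length and function names are illustrative.

```python
import numpy as np

def excise_odd_bins(spectrum):
    # Keep only the even-indexed STFT bins; odd bins are discarded.
    return spectrum[0::2]

def reconstruct(even_bins, n_bins):
    # Placeholder for the GAN generator: interpolate magnitude and
    # (unwrapped) phase across the missing odd-indexed bins.
    idx_known = np.arange(0, n_bins, 2)
    idx_all = np.arange(n_bins)
    mag = np.interp(idx_all, idx_known, np.abs(even_bins))
    phase = np.interp(idx_all, idx_known, np.unwrap(np.angle(even_bins)))
    return mag * np.exp(1j * phase)

rng = np.random.default_rng(0)
frame = np.fft.rfft(np.hanning(512) * rng.standard_normal(512))
sparse = excise_odd_bins(frame)
dense = reconstruct(sparse, frame.size)
```

The trained generator described above would replace `reconstruct` with a forward pass of the CNN, keeping the same sparse-in / dense-out signature.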

Axis 2: Operational Parameter Expansion

Derivative 2.1: Massively Parallel Processing for Volumetric Acoustic Imaging
  • Enabling Description: The method is scaled to process data from a large-aperture spherical microphone array comprising 4096 individual MEMS microphones for real-time 3D acoustic imaging. The input is not a single audio signal but a 4096-channel data stream. The sub-band division is performed on a GPU using a batched CUDA-based FFT kernel. The excision is applied across both frequency and spatial domains; sub-bands corresponding to spatial sectors with energy below a dynamic threshold are excised to focus computation on active sound sources. Processing involves beamforming and dereverberation on the remaining sub-bands. Reconstruction uses a 4D spatio-spectral interpolation model. The system processes audio at 96kHz to resolve ultrasonic frequencies for detailed source localization, requiring a throughput in the teraflop range.
  • Mermaid.js Diagram:
    graph LR
        subgraph Input Stage
            M1[Mic 1]
            M2[Mic 2]
            Mx[Mic ...]
            M4096[Mic 4096]
        end
        subgraph GPU Processing Core
            A[Batched FFT]
            B[Spatio-Spectral Excision]
            C[Beamforming/Denoising]
            D[4D Interpolation/Reconstruction]
            E[Batched iFFT & Synthesis]
        end
        M1 -- Channel 1 --> A
        M4096 -- Channel 4096 --> A
        A --> B --> C --> D --> E
        E --> F[3D Acoustic Image Stream]
    
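The batched sub-band division and spatio-spectral excision can be sketched with numpy standing in for the CUDA kernels; 64 channels stand in for the 4096-microphone array, and the 25th-percentile threshold below is an illustrative choice of "dynamic threshold".

```python
import numpy as np

rng = np.random.default_rng(0)
n_mics, frame = 64, 256
x = rng.standard_normal((n_mics, frame))

# Batched sub-band division: one FFT per channel in a single vectorized call
# (the GPU implementation would issue the same batch to a CUDA FFT kernel).
X = np.fft.rfft(x, axis=1)

# Spatio-spectral excision: zero channel/bin cells whose energy falls below
# a dynamic threshold, focusing later processing on active sound sources.
energy = np.abs(X) ** 2
threshold = np.percentile(energy, 25)
mask = energy >= threshold
X_kept = np.where(mask, X, 0)

# Synthesis back to the time domain (beamforming/denoising omitted).
y = np.fft.irfft(X_kept, n=frame, axis=1)
```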
Derivative 2.2: Application in Infrasonic Geopolitical Monitoring
  • Enabling Description: The method is applied to signals from the International Monitoring System (IMS), which uses infrasound arrays to detect nuclear detonations. These signals are extremely low frequency (0.01 Hz to 10 Hz) and recorded over continental distances. The raw signal from an array is divided into narrow sub-bands using a high-resolution FFT (e.g., 2^24 points). Sub-bands known to contain persistent microbarom noise from oceanic wave interactions are excised. The remaining bands are processed using adaptive filters to enhance transient events. Reconstruction is critical to avoid artifacts that could be misinterpreted as treaty violations. The processed and reconstructed signals are synthesized to provide a clear signal for event analysis and source triangulation.
  • Mermaid.js Diagram:
    stateDiagram-v2
        [*] --> Receiving: IMS Array Data (0.01-10Hz)
        Receiving --> SubBandDivision: Long-window FFT
        SubBandDivision --> Excision: Remove Microbarom Bands
        Excision --> Processing: Enhance Transients
        Processing --> Reconstruction: Interpolate Excised Bands
        Reconstruction --> Synthesis: Inverse FFT
        Synthesis --> Analysis: Event Detection & Triangulation
        Analysis --> [*]
    
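The excision of microbarom bands can be illustrated with a short numpy sketch. A 2^14-point transform stands in for the 2^24-point FFT, and the 0.12-0.35 Hz band edges are assumed illustrative values for the microbarom notch (real microbarom energy clusters near 0.2 Hz).

```python
import numpy as np

fs = 20.0                 # 20 Hz sampling covers the 0.01-10 Hz band
n = 2 ** 14               # stand-in for the 2**24-point FFT in the text
t = np.arange(n) / fs

# Synthetic trace: a transient of interest plus persistent microbarom energy.
transient = np.exp(-((t - 400) ** 2) / 50) * np.sin(2 * np.pi * 1.5 * t)
microbarom = 0.5 * np.sin(2 * np.pi * 0.2 * t)
signal = transient + microbarom

spectrum = np.fft.rfft(signal)
freqs = np.fft.rfftfreq(n, d=1 / fs)

# Excise the microbarom sub-bands, keeping all others for transient analysis.
band = (freqs >= 0.12) & (freqs <= 0.35)
spectrum[band] = 0
cleaned = np.fft.irfft(spectrum, n=n)
```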

Axis 3: Cross-Domain Application

Derivative 3.1: Accelerated Magnetic Resonance Imaging (MRI) Scans
  • Enabling Description: The method is applied to the raw k-space data acquired during an MRI scan. The k-space data, which is the 2D or 3D Fourier transform of the image, is treated as a signal. It is divided into sub-bands (regions of k-space). The "excising" step corresponds to a compressed sensing acquisition pattern, where peripheral, high-frequency k-space regions are deliberately under-sampled or skipped, drastically reducing scan time. The remaining acquired sub-bands (central k-space) are processed to correct for motion artifacts. The non-acquired, "excised" sub-bands are then reconstructed using an iterative algorithm (e.g., Total Variation minimization or a deep learning model) that leverages the sparsity of the underlying medical image. The full k-space is then synthesized via an inverse Fourier transform to produce the final diagnostic image.
  • Mermaid.js Diagram:
    flowchart TD
        A[MRI RF Coil Signal] --> B{k-Space Acquisition};
        B -- Full Sampling --> C[Standard k-Space Data];
        B -- Compressed Sensing --> D["Sub-Sampled k-Space (Remaining Bands)"];
        D --> E{Motion Correction};
        subgraph Image Reconstruction
            E --> F{"Reconstruct Missing k-Space (Excised Bands)"};
            F --> G[Full Reconstructed k-Space];
        end
        G --> H{Inverse Fourier Transform};
        H --> I[High-Resolution MR Image];
    
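A toy version of the k-space excision and reconstruction: a synthetic square image stands in for anatomy, the mask below is a simple variable-density sampling pattern (dense centre, sparse periphery), and zero-filled reconstruction stands in for the iterative TV-minimization or deep-learning reconstruction described above.

```python
import numpy as np

rng = np.random.default_rng(1)
img = np.zeros((64, 64))
img[24:40, 24:40] = 1.0          # simple synthetic "anatomy"

k = np.fft.fftshift(np.fft.fft2(img))   # full k-space (reference only)

# Acquisition mask: fully sample the 16x16 k-space centre, and randomly keep
# only ~20% of the high-frequency periphery (the "excised" sub-bands).
mask = rng.random((64, 64)) < 0.2
mask[24:40, 24:40] = True

k_acquired = np.where(mask, k, 0)

# Zero-filled reconstruction: the simplest stand-in for the iterative
# compressed-sensing solver that would fill in the excised regions.
recon = np.abs(np.fft.ifft2(np.fft.ifftshift(k_acquired)))
```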
Derivative 3.2: High-Frequency Trading Data Compression and Analysis
  • Enabling Description: A time-series of stock market order book data is treated as a signal. The signal is divided into sub-bands using a Short-Time Fourier Transform (STFT) to create a spectrogram of market activity. The high-frequency sub-bands, representing algorithmic noise trading and fleeting quote changes, are excised to reduce the dataset size and computational load for trend analysis. The remaining lower-frequency sub-bands, representing more significant market movements, are processed using a Long Short-Term Memory (LSTM) network to predict price trends. The excised bands are then statistically reconstructed to provide a complete, but denoised, data stream for back-testing and risk modeling before being synthesized back into a time-series.
  • Mermaid.js Diagram:
    sequenceDiagram
        participant T as Tick Data Stream
        participant W as Windowing & STFT
        participant E as High-Frequency Exciser
        participant L as LSTM Trend Predictor
        participant R as Statistical Reconstructor
        participant S as Synthesis
    
        T->>W: Ingest Market Data
        W->>E: Market Spectrogram
        E->>L: Low-Frequency Bands
        L-->>L: Predict Trend
        E->>R: Low-Frequency Bands
        R->>S: Denoised Full Spectrogram
        S->>T: Filtered Data for Trading Algo
    
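The excision step for market data reduces to a band-limited spectrogram, sketched below on a synthetic price series. The non-overlapping Hann-windowed STFT and the keep-the-lowest-quarter-of-bins cutoff are illustrative simplifications; the LSTM and statistical reconstruction stages are omitted.

```python
import numpy as np

rng = np.random.default_rng(2)
# Synthetic mid-price series: slow trend plus fast "algorithmic" jitter.
n = 4096
trend = np.cumsum(0.01 * rng.standard_normal(n))
jitter = 0.2 * rng.standard_normal(n)
price = 100 + trend + jitter

win = 256
frames = price.reshape(-1, win)                       # non-overlapping windows
spec = np.fft.rfft(frames * np.hanning(win), axis=1)  # market "spectrogram"

# Excise high-frequency sub-bands, keeping only the low-frequency rows
# that carry the slower, more significant market movements.
cutoff = spec.shape[1] // 4
low = spec[:, :cutoff]
reduction = 1 - low.size / spec.size
```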

Axis 4: Integration with Emerging Tech

Derivative 4.1: AI-Driven Adaptive Excision for Voice Assistants
  • Enabling Description: The method is integrated into the audio front-end of a smart speaker. A reinforcement learning (RL) agent dynamically controls the excision process. The agent's state is defined by real-time acoustic parameters: Signal-to-Noise Ratio (SNR), presence of a keyword (e.g., "Alexa"), and the number of active speakers. The agent's action is to select an excision pattern (e.g., excise 25%, 50%, or 75% of sub-bands) and a reconstruction algorithm (linear, spline, or neural). The reward function is R = w1 * (CPU_savings) - w2 * (WER), where WER is the Word Error Rate from the downstream speech recognizer, so that compute savings are rewarded and recognition errors are penalized. This allows the device to conserve power during idle listening but instantly allocate full processing fidelity when a keyword is detected, optimizing the trade-off between power consumption and recognition accuracy.
  • Mermaid.js Diagram:
    graph TD
        A[Mic Audio] --> B{Feature Extraction};
        B --> C[State: SNR, Keyword, etc.];
        C --> D["RL Agent (Policy Network)"];
        D -- Action: Excision % --> E{Sub-band Excision};
        A --> F{Divide into Sub-bands};
        F --> E;
        E --> G{Enhancement Processing};
        G --> H{Reconstruction};
        D -- Action: Recon. Algo --> H;
        H --> I{Synthesis};
        I --> J{ASR Engine};
        J -- Word Error Rate --> K{Reward Calculation};
        C --> K;
        K -- Reward --> D;
    
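The reward and action selection above can be sketched directly; an epsilon-greedy lookup over a small Q-table stands in for the policy network, and the weights and action set are illustrative.

```python
import random

# Actions: (excision ratio, reconstruction algorithm) pairs from the text.
ACTIONS = [(0.25, "linear"), (0.50, "spline"), (0.75, "neural")]

def reward(cpu_savings, wer, w1=1.0, w2=2.0):
    # Higher CPU savings are rewarded; a higher word error rate is penalized.
    return w1 * cpu_savings - w2 * wer

def select_action(q_values, epsilon=0.1):
    # Epsilon-greedy over the action set; a trained policy network would
    # replace this table lookup.
    if random.random() < epsilon:
        return random.randrange(len(ACTIONS))
    return max(range(len(ACTIONS)), key=lambda i: q_values[i])
```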
Derivative 4.2: IoT-Edge Data Reduction with Blockchain Anchoring
  • Enabling Description: In an industrial predictive maintenance system, a vibration sensor on a machine generates a continuous data stream. An edge computing device applies the '575 method. It divides the vibration signal into sub-bands. A pre-trained anomaly detection model identifies which sub-bands are exhibiting nominal behavior; these are excised. The remaining sub-bands, which may contain early fault signatures, are processed and transmitted to a cloud server. To ensure data integrity, a cryptographic hash of the remaining sub-band data, along with a bitmask representing the excision pattern, is computed and anchored to a private blockchain. This creates an immutable, auditable record of the sensor data, preventing tampering while reducing data transmission and storage costs by over 90%. The full signal can be reconstructed in the cloud for detailed analysis if a fault is confirmed.
  • Mermaid.js Diagram:
    flowchart LR
        A[Vibration Sensor] --> B1;
        subgraph Edge[Edge Device]
            B1[Sub-band Division] --> B2{Anomaly Detection};
            B2 -- Nominal --> B3[Excise];
            B2 -- Anomaly --> B4[Process];
            B4 --> B5{Data Hashing};
            B3 & B4 --> B6[Assemble Payload];
        end
        B5 --> C((Blockchain Anchor));
        B6 --> D((Cloud Storage));
        D --> E{Reconstruction & Analysis};
    
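The hashing step can be sketched with the standard library; the JSON canonicalization and payload layout below are assumptions introduced for illustration (the disclosure specifies only that the kept sub-band data and the excision bitmask are hashed before anchoring).

```python
import hashlib
import json

def anchor_payload(kept_bands, excision_bitmask):
    # kept_bands: {band_index: list of float samples} for non-excised bands.
    # excision_bitmask: one 0/1 flag per sub-band (1 = excised).
    # Canonical JSON keeps the digest stable across producers before it is
    # written to the chain.
    record = {
        "bands": {str(k): v for k, v in sorted(kept_bands.items())},
        "mask": excision_bitmask,
    }
    blob = json.dumps(record, sort_keys=True, separators=(",", ":")).encode()
    return hashlib.sha256(blob).hexdigest()

digest = anchor_payload({3: [0.1, -0.2], 7: [0.05]}, [1, 1, 1, 0, 1, 1, 1, 0])
```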

Axis 5: The "Inverse" or Failure Mode

Derivative 5.1: Graceful Audio Degradation for Low-Power/High-CPU Load
  • Enabling Description: The method is implemented in a mobile telecommunication device's baseband processor. A system monitor continuously tracks CPU load and battery state of charge. When the battery drops below 20% or CPU load exceeds 95%, the system triggers a "low-power" audio mode. In this mode, the excision ratio is increased from 50% (every other sub-band) to 75% (keep one, excise three). The processing block (e.g., echo canceller) reduces its filter tap length by half. The reconstruction algorithm switches from a computationally expensive spline interpolator to a simple zero-order hold. This results in audibly lower fidelity (more "robotic" sound) but prevents the call from dropping or audio from breaking up entirely, ensuring mission-critical functionality under adverse conditions.
  • Mermaid.js Diagram:
    stateDiagram-v2
        state "Normal Mode" as Normal {
            [*] --> Processing
            Processing: Excision=50%, Full-tap filter, Spline recon.
            Processing --> Processing: CPU < 95% AND Battery > 20%
        }
        state "Low Power Mode" as LowPower {
            [*] --> Degraded
            Degraded: Excision=75%, Half-tap filter, Zero-order recon.
            Degraded --> Degraded: CPU > 95% OR Battery < 20%
        }
        Normal --> LowPower: CPU > 95% OR Battery < 20%
        LowPower --> Normal: CPU < 95% AND Battery > 20%
    
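The mode switch reduces to a threshold check; the 95% CPU and 20% battery thresholds come from the description above, while the settings dictionary returned here is an illustrative encoding of the three degraded parameters.

```python
def audio_mode(cpu_load, battery):
    # Select processing parameters from system state. Thresholds per the
    # description above; no hysteresis is modelled in this sketch.
    if cpu_load > 0.95 or battery < 0.20:
        return {"excision": 0.75, "filter_taps": "half",
                "recon": "zero-order-hold"}
    return {"excision": 0.50, "filter_taps": "full", "recon": "spline"}
```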

Combination Prior Art Scenarios

  1. Combination with WebRTC Standard: The core method is implemented as a new module within the WebRTC audio processing pipeline. The RTCPeerConnection API is extended with a degradationPreference attribute. When a developer sets this to "maintain-framerate", the browser, upon detecting network congestion via RTCP receiver reports, will trigger the excision/reconstruction process on the audio stream before it is fed to the Opus encoder. The excision pattern is signaled to the remote peer via a custom RTP header extension, allowing the receiver to perform a more informed reconstruction, thus maintaining a smooth, uninterrupted conversation at the cost of temporary fidelity reduction.

  2. Combination with GStreamer Open-Source Framework: A GStreamer plugin named subexcise is created. It acts as an audio filter element that can be dynamically inserted into any GStreamer pipeline. The element exposes properties such as excision-ratio (a float from 0.0 to 1.0) and mode (e.g., 'odd-even', 'random', 'perceptual'). A user could construct pipelines for adaptive streaming, for example an audio branch pulsesrc ! audioconvert ! subexcise excision-ratio=0.5 ! opusenc ! rtpopuspay ! udpsink running alongside a video branch rtspsrc ! rtph264depay ! avdec_h264 ! videoconvert ! autovideosink. The excision-ratio could be adjusted in real time by an application monitoring system resources.

  3. Combination with SOFA (Spatially Oriented Format for Acoustics) Standard: The method is used to create a lossy, compressed variant of the SOFA file format, tentatively named SFC (SOFA-Compressed). A utility, sofa2sfc, takes a standard SOFA file containing high-resolution Head-Related Transfer Functions (HRTFs). For each HRTF, it performs a sub-band analysis, excises perceptually masked frequency bands, and stores only the remaining bands along with metadata for the reconstruction algorithm. This reduces the file size of complex spatial audio scenes by 70-80%, enabling their use in web-based and mobile applications where bandwidth and storage are at a premium. The reconstruction to a full SOFA-compliant structure happens at load time.
