Patent 11087750
Derivative works
Defensive disclosure: derivative variations of each claim designed to render future incremental improvements obvious or non-novel.
DEFENSIVE DISCLOSURE AND PRIOR ART PUBLICATION
Title: Derivative Methods and Apparatus for Acoustic Command Detection
Publication Date: April 26, 2026
Reference Patent: US 11,087,750 ("the '750 patent")
Description: The following technical disclosures are intended to enter the public domain to be considered prior art for any future patent applications in the field of acoustic command detection, voice processing, and human-computer interaction. These disclosures describe novel extensions, applications, and variations derived from the core concepts taught in US 11,087,750.
Part 1: Derivatives of Trigger-less Voice Command Detection (Relates to Claims 1 & 8 of the '750 patent)
Derivative 1.1: Sterile Field Surgical Environment Control
Axis: Cross-Domain Application (Medical Devices)
Enabling Description: An apparatus for controlling surgical equipment (e.g., endoscopic cameras, cauterization tools, information displays) via trigger-less voice commands within a sterile operating room. The system utilizes an array of directional, far-field microphones focused on the lead surgeon. The acoustic models are specifically trained to distinguish command phrases from normal intra-operative conversation and medical terminology using semantic analysis. The system rejects any acoustic input originating from outside a pre-defined "sterile zone" around the operating table, using time-difference-of-arrival (TDOA) calculations from the microphone array for source localization. The command grammar is constrained to prevent accidental activation of critical equipment; for instance, a command to increase power to a cauterization device would require a two-part command structure with a confirmation phrase.
graph TD
    subgraph Operating Room
        A[Mic Array] -->|Acoustic Stream| B(Signal Processor)
        B -->|Source Localization| C{Sterile Zone?}
        C -->|Yes| D[Semantic Command Classifier]
        C -->|No| E[Discard]
        D -->|Valid Command| F(Surgical Device Interface)
        F --> G[Endoscope Control]
        F --> H[Display Control]
    end
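Illustrative sketch (Python): a minimal version of the gating logic above, assuming a TDOA localizer already supplies an (x, y) source estimate in a table-centered coordinate frame. The zone radius, command names, and confirmation phrase are placeholders for illustration, not values taken from the '750 patent.

import math

STERILE_ZONE_CENTER = (0.0, 0.0)   # assumed table-centered frame, metres
STERILE_ZONE_RADIUS = 1.5          # illustrative radius of the accepted zone
CRITICAL_COMMANDS = {"increase cautery power"}  # commands needing two-part confirmation

def in_sterile_zone(source_xy):
    """True if the TDOA-localized source falls inside the sterile zone."""
    dx = source_xy[0] - STERILE_ZONE_CENTER[0]
    dy = source_xy[1] - STERILE_ZONE_CENTER[1]
    return math.hypot(dx, dy) <= STERILE_ZONE_RADIUS

def handle_utterance(command, source_xy, awaiting_confirmation):
    """Return (action, pending_command) for one classified utterance."""
    if not in_sterile_zone(source_xy):
        return "discard", None                      # outside zone: ignore entirely
    if awaiting_confirmation:
        if command == "confirm":
            return "execute:" + awaiting_confirmation, None
        return "discard", None                      # anything else cancels the pending command
    if command in CRITICAL_COMMANDS:
        return "await_confirmation", command        # two-part structure for critical devices
    return "execute:" + command, None

In this sketch, handle_utterance("increase cautery power", (0.3, 0.2), None) yields ("await_confirmation", ...), and only a follow-up "confirm" from inside the zone executes the power change.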
Derivative 1.2: Predictive Intent Command Initiation using Edge AI
Axis: Integration with Emerging Tech (AI)
Enabling Description: A method wherein a mobile device uses a generative large language model (LLM) running on an edge AI accelerator (e.g., a Neural Processing Unit) to predict user intent and pre-stage a response before a voice command is fully articulated. The system analyzes the initial phonemes of an utterance in conjunction with a rich set of contextual cues (see Part 2). For example, if the user picks up the phone (motion cue) at 6:00 PM (time cue) near their home (location cue) and utters "Wha-," the system predicts a high probability of the query "What's the traffic like on my way home?" It then pre-fetches traffic data from a server. If the user completes the predicted command, the response is delivered instantly. If the command diverges, the pre-fetched data is discarded and the actual command is processed. This reduces perceived latency.
sequenceDiagram
    actor User
    participant Device
    participant Edge_AI_NPU
    participant Server
    User->>+Device: Speaks "Wha..."
    Device->>Edge_AI_NPU: Phonemes + Context Cues
    Edge_AI_NPU-->>Device: Predicts "What's traffic?" (Prob: 0.85)
    Device->>+Server: Pre-fetch traffic data
    Server-->>-Device: Traffic data
    User->>Device: Finishes command: "...t's the traffic?"
    Device-->>-User: Instantly displays traffic data
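Illustrative sketch (Python): the speculative pre-fetch race described above, with stand-in functions in place of the on-device LLM and the server round trip. The 0.8 threshold, the predict_intent rule, and the context fields are assumptions for illustration only.

PREFETCH_THRESHOLD = 0.8  # assumed confidence needed before speculative fetching

def predict_intent(prefix, context):
    """Stand-in for the NPU-hosted LLM; a real system runs a learned model."""
    if prefix.lower().startswith("wha") and context.get("hour", 0) >= 17 and context.get("near_home"):
        return "what's the traffic like on my way home", 0.85
    return "", 0.0

def fetch(query):
    """Stand-in for the server round trip."""
    return {"query": query, "payload": "traffic data"}

def handle_utterance(prefix, final_query, context):
    predicted, prob = predict_intent(prefix, context)
    staged = fetch(predicted) if prob >= PREFETCH_THRESHOLD else None  # speculative pre-fetch
    if staged is not None and final_query == predicted:
        return staged                       # prediction held: respond with near-zero latency
    return fetch(final_query)               # diverged or no prediction: normal path

result = handle_utterance("Wha", "what's the traffic like on my way home",
                          {"hour": 18, "near_home": True})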
Derivative 1.3: Infrasonic Industrial Machinery Control
Axis: Operational Parameter Expansion (Frequency)
Enabling Description: A method for detecting voice commands in extreme-noise industrial environments (e.g., stamping plants, engine rooms) by transposing human speech into the infrasonic frequency range (1-20 Hz). The user speaks into a noise-canceling microphone headset, which modulates the speech onto an infrasonic carrier wave transmitted via a dedicated transducer. A receiver on the industrial controller is tuned to this infrasonic channel, which is significantly below the frequency spectrum of ambient machine noise. The receiver demodulates the signal back into an audible frequency for processing by the voice command detection engine. This effectively isolates the command signal from environmental noise interference.
graph TD
    A[User Speech] --> B(Noise-Canceling Mic)
    B --> C(Infrasonic Modulator)
    C --> D((Transducer))
    subgraph High-Noise Environment
        D -- "Infrasonic Wave (5 Hz)" --> E((Receiver))
    end
    E --> F(Infrasonic Demodulator)
    F --> G(Voice Command Processor)
    G --> H[Machine Control Unit]
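Illustrative sketch (Python): the modulate/demodulate structure of the channel, expressed as simple double-sideband amplitude modulation with coherent demodulation. The 5 Hz carrier, sample rate, and filter settings are placeholders; a practical system would have to address the very limited bandwidth of such a channel and the specifics of the transducer hardware.

import numpy as np
from scipy.signal import butter, filtfilt

FS = 1000.0          # illustrative sample rate of the transducer channel, Hz
CARRIER_HZ = 5.0     # assumed infrasonic carrier

def modulate(baseband, fs=FS, fc=CARRIER_HZ):
    """Amplitude-modulate a band-limited baseband signal onto the carrier."""
    t = np.arange(len(baseband)) / fs
    return baseband * np.cos(2 * np.pi * fc * t)

def demodulate(rx, fs=FS, fc=CARRIER_HZ, cutoff_hz=2.0):
    """Coherent demodulation: mix back to baseband and low-pass filter."""
    t = np.arange(len(rx)) / fs
    mixed = rx * np.cos(2 * np.pi * fc * t) * 2.0
    b, a = butter(4, cutoff_hz / (fs / 2))
    return filtfilt(b, a, mixed)

test_envelope = np.sin(2 * np.pi * 1.0 * np.arange(2000) / FS)   # 1 Hz test signal
recovered = demodulate(modulate(test_envelope))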
Derivative 1.4: Bone-Conducted Command Detection via Piezoelectric Sensors
Axis: Material & Component Substitution
Enabling Description: A system that replaces microphones with an array of piezoelectric film sensors integrated into the chassis of a wearable device (e.g., smart glasses, helmet). These sensors detect voice commands as mechanical vibrations conducted through the user's skull (bone conduction). The system processes these vibration signatures instead of acoustic waves. This method is immune to high levels of ambient airborne noise and is effective for covert operations or when the user is wearing breathing apparatus. The processing unit uses a machine learning model trained on the unique spectral patterns of bone-conducted vibrations, which differ from airborne acoustics.
stateDiagram-v2
    [*] --> Idle
    Idle --> Listening: Vibration Detected
    Listening --> Processing: Vibration pattern matches command signature
    Processing --> Executing: Command validated
    Executing --> Idle: Task complete
    Processing --> Idle: Invalid command
    Listening --> Idle: No command signature
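Illustrative sketch (Python): a small state machine mirroring the diagram above. The string-valued events and the signature set are placeholders for the output of the trained bone-conduction model.

from enum import Enum, auto

class State(Enum):
    IDLE = auto()
    LISTENING = auto()
    PROCESSING = auto()
    EXECUTING = auto()

KNOWN_SIGNATURES = {"take photo", "start recording"}   # illustrative command signatures

def step(state, event):
    """Advance the detector by one event (events are strings for illustration)."""
    if state is State.IDLE and event == "vibration_detected":
        return State.LISTENING
    if state is State.LISTENING:
        return State.PROCESSING if event in KNOWN_SIGNATURES else State.IDLE
    if state is State.PROCESSING:
        return State.EXECUTING if event == "validated" else State.IDLE
    if state is State.EXECUTING and event == "task_complete":
        return State.IDLE
    return state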
Derivative 1.5: Contextual Command Muting ("Acoustic Cloaking")
Axis: The "Inverse" or Failure Mode
Enabling Description: A method for intelligently disabling trigger-less command detection based on contextual cues indicating a high probability that any detected speech is not intended for the device. The system integrates with the user's calendar, GPS, and ambient audio analysis. If the device detects that it is in a location tagged as "Movie Theater," or that a calendar event marked "Confidential Meeting" is in progress, it enters a "Muted" state and ignores all voice input. Furthermore, if the system's VAD detects multiple distinct speakers within a short time frame (indicating a conversation), it temporarily raises the confidence threshold required to initiate a command, effectively requiring a more explicit and clear utterance to activate.
flowchart LR
    A[Acoustic Input] --> B{Context Check}
    subgraph Context Sources
        C[Calendar]
        D[GPS Location]
        E[Multi-Speaker VAD]
    end
    C & D & E --> B
    B -- Restricted context --> F[Enter Muted State]
    F --> G[Ignore Input]
    B -- Permissive context --> H[Process Input Normally]
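Illustrative sketch (Python): the gating policy above, assuming calendar, location, and VAD results are already available as simple values. The tag names and threshold numbers are assumptions made for illustration.

RESTRICTED_LOCATIONS = {"movie theater"}           # illustrative location tags
RESTRICTED_EVENTS = {"confidential meeting"}
BASE_THRESHOLD = 0.6                               # assumed baseline command confidence
CONVERSATION_THRESHOLD = 0.85                      # raised threshold when several speakers are present

def command_threshold(location_tag, calendar_event, distinct_speakers):
    """Return None to mute entirely, otherwise the confidence a command must reach."""
    if location_tag in RESTRICTED_LOCATIONS or calendar_event in RESTRICTED_EVENTS:
        return None                                # muted state: ignore all voice input
    if distinct_speakers > 1:
        return CONVERSATION_THRESHOLD              # conversation in progress: be stricter
    return BASE_THRESHOLD

def should_accept(confidence, location_tag, calendar_event, distinct_speakers):
    threshold = command_threshold(location_tag, calendar_event, distinct_speakers)
    return threshold is not None and confidence >= threshold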
Part 2: Derivatives of Low-Power Detection with Contextual Cues (Relates to Claims 15 & 22 of the '750 patent)
Derivative 2.1: Hyper-Contextual IoT Sensor Fusion
Axis: Integration with Emerging Tech (IoT)
Enabling Description: A method where the mobile device acts as a central command unit, fusing contextual data from a distributed network of low-power IoT sensors via a mesh protocol (e.g., Thread, Zigbee). These sensors provide real-time data on room occupancy (PIR sensors), ambient light levels, temperature, air pressure, and even specific device states (e.g., a smart TV is on). This hyper-contextual data stream is used to dynamically adjust the acoustic processing pipeline. For example, if occupancy sensors indicate the user is alone and the TV is off, the system uses a highly sensitive acoustic model. If multiple people are present and the TV is on, it switches to a noise-robust model and uses TDOA to focus only on the device owner's voiceprint.
classDiagram
    class MobileDevice {
        -activeAcousticModel
        +processAcousticInput(audio)
        -updateContext(contextData)
    }
    class IoTSensor {
        <<interface>>
        +readData()
    }
    IoTSensor <|-- PIRSensor
    IoTSensor <|-- LightSensor
    IoTSensor <|-- TVStateSensor
    MobileDevice o-- "many" IoTSensor : Fuses data from
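Illustrative sketch (Python): the model-selection step driven by fused IoT context. The sensor fields and model names are placeholders for whatever mesh payloads and acoustic models a particular hub would use.

def select_acoustic_model(context):
    """Pick a processing profile from fused IoT sensor data (illustrative rules)."""
    occupants = context.get("occupants", 1)        # from PIR / occupancy sensors
    tv_on = context.get("tv_on", False)            # from a smart-TV state report
    if occupants <= 1 and not tv_on:
        return {"model": "high_sensitivity", "beamform": False}
    return {"model": "noise_robust", "beamform": True,    # TDOA focus on the owner's voiceprint
            "target_voiceprint": context.get("owner_voiceprint")}

profile = select_acoustic_model({"occupants": 3, "tv_on": True, "owner_voiceprint": "vp_001"})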
Derivative 2.2: Livestock Biometric and Environmental Monitoring
Axis: Cross-Domain Application (AgTech)
Enabling Description: A system for monitoring livestock health where an animal-worn sensor package (e.g., an ear tag or collar) uses the principles of low-power contextual detection. The device remains in a sleep state, conserving power. It is activated by a combination of acoustic and contextual cues. For example, it wakes upon detecting a specific vocalization pattern (acoustic cue) that co-occurs with a sudden spike in body temperature and a drop in motion (contextual cues from integrated biometric sensors). This event could signify distress. Upon waking, the device records a longer audio snippet and transmits it, along with the contextual data, to a central farm management system for analysis by a veterinarian.
stateDiagram-v2
    state "Low-Power Sleep" as Sleep
    state "Active Monitoring" as Active
    [*] --> Sleep
    Sleep --> Active: (Acoustic Distress) AND (High Temp) AND (Low Motion)
    Active --> Transmitting: Data packet compiled
    Transmitting --> Sleep: Transmission complete
    Active --> Sleep: Timeout / False Trigger
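Illustrative sketch (Python): the wake rule from the state diagram, with illustrative thresholds for distress score, temperature rise, and motion. A deployed tag would tune these per species and sensor package.

DISTRESS_SCORE_MIN = 0.7   # assumed acoustic distress confidence
TEMP_SPIKE_C = 1.5         # assumed rise over the animal's rolling baseline, in degrees C
MOTION_FLOOR = 0.1         # assumed activity index below which the animal is "still"

def should_wake(distress_score, temp_delta_c, motion_index):
    """Wake only when the acoustic cue and both contextual cues co-occur."""
    return (distress_score >= DISTRESS_SCORE_MIN and
            temp_delta_c >= TEMP_SPIKE_C and
            motion_index <= MOTION_FLOOR)

wake = should_wake(distress_score=0.82, temp_delta_c=1.9, motion_index=0.03)  # -> True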
Derivative 2.3: Multi-Source Energy Harvesting for Contextual Sensors
Axis: Material & Component Substitution
Enabling Description: A mobile device where the low-power processor and its associated contextual sensors (accelerometer, GPS, etc.) are powered independently from the main battery by a dedicated multi-source energy harvesting module. This module comprises a photovoltaic film on the device's surface, a piezoelectric generator to convert kinetic energy from motion, and an RF energy harvester to capture power from ambient Wi-Fi and cellular signals. A power management IC (PMIC) selects the optimal energy source or combines sources to charge a supercapacitor. This ensures that the "always-on" contextual awareness function can operate indefinitely without draining the main battery, only drawing from it when the main processor must be engaged.
graph TD
    subgraph Energy Sources
        A[Photovoltaic Film]
        B[Piezoelectric Generator]
        C[RF Harvester]
    end
    subgraph Power Management
        A & B & C --> D[PMIC]
        D --> E[Supercapacitor]
    end
    subgraph Low-Power Domain
        E --> F(Low-Power CPU)
        F --> G[Context Sensors]
    end
    H[Main Battery] --> I(Main CPU)
    F -- Wake-up signal --> I
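Illustrative sketch (Python): the PMIC's source-selection policy described above. Real harvester power levels are in the microwatt-to-milliwatt range; the field names and decision rules here are assumptions for illustration.

def plan_power(harvest_mw, load_mw, supercap_soc):
    """Decide how to power the always-on context domain for the next interval.

    harvest_mw: instantaneous power per source, e.g. {"pv": 2.0, "piezo": 0.3, "rf": 0.05}
    load_mw: draw of the low-power CPU plus context sensors
    supercap_soc: supercapacitor state of charge, 0.0 to 1.0
    """
    live_sources = [name for name, mw in harvest_mw.items() if mw > 0.0]
    surplus = sum(harvest_mw.values()) - load_mw
    return {
        "sources": live_sources,                          # PMIC combines every producing source
        "charge_supercap": surplus > 0.0,                 # bank any surplus energy
        "draw_supercap": surplus < 0.0 and supercap_soc > 0.05,
        "fall_back_to_main_battery": surplus < 0.0 and supercap_soc <= 0.05,
    }

plan = plan_power({"pv": 2.0, "piezo": 0.3, "rf": 0.05}, load_mw=1.8, supercap_soc=0.6)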
Part 3: Derivatives of Two-Processor System Architecture (Relates to Claims 29 & 36 of the '750 patent)
Derivative 3.1: Neuromorphic First-Stage "Cochlea" Processor
Axis: Component Substitution
Enabling Description: An apparatus where the first, low-power processor is a neuromorphic spiking neural network (SNN) chip designed to mimic the function of the human cochlea. This processor receives the raw microphone data and converts it into a sparse, event-based stream of neural spikes. It is exceptionally power-efficient for continuous monitoring. It performs rudimentary voice activity detection and phoneme classification. Only when the spike patterns indicate a high probability of human speech with command-like intonation does it wake the second processor, a conventional CPU, and passes the recognized phoneme stream (not the raw audio) for full speech recognition and natural language understanding.
sequenceDiagram
    participant Mic
    participant Neuromorphic_SNN
    participant Main_CPU
    Mic->>Neuromorphic_SNN: Continuous Audio Waveform
    loop Always-On Low-Power
        Neuromorphic_SNN->>Neuromorphic_SNN: Processes audio into spikes
    end
    Neuromorphic_SNN->>+Main_CPU: Wake-up + Phoneme Stream
    Main_CPU->>Main_CPU: ASR and NLU
    Main_CPU-->>-Neuromorphic_SNN: Return to sleep
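Illustrative sketch (Python): the handoff decision, treating the SNN output as an event stream of (phoneme, spike-rate) pairs. The rate threshold and the command-onset test are stand-ins for the chip's learned behaviour.

WAKE_SPIKE_RATE = 40.0                                   # assumed events/sec indicating voiced speech
COMMAND_PHONEME_PREFIXES = (("HH", "EH"), ("W", "AH"))   # illustrative command onsets

def first_stage(events):
    """Low-power stage: decide whether to wake the main CPU and what to hand it.

    events: list of (phoneme, spike_rate) pairs from the neuromorphic front end.
    Returns (wake, phoneme_stream); the raw audio is never forwarded.
    """
    phonemes = [p for p, _ in events]
    mean_rate = sum(r for _, r in events) / max(len(events), 1)
    looks_like_speech = mean_rate >= WAKE_SPIKE_RATE
    command_like = any(tuple(phonemes[:2]) == prefix for prefix in COMMAND_PHONEME_PREFIXES)
    return looks_like_speech and command_like, phonemes

wake, stream = first_stage([("HH", 55.0), ("EH", 48.0), ("L", 44.0)])  # wake == True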
Derivative 3.2: Blockchain-Verified Command Handoff
Axis: Integration with Emerging Tech (Blockchain)
Enabling Description: A method for high-security applications where the two-processor system creates an immutable, non-repudiable record of commands. The first low-power processor detects a potential voice command and a voiceprint from the user. It generates a hash of the preliminary acoustic features and the user's voiceprint. Upon waking the second, more powerful processor, it passes this hash along with the full acoustic data. The second processor performs full command recognition and, upon successful execution, writes a transaction to a private blockchain. The transaction includes the initial hash, the final recognized command text, a timestamp, and is cryptographically signed using a key stored in a secure enclave. This creates a verifiable audit trail for commands used in financial transactions or physical access control.
flowchart TD
    A[Acoustic Input] --> B(Low-Power Processor)
    B --> C[Extract Voiceprint + Acoustic Features]
    C --> D[Generate Hash_1]
    D --> E{Wake Main Processor?}
    E -- Yes --> F(Main Processor)
    B -- Acoustic Data --> F
    D -- Hash_1 --> F
    F --> G[Full ASR + NLU]
    G --> H{Execute Command}
    H -- Success --> I[Create Blockchain Transaction]
    I -- "(Hash_1, Command Text, Timestamp)" --> J((Private Ledger))
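Illustrative sketch (Python): the hash-and-sign handoff using only the standard library. HMAC-SHA256 stands in for the secure-enclave signature and an in-memory list stands in for the private ledger; both are assumptions made so the sketch stays self-contained.

import hashlib, hmac, json, time

ENCLAVE_KEY = b"stand-in-for-secure-enclave-key"   # illustrative; a real key never leaves the enclave
ledger = []                                        # stand-in for the private blockchain

def stage_one(acoustic_features: bytes, voiceprint: bytes) -> str:
    """Low-power processor: bind the preliminary evidence into Hash_1 before waking the main CPU."""
    return hashlib.sha256(acoustic_features + voiceprint).hexdigest()

def stage_two(hash_1: str, recognized_text: str) -> dict:
    """Main processor: after successful execution, append a signed record to the ledger."""
    prev = ledger[-1]["signature"] if ledger else "genesis"
    record = {"hash_1": hash_1, "command": recognized_text,
              "timestamp": time.time(), "prev": prev}
    payload = json.dumps(record, sort_keys=True).encode()
    record["signature"] = hmac.new(ENCLAVE_KEY, payload, hashlib.sha256).hexdigest()
    ledger.append(record)
    return record

h1 = stage_one(b"preliminary-feature-bytes", b"voiceprint-bytes")
entry = stage_two(h1, "unlock door 3")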
Derivative 3.3: Distributed Asynchronous Processing for Aerospace
Axis: Cross-Domain Application (Aerospace)
Enabling Description: In an aircraft cockpit, the "first processor" is a network of simple, redundant Digital Signal Processors (DSPs), each connected to a microphone in a different location (e.g., pilot's helmet, co-pilot's headset, ambient cabin). These DSPs perform noise cancellation and VAD locally. The "second processor" is a centralized, fault-tolerant flight control computer. When any DSP detects speech, it transmits a compressed feature vector to the central computer. The central computer uses inputs from multiple DSPs to triangulate the speaker's location and identity, fuse the feature vectors to improve recognition accuracy in high-G and high-vibration environments, and validate the command against the current flight state before execution.
graph TD
    subgraph "Distributed DSPs (First Processor Stage)"
        DSP1[Pilot Mic DSP]
        DSP2[Co-Pilot Mic DSP]
        DSP3[Cabin Mic DSP]
    end
    subgraph "Central Computer (Second Processor Stage)"
        FC[Flight Control Computer]
    end
    DSP1 -- Feature Vector --> FC
    DSP2 -- Feature Vector --> FC
    DSP3 -- Feature Vector --> FC
    FC -->|Fuse Data & Validate| EXEC[Execute Flight Command]
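Illustrative sketch (Python): the central fusion and validation step. The feature vectors are reduced to per-DSP confidence scores here, and the flight-state guard table is purely illustrative.

BLOCKED_WHILE = {                      # illustrative flight-state guard rails
    "gear up": {"on_ground"},
    "engine shutdown": {"takeoff", "climb"},
}

def fuse_and_validate(dsp_reports, flight_phase):
    """dsp_reports: list of {"source": str, "command": str, "confidence": float}."""
    if not dsp_reports:
        return None
    # Weight each DSP's hypothesis by confidence; redundancy improves noise robustness.
    scores = {}
    for report in dsp_reports:
        scores[report["command"]] = scores.get(report["command"], 0.0) + report["confidence"]
    command = max(scores, key=scores.get)
    if flight_phase in BLOCKED_WHILE.get(command, set()):
        return None                      # inconsistent with current flight state: reject
    return command

cmd = fuse_and_validate(
    [{"source": "pilot", "command": "gear up", "confidence": 0.9},
     {"source": "copilot", "command": "gear up", "confidence": 0.7}],
    flight_phase="climb")                # -> "gear up"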
Part 4: Combination Prior Art with Open-Source Standards
Combination with RISC-V and TensorFlow Lite: A mobile device is disclosed wherein the first, low-power processor (as in claim 29) is implemented as a custom System-on-Chip (SoC) utilizing the open-source RISC-V instruction set architecture. The core is specifically designed with custom instructions to accelerate the matrix multiplication and convolution operations common in audio feature extraction (e.g., MFCCs). The voice activity detection (VAD) and limited keyword spotting models running on this processor are built and optimized using the open-source TensorFlow Lite for Microcontrollers framework. This allows for a fully open-source, highly-optimized hardware/software stack for the first stage of acoustic processing, minimizing power and licensing costs.
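Illustrative sketch (Python): the export step that feeds such a stack, assuming a trained Keras keyword-spotting model and a representative-data generator already exist (both are hypothetical here). The int8 conversion options shown are standard TensorFlow Lite converter settings for targeting TensorFlow Lite for Microcontrollers.

import tensorflow as tf

def export_kws_model(keras_model, representative_dataset):
    """Convert a trained keyword-spotting model to a fully int8 TFLite flatbuffer
    suitable for TensorFlow Lite for Microcontrollers on a RISC-V core."""
    converter = tf.lite.TFLiteConverter.from_keras_model(keras_model)
    converter.optimizations = [tf.lite.Optimize.DEFAULT]
    converter.representative_dataset = representative_dataset     # calibrates quantization ranges
    converter.target_spec.supported_ops = [tf.lite.OpsSet.TFLITE_BUILTINS_INT8]
    converter.inference_input_type = tf.int8
    converter.inference_output_type = tf.int8
    return converter.convert()           # bytes to embed in the firmware image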
Combination with Matter Protocol for IoT Control: The trigger-less voice command detection system (as in claim 1) is integrated as a feature in a smart home hub. When the system detects a command like "It's too dark in the living room," the NLU module on the hub's main processor interprets the intent. It then translates this intent into a standardized command using the open-source Matter application layer protocol. It sends a "LevelControl: MoveToLevel" command to the Zigbee/Thread address of the lighting group designated as "living room," demonstrating a seamless integration of natural language voice control with an open smart home standard.
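Illustrative sketch (Python): the intent-to-Matter translation, assuming the NLU step already yields an intent with a room and an action. The Level Control cluster's MoveToLevel command is part of the Matter specification; the mapping rules, level values, and the way the hub resolves the room to node addresses are assumptions for illustration.

def intent_to_matter_command(intent):
    """Map a lighting intent to a Matter Level Control cluster invocation (illustrative)."""
    # e.g. intent = {"room": "living room", "action": "brighten"}
    level = 200 if intent["action"] == "brighten" else 60      # Level Control levels range 0-254
    return {
        "group": intent["room"],                 # resolved to Thread/Zigbee node IDs by the hub
        "cluster": "LevelControl",
        "command": "MoveToLevel",
        "args": {"level": level, "transitionTime": 10},        # transition time in tenths of a second
    }

msg = intent_to_matter_command({"room": "living room", "action": "brighten"})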
Combination with the ONNX (Open Neural Network Exchange) Standard: A method is disclosed where the acoustic models used for voice command detection (both on the low-power and main processors) are stored and deployed using the open-source ONNX format. This decouples the model training pipeline from the specific hardware inference engine. Models can be trained in any popular framework (e.g., PyTorch, JAX) and exported to ONNX. The device's runtime environment can then use an ONNX-compatible inference engine (e.g., ONNX Runtime) that is optimized for its specific silicon (e.g., a Qualcomm Hexagon DSP or an Apple Neural Engine), allowing for flexible and portable deployment of state-of-the-art acoustic models without vendor lock-in.
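Illustrative sketch (Python): the portable deployment path using ONNX Runtime's public API. The model file name, input shape, and the assumption that the silicon only exposes the CPU execution provider are placeholders.

import numpy as np
import onnxruntime as ort

def load_acoustic_model(path="kws_model.onnx"):
    """Load an exported acoustic model with whatever execution provider the device offers."""
    providers = ["CPUExecutionProvider"]          # a vendor build might expose an NPU/DSP provider instead
    return ort.InferenceSession(path, providers=providers)

def score_frame(session, features):
    """Run one feature frame (e.g. an MFCC window) through the model."""
    input_name = session.get_inputs()[0].name
    outputs = session.run(None, {input_name: features.astype(np.float32)})
    return outputs[0]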