Patent 11087750
Derivative works
Defensive disclosure: derivative variations of each claim designed to render future incremental improvements obvious or non-novel.
DEFENSIVE DISCLOSURE AND PRIOR ART PUBLICATION
Title: Derivative Methods and Apparatus for Acoustic Command Detection
Publication Date: April 26, 2026
Reference Patent: US 11,087,750 ("the '750 patent")
Description: The following technical disclosures are intended to enter the public domain to be considered prior art for any future patent applications in the field of acoustic command detection, voice processing, and human-computer interaction. These disclosures describe novel extensions, applications, and variations derived from the core concepts taught in US 11,087,750.
Part 1: Derivatives of Trigger-less Voice Command Detection (Relates to Claims 1 & 8 of the '750 patent)
Derivative 1.1: Sterile Field Surgical Environment Control
Axis: Cross-Domain Application (Medical Devices)
Enabling Description: An apparatus for controlling surgical equipment (e.g., endoscopic cameras, cauterization tools, information displays) via trigger-less voice commands within a sterile operating room. The system utilizes an array of directional, far-field microphones focused on the lead surgeon. The acoustic models are specifically trained to distinguish command phrases from normal intra-operative conversation and medical terminology using semantic analysis. The system rejects any acoustic input originating from outside a pre-defined "sterile zone" around the operating table, using time-difference-of-arrival (TDOA) calculations from the microphone array for source localization. The command grammar is constrained to prevent accidental activation of critical equipment; for instance, a command to increase power to a cauterization device would require a two-part command structure with a confirmation phrase.
graph TD
    subgraph Operating Room
        A[Mic Array] -->|Acoustic Stream| B(Signal Processor)
        B -->|Source Localization| C{Sterile Zone?}
        C -->|Yes| D[Semantic Command Classifier]
        C -->|No| E[Discard]
        D -->|Valid Command| F(Surgical Device Interface)
        F --> G[Endoscope Control]
        F --> H[Display Control]
    end
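Illustrative sketch (Python): a minimal version of the gating logic above, assuming a TDOA localizer already supplies an (x, y) source estimate in a table-centered coordinate frame. The zone radius, command names, and confirmation phrase are placeholders for illustration, not values taken from the '750 patent.

import math

STERILE_ZONE_CENTER = (0.0, 0.0)   # assumed table-centered frame, metres
STERILE_ZONE_RADIUS = 1.5          # illustrative radius of the accepted zone
CRITICAL_COMMANDS = {"increase cautery power"}  # commands needing two-part confirmation

def in_sterile_zone(source_xy):
    """True if the TDOA-localized source falls inside the sterile zone."""
    dx = source_xy[0] - STERILE_ZONE_CENTER[0]
    dy = source_xy[1] - STERILE_ZONE_CENTER[1]
    return math.hypot(dx, dy) <= STERILE_ZONE_RADIUS

def handle_utterance(command, source_xy, awaiting_confirmation):
    """Return (action, pending_command) for one classified utterance."""
    if not in_sterile_zone(source_xy):
        return "discard", None                      # outside zone: ignore entirely
    if awaiting_confirmation:
        if command == "confirm":
            return "execute:" + awaiting_confirmation, None
        return "discard", None                      # anything else cancels the pending command
    if command in CRITICAL_COMMANDS:
        return "await_confirmation", command        # two-part structure for critical devices
    return "execute:" + command, None

In this sketch, handle_utterance("increase cautery power", (0.3, 0.2), None) yields ("await_confirmation", ...), and only a follow-up "confirm" from inside the zone executes the power change.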
Derivative 1.2: Predictive Intent Command Initiation using Edge AI
Axis: Integration with Emerging Tech (AI)
Enabling Description: A method wherein a mobile device uses a generative large language model (LLM) running on an edge AI accelerator (e.g., a Neural Processing Unit) to predict user intent and pre-stage a response before a voice command is fully articulated. The system analyzes the initial phonemes of an utterance in conjunction with a rich set of contextual cues (see Part 2). For example, if the user picks up the phone (motion cue) at 6:00 PM (time cue) near their home (location cue) and utters "Wha-," the system predicts a high probability of the query "What's the traffic like on my way home?" It then pre-fetches traffic data from a server. If the user completes the predicted command, the response is delivered instantly. If the command diverges, the pre-fetched data is discarded and the actual command is processed. This reduces perceived latency.
sequenceDiagram
    actor User
    participant Device
    participant Edge_AI_NPU
    participant Server
    User->>+Device: Speaks "Wha..."
    Device->>Edge_AI_NPU: Phonemes + Context Cues
    Edge_AI_NPU-->>Device: Predicts "What's traffic?" (Prob: 0.85)
    Device->>+Server: Pre-fetch traffic data
    Server-->>-Device: Traffic data
    User->>Device: Finishes command: "...t's the traffic?"
    Device-->>-User: Instantly displays traffic data
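Illustrative sketch (Python): the speculative pre-fetch race described above, with stand-in functions in place of the on-device LLM and the server round trip. The 0.8 threshold, the predict_intent rule, and the context fields are assumptions for illustration only.

PREFETCH_THRESHOLD = 0.8  # assumed confidence needed before speculative fetching

def predict_intent(prefix, context):
    """Stand-in for the NPU-hosted LLM; a real system runs a learned model."""
    if prefix.lower().startswith("wha") and context.get("hour", 0) >= 17 and context.get("near_home"):
        return "what's the traffic like on my way home", 0.85
    return "", 0.0

def fetch(query):
    """Stand-in for the server round trip."""
    return {"query": query, "payload": "traffic data"}

def handle_utterance(prefix, final_query, context):
    predicted, prob = predict_intent(prefix, context)
    staged = fetch(predicted) if prob >= PREFETCH_THRESHOLD else None  # speculative pre-fetch
    if staged is not None and final_query == predicted:
        return staged                       # prediction held: respond with near-zero latency
    return fetch(final_query)               # diverged or no prediction: normal path

result = handle_utterance("Wha", "what's the traffic like on my way home",
                          {"hour": 18, "near_home": True})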
Derivative 1.3: Infrasonic Industrial Machinery Control
Axis: Operational Parameter Expansion (Frequency)
Enabling Description: A method for detecting voice commands in extreme-noise industrial environments (e.g., stamping plants, engine rooms) by transposing human speech into the infrasonic frequency range (1-20 Hz). The user speaks into a noise-canceling microphone headset, which modulates the speech onto an infrasonic carrier wave transmitted via a dedicated transducer. A receiver on the industrial controller is tuned to this infrasonic channel, which is significantly below the frequency spectrum of ambient machine noise. The receiver demodulates the signal back into an audible frequency for processing by the voice command detection engine. This effectively isolates the command signal from environmental noise interference.
graph TD
    A[User Speech] --> B(Noise-Canceling Mic)
    B --> C(Infrasonic Modulator)
    C --> D((Transducer))
    subgraph High-Noise Environment
        D -- "Infrasonic Wave (5 Hz)" --> E((Receiver))
    end
    E --> F(Infrasonic Demodulator)
    F --> G(Voice Command Processor)
    G --> H[Machine Control Unit]
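Illustrative sketch (Python): the modulate/demodulate structure of the channel, expressed as simple double-sideband amplitude modulation with coherent demodulation. The 5 Hz carrier, sample rate, and filter settings are placeholders; a practical system would have to address the very limited bandwidth of such a channel and the specifics of the transducer hardware.

import numpy as np
from scipy.signal import butter, filtfilt

FS = 1000.0          # illustrative sample rate of the transducer channel, Hz
CARRIER_HZ = 5.0     # assumed infrasonic carrier

def modulate(baseband, fs=FS, fc=CARRIER_HZ):
    """Amplitude-modulate a band-limited baseband signal onto the carrier."""
    t = np.arange(len(baseband)) / fs
    return baseband * np.cos(2 * np.pi * fc * t)

def demodulate(rx, fs=FS, fc=CARRIER_HZ, cutoff_hz=2.0):
    """Coherent demodulation: mix back to baseband and low-pass filter."""
    t = np.arange(len(rx)) / fs
    mixed = rx * np.cos(2 * np.pi * fc * t) * 2.0
    b, a = butter(4, cutoff_hz / (fs / 2))
    return filtfilt(b, a, mixed)

test_envelope = np.sin(2 * np.pi * 1.0 * np.arange(2000) / FS)   # 1 Hz test signal
recovered = demodulate(modulate(test_envelope))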
Derivative 1.4: Bone-Conducted Command Detection via Piezoelectric Sensors
Axis: Material & Component Substitution
Enabling Description: A system that replaces microphones with an array of piezoelectric film sensors integrated into the chassis of a wearable device (e.g., smart glasses, helmet). These sensors detect voice commands as mechanical vibrations conducted through the user's skull (bone conduction). The system processes these vibration signatures instead of acoustic waves. This method is immune to high levels of ambient airborne noise and is effective for covert operations or when the user is wearing breathing apparatus. The processing unit uses a machine learning model trained on the unique spectral patterns of bone-conducted vibrations, which differ from airborne acoustics.
stateDiagram-v2
    [*] --> Idle
    Idle --> Listening: Vibration Detected
    Listening --> Processing: Vibration pattern matches command signature
    Processing --> Executing: Command validated
    Executing --> Idle: Task complete
    Processing --> Idle: Invalid command
    Listening --> Idle: No command signature
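Illustrative sketch (Python): a small state machine mirroring the diagram above. The string-valued events and the signature set are placeholders for the output of the trained bone-conduction model.

from enum import Enum, auto

class State(Enum):
    IDLE = auto()
    LISTENING = auto()
    PROCESSING = auto()
    EXECUTING = auto()

KNOWN_SIGNATURES = {"take photo", "start recording"}   # illustrative command signatures

def step(state, event):
    """Advance the detector by one event (events are strings for illustration)."""
    if state is State.IDLE and event == "vibration_detected":
        return State.LISTENING
    if state is State.LISTENING:
        return State.PROCESSING if event in KNOWN_SIGNATURES else State.IDLE
    if state is State.PROCESSING:
        return State.EXECUTING if event == "validated" else State.IDLE
    if state is State.EXECUTING and event == "task_complete":
        return State.IDLE
    return state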
Derivative 1.5: Contextual Command Muting ("Acoustic Cloaking")
Axis: The "Inverse" or Failure Mode
Enabling Description: A method for intelligently disabling trigger-less command detection based on contextual cues indicating a high probability that any detected speech is not intended for the device. The system integrates with the user's calendar, GPS, and ambient audio analysis. If the device detects that it is in a location tagged as "Movie Theater," or that a calendar event marked "Confidential Meeting" is in progress, it enters a "Muted" state and ignores all voice input. Furthermore, if the system's VAD detects multiple distinct speakers within a short time frame (indicating a conversation), it temporarily raises the confidence threshold required to initiate a command, effectively requiring a more explicit and clear utterance to activate.
flowchart LR
    A[Acoustic Input] --> B{Context Check}
    subgraph Context Sources
        C[Calendar]
        D[GPS Location]
        E[Multi-Speaker VAD]
    end
    C & D & E --> B
    B -- Restricted context --> F[Enter Muted State]
    F --> G[Ignore Input]
    B -- Permissive context --> H[Process Input Normally]
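Illustrative sketch (Python): the gating policy above, assuming calendar, location, and VAD results are already available as simple values. The tag names and threshold numbers are assumptions made for illustration.

RESTRICTED_LOCATIONS = {"movie theater"}           # illustrative location tags
RESTRICTED_EVENTS = {"confidential meeting"}
BASE_THRESHOLD = 0.6                               # assumed baseline command confidence
CONVERSATION_THRESHOLD = 0.85                      # raised threshold when several speakers are present

def command_threshold(location_tag, calendar_event, distinct_speakers):
    """Return None to mute entirely, otherwise the confidence a command must reach."""
    if location_tag in RESTRICTED_LOCATIONS or calendar_event in RESTRICTED_EVENTS:
        return None                                # muted state: ignore all voice input
    if distinct_speakers > 1:
        return CONVERSATION_THRESHOLD              # conversation in progress: be stricter
    return BASE_THRESHOLD

def should_accept(confidence, location_tag, calendar_event, distinct_speakers):
    threshold = command_threshold(location_tag, calendar_event, distinct_speakers)
    return threshold is not None and confidence >= threshold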
Part 2: Derivatives of Low-Power Detection with Contextual Cues (Relates to Claims 15 & 22 of the '750 patent)
Derivative 2.1: Hyper-Contextual IoT Sensor Fusion
Axis: Integration with Emerging Tech (IoT)
Enabling Description: A method where the mobile device acts as a central command unit, fusing contextual data from a distributed network of low-power IoT sensors via a mesh protocol (e.g., Thread, Zigbee). These sensors provide real-time data on room occupancy (PIR sensors), ambient light levels, temperature, air pressure, and even specific device states (e.g., a smart TV is on). This hyper-contextual data stream is used to dynamically adjust the acoustic processing pipeline. For example, if occupancy sensors indicate the user is alone and the TV is off, the system uses a highly sensitive acoustic model. If multiple people are present and the TV is on, it switches to a noise-robust model and uses TDOA to focus only on the device owner's voiceprint.
classDiagram
    class MobileDevice {
        -activeAcousticModel
        +processAcousticInput(audio)
        -updateContext(contextData)
    }
    class IoTSensor {
        <<interface>>
        +readData()
    }
    IoTSensor <|-- PIRSensor
    IoTSensor <|-- LightSensor
    IoTSensor <|-- TVStateSensor
    MobileDevice o-- "many" IoTSensor : Fuses data from
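Illustrative sketch (Python): the model-selection step driven by fused IoT context. The sensor fields and model names are placeholders for whatever mesh payloads and acoustic models a particular hub would use.

def select_acoustic_model(context):
    """Pick a processing profile from fused IoT sensor data (illustrative rules)."""
    occupants = context.get("occupants", 1)        # from PIR / occupancy sensors
    tv_on = context.get("tv_on", False)            # from a smart-TV state report
    if occupants <= 1 and not tv_on:
        return {"model": "high_sensitivity", "beamform": False}
    return {"model": "noise_robust", "beamform": True,    # TDOA focus on the owner's voiceprint
            "target_voiceprint": context.get("owner_voiceprint")}

profile = select_acoustic_model({"occupants": 3, "tv_on": True, "owner_voiceprint": "vp_001"})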
Derivative 2.2: Livestock Biometric and Environmental Monitoring
Axis: Cross-Domain Application (AgTech)
Enabling Description: A system for monitoring livestock health where an animal-worn sensor package (e.g., an ear tag or collar) uses the principles of low-power contextual detection. The device remains in a sleep state, conserving power. It is activated by a combination of acoustic and contextual cues. For example, it wakes upon detecting a specific vocalization pattern (acoustic cue) that co-occurs with a sudden spike in body temperature and a drop in motion (contextual cues from integrated biometric sensors). This event could signify distress. Upon waking, the device records a longer audio snippet and transmits it, along with the contextual data, to a central farm management system for analysis by a veterinarian.
stateDiagram-v2
    state "Low-Power Sleep" as Sleep
    state "Active Monitoring" as Active
    [*] --> Sleep
    Sleep --> Active: (Acoustic Distress) AND (High Temp) AND (Low Motion)
    Active --> Transmitting: Data packet compiled
    Transmitting --> Sleep: Transmission complete
    Active --> Sleep: Timeout / False Trigger
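Illustrative sketch (Python): the wake rule from the state diagram, with illustrative thresholds for distress score, temperature rise, and motion. A deployed tag would tune these per species and sensor package.

DISTRESS_SCORE_MIN = 0.7   # assumed acoustic distress confidence
TEMP_SPIKE_C = 1.5         # assumed rise over the animal's rolling baseline, in degrees C
MOTION_FLOOR = 0.1         # assumed activity index below which the animal is "still"

def should_wake(distress_score, temp_delta_c, motion_index):
    """Wake only when the acoustic cue and both contextual cues co-occur."""
    return (distress_score >= DISTRESS_SCORE_MIN and
            temp_delta_c >= TEMP_SPIKE_C and
            motion_index <= MOTION_FLOOR)

wake = should_wake(distress_score=0.82, temp_delta_c=1.9, motion_index=0.03)  # -> True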
Derivative 2.3: Multi-Source Energy Harvesting for Contextual Sensors
Axis: Material & Component Substitution
Enabling Description: A mobile device where the low-power processor and its associated contextual sensors (accelerometer, GPS, etc.) are powered independently from the main battery by a dedicated multi-source energy harvesting module. This module comprises a photovoltaic film on the device's surface, a piezoelectric generator to convert kinetic energy from motion, and an RF energy harvester to capture power from ambient Wi-Fi and cellular signals. A power management IC (PMIC) selects the optimal energy source or combines sources to charge a supercapacitor. This ensures that the "always-on" contextual awareness function can operate indefinitely without draining the main battery, only drawing from it when the main processor must be engaged.
graph TD
    subgraph Energy Sources
        A[Photovoltaic Film]
        B[Piezoelectric Generator]
        C[RF Harvester]
    end
    subgraph Power Management
        A & B & C --> D[PMIC]
        D --> E[Supercapacitor]
    end
    subgraph Low-Power Domain
        E --> F(Low-Power CPU)
        F --> G[Context Sensors]
    end
    H[Main Battery] --> I(Main CPU)
    F -- Wake-up signal --> I
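Illustrative sketch (Python): the PMIC's source-selection policy described above. Real harvester power levels are in the microwatt-to-milliwatt range; the field names and decision rules here are assumptions for illustration.

def plan_power(harvest_mw, load_mw, supercap_soc):
    """Decide how to power the always-on context domain for the next interval.

    harvest_mw: instantaneous power per source, e.g. {"pv": 2.0, "piezo": 0.3, "rf": 0.05}
    load_mw: draw of the low-power CPU plus context sensors
    supercap_soc: supercapacitor state of charge, 0.0 to 1.0
    """
    live_sources = [name for name, mw in harvest_mw.items() if mw > 0.0]
    surplus = sum(harvest_mw.values()) - load_mw
    return {
        "sources": live_sources,                          # PMIC combines every producing source
        "charge_supercap": surplus > 0.0,                 # bank any surplus energy
        "draw_supercap": surplus < 0.0 and supercap_soc > 0.05,
        "fall_back_to_main_battery": surplus < 0.0 and supercap_soc <= 0.05,
    }

plan = plan_power({"pv": 2.0, "piezo": 0.3, "rf": 0.05}, load_mw=1.8, supercap_soc=0.6)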
Part 3: Derivatives of Two-Processor System Architecture (Relates to Claims 29 & 36 of the '750 patent)
Derivative 3.1: Neuromorphic First-Stage "Cochlea" Processor
Axis: Component Substitution
Enabling Description: An apparatus where the first, low-power processor is a neuromorphic spiking neural network (SNN) chip designed to mimic the function of the human cochlea. This processor receives the raw microphone data and converts it into a sparse, event-based stream of neural spikes. It is exceptionally power-efficient for continuous monitoring. It performs rudimentary voice activity detection and phoneme classification. Only when the spike patterns indicate a high probability of human speech with command-like intonation does it wake the second processor, a conventional CPU, and passes the recognized phoneme stream (not the raw audio) for full speech recognition and natural language understanding.
sequenceDiagram
    participant Mic
    participant Neuromorphic_SNN
    participant Main_CPU
    Mic->>Neuromorphic_SNN: Continuous Audio Waveform
    loop Always-On Low-Power
        Neuromorphic_SNN->>Neuromorphic_SNN: Processes audio into spikes
    end
    Neuromorphic_SNN->>+Main_CPU: Wake-up + Phoneme Stream
    Main_CPU->>Main_CPU: ASR and NLU
    Main_CPU-->>-Neuromorphic_SNN: Return to sleep
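Illustrative sketch (Python): the handoff decision, treating the SNN output as an event stream of (phoneme, spike-rate) pairs. The rate threshold and the command-onset test are stand-ins for the chip's learned behaviour.

WAKE_SPIKE_RATE = 40.0                                   # assumed events/sec indicating voiced speech
COMMAND_PHONEME_PREFIXES = (("HH", "EH"), ("W", "AH"))   # illustrative command onsets

def first_stage(events):
    """Low-power stage: decide whether to wake the main CPU and what to hand it.

    events: list of (phoneme, spike_rate) pairs from the neuromorphic front end.
    Returns (wake, phoneme_stream); the raw audio is never forwarded.
    """
    phonemes = [p for p, _ in events]
    mean_rate = sum(r for _, r in events) / max(len(events), 1)
    looks_like_speech = mean_rate >= WAKE_SPIKE_RATE
    command_like = any(tuple(phonemes[:2]) == prefix for prefix in COMMAND_PHONEME_PREFIXES)
    return looks_like_speech and command_like, phonemes

wake, stream = first_stage([("HH", 55.0), ("EH", 48.0), ("L", 44.0)])  # wake == True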
Derivative 3.2: Blockchain-Verified Command Handoff
Axis: Integration with Emerging Tech (Blockchain)
Enabling Description: A method for high-security applications where the two-processor system creates an immutable, non-repudiable record of commands. The first low-power processor detects a potential voice command and a voiceprint from the user. It generates a hash of the preliminary acoustic features and the user's voiceprint. Upon waking the second, more powerful processor, it passes this hash along with the full acoustic data. The second processor performs full command recognition and, upon successful execution, writes a transaction to a private blockchain. The transaction includes the initial hash, the final recognized command text, a timestamp, and is cryptographically signed using a key stored in a secure enclave. This creates a verifiable audit trail for commands used in financial transactions or physical access control.
flowchart TD
    A[Acoustic Input] --> B(Low-Power Processor)
    B --> C[Extract Voiceprint + Acoustic Features]
    C --> D[Generate Hash_1]
    D --> E{Wake Main Processor?}
    E -- Yes --> F(Main Processor)
    B -- Acoustic Data --> F
    D -- Hash_1 --> F
    F --> G[Full ASR + NLU]
    G --> H{Execute Command}
    H -- Success --> I[Create Blockchain Transaction]
    I -- "(Hash_1, Command Text, Timestamp)" --> J((Private Ledger))
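Illustrative sketch (Python): the hash-and-sign handoff using only the standard library. HMAC-SHA256 stands in for the secure-enclave signature and an in-memory list stands in for the private ledger; both are assumptions made so the sketch stays self-contained.

import hashlib, hmac, json, time

ENCLAVE_KEY = b"stand-in-for-secure-enclave-key"   # illustrative; a real key never leaves the enclave
ledger = []                                        # stand-in for the private blockchain

def stage_one(acoustic_features: bytes, voiceprint: bytes) -> str:
    """Low-power processor: bind the preliminary evidence into Hash_1 before waking the main CPU."""
    return hashlib.sha256(acoustic_features + voiceprint).hexdigest()

def stage_two(hash_1: str, recognized_text: str) -> dict:
    """Main processor: after successful execution, append a signed record to the ledger."""
    prev = ledger[-1]["signature"] if ledger else "genesis"
    record = {"hash_1": hash_1, "command": recognized_text,
              "timestamp": time.time(), "prev": prev}
    payload = json.dumps(record, sort_keys=True).encode()
    record["signature"] = hmac.new(ENCLAVE_KEY, payload, hashlib.sha256).hexdigest()
    ledger.append(record)
    return record

h1 = stage_one(b"preliminary-feature-bytes", b"voiceprint-bytes")
entry = stage_two(h1, "unlock door 3")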
Derivative 3.3: Distributed Asynchronous Processing for Aerospace
Axis: Cross-Domain Application (Aerospace)
Enabling Description: In an aircraft cockpit, the "first processor" is a network of simple, redundant Digital Signal Processors (DSPs), each connected to a microphone in a different location (e.g., pilot's helmet, co-pilot's headset, ambient cabin). These DSPs perform noise cancellation and VAD locally. The "second processor" is a centralized, fault-tolerant flight control computer. When any DSP detects speech, it transmits a compressed feature vector to the central computer. The central computer uses inputs from multiple DSPs to triangulate the speaker's location and identity, fuse the feature vectors to improve recognition accuracy in high-G and high-vibration environments, and validate the command against the current flight state before execution.
graph TD
    subgraph "Distributed DSPs (First Processor Stage)"
        DSP1[Pilot Mic DSP]
        DSP2[Co-Pilot Mic DSP]
        DSP3[Cabin Mic DSP]
    end
    subgraph "Central Computer (Second Processor Stage)"
        FC[Flight Control Computer]
    end
    DSP1 -- Feature Vector --> FC
    DSP2 -- Feature Vector --> FC
    DSP3 -- Feature Vector --> FC
    FC -->|Fuse Data & Validate| EXEC[Execute Flight Command]
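Illustrative sketch (Python): the central fusion and validation step. The feature vectors are reduced to per-DSP confidence scores here, and the flight-state guard table is purely illustrative.

BLOCKED_WHILE = {                      # illustrative flight-state guard rails
    "gear up": {"on_ground"},
    "engine shutdown": {"takeoff", "climb"},
}

def fuse_and_validate(dsp_reports, flight_phase):
    """dsp_reports: list of {"source": str, "command": str, "confidence": float}."""
    if not dsp_reports:
        return None
    # Weight each DSP's hypothesis by confidence; redundancy improves noise robustness.
    scores = {}
    for report in dsp_reports:
        scores[report["command"]] = scores.get(report["command"], 0.0) + report["confidence"]
    command = max(scores, key=scores.get)
    if flight_phase in BLOCKED_WHILE.get(command, set()):
        return None                      # inconsistent with current flight state: reject
    return command

cmd = fuse_and_validate(
    [{"source": "pilot", "command": "gear up", "confidence": 0.9},
     {"source": "copilot", "command": "gear up", "confidence": 0.7}],
    flight_phase="climb")                # -> "gear up"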
Part 4: Combination Prior Art with Open-Source Standards
Combination with RISC-V and TensorFlow Lite: A mobile device is disclosed wherein the first, low-power processor (as in claim 29) is implemented as a custom System-on-Chip (SoC) utilizing the open-source RISC-V instruction set architecture. The core is specifically designed with custom instructions to accelerate the matrix multiplication and convolution operations common in audio feature extraction (e.g., MFCCs). The voice activity detection (VAD) and limited keyword spotting models running on this processor are built and optimized using the open-source TensorFlow Lite for Microcontrollers framework. This allows for a fully open-source, highly-optimized hardware/software stack for the first stage of acoustic processing, minimizing power and licensing costs.
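Illustrative sketch (Python): the export step that feeds such a stack, assuming a trained Keras keyword-spotting model and a representative-data generator already exist (both are hypothetical here). The int8 conversion options shown are standard TensorFlow Lite converter settings for targeting TensorFlow Lite for Microcontrollers.

import tensorflow as tf

def export_kws_model(keras_model, representative_dataset):
    """Convert a trained keyword-spotting model to a fully int8 TFLite flatbuffer
    suitable for TensorFlow Lite for Microcontrollers on a RISC-V core."""
    converter = tf.lite.TFLiteConverter.from_keras_model(keras_model)
    converter.optimizations = [tf.lite.Optimize.DEFAULT]
    converter.representative_dataset = representative_dataset     # calibrates quantization ranges
    converter.target_spec.supported_ops = [tf.lite.OpsSet.TFLITE_BUILTINS_INT8]
    converter.inference_input_type = tf.int8
    converter.inference_output_type = tf.int8
    return converter.convert()           # bytes to embed in the firmware image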
Combination with Matter Protocol for IoT Control: The trigger-less voice command detection system (as in claim 1) is integrated as a feature in a smart home hub. When the system detects a command like "It's too dark in the living room," the NLU module on the hub's main processor interprets the intent. It then translates this intent into a standardized command using the open-source Matter application layer protocol. It sends a "LevelControl: MoveToLevel" command to the Zigbee/Thread address of the lighting group designated as "living room," demonstrating a seamless integration of natural language voice control with an open smart home standard.
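Illustrative sketch (Python): the intent-to-Matter translation, assuming the NLU step already yields an intent with a room and an action. The Level Control cluster's MoveToLevel command is part of the Matter specification; the mapping rules, level values, and the way the hub resolves the room to node addresses are assumptions for illustration.

def intent_to_matter_command(intent):
    """Map a lighting intent to a Matter Level Control cluster invocation (illustrative)."""
    # e.g. intent = {"room": "living room", "action": "brighten"}
    level = 200 if intent["action"] == "brighten" else 60      # Level Control levels range 0-254
    return {
        "group": intent["room"],                 # resolved to Thread/Zigbee node IDs by the hub
        "cluster": "LevelControl",
        "command": "MoveToLevel",
        "args": {"level": level, "transitionTime": 10},        # transition time in tenths of a second
    }

msg = intent_to_matter_command({"room": "living room", "action": "brighten"})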
Combination with the ONNX (Open Neural Network Exchange) Standard: A method is disclosed where the acoustic models used for voice command detection (both on the low-power and main processors) are stored and deployed using the open-source ONNX format. This decouples the model training pipeline from the specific hardware inference engine. Models can be trained in any popular framework (e.g., PyTorch, JAX) and exported to ONNX. The device's runtime environment can then use an ONNX-compatible inference engine (e.g., ONNX Runtime) that is optimized for its specific silicon (e.g., a Qualcomm Hexagon DSP or an Apple Neural Engine), allowing for flexible and portable deployment of state-of-the-art acoustic models without vendor lock-in.
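Illustrative sketch (Python): the portable deployment path using ONNX Runtime's public API. The model file name, input shape, and the assumption that the silicon only exposes the CPU execution provider are placeholders.

import numpy as np
import onnxruntime as ort

def load_acoustic_model(path="kws_model.onnx"):
    """Load an exported acoustic model with whatever execution provider the device offers."""
    providers = ["CPUExecutionProvider"]          # a vendor build might expose an NPU/DSP provider instead
    return ort.InferenceSession(path, providers=providers)

def score_frame(session, features):
    """Run one feature frame (e.g. an MFCC window) through the model."""
    input_name = session.get_inputs()[0].name
    outputs = session.run(None, {input_name: features.astype(np.float32)})
    return outputs[0]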