Derivative works — US Patent 10134398

Defensive Disclosure: Derivative Innovations for US Patent 10,134,398 - Hotword Detection on Multiple Devices

This document outlines derivative variations and extensions of the core concepts disclosed in US Patent 10,134,398, with the strategic objective of creating defensive prior art. These disclosures aim to render future incremental improvements by competitors "obvious" or "non-novel" by comprehensively describing a range of technical alternatives, operational paradigms, cross-domain applications, integrations with emerging technologies, and failure/low-power modes, all grounded in the principles of hotword detection and inter-device arbitration.

The core claims of US Patent 10,134,398 describe a method, system, and computer-readable storage medium for coordinating hotword detection among multiple computing devices. The essence involves a first device determining a hotword likelihood, receiving a hotword likelihood from a second device, comparing these values, and based on the comparison, initiating speech recognition. A key aspect highlighted by the Federal Circuit's reversal of PTAB decisions regarding this patent family is the "exchanging messages while in a low power mode" limitation, which will be specifically addressed and expanded upon in the "Inverse or Failure Mode" derivatives.

Combination Prior Art Scenarios with Open-Source Standards

The foundational concept of multi-device hotword arbitration can be combined with existing open-source standards to enhance robustness, interoperability, and feature sets.

Combination with MQTT (Message Queuing Telemetry Transport) for IoT Environments:
The hotword confidence score exchange and arbitration messages (e.g., device identifiers, activation states) can be implemented over an MQTT broker. Each computing device acts as an MQTT client, subscribing to a common topic (e.g., hotword/scores) and publishing its calculated hotword confidence score along with its device ID and potentially other context (e.g., loudness, battery status). A designated "coordinator" device or a central server (also an MQTT client) could subscribe to these topics, perform the comparison, and publish the arbitration result (e.g., hotword/winner/device_id) to which all devices subscribe. This enables lightweight, publish/subscribe messaging for hotword coordination in large-scale IoT deployments, adhering to a widely adopted open standard for constrained devices and unreliable networks.
- Enabling Description: A local MQTT broker (e.g., Mosquitto) is deployed on the local network. Each hotword-enabled computing device integrates an MQTT client library (e.g., Paho MQTT for Python, ESP-MQTT for ESP32). Upon detecting a potential hotword and computing a confidence score S_i (Claim 1, step 2), device i publishes a JSON payload { "device_id": "ID_i", "hotword_score": S_i, "timestamp": T_i, "battery_level": B_i } to its own device-specific topic hotword/ID_i/scores. Because MQTT wildcards are valid only in subscription filters and not in publish topics, devices receive peer scores by subscribing to hotword/+/scores, where + matches exactly one topic level (here, the device ID). A "leader election" rule, for example the highest score with lexicographical device ID breaking ties, is executed locally by each device or by a central "arbiter" service. The elected leader then transitions to a full speech recognition state, while the others enter a low-power listening state or cease processing.
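
The election rule described above can be sketched in a few lines. This is a minimal illustration, not a reference implementation: the helper names `make_score_payload` and `elect_leader` are hypothetical, the broker connection is omitted, and the choice that ties go to the lexicographically smallest device ID is an assumption (the description only says ties are broken lexicographically).

```python
import json


def make_score_payload(device_id, score, timestamp, battery_level):
    """Serialize one device's report using the JSON fields named in the description."""
    return json.dumps({
        "device_id": device_id,
        "hotword_score": score,
        "timestamp": timestamp,
        "battery_level": battery_level,
    })


def elect_leader(payloads):
    """Return the device_id with the highest hotword_score.

    Assumption: ties are resolved in favor of the lexicographically
    smallest device_id, so every device reaches the same answer.
    """
    reports = [json.loads(p) for p in payloads]
    if not reports:
        return None
    best = min(reports, key=lambda r: (-r["hotword_score"], r["device_id"]))
    return best["device_id"]


# Hypothetical reports as they might arrive on hotword/<device_id>/scores.
payloads = [
    make_score_payload("kitchen", 0.82, 1000, 0.9),
    make_score_payload("hallway", 0.91, 1001, 0.5),
    make_score_payload("bedroom", 0.91, 1002, 0.7),
]
leader = elect_leader(payloads)
```

Because the rule is a pure function of the received payloads, each device can run it locally and reach the same result without a further round of messages, provided all devices see the same set of reports.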
Combination with Bluetooth Low Energy (BLE) Advertising and Scanning (Generic Access Profile):
For localized multi-device hotword detection where devices are in close proximity (e.g., a single room), BLE's advertising and scanning mechanisms (defined in the Generic Access Profile of the Bluetooth Core Specification) can facilitate the exchange of hotword confidence scores without requiring a local network infrastructure. Each computing device, upon computing a hotword confidence score, can periodically broadcast a custom BLE Advertising Packet. This packet's manufacturer-specific data field would encode the device's identifier and its current hotword confidence score. Other nearby computing devices, acting as BLE scanners, would passively listen for these advertisements, collect scores from identified peers, and perform the comparison to determine the highest score. The winning device would then proceed with full speech recognition, while others would continue advertising (potentially with a "suppressed" flag) or cease further processing.
- Enabling Description: Each computing device includes a BLE radio and controller. A custom Advertisement Data (AD) structure is defined, carried either in the Manufacturer Specific Data field or in a Service Data field (keyed to a custom 128-bit UUID, or to a 16-bit UUID in the manner of Eddystone's 0xFEAA), containing a Device ID (e.g., 4 bytes) and the hotword confidence score (e.g., a 2-byte fixed-point representation of a value in the range 0.0-1.0). When a hotword confidence score S_i is determined, device i configures its BLE Advertiser to broadcast this AD packet every 100 ms for a duration of 500 ms. Simultaneously, device i (and all other devices) operates a BLE Scanner, collecting AD packets from nearby devices. A filtering mechanism ensures only relevant hotword advertisements are processed. Local comparison logic evaluates S_i against all S_j received within a recent time window (e.g., 500 ms). The device whose own score S_i is the highest valid score proceeds to activate full speech recognition; all others halt processing or remain in a low-power hotword detection loop.
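
The 6-byte payload described above (4-byte Device ID plus 2-byte fixed-point score) could be packed as follows. This is a sketch under stated assumptions: the little-endian byte order, the 0-65535 scaling, and the function names are illustrative choices, not part of the description, and no BLE stack is involved.

```python
import struct

# Assumption: the score 0.0-1.0 is mapped linearly onto an unsigned 16-bit integer.
SCORE_SCALE = 65535


def encode_service_data(device_id: int, score: float) -> bytes:
    """Pack a 4-byte device ID and a 2-byte fixed-point score (little-endian)."""
    clamped = min(max(score, 0.0), 1.0)
    return struct.pack("<IH", device_id, round(clamped * SCORE_SCALE))


def decode_service_data(data: bytes):
    """Inverse of encode_service_data; returns (device_id, score)."""
    device_id, raw = struct.unpack("<IH", data)
    return device_id, raw / SCORE_SCALE


payload = encode_service_data(0x0A0B0C0D, 0.75)
device_id, score = decode_service_data(payload)
```

Six bytes leave ample room within a legacy 31-byte advertising payload even when a 128-bit UUID precedes them; the quantization error of this encoding is at most half of one step, i.e. 1/131070.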
Combination with WebRTC (Web Real-Time Communication) Data Channels for Peer-to-Peer Arbitration:
For a more dynamic and potentially secure peer-to-peer hotword arbitration, WebRTC Data Channels could be leveraged. After an initial discovery phase (e.g., mDNS or a simple server-based rendezvous), computing devices could establish direct peer-to-peer data channels. Once a hotword is detected and a confidence score is calculated, devices could transmit their scores and identifiers directly over these secure, low-latency data channels. This allows for real-time exchange and comparison without a central server for the arbitration logic itself, while benefiting from WebRTC's built-in NAT traversal and encryption capabilities.
- Enabling Description: Each computing device hosts a lightweight WebRTC client (e.g., using webrtc-rs for Rust, node-webrtc for Node.js). Devices discover each other on the local network via mDNS (multicast DNS) by advertising a DNS-SD service type (e.g., _hotword-arb._udp; service names are limited to 15 characters drawn from letters, digits, and hyphens). Upon discovery, devices initiate WebRTC peer connections, exchanging SDP (Session Description Protocol) offers/answers and ICE (Interactive Connectivity Establishment) candidates over a local signaling connection to establish direct data channels. When a hotword confidence score S_i is determined (Claim 1, step 2), device i sends a message {"type": "hotword_score", "device_id": "ID_i", "score": S_i} over its established data channels to all connected peers. Each device maintains a buffer of recently received scores from peers. A comparison function is triggered upon receiving new scores, identifying the device with the maximal S_k. If device i's score is highest, it proceeds with speech recognition (Claim 1, step 5); otherwise, it suppresses further action.
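
The per-device score buffer and comparison function might look like the sketch below. The class name `PeerScoreBuffer`, the 0.5-second freshness window, and the device-ID tie-break are assumptions added for illustration; the data channel transport itself is not shown.

```python
class PeerScoreBuffer:
    """Holds the most recent score per peer and decides whether this device wins."""

    def __init__(self, own_id, window_s=0.5):
        self.own_id = own_id
        self.window_s = window_s  # assumption: scores older than this are stale
        self.scores = {}          # device_id -> (score, time received)

    def record(self, message, now):
        """Store a received {"type": "hotword_score", ...} message."""
        if message.get("type") == "hotword_score":
            self.scores[message["device_id"]] = (message["score"], now)

    def should_recognize(self, own_score, now):
        """True if this device holds the maximal score among fresh entries."""
        recent = {
            dev: score
            for dev, (score, received) in self.scores.items()
            if now - received <= self.window_s
        }
        recent[self.own_id] = own_score
        # Assumption: ties go to the lexicographically smallest device ID.
        winner = min(recent, key=lambda dev: (-recent[dev], dev))
        return winner == self.own_id


buf = PeerScoreBuffer("tv")
buf.record({"type": "hotword_score", "device_id": "phone", "score": 0.95}, now=9.0)
buf.record({"type": "hotword_score", "device_id": "speaker", "score": 0.6}, now=10.0)
tv_wins = buf.should_recognize(0.8, now=10.2)
```

In this usage example the phone's higher score is ignored because it arrived more than 0.5 seconds earlier, so the TV compares itself only against the speaker.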

Derivative Variations for Core Claim Functionality (Based on Independent Claim 1)

Core Functionality: A first computing device (1) receives audio, (2) determines a first hotword likelihood, (3) receives a second hotword likelihood from a second device, (4) compares the values, and (5) initiates speech recognition based on the comparison.

Axis 1: Material & Component Substitution

Derivative 1.1: Multi-Modal Bio-Acoustic Sensor Array for Hotword Detection

Enabling Description: Instead of solely relying on traditional MEMS microphones, this derivative employs a distributed array of piezoelectric acoustic transducers integrated directly into wearable computing devices (e.g., smart rings, smart watches, bone-conduction headsets). These transducers are augmented with electroencephalography (EEG) sensors and electromyography (EMG) sensors positioned on the user's scalp and neck/jawline, respectively. The "audio data" (Claim 1, step 1) is a fusion of acoustic vibrations, cranial electrical activity, and muscle contractions associated with speech articulation, capturing not just the utterance sound but also the user's intent to speak or formulate words. The "first value" (Claim 1, step 2) is a hotword confidence score derived from a multi-kernel support vector machine (MK-SVM) that processes features extracted from all three sensor modalities (acoustic spectrogram, EEG alpha/beta band power, EMG onset/duration). Communication between devices for score exchange (Claim 1, step 3) uses ultra-wideband (UWB) radio modules for high-precision ranging and secure, short-burst data transfer, ensuring lower interference and better spatial resolution than conventional Wi-Fi or Bluetooth.
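
One way to read the multi-kernel SVM step is as a weighted sum of per-modality kernels feeding a standard SVM decision function. The sketch below assumes RBF base kernels, fixed kernel weights, and a logistic mapping from decision value to confidence; all names, weights, and feature values are hypothetical, and training is not shown.

```python
import math


def rbf(x, y, gamma):
    """Gaussian (RBF) kernel between two equal-length feature vectors."""
    return math.exp(-gamma * sum((a - b) ** 2 for a, b in zip(x, y)))


def combined_kernel(sample_a, sample_b, weights, gammas):
    """Weighted sum of one RBF kernel per modality (acoustic, EEG, EMG)."""
    return sum(
        weights[m] * rbf(sample_a[m], sample_b[m], gammas[m])
        for m in weights
    )


def decision_value(sample, support_vectors, weights, gammas, bias):
    """SVM decision function f(x) = sum_i alpha_i * y_i * K(x_i, x) + b."""
    return bias + sum(
        alpha * label * combined_kernel(sv, sample, weights, gammas)
        for sv, alpha, label in support_vectors
    )


def confidence(value):
    """Assumption: squash the decision value to (0, 1) with a logistic function."""
    return 1.0 / (1.0 + math.exp(-value))


weights = {"acoustic": 0.5, "eeg": 0.3, "emg": 0.2}   # hypothetical kernel weights
gammas = {"acoustic": 1.0, "eeg": 1.0, "emg": 1.0}    # hypothetical RBF widths
hotword = {"acoustic": [0.2, 0.4], "eeg": [0.1], "emg": [0.7, 0.3]}
background = {"acoustic": [0.9, 0.1], "eeg": [0.8], "emg": [0.0, 0.0]}
support_vectors = [(hotword, 1.0, +1), (background, 1.0, -1)]

s1 = confidence(decision_value(hotword, support_vectors, weights, gammas, bias=0.0))
```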

Mermaid Diagram:

graph TD
    A[User Utterance & Intent] --> B(Piezoelectric Acoustic Transducer)
    A --> C(EEG Sensors)
    A --> D(EMG Sensors)
    B --> E{Audio Features: Spectrogram}
    C --> F{EEG Features: Brainwave Power}
    D --> G{EMG Features: Muscle Activation}
    E & F & G --> H[Feature Fusion Module]
    H --> I{Multi-Kernel SVM Hotword Classifier}
    I --> J(First Hotword Confidence Score S1)
    J --> K[UWB Transceiver]
    K -- Transmit S1 & Receive S2 --> L[UWB Transceiver]
    L --> M(Second Hotword Confidence Score S2)
    J & M --> N{Score Comparison Logic}
    N -- Highest Score --> O[Initiate Multi-Modal Speech Recognition]
    N -- Lower Score --> P[Suppress Processing]

Derivative 1.2: Photonic Sensor Network with Quantum-Dot Lasers for Acoustic Detection

Enabling Description: In this advanced system, the "microphone" of each computing device is replaced by a photonic acoustic sensor network. Small, integrated quantum-dot laser diodes emit light into a resonant cavity where sound waves cause minute deformations. These deformations modulate the laser output's phase and frequency, which are then detected by a photodiode array. The "audio data" (Claim 1, step 1) is thus optical phase and frequency shift data. A dedicated field-programmable gate array (FPGA) with integrated optical coherent detection circuits performs real-time demodulation and acoustic feature extraction (e.g., optical heterodyne spectrum analysis). The hotword confidence score is computed by a deep neural network (DNN) implemented in a neuromorphic computing ASIC, optimizing for low power and high inference speed. Inter-device communication for score exchange (Claim 1, step 3) utilizes free-space optical communication (e.g., infrared laser pulses) for secure, directional, and high-bandwidth transmission over short ranges within a room, managed by beam steering micro-electromechanical systems (MEMS) mirrors.

Mermaid Diagram:

graph TD
    A["User Utterance (Sound Waves)"] --> B(Quantum-Dot Laser Diode & Resonant Cavity)
    B --> C(Photodiode Array)
    C --> D{Optical Phase/Frequency Shift Data}
    D --> E[FPGA with Coherent Detection]
    E --> F{"Acoustic Features (Heterodyne Spectrum)"}
    F --> G["Neuromorphic Hotword ASIC (DNN)"]
    G --> H(First Hotword Confidence Score S1)
    H --> I[Free-Space Optical Transceiver]
    I -- Transmit S1 & Receive S2 --> J[Free-Space Optical Transceiver]
    J --> K(Second Hotword Confidence Score S2)
    H & K --> L{Score Comparison Module}
    L -- Highest Score --> M[Initiate High-Fidelity Speech Recognition]
    L -- Lower Score --> N[Suppress Optical Processing]

Derivative 1.3: Ferroelectric Polymer Microphone Arrays with Embedded DSP

Enabling Description: The "microphone" (Claim 1, step 1) is composed of a ferroelectric polymer (e.g., PVDF) film array embedded into flexible substrates (e.g., smart fabric, wall coverings). These films generate an electrical charge in response to acoustic pressure, offering broad frequency response and high sensitivity. Each element in the array is coupled to a low-power, custom-designed Digital Signal Processor (DSP) chip that performs initial noise reduction, beamforming (to localize the sound source), and mel-frequency cepstral coefficient (MFCC) extraction. The hotword confidence score (Claim 1, step 2) is calculated by a Convolutional Recurrent Neural Network (CRNN) residing on the embedded DSP, trained specifically for low-resource inference. For score transmission (Claim 1, step 3), devices use a LoRa (Long Range) radio module operating in a license-free ISM band, providing low-power, long-range communication ideal for smart home or building environments without relying on Wi-Fi infrastructure.

Mermaid Diagram:

graph TD
    A[User Utterance] --> B(Ferroelectric Polymer Microphone Array)
    B --> C["Embedded DSP (Noise Reduction, Beamforming, MFCC)"]
    C --> D{Acoustic Features}
    D --> E[CRNN Hotword Classifier]
    E --> F(First Hotword Confidence Score S1)
    F --> G[LoRa Radio Module]
    G -- Transmit S1 & Receive S2 --> H[LoRa Radio Module]
    H --> I(Second Hotword Confidence Score S2)
    F & I --> J{Score Arbitration Logic}
    J -- Highest Score --> K[Activate Full Speech Recognition Backend]
    J -- Lower Score --> L[Enter Low-Power Monitoring State]

Axis 2: Operational Parameter Expansion

Derivative 2.1: Distributed Hotword Detection in Industrial Nanomanufacturing Cleanrooms

Enabling Description: This system operates within a Class 100 (ISO 5) nanomanufacturing cleanroom, an environment with extremely low airborne particulate counts and a tightly controlled acoustic profile, although specific machinery hums may be present. The "computing devices" (Claim 1, step 1) are cleanroom robotic arms, inspection stations, and operator consoles, each with an embedded, particle-sealed ultrasonic microphone (20 kHz-100 kHz bandwidth). The "utterance" could be a specific ultrasonic command sequence from an operator or a critical machinery acoustic signature. The "audio data" is received across devices and processed for a "hotword," which might be an ultrasonic anomaly pattern indicating equipment malfunction or a specific human vocal command. The "first value" (Claim 1, step 2) is a hotword confidence score determined by an Ensemble Kalman Filter (EnKF) based acoustic anomaly detector. Scores are exchanged (Claim 1, step 3) using fiber-optic data links to avoid EMI and maintain cleanroom integrity. The comparison (Claim 1, step 4) selects the reporting device, and the subsequent initiation of "speech recognition" (Claim 1, step 5) takes the form of a human operator alert or an automated process adjustment, distinguishing intentional commands and anomalies from environmental background noise.

Mermaid Diagram:

graph TD
    A[Ultrasonic Command/Machinery Anomaly] --> B1(Ultrasonic Mic Device 1)
    A --> B2(Ultrasonic Mic Device 2)
    B1 --> C1{Ultrasonic Audio Data}
    B2 --> C2{Ultrasonic Audio Data}
    C1 --> D1[EnKF Anomaly Detector D1]
    C2 --> D2[EnKF Anomaly Detector D2]
    D1 --> E1(S1: Anomaly Confidence Score)
    D2 --> E2(S2: Anomaly Confidence Score)
    E1 --> F1[Fiber-Optic Transmitter F1]
    E2 --> F2[Fiber-Optic Transmitter F2]
    F1 -- Transmit S1 --> G2[Fiber-Optic Receiver G2]
    F2 -- Transmit S2 --> G1[Fiber-Optic Receiver G1]
    E1 & G1 --> H1{Score Comparison H1}
    E2 & G2 --> H2{Score Comparison H2}
    H1 -- Highest Score --> I1[Initiate Automated Process Adjustment / Alert]
    H2 -- Lower Score --> J2[Standby / Ignore Anomaly]

Derivative 2.2: Deep-Sea Acoustic Hotword Detection in Oceanographic Monitoring Networks

Enabling Description: This system operates in a deep-sea environment (e.g., 2000 m depth, 4 °C, roughly 3000 psi), where communication is severely limited and acoustic propagation characteristics are complex. The "computing devices" are autonomous underwater vehicles (AUVs) and fixed seafloor hydrophone arrays, each equipped with low-frequency hydrophones (10 Hz-5 kHz bandwidth). The "utterance" (Claim 1, step 1) is a specific underwater acoustic signature, e.g., a whale call pattern indicating migration, a seismic event, or a coded command from a remote surface vessel. The "audio data" is processed for a "hotword" that represents these signatures. The "first value" (Claim 1, step 2) is a hotword confidence score determined by a matched filter algorithm against known bio-acoustic or geophonic patterns, adapted for deep-ocean acoustic distortion. Due to high latency and low bandwidth, score exchange (Claim 1, step 3) occurs via acoustic modem bursts (underwater communication), where data packets are highly compressed and error-corrected. Arbitration logic (Claim 1, step 4) selects the device with the highest confidence and closest proximity (estimated via signal arrival time differences), which then "initiates speech recognition" (Claim 1, step 5) by forwarding the full acoustic event data via satellite uplink once surfaced (if an AUV) or by initiating a local data logging routine (if a fixed array).
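
The matched-filter confidence score can be illustrated with a normalized cross-correlation against a stored template. This is a simplified sketch: it uses the energy-normalized variant so the score falls in [0, 1], it ignores the deep-ocean distortion compensation mentioned above, and the function name and sample values are hypothetical.

```python
import math


def matched_filter_score(signal, template):
    """Peak normalized cross-correlation of template against signal, clipped to [0, 1]."""
    n = len(template)
    t_norm = math.sqrt(sum(v * v for v in template))
    best = 0.0
    for start in range(len(signal) - n + 1):
        window = signal[start:start + n]
        w_norm = math.sqrt(sum(v * v for v in window))
        if w_norm == 0.0 or t_norm == 0.0:
            continue  # silent window or empty template: no evidence either way
        corr = sum(a * b for a, b in zip(window, template)) / (w_norm * t_norm)
        best = max(best, corr)
    return best


template = [1.0, -1.0, 1.0, -1.0]                       # hypothetical signature
event = [0.0, 0.0, 1.0, -1.0, 1.0, -1.0, 0.0]           # contains the signature
quiet = [0.1] * 7                                       # constant offset only

s_event = matched_filter_score(event, template)
s_quiet = matched_filter_score(quiet, template)
```

A score near 1 indicates that some window of the hydrophone data has the same shape as the template regardless of amplitude, which is convenient when received levels vary with range.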

Mermaid Diagram:

graph TD
    A[Underwater Acoustic Event] --> B1(Seafloor Hydrophone Array 1)
    A --> B2(AUV Hydrophone 2)
    B1 --> C1{"Raw Acoustic Data (Low-Freq)"}
    B2 --> C2{"Raw Acoustic Data (Low-Freq)"}
    C1 --> D1[Matched Filter Classifier D1]
    C2 --> D2[Matched Filter Classifier D2]
    D1 --> E1(S1: Signature Confidence Score)
    D2 --> E2(S2: Signature Confidence Score)
    E1 --> F1[Acoustic Modem F1]
    E2 --> F2[Acoustic Modem F2]
    F1 -- Transmit S1, Pos --> G2[Acoustic Modem G2]
    F2 -- Transmit S2, Pos --> G1[Acoustic Modem G1]
    E1 & G1 --> H1{"Arbitration Logic (Score + Proximity)"}
    E2 & G2 --> H2{"Arbitration Logic (Score + Proximity)"}
    H1 -- Highest Score/Closest --> I1[Initiate Data Uplink / Log Full Event]
    H2 -- Lower Score --> J2[Discard Event / Low-Power Monitoring]

Derivative 2.3: High-Altitude Atmospheric Hotword Detection for Aerial Survey Drones

Enabling Description: This system operates at high altitudes (e.g., 10,000m, -40°C, low air density), utilizing a fleet of solar-powered, long-endurance aerial survey drones. Each drone is equipped with ultra-sensitive microbarometers and low-noise microphones designed for thin atmospheric conditions, capable of detecting infrasound (sub-20Hz) and very low-frequency audible signals from ground events (e.g., distant explosions, unusual animal vocalizations, covert human activity). The "utterance" (Claim 1, step 1) is a specific low-frequency acoustic signature or pattern. The "audio data" is collected by these devices. The "first value" (Claim 1, step 2) is a hotword confidence score generated by a custom atmospheric acoustic propagation model combined with a deep learning classifier to filter out atmospheric turbulence noise and identify specific ground events. Inter-drone communication for score exchange (Claim 1, step 3) is handled via directional laser communication links for secure, high-bandwidth peer-to-peer transmission, maintaining stringent power budgets. The arbitration logic (Claim 1, step 4) considers both confidence scores and geographical proximity/line-of-sight to the detected event, with the winning drone initiating "speech recognition" (Claim 1, step 5) by focusing its high-resolution optical/thermal cameras and transmitting detailed imagery and full acoustic recordings back to a ground station via satellite.

Mermaid Diagram:

graph TD
    A["Ground Acoustic Event (Low-Freq)"] --> B1(Drone Mic/Microbarometer 1)
    A --> B2(Drone Mic/Microbarometer 2)
    B1 --> C1{Atmospheric Acoustic Data}
    B2 --> C2{Atmospheric Acoustic Data}
    C1 --> D1[DL Classifier + Propagation Model D1]
    C2 --> D2[DL Classifier + Propagation Model D2]
    D1 --> E1(S1: Event Confidence Score)
    D2 --> E2(S2: Event Confidence Score)
    E1 --> F1[Laser Comms Link F1]
    E2 --> F2[Laser Comms Link F2]
    F1 -- Transmit S1, GeoLoc --> G2[Laser Comms Link G2]
    F2 -- Transmit S2, GeoLoc --> G1[Laser Comms Link G1]
    E1 & G1 --> H1{"Arbitration Logic (Score + GeoProximity)"}
    E2 & G2 --> H2{"Arbitration Logic (Score + GeoProximity)"}
    H1 -- Highest Score/Best Focus --> I1[Initiate Camera Focus & Satellite Uplink]
    H2 -- Lower Score --> J2[Maintain Survey Flight Pattern]

Axis 3: Cross-Domain Application

Derivative 3.1: Automotive Cabin Hotword Arbitration for Autonomous Driving Systems

Enabling Description: In an automotive environment with multiple passengers and integrated vehicle systems, each seat or zone of an autonomous vehicle could be considered a "computing device" with its own microphone array. The "utterance" (Claim 1, step 1) could be a passenger's command (e.g., "Navigate home," "Decrease temperature"). The "audio data" is captured by microphone arrays embedded in headrests, dashboards, and ceiling panels. Each zone's local processing unit (LPU) determines a "first value" (Claim 1, step 2) representing the hotword confidence score, augmented by speaker diarization and localization data (to identify who spoke and where they are). These scores are exchanged (Claim 1, step 3) over the vehicle's CAN bus or Automotive Ethernet network. The central vehicle control unit (VCU) performs the comparison (Claim 1, step 4) and, based on the highest confidence score and confirmed speaker location/ID, initiates "speech recognition" (Claim 1, step 5) only for the designated passenger, preventing conflicting commands from other passengers or ambient noise.

Mermaid Diagram:

graph TD
    A[Passenger Utterance] --> B1(Mic Array Zone A)
    A --> B2(Mic Array Zone B)
    B1 --> C1["LPU Zone A (Hotword, Speaker ID, Localization)"]
    B2 --> C2["LPU Zone B (Hotword, Speaker ID, Localization)"]
    C1 --> D1(S1: Conf. Score + Context)
    C2 --> D2(S2: Conf. Score + Context)
    D1 -- Transmit S1 via Automotive Ethernet --> E(Vehicle Control Unit - VCU)
    D2 -- Transmit S2 via Automotive Ethernet --> E
    E --> F{"VCU Arbitration Logic (Max Score, Speaker ID Match)"}
    F -- Highest Valid Score --> G[Initiate Speech Recognition for Designated Zone]
    F -- Lower/Invalid Score --> H[Suppress Command Processing for Zone]

Derivative 3.2: Agricultural Field Robotics Hotword Detection for Crop Monitoring

Enabling Description: In large-scale agricultural fields, a network of autonomous agricultural robots (e.g., weeding robots, sensing drones, irrigation nodes) acts as "computing devices." Each robot is equipped with directional acoustic sensors and integrated environmental microphones (Claim 1, step 1) to detect specific "hotword" acoustic signatures in the field. An "utterance" could be the distress call of a specific pest species, the unique sound of a malfunctioning irrigation pump, or a specific acoustic signature of crop stress (e.g., from insect feeding). Each robot determines a "first value" (Claim 1, step 2) representing the likelihood of detecting a particular hotword signature using a spectral analysis neural network tuned for environmental acoustics. These scores are exchanged (Claim 1, step 3) over a private 5G cellular network established across the farm. The central farm management system (FMS) compares the scores (Claim 1, step 4) from all robots and, based on the highest score and location data, initiates "speech recognition" (Claim 1, step 5) by dispatching a closer inspection drone or activating targeted pest control measures, preventing redundant responses and conserving resources.

Mermaid Diagram:

graph TD
    A["Field Acoustic Signature (Pest, Pump Fault)"] --> B1(Agri-Robot 1 Acoustic Sensor)
    A --> B2(Agri-Robot 2 Acoustic Sensor)
    B1 --> C1[Spectral Analysis NN Classifier C1]
    B2 --> C2[Spectral Analysis NN Classifier C2]
    C1 --> D1(S1: Signature Confidence + GeoLoc)
    C2 --> D2(S2: Signature Confidence + GeoLoc)
    D1 -- Transmit S1 via Private 5G --> E(Farm Management System - FMS)
    D2 -- Transmit S2 via Private 5G --> E
    E --> F{"FMS Arbitration (Max Score, Nearest Location)"}
    F -- Highest Valid Score --> G["Initiate Targeted Action (Drone Dispatch, Irrigation Adjust)"]
    F -- Lower/Invalid Score --> H[Continue Monitoring]

Derivative 3.3: Patient Monitoring Hotword Arbitration in Critical Care Units

Enabling Description: In a critical care unit (CCU) of a hospital, multiple "computing devices" are deployed, including bedside monitors, smart infusion pumps, and ambient room microphones (Claim 1, step 1). The "utterance" could be a patient's vocalized distress (e.g., a cough pattern indicating respiratory distress, a cry for help), a specific alarm sound from medical equipment, or a verbal command from a nurse or doctor. The "audio data" is continuously monitored. Each device determines a "first value" (Claim 1, step 2) for a "hotword" using an acoustic event classifier trained on medical soundscapes and vocalizations, considering factors like sound source localization (e.g., originating from the patient's bed). These scores are exchanged (Claim 1, step 3) over a secure hospital-grade Wi-Fi network with QoS prioritization. A central care coordination system (CCS) performs the comparison (Claim 1, step 4). Based on the highest confidence score, patient ID, and sound source, the CCS initiates "speech recognition" (Claim 1, step 5) by routing the event to the appropriate care provider (e.g., nurse workstation alert, direct voice message to attending physician's device) and potentially logging the full audio segment for medical review, ensuring rapid and accurate response to critical events while filtering out ambient hospital noise.

Mermaid Diagram:

graph TD
    A[Patient Vocalization / Equipment Alarm] --> B1(Bedside Monitor Mic)
    A --> B2(Room Ambient Mic)
    B1 --> C1["Acoustic Event Classifier C1 (Hotword, Source Loc)"]
    B2 --> C2["Acoustic Event Classifier C2 (Hotword, Source Loc)"]
    C1 --> D1(S1: Event Conf. + Patient ID + Source)
    C2 --> D2(S2: Event Conf. + Patient ID + Source)
    D1 -- Transmit S1 via Secure Hospital Wi-Fi --> E(Care Coordination System - CCS)
    D2 -- Transmit S2 via Secure Hospital Wi-Fi --> E
    E --> F{"CCS Arbitration (Max Score, Patient ID, Source)"}
    F -- Highest Valid Score --> G[Initiate Alert & Route to Care Provider / Log Audio]
    F -- Lower/Invalid Score --> H[Continue Background Monitoring]

Axis 4: Integration with Emerging Tech

Derivative 4.1: AI-Driven Adaptive Hotword Arbitration with Reinforcement Learning

Enabling Description: The "computing devices" (Claim 1, step 1) continuously receive "audio data." The "first value" (Claim 1, step 2) is a hotword confidence score. However, the decision process (Claim 1, step 4) is enhanced by AI-driven adaptive arbitration. Each device runs a local Reinforcement Learning (RL) agent that observes the environment (e.g., ambient noise levels, user interaction history with each device, device battery status, current application context) and dynamically adjusts its hotword detection threshold and its weighting of other devices' scores. When scores are exchanged (Claim 1, step 3), they include additional contextual vectors. The RL agent's policy, continuously refined through user feedback (e.g., explicit confirmation/correction of device activation) and inferred intent, determines the optimal "winning" device by running a multi-agent deep Q-network (DQN). The DQN's output directly influences the "comparison" and "initiation of speech recognition" (Claim 1, step 5), leading to a highly personalized and context-aware hotword response system.
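
The feedback loop described above can be illustrated with a much simpler learner than a multi-agent DQN. The sketch below substitutes a single-state, epsilon-greedy value table over candidate detection thresholds; it is a stand-in to show the update rule only, and the class name, reward values, and hyperparameters are assumptions.

```python
import random


class ThresholdAgent:
    """Single-state tabular learner that picks a hotword detection threshold.

    Simplification: this replaces the deep Q-network with one value per
    candidate threshold and ignores the contextual state vector.
    """

    def __init__(self, thresholds, epsilon=0.1, learning_rate=0.2, seed=0):
        self.q = {t: 0.0 for t in thresholds}
        self.epsilon = epsilon
        self.learning_rate = learning_rate
        self.rng = random.Random(seed)

    def choose(self):
        """Explore with probability epsilon, otherwise use the best-valued threshold."""
        if self.rng.random() < self.epsilon:
            return self.rng.choice(list(self.q))
        return max(self.q, key=self.q.get)

    def update(self, threshold, reward):
        """Move the value estimate toward the observed reward."""
        self.q[threshold] += self.learning_rate * (reward - self.q[threshold])


agent = ThresholdAgent([0.5, 0.7, 0.9], epsilon=0.0)
agent.update(0.7, reward=1.0)    # user confirmed the activation
agent.update(0.5, reward=-1.0)   # user corrected a false activation
chosen = agent.choose()
```

Here a positive reward stands for explicit user confirmation and a negative reward for a correction, mirroring the feedback signals named in the description.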

Mermaid Diagram:

graph TD
    A[User Utterance + Environmental Context] --> B1(Device 1)
    A --> B2(Device 2)
    B1 --> C1[Hotword Detector + Context Extractor C1]
    B2 --> C2[Hotword Detector + Context Extractor C2]
    C1 --> D1(S1: Score, Context Vector, Device State)
    C2 --> D2(S2: Score, Context Vector, Device State)
    D1 -- "Exchange Data (via IoT Hub)" --> E(Distributed RL Agents / DQN)
    D2 -- "Exchange Data (via IoT Hub)" --> E
    E --> F{"Adaptive Arbitration Policy (Learned)"}
    F -- Optimal Device Selection --> G[Chosen Device Initiates Speech Recognition]
    F -- Others --> H[Suppress / Adjust State based on Policy]
    G --> I[User Feedback / Action]
    H --> I
    I -- Reinforcement Learning Loop --> E

Derivative 4.2: IoT Sensor-Augmented Hotword Detection for Environmental Context

Enabling Description: Each "computing device" (Claim 1, step 1) is part of a broader IoT sensor network, collecting various environmental data beyond just audio. Alongside the "audio data," the devices also gather data from ambient light sensors, PIR motion sensors, proximity sensors (IR/ultrasonic), and accelerometers/gyroscopes (to detect if a device is being held or moved). This rich contextual data is transmitted with the "first value" (Claim 1, step 2) hotword confidence score, forming an "extended value" data packet. The "comparison" (Claim 1, step 4) now involves a multi-criteria decision-making algorithm that weighs the hotword confidence score with the contextual data. For example, a device being held, showing user proximity, and having sufficient light exposure might be preferentially selected over a device on a distant shelf, even if their raw hotword scores are similar. The communication (Claim 1, step 3) uses IPv6 over Low-Power Wireless Personal Area Networks (6LoWPAN) to handle diverse sensor data alongside hotword scores.
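
A weighted-sum rule is one simple form the multi-criteria decision could take. In this sketch the criteria names, the weights, and the normalization of each sensor reading to the range 0-1 are all assumptions made for illustration.

```python
# Hypothetical criterion weights; each input is assumed normalized to 0.0-1.0.
WEIGHTS = {"score": 0.6, "held": 0.2, "proximity": 0.15, "light": 0.05}


def utility(extended_value):
    """Weighted sum of the hotword score and the contextual readings."""
    return sum(WEIGHTS[name] * extended_value[name] for name in WEIGHTS)


def select_device(extended_values):
    """Return the device ID with the highest utility (ties: smallest ID)."""
    return min(extended_values, key=lambda dev: (-utility(extended_values[dev]), dev))


extended_values = {
    "phone": {"score": 0.80, "held": 1.0, "proximity": 0.9, "light": 0.8},
    "shelf_speaker": {"score": 0.83, "held": 0.0, "proximity": 0.2, "light": 0.5},
}
selected = select_device(extended_values)
```

The example reproduces the scenario in the description: the shelf speaker has the slightly higher raw hotword score, but the held, nearby phone is selected once context is weighed in.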

Mermaid Diagram:

graph TD
    A[User Utterance] --> B1(Mic Device 1)
    A --> B2(Mic Device 2)
    B1 --> C1[Hotword Detector C1]
    B2 --> C2[Hotword Detector C2]
    B1 --> D1["IoT Sensors (Light, PIR, Proximity, Accel)"]
    B2 --> D2["IoT Sensors (Light, PIR, Proximity, Accel)"]
    C1 & D1 --> E1(Extended Value EV1: Score + Contextual Data)
    C2 & D2 --> E2(Extended Value EV2: Score + Contextual Data)
    E1 -- Transmit EV1 via 6LoWPAN --> F(Central Arbiter / Peer Devices)
    E2 -- Transmit EV2 via 6LoWPAN --> F
    F --> G{"Multi-Criteria Decision-Making (MCDM) for Arbitration"}
    G -- Selected Device --> H[Initiate Context-Aware Speech Recognition]
    G -- Others --> I[Suppress / Update Environmental Profile]

Derivative 4.3: Blockchain-Secured Hotword Arbitration with Immutable Auditing

Enabling Description: In environments requiring high security and auditability, the process of hotword detection and arbitration can be secured using blockchain technology. Each "computing device" (Claim 1, step 1) is a node on a private or consortium blockchain (e.g., using Hyperledger Fabric or the Ethereum-based Quorum). When a device determines a "first value" (Claim 1, step 2), it assembles its calculated hotword confidence score, device ID, and a timestamp, computes a hash over these fields for integrity, and proposes the record together with its hash as a transaction to the blockchain; the score itself must be visible to the contract, since a hash alone cannot be compared. Other devices (Claim 1, step 3) similarly propose their scores. A smart contract deployed on the blockchain (acting as the "comparison" logic, Claim 1, step 4) automatically executes an arbitration function (e.g., selects the highest score, resolves ties deterministically) and immutably records the winning device. Only the device designated by the smart contract's output is authorized to "initiate speech recognition" (Claim 1, step 5). This provides a transparent, tamper-evident log of all hotword events and arbitration decisions, crucial for compliance or forensic analysis in sensitive applications.
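
The hashing and arbitration logic can be sketched off-chain in Python to show the intended behavior. This is not smart-contract code: it assumes each transaction carries the plain fields alongside a SHA-256 digest of their canonical JSON form so that scores can be compared, and the tie-break order (earlier timestamp, then device ID) and all function names are illustrative.

```python
import hashlib
import json


def record_digest(device_id, score, timestamp):
    """SHA-256 over a canonical JSON encoding of the three fields."""
    canonical = json.dumps(
        {"device_id": device_id, "score": score, "timestamp": timestamp},
        sort_keys=True,
        separators=(",", ":"),
    )
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()


def make_transaction(device_id, score, timestamp):
    """Build the record a device would propose, including its integrity digest."""
    return {
        "device_id": device_id,
        "score": score,
        "timestamp": timestamp,
        "digest": record_digest(device_id, score, timestamp),
    }


def arbitrate(transactions):
    """Pick the winner among records whose digest matches their contents."""
    valid = [
        t for t in transactions
        if t["digest"] == record_digest(t["device_id"], t["score"], t["timestamp"])
    ]
    if not valid:
        return None
    # Assumption: highest score wins; ties go to the earlier timestamp, then device ID.
    best = min(valid, key=lambda t: (-t["score"], t["timestamp"], t["device_id"]))
    return best["device_id"]


tx_a = make_transaction("dev-a", 0.88, 1700000000)
tx_b = make_transaction("dev-b", 0.93, 1700000001)
tampered = dict(make_transaction("dev-c", 0.40, 1700000002), score=0.99)
winner = arbitrate([tx_a, tx_b, tampered])
```

The tampered record claims a score of 0.99 but its digest still reflects 0.40, so it is excluded before the comparison runs.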

Mermaid Diagram:

graph TD
    A[User Utterance] --> B1(Hotword Device 1)
    A --> B2(Hotword Device 2)
    B1 --> C1[Hotword Detector C1]
    B2 --> C2[Hotword Detector C2]
    C1 --> D1(S1: Score, Device ID, Timestamp)
    C2 --> D2(S2: Score, Device ID, Timestamp)
    D1 -- "Propose Transaction (Hash S1)" --> E(Blockchain Network)
    D2 -- "Propose Transaction (Hash S2)" --> E
    E --> F{Smart Contract: Arbitration Logic}
    F -- Records Winning Device --> G["Winning Device (from Smart Contract) Initiates Speech Recognition"]
    F -- Others --> H[Other Devices Abort Processing]
    G --> I[Immutable Audit Log on Blockchain]

Axis 5: The "Inverse" or Failure Mode / Low-Power Functionality

Derivative 5.1: Hierarchical Low-Power Hotword Arbitration with Dynamic Wake-up Tiers

Enabling Description: This derivative specifically addresses the "exchanging messages while in a low power mode" limitation. The system defines a hierarchy of power states (LPS) for computing devices, in which score exchange begins while the devices are still in a low-power state:
- LPS0 (Deep Sleep): Only a minimal acoustic trigger (e.g., specific frequency detection, energy threshold) is active. No hotword processing occurs.
- LPS1 (Ultra-Low Power Hotworder): A highly optimized, hardware-accelerated hotworder (e.g., a custom ASIC running reduced-precision models) computes a "coarse" hotword confidence score (S_coarse) at minimal power. The device transmits S_coarse via a low-power wake-up radio (WUR) to indicate "potential interest." This is the primary mode for exchanging messages.
- LPS2 (Partial Hotworder): If S_coarse from LPS1 meets a lower threshold, the device partially wakes up a more capable, but still low-power, DSP-based hotworder to compute a "fine" hotword confidence score (S_fine). S_fine is exchanged via Bluetooth Low Energy (BLE) advertisements.
- LPS3 (Full Hotworder): If S_fine meets a higher threshold, the device fully wakes up its main processor to run a full, high-accuracy hotword model.

When an "utterance" (Claim 1, step 1) occurs, devices in LPS1 compute S_coarse and exchange these values (Claim 1, step 3) using WUR packets. The device or devices whose S_coarse is highest (Claim 1, step 4) and meets the lower threshold transition to LPS2 to compute S_fine. Devices with a lower S_coarse remain in LPS1 or return to LPS0. Any devices in LPS2 then exchange S_fine via BLE. The device whose S_fine is highest and meets the higher threshold transitions to LPS3 for full hotword processing and, if the hotword is confirmed, "initiates speech recognition" (Claim 1, step 5). Devices that do not win at a given tier revert to a lower LPS.

Mermaid Diagram:

stateDiagram-v2
    [*] --> LPS0_DeepSleep
    LPS0_DeepSleep --> LPS1_UltraLowPowerHotworder: Acoustic Trigger
    
    LPS1_UltraLowPowerHotworder --> Compute_S_Coarse: Audio Data Received
    Compute_S_Coarse --> Exchange_S_Coarse: Transmit S_Coarse via WUR
    
    Exchange_S_Coarse --> Compare_S_Coarse: Receive S_Coarse from Peers
    Compare_S_Coarse --> LPS2_PartialHotworder: S_Coarse is Highest (Device Wins)
    Compare_S_Coarse --> LPS0_DeepSleep: S_Coarse is Lower (Device Loses)
    
    LPS2_PartialHotworder --> Compute_S_Fine: Audio Data for Fine Scoring
    Compute_S_Fine --> Exchange_S_Fine: Transmit S_Fine via BLE
    
    Exchange_S_Fine --> Compare_S_Fine: Receive S_Fine from Peers
    Compare_S_Fine --> LPS3_FullHotworder: S_Fine is Highest (Device Wins)
    Compare_S_Fine --> LPS0_DeepSleep: S_Fine is Lower (Device Loses)
    
    LPS3_FullHotworder --> Full_Hotword_Confirm: Process Full Hotword Model
    Full_Hotword_Confirm --> Initiate_Speech_Recognition: Hotword Confirmed
    Full_Hotword_Confirm --> LPS0_DeepSleep: Hotword Rejected
    Initiate_Speech_Recognition --> [*]
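
As a non-limiting illustration, the per-device tier transition of Derivative 5.1 can be sketched as a small decision function. The threshold values, the `Tier` enum, the function names, and the tie-break rule (the larger device ID wins an exact tie) are assumptions made for the example. Losing devices drop to LPS0, following the state diagram above; remaining in LPS1 would be an equally valid variant.

```python
from enum import IntEnum


class Tier(IntEnum):
    LPS0 = 0  # deep sleep
    LPS1 = 1  # ultra-low-power hotworder, scores sent over WUR
    LPS2 = 2  # partial (DSP) hotworder, scores sent over BLE
    LPS3 = 3  # full hotworder on the main processor


# Hypothetical thresholds; the disclosure only says "lower" and "higher".
COARSE_THRESHOLD = 0.3
FINE_THRESHOLD = 0.6


def wins_round(own_id, own_score, peer_scores, threshold):
    """True if this device meets the threshold and holds the top score.

    peer_scores maps peer device IDs to the scores received from them.
    Exact ties are broken by device ID so that only one device wins.
    """
    if own_score < threshold:
        return False
    entries = list(peer_scores.items()) + [(own_id, own_score)]
    best_id, _ = max(entries, key=lambda kv: (kv[1], kv[0]))
    return best_id == own_id


def next_tier(tier, own_id, own_score, peer_scores):
    """Return the tier this device moves to after one exchange round."""
    if tier == Tier.LPS1:
        won = wins_round(own_id, own_score, peer_scores, COARSE_THRESHOLD)
        return Tier.LPS2 if won else Tier.LPS0
    if tier == Tier.LPS2:
        won = wins_round(own_id, own_score, peer_scores, FINE_THRESHOLD)
        return Tier.LPS3 if won else Tier.LPS0
    raise ValueError(f"no score exchange is defined for {tier.name}")
```

For example, `next_tier(Tier.LPS1, "a", 0.5, {"b": 0.4})` moves device `a` to `Tier.LPS2`, while the same call made on device `b` with the scores swapped returns `Tier.LPS0`.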

Derivative 5.2: Controlled Degraded Hotword Detection for Emergency Override

Enabling Description: This derivative describes an "inverse" scenario in which the hotword detection system intentionally operates in a degraded mode for safety or emergency override. When a system-wide "emergency flag" is activated (e.g., by a detected fire alarm, critical power failure, or medical emergency alert), normal hotword arbitration is suspended. Instead, all "computing devices" (Claim 1, step 1) enter a "failsafe hotword mode." In this mode, each device applies a universal, low hotword confidence threshold (e.g., 0.1 on a normalized scale) to specific emergency hotwords (e.g., "Emergency Stop," "Evacuate," "Help"). The "first value" (Claim 1, step 2) is evaluated against this relaxed threshold. Crucially, no scores are exchanged or compared (bypassing Claim 1, steps 3 and 4). Any device whose score for an emergency hotword exceeds this low threshold immediately "initiates speech recognition" (Claim 1, step 5) and executes the associated emergency command, even if other devices would have produced a higher score under normal arbitration. This prioritizes rapid response to critical commands and tolerates false positives to ensure safety. Communication for status reporting (not arbitration) may use Zigbee Green Power for a minimal energy footprint.

Mermaid Diagram:

graph TD
    A[System State] --> B{Emergency Flag Activated?}
    B -- Yes --> C[Enter Failsafe Hotword Mode]
    B -- No --> D["Normal Hotword Arbitration (as per patent)"]
    C --> E1(Device 1)
    C --> E2(Device 2)
    E1 --> F1["Detect Emergency Hotword (Low Threshold)"]
    E2 --> F2["Detect Emergency Hotword (Low Threshold)"]
    F1 --> G1{Emergency Hotword Detected?}
    F2 --> G2{Emergency Hotword Detected?}
    G1 -- Yes --> H1[Immediate Emergency Action / Speech Recognition]
    G2 -- Yes --> H2[Immediate Emergency Action / Speech Recognition]
    G1 -- No --> I1[Continue Failsafe Monitoring]
    G2 -- No --> I2[Continue Failsafe Monitoring]
    H1 --> J[Broadcast Emergency Status via Zigbee Green Power]
    H2 --> J
    J --> K[System-wide Emergency Response]
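
As a non-limiting illustration, the per-device activation decision of Derivative 5.2 can be sketched as follows. The 0.1 emergency threshold is the example value from the description; the normal-mode threshold of 0.5, the function name, and the phrase set are assumptions made for the example.

```python
EMERGENCY_THRESHOLD = 0.1   # example value from the description
NORMAL_THRESHOLD = 0.5      # hypothetical normal-mode threshold
EMERGENCY_HOTWORDS = {"emergency stop", "evacuate", "help"}


def should_activate(emergency_flag, phrase, own_score, peer_scores=()):
    """Decide whether this device initiates speech recognition.

    In failsafe mode, peer scores are ignored entirely: any device that
    scores an emergency hotword above the low threshold acts at once.
    In normal mode, the device must also beat every peer score.
    """
    if emergency_flag:
        is_emergency_phrase = phrase.lower() in EMERGENCY_HOTWORDS
        return is_emergency_phrase and own_score > EMERGENCY_THRESHOLD
    return own_score >= NORMAL_THRESHOLD and all(own_score > p for p in peer_scores)
```

With the flag set, `should_activate(True, "Evacuate", 0.15, peer_scores=[0.9])` returns `True` even though a peer scored far higher; with the flag clear, the same inputs return `False`.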

Derivative 5.3: Proactive Hotword Suppression with Environmental Noise Prediction

Enabling Description: This derivative focuses on proactively suppressing hotword detection on devices least likely to be addressed, particularly in high-noise environments. The "computing devices" (Claim 1, step 1) incorporate environmental noise sensors and real-time noise classification (e.g., music, TV, conversation, traffic). Before computing the "first value" (Claim 1, step 2) for a hotword, each device predicts the signal-to-noise ratio (SNR) of a potential hotword utterance. Scores exchanged (Claim 1, step 3) include this predicted SNR and a "noise profile" confidence. The "comparison" (Claim 1, step 4) is augmented: devices with a predicted SNR below a dynamic threshold (adjusted by the noise profile) voluntarily decrease their reported hotword confidence score or suppress transmission entirely. This creates a "soft" arbitration, effectively taking devices in excessively noisy environments "out of the running" before full hotword processing. This allows a device with a moderate raw hotword score but excellent SNR to win over a device with a higher raw score buried in noise. The result is more reliable hotword activation and reduced false positives, particularly useful for "smart speaker" systems in busy households.

Mermaid Diagram:

graph TD
    A[User Utterance] --> B1(Device 1)
    A --> B2(Device 2)
    B1 --> C1[Mic + Noise Sensors C1]
    B2 --> C2[Mic + Noise Sensors C2]
    C1 --> D1[Estimate SNR + Noise Profile D1]
    C2 --> D2[Estimate SNR + Noise Profile D2]
    D1 --> E1{Hotword Detector S1 + Adjust based on SNR}
    D2 --> E2{Hotword Detector S2 + Adjust based on SNR}
    E1 --> F1(Adjusted S1 + Noise Profile)
    E2 --> F2(Adjusted S2 + Noise Profile)
    F1 -->|"Transmit (Adjusted S1/Suppress)"| G(Score Comparison Module)
    F2 -->|"Transmit (Adjusted S2/Suppress)"| G
    G -- Highest Adjusted Score --> H[Initiate Speech Recognition]
    G -- Lower/Suppressed --> I[Suppress Hotword Activation]
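
As a non-limiting illustration, the "soft" arbitration of Derivative 5.3 can be sketched as a score adjustment followed by a comparison. The base threshold, the per-noise-class offsets, the linear discount, and the suppression margin are all hypothetical values chosen for the example; the description does not prescribe them.

```python
from typing import Dict, Optional

BASE_SNR_THRESHOLD_DB = 6.0   # hypothetical baseline threshold
SUPPRESS_MARGIN_DB = 6.0      # hypothetical: suppress when this far below
# Hypothetical offsets that raise the threshold for harder noise classes.
NOISE_OFFSET_DB = {"music": 3.0, "tv": 3.0, "conversation": 1.5}


def reported_score(raw_score: float, predicted_snr_db: float,
                   noise_class: str) -> Optional[float]:
    """Return the score a device transmits, or None to stay silent."""
    threshold = BASE_SNR_THRESHOLD_DB + NOISE_OFFSET_DB.get(noise_class, 0.0)
    deficit = threshold - predicted_snr_db
    if deficit <= 0:
        return raw_score                      # clean enough: report as-is
    if deficit >= SUPPRESS_MARGIN_DB:
        return None                           # too noisy: suppress entirely
    return raw_score * (1.0 - deficit / SUPPRESS_MARGIN_DB)  # linear discount


def pick_winner(reports: Dict[str, Optional[float]]) -> Optional[str]:
    """Highest adjusted score wins; suppressed devices are skipped."""
    live = {dev: s for dev, s in reports.items() if s is not None}
    if not live:
        return None
    return max(live, key=lambda dev: (live[dev], dev))
```

Under these assumed values, a kitchen device with a raw score of 0.9 but a predicted SNR of 4 dB over music reports only about 0.15, so a living-room device with a raw score of 0.6 and a 15 dB SNR wins the comparison.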

Derivative 5.4: Shared Contextual Blacklisting for Hotword Suppression

Enabling Description: This derivative focuses on suppressing hotword detection based on shared contextual cues indicating a device is not intended to respond. "Computing devices" (Claim 1, step 1) continuously monitor not just audio but also active screen usage, active media playback, and user input events (e.g., keyboard, touch). When an "utterance" occurs, each device computes its "first value" (Claim 1, step 2). Simultaneously, each device generates a "contextual blacklisting score" (CBS) based on its current activity. For example, a tablet actively playing a full-screen movie for 30 minutes would have a very high CBS, indicating it's unlikely the user intends to address it. These hotword scores and CBS values are exchanged (Claim 1, step 3). The "comparison" (Claim 1, step 4) incorporates the CBS: if a device's CBS exceeds a predefined threshold, its hotword score is discounted or entirely invalidated, regardless of its raw hotword confidence. This proactive blacklisting ensures that devices currently in active user engagement modes (e.g., watching a video, typing an email) do not erroneously respond to hotwords, enhancing user experience.

Mermaid Diagram:

graph TD
    A[User Utterance] --> B1(Device 1)
    A --> B2(Device 2)
    B1 --> C1[Hotword Detector C1]
    B2 --> C2[Hotword Detector C2]
    B1 --> D1["Context Monitor D1 (Screen Active, Media Play, Input)"]
    B2 --> D2["Context Monitor D2 (Screen Active, Media Play, Input)"]
    C1 --> E1(Raw Hotword S1)
    C2 --> E2(Raw Hotword S2)
    D1 --> F1(CBS1: Contextual Blacklisting Score)
    D2 --> F2(CBS2: Contextual Blacklisting Score)
    E1 & F1 -- Transmit S1, CBS1 --> G(Arbitration Module)
    E2 & F2 -- Transmit S2, CBS2 --> G
    G --> H{Apply CBS Discount/Invalidation}
    H --> I{Compare Adjusted Scores}
    I -- Highest Adjusted Score --> J[Initiate Speech Recognition]
    I -- Lower/Invalidated --> K[Suppress Hotword Activation]
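
As a non-limiting illustration, the CBS-aware comparison of Derivative 5.4 can be sketched as follows. The two thresholds, the multiplicative discount, and the function names are assumptions made for the example; the description states only that a score is discounted or invalidated once the CBS exceeds a predefined threshold.

```python
from typing import Dict, Optional, Tuple

CBS_DISCOUNT_THRESHOLD = 0.5     # hypothetical: above this, discount
CBS_INVALIDATE_THRESHOLD = 0.8   # hypothetical: above this, invalidate


def adjust_for_context(raw_score: float, cbs: float) -> Optional[float]:
    """Apply the contextual blacklisting score (CBS, assumed in [0, 1])."""
    if cbs > CBS_INVALIDATE_THRESHOLD:
        return None                       # device removed from arbitration
    if cbs > CBS_DISCOUNT_THRESHOLD:
        return raw_score * (1.0 - cbs)    # heavier engagement, larger cut
    return raw_score


def arbitrate(reports: Dict[str, Tuple[float, float]]) -> Optional[str]:
    """reports maps device ID to (raw hotword score, CBS)."""
    adjusted = {}
    for device_id, (raw_score, cbs) in reports.items():
        score = adjust_for_context(raw_score, cbs)
        if score is not None:
            adjusted[device_id] = score
    if not adjusted:
        return None
    return max(adjusted, key=lambda d: (adjusted[d], d))
```

For instance, a tablet reporting `(0.95, 0.9)` is invalidated despite its high raw score, a phone reporting `(0.8, 0.6)` is discounted to about 0.32, and an idle speaker reporting `(0.7, 0.1)` wins.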

Derivative 5.5: Predictive Failure Mode for Graceful Hotword Service Degradation

Enabling Description: This derivative introduces a predictive failure mode for hotword services to ensure graceful degradation rather than abrupt failure. Each "computing device" (Claim 1, step 1) collects sensor data on internal component health (e.g., battery cycles, flash memory wear, CPU temperature, microphone array degradation diagnostics). This health data is fed into a local predictive maintenance AI model that forecasts imminent hardware failures or performance degradation affecting hotword detection quality. When an "utterance" occurs and the "first value" (Claim 1, step 2) is determined, this hotword confidence score is augmented with a "reliability score" derived from the predictive model. These augmented scores are exchanged (Claim 1, step 3). The "comparison" (Claim 1, step 4) now prioritizes devices with both high hotword confidence and high reliability scores. If a device's reliability score falls below a critical threshold, it proactively transmits a "self-disablement" message, removing itself from arbitration until it is repaired, even if its raw hotword score is high. This ensures that the hotword service is handled by a device that is both confident and reliable, and it prevents user frustration caused by intermittent or failing responses.

Mermaid Diagram:

graph TD
    A[User Utterance] --> B1(Device 1)
    A --> B2(Device 2)
    B1 --> C1[Hotword Detector C1]
    B2 --> C2[Hotword Detector C2]
    B1 --> D1[Health Sensors + Predictive AI D1]
    B2 --> D2[Health Sensors + Predictive AI D2]
    C1 --> E1(Raw Hotword S1)
    C2 --> E2(Raw Hotword S2)
    D1 --> F1(R1: Reliability Score)
    D2 --> F2(R2: Reliability Score)
    E1 & F1 -- Transmit S1, R1 / Self-Disable --> G(Arbitration Module)
    E2 & F2 -- Transmit S2, R2 / Self-Disable --> G
    G --> H{Filter Self-Disabled Devices}
    H --> I{Compare Scores & Reliability}
    I -->|"Highest (S + R)"| J[Initiate Speech Recognition]
    I -- Lower / Self-Disabled --> K[Suppress / Acknowledge Self-Disablement]
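
As a non-limiting illustration, the reliability-aware arbitration of Derivative 5.5 can be sketched as follows. The critical reliability threshold, the message fields, and the function names are assumptions made for the example. The ranking uses the plain sum S + R shown in the diagram above; a weighted combination would be an equally plausible variant.

```python
from typing import Dict, List, Optional

RELIABILITY_FLOOR = 0.4  # hypothetical critical threshold


def build_message(device_id: str, score: float, reliability: float) -> Dict:
    """Build the message a device transmits after an utterance.

    A device below the reliability floor sends a self-disablement
    message instead of its score, however high that score is.
    """
    if reliability < RELIABILITY_FLOOR:
        return {"device_id": device_id, "self_disabled": True}
    return {
        "device_id": device_id,
        "self_disabled": False,
        "score": score,
        "reliability": reliability,
    }


def arbitrate(messages: List[Dict]) -> Optional[str]:
    """Drop self-disabled devices, then rank the rest by S + R."""
    active = [m for m in messages if not m["self_disabled"]]
    if not active:
        return None
    best = max(active, key=lambda m: (m["score"] + m["reliability"], m["device_id"]))
    return best["device_id"]
```

With devices reporting (S, R) of (0.95, 0.2), (0.7, 0.9), and (0.8, 0.5), the first self-disables and the second wins with a combined value of about 1.6 against 1.3.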