Defensive Disclosure and Prior Art Derivations for U.S. Patent No. 10,769,446

Publication Date: May 12, 2026
Subject Matter: Enhancements, variations, and alternative embodiments for systems and methods of combining video content with one or more augmentations.

This document discloses further inventions and improvements that build upon, or provide alternative implementations for, the technology described in U.S. Patent No. 10,769,446 ("the '446 patent"). The purpose of this disclosure is to place these concepts into the public domain, thereby establishing them as prior art for any future patent applications.

The core concept of the '446 patent involves a client-server architecture where video content is presented with a synchronized, invisible layer of bounding boxes. Each box is assigned a unique identifier (e.g., an RGBA color value), allowing a user's selection on the screen to be rapidly mapped to a specific object. This triggers the display of augmented information. The following disclosures expand upon this concept across various dimensions.
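
For illustration, a minimal TypeScript sketch of one way this mapping could be realized, assuming a 24-bit object ID packed into the RGB channels of the ID buffer with the alpha channel marking whether any tagged object is present (the helper names are illustrative, not taken from the '446 patent):

// Server side (conceptually): assign each object a 24-bit ID and paint its
// bounding box into the invisible ID buffer with the corresponding color.
function idToColor(id: number): [number, number, number] {
  return [(id >> 16) & 0xff, (id >> 8) & 0xff, id & 0xff];
}

// Client side: on a tap, sample the ID buffer at the tap coordinates and decode.
function colorToId(r: number, g: number, b: number): number {
  return (r << 16) | (g << 8) | b;
}

function hitTest(idBuffer: CanvasRenderingContext2D, x: number, y: number): number | null {
  const [r, g, b, a] = idBuffer.getImageData(x, y, 1, 1).data;
  return a === 0 ? null : colorToId(r, g, b); // fully transparent pixel: no tagged object
}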


Axis 1: Material & Component Substitution

1.1. Unique Identifier Encoding via Non-Visible Light and Steganography

Enabling Description: Instead of using RGBA values in a parallel, invisible frame buffer (the "ID buffer"), object identifiers are encoded directly into the Luma (Y') channel of the YCbCr color space of the video stream itself. By subtly modulating the luma value of pixels within a bounding box by an imperceptible amount (e.g., ±1 on an 8-bit scale), a unique binary ID can be embedded. The client device, upon user selection, would not sample a separate buffer but would read the luma values of the pixels at the selected coordinates, decode the embedded ID, and request the corresponding augmentation. This method eliminates the need for a secondary video-like stream for the bounding boxes, reducing bandwidth. An alternative uses steganography, embedding the object ID in the least significant bits (LSB) of the color data of the video frame's pixels within the bounding box.

sequenceDiagram
    participant User
    participant ClientDevice as Client Device
    participant Renderer as Augmentation Renderer

    User->>+ClientDevice: Taps screen at (x,y)
    ClientDevice->>ClientDevice: Read Luma/LSB values at (x,y)
    ClientDevice->>ClientDevice: Decode Object ID from pixel data
    ClientDevice->>+Renderer: Request Augmentation for Object ID
    Renderer-->>-ClientDevice: Return Augmentation Data
    ClientDevice->>User: Display Augmentation over Video
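
A minimal sketch of the LSB variant described above, assuming (purely for illustration) that a 16-bit object ID is embedded in the least significant bit of the red channel of 16 consecutive pixels starting at the tapped coordinate; the bit layout and function names are assumptions, not the disclosed encoding:

// Decode a 16-bit object ID from red-channel LSBs, most significant bit first.
function decodeLsbId(frame: ImageData, x: number, y: number): number {
  let id = 0;
  for (let i = 0; i < 16; i++) {
    const offset = (y * frame.width + (x + i)) * 4; // 4 bytes per RGBA pixel
    id = (id << 1) | (frame.data[offset] & 1);      // red channel carries one bit
  }
  return id;
}

// Usage sketch: draw the current video frame to a 2D canvas, then decode at the tap.
// const ctx = canvas.getContext('2d')!;
// ctx.drawImage(videoElement, 0, 0);
// const frame = ctx.getImageData(0, 0, canvas.width, canvas.height);
// const objectId = decodeLsbId(frame, tapX, tapY);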

1.2. Haptic Feedback Grids as Bounding Box Proxies

Enabling Description: For applications on devices with advanced haptic feedback systems, the bounding box information is used to create a "haptic texture map." Instead of a visual selection, the user drags their finger across the screen. As their finger enters a region corresponding to a tagged object, the device provides a distinct haptic feedback pattern (e.g., a specific vibration frequency or texture). The haptic pattern's unique signature, tied to the object ID, is used to identify the selection upon a secondary action, like a firm press or double-tap. This method is particularly useful for accessibility or in environments where the user's visual attention cannot be diverted.

stateDiagram-v2
    [*] --> NoInteraction
    NoInteraction --> FingerDown: User touches screen
    FingerDown --> Dragging: User moves finger
    Dragging --> HapticFeedback: Finger enters bounding box area
    HapticFeedback --> Dragging: Finger leaves bounding box area
    HapticFeedback --> SelectionMade: User performs confirmation action (e.g., double-tap)
    SelectionMade --> NoInteraction: Augmentation displayed
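
A possible client-side sketch using the standard browser Vibration API, assuming each tagged object has been assigned a distinct vibration pattern delivered alongside the bounding box metadata; the pattern table and hit-test helper are hypothetical:

// Hypothetical per-object vibration patterns (milliseconds on/off).
const hapticPatterns: Record<number, number[]> = {
  101: [20],          // short tick
  102: [10, 30, 10],  // double pulse
};

// Assumed helpers: map screen coordinates to a tagged object ID
// (e.g., via the invisible ID buffer) and fetch/display the augmentation.
declare function hitTest(x: number, y: number): number | null;
declare function requestAugmentation(objectId: number): void;

let lastObjectId: number | null = null;

function onFingerMove(x: number, y: number): void {
  const id = hitTest(x, y);
  if (id !== null && id !== lastObjectId && navigator.vibrate) {
    navigator.vibrate(hapticPatterns[id] ?? [15]); // play the object's haptic "texture"
  }
  lastObjectId = id;
}

function onConfirmGesture(): void {
  if (lastObjectId !== null) requestAugmentation(lastObjectId);
}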

1.3. Compressed Vector-Based Bounding Box Definitions

Enabling Description: Rather than sending a full-frame ID buffer or pixel-level data, the server transmits a separate, time-synchronized metadata stream containing vectorized representations of the bounding boxes. These vectors could be defined using Scalable Vector Graphics (SVG) path data or a compact binary format describing polygons. The client-side renderer reconstructs these invisible vector shapes on a canvas overlaid on the video. When a user taps the screen, the client performs a point-in-polygon test against the vector data to determine which object was selected. This significantly reduces the data overhead compared to a full-frame RGBA buffer, especially for high-resolution video with few tracked objects.

graph TD
    A[Server: Encodes Video] --> B(Server: Generates Vector Bounding Box Data);
    B --> C{"Metadata Stream (SVG/Custom Format)"};
    A --> D{Video Stream};
    C --> E[Client Device];
    D --> E;
    E --> F[Render Video];
    E --> G[Render Invisible Vector Shapes];
    F & G --> H(Composite View);
    I[User Tap] --> H;
    H --> J{Point-in-Polygon Test};
    J --> K[Identify Selected Object ID];
    K --> L[Request/Display Augmentation];
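
The hit test itself can be a standard ray-casting point-in-polygon check. A minimal TypeScript sketch follows; the polygon data shape is an assumption, since the disclosure above does not mandate a particular serialization:

interface TaggedPolygon {
  objectId: number;
  vertices: Array<{ x: number; y: number }>; // polygon outline in screen space
}

// Classic ray-casting test: count how many polygon edges a horizontal ray crosses.
function pointInPolygon(px: number, py: number, poly: Array<{ x: number; y: number }>): boolean {
  let inside = false;
  for (let i = 0, j = poly.length - 1; i < poly.length; j = i++) {
    const a = poly[i], b = poly[j];
    const crosses = (a.y > py) !== (b.y > py) &&
      px < ((b.x - a.x) * (py - a.y)) / (b.y - a.y) + a.x;
    if (crosses) inside = !inside;
  }
  return inside;
}

// Return the ID of the first tagged polygon containing the tap, if any.
function selectObject(tapX: number, tapY: number, polygons: TaggedPolygon[]): number | null {
  const hit = polygons.find(p => pointInPolygon(tapX, tapY, p.vertices));
  return hit ? hit.objectId : null;
}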

Axis 2: Operational Parameter Expansion

2.1. High-Frequency (240fps+) Video for Slow-Motion Analysis

Enabling Description: In high-speed sports analysis (e.g., golf swing, baseball pitch), the system processes video at 240 frames per second or higher. The bounding box data stream is time-coded with sub-millisecond precision to match each video frame. To handle the data rate, the server pre-processes the video to generate keyframes for object location, and the client-side renderer interpolates the bounding box positions between these keyframes using a kinematic model (e.g., a Kalman filter) that predicts object trajectory. When the user pauses the video and scrubs frame-by-frame through a slow-motion replay, the bounding boxes remain perfectly synchronized, allowing for precise selection and retrieval of frame-accurate performance metrics.

graph TD
    A["High-FPS Video (240Hz)"] --> B(Server: Keyframe Extraction);
    B --> C(Server: Bounding Box Generation @ Keyframes);
    C --> D[Client: Receives Video & Sparse Box Data];
    D --> E{"Client: Interpolates Box Positions (Kalman Filter)"};
    E --> F(Client: Renders Video & Synced Boxes @ 240Hz);
    G[User Interaction: Pause, Scrub] --> F;
    F --> H(Display Frame-Specific Augmentation);
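
As a simplified stand-in for the kinematic model, the sketch below linearly interpolates box positions between the two nearest keyframes; a production system could substitute the constant-velocity Kalman filter described above. The data shape is an assumption:

interface BoxKeyframe {
  timeMs: number;                         // keyframe timestamp (fractional ms for 240fps+)
  x: number; y: number; w: number; h: number;
}

// Interpolate a bounding box at an arbitrary playback time from sparse keyframes.
// Keyframes are assumed to be sorted by timeMs.
function interpolateBox(keyframes: BoxKeyframe[], tMs: number): BoxKeyframe {
  if (tMs <= keyframes[0].timeMs) return keyframes[0];
  const last = keyframes[keyframes.length - 1];
  if (tMs >= last.timeMs) return last;

  const i = keyframes.findIndex(k => k.timeMs > tMs);
  const k0 = keyframes[i - 1], k1 = keyframes[i];
  const u = (tMs - k0.timeMs) / (k1.timeMs - k0.timeMs); // 0..1 between the keyframes
  const lerp = (a: number, b: number) => a + (b - a) * u;

  return {
    timeMs: tMs,
    x: lerp(k0.x, k1.x), y: lerp(k0.y, k1.y),
    w: lerp(k0.w, k1.w), h: lerp(k0.h, k1.h),
  };
}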

2.2. Distributed, Low-Bandwidth "Edge Rendering" System

Enabling Description: In a low-bandwidth environment (e.g., mobile network in a stadium), the full video stream is not sent to a central server. Instead, on-site edge computing nodes, connected to the broadcast cameras, perform the object detection and generate the bounding box metadata. The client device receives a compressed video stream from a standard CDN, and a separate, very low-bandwidth data stream containing the bounding box definitions from the nearest edge node. The augmentation content (e.g., player stats) is also cached at the edge. When a user selects a bounding box, the request is sent to the local edge node, not a distant cloud server, which returns the augmentation. This minimizes latency and reduces the core network load.

sequenceDiagram
    participant UserDevice
    participant EdgeNode
    participant CloudServer

    CloudServer-->>EdgeNode: Pre-cache Augmentation Data
    EdgeNode->>UserDevice: Low-bandwidth Bounding Box Stream
    CloudServer-->>UserDevice: Compressed Video Stream (via CDN)

    UserDevice->>UserDevice: User selects object
    UserDevice->>EdgeNode: Request Augmentation (Object ID)
    EdgeNode-->>UserDevice: Return Aug-Data (Low Latency)
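
A sketch of the client-side request path, assuming the client has been told the address of a nearby edge node out of band and falls back to the cloud origin if the edge node is unreachable; both endpoint URLs and the 300 ms budget are illustrative assumptions:

// Hypothetical endpoints; in practice these would be provisioned per venue.
const EDGE_URL = 'https://edge.venue.example/augment';
const CLOUD_URL = 'https://origin.example.com/augment';

// Prefer the low-latency edge node; fall back to the cloud origin on failure.
async function fetchAugmentation(objectId: number): Promise<unknown> {
  for (const base of [EDGE_URL, CLOUD_URL]) {
    try {
      const res = await fetch(`${base}?id=${objectId}`, { signal: AbortSignal.timeout(300) });
      if (res.ok) return await res.json();
    } catch {
      // timeout or network error: try the next tier
    }
  }
  throw new Error(`No augmentation available for object ${objectId}`);
}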

2.3. Nanoscale Microscopy Video Augmentation

Enabling Description: The system is adapted for real-time analysis of video from an electron microscope. The "objects" are cellular structures, organelles, or nanoparticles. The bounding box definitions are generated by a real-time image analysis algorithm trained to identify these microscopic features. A researcher viewing the live feed can tap on a specific mitochondrion, for example. The system uses its pre-assigned unique ID (from the RGBA buffer) to retrieve and overlay data from other sensors connected to the microscopy equipment, such as fluorescence intensity, chemical signatures from a spectrometer, or its movement vector over the last 1000 frames. This allows for immediate, in-context data correlation during live experiments.

graph TD
    A[Electron Microscope Video Feed] --> B(Image Analysis Server);
    B --> C{Real-time Organelle Identification};
    C --> D(Generates RGBA ID Mask);
    A --> E[Researcher's Display];
    D --> E;
    E --> F(User Taps on a Mitochondrion);
    F --> G{Read RGBA Value};
    G --> H{Request Data for ID};
    I[Spectrometer/Sensor Data] --> H;
    H --> J(Overlay Fluorescence Data on Display);

Axis 3: Cross-Domain Application

3.1. Aerospace: Augmented Reality for Aircraft Maintenance

Enabling Description: A maintenance technician points a tablet at an aircraft engine. The tablet's camera feed is processed in real-time. A server, using a 3D CAD model of the engine, identifies key components (e.g., fuel lines, sensor connections, turbine blades) and sends bounding polygons for each visible component back to the tablet. The technician taps on a specific hydraulic line. The client device identifies the component via the invisible ID layer and overlays its maintenance history, required torque specifications, and the next scheduled inspection date directly onto the video feed.

graph TD
    subgraph On-Site Technician
        A(Tablet Camera) --> B{Live Video Feed};
    end
    subgraph Maintenance Server
        C(3D CAD Model) -- Identifies --> D{Component Recognition};
        D -- Generates --> E(Bounding Polygon Data);
    end
    subgraph Tablet Display
        B -- Renders --> F[Video Display];
        E -- Renders --> G(Invisible ID Overlay);
        F & G --> H((Composite View));
    end
    I(Technician Taps Component) --> H;
    H --> J{ID Lookup};
    J --> K[Request Maintenance Data];
    L[Maintenance Database] --> K;
    K --> M(Display Specs & History);
    M -- Overlays --> H;

3.2. Agriculture (AgTech): Precision Livestock Monitoring

Enabling Description: Video feeds from cameras in a large-scale dairy farm are streamed to a central monitoring system. An animal recognition algorithm places a unique bounding box around each cow. A veterinarian or farm manager can view the live feed on a tablet. By tapping on a specific cow, they can instantly pull up an augmented reality overlay showing that cow's health records, milk yield, last insemination date, and any current health alerts (e.g., from an IoT-enabled collar). The selection is made by decoding the RGBA value of the transparent bounding box associated with that animal in the video frame.

sequenceDiagram
    participant FarmCamera
    participant RecognitionServer
    participant VetTablet
    participant FarmDB

    loop Live Feed
        FarmCamera->>+RecognitionServer: Send Video Frame
        RecognitionServer->>RecognitionServer: Identify Cows, Assign IDs
        RecognitionServer-->>-VetTablet: Stream Video + Bounding Box Data
    end
    VetTablet->>VetTablet: Vet Taps a Cow
    VetTablet->>+FarmDB: Request Record for Cow ID
    FarmDB-->>-VetTablet: Return Health & Production Data
    VetTablet->>VetTablet: Overlay Data on Video

3.3. Consumer Electronics: Interactive Cooking Tutorials

Enabling Description: A smart display in a kitchen streams a cooking class. As the chef introduces ingredients and uses equipment, an AI on the server identifies each item (e.g., "salt," "whisk," "mixing bowl") and creates a corresponding time-synced bounding box. The user, following along, can tap on the whisk in the video. The system identifies the "whisk" object and displays an augmentation with a link to purchase that specific whisk online, or a pop-up video demonstrating the proper whisking technique. The selection mechanism uses an invisible RGBA overlay, ensuring the primary video is unobstructed.

graph TD
    A[Cooking Video Stream] --> B(Server-side Object Recognition);
    B -- Identifies --> C(Ingredients & Utensils);
    C --> D(Generate Bounding Box Metadata);
    A & D --> E[User's Smart Display];
    E --> F(User Taps on 'Whisk');
    F --> G{Decode ID from Tap Coords};
    G --> H[Request 'Whisk' Augmentation];
    H --> I(Display Purchase Link / Technique Video);

Axis 4: Integration with Emerging Tech

4.1. AI-Driven Predictive Bounding Boxes & Augmentation

Enabling Description: A machine learning model analyzes player and object trajectories from the spatiotemporal data. It predicts the screen position of all tracked objects for the next N frames (e.g., N=10). The server sends not just the current bounding box data, but also the predicted trajectory vectors for each box. The client device uses these vectors to extrapolate the box positions, ensuring smooth, accurate tracking even with network latency or dropped packets in the metadata stream. A second AI model analyzes user interaction patterns (e.g., which player stats are most frequently requested) and pre-fetches the most likely augmentation data to the client device before the user even makes a selection, resulting in near-instantaneous display of information.

graph TD
    subgraph Server
        A[Live Spatiotemporal Data] --> B(Predictive Tracking Model);
        B --> C(Generates Bounding Boxes + Trajectory Vectors);
        A --> D(User Behavior Model);
        D --> E(Pre-fetches Probable Augmentations);
    end
    subgraph Client
        C --> F[Receive Boxes & Vectors];
        E --> G[Cache Augmentations];
        F --> H{Render & Extrapolate Box Positions};
        I[User Taps Player] --> H;
        H --> J{Lookup ID};
        J --> K[Retrieve Augmentation from Cache];
        K --> L(Display Instantly);
    end
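
A simplified sketch of the client-side extrapolation and pre-fetch, assuming each box arrives with a server-computed velocity (pixels per millisecond) and a selection probability from the user-behavior model; the field names and endpoint are assumptions:

interface PredictedBox {
  objectId: number;
  x: number; y: number;         // position at timestampMs
  vx: number; vy: number;       // predicted velocity, px per ms
  timestampMs: number;
  selectionProbability: number; // from the user-behavior model
}

// Dead-reckon the box forward to the current render time, masking metadata latency.
function extrapolate(box: PredictedBox, nowMs: number): { x: number; y: number } {
  const dt = nowMs - box.timestampMs;
  return { x: box.x + box.vx * dt, y: box.y + box.vy * dt };
}

// Pre-fetch augmentations for the most likely selections into a local cache.
const augmentationCache = new Map<number, unknown>();

async function prefetchLikely(boxes: PredictedBox[], limit = 3): Promise<void> {
  const likely = [...boxes]
    .sort((a, b) => b.selectionProbability - a.selectionProbability)
    .slice(0, limit);
  await Promise.all(likely.map(async b => {
    if (!augmentationCache.has(b.objectId)) {
      const res = await fetch(`/augmentations/${b.objectId}`); // hypothetical endpoint
      augmentationCache.set(b.objectId, await res.json());
    }
  }));
}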

4.2. IoT-Enhanced Live Event Augmentation

Enabling Description: During a live auto race, each car is equipped with IoT sensors transmitting real-time data (tire pressure, engine temperature, G-forces). This data is ingested by a central server. The system described in the '446 patent is used to track the cars on the video broadcast. The bounding box for each car is linked not to a static database, but to the live IoT data stream for that vehicle. When a viewer taps on a car, the augmentation overlay displays a live telemetry dashboard for that specific vehicle, showing real-time G-force in a corner or a warning if tire temperatures are critical. The renderer acts as an aggregator, pulling video-frame data, bounding box IDs, and live IoT data together to generate the composite view.

sequenceDiagram
    participant CarIoT as "Car IoT Sensors"
    participant CentralServer as "Data Ingestion Server"
    participant Renderer
    participant ClientDevice

    CarIoT->>CentralServer: Stream Real-Time Telemetry
    CentralServer->>Renderer: Forward Telemetry by Car ID
    Renderer->>ClientDevice: Send Video + Bounding Boxes

    activate ClientDevice
    ClientDevice->>Renderer: User selects Car (sends Bounding Box ID)
    activate Renderer
    Renderer->>Renderer: Correlate ID with Live Telemetry
    Renderer-->>ClientDevice: Send Augmentation (Live Data)
    deactivate Renderer
    ClientDevice->>ClientDevice: Display Real-Time Telemetry Overlay
    deactivate ClientDevice
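
A sketch of the aggregation step in the renderer: the most recent telemetry sample per car is held in a map keyed by the same ID used for the bounding boxes, so a viewer selection can be answered from memory. The message shape is an assumption:

interface TelemetrySample {
  carId: number;              // same identifier used for the car's bounding box
  tirePressuresKpa: number[];
  engineTempC: number;
  lateralG: number;
  receivedAtMs: number;
}

// Latest telemetry per car, updated as samples stream in from the ingestion server.
const latestTelemetry = new Map<number, TelemetrySample>();

function onTelemetryMessage(sample: TelemetrySample): void {
  latestTelemetry.set(sample.carId, sample);
}

// When the viewer taps a car, the bounding box ID doubles as the telemetry key.
function buildTelemetryOverlay(boundingBoxId: number): TelemetrySample | null {
  return latestTelemetry.get(boundingBoxId) ?? null;
}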

4.3. Blockchain-Verified Authenticity for Digital Collectibles

Enabling Description: In a system for viewing or trading digital art or sports memorabilia (e.g., NFTs), a video showcases the item. A bounding box is placed around the digital asset. When a user taps the asset, the client sends the object's unique ID to a server. The server uses this ID to query a public blockchain to verify the asset's authenticity, ownership history, and provenance. The augmentation returned to the user is a visually secure, non-editable graphic displaying the "Verified on Blockchain" status along with the transaction history. This provides immediate, trustworthy validation of digital goods within an interactive video context.

graph TD
    A[Video of Digital Asset] --> B{Server Adds Bounding Box ID};
    B --> C[Stream to User];
    subgraph User Interaction
        C --> D(User Taps on Asset);
        D --> E{Client Extracts ID};
    end
    E --> F[Server Receives ID];
    F --> G(Query Blockchain for ID);
    G --> H[Retrieve Provenance Data];
    H --> I(Generate Verification Augmentation);
    I --> J[Send Augmentation to Client];
    J --> K(Display on User's Screen);

Axis 5: The "Inverse" or Failure Mode

5.1. Graceful Degradation Mode for Low-Power/Low-Bandwidth

Enabling Description: The client device monitors its own performance (e.g., CPU load, frame-rate drops) and network quality (e.g., latency, jitter). If performance degrades below a set threshold, the augmentation system automatically switches to a "low-fidelity" mode. In this mode, instead of requesting and rendering complex animated graphics, the client requests a simple text-only version of the augmentation data. For example, instead of a player's animated shot chart, it would display "Player X: 6/10 FG, 3/5 3PT." The invisible bounding boxes may also be simplified from complex polygons to simple rectangles to reduce the client-side rendering load for the hit-test. This ensures the core functionality remains available without crashing or lagging the device.

stateDiagram-v2
    state "High-Fidelity Mode" as HiFi
    state "Low-Fidelity Mode" as LoFi

    [*] --> HiFi : Initial State
    HiFi --> LoFi: Detects High CPU / Low Bandwidth
    LoFi --> HiFi: Detects Improved Performance

    HiFi: Renders Animated Graphics
    HiFi: Uses Complex Bounding Polygons
    LoFi: Renders Text-Only Data
    LoFi: Uses Simple Bounding Rectangles
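
A sketch of the mode switch with simple hysteresis so the client does not oscillate between modes; the thresholds are illustrative values, not parameters taken from the patent:

type FidelityMode = 'high' | 'low';

// Illustrative thresholds: degrade when frame rate drops or latency rises,
// and only recover once there is clear headroom in both.
const DEGRADE_FPS = 24;
const RECOVER_FPS = 45;
const DEGRADE_RTT_MS = 400;
const RECOVER_RTT_MS = 150;

let mode: FidelityMode = 'high';

function updateFidelityMode(measuredFps: number, rttMs: number): FidelityMode {
  if (mode === 'high' && (measuredFps < DEGRADE_FPS || rttMs > DEGRADE_RTT_MS)) {
    mode = 'low';   // switch to text-only augmentations and rectangular hit areas
  } else if (mode === 'low' && measuredFps > RECOVER_FPS && rttMs < RECOVER_RTT_MS) {
    mode = 'high';  // restore animated graphics and polygonal bounding shapes
  }
  return mode;
}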

5.2. Haptic/Audio Feedback on Selection Failure

Enabling Description: To provide clear feedback when an augmentation fails to load, the system incorporates a failure-mode response. When a user selects a valid bounding box, the client sends the request to the renderer. If the renderer fails to respond within a timeout period (e.g., 500ms) or returns an error, the client device provides immediate non-visual feedback. This can be a short, sharp vibration (haptic alert) and/or a subtle, non-intrusive audio cue (e.g., a soft "click" or "error" tone). This differentiates a failed data fetch from a tap on a non-interactive area of the screen, improving the user experience by managing expectations and communicating system status.

sequenceDiagram
    participant User
    participant ClientDevice
    participant Renderer

    User->>ClientDevice: Tap on Bounding Box
    activate ClientDevice
    ClientDevice->>Renderer: Request Augmentation
    activate Renderer
    alt Successful Response
        Renderer-->>ClientDevice: Augmentation Data
        ClientDevice-->>User: Display Augmentation
    else Timeout or Error
        Renderer--xClientDevice: Error / No Response
        ClientDevice->>ClientDevice: Trigger Haptic Pulse
        ClientDevice->>ClientDevice: Play Audio Cue
        ClientDevice-->>User: Display "Info Unavailable"
    end
    deactivate Renderer
    deactivate ClientDevice
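
A sketch of the failure-mode response using standard browser APIs (fetch with an abort timeout, the Vibration API, and a short Web Audio tone). The 500 ms budget follows the example above; the endpoint and helper functions are illustrative assumptions:

async function requestWithFeedback(objectId: number): Promise<void> {
  try {
    const res = await fetch(`/augmentations/${objectId}`, {  // hypothetical endpoint
      signal: AbortSignal.timeout(500),                      // 500 ms budget from above
    });
    if (!res.ok) throw new Error(`status ${res.status}`);
    displayAugmentation(await res.json());
  } catch {
    if (navigator.vibrate) navigator.vibrate([50]);          // short, sharp haptic alert
    playErrorTone();
    displayMessage('Info Unavailable');
  }
}

// Soft, non-intrusive error tone via the Web Audio API.
function playErrorTone(): void {
  const ctx = new AudioContext();
  const osc = ctx.createOscillator();
  osc.frequency.value = 220;
  osc.connect(ctx.destination);
  osc.start();
  osc.stop(ctx.currentTime + 0.1);
}

declare function displayAugmentation(data: unknown): void;
declare function displayMessage(text: string): void;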

5.3. Static Fallback Cache System

Enabling Description: Before an event (e.g., a sports game) begins, the server pushes a lightweight "static fallback cache" to all client devices. This cache contains basic, non-real-time augmentation data for key objects (e.g., player names, jersey numbers, and basic season stats). During the live event, if the client requests a dynamic, real-time augmentation and the connection to the renderer fails, the client's augmentation module will automatically check its local cache for the selected object's ID. If a match is found, it will display the pre-loaded static data with a visual indicator (e.g., a "cached" icon) to signify that the information is not live. This ensures that the user receives some relevant information even during network interruptions.

graph TD
    A[Pre-Game] --> B(Server Pushes Static Data Cache to Client);
    C[Live Game] --> D{User Taps Player};
    D --> E{Request Live Augmentation from Server};
    E -- Fails --> F{Check Local Cache for Player ID};
    F -- ID Found --> G[Display Static Data];
    E -- Succeeds --> H[Display Live Data];
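
A sketch of the fallback path, assuming the pre-pushed static records are stored locally (in memory here; IndexedDB would also work) and keyed by the same object ID used for the bounding boxes; all names are illustrative:

interface StaticRecord {
  objectId: number;
  name: string;
  jerseyNumber: number;
  seasonStats: string;
}

// Populated before the event from the server's pre-game push.
const staticFallbackCache = new Map<number, StaticRecord>();

async function getAugmentation(objectId: number): Promise<{ live: boolean; data: unknown }> {
  try {
    const res = await fetch(`/live/augmentations/${objectId}`); // hypothetical live endpoint
    if (!res.ok) throw new Error(`status ${res.status}`);
    return { live: true, data: await res.json() };
  } catch {
    const cached = staticFallbackCache.get(objectId);
    if (cached) return { live: false, data: cached }; // caller shows a "cached" indicator
    throw new Error('No live data and no cached record for this object');
  }
}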

Combination Prior Art Scenarios

  1. Combination with WebGL and ONNX.js: The core method of the '446 patent is combined with open web standards. A web browser-based client receives the video stream and bounding box data. The bounding box ID buffer is rendered to an off-screen <canvas> element using WebGL for GPU-accelerated performance. Upon user click, the RGBA value is read from this canvas. The object ID is then fed into a client-side ONNX.js (Open Neural Network Exchange for JavaScript) runtime, which executes a pre-trained machine learning model to generate a context-aware augmentation locally, reducing server-side load. For example, the model could predict a basketball player's shot probability from their current on-court position (derived from the bounding box) and display it. A minimal sketch of the WebGL pixel read-back step appears after this list.

  2. Combination with GStreamer and RTP: A broadcast system uses the open-source GStreamer framework for video processing. A custom GStreamer plugin is developed to analyze the video, perform object detection (e.g., using a YOLO model), and embed the bounding box data as a separate data track within the Real-time Transport Protocol (RTP) stream (per RFC 3550), alongside the audio and video tracks. A client application, also built with GStreamer, receives the RTP stream, demultiplexes the video and bounding box data tracks, and uses the '446 patent's method to render the invisible ID layer and handle user interaction for displaying augmentations. This creates an end-to-end open-source pipeline for creating and consuming interactive, augmented video streams.

  3. Combination with FFMPEG and MQTT: The '446 patent's method is used in a remote surveillance or industrial monitoring context. Video is encoded using the open-source FFMPEG library. Simultaneously, a separate process generates bounding box data for moving objects or equipment states. This metadata, containing the object ID and coordinates, is published to a lightweight MQTT (Message Queuing Telemetry Transport) topic. A client application subscribes to this MQTT topic and also streams the video via a standard protocol like HLS. The client synchronizes the low-latency MQTT messages with the video frames and renders the augmented overlay. Tapping an object publishes an MQTT message back to the server, which can trigger an action, such as moving a PTZ camera or logging an event, creating a two-way interactive system based on open standards.
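
A minimal sketch of the WebGL pixel read-back referenced in scenario 1, assuming the ID buffer has been rendered into the currently bound framebuffer and that the 24-bit ID packing mirrors the earlier canvas example (an assumption, not a mandated encoding):

// Read the RGBA identifier under a click from a WebGL-rendered ID buffer.
function readObjectId(gl: WebGLRenderingContext, clickX: number, clickY: number): number | null {
  const pixel = new Uint8Array(4);
  // WebGL's origin is the bottom-left corner, so flip the Y coordinate.
  gl.readPixels(clickX, gl.drawingBufferHeight - 1 - clickY, 1, 1,
                gl.RGBA, gl.UNSIGNED_BYTE, pixel);
  const [r, g, b, a] = pixel;
  if (a === 0) return null;         // transparent pixel: no tagged object here
  return (r << 16) | (g << 8) | b;  // 24-bit object ID packed into RGB (assumed)
}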
