Defensive Disclosure and Prior Art Generation Based on U.S. Patent No. 11,120,271

Subject Patent: U.S. Patent No. 11,120,271
Title: Data processing systems and methods for enhanced augmentation of interactive video content
Priority Date: February 28, 2014
Disclaimer: This document is a defensive publication intended to disclose concepts and technologies that build upon, extend, or modify the inventions described in U.S. Patent No. 11,120,271. The purpose is to place these concepts into the public domain to be considered prior art for any future patent applications.

Derivative Variations Based on Core Claims

The following derivative concepts are based on the core method of identifying semantic elements and their context from spatiotemporal video data to generate interactive augmentations.


1. Material & Component Substitution

Derivative 1.1: Neuromorphic Vision Sensor for Spatiotemporal Data Acquisition
  • Enabling Description: The system is modified to replace conventional frame-based video cameras with event-based neuromorphic vision sensors (e.g., Dynamic Vision Sensors - DVS). Instead of processing a plurality of video frames, the system directly ingests an asynchronous stream of pixel-level brightness change events (spikes). The "spatiotemporal data" is now this event stream itself, not derived positional data. A Spiking Neural Network (SNN) is used for the "semantic element" identification, recognizing moving objects (players, balls) by their unique temporal signatures. The "semantic context" is determined by a Recurrent Neural Network (RNN) layer within the SNN that interprets sequences of spike events as specific actions (e.g., a "pick and roll"). Augmentations are triggered based on the SNN's classification output, and the rendering engine composites these augmentations onto a reconstructed video frame generated from the event stream for display. This substitution significantly reduces data bandwidth and processing latency, making it suitable for real-time edge computing applications.
  • Diagram:
    graph TD
        A[Neuromorphic Vision Sensor] -->|Event Stream| B(Spiking Neural Network);
        B --> C{Semantic Element & Context Identification};
        C -->|Action: 'Pick & Roll'| D(Augmentation Engine);
        D --> E[Augmentation Data];
        A --> F(Frame Reconstructor);
        F --> G[Video Frame];
        E & G --> H(Compositor);
        H --> I(Augmented Video Output);
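  • Illustrative Code Sketch: A minimal, non-authoritative Python sketch of the event-stream front end described above, with the SNN/RNN stage reduced to a placeholder; the event fields (x, y, t, polarity) and the 640x480 sensor geometry are assumptions for illustration only.
    # Assumed DVS-style event fields; the real classifier would be a spiking network.
    from collections import namedtuple
    import numpy as np

    Event = namedtuple("Event", ["x", "y", "t", "polarity"])

    def reconstruct_frame(events, width=640, height=480):
        """Integrate brightness-change events into a displayable frame."""
        frame = np.full((height, width), 128, dtype=np.uint8)
        for e in events:
            delta = 20 if e.polarity > 0 else -20
            frame[e.y, e.x] = np.clip(int(frame[e.y, e.x]) + delta, 0, 255)
        return frame

    def classify_action(events):
        """Placeholder for the SNN/RNN stage mapping spike sequences to a context."""
        return "pick_and_roll" if len(events) > 1000 else "none"

    events = [Event(x=100, y=50, t=1_000 + i, polarity=i % 2 * 2 - 1)
              for i in range(1500)]
    frame = reconstruct_frame(events)          # composited with augmentations
    context = classify_action(events)          # drives augmentation selection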
    
Derivative 1.2: Volumetric Video (Voxel) Data as Spatiotemporal Input
  • Enabling Description: The system is adapted to process volumetric video data (voxels) captured by a multi-camera array, rather than 2D video. The "spatiotemporal data" is the 4D (3D space + time) voxel dataset. Semantic element identification is performed by a 3D Convolutional Neural Network (3D-CNN) that directly analyzes the voxel data to segment and classify objects (players, equipment) in three-dimensional space. "Semantic context" is derived from analyzing the volumetric change and interaction between these 3D object models over time. Augmentations are rendered as 3D objects or data visualizations placed within the 3D scene, which can be viewed from any virtual camera angle selected by the user. This enables a fully immersive, "free-viewpoint" augmented reality sports experience.
  • Diagram:
    graph TD
        subgraph Data Acquisition
            A[Multi-Camera Array] --> B{Volumetric Capture Server};
            B --> C[4D Voxel Data Stream];
        end
        subgraph Processing
            C --> D[3D-CNN for Object Segmentation];
            D --> E{Semantic Element & Context Analysis};
            E --> F[Augmentation Logic];
        end
        subgraph Rendering & Display
            F --> G[3D Augmentation Assets];
            C --> H[Volumetric Renderer];
            G --> H;
            H --> I(Interactive 3D Scene);
        end
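  • Illustrative Code Sketch: A minimal sketch of the 3D-CNN stage, assuming PyTorch is available; the layer sizes, single-channel occupancy-grid input, and the three example classes are illustrative choices, not the system's actual architecture.
    import torch
    import torch.nn as nn

    class VoxelClassifier(nn.Module):
        """Toy 3D-CNN over a voxel crop around one tracked object."""
        def __init__(self, num_classes=3):           # e.g. player, ball, background
            super().__init__()
            self.features = nn.Sequential(
                nn.Conv3d(1, 8, kernel_size=3, padding=1), nn.ReLU(),
                nn.Conv3d(8, 16, kernel_size=3, padding=1), nn.ReLU(),
                nn.AdaptiveAvgPool3d(1),
            )
            self.head = nn.Linear(16, num_classes)

        def forward(self, voxels):                    # voxels: (batch, 1, D, H, W)
            return self.head(self.features(voxels).flatten(1))

    voxel_crop = torch.rand(1, 1, 32, 32, 32)         # one 32^3 occupancy grid
    logits = VoxelClassifier()(voxel_crop)            # semantic element scores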
    
Derivative 1.3: Graphene-based Wearable Sensors for Player Spatiotemporal Data
  • Enabling Description: The spatiotemporal data is generated not from video but from a network of graphene-based strain and inertial measurement unit (IMU) sensors integrated into player jerseys and equipment (e.g., ball, shoes). These sensors provide high-frequency, low-latency data on player limb orientation, velocity, acceleration, and ball impact forces. The "semantic elements" (players, ball) are pre-identified by their sensor IDs. The "semantic context" is determined by a machine learning model (e.g., a Long Short-Term Memory network, LSTM) trained to recognize specific athletic movements and plays (e.g., shooting form, tackle type, swing mechanics) from the multi-channel sensor time-series data. This system decouples the augmentation from the broadcast video feed, allowing personalized, biomechanical augmentations to be overlaid on any video source or even a virtual representation of the game.
  • Diagram:
    sequenceDiagram
        participant P as Player (Wearable Sensors)
        participant S as Sensor Aggregation Gateway
        participant C as Cloud Analytics Platform
        participant V as Viewing Device
    
        P->>S: High-Frequency IMU/Strain Data
        S->>C: Transmit Aggregated Sensor Data Stream
        C->>C: LSTM Model Analyzes Movement Patterns
        C->>C: Detects 'Shot Attempt' (Semantic Context)
        C->>V: Send Augmentation Data (e.g., Release Angle, Velocity)
        V->>V: Overlay Augmentation on Video Feed
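  • Illustrative Code Sketch: A minimal sketch of windowing the multi-channel wearable-sensor stream and feeding it to an LSTM, assuming PyTorch; the channel list, window length, and the two example classes are assumptions for illustration.
    import numpy as np
    import torch
    import torch.nn as nn

    CHANNELS = ["accel_x", "accel_y", "accel_z", "gyro_x", "gyro_y", "gyro_z"]

    def make_windows(samples, window=200, step=50):
        """Slice a (time, channels) array into overlapping windows."""
        return np.stack([samples[i:i + window]
                         for i in range(0, len(samples) - window + 1, step)])

    stream = np.random.randn(1000, len(CHANNELS)).astype(np.float32)
    windows = torch.from_numpy(make_windows(stream))        # (N, 200, 6)

    lstm = nn.LSTM(input_size=len(CHANNELS), hidden_size=32, batch_first=True)
    head = nn.Linear(32, 2)                                  # e.g. shot vs. non-shot
    _, (hidden, _) = lstm(windows)
    scores = head(hidden[-1])        # per-window semantic-context scores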
    

2. Operational Parameter Expansion

Derivative 2.1: Microfluidic Analysis Augmentation
  • Enabling Description: The system is applied at a microscopic scale to analyze video feeds from high-speed cameras observing microfluidic "lab-on-a-chip" devices. The "semantic elements" are individual cells, droplets, or micro-particles flowing through channels. The "spatiotemporal data" is derived from high-frequency particle tracking velocimetry (PTV) algorithms. The "semantic context" involves identifying cellular behaviors like mitosis, apoptosis, or chemotaxis, or detecting anomalies in fluid flow patterns. Augmentations include overlaying color-coded velocity vectors, highlighting specific cells that exhibit target behaviors, and plotting real-time graphs of particle concentrations. User interaction allows a researcher to select a single cell and track its complete path and state changes over time.
  • Diagram:
    flowchart TD
        A[Microscope Video Feed] --> B(Image Processor);
        B -- Particle Tracking --> C["Spatiotemporal Data (X,Y,T)"];
        C --> D[ML Classifier for Cell Behavior];
        D -- Context: Mitosis --> E{Generate Augmentation};
        E --> F[Overlay: Highlight Cell, Plot Lineage Tree];
        A & F --> G(Augmented Microscope View);
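  • Illustrative Code Sketch: A minimal sketch of the particle-tracking step: nearest-neighbour linking of detected centroids between two frames to obtain per-particle velocity vectors for the overlay. Detection itself is out of scope; the coordinates, time step, and matching radius are illustrative.
    import numpy as np

    def link_particles(prev_xy, curr_xy, dt, max_dist=15.0):
        """Return (index_prev, index_curr, velocity) for matched particles."""
        links = []
        for i, p in enumerate(prev_xy):
            d = np.linalg.norm(curr_xy - p, axis=1)
            j = int(np.argmin(d))
            if d[j] <= max_dist:
                links.append((i, j, (curr_xy[j] - p) / dt))
        return links

    prev_xy = np.array([[10.0, 12.0], [40.0, 55.0]])
    curr_xy = np.array([[12.5, 13.0], [41.0, 58.0]])
    for i, j, v in link_particles(prev_xy, curr_xy, dt=0.001):
        print(f"particle {i} -> {j}, velocity {v} px/s")    # drawn as a vector overlay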
    
Derivative 2.2: Hyperspectral Satellite Imagery Analysis
  • Enabling Description: The system is scaled to operate on time-series hyperspectral satellite imagery for agricultural or environmental monitoring. A "video frame" is a single satellite image capture, and a sequence of captures over time constitutes the video. "Semantic elements" are distinct land parcels, crop types, or water bodies identified through spectral signature analysis. "Spatiotemporal data" includes the location, size, and spectral changes of these elements over days, weeks, or months. "Semantic context" is determined by analyzing these changes to detect events like crop stress, illegal deforestation, or algal blooms. Augmentations include overlaying color-coded health indices (e.g., NDVI), drought warnings, or predicted yield data directly onto the satellite imagery map.
  • Diagram:
    graph LR
        A[Time-Series Satellite Imagery] --> B(Spectral Signature Analysis);
        B --> C(Object Segmentation: Fields, Forests);
        C --> D{Spatiotemporal Change Detection};
        D -- "Context: NDVI Drop > 20%" --> E(Augmentation Engine);
        E --> F[Generate 'Drought Stress' Alert & Overlay];
        A & F --> G(Augmented Geospatial Display);
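  • Illustrative Code Sketch: A minimal sketch of the spectral-change step: computing NDVI from red and near-infrared bands and flagging pixels whose NDVI dropped by more than 20% between two acquisitions. The random band data and the 20% threshold are illustrative.
    import numpy as np

    def ndvi(nir, red, eps=1e-6):
        return (nir - red) / (nir + red + eps)

    nir_t0, red_t0 = np.random.rand(2, 256, 256)     # earlier acquisition
    nir_t1, red_t1 = np.random.rand(2, 256, 256)     # later acquisition

    drop = (ndvi(nir_t0, red_t0) - ndvi(nir_t1, red_t1)) / np.maximum(
        np.abs(ndvi(nir_t0, red_t0)), 1e-6)
    drought_mask = drop > 0.20      # per-pixel mask behind the 'Drought Stress' overlay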
    

3. Cross-Domain Application

Derivative 3.1: Aerospace - Automated Air Traffic Control
  • Enabling Description: The system is applied to real-time radar and ADS-B data feeds, visualized on a 2D or 3D air traffic control map. The "video content" is the real-time map display. "Semantic elements" are individual aircraft, identified by their transponder codes. The "spatiotemporal data" is their 4D trajectory (latitude, longitude, altitude, time). The system's machine learning model determines "semantic context" by predicting potential conflicts, such as loss of separation or runway incursion risks, based on flight vectors and known airport procedures. Augmentations include highlighting conflicting aircraft in red, drawing projected flight paths with color-coded risk levels, and automatically generating resolution advisories (e.g., "Climb FL350") as interactive overlays for the controller to approve or reject.
  • Diagram:
    sequenceDiagram
        participant Radar as Radar/ADS-B Data Feed
        participant ATCSystem as Core System
        participant ControllerUI as Operator Display
    
        loop Real-Time Update
            Radar->>ATCSystem: Update Aircraft Positions (Spatiotemporal Data)
            ATCSystem->>ATCSystem: Identify Aircraft (Semantic Elements)
            ATCSystem->>ATCSystem: Predict Trajectories & Conflicts (Semantic Context)
            alt Conflict Detected
                ATCSystem->>ControllerUI: Generate 'Loss of Separation' Augmentation
            end
            ControllerUI->>ControllerUI: Render Aircraft with Highlight & Vector
        end
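  • Illustrative Code Sketch: A minimal sketch of the conflict-prediction step: linear extrapolation of two aircraft tracks and a closest-point-of-approach check against a separation minimum. The flat-earth geometry, units, and the 5 nm threshold are simplifying assumptions for illustration.
    import numpy as np

    def time_of_closest_approach(p1, v1, p2, v2):
        dp, dv = p2 - p1, v2 - v1
        denom = float(np.dot(dv, dv))
        return 0.0 if denom == 0 else max(0.0, -float(np.dot(dp, dv)) / denom)

    # Position (nm east, nm north, flight level) and velocity per minute.
    p1, v1 = np.array([0.0, 0.0, 350.0]), np.array([8.0, 0.0, 0.0])
    p2, v2 = np.array([40.0, 3.0, 350.0]), np.array([-8.0, 0.0, 0.0])

    t = time_of_closest_approach(p1, v1, p2, v2)
    separation = np.linalg.norm((p1 + v1 * t) - (p2 + v2 * t))
    if separation < 5.0:              # simplified lateral separation minimum
        print(f"Loss of separation predicted in {t:.1f} min -> highlight both aircraft")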
    
Derivative 3.2: AgTech - Autonomous Pest and Disease Detection
  • Enabling Description: The system is deployed on an agricultural drone equipped with multispectral cameras. The "video content" is the live feed from the drone's cameras as it flies over a field. "Semantic elements" are individual plants or sections of a crop row. "Spatiotemporal data" is the GPS-tagged location of each frame and the visual data within it. The system uses a trained computer vision model to determine the "semantic context" of each plant, such as "healthy," "nutrient deficient," "insect-infested," or "diseased." Augmentations are overlaid in real-time on the operator's display, color-coding areas of the field based on health status. User interaction allows the operator to tap on a highlighted area to view detailed multispectral readings, zoom in on the specific pest identified, and dispatch a targeted spraying drone to that precise GPS coordinate.
  • Diagram:
    graph TD
        A[Drone Multispectral Video] --> B{Plant Health Classifier};
        B -- Context: Aphid Infestation --> C[Augmentation Engine];
        C --> D(Generate Infestation Bounding Box);
        A & D --> E[Operator's Live View];
        E -- User Tap --> F{Dispatch Targeting Info};
        F --> G[Spraying Drone];
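  • Illustrative Code Sketch: A minimal sketch of the dispatch step: turning one classified, GPS-tagged frame region into a targeting payload for the spraying drone. The field names, class label, and coordinates are assumptions for illustration, not a defined interface.
    import json
    from dataclasses import dataclass, asdict

    @dataclass
    class Detection:
        frame_id: int
        lat: float
        lon: float
        bbox: tuple               # (x, y, w, h) in frame pixels
        label: str                # e.g. "aphid_infestation"
        confidence: float

    def dispatch_payload(det: Detection) -> str:
        """Build the message an operator tap would send to the spraying drone."""
        return json.dumps({"target": {"lat": det.lat, "lon": det.lon},
                           "reason": det.label,
                           "source": asdict(det)})

    det = Detection(frame_id=118, lat=41.8781, lon=-93.0977,
                    bbox=(220, 140, 64, 48), label="aphid_infestation",
                    confidence=0.91)
    print(dispatch_payload(det))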
    
Derivative 3.3: Retail - In-Store Shopper Behavior Analysis
  • Enabling Description: In a retail environment, the "video content" is from overhead cameras. "Semantic elements" are individual shoppers, identified and tracked anonymously. The "spatiotemporal data" consists of their movement paths, dwell times in different aisles, and interactions with products. The "semantic context" is the classification of shopping behavior, such as "browsing," "comparing products," "seeking assistance," or "abandoning cart." Augmentations are provided on a store manager's dashboard, showing a real-time heatmap of store activity, highlighting shoppers who have been waiting for assistance for over a threshold time, or flagging unusual movement patterns that might indicate theft. An alert could be triggered to a store associate's mobile device, showing a video clip of the event and the shopper's location.
  • Diagram:
    flowchart LR
        subgraph Store
            A[Ceiling Cameras] --> B(Video Stream);
        end
        subgraph Analysis Server
            B --> C{Shopper Tracking & Path Analysis};
            C --> D[Behavioral Classification (LSTM)];
            D -- Context: 'Hesitation' at Shelf --> E{Augmentation Rule Engine};
            E --> F[Generate 'Potential Assistance Needed' Alert];
        end
        subgraph Staff Interface
            F --> G(Manager Dashboard / Associate App);
        end
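  • Illustrative Code Sketch: A minimal sketch of the dwell-time rule: measuring how long an anonymised track has stayed inside one aisle zone and raising an assistance alert past a threshold. Zone geometry, identifiers, and the 90-second threshold are illustrative.
    def dwell_seconds(track, zone):
        """track: list of (t_seconds, x, y); zone: (xmin, ymin, xmax, ymax)."""
        inside = [t for t, x, y in track
                  if zone[0] <= x <= zone[2] and zone[1] <= y <= zone[3]]
        return (max(inside) - min(inside)) if inside else 0.0

    aisle_7 = (12.0, 3.0, 16.0, 9.0)
    shopper_42 = [(t, 13.5, 5.0 + 0.01 * t) for t in range(0, 120, 2)]

    if dwell_seconds(shopper_42, aisle_7) > 90:
        alert = {"shopper": "track-42", "zone": "aisle-7",
                 "context": "potential assistance needed"}
        print(alert)              # pushed to the manager dashboard / associate app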
    

4. Integration with Emerging Tech

Derivative 4.1: AI-Driven Predictive Augmentation
  • Enabling Description: The system is integrated with a predictive AI model trained on vast datasets of historical games. The "semantic context" determination is enhanced to be predictive. Based on the current spatiotemporal data (player positions, velocities), the AI model calculates the probabilities of various near-future events (e.g., "75% chance of a 3-point shot attempt," "40% chance of a turnover"). The system then "pre-fetches" and renders relevant augmentations before the event occurs. For example, as a player dribbles towards the three-point line, their season 3-point percentage is pre-emptively displayed. If the user interacts with this predictive augmentation, it could trigger a display of probability-weighted shot charts showing the most likely outcomes.
  • Diagram:
    sequenceDiagram
        participant VideoInput as Video & Spatiotemporal Data
        participant PredictiveAI as AI Model
        participant AugmentationEngine as Augmentation Engine
        participant Renderer as Renderer/Display
    
        VideoInput->>PredictiveAI: Send Current Game State (Frame t)
        PredictiveAI->>PredictiveAI: Analyze State & Predict Next Actions (t+1..t+n)
        PredictiveAI->>AugmentationEngine: Send 'High Probability Shot' context
        AugmentationEngine->>AugmentationEngine: Select 'Shooting Stats' augmentation
        AugmentationEngine->>Renderer: Send Augmentation for Frame t+1
        VideoInput->>Renderer: Send Video Frame t+1
        Renderer->>Renderer: Composite Video and Augmentation
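  • Illustrative Code Sketch: A minimal sketch of the pre-fetch rule: when the predictive model's probability for a near-future event exceeds a threshold, the matching augmentation asset is requested before the event occurs. The event names, asset names, and 0.6 threshold are assumptions for illustration.
    PREFETCH_THRESHOLD = 0.6

    ASSET_FOR_EVENT = {
        "three_point_attempt": "player_season_3pt_percentage_overlay",
        "turnover": "turnover_risk_overlay",
    }

    def select_prefetch(predictions):
        """predictions: mapping of predicted event -> probability from the AI model."""
        return [ASSET_FOR_EVENT[e] for e, p in predictions.items()
                if p >= PREFETCH_THRESHOLD and e in ASSET_FOR_EVENT]

    predicted = {"three_point_attempt": 0.75, "turnover": 0.40}
    for asset in select_prefetch(predicted):
        print(f"pre-rendering {asset} for frame t+1")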
    
Derivative 4.2: IoT Sensor Fusion for Enhanced Context
  • Enabling Description: The system's "spatiotemporal data" is augmented with data from a network of IoT sensors. In a sports context, this includes biometric sensors on players (heart rate, fatigue level), environmental sensors in the stadium (temperature, humidity), and even crowd noise level sensors. The "semantic context" determination now fuses video-derived data with this IoT data. For example, the system can identify a "fatigued player" context when their in-game speed (from video tracking) drops while their heart rate (from IoT sensor) remains high. The resulting augmentation could be a visual indicator over the player, alerting a coach to a potential substitution need, or providing commentators with a data-driven talking point.
  • Diagram:
    graph TD
        A[Video Spatiotemporal Data] --> C{Context Fusion Engine};
        B["IoT Sensor Data (Heart Rate, etc.)"] --> C;
        C --> D[Multi-modal Semantic Context Analysis];
        D -- Context: high_hr AND low_speed --> E{Generate 'Fatigue' Augmentation};
        E --> F[Augmented Video Output];
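  • Illustrative Code Sketch: A minimal sketch of the fusion rule named in the diagram: combining video-derived speed with an IoT heart-rate reading to flag a 'fatigued player' context. The thresholds and per-player baseline are assumptions for illustration.
    def fatigue_context(speed_mps, heart_rate_bpm, baseline_speed_mps,
                        hr_threshold=170, speed_drop=0.7):
        slow = speed_mps < baseline_speed_mps * speed_drop
        elevated = heart_rate_bpm > hr_threshold
        return "fatigued" if slow and elevated else "normal"

    # Video tracking shows the player slowing while the wearable reports a high HR.
    context = fatigue_context(speed_mps=3.1, heart_rate_bpm=182,
                              baseline_speed_mps=5.4)
    if context == "fatigued":
        print("render fatigue indicator over player; notify coaching staff")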
    
Derivative 4.3: Blockchain for Verifiable Augmentations and Micro-transactions
  • Enabling Description: Each spatiotemporal event and its associated semantic context (e.g., "Player A scored a 3-point shot at game time 10:32") is hashed and recorded as a transaction on a distributed ledger (blockchain). The augmentations displayed to users, especially those related to gambling or official statistics, include a cryptographic signature that can be verified against the blockchain record, ensuring data integrity and provenance. Furthermore, user interaction with certain augmentations can trigger micro-transactions. For example, a user clicking a "Collect This Highlight" button on an augmented video clip could trigger a smart contract to mint a Non-Fungible Token (NFT) of that specific play, transferring ownership to the user's digital wallet.
  • Diagram:
    graph TD
        A[Video & Spatiotemporal Data] --> B(Semantic Event Detection);
        B -- "Goal Scored" --> C{Create Event Record};
        C --> D(Hash Event Data);
        D --> E{Write to Blockchain};
        B --> F(Augmentation Engine);
        F -- "Show Goal Highlight" --> G[Render Interactive Augmentation];
        G -- "User clicks Collect" --> H{Initiate Smart Contract};
        H -- Mint NFT --> E;
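  • Illustrative Code Sketch: A minimal sketch of the event-hashing step: each semantic event is serialised, hashed, and chained to the previous record so displayed augmentations can later be verified against it. A plain Python list stands in for the distributed ledger and smart-contract layer.
    import hashlib
    import json

    ledger = []      # placeholder for the blockchain / smart-contract layer

    def record_event(event: dict) -> str:
        prev_hash = ledger[-1]["hash"] if ledger else "0" * 64
        payload = json.dumps({"event": event, "prev": prev_hash}, sort_keys=True)
        digest = hashlib.sha256(payload.encode()).hexdigest()
        ledger.append({"event": event, "prev": prev_hash, "hash": digest})
        return digest

    sig = record_event({"player": "A", "action": "3pt_shot", "game_time": "10:32"})
    print(f"augmentation carries verifiable signature {sig[:16]}...")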
    

5. The "Inverse" or Failure Mode

Derivative 5.1: Graceful Degradation Mode for Low-Bandwidth Environments
  • Enabling Description: The system incorporates a "low-power" or "graceful degradation" mode that activates when the client device detects poor network bandwidth or limited processing capability. In this mode, the system prioritizes the core video stream. The "spatiotemporal data" received is down-sampled, and the "semantic element" detection is limited to only primary objects (e.g., just the ball carrier, not all players). "Semantic context" analysis is simplified to basic events (e.g., "shot" vs. "pass") instead of complex plays. Augmentations are switched from processor-intensive 3D graphics to lightweight, static text or simple icons. User interaction is limited to toggling these simple data points on/off, rather than triggering new video streams or complex overlays, ensuring the core viewing experience remains fluid.
  • Diagram:
    stateDiagram-v2
        [*] --> HighBandwidth
        HighBandwidth: Full Augmentations, HD Video, Complex Contexts
        HighBandwidth --> LowBandwidth: Network Degrades
        LowBandwidth: Simplified Augmentations, SD Video, Basic Contexts
        LowBandwidth --> HighBandwidth: Network Improves
        LowBandwidth --> [*]
        HighBandwidth --> [*]
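  • Illustrative Code Sketch: A minimal sketch of the degradation switch: measured bandwidth and processing headroom select a profile that caps which elements, contexts, and augmentation styles are used. The thresholds and profile contents are illustrative.
    PROFILES = {
        "high": {"elements": "all_players", "contexts": "complex_plays",
                 "augmentation": "3d_graphics"},
        "low":  {"elements": "ball_carrier_only", "contexts": "basic_events",
                 "augmentation": "static_text"},
    }

    def select_profile(bandwidth_mbps, cpu_headroom):
        degraded = bandwidth_mbps < 3.0 or cpu_headroom < 0.2
        return PROFILES["low"] if degraded else PROFILES["high"]

    active = select_profile(bandwidth_mbps=1.8, cpu_headroom=0.5)
    print(active)         # {'elements': 'ball_carrier_only', ...}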
    
Derivative 5.2: Privacy-Preserving Augmentation via Anonymization
  • Enabling Description: This "inverse" application uses the same core technology to enhance privacy. The system processes video from a public space (e.g., a city square). The "semantic element" identification algorithm detects all human figures. However, instead of augmenting them with identifying information, the system applies an obfuscation augmentation. It generates a "privacy bounding box" for each person and applies a real-time blur or pixelation filter only within that box. The "spatiotemporal data" is used to track the bounding boxes accurately as people move. The "semantic context" engine can be configured to selectively de-anonymize authorized personnel (e.g., security guards) based on a uniform or other identifier, while keeping all other individuals anonymized in the security feed.
  • Diagram:
    flowchart TD
        A[Live Security Camera Feed] --> B(Person Detection & Tracking);
        B -- Bounding Boxes --> C{Semantic Context: Is Person Authorized?};
        C -- No --> D[Apply Blur Augmentation];
        C -- Yes --> E["No Augmentation (Clear View)"];
        A & D & E --> F(Composited Privacy-Preserving Video);
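  • Illustrative Code Sketch: A minimal sketch of the obfuscation augmentation: a Gaussian blur applied only inside tracked person bounding boxes, skipping boxes flagged as authorised. It assumes OpenCV (cv2) is available; the boxes are hard-coded for illustration.
    import cv2
    import numpy as np

    def anonymize(frame, boxes):
        """boxes: list of (x, y, w, h, authorised)."""
        out = frame.copy()
        for x, y, w, h, authorised in boxes:
            if not authorised:
                roi = out[y:y + h, x:x + w]
                out[y:y + h, x:x + w] = cv2.GaussianBlur(roi, (31, 31), 0)
        return out

    frame = np.random.randint(0, 255, (480, 640, 3), dtype=np.uint8)
    tracked = [(100, 80, 60, 120, False),    # member of the public -> blurred
               (300, 90, 60, 120, True)]     # authorised guard -> left clear
    safe_frame = anonymize(frame, tracked)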
    

Combination Prior Art Scenarios

Combination 1: Integration with WebRTC and TensorFlow.js for Browser-Based Real-Time Augmentation
  • Enabling Description: The methods of patent 11,120,271 are implemented entirely within a web browser using open standards. A live video stream of a sporting event is delivered to the client's browser via WebRTC, a standard protocol for real-time peer-to-peer communication. On the client side, a lightweight computer vision model for object tracking, built using the TensorFlow.js library, runs directly in the browser. This model processes the incoming video frames to extract basic spatiotemporal data (x, y coordinates of players). This data, along with a timestamp, is sent to a remote server which performs the more intensive "semantic context" analysis. The server sends back lightweight augmentation data (e.g., JSON objects with text and display coordinates). The client-side JavaScript then renders these augmentations as HTML5 Canvas or SVG overlays on top of the <video> element. This combination makes the interactive video experience accessible on any modern web browser without plugins, leveraging open-source ML libraries for client-side processing.
  • Diagram:
    sequenceDiagram
        participant Browser
        participant WebRTC_Server
        participant Analytics_Server
    
        WebRTC_Server->>Browser: Streams Live Video
        Browser->>Browser: TensorFlow.js: Tracks players in video frames
        Browser->>Analytics_Server: Sends spatiotemporal data (player coords, timestamp)
        Analytics_Server->>Analytics_Server: Determines semantic context (e.g., 'fast break')
        Analytics_Server->>Browser: Returns augmentation data (JSON)
        Browser->>Browser: Renders HTML5 Canvas overlay on video
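  • Illustrative Code Sketch: A minimal sketch of the server-side half of this combination: the browser posts player coordinates extracted with TensorFlow.js, and the analytics server replies with lightweight augmentation JSON for the Canvas/SVG overlay. The 'fast break' heuristic and all field names are assumptions for illustration.
    import json

    def analyze(payload: dict) -> str:
        """payload: {"timestamp": float, "players": [{"id", "x", "y", "vx", "vy"}]}"""
        fast_movers = [p for p in payload["players"]
                       if (p["vx"] ** 2 + p["vy"] ** 2) ** 0.5 > 6.0]
        context = "fast_break" if len(fast_movers) >= 3 else "half_court"
        overlays = [{"playerId": p["id"], "text": "FAST BREAK",
                     "x": p["x"], "y": p["y"] - 20} for p in fast_movers]
        return json.dumps({"context": context, "overlays": overlays})

    sample = {"timestamp": 1712.4,
              "players": [{"id": i, "x": 50 + 10 * i, "y": 30, "vx": 7.0, "vy": 0.5}
                          for i in range(3)]}
    print(analyze(sample))      # consumed by the browser-side overlay renderer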
    
Combination 2: Integration with MPEG-DASH and Timed Metadata Tracks (EMSG)
  • Enabling Description: The system's output is packaged for standards-compliant adaptive bitrate streaming. The "spatiotemporal data" and "semantic context" information (e.g., play type, player ID, coordinates) is encoded into a timed metadata track, specifically using the "Events Message" (emsg) box format within the MPEG-DASH standard. The video server generates the DASH manifest (MPD) which lists the available video, audio, and the new metadata tracks. A standard-compliant DASH player on a client device plays the video and simultaneously parses the emsg boxes as they arrive. The player's application logic uses the data from these timed events to trigger the rendering of augmentations, synchronizing them precisely with the video frames. This allows the augmented experience to be delivered through standard Content Delivery Networks (CDNs) and played on a wide range of devices that support MPEG-DASH, without requiring a custom video player.
  • Diagram:
    graph TD
        subgraph Server-Side
            A[Live Video Feed] --> B(Encoder);
            C[Spatiotemporal Analysis] --> D{"Metadata Generator (EMSG)"};
            B & D --> E(MPEG-DASH Packager);
            E --> F[CDN];
        end
        subgraph Client-Side
            F --> G(DASH Video Player);
            G -- Video Segments --> H[Video Decoder];
            G -- EMSG Metadata --> I[Augmentation Renderer];
            H & I --> J(Synchronized Augmented Display);
        end
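  • Illustrative Code Sketch: A minimal sketch of the metadata-generation step: wrapping one semantic event in a dictionary mirroring the fields of an MPEG-DASH 'emsg' box (scheme_id_uri, value, timescale, presentation time, duration, id, message_data). A real packager would serialise this into the ISO BMFF segment; the URN and payload shown are assumptions for illustration.
    import json

    def make_emsg_event(event: dict, presentation_time_ms: int, event_id: int) -> dict:
        return {
            "scheme_id_uri": "urn:example:sports-augmentation:2024",   # assumed URN
            "value": "play-events",
            "timescale": 1000,                   # ticks per second
            "presentation_time": presentation_time_ms,
            "event_duration": 5000,
            "id": event_id,
            "message_data": json.dumps(event).encode("utf-8"),
        }

    emsg = make_emsg_event({"play": "pick_and_roll", "players": [23, 30],
                            "bbox": [410, 220, 90, 160]},
                           presentation_time_ms=1_250_000, event_id=42)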
    
Combination 3: Integration with ROS (Robot Operating System) for Autonomous Drone Cinematography
  • Enabling Description: The system is integrated into an autonomous drone running the Robot Operating System (ROS), an open-source framework for robot software development. The drone's camera provides the "video content". Onboard computer vision nodes running within ROS perform real-time object detection and tracking to generate "spatiotemporal data" for players on a field. A "semantic context" node analyzes this data to identify key moments in a game (e.g., a breakaway, a goal, a key defensive play). This context is used to trigger an "augmentation," which in this case is not a visual overlay, but a command to the drone's flight control system. For example, upon detecting a "fast break" (semantic context), the system commands the drone to automatically switch to a more cinematic "chase" camera angle, smoothly following the lead player. This combines the patent's event-detection capabilities with open-source robotics control software to create an automated, intelligent sports cinematographer.
  • Diagram:
    graph TD
        subgraph Drone_Onboard_System
            A[Camera] -- Image Data --> B[ROS Node: Vision Processing];
            B -- Player Positions --> C[ROS Node: Spatiotemporal Analysis];
            C -- "Goal Scored" Event --> D[ROS Node: Cinematography Logic];
            D -- "Execute Orbit Shot" --> E[ROS Node: Flight Controller];
            E -- Motor Commands --> F[Drone Motors/Actuators];
        end
        G[Ground Station] --> D;
        E --> G;
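  • Illustrative Code Sketch: A minimal sketch of the cinematography-logic node, assuming ROS 1 with rospy and std_msgs String messages; the topic names and the event-to-shot mapping are assumptions for illustration, not a defined interface of the system.
    import rospy
    from std_msgs.msg import String

    SHOT_FOR_EVENT = {"fast_break": "chase_lead_player", "goal": "orbit_goal_area"}

    def on_event(msg, publisher):
        shot = SHOT_FOR_EVENT.get(msg.data)
        if shot:
            publisher.publish(String(data=shot))   # consumed by the flight controller node

    def main():
        rospy.init_node("cinematography_logic")
        cmd_pub = rospy.Publisher("/flight/shot_command", String, queue_size=10)
        rospy.Subscriber("/analysis/semantic_event", String,
                         lambda msg: on_event(msg, cmd_pub))
        rospy.spin()

    if __name__ == "__main__":
        main()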
    
