Derivative works — US Patent 11611785

This Defensive Disclosure for US Patent 11611785 sets out new derivative works and technical disclosures intended to establish prior art.

Defensive Disclosure for US Patent 11611785

Introduction

This document details several derivative variations and combinations of the core inventive concepts described in US Patent 11611785, titled "Systems and methods for encoding and streaming video encoded using a plurality of maximum bitrate levels." Its purpose is to enrich the prior art landscape so that future incremental improvements by competitors in adaptive video encoding and streaming are rendered obvious or non-novel. The derivations are based on the independent claims, primarily Claim 1, which describes a method for encoding and adaptively streaming video content by evaluating quality across multiple target bitrates for various resolutions.

Derivative Variations Grounded in Claim 1

1. Material & Component Substitution: Specialized Hardware Acceleration for Encoding and Storage

Enabling Description:
The encoding step (Claim 1, step 3) can be performed using specialized hardware accelerators such as dedicated Video Processing Units (VPUs) or ASICs supporting advanced video codecs like AV1 or VVC. Specifically, "encoding at least a portion of the source video content multiple times" can be executed by a farm of NVIDIA NVENC-enabled GPUs (e.g., RTX 6000 Ada Generation) or Intel Arc A-series GPUs using their integrated Xe Media Engine, so that different resolution/bitrate combinations are encoded in parallel. For quality evaluation (Claim 1, step 3), a dedicated Image Signal Processor (ISP) module can be used to compute real-time perceptual quality metrics (e.g., VMAF, SSIM) on the encoded outputs. The "uploading encodings" step (Claim 1, step 5) would target a geographically distributed object storage system, such as Amazon S3 or Google Cloud Storage, using erasure coding (e.g., Reed-Solomon codes) rather than simple replication for durability, together with per-object checksums for integrity. Archival tiers such as S3 Glacier Deep Archive, whose retrievals take hours, suit only mezzanine or master copies, not segments served to playback devices. The content distribution system (Claim 1, step 5) could then employ programmable edge proxies running WebAssembly modules for dynamic manifest manipulation (Claim 1, step 6) and request routing (Claim 1, step 8).

graph TD
    A[Source Video Content] --> B{"Hardware Encoding Farm<br>(NVENC/Intel Xe Media Engine)"};
    B -- Multiple Resolutions & Bitrates --> C{"Hardware-Accelerated Quality Evaluation<br>(ISP, VMAF Engine)"};
    C --> D{Resolution & Bitrate Selection Logic};
    D -- Selected Encodings --> E["Distributed Object Storage<br>(S3-compatible with Erasure Coding)"];
    E --> F{"Content Distribution System<br>(CDN Edge Proxies with WASM)"};
    F -- Dynamic MPD/Manifest Generation --> G[Playback Devices];
    G -- Segment Requests --> F;
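
A minimal Python sketch of the parallel fan-out, assuming FFmpeg built with NVENC support and a hypothetical three-by-three ladder: each resolution/bitrate combination becomes one command, assigned round-robin to a GPU through the standard `CUDA_VISIBLE_DEVICES` environment variable. The ladder values, output file names and the `build_jobs` helper are illustrative assumptions, not part of the patent.

```python
from dataclasses import dataclass
from itertools import product

# Hypothetical ladder; real values would come from the quality evaluation step.
HEIGHTS = [1080, 720, 480]
BITRATES_KBPS = [6000, 3000, 1500]


@dataclass
class EncodeJob:
    height: int
    kbps: int
    gpu: int
    env: dict[str, str]
    cmd: list[str]


def build_jobs(source: str, gpus: list[int]) -> list[EncodeJob]:
    """Hypothetical helper: one FFmpeg NVENC job per (height, bitrate), round-robin over GPUs."""
    jobs = []
    for i, (height, kbps) in enumerate(product(HEIGHTS, BITRATES_KBPS)):
        gpu = gpus[i % len(gpus)]
        cmd = [
            "ffmpeg", "-y", "-i", source,
            "-vf", f"scale=-2:{height}",   # keep aspect ratio, force an even width
            "-c:v", "h264_nvenc",          # NVENC hardware H.264 encoder
            "-b:v", f"{kbps}k",
            "-maxrate", f"{kbps}k",        # cap the peak rate at the target
            "-bufsize", f"{2 * kbps}k",
            "-an",                         # video-only rendition
            f"out_{height}p_{kbps}k.mp4",
        ]
        # Expose only one GPU to this process via the standard CUDA variable.
        env = {"CUDA_VISIBLE_DEVICES": str(gpu)}
        jobs.append(EncodeJob(height, kbps, gpu, env, cmd))
    return jobs
```

Each job could then be launched with `subprocess.run(job.cmd, env={**os.environ, **job.env}, check=True)` from a thread or process pool, so the farm encodes every combination concurrently.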

2. Operational Parameter Expansion: Ultra-Low Latency Volumetric Video Streaming

Enabling Description:
The method is extended to ultra-low latency adaptive streaming of volumetric video data (e.g., point clouds, mesh sequences) at both extreme fidelity (e.g., 8K depth maps at 120 FPS) and extremely coarse granularity (e.g., sparse point clouds at 5 FPS). The "source video content" (Claim 1, step 1) consists of volumetric data captured in real time. "Identifying a plurality of resolutions" (Claim 1, step 2) would involve selecting different levels of spatial and temporal resolution for the volumetric data, along with varying levels of point density or mesh complexity. The "encoding" (Claim 1, step 3) would use specialized volumetric codecs (e.g., MPEG V-PCC or G-PCC) running on custom hardware with dedicated 3D processing units. "Quality evaluation" (Claim 1, step 3) would focus on metrics relevant to 3D perception, such as point cloud distortion (e.g., Hausdorff distance) or visual coherence during view synthesis, computed within a millisecond-scale budget. "Target bitrates" would range from hundreds of Mbps for high-fidelity interactive VR/AR applications down to tens of kbps for basic volumetric presence sensing. The "top level index" (Claim 1, step 6) would dynamically describe the different volumetric representations, including sparse and dense point clouds and varied mesh resolutions, allowing adaptive fetching based on user viewpoint, device capabilities, and network conditions to keep end-to-end latency below 100 ms.

sequenceDiagram
    participant VC as Volumetric Capture System
    participant EH as Encoding Hardware (3D VPU)
    participant QEL as Quality Evaluation Logic (3D Metrics)
    participant ABSE as Adaptive Bitrate Streaming Engine
    participant CDN as CDN (Edge Servers)
    participant PD as Playback Device (AR/VR Headset)

    VC->>EH: Raw Volumetric Data (High-Res, High-FPS)
    EH->>EH: Encode (multiple resolutions/bitrates)
    EH->>QEL: Encoded Volumetric Streams
    QEL->>EH: Quality Feedback (3D Distortion, Perceptual)
    EH->>ABSE: Selected Volumetric Representations
    ABSE->>CDN: Upload Volumetric Segments & Manifest
    PD->>ABSE: Request Initial Manifest (e.g., DASH MPD for Volumetric)
    ABSE->>PD: Provide Volumetric Manifest
    PD->>PD: Analyze Network/Device Capability
    PD->>CDN: Request Volumetric Segment (Adaptive)
    CDN->>PD: Deliver Volumetric Segment (Ultra-Low Latency)
    PD->>PD: Render Volumetric Content
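
The Hausdorff distance named above can be sketched in plain Python as follows. The brute-force O(n*m) search is only for illustration (a production 3D pipeline would use a spatial index or GPU kernels), and `pick_representation` is a hypothetical helper that chooses the cheapest representation whose decoded point cloud stays within a distortion tolerance of the reference.

```python
import math
from typing import Optional, Sequence

Point = tuple[float, float, float]


def directed_hausdorff(a: Sequence[Point], b: Sequence[Point]) -> float:
    """Largest distance from a point in `a` to its nearest neighbour in `b`."""
    return max(min(math.dist(p, q) for q in b) for p in a)


def hausdorff(a: Sequence[Point], b: Sequence[Point]) -> float:
    """Symmetric Hausdorff distance between two point clouds (brute force)."""
    return max(directed_hausdorff(a, b), directed_hausdorff(b, a))


def pick_representation(reference: Sequence[Point],
                        candidates: list[tuple[int, Sequence[Point]]],
                        tolerance: float) -> Optional[int]:
    """Hypothetical helper: lowest bitrate (kbps) whose decoded cloud is within `tolerance`."""
    within = [kbps for kbps, cloud in candidates if hausdorff(reference, cloud) <= tolerance]
    return min(within) if within else None
```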

3. Cross-Domain Application: High-Definition Geospatial Data Streaming for Urban Planning

Enabling Description:
The adaptive streaming method is applied to high-definition geospatial video (e.g., aerial footage, drone surveys, satellite imagery time-lapses) for urban planning and environmental monitoring. The "source video content" (Claim 1, step 1) includes large-scale orthomosaic images, LiDAR scan animations, and photogrammetry model traversals, often spanning gigapixels. "Identifying a plurality of resolutions" (Claim 1, step 2) involves selecting different spatial resolutions (e.g., 5cm/pixel, 20cm/pixel, 1m/pixel) and temporal resolutions (daily updates, weekly updates). "Encoding" (Claim 1, step 3) would utilize advanced geospatial compression techniques (e.g., JPEG 2000, WebP for image sequences, or specialized point cloud compression) to create multi-resolution tiled data. "Quality evaluation" (Claim 1, step 3) would assess feature recognition accuracy, measurement precision, and cartographic legibility at various zoom levels. "Target bitrates" are adapted to client-side GIS processing capabilities and available network bandwidth (e.g., for field agents with cellular connections vs. office analysts with fiber). The "top level index" (Claim 1, step 6) would describe the various geospatial data layers and their resolutions, enabling dynamic data loading for interactive map applications and 3D city models.

graph LR
    A["Raw Geospatial Data<br>(Aerial, LiDAR, Photogrammetry)"] --> B{"Data Pre-processing<br>(Tiling, Layering, Metadata)"};
    B --> C{"Encoding Engine<br>(Geospatial Codecs, Multi-resolution)"};
    C -- Multiple Resolution/Bitrate Versions --> D{"Quality & Feature Validation<br>(GIS Accuracy, Legibility Metrics)"};
    D --> E{Adaptive Stream Selector};
    E -- Selected Stream Profiles --> F["Geospatial Data Servers<br>(Tiled Storage)"];
    F --> G{"Dynamic Manifest Generator<br>(Geo-Spatial MPD)"};
    G --> H[GIS Workstation/Mobile Device];
    H -- Requests for Geo-Segments --> F;
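
The multi-resolution tiling described above reduces to simple arithmetic. This sketch assumes square 256-pixel tiles (a common web-map convention, assumed here), integer extents in metres and ground sampling distances in centimetres, so that integer division avoids floating-point rounding; `pyramid` is a hypothetical helper.

```python
def ceil_div(a: int, b: int) -> int:
    """Integer ceiling division."""
    return -(-a // b)


def pyramid(extent_m: tuple[int, int], gsds_cm: list[int],
            tile_px: int = 256) -> list[dict[str, int]]:
    """Hypothetical helper: pixel size and tile count per level for extent_m = (width, height) metres."""
    levels = []
    for gsd in gsds_cm:
        width_px = ceil_div(extent_m[0] * 100, gsd)   # metres -> centimetres, then / GSD
        height_px = ceil_div(extent_m[1] * 100, gsd)
        tiles = ceil_div(width_px, tile_px) * ceil_div(height_px, tile_px)
        levels.append({"gsd_cm": gsd, "width_px": width_px,
                       "height_px": height_px, "tiles": tiles})
    return levels
```

For a 2 km by 1 km survey area at 5, 20 and 100 cm/pixel this yields 12,403, 800 and 32 tiles respectively, which is why the coarse layers suit field agents on cellular links while the 5 cm layer is fetched tile by tile as a user zooms in.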

4. Cross-Domain Application: Industrial Metrology and Quality Control

Enabling Description:
This method is adapted for real-time streaming of high-precision metrology video from industrial inspection systems (e.g., automated optical inspection (AOI), computed tomography (CT) scans). The "source video content" (Claim 1, step 1) consists of high-magnification optical images, X-ray videos, or 3D scan data (e.g., structured light projections) used to detect microscopic defects in manufactured goods. "Identifying a plurality of resolutions" (Claim 1, step 2) involves varying pixel densities for defect features (e.g., 1000 DPI for critical areas, 100 DPI for overview), frame rates (e.g., high-speed for inline inspection, low-speed for manual review), and color depth. "Encoding" (Claim 1, step 3) focuses on preserving minute details, using lossless or perceptually lossless codecs where critical. "Quality evaluation" (Claim 1, step 3) is based on automated defect detection algorithms (e.g., machine vision, edge detection) and their confidence scores, ensuring that lower bitrate streams do not mask critical anomalies. "Target bitrates" are determined by the severity of potential defects and network conditions within the factory, prioritizing high-fidelity streams for critical process steps while offering lower bitrate streams for general monitoring or remote diagnostic support. The "top level index" (Claim 1, step 6) would allow manufacturing execution systems (MES) or remote technicians to request specific quality levels of inspection data.

graph TD
    A["Metrology Sensor<br>(AOI, CT, Structured Light)"] --> B{Data Acquisition & Pre-processing};
    B --> C{"Industrial Video Encoder<br>(Lossless/Perceptually Lossless Codecs)"};
    C -- Multiple Encodings --> D{"Automated Defect Detection & Quality Metrics<br>(ML-based Vision)"};
    D --> E{"Adaptive Quality Selector<br>(Prioritize Defect Visibility)"};
    E -- Selected Stream Profiles --> F[Factory Data Historian/Server];
    F --> G{Real-time Manifest Generator};
    G --> H[MES/QA Workstation/Remote Diagnostic];
    H -- Requests Inspection Streams --> F;
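
One way to express the defect-preserving quality check is to run the same detector on the reference stream and on each candidate encoding, then accept the lowest bitrate at which every confidently detected reference defect is still found. The `Detection` record, the defect IDs and the 0.8 confidence threshold are hypothetical; the detector itself is assumed and not shown.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass(frozen=True)
class Detection:
    defect_id: str      # hypothetical ID matching a defect across runs
    confidence: float   # detector confidence in [0, 1]


def lowest_safe_bitrate(reference: list[Detection],
                        by_bitrate: dict[int, list[Detection]],
                        min_conf: float = 0.8) -> int | None:
    """Lowest bitrate at which every confidently detected reference defect is still found."""
    required = {d.defect_id for d in reference if d.confidence >= min_conf}
    for kbps in sorted(by_bitrate):
        found = {d.defect_id for d in by_bitrate[kbps] if d.confidence >= min_conf}
        if required <= found:   # no critical defect masked at this bitrate
            return kbps
    return None
```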

5. Cross-Domain Application: Broadcast Journalism & Event Coverage

Enabling Description:
The adaptive streaming methodology is applied to remote broadcast journalism and live event coverage. The "source video content" (Claim 1, step 1) would be live camera feeds from field reporters or event venues, often originating from diverse, unpredictable network environments (e.g., bonded cellular links, satellite links). "Identifying a plurality of resolutions" (Claim 1, step 2) involves selecting broadcast-grade resolutions (e.g., 1080p, 720p) as well as lower-resolution proxies for rapid preview. "Encoding" (Claim 1, step 3) would use professional codecs (e.g., ProRes, AVC-Intra) for high-quality masters and efficient distribution codecs (e.g., HEVC, AV1) for adaptive streams. "Quality evaluation" (Claim 1, step 3) considers broadcast standards (e.g., EBU R 128 for audio loudness, specific video artifacts), reporter feedback, and editorial priorities. "Target bitrates" are adjusted dynamically to deliver the highest quality the prevailing network conditions allow, prioritizing consistent frame rate and audio/video synchronization over transient resolution dips. The "top level index" (Claim 1, step 6) would inform broadcast control rooms or online news platforms about the available stream qualities, enabling them to select the most suitable feed for live broadcast or archiving, with seamless switching.

flowchart TD
    subgraph Field Acquisition
        Camera(Broadcast Camera) -- Live Feed --> Encoder(Field Encoder - Multi-Bitrate)
        Encoder -- Encoded Streams --> Network(Variable Network Link)
    end

    subgraph Studio/CDN
        Network --> Ingest(Cloud/Studio Ingest)
        Ingest -- Store --> Storage(Storage for Alternatives)
        Ingest -- Quality Eval --> Quality(Quality Evaluation Module - Broadcast Standards)
        Quality --> Selector(Adaptive Stream Selector)
        Selector --> CDN(CDN Distribution)
        CDN --> Manifest(Manifest Generator - e.g., HLS/DASH)
    end

    subgraph Client
        Manifest --> Playback(Broadcast Control Room/Online Platform)
        Playback -- Requests Segments --> CDN
    end
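
A sketch of the selection rule described above, in which frame rate is held constant and only resolution and bitrate step down as the link degrades. The ladder values and the 0.7 safety factor applied to measured throughput are assumptions chosen for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Rung:
    height: int
    fps: int
    kbps: int


# Hypothetical ladder, highest first; every rung keeps the full 50 fps.
LADDER = [Rung(1080, 50, 8000), Rung(720, 50, 4500), Rung(540, 50, 2500), Rung(360, 50, 1200)]


def choose_rung(throughput_kbps: float, safety: float = 0.7) -> Rung:
    """Highest rung that fits within an assumed safety fraction of measured throughput."""
    budget = throughput_kbps * safety
    for rung in LADDER:
        if rung.kbps <= budget:
            return rung
    return LADDER[-1]   # never drop frame rate; fall back to the smallest picture
```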

6. Integration with Emerging Tech: AI-Driven Perceptual Quality Optimization

Enabling Description:
The core method is enhanced by integrating an AI/ML model for more sophisticated "quality evaluation" (Claim 1, step 3) and predictive "selecting a plurality of resolution and target bitrate combinations" (Claim 1, step 4). An ensemble of deep learning models, pre-trained on large datasets correlating objective metrics (PSNR, SSIM, VMAF) with subjective human perceptual quality scores (MOS), would dynamically assess the visual and auditory quality of each encoding. During the "encoding at least a portion... multiple times" step (Claim 1, step 3), this AI agent would adjust encoding parameters for each resolution in real time (e.g., quantization parameters, GOP structure, rate control algorithms, and pre-processing filters such as noise reduction or de-interlacing). Instead of merely evaluating fixed target bitrates, the AI would use continuous gradient-based optimization or reinforcement learning to find the optimal bitrate-quality trade-off for the current content complexity and anticipated network fluctuations. The "selection" (Claim 1, step 4) is then an AI-inferred decision that maximizes perceptual quality under bandwidth constraints, potentially even recommending dynamic resolution changes or aspect ratio adjustments based on content analysis (e.g., faces, text, high-motion scenes).

graph TD
    A[Source Video Content] --> B{Pre-processing};
    B --> C{"Dynamic Encoder<br>(Multiple resolutions, Adaptive QP/GOP)"};
    C -- Encoded Streams --> D{"AI Perceptual Quality Evaluator<br>(VMAF, MOS Prediction)"};
    D -- Quality Feedback --> C;
    D -- Optimal Bitrate-Quality Trade-off --> E{"AI Stream Selector<br>(Predictive, Content-Aware)"};
    E -- Selected Streams --> F[Content Distribution System];
    F --> G[Playback Devices];
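
Whatever model produces the quality estimates, the selection of Claim 1, step 4 can be illustrated as keeping, for each target bitrate, the resolution with the highest predicted score, then dropping rungs that add less than a minimum perceptual gain over the previous one. This per-bitrate best-resolution (convex-hull style) construction is swapped in for illustration and is not itself a learned model; the scores are taken as given inputs and both helpers are hypothetical.

```python
# Hypothetical helpers; quality scores are assumed to come from a model not shown.
def best_per_bitrate(scores: dict[tuple[int, int], float]) -> dict[int, tuple[int, float]]:
    """For each target bitrate, keep the (height, score) with the highest predicted quality."""
    best: dict[int, tuple[int, float]] = {}
    for (height, kbps), score in scores.items():
        if kbps not in best or score > best[kbps][1]:
            best[kbps] = (height, score)
    return dict(sorted(best.items()))   # ascending bitrate


def prune(ladder: dict[int, tuple[int, float]],
          min_gain: float = 1.0) -> dict[int, tuple[int, float]]:
    """Drop rungs that add less than `min_gain` over the last kept (lower-bitrate) rung."""
    kept: dict[int, tuple[int, float]] = {}
    last = float("-inf")
    for kbps, (height, score) in ladder.items():
        if score >= last + min_gain:
            kept[kbps] = (height, score)
            last = score
    return kept
```

If, for instance, a model predicted 78 for 720p but only 70 for 1080p at 1500 kbps, the 720p encoding would fill that rung, showing how lower resolutions can win at low bitrates.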

7. Integration with Emerging Tech: IoT Sensor-Contextualized Adaptive Streaming

Enabling Description:
The adaptive streaming system is augmented with real-time contextual data from synchronized IoT sensors co-located with the device capturing the "source video content" (Claim 1, step 1). For instance, a surveillance camera might be accompanied by light sensors, ambient noise microphones, motion detectors, and environmental sensors (temperature, humidity). This IoT data is ingested alongside the video, and the system considers these sensor inputs when "identifying a plurality of resolutions" (Claim 1, step 2) and "encoding" (Claim 1, step 3). For "quality evaluation" (Claim 1, step 3), the IoT context influences prioritization. For example, if a motion sensor is triggered or ambient light drops, the system might proactively select higher target bitrates or resolutions for the relevant video segments to enhance detail for security analysis, even if network conditions are suboptimal. The "top level index" (Claim 1, step 6) would include metadata links to the synchronized IoT sensor data streams, allowing playback devices to filter or prioritize video based on specific sensor events (e.g., show video only when motion is detected, or overlay temperature readings). The result is context-aware adaptive streaming.

classDiagram
    class SourceVideoContent {
        +VideoData raw
        +Timestamp
    }
    class IoTSensorData {
        +SensorID
        +Timestamp
        +Value string
        +Type string
    }
    class Encoder {
        -resolutions []
        -targetBitrates []
        +encode(video, resolution, bitrate)
    }
    class QualityEvaluator {
        +evaluate(encoding, sensorData)
    }
    class StreamSelector {
        +select(qualities, sensorData)
    }
    class AdaptiveStream {
        +Resolution
        +Bitrate
        +EncodingData
        +IoTSensorLinks []
    }
    class TopLevelIndex {
        +StreamManifest []
        +GlobalSensorLinks []
    }

    SourceVideoContent "1" -- "*" IoTSensorData : has context
    SourceVideoContent --> Encoder : feeds
    IoTSensorData --> QualityEvaluator : informs
    Encoder --> QualityEvaluator : provides
    QualityEvaluator --> StreamSelector : outputs
    StreamSelector --> AdaptiveStream : creates
    AdaptiveStream "*" -- "1" TopLevelIndex : described in
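
The sensor-driven prioritization can be sketched as a rule that steps up the rung chosen by the normal bandwidth logic when motion is detected or ambient light falls. The `SensorSnapshot` fields, the boost sizes and the 10 lux threshold are hypothetical values chosen for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class SensorSnapshot:
    motion: bool     # motion detector triggered
    lux: float       # ambient light level


def boosted_rung(base_rung: int, snapshot: SensorSnapshot, ladder_len: int,
                 low_light_lux: float = 10.0) -> int:
    """Raise the rung (0 = lowest quality) picked by bandwidth logic when sensors flag an event."""
    boost = 0
    if snapshot.motion:
        boost += 2          # hypothetical: motion warrants two extra rungs
    if snapshot.lux < low_light_lux:
        boost += 1          # hypothetical: low light needs more bits for detail
    return min(base_rung + boost, ladder_len - 1)
```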

8. Integration with Emerging Tech: Blockchain-Verified Content Provenance and Integrity

Enabling Description:
The method incorporates blockchain technology to ensure the provenance, integrity, and immutability of the "source video content" (Claim 1, step 1) and its "plurality of alternative video streams." Each significant step in the encoding and distribution pipeline is recorded on a distributed ledger. Specifically, upon "identifying source video content," a cryptographic hash of the raw content is generated and timestamped on a blockchain (e.g., Ethereum, Hyperledger Fabric). For each "encoding" (Claim 1, step 3), the hash of the encoded segment, along with its resolution, target bitrate, and the result of the "quality evaluation," is linked to the original content hash and added as a transaction to the blockchain. The "top level index" (Claim 1, step 6) would then include verifiable proofs (e.g., Merkle roots, transaction IDs) that allow playback devices or content consumers to verify the authenticity and integrity of each received video segment (Claim 1, step 8) against the immutable blockchain record. This provides an auditable chain of custody for the video content from creation to consumption, making tampering or unauthorized modification detectable.

sequenceDiagram
    participant S as Source Content
    participant EE as Encoding Engine
    participant Q as Quality Evaluator
    participant BC as Blockchain Network
    participant CDS as Content Distribution System
    participant PD as Playback Device

    S->>BC: Hash Raw Content & Record Tx (Content ID)
    S->>EE: Source Video Content
    EE->>EE: Encode (multiple resolutions/bitrates)
    EE->>Q: Encoded Streams
    Q->>Q: Evaluate Quality
    Q->>BC: Hash Encoded Segment, Quality & Record Tx (Segment ID)
    Q->>CDS: Upload Encoded Segments
    CDS->>CDS: Generate Top Level Index (with Blockchain Proofs)
    CDS->>PD: Provide Top Level Index
    PD->>BC: Verify Blockchain Proofs for Segments
    PD->>CDS: Request Specific Encoded Sections
    CDS->>PD: Deliver Encoded Sections
    PD->>PD: Verify Segment Integrity with Hash
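
The Merkle-root proofs mentioned for the top level index can be sketched with SHA-256 from Python's standard library: the root over all segment hashes would be anchored on the ledger, and each segment ships with the sibling hashes needed to recompute that root. This is a simplified construction assumed for illustration, in which an odd node at any level is paired with itself.

```python
import hashlib


def _h(data: bytes) -> bytes:
    return hashlib.sha256(data).digest()


def _next_level(level: list[bytes]) -> list[bytes]:
    """Hash adjacent pairs; an odd last node is paired with itself (simplified tree)."""
    if len(level) % 2:
        level = level + [level[-1]]
    return [_h(level[i] + level[i + 1]) for i in range(0, len(level), 2)]


def merkle_root(segments: list[bytes]) -> bytes:
    if not segments:
        raise ValueError("need at least one segment")
    level = [_h(s) for s in segments]
    while len(level) > 1:
        level = _next_level(level)
    return level[0]


def merkle_proof(segments: list[bytes], index: int) -> list[tuple[bytes, bool]]:
    """Sibling hashes from leaf to root; the flag says whether the sibling sits on the left."""
    level = [_h(s) for s in segments]
    proof = []
    while len(level) > 1:
        padded = level + [level[-1]] if len(level) % 2 else level
        sibling = index ^ 1
        proof.append((padded[sibling], sibling < index))
        level = _next_level(level)
        index //= 2
    return proof


def verify(segment: bytes, proof: list[tuple[bytes, bool]], root: bytes) -> bool:
    """Recompute the root from one segment and its proof, then compare."""
    node = _h(segment)
    for sibling, sibling_is_left in proof:
        node = _h(sibling + node) if sibling_is_left else _h(node + sibling)
    return node == root
```

Production designs usually add distinct prefixes for leaf and interior hashes, as Certificate Transparency (RFC 6962) does, so that an interior node cannot be passed off as a leaf.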

9. The "Inverse" or Failure Mode: Graceful Degradation for Network Outages

Enabling Description:
The system is designed to operate in a "graceful degradation" mode during severe network outages or extreme bandwidth limitations. When, while "responding to requests from the one or more playback devices" (Claim 1, step 8), the system detects a persistent inability to deliver requested segments at any of the standard "plurality of resolution and target bitrate combinations," it transitions to an emergency mode. In this mode, "encoding at least a portion of the source video content multiple times" (Claim 1, step 3) is replaced by dynamically generating ultra-low-fidelity "emergency streams." These streams could be static keyframes (e.g., 1 frame per 10 seconds), highly compressed monochrome video at QVGA resolution, or even text-only descriptions of scene changes. "Quality evaluation" (Claim 1, step 3) is simplified to checking for minimal data transmissibility, prioritizing any form of content over complete loss. The "top level index" (Claim 1, step 6) is dynamically updated to offer only these emergency streams. Once network conditions recover, the system gradually reintroduces the standard alternative streams, allowing playback devices to transition seamlessly back to higher quality. This ensures a continuous, albeit highly degraded, user experience during critical network failures.

stateDiagram-v2
    state NormalStreaming {
        [*] --> Initializing
        Initializing --> EncodingContent : Source identified
        EncodingContent --> EvaluateQuality : Encoded
        EvaluateQuality --> SelectCombinations : Quality assessed
        SelectCombinations --> UploadEncodings : Combinations chosen
        UploadEncodings --> GenerateIndex : Uploaded
        GenerateIndex --> ProvideIndex : Index ready
        ProvideIndex --> RespondRequests : Index provided
        RespondRequests --> RespondRequests : Requests handled
    }

    state NetworkDegradation {
        [*] --> EmergencyMode : Persistent failure
        EmergencyMode : Prioritize minimal data
        EmergencyMode --> GenerateEmergencyStreams : Switch to low-fi content
        GenerateEmergencyStreams --> UpdateEmergencyIndex : Index for emergency streams
        UpdateEmergencyIndex --> ServeEmergencyContent : Respond with emergency streams
        ServeEmergencyContent --> NetworkRecovery : Network conditions improve
        NetworkRecovery --> [*]
    }

    [*] --> NormalStreaming
    NormalStreaming --> NetworkDegradation : Critical network failure detected
    NetworkDegradation --> NormalStreaming : Recovered
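
The entry and exit conditions of the degradation state machine can be sketched as a hysteresis controller, so that a single lost or recovered segment does not make the system oscillate between modes. The class name and the failure and recovery thresholds are hypothetical.

```python
from enum import Enum


class Mode(Enum):
    NORMAL = "normal"
    EMERGENCY = "emergency"


class DegradationController:
    """Hypothetical controller: switches to emergency streams after `fail_limit` consecutive
    failed deliveries and back to the standard ladder after `recover_limit` consecutive successes."""

    def __init__(self, fail_limit: int = 3, recover_limit: int = 5) -> None:
        self.fail_limit = fail_limit
        self.recover_limit = recover_limit
        self.mode = Mode.NORMAL
        self._failures = 0
        self._successes = 0

    def record(self, delivered: bool) -> Mode:
        """Record one segment delivery attempt and return the resulting mode."""
        if delivered:
            self._successes += 1
            self._failures = 0
        else:
            self._failures += 1
            self._successes = 0
        if self.mode is Mode.NORMAL and self._failures >= self.fail_limit:
            self.mode = Mode.EMERGENCY
        elif self.mode is Mode.EMERGENCY and self._successes >= self.recover_limit:
            self.mode = Mode.NORMAL
        return self.mode
```

A fuller version could step back up one standard rung at a time after recovery, matching the gradual reintroduction described above.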

10. The "Inverse" or Failure Mode: Privacy-Preserving Obfuscation on Request

Enabling Description:
This derivative introduces a privacy-preserving "failure mode" in which the content distribution system, upon specific request (e.g., from a user, a regulatory authority, or a detected privacy violation), deliberately obfuscates sensitive portions of the video content. When "responding to requests from the one or more playback devices" (Claim 1, step 8), if a privacy-sensitive request is received, the content distribution system does not serve the original high-fidelity streams. Instead, for the relevant sections, it dynamically selects or generates "alternative streams" (Claim 1, step 4) that have been intentionally degraded to protect privacy. This could involve real-time facial blurring, license plate pixelation, or audio anonymization, even if higher quality versions are available. The "encoding" (Claim 1, step 3) or a dedicated post-processing step would include selective obfuscation algorithms. "Quality evaluation" (Claim 1, step 3) in this context verifies the effectiveness of the obfuscation rather than pure visual fidelity, ensuring that sensitive information is sufficiently masked. The "top level index" (Claim 1, step 6) might include flags or separate stream descriptions for privacy-enhanced versions, allowing playback devices to request these specifically or be redirected automatically based on user or policy settings.

graph LR
    A[Source Video Content] --> B{Privacy Policy/Request};
    B -- Apply Policy --> C{"Content Analysis<br>(Identify Sensitive Regions)"};
    C --> D{"Selective Obfuscation Engine<br>(Face Blur, Pixelation, Audio Anonymization)"};
    D -- Obfuscated/Original Segments --> E{Multi-Variant Encoder};
    E -- Multiple Resolution/Bitrate/Obfuscation Versions --> F{Quality & Privacy Effectiveness Evaluator};
    F --> G{"Adaptive Stream Selector<br>(Policy-driven)"};
    G -- Selected Stream Profiles --> H[Content Distribution System];
    H --> I{Dynamic Manifest Generator};
    I --> J[Playback Devices];
    J -- Requests for (Privacy-Enhanced) Segments --> H;
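
Pixelation, one of the obfuscation options named above, can be sketched as replacing each block inside a sensitive region with its mean value. This assumes a grayscale frame held as a list of rows and a bounding box supplied by a face or license-plate detector that is not shown; `pixelate` is a hypothetical helper.

```python
Frame = list[list[int]]   # grayscale pixel values, one list per row


def pixelate(frame: Frame, box: tuple[int, int, int, int], block: int) -> Frame:
    """Hypothetical helper: return a copy of `frame` in which box = (x0, y0, x1, y1),
    end-exclusive, is replaced block by block with each block's mean value."""
    x0, y0, x1, y1 = box
    out = [row[:] for row in frame]
    for by in range(y0, y1, block):
        for bx in range(x0, x1, block):
            ys = range(by, min(by + block, y1))
            xs = range(bx, min(bx + block, x1))
            cell = [frame[y][x] for y in ys for x in xs]
            mean = sum(cell) // len(cell)   # integer mean keeps 8-bit values integral
            for y in ys:
                for x in xs:
                    out[y][x] = mean
    return out
```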

Combination Prior Art Scenarios with Open Standards and Open-Source Tools

These scenarios illustrate how the methods of US Patent 11611785 can be combined with existing open standards and open-source tools, thereby expanding the prior art for adaptive video streaming.

US Patent 11611785 + MPEG-DASH (ISO/IEC 23009-1):
The methods described in US11611785, particularly "encoding at least a portion of the source video content multiple times using the particular resolution and multiple different target bitrates" (Claim 1, step 3) and "selecting a plurality of resolution and target bitrate combinations for a plurality of alternative streams" (Claim 1, step 4), are directly applicable to the generation of MPEG-DASH content. The "top level index" (Claim 1, step 6) would be a DASH Media Presentation Description (MPD), and each "alternative stream" would correspond to a DASH Representation within an AdaptationSet. The "evaluated quality" (Claim 1, step 3) would inform the choice of target bitrates and other encoding parameters (e.g., codec profile, level, picture structure) for each Representation, ensuring optimal perceptual quality across bandwidth conditions. Playback devices (Claim 1, step 7) would use standard DASH client libraries (e.g., dash.js) to parse the MPD and perform adaptive bitrate switching, requesting specific encodings according to their internal ABR logic and network feedback, while the content distribution system handles "responding to requests for specific encodings" (Claim 1, step 8).
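
A skeleton of the MPD mapping described above, with one Representation per selected (resolution, bitrate) combination, built with Python's standard `xml.etree.ElementTree`. It is deliberately incomplete: segment addressing (`SegmentBase` with `indexRange`, or `SegmentTemplate`) is omitted and would normally be filled in by a packager, and the file names, the default codec string and the `build_mpd` helper are assumptions.

```python
import xml.etree.ElementTree as ET

DASH_NS = "urn:mpeg:dash:schema:mpd:2011"


def build_mpd(duration_s: int, ladder: list[tuple[int, int, int]],
              codecs: str = "avc1.640028") -> str:
    """Hypothetical helper: skeleton MPD with one Representation per (width, height, bandwidth_bps)."""
    mpd = ET.Element("MPD", {
        "xmlns": DASH_NS,
        "type": "static",
        "profiles": "urn:mpeg:dash:profile:isoff-on-demand:2011",
        "mediaPresentationDuration": f"PT{duration_s}S",
        "minBufferTime": "PT2S",
    })
    period = ET.SubElement(mpd, "Period")
    adaptation = ET.SubElement(period, "AdaptationSet",
                               {"mimeType": "video/mp4", "codecs": codecs,
                                "segmentAlignment": "true"})
    for i, (width, height, bandwidth) in enumerate(ladder):
        rep = ET.SubElement(adaptation, "Representation", {
            "id": f"video{i}",
            "width": str(width),
            "height": str(height),
            "bandwidth": str(bandwidth),   # bits per second
        })
        # Hypothetical file naming; segment indexing (SegmentBase/indexRange) omitted.
        ET.SubElement(rep, "BaseURL").text = f"video_{height}p_{bandwidth // 1000}k.mp4"
    return ET.tostring(mpd, encoding="unicode")
```
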
US Patent 11611785 + WebRTC (W3C and IETF Standards):
The adaptive encoding and quality evaluation techniques of US11611785 can be integrated with WebRTC for real-time communication scenarios. The "source video content" (Claim 1, step 1) would be live camera or screen capture input. Instead of pre-encoded files, "encoding at least a portion... multiple times" (Claim 1, step 3) would occur dynamically within a WebRTC-enabled browser or application, generating multiple Scalable Video Coding (SVC) layers or simulcast streams with different resolutions and bitrates. The "evaluating quality" step (Claim 1, step 3) would incorporate WebRTC's real-time statistics (e.g., RTCPeerConnection.getStats() reports on packet loss, round-trip time, and estimated available bandwidth) to continuously inform the choice of RTCRtpSender parameters. The "selecting a plurality of resolution and target bitrate combinations" (Claim 1, step 4) would be a dynamic process that adjusts the active SVC layers or simulcast stream parameters in real time, with the available qualities advertised via SDP effectively forming a "top level index" (Claim 1, step 6). Receiving peers would obtain the most appropriate stream for their capabilities and observed network conditions, with the sender or an intermediate selective forwarding unit (SFU) "responding to requests" (Claim 1, step 8) by forwarding only the SVC layers or simulcast streams each receiver requests or can process.
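
On the forwarding side, the per-receiver choice can be sketched as picking the best simulcast layer that fits both the receiver's bandwidth estimate and its display size. In a browser, such layers are declared through `RTCRtpEncodingParameters` fields such as `rid`, `maxBitrate` (in bits per second) and `scaleResolutionDownBy`; the layer values and the `layer_for_receiver` helper below are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class SimulcastLayer:
    rid: str        # RTP stream identifier (hypothetical values "q", "h", "f")
    height: int
    max_kbps: int


# Hypothetical three-layer simulcast, lowest first.
LAYERS = [
    SimulcastLayer("q", 180, 150),
    SimulcastLayer("h", 360, 500),
    SimulcastLayer("f", 720, 1500),
]


def layer_for_receiver(estimated_kbps: float, max_height: int) -> SimulcastLayer:
    """Hypothetical helper: best layer that fits the receiver's bandwidth estimate and display height."""
    fitting = [layer for layer in LAYERS
               if layer.max_kbps <= estimated_kbps and layer.height <= max_height]
    # Fall back to the lowest layer rather than forwarding nothing.
    return max(fitting, key=lambda layer: layer.max_kbps) if fitting else LAYERS[0]
```
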
US Patent 11611785 + FFmpeg (Open-Source Multimedia Framework):
The fundamental "method for encoding source content" (Claim 1, preamble), and specifically "encoding at least a portion of the source video content multiple times" and "evaluating quality for each of the multiple encodings" (Claim 1, step 3), can be comprehensively implemented using FFmpeg and its integrated libraries (e.g., libx264, libx265, libvpx for VP9, and libaom or libsvtav1 for AV1). A system following US11611785 would use ffprobe, which ships with FFmpeg, to inspect the source and identify the candidate "plurality of resolutions" (Claim 1, step 2). The "encoding at least a portion" step would invoke FFmpeg multiple times with different -vf scale options for resolution and different -b:v (target bitrate) or -crf (constant rate factor) values for libx264 or libx265. For "evaluating quality," FFmpeg's built-in ssim or psnr filters (or the libvmaf filter, when FFmpeg is built with libvmaf) can compare each encoding against the source. The "selected plurality of resolution and target bitrate combinations" (Claim 1, step 4) would then be packaged into container formats (e.g., MP4, WebM) by FFmpeg, and a manifest (the "top level index" of Claim 1, step 6) generated by FFmpeg's dash or hls muxers, a custom script, or another tool (e.g., Bento4 for DASH MPDs) for a content distribution system. This combination demonstrates how the core inventive steps can be achieved with widely available open-source tools.
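
The FFmpeg workflow described above can be sketched as two command builders and a parser for the SSIM summary that FFmpeg prints on stderr. This assumes the summary line contains an `All:` value, as the ssim filter's output does; the file names and helper names are hypothetical, and each encoding is upscaled back to the source size before comparison so that different resolutions are scored on a common grid.

```python
import re
from typing import Optional


# Hypothetical helpers; file names passed in are placeholders.
def encode_cmd(src: str, height: int, kbps: int, out: str) -> list[str]:
    """One libx264 encode of a (resolution, target bitrate) combination."""
    return ["ffmpeg", "-y", "-i", src, "-vf", f"scale=-2:{height}",
            "-c:v", "libx264", "-b:v", f"{kbps}k", "-an", out]


def ssim_cmd(encoded: str, src: str, src_w: int, src_h: int) -> list[str]:
    """Upscale the encoding (main input) to source size, then compare it with the source (reference)."""
    graph = f"[0:v]scale={src_w}:{src_h}:flags=bicubic[dist];[dist][1:v]ssim"
    return ["ffmpeg", "-i", encoded, "-i", src, "-lavfi", graph, "-f", "null", "-"]


_SSIM_ALL = re.compile(r"SSIM .*All:([0-9.]+)")


def parse_ssim(stderr: str) -> Optional[float]:
    """Extract the overall ('All') SSIM from FFmpeg's summary line, if present."""
    match = _SSIM_ALL.search(stderr)
    return float(match.group(1)) if match else None
```

Each command would be executed with `subprocess.run(cmd, capture_output=True, text=True)`, passing the `stderr` of the SSIM run to `parse_ssim`.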