Derivative works — US Patent 7532808

Defensive Disclosure: US Patent 7,532,808 - Method for Coding Motion in a Video Sequence

This document details several derivative variations and combinations of US Patent 7,532,808, aiming to establish prior art for potential future incremental improvements in video motion coding, particularly concerning adaptive skip mode motion vector prediction. These disclosures are designed to render such improvements obvious or non-novel to a person having ordinary skill in the art (POSITA) as of the current date, April 26, 2026.

Derivatives of Core Claims (Claims 1, 5, 31, 35, 47)

The core claims of US Patent 7,532,808 revolve around a redefined skip mode where a motion vector for a first segment (e.g., macroblock) is determined as either zero or a predicted non-zero motion vector based on the motion of a neighboring second segment, without explicitly coding this motion vector information in the bitstream. This principle is applied to both encoding and decoding methods, and corresponding encoder/decoder apparatuses, as well as multimedia terminals.

For brevity and to avoid redundancy given the analogous nature of the claims, the following derivatives will address the underlying technical concepts as broadly applicable to the encoding/decoding process and the systems that implement them, covering the inventive scope of Claims 1, 5, 31, 35, and 47. Each derivative will focus on a specific axis of innovation.
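
The core decision described above can be sketched in a few lines. This is a minimal illustration, not the patent's exact procedure: the quarter-pixel units, the component-wise median, the `threshold` parameter, and the function names `predict_mv` and `derive_skip_mv` are all assumptions introduced here.

```python
from statistics import median_low
from typing import List, Tuple

MV = Tuple[int, int]  # (dx, dy); quarter-pixel units are assumed here


def predict_mv(neighbors: List[MV]) -> MV:
    """Component-wise median of the neighboring segments' motion vectors."""
    return (median_low(v[0] for v in neighbors),
            median_low(v[1] for v in neighbors))


def derive_skip_mv(neighbors: List[MV], threshold: int = 1) -> MV:
    """Return the motion vector a skip-mode segment would use.

    Hypothetical rule: motion counts as insignificant when no component of
    the predicted vector exceeds `threshold`; the zero vector is then used.
    """
    if not neighbors:  # no previously coded neighbors available
        return (0, 0)
    pred = predict_mv(neighbors)
    if max(abs(pred[0]), abs(pred[1])) <= threshold:
        return (0, 0)
    return pred
```

Because the function reads only previously coded neighbors, an encoder and a decoder that both call it obtain the same vector, which is why no motion vector needs to be written to the bitstream for the skipped segment.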

1. Material & Component Substitution

Derivative 1.1: Wavelet-based Transform and FPGA Acceleration

Enabling Description: Instead of the block-based Discrete Cosine Transform (DCT) for prediction error coding (transformation unit 104 in the patent), a two-dimensional Discrete Wavelet Transform (DWT), specifically using a 9/7 biorthogonal wavelet filter bank, is employed. The DWT coefficients are then quantized and entropy coded. The motion estimation block (130/630) and motion compensated prediction block (150/650/740), including the surrounding motion analysis (802) and active motion parameter generation (803), are implemented on a Field-Programmable Gate Array (FPGA) platform. This FPGA utilizes custom hardware accelerators for parallel sum-of-absolute-differences (SAD) or sum-of-squared-differences (SSD) calculations, as well as pipelined median filtering operations for motion vector prediction, significantly boosting processing speed for high-resolution video streams (e.g., 4K/8K). The FPGA-based implementation replaces dedicated ASIC components, offering flexibility and reconfigurability.

graph TD
    A[Video Input] --> B{DWT Transform}
    B --> C[Quantizer]
    C --> D[Entropy Coder]
    D --> E[Encoded Bitstream]
    F[FPGA-based Motion Estimator & Predictor] --> G{Motion Vector Memory}
    G --> H[Surrounding Motion Analysis]
    H --> I[Active Motion Parameter Generation]
    H --> J[Zero Motion Parameter Generation]
    I --> F
    J --> F
    F --> B
    F --> C
    K[Reference Frame Store] --> F
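
As a software reference model for the SAD-based block matching that the FPGA accelerators would parallelize, the following sketch performs an exhaustive search over a small window. The frame layout (lists of rows), the function names, and the sign convention (the vector points from the current block to its match in the reference frame) are assumptions for illustration only.

```python
def sad(block_a, block_b):
    """Sum of absolute differences between two equally sized blocks."""
    return sum(abs(a - b)
               for row_a, row_b in zip(block_a, block_b)
               for a, b in zip(row_a, row_b))


def best_match(cur, ref, x, y, size, search):
    """Full search within +/- `search` pixels; returns (dx, dy, cost)."""
    height, width = len(ref), len(ref[0])
    target = [row[x:x + size] for row in cur[y:y + size]]
    best = None
    for dy in range(-search, search + 1):
        for dx in range(-search, search + 1):
            rx, ry = x + dx, y + dy
            if rx < 0 or ry < 0 or rx + size > width or ry + size > height:
                continue  # candidate block falls outside the reference frame
            cand = [row[rx:rx + size] for row in ref[ry:ry + size]]
            cost = sad(target, cand)
            if best is None or cost < best[2]:
                best = (dx, dy, cost)
    return best


# Toy frames: the current frame shows the reference content shifted by (2, 1).
ref = [[r * 16 + c for c in range(8)] for r in range(8)]
cur = [[ref[min(r + 1, 7)][min(c + 2, 7)] for c in range(8)] for r in range(8)]
result = best_match(cur, ref, x=2, y=2, size=2, search=2)
```

Each candidate position is independent of the others, which is the property a hardware implementation would exploit by evaluating many candidates in parallel.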

Derivative 1.2: Neural Network-based Residual Compression

Enabling Description: The traditional DCT/IDCT and subsequent quantization/entropy coding (blocks 104, 106, 108, 110 in the patent) for prediction error residuals are replaced with a compact convolutional neural network (CNN) autoencoder architecture. After motion compensation, the prediction error signal is fed into a neural autoencoder (encoder part), which compresses the residual into a latent representation. This latent representation, instead of DCT coefficients, is then entropy coded. The decoder side employs the corresponding neural autoencoder (decoder part) to reconstruct the prediction error. The CNN autoencoder can be implemented on a dedicated Neural Processing Unit (NPU) or GPU, optimizing for floating-point operations. The selection between the traditional transform and NN-based compression can be dynamically determined by the control manager (160) based on prediction error characteristics or computational budget.

graph TD
    A[Prediction Error] --> B{"CNN Autoencoder (Encoder)"}
    B --> C[Latent Representation]
    C --> D[Entropy Coder]
    D --> E[Encoded Bitstream]
    E -- Decoded Latent Rep --> F{"CNN Autoencoder (Decoder)"}
    F --> G[Reconstructed Prediction Error]
    H[Motion Compensated Prediction] --> A
    I[Motion Compensated Prediction] --> G

2. Operational Parameter Expansion

Derivative 2.1: Micro-scale Object Tracking in Microscopy Video

Enabling Description: The disclosed method is adapted for real-time processing of high-magnification microscopy video sequences, where "segments" are not macroblocks of typical video but rather micro-regions tracking cellular components or microorganisms. The "global motion" here refers to stage drift or slight focusing changes, while "regional motion" could be the movement of specific cells. The operational parameters for motion estimation are significantly scaled down: instead of 16x16 macroblocks for luminance, segments might be 4x4 or 8x8 pixel regions in monochrome images, analyzed at sub-pixel accuracy (e.g., 1/8th or 1/16th pixel precision). The "insignificant level of motion" threshold in surrounding motion analysis block (802) is adjusted to a much finer scale, for example about 0.1 pixel of displacement (one or two steps at 1/16th pixel precision), to accurately classify subtle biological movements. This requires higher precision arithmetic units within the motion estimation/prediction blocks.

stateDiagram
    [*] --> Init: Microscopy Video Input
    Init --> SegmentAnalysis: High-Mag Frame
    SegmentAnalysis --> MotionClassify: Identify Micro-Regions
    MotionClassify --> ActiveMotion: Surrounding Motion > Threshold
    MotionClassify --> ZeroMotion: Surrounding Motion <= Threshold
    ActiveMotion --> PredictMV: Predict Non-Zero MV Based on Neighbors (e.g., nuclear drift)
    ZeroMotion --> AssignZeroMV: Assign Zero MV (No Significant Motion)
    PredictMV --> FormPrediction: Use Predicted MV
    AssignZeroMV --> FormPrediction: Use Zero MV
    FormPrediction --> EncodeSegment: Skip Mode, No MV Sent
    EncodeSegment --> [*]: Next Segment/Frame
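
A fine-grained threshold of this kind has to be expressed in the integer sub-pixel units in which motion vectors are stored. The sketch below shows one way to do that conversion; the function names and the rule of rounding to at least one step are assumptions made for illustration.

```python
from typing import Tuple


def threshold_in_steps(threshold_px: float, steps_per_pixel: int) -> int:
    """Convert a pixel threshold to integer sub-pixel steps (at least one)."""
    return max(1, round(threshold_px * steps_per_pixel))


def is_active_motion(mv_steps: Tuple[int, int], threshold_steps: int) -> bool:
    """True when either vector component exceeds the threshold."""
    return max(abs(mv_steps[0]), abs(mv_steps[1])) > threshold_steps
```

With 1/16th pixel precision, a 0.1 pixel threshold maps to two steps, so a displacement of three steps (about 0.19 pixel) is classified as active while two steps or fewer is treated as insignificant.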

Derivative 2.2: Hyperspectral Video Compression for Remote Sensing

Enabling Description: The method is extended to compress hyperspectral video data, where each "pixel" is a vector of spectral intensities across dozens or hundreds of wavelength bands, captured by airborne or satellite sensors. Instead of YUV components, the "segments" are volumetric data cubes (spatial x spectral). Motion compensation operates in a higher-dimensional space, where a "motion vector" might include spatial (Δx, Δy) and potentially spectral shifts (Δλ) due to atmospheric variations or sensor characteristics. The "neighboring segment" for motion analysis includes not just spatially adjacent blocks, but also spectrally adjacent "bands" or spatially co-located blocks in different spectral bands. The processing requires distributed computing architectures due to the massive data volume (e.g., cloud-based GPU clusters for parallel block matching and motion parameter generation), operating on data rates up to terabits per second.

flowchart LR
    A[Hyperspectral Video Cube Input] --> B{Segment Data Cube}
    B --> C{Multidimensional Motion Estimator}
    C --> D[Neighboring Spatial-Spectral Segments]
    D -- Extract MV Info --> E["Surrounding Motion Analysis (Multi-Dim)"]
    E --> F{Active Motion MV Generation}
    E --> G{Zero Motion MV Generation}
    F --> H["Predicted Non-Zero Motion Vector (Spatial+Spectral)"]
    G --> I["Zero Motion Vector (Spatial+Spectral)"]
    H --> J{Form Prediction}
    I --> J
    J --> K["Encode Bitstream (Skip Mode, No Explicit MV)"]
    K --> L[Output Compressed Hyperspectral Data]
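
Extending the prediction to a vector with a spectral component changes little in the arithmetic. The sketch below takes a component-wise median over vectors of any dimension; treating the spectral shift as a third integer component (in band indices) and the function name `predict_mv_nd` are assumptions, since the description leaves the representation open.

```python
from statistics import median_low
from typing import Sequence, Tuple


def predict_mv_nd(neighbors: Sequence[Tuple[int, ...]]) -> Tuple[int, ...]:
    """Component-wise median over neighbor vectors of equal dimension."""
    if not neighbors:
        raise ValueError("at least one neighboring segment is required")
    dims = len(neighbors[0])
    return tuple(median_low(v[i] for v in neighbors) for i in range(dims))


# Neighbors given as (dx, dy, d_lambda): two spatial and one spectral component.
predicted = predict_mv_nd([(4, -2, 0), (5, -2, 1), (3, -1, 1)])
```

The neighbor list may mix spatially adjacent segments with co-located segments from adjacent bands, provided the encoder and decoder assemble it in the same order and from the same already-decoded data.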

3. Cross-Domain Application

Derivative 3.1: Autonomous Vehicle Lidar/Radar Odometry Coding

Enabling Description: The skip mode motion coding method is applied to sequences of Lidar point clouds or radar reflectivity maps, which are analogous to video frames. A "segment" corresponds to a grid-based occupancy map or a cluster of point cloud data. "Motion information" is derived from iterative closest point (ICP) algorithms or phase correlation applied to these segments. "Global motion" here represents the ego-motion of the autonomous vehicle itself (translation and rotation), while "regional motion" could be other moving vehicles or pedestrians. If the vehicle is moving predictably relative to a reference map (e.g., GPS-aided dead reckoning shows minimal deviation), a predicted non-zero motion vector (representing the vehicle's own odometry) is used for the skip mode segment; if the vehicle is stationary, the zero motion vector is used. In both cases there is no need to explicitly transmit odometry updates for every segment in the compressed sensor stream. The "neighboring segments" are previous and adjacent sensor data blocks in time and space.

sequenceDiagram
    participant Sensor as Lidar/Radar Sensor
    participant Encoder as Autonomous Vehicle Encoder
    participant Decoder as Autonomous Vehicle Decoder
    Sensor->>Encoder: Raw Sensor Data (Frame N)
    Encoder->>Encoder: Segment into Regions
    Encoder->>Encoder: Analyze Neighboring Regions' MV (Odometry)
    alt If Global/Regional MV Detected
        Encoder->>Encoder: Predict Non-Zero MV for Skip Mode
    else If Insignificant MV
        Encoder->>Encoder: Assign Zero MV for Skip Mode
    end
    Encoder->>Encoder: Form Prediction for Segment
    Encoder->>Encoder: Encode Skip Mode Indication (No Explicit MV)
    Encoder->>Decoder: Encoded Bitstream
    Decoder->>Decoder: Decode Skip Mode Indication
    Decoder->>Decoder: Analyze Previously Decoded Neighboring MV
    alt If Global/Regional MV Detected
        Decoder->>Decoder: Predict Non-Zero MV for Skip Mode (Same as Encoder)
    else If Insignificant MV
        Decoder->>Decoder: Assign Zero MV for Skip Mode
    end
    Decoder->>Decoder: Form Prediction for Segment (Reconstruct)
    Decoder->>Autonomous Vehicle: Reconstructed Sensor Data
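
The symmetry between encoder and decoder in the sequence above can be demonstrated with a toy stream. In this sketch a "segment" carries only a motion vector, the derivation is a median over the last three decoded vectors, and a segment is skipped whenever its true vector equals the derived one; a real codec would also require a negligible residual. All names and the symbol format are hypothetical.

```python
from statistics import median_low
from typing import List, Tuple

MV = Tuple[int, int]


def derive_skip_mv(decoded: List[MV]) -> MV:
    """Derive a vector from up to three most recently decoded segments."""
    recent = decoded[-3:]
    if not recent:
        return (0, 0)
    return (median_low(v[0] for v in recent),
            median_low(v[1] for v in recent))


def encode(field: List[MV]) -> List[tuple]:
    """Emit ('skip',) when the derived vector matches, else ('mv', vector)."""
    stream, decoded = [], []
    for mv in field:
        if mv == derive_skip_mv(decoded):
            stream.append(("skip",))
        else:
            stream.append(("mv", mv))
        decoded.append(mv)
    return stream


def decode(stream: List[tuple]) -> List[MV]:
    """Rebuild the vector field, re-deriving vectors for skipped segments."""
    decoded: List[MV] = []
    for symbol in stream:
        if symbol[0] == "skip":
            decoded.append(derive_skip_mv(decoded))
        else:
            decoded.append(symbol[1])
    return decoded


# Steady ego-motion of (5, 0) with one stationary outlier segment.
field = [(5, 0)] * 4 + [(0, 0)] + [(5, 0)] * 2
stream = encode(field)
```

Only the first segment and the outlier carry an explicit vector; the remaining segments are reconstructed from the decoder's own history.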

Derivative 3.2: Medical Imaging (4D CT/MRI/Ultrasound) Compression

Enabling Description: The method is adapted for compressing 4D (3D + time) medical imaging sequences, such as dynamic CT, functional MRI, or 3D ultrasound, which capture organ motion (e.g., heart, lungs, blood flow). A "segment" is a 3D volumetric sub-region (e.g., 16x16x16 voxels) within the larger 3D image. "Global motion" could be patient movement or respiratory motion, while "regional motion" is the specific deformation of an organ. The motion estimation (802, 803) analyzes 3D motion vectors of surrounding volumetric segments. If a coherent physiological motion pattern (e.g., cardiac contraction) is identified in neighboring volumes, a predicted 3D non-zero motion vector is assigned to the current skip mode segment, based on the identified pattern. This avoids transmitting explicit dense deformation fields for regions exhibiting predictable physiological motion, significantly reducing data for remote diagnostics or surgical planning.

flowchart TD
    A[4D Medical Image Volume] --> B{Segment 3D Sub-Volume}
    B --> C[Retrieve Neighboring 3D Sub-Volumes MV]
    C --> D{3D Surrounding Motion Analysis}
    D -- Detect Coherent Physiological Motion --> E[Active 3D MV Generation]
    D -- No Coherent Motion --> F[Zero 3D MV Generation]
    E --> G[Predicted Non-Zero 3D MV]
    F --> H[Zero 3D MV]
    G --> I{Form 3D Prediction}
    H --> I
    I --> J["Encode 4D Bitstream (Skip Mode, No Explicit 3D MV)"]
    J --> K[Output Compressed 4D Medical Data]
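
One simple way to test for a "coherent" motion pattern among neighboring volumes is to check that every neighbor vector lies close to their mean and that the mean itself is not negligible. The sketch below does this for 3D vectors; the tolerance values and the function name `is_coherent` are hypothetical, and a clinical system would likely use a physiologically informed model instead.

```python
import math
from typing import Sequence, Tuple

Vec3 = Tuple[float, float, float]


def is_coherent(neighbors: Sequence[Vec3],
                max_deviation: float = 1.0,
                min_magnitude: float = 0.5) -> bool:
    """True when neighbor vectors cluster around a non-negligible mean."""
    count = len(neighbors)
    if count == 0:
        return False
    mean = tuple(sum(v[i] for v in neighbors) / count for i in range(3))
    if math.dist(mean, (0.0, 0.0, 0.0)) < min_magnitude:
        return False  # essentially no motion: use the zero vector instead
    return all(math.dist(v, mean) <= max_deviation for v in neighbors)
```

When the check succeeds, the active 3D vector generation path would be taken; otherwise the segment falls back to the zero vector.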

4. Integration with Emerging Tech

Derivative 4.1: AI-Driven Adaptive Thresholding and Prediction Refinement

Enabling Description: The surrounding motion analysis block (802) is augmented with an embedded, lightweight neural network (NN) responsible for dynamically adjusting the "insignificant level of motion" threshold and refining the non-zero motion vector prediction. This NN is trained on a diverse dataset of video sequences to recognize complex motion patterns (e.g., turbulent fluid motion, camera shake vs. object motion) and adaptively set the threshold based on content complexity, scene change rate, and available bitrate. For non-zero motion vectors, the NN can further refine the median-based prediction by learning typical offsets and deviations based on the aggregate motion of a larger context window of neighboring macroblocks. This AI component (e.g., a small ResNet or MLP) runs on a dedicated edge AI accelerator integrated into the encoder/decoder SoC. The parameters of the NN can be updated periodically via firmware or over-the-air updates.

graph LR
    A[Motion Information Memory 801] --> B[Surrounding Motion Analysis 802]
    B --> C{AI-Driven Adaptive Thresholding & Refinement}
    C -- Dynamic Threshold --> D[Decision Logic for Active/Non-Active Motion]
    C -- Refined MV --> E[Active Motion Parameter Generation 803]
    D --> E
    D --> F[Zero Motion Parameter Generation 804]
    E --> G[Skip Mode Motion Vector]
    F --> G
    G --> H[Motion Compensated Prediction 650/740]

Derivative 4.2: IoT-Enhanced Contextual Motion Prediction

Enabling Description: For multimedia terminals (Claim 47) equipped with multiple IoT sensors (e.g., accelerometer, gyroscope, GPS, proximity sensors) (FIG. 10), this sensor data is fed into the system control block (84) and integrated into the motion estimation process. Before even performing pixel-domain motion analysis, the sensor data provides a "global motion hint." For example, if the device's accelerometer indicates stable zero movement, the surrounding motion analysis block (802) can be pre-biased towards classifying motion as "non-active" and assigning zero motion vectors more frequently, saving computation. Conversely, if gyroscope data indicates a smooth panning motion, the active motion parameter generation block (803) can use this information to initialize or bound its search for the non-zero motion vector, making the prediction more robust. This contextual information from IoT sensors is used to refine the prediction accuracy and reduce false positives/negatives in motion classification, particularly in scenarios where the video content itself might be ambiguous (e.g., static camera on a moving vehicle). The sensor data can be timestamped and synchronized with video frames to ensure accurate correlation.

graph TD
    A["IoT Sensors (Accelerometer, Gyro, GPS)"] --> B{System Control 84}
    B -- Global Motion Hint --> C["Motion Estimation Block 630 (Enhanced)"]
    C --> D[Motion Information Memory 801]
    D --> E[Surrounding Motion Analysis 802]
    E -- Contextual Refinement --> F[Active Motion Parameter Generation 803]
    E -- Contextual Pre-Bias --> G[Zero Motion Parameter Generation 804]
    F --> H[Skip Mode Motion Vector]
    G --> H
    H --> I[Motion Compensated Prediction 650]
    I --> J[Video Bitstream]
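
A gyroscope reading can be turned into a rough global motion hint with a pinhole camera approximation: a rotation by angle θ between frames shifts the image center by about f·tan(θ) pixels, where f is the focal length in pixels. The sketch below applies that relation; the parameter names, the axis and sign conventions, and the quarter-pixel quantization are assumptions. Since a decoder has no access to the sensors, the sketch also assumes the hint is used only on the encoder side or is conveyed as side information, which the description does not specify.

```python
import math
from typing import Tuple


def pan_hint_pixels(yaw_rate_rad_s: float, pitch_rate_rad_s: float,
                    frame_interval_s: float,
                    focal_length_px: float) -> Tuple[float, float]:
    """Approximate image shift (dx, dy) in pixels over one frame interval."""
    dx = focal_length_px * math.tan(yaw_rate_rad_s * frame_interval_s)
    dy = focal_length_px * math.tan(pitch_rate_rad_s * frame_interval_s)
    return (dx, dy)


def to_quarter_pel(shift_px: Tuple[float, float]) -> Tuple[int, int]:
    """Quantize a pixel shift to quarter-pixel motion vector units."""
    return (round(shift_px[0] * 4), round(shift_px[1] * 4))
```

For a 30 frames-per-second stream, a focal length of 1000 pixels, and a yaw rate of 0.03 rad/s, the hint is roughly one pixel of horizontal shift per frame, which could seed or bound the search performed by block 803.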

Derivative 4.3: Blockchain for Tamper-Evident Motion Encoding Parameters

Enabling Description: Three items that determine the skip mode outcome are cryptographically hashed together: the detected motion characteristics of the neighboring segment, the resulting decision (zero vs. predicted non-zero MV), and the assigned motion vector. The resulting digest is included as metadata within the encoded bitstream (135) for relevant frames. These digests are then periodically aggregated and committed to a permissioned blockchain ledger. This does not involve transmitting additional motion vector information for the first segment, but rather a small cryptographic proof (hash) of the decision-making process. At the decoder side, the same motion analysis (802, 803, 804) is performed, and the hash of the locally generated skip mode motion vector is compared against the blockchain record (or the metadata hash in the bitstream). This provides tamper-evidence for the integrity of the motion coding process, making the encoder's decisions (and thus the decoded video) verifiable, which matters for forensic analysis, content authentication, and regulatory compliance in sensitive applications.

sequenceDiagram
    participant Encoder as Video Encoder 600
    participant Decoder as Video Decoder 700
    participant Blockchain as Permissioned Blockchain Ledger

    Encoder->>Encoder: Perform Skip Mode Decision (Claim 1 Steps)
    Encoder->>Encoder: Generate Skip MV (Zero/Predicted Non-Zero)
    Encoder->>Encoder: Hash Skip MV & Decision Metadata (H_MV)
    Encoder->>Encoder: Embed H_MV in Bitstream (Metadata)
    Encoder->>Encoder: Commit H_MV to Blockchain (Periodically)
    Encoder->>Decoder: Encoded Bitstream (incl. H_MV)

    Decoder->>Decoder: Receive Bitstream
    Decoder->>Decoder: Perform Skip Mode Decision (Claim 31 Steps)
    Decoder->>Decoder: Generate Skip MV (Zero/Predicted Non-Zero)
    Decoder->>Decoder: Hash Locally Generated Skip MV & Decision (H'_MV)
    Decoder->>Decoder: Extract H_MV from Bitstream Metadata
    Decoder->>Blockchain: Request H_MV from Ledger (Optional)
    Decoder->>Decoder: Compare H'_MV with H_MV
    alt If H'_MV == H_MV
        Decoder->>Decoder: Motion Coding Verified
    else
        Decoder->>Decoder: Tamper Alert: Motion Coding Mismatch
    end
    Decoder->>Decoder: Form Prediction using Skip MV
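
The hashing and comparison steps in the sequence above might look as follows. This is a sketch under stated assumptions: the fixed binary layout (frame index, segment index, decision flag, two 16-bit vector components), the choice of SHA-256, and the function names are all illustrative, and the ledger interaction is omitted.

```python
import hashlib
import hmac
import struct
from typing import Tuple


def decision_digest(frame_idx: int, segment_idx: int,
                    active: bool, mv: Tuple[int, int]) -> str:
    """SHA-256 over a fixed big-endian encoding of the skip-mode decision."""
    payload = struct.pack(">IIBhh", frame_idx, segment_idx,
                          1 if active else 0, mv[0], mv[1])
    return hashlib.sha256(payload).hexdigest()


def verify_decision(expected_hex: str, frame_idx: int, segment_idx: int,
                    active: bool, mv: Tuple[int, int]) -> bool:
    """Compare the locally derived digest with the recorded one."""
    local = decision_digest(frame_idx, segment_idx, active, mv)
    return hmac.compare_digest(expected_hex, local)
```

A fixed byte layout matters here: the encoder and decoder must serialize the decision identically, or matching decisions would produce different digests.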

5. The "Inverse" or Failure Mode

Derivative 5.1: Graceful Degradation in Low-Bandwidth Scenarios

Enabling Description: In situations of extremely low bandwidth (e.g., network congestion, poor wireless signal), the system control manager (160) forces the surrounding motion analysis block (802) into a "low-complexity" mode. In this mode, the analysis is simplified: if any directly adjacent (e.g., top or left) macroblock's motion vector is zero, the current skip mode macroblock is assigned a zero motion vector immediately; otherwise, because active motion parameter generation (803) is disabled in this mode, the macroblock also falls back to a zero motion vector. Block 803 is therefore bypassed entirely, which removes the computational overhead of motion vector prediction. It also avoids cases in which a poorly predicted non-zero motion vector would leave a larger residual than the zero vector and so push the encoder out of skip mode into a costlier coding mode. The decoder applies the same simplified rule, so the mode switch must be known to it, for example through a frame-level indication. The result is potentially less accurate motion compensation, but bit rate and computational load stay low, allowing the video stream to continue with minimal interruption and power consumption, even if some global motion is poorly represented.

stateDiagram
    [*] --> High_BW_Mode: Normal Operation
    High_BW_Mode --> Low_BW_Mode: Low Bandwidth Detected
    Low_BW_Mode --> Check_Neighbors: Process Segment
    Check_Neighbors --> Zero_MV_Forced: Any Neighbor MV == 0
    Check_Neighbors --> Active_MV_Disabled: Else
    Zero_MV_Forced --> Assign_Zero_MV: Assign Zero MV
    Active_MV_Disabled --> Assign_Zero_MV: Assign Zero MV (Fallback)
    Assign_Zero_MV --> Encode_Skip_Mode: Encode Skip Mode
    Encode_Skip_Mode --> Low_BW_Mode: Next Segment
    Low_BW_Mode --> High_BW_Mode: Bandwidth Recovers

Derivative 5.2: Secure Fallback for Motion Parameter Corruption

Enabling Description: The active motion parameter generation block (803) includes a self-validation module. After generating a predicted non-zero motion vector, a plausibility check is performed. This check assesses if the generated motion vector falls within a reasonable range (e.g., maximum expected velocity for the scene, coherence with a broader region). If the plausibility check fails (e.g., due to corrupted input from motion information memory 801 or an anomalous prediction), the system immediately reverts to a "safe zero MV" fallback. Instead of using the potentially erroneous predicted non-zero motion vector, the zero motion parameter generation block (804) is activated. Simultaneously, an error flag is set for the current macroblock, potentially triggering an immediate INTRA-refresh for that macroblock in a subsequent frame, ensuring spatial integrity even if temporal prediction briefly failed. This prevents the propagation of erroneous motion vectors that could lead to severe visual artifacts ("ghosting," "tearing").

flowchart TD
    A[Surrounding Motion Analysis 802] --> B{Active Motion Parameter Generation 803}
    B --> C{Plausibility Check}
    C -- Valid MV --> D[Use Predicted Non-Zero MV]
    C -- Invalid MV --> E[Trigger Safe Zero MV Fallback]
    E --> F[Zero Motion Parameter Generation 804]
    F --> G[Assign Zero MV]
    D --> H[Motion Compensated Prediction]
    G --> H
    H --> I[Encoded Bitstream]
    E --> J[Set Error Flag/Request INTRA Refresh]
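
The plausibility check and fallback can be sketched as a small function that returns both the vector to use and an error flag. The two tests shown (a range limit and a deviation limit relative to a regional mean) follow the examples in the description; the parameter names and limits are hypothetical.

```python
from typing import Tuple

MV = Tuple[int, int]


def validate_skip_mv(predicted: MV, max_abs: int,
                     region_mean: MV, max_deviation: int) -> Tuple[MV, bool]:
    """Return (vector_to_use, error_flag) for a predicted skip-mode vector."""
    in_range = abs(predicted[0]) <= max_abs and abs(predicted[1]) <= max_abs
    coherent = (abs(predicted[0] - region_mean[0]) <= max_deviation and
                abs(predicted[1] - region_mean[1]) <= max_deviation)
    if in_range and coherent:
        return predicted, False
    # Safe fallback: zero vector, and flag the macroblock so that a later
    # INTRA refresh can be scheduled for it.
    return (0, 0), True
```

The error flag is returned rather than acted on here so that the caller can decide when to request the INTRA refresh.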

Combination Prior Art Scenarios with Open-Source Standards

The adaptive skip mode with predicted non-zero motion vectors of US 7,532,808 can be combined with existing video coding and streaming standards, most of which have widely used open-source implementations, to enhance their performance or address specific limitations, effectively extending the prior art.

H.264/AVC with Adaptive Skip Mode:
- Enabling Description: The H.264/AVC standard (ITU-T Rec. H.264 | ISO/IEC 14496-10) already defines a SKIP macroblock mode (P_Skip), where no residual or motion vector data is transmitted and the motion vector is implicitly derived from neighboring blocks: it is the median predictor, forced to zero when the left or upper neighbor is unavailable or has a zero motion vector referring to reference index 0. This rule is fixed and considers only the immediate neighbors. By integrating the concepts from US 7,532,808, an H.264 encoder/decoder can apply a broader analysis. When a macroblock is selected for SKIP mode, the surrounding motion analysis block (802) (as described in US 7,532,808) would analyze the motion vectors of previously encoded H.264/AVC neighboring macroblocks. If significant global or regional motion (e.g., consistent non-zero motion vectors in the top, left, and top-left H.264/AVC macroblocks) is detected, the active motion parameter generation block (803) would compute a predicted non-zero motion vector for the current SKIP macroblock (e.g., a median of the neighboring H.264 motion vectors, scaled according to reference picture indices). This predicted non-zero MV would then be used for motion compensation for that H.264 SKIP macroblock, rather than a zero vector or the fixed standard prediction, all without transmitting any explicit motion vector information for the SKIP macroblock, adhering to the H.264 syntax for SKIP mode. This directly addresses global motion scenarios in which a zero-vector SKIP mode would be inefficient.
- Relevant Standard: H.264/MPEG-4 AVC. The Joint Video Team's "Joint Model" (JM) referenced in the patent was the test model from which H.264 was developed.
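
For reference, the motion vector inference that H.264/AVC specifies for P_Skip macroblocks can be sketched as below. This is a simplified illustration: it assumes the caller has already substituted neighbor D when neighbor C is unavailable, and it omits the special cases of the median predictor that depend on reference indices. The `Neighbor` type and function name are introduced here and are not names from the standard.

```python
from typing import NamedTuple, Optional, Tuple

MV = Tuple[int, int]


class Neighbor(NamedTuple):
    mv: MV        # motion vector in quarter-pixel units
    ref_idx: int  # reference picture index


def p_skip_mv(a: Optional[Neighbor], b: Optional[Neighbor],
              c: Neighbor) -> MV:
    """Inferred P_Skip vector from left (a), upper (b), upper-right (c)."""
    if a is None or b is None:  # left or upper neighbor unavailable
        return (0, 0)
    for n in (a, b):
        if n.ref_idx == 0 and n.mv == (0, 0):
            return (0, 0)       # a static neighbor forces the zero vector
    xs = sorted(n.mv[0] for n in (a, b, c))
    ys = sorted(n.mv[1] for n in (a, b, c))
    return (xs[1], ys[1])       # component-wise median of three
```

A broader surrounding motion analysis of the kind described above would replace the fixed three-neighbor rule with one that examines a larger neighborhood, while remaining derivable at the decoder.
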
VP9/AV1 with Extended Merge Mode for Active Skip:
- Enabling Description: Modern open-source codecs like Google's VP9 and Alliance for Open Media's AV1 feature highly sophisticated motion vector prediction mechanisms, including inherited-motion modes (NEARESTMV and NEARMV, comparable to the "merge mode" of HEVC) where motion information for a block is taken from a ranked list of spatio-temporal neighbor candidates without coding a motion vector difference. Combined with the skip flag, such a mode allows a block to be coded without residual or motion vector data if the inherited motion results in a negligible prediction error. The innovation of US 7,532,808 can be applied to extend this: in a VP9/AV1 encoder, during candidate list generation for a block chosen for SKIP mode, the surrounding motion analysis (802) explicitly prioritizes and evaluates predicted non-zero motion vectors derived from statistical analysis of the motion field of a broader set of previously coded neighboring blocks. If this active prediction from neighbors yields a superior prediction (e.g., lower SAD/SSD error) compared to a static (zero) or simple median candidate, that specific predicted non-zero motion vector is implicitly chosen for the SKIP block. The codec would signal the SKIP mode but, as per US 7,532,808, no further motion vector information for the first segment is coded in the encoded bitstream, leveraging the existing implicit signaling of the inherited-motion modes within VP9/AV1 to effectively transmit an "active skip" motion.
- Relevant Standards: VP9, AV1.
MPEG-DASH/WebRTC Live Streaming with Adaptive Skip Mode Feedback:
- Enabling Description: In adaptive bitrate (ABR) live streaming scenarios using MPEG-DASH or WebRTC, network conditions and content complexity fluctuate. An encoder (as part of a multimedia terminal, Claim 47) implementing the US 7,532,808 adaptive skip mode can dynamically adjust the aggressiveness of its motion analysis and non-zero MV prediction based on feedback from the network or streaming client. For example, if network congestion is detected (e.g., through RTCP feedback in WebRTC indicating packet loss or high RTT), the system control block (84) can instruct the surrounding motion analysis block (802) to use a simpler, less computationally intensive prediction model or a higher threshold for "insignificant motion," effectively favoring zero motion vectors for skip mode to reduce even marginal data overhead. Conversely, if high bandwidth is available and content requires high fidelity (e.g., fast-moving sports video), the encoder can activate more sophisticated non-zero MV prediction algorithms for skip mode to maximize prediction efficiency and visual quality, all while maintaining the core principle of not coding further motion vector information for the skip segments. This adaptive control, driven by real-time streaming feedback, optimizes the trade-off between compression efficiency, computational load, and quality under varying network conditions.
- Relevant Standards: MPEG-DASH, WebRTC (particularly RTP/RTCP for feedback).
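
The feedback-driven control described for live streaming could reduce to a small policy function that maps network statistics to the "insignificant motion" threshold. Every cut-off value below is hypothetical, as is the function name; the sketch also assumes the selected threshold is communicated to the decoder (for example at frame level), since the decoder must apply the same value to derive identical skip-mode vectors.

```python
def select_skip_threshold(loss_fraction: float, rtt_ms: float,
                          base_threshold: int = 1) -> int:
    """Raise the zero-motion threshold as reported congestion worsens."""
    if loss_fraction > 0.05 or rtt_ms > 300:
        return base_threshold * 4  # heavy congestion: strongly favor zero MV
    if loss_fraction > 0.01 or rtt_ms > 150:
        return base_threshold * 2  # moderate congestion
    return base_threshold          # healthy network: normal sensitivity
```

A higher threshold classifies more neighborhoods as static, which steers skip-mode segments toward the zero vector and the cheaper analysis path.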