Patent 11841803
Derivative works
Defensive disclosure: derivative variations of each claim designed to render future incremental improvements obvious or non-novel.
Defensive Disclosure for U.S. Patent 11,841,803
Publication Date: May 13, 2026
Reference Patent: US 11,841,803 B2, "GPU chiplets using high bandwidth crosslinks"
Purpose: This document discloses a plurality of derivative inventions and improvements upon the '803 patent to place them in the public domain. The detailed descriptions and diagrams provided herein are intended to establish prior art against future patent applications claiming these or similar concepts, rendering them obvious under 35 U.S.C. § 103 or anticipated under 35 U.S.C. § 102.
Derivatives Based on Core System Claims (Ref: Claim 1)
Axis 1: Material & Component Substitution
1.1. Derivative: Organic Interposer with Embedded Waveguides
- Enabling Description: The passive silicon crosslink is substituted with a multi-layer organic substrate interposer. This interposer contains embedded polymer optical waveguides for inter-chiplet communication, replacing electrical traces. Each GPU chiplet is flip-chip bonded to the organic interposer and uses micro-lenses and vertical-cavity surface-emitting lasers (VCSELs) for optical signal transmission and photodiodes for reception. This architecture provides higher bandwidth and lower crosstalk than electrical interconnects. The routing of optical signals is passive, determined by the physical layout of the waveguides.
- Mermaid Diagram:
```mermaid
graph TD
    subgraph CPU_Module
        CPU("Central Processing Unit (RISC-V Core)")
    end
    subgraph Multi_Chiplet_Module
        A[GPU Chiplet 1] -- Optical/Electrical via Bus --> CPU
        B[GPU Chiplet 2]
        C[GPU Chiplet N]
        subgraph Organic_Interposer
            direction LR
            WG1(Polymer Waveguide 1)
            WG2(Polymer Waveguide 2)
            WG3(Polymer Waveguide 3)
        end
        A -- VCSEL/Photodiode --> WG1
        B -- VCSEL/Photodiode --> WG1
        A -- VCSEL/Photodiode --> WG2
        C -- VCSEL/Photodiode --> WG2
        B -- VCSEL/Photodiode --> WG3
        C -- VCSEL/Photodiode --> WG3
    end
    style Organic_Interposer fill:#f9f,stroke:#333,stroke-width:2px
```
1.2. Derivative: Gallium-Nitride (GaN) High-Frequency Crosslink
- Enabling Description: The passive crosslink is fabricated from a Gallium-Nitride (GaN) substrate. GaN's properties allow for significantly higher frequency operation (>100 GHz) and better thermal conductivity than silicon. The inter-chiplet communication PHYs on the GPU chiplets are specifically designed to drive GaN-compatible transmission lines, enabling ultra-high-speed data transfer. The passive crosslink itself contains no active components, only impedance-matched microstrip or coplanar waveguides etched into the GaN substrate.
- Mermaid Diagram:
```mermaid
sequenceDiagram
    participant CPU
    participant GPU_Chiplet_1
    participant GaN_Crosslink
    participant GPU_Chiplet_2
    CPU->>GPU_Chiplet_1: Memory Request (via CXL Bus)
    GPU_Chiplet_1->>GaN_Crosslink: Route Request (High-Freq Electrical Signal)
    Note over GaN_Crosslink: Passive propagation over GaN waveguides
    GaN_Crosslink->>GPU_Chiplet_2: Request Arrives
    GPU_Chiplet_2-->>GaN_Crosslink: Return Data
    GaN_Crosslink-->>GPU_Chiplet_1: Data Arrives
    GPU_Chiplet_1-->>CPU: Return Data to CPU
```
Axis 2: Operational Parameter Expansion
2.1. Derivative: Cryogenic Superconducting Crosslink for Quantum Computing
- Enabling Description: The system is designed to operate at cryogenic temperatures (e.g., < 4 Kelvin) for use in a quantum computing control plane. The GPU chiplets are specialized processors for qubit state management. The passive crosslink is a superconducting interposer made of Niobium-Titanium (NbTi) alloy. The traces on the interposer are superconducting, offering zero electrical resistance and eliminating thermal noise. Communication is achieved via single flux quantum (SFQ) logic pulses, providing extremely low power and high-speed inter-chiplet signaling.
- Mermaid Diagram:
```mermaid
graph TD
    subgraph Cryostat [Cryogenic Environment < 4K]
        QPU("Quantum Processing Unit")
        subgraph Control_Plane
            C1("Qubit Control Chiplet 1")
            C2("Qubit Control Chiplet 2")
            SC("Superconducting Passive Crosslink (NbTi)")
            C1 -- SFQ Pulses --> SC
            C2 -- SFQ Pulses --> SC
        end
        QPU -- Control Lines --> C1
        QPU -- Control Lines --> C2
    end
```
2.2. Derivative: Massively Parallel Exascale Array
- Enabling Description: The architecture is scaled to an array of 256 or more GPU chiplets arranged in a 16x16 grid on a large-area silicon interposer. This "passive backplane" provides a mesh network topology. A memory access request from a host CPU is routed through a primary chiplet, which then uses a 2D mesh routing algorithm (e.g., dimension-ordered routing) to forward the request to the target chiplet. The address space is interleaved across all chiplets in the array, creating a massive, unified last-level cache.
- Mermaid Diagram:
```mermaid
graph TD
    Host[Host CPU] --> C_0_0
    subgraph Chiplet_Array_256
        C_0_0("Chiplet 0,0") --- C_0_1("Chiplet 0,1")
        C_0_0 --- C_1_0("Chiplet 1,0")
        C_0_1 --- C_0_2("...")
        C_0_1 --- C_1_1("Chiplet 1,1")
        C_1_0 --- C_1_1
        C_1_0 --- C_2_0("...")
        C_1_1 --- C_1_2("...")
        C_1_1 --- C_2_1("...")
    end
    %% All connections are via the passive silicon backplane
```
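The interleaving and dimension-ordered routing described in 2.2 can be sketched as follows. This is a minimal illustration, not part of the '803 patent or this disclosure's claims; the grid size, cache-line granularity, and function names are assumptions.

```python
# Sketch of cache-line address interleaving and dimension-ordered (XY)
# routing for an assumed 16x16 chiplet mesh.

GRID = 16            # 16x16 = 256 chiplets
LINE_BYTES = 128     # assumed interleave granularity (one cache line)

def home_chiplet(addr: int) -> tuple[int, int]:
    """Interleave the address space across chiplets at cache-line granularity."""
    line = addr // LINE_BYTES
    cid = line % (GRID * GRID)
    return (cid % GRID, cid // GRID)   # (x, y) mesh coordinates

def xy_route(src: tuple[int, int], dst: tuple[int, int]) -> list[tuple[int, int]]:
    """Dimension-ordered routing: traverse X first, then Y. Deadlock-free
    on a mesh because no route ever turns from Y back into X."""
    x, y = src
    hops = [src]
    while x != dst[0]:
        x += 1 if dst[0] > x else -1
        hops.append((x, y))
    while y != dst[1]:
        y += 1 if dst[1] > y else -1
        hops.append((x, y))
    return hops
```

Consecutive cache lines land on consecutive chiplets, spreading a streaming access pattern across the whole array.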
Axis 3: Cross-Domain Application
3.1. Derivative: Automotive Sensor Fusion Engine
- Enabling Description: An automotive system for ADAS/autonomous driving features a central compute module with specialized chiplets. A primary chiplet serves as a task scheduler. It connects via a passive crosslink to dedicated chiplets for LiDAR point cloud processing, RADAR signal processing, and camera image processing (e.g., using a CNN accelerator). When the LiDAR chiplet needs correlated camera data for object classification, it requests the relevant data region from the camera chiplet via the passive crosslink, enabling real-time, low-latency sensor fusion.
- Mermaid Diagram:
```mermaid
flowchart LR
    subgraph Automotive_ECU
        Scheduler[Primary Chiplet]
        Lidar[LiDAR Processor Chiplet]
        Radar[RADAR Processor Chiplet]
        Camera[Camera Processor Chiplet]
        Crosslink{Passive Crosslink}
        Scheduler -- Control --> Crosslink
        Lidar <--> Crosslink
        Radar <--> Crosslink
        Camera <--> Crosslink
    end
    LiDAR_Sensor -- Point Cloud --> Lidar
    RADAR_Sensor -- Raw Data --> Radar
    Camera_Sensor -- Image Stream --> Camera
    Scheduler -- Fused Data --> Vehicle_CAN_Bus
```
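The data-region request described in 3.1 can be sketched as a small message type exchanged over the crosslink. The dataclass fields, the fixed half-width, and the assumption that the object centroid is already projected into image coordinates are all illustrative, not derived from the '803 patent.

```python
# Sketch of the LiDAR-to-camera fusion request in 3.1: the LiDAR chiplet
# asks the camera chiplet for the image region around a detected object.

from dataclasses import dataclass

@dataclass
class RegionRequest:
    timestamp_us: int      # capture time used to select the matching frame
    x0: int; y0: int       # top-left pixel of the requested region
    x1: int; y1: int       # bottom-right pixel of the requested region

def region_for_object(cx: float, cy: float, t_us: int,
                      half: int = 32) -> RegionRequest:
    """Map a projected object centroid to a camera-image bounding region."""
    return RegionRequest(t_us,
                         int(cx) - half, int(cy) - half,
                         int(cx) + half, int(cy) + half)
```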
3.2. Derivative: Distributed 5G/6G Baseband Processor
- Enabling Description: A 5G/6G base station utilizes a chiplet-based baseband processing unit. One chiplet handles the fronthaul interface (e.g., eCPRI). It distributes demodulation and decoding tasks to an array of identical processing chiplets via a high-bandwidth passive crosslink. Each processing chiplet is assigned a subset of users or frequency resource blocks. The crosslink is used for coordinating handovers and managing interference by allowing chiplets to share channel state information directly, without going through a central memory controller.
- Mermaid Diagram:
```mermaid
sequenceDiagram
    participant Antenna
    participant Fronthaul_Chiplet
    participant Crosslink
    participant Processor_Chiplet_1
    participant Processor_Chiplet_2
    Antenna->>Fronthaul_Chiplet: RF Data In
    Fronthaul_Chiplet->>Crosslink: Distribute User Data (User A)
    Crosslink->>Processor_Chiplet_1: Route Data for User A
    Fronthaul_Chiplet->>Crosslink: Distribute User Data (User B)
    Crosslink->>Processor_Chiplet_2: Route Data for User B
    Note over Processor_Chiplet_1, Processor_Chiplet_2: Independent Demodulation
    Processor_Chiplet_1->>Crosslink: Share Channel State Info
    Crosslink->>Processor_Chiplet_2: Forward CSI for Interference Mitigation
```
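The work distribution and CSI sharing in 3.2 can be sketched as follows. The static hash assignment and the broadcast policy are assumptions chosen for illustration; a real baseband scheduler would weigh load and channel conditions.

```python
# Sketch of the fronthaul chiplet's user-to-chiplet assignment and of
# peer-to-peer CSI sharing over the crosslink (no central memory controller).

N_CHIPLETS = 8   # assumed size of the processing-chiplet array

def chiplet_for_user(user_id: int) -> int:
    """Static assignment: all of a user's resource blocks go to one chiplet."""
    return user_id % N_CHIPLETS

def share_csi(src_chiplet: int, csi: dict) -> list[tuple[int, dict]]:
    """Broadcast channel state info to every peer chiplet so each can run
    its own interference-mitigation step."""
    return [(c, csi) for c in range(N_CHIPLETS) if c != src_chiplet]
```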
Axis 4: Integration with Emerging Tech
4.1. Derivative: AI-Managed Predictive Caching
- Enabling Description: The passive crosslink controller on the primary GPU chiplet incorporates a lightweight, hardware-accelerated neural network. This AI model is trained to predict future memory access patterns based on the instruction stream from the CPU. When a memory access request is received, the model predicts which chiplet will be needed next. It then speculatively issues a pre-fetch command across the passive crosslink to that chiplet, moving the anticipated data into its last-level cache before it is explicitly requested. This reduces effective memory latency.
- Mermaid Diagram:
```mermaid
flowchart TD
    A[CPU sends Memory Request Addr_X] --> B{Primary Chiplet Receives}
    B --> C[Passive Crosslink Controller]
    C --> D{AI Predictive Model}
    D -- "Predicts next request is Addr_Y on Chiplet 3" --> E[Issue Speculative Prefetch for Addr_Y]
    C -- "Request Addr_X is on Chiplet 2" --> F[Route Request for Addr_X]
    E --> G(Crosslink)
    F --> G
    G --> H[Chiplet 2]
    G --> I[Chiplet 3]
    H --> J[Return Data_X]
    I -- "Cache Data_Y for future use" --> K(LLC on Chiplet 3)
    J --> C
    C --> L[Return Data_X to CPU]
```
4.2. Derivative: IoT-Monitored Thermal-Aware Routing
- Enabling Description: Each GPU chiplet integrates a grid of thermal and voltage sensors (IoT sensors). These sensors provide a real-time thermal map of each die to the primary chiplet's crosslink controller. When routing a memory access request, the controller consults this live thermal data. If the target caching chiplet is approaching a thermal throttle point, the controller can temporarily offload a portion of its cache lines to a cooler, neighboring chiplet, and redirect the memory request accordingly. This dynamic thermal management balances performance and system longevity.
- Mermaid Diagram:
```mermaid
stateDiagram-v2
    [*] --> Idle
    Idle --> Routing: Memory Request
    Routing: Entry / Consult Thermal Map
    Routing --> Normal_Path: Target Chiplet is Cool
    Routing --> Reroute_Path: Target Chiplet is Hot
    Normal_Path --> Serviced
    Reroute_Path: Action / Migrate hot cache lines
    Reroute_Path --> Serviced: Route to alternate chiplet
    Serviced --> [*]
    state Routing {
        direction LR
        Thermal_OK: check_temp() < threshold
        Thermal_Hot: check_temp() >= threshold
        [*] --> Thermal_OK
        [*] --> Thermal_Hot
    }
```
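The thermal-aware redirect in 4.2 can be sketched as a routing decision over live sensor readings. The threshold value, the neighbor table, and the coolest-neighbor policy are illustrative assumptions.

```python
# Sketch of the 4.2 routing decision: service the request on the target
# chiplet unless its reported temperature exceeds the throttle point, in
# which case redirect to the coolest neighboring chiplet.

THROTTLE_C = 95.0   # assumed throttle threshold in degrees Celsius

def route_target(target: int, temps: dict[int, float],
                 neighbors: dict[int, list[int]]) -> int:
    """Return the chiplet that should service the request."""
    if temps[target] < THROTTLE_C:
        return target
    # Target is hot: migrate/redirect to its coolest neighbor.
    return min(neighbors[target], key=lambda c: temps[c])
```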
Axis 5: The "Inverse" or Failure Mode
5.1. Derivative: Graceful Degradation via Crosslink Fusing
- Enabling Description: The passive crosslink incorporates electronically fusible links on the communication traces connected to each chiplet. A built-in self-test (BIST) routine runs at boot time. If the BIST identifies a faulty GPU chiplet, the primary chiplet sends a high-voltage signal to the fusible links associated with that chiplet, physically and permanently disconnecting it from the crosslink. The system then boots in a degraded mode with fewer active chiplets, remapping the memory address space across the remaining functional units. This provides high system reliability for mission-critical applications.
- Mermaid Diagram:
```mermaid
flowchart TD
    Start --> A{Power-On Self Test}
    A -- All Chiplets OK --> B[Normal Operation]
    A -- Chiplet 3 FAILS --> C{Primary Chiplet Controller}
    C --> D[Send high-voltage signal to fuses for Chiplet 3]
    D --> E[Chiplet 3 Electrically Isolated]
    E --> F[Remap Address Space across remaining Chiplets]
    F --> G[Boot in Degraded Mode]
    B --> End
    G --> End
```
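The boot-time degradation flow in 5.1 can be sketched as a remapping step over BIST results. The function name, the dict-based BIST report, and the dense logical-to-physical remap policy are assumptions for illustration; the irreversible fuse blow is represented only by a comment.

```python
# Sketch of the 5.1 boot flow: chiplets failing BIST are fused off, and the
# address space is remapped densely across the surviving chiplets.

def boot(bist_results: dict[int, bool]) -> dict[int, int]:
    """Return a logical->physical chiplet map over chiplets that passed BIST."""
    survivors = sorted(c for c, ok in bist_results.items() if ok)
    if not survivors:
        raise RuntimeError("no functional chiplets: cannot boot")
    # In hardware, each failed chiplet's fusible links would be blown here
    # (a one-way, permanent disconnection from the crosslink).
    return {logical: phys for logical, phys in enumerate(survivors)}
```

With chiplet 1 failed, logical chiplet 1 is simply rehosted on the next surviving die, so software above the remap sees a contiguous, smaller array.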
Combination Prior Art Scenarios (Integration with Open Standards)
Scenario 1: Combination with RISC-V and TileLink
- Description: The system is implemented within a RISC-V based System-on-Chip (SoC). The "CPU" is a multi-core RISC-V processor cluster (e.g., using the BOOM core design). The "bus" connecting the CPU to the primary GPU chiplet is the open-source TileLink cache-coherent interconnect standard. The GPU chiplet array functions as a TileLink agent. Memory access requests are TileLink transactions. The primary chiplet's crosslink controller is responsible for translating TileLink requests into the internal protocol used across the passive crosslink, making the entire GPU chiplet array appear as a single, coherent TileLink peripheral.
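The translation step in Scenario 1 can be sketched as an address decode from an incoming TileLink-style request to a (chiplet, local address) pair for the internal crosslink protocol. The field names, stripe size, and chiplet count are assumptions for illustration and are not taken from the TileLink specification.

```python
# Sketch of the Scenario 1 crosslink controller's address decode: the address
# of a TileLink-style Get is mapped to the chiplet that services it plus a
# local offset, under an assumed stripe-interleaved layout.

CHIPLETS = 4
STRIPE = 4096   # assumed interleave granularity in bytes

def translate(addr: int, size: int) -> dict:
    """Decode a global address into an internal crosslink request."""
    stripe = addr // STRIPE
    return {
        "chiplet": stripe % CHIPLETS,                              # target die
        "local_addr": (stripe // CHIPLETS) * STRIPE + addr % STRIPE,
        "size": size,
    }
```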
Scenario 2: Combination with OpenCL and SPIR-V
- Description: The multi-chiplet GPU is exposed to software through an OpenCL 3.0-compliant driver. Applications written in OpenCL are compiled into the SPIR-V intermediate representation. The custom GPU driver contains a JIT compiler that translates SPIR-V into machine code native to the chiplet architecture. The driver is responsible for abstracting the distributed nature of the hardware. It manages buffer allocation across the memories of the different chiplets and translates global memory accesses into the appropriate primary chiplet requests as described in the '803 patent's method. To the application, the device appears as a single OpenCL compute device with a large, unified global memory.
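One way the Scenario 2 driver could preserve the single-device abstraction is by splitting an OpenCL NDRange across chiplets before JIT-compiled kernels are dispatched. The even partitioning policy and function name below are assumptions for illustration, not a description of any real driver.

```python
# Sketch of driver-side work partitioning for Scenario 2: a 1D global
# work-item range is split into contiguous (start, count) slices, one per
# chiplet, so the application still sees one OpenCL compute device.

def split_ndrange(global_size: int, n_chiplets: int) -> list[tuple[int, int]]:
    """Return (start, count) work-item slices, one per chiplet."""
    base, extra = divmod(global_size, n_chiplets)
    slices, start = [], 0
    for c in range(n_chiplets):
        count = base + (1 if c < extra else 0)  # spread the remainder evenly
        slices.append((start, count))
        start += count
    return slices
```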
Scenario 3: Combination with CHIPS Alliance Bunch of Wires (BoW)
- Description: The physical layer of the "passive crosslink" is a direct implementation of the Bunch of Wires (BoW) die-to-die interconnect specification from the CHIPS Alliance. The PHY regions on each GPU chiplet are BoW-compliant PHYs. The "passive crosslink" is an interposer with passive traces that conform to the BoW channel specifications for a given process node. The protocol for routing memory requests is layered on top of the BoW physical signaling standard, leveraging an open, industry-vetted standard for the chiplet-to-chiplet electrical interface.