Patent 11841803
Derivative works
Defensive disclosure: derivative variations of each claim designed to render future incremental improvements obvious or non-novel.
Defensive Disclosure for U.S. Patent 11,841,803
Publication Date: May 13, 2026
Reference Patent: US 11,841,803 B2, "GPU chiplets using high bandwidth crosslinks"
Purpose: This document discloses a plurality of derivative inventions and improvements upon the '803 patent to place them in the public domain. The detailed descriptions and diagrams provided herein are intended to establish prior art against future patent applications claiming these or similar concepts, rendering them obvious under 35 U.S.C. § 103 or anticipated under 35 U.S.C. § 102.
Derivatives Based on Core System Claims (Ref: Claim 1)
Axis 1: Material & Component Substitution
1.1. Derivative: Organic Interposer with Embedded Waveguides
- Enabling Description: The passive silicon crosslink is substituted with a multi-layer organic substrate interposer. This interposer contains embedded polymer optical waveguides for inter-chiplet communication, replacing electrical traces. Each GPU chiplet is flip-chip bonded to the organic interposer and uses micro-lenses and vertical-cavity surface-emitting lasers (VCSELs) for optical signal transmission and photodiodes for reception. This architecture provides higher bandwidth and lower crosstalk than electrical interconnects. The routing of optical signals is passive, determined by the physical layout of the waveguides.
- Mermaid Diagram:
```mermaid
graph TD
    subgraph CPU_Module
        CPU("Central Processing Unit (RISC-V Core)")
    end
    subgraph Multi_Chiplet_Module
        A[GPU Chiplet 1] -- Optical/Electrical via Bus --> CPU
        B[GPU Chiplet 2]
        C[GPU Chiplet N]
        subgraph Organic_Interposer
            direction LR
            WG1(Polymer Waveguide 1)
            WG2(Polymer Waveguide 2)
            WG3(Polymer Waveguide 3)
        end
        A -- VCSEL/Photodiode --> WG1
        B -- VCSEL/Photodiode --> WG1
        A -- VCSEL/Photodiode --> WG2
        C -- VCSEL/Photodiode --> WG2
        B -- VCSEL/Photodiode --> WG3
        C -- VCSEL/Photodiode --> WG3
    end
    style Organic_Interposer fill:#f9f,stroke:#333,stroke-width:2px
```
1.2. Derivative: Gallium-Nitride (GaN) High-Frequency Crosslink
- Enabling Description: The passive crosslink is fabricated from a Gallium-Nitride (GaN) substrate. GaN's properties allow for significantly higher frequency operation (>100 GHz) and better thermal conductivity than silicon. The inter-chiplet communication PHYs on the GPU chiplets are specifically designed to drive GaN-compatible transmission lines, enabling ultra-high-speed data transfer. The passive crosslink itself contains no active components, only impedance-matched microstrip or coplanar waveguides etched into the GaN substrate.
- Mermaid Diagram:
```mermaid
sequenceDiagram
    participant CPU
    participant GPU_Chiplet_1
    participant GaN_Crosslink
    participant GPU_Chiplet_2
    CPU->>GPU_Chiplet_1: Memory Request (via CXL Bus)
    GPU_Chiplet_1->>GaN_Crosslink: Route Request (High-Freq Electrical Signal)
    Note over GaN_Crosslink: Passive propagation over GaN waveguides
    GaN_Crosslink->>GPU_Chiplet_2: Request Arrives
    GPU_Chiplet_2-->>GaN_Crosslink: Return Data
    GaN_Crosslink-->>GPU_Chiplet_1: Data Arrives
    GPU_Chiplet_1-->>CPU: Return Data to CPU
```
Axis 2: Operational Parameter Expansion
2.1. Derivative: Cryogenic Superconducting Crosslink for Quantum Computing
- Enabling Description: The system is designed to operate at cryogenic temperatures (e.g., < 4 Kelvin) for use in a quantum computing control plane. The GPU chiplets are specialized processors for qubit state management. The passive crosslink is a superconducting interposer made of Niobium-Titanium (NbTi) alloy. The traces on the interposer are superconducting, offering zero electrical resistance and eliminating thermal noise. Communication is achieved via single flux quantum (SFQ) logic pulses, providing extremely low power and high-speed inter-chiplet signaling.
- Mermaid Diagram:
```mermaid
graph TD
    subgraph Cryostat [Cryogenic Environment < 4K]
        QPU("Quantum Processing Unit")
        subgraph Control_Plane
            C1("Qubit Control Chiplet 1")
            C2("Qubit Control Chiplet 2")
            SC("Superconducting Passive Crosslink (NbTi)")
            C1 -- SFQ Pulses --> SC
            C2 -- SFQ Pulses --> SC
        end
        QPU -- Control Lines --> C1
        QPU -- Control Lines --> C2
    end
```
2.2. Derivative: Massively Parallel Exascale Array
- Enabling Description: The architecture is scaled to an array of 256 or more GPU chiplets arranged in a 16x16 grid on a large-area silicon interposer. This "passive backplane" provides a mesh network topology. A memory access request from a host CPU is routed through a primary chiplet, which then uses a 2D mesh routing algorithm (e.g., dimension-ordered routing) to forward the request to the target chiplet. The address space is interleaved across all chiplets in the array, creating a massive, unified last-level cache.
- Mermaid Diagram:
```mermaid
graph TD
    Host[Host CPU] --> C_0_0
    subgraph Chiplet_Array_256
        C_0_0("Chiplet 0,0") --- C_0_1("Chiplet 0,1")
        C_0_0 --- C_1_0("Chiplet 1,0")
        C_0_1 --- C_0_2("...")
        C_0_1 --- C_1_1("Chiplet 1,1")
        C_1_0 --- C_1_1
        C_1_0 --- C_2_0("...")
        C_1_1 --- C_1_2("...")
        C_1_1 --- C_2_1("...")
    end
    %% All connections are via the passive silicon backplane
```
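The interleaving and dimension-ordered routing described in 2.2 can be sketched as follows. This is a minimal illustration, not part of the '803 patent or this disclosure's claims; the grid size, cache-line granularity, and function names are assumptions.

```python
# Sketch of cache-line address interleaving and dimension-ordered (XY)
# routing for an assumed 16x16 chiplet mesh.

GRID = 16            # 16x16 = 256 chiplets
LINE_BYTES = 128     # assumed interleave granularity (one cache line)

def home_chiplet(addr: int) -> tuple[int, int]:
    """Interleave the address space across chiplets at cache-line granularity."""
    line = addr // LINE_BYTES
    cid = line % (GRID * GRID)
    return (cid % GRID, cid // GRID)   # (x, y) mesh coordinates

def xy_route(src: tuple[int, int], dst: tuple[int, int]) -> list[tuple[int, int]]:
    """Dimension-ordered routing: traverse X first, then Y. Deadlock-free
    on a mesh because no route ever turns from Y back into X."""
    x, y = src
    hops = [src]
    while x != dst[0]:
        x += 1 if dst[0] > x else -1
        hops.append((x, y))
    while y != dst[1]:
        y += 1 if dst[1] > y else -1
        hops.append((x, y))
    return hops
```

Consecutive cache lines land on consecutive chiplets, spreading a streaming access pattern across the whole array.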
Axis 3: Cross-Domain Application
3.1. Derivative: Automotive Sensor Fusion Engine
- Enabling Description: An automotive system for ADAS/autonomous driving features a central compute module with specialized chiplets. A primary chiplet serves as a task scheduler. It connects via a passive crosslink to dedicated chiplets for LiDAR point cloud processing, RADAR signal processing, and camera image processing (e.g., using a CNN accelerator). When the LiDAR chiplet needs correlated camera data for object classification, it requests the relevant data region from the camera chiplet via the passive crosslink, enabling real-time, low-latency sensor fusion.
- Mermaid Diagram:
```mermaid
flowchart LR
    subgraph Automotive_ECU
        Scheduler[Primary Chiplet]
        Lidar[LiDAR Processor Chiplet]
        Radar[RADAR Processor Chiplet]
        Camera[Camera Processor Chiplet]
        Crosslink{Passive Crosslink}
        Scheduler -- Control --> Crosslink
        Lidar <--> Crosslink
        Radar <--> Crosslink
        Camera <--> Crosslink
    end
    LiDAR_Sensor -- Point Cloud --> Lidar
    RADAR_Sensor -- Raw Data --> Radar
    Camera_Sensor -- Image Stream --> Camera
    Scheduler -- Fused Data --> Vehicle_CAN_Bus
```
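The data-region request described in 3.1 can be sketched as a small message type exchanged over the crosslink. The dataclass fields, the fixed half-width, and the assumption that the object centroid is already projected into image coordinates are all illustrative, not derived from the '803 patent.

```python
# Sketch of the LiDAR-to-camera fusion request in 3.1: the LiDAR chiplet
# asks the camera chiplet for the image region around a detected object.

from dataclasses import dataclass

@dataclass
class RegionRequest:
    timestamp_us: int      # capture time used to select the matching frame
    x0: int; y0: int       # top-left pixel of the requested region
    x1: int; y1: int       # bottom-right pixel of the requested region

def region_for_object(cx: float, cy: float, t_us: int,
                      half: int = 32) -> RegionRequest:
    """Map a projected object centroid to a camera-image bounding region."""
    return RegionRequest(t_us,
                         int(cx) - half, int(cy) - half,
                         int(cx) + half, int(cy) + half)
```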
3.2. Derivative: Distributed 5G/6G Baseband Processor
- Enabling Description: A 5G/6G base station utilizes a chiplet-based baseband processing unit. One chiplet handles the fronthaul interface (e.g., eCPRI). It distributes demodulation and decoding tasks to an array of identical processing chiplets via a high-bandwidth passive crosslink. Each processing chiplet is assigned a subset of users or frequency resource blocks. The crosslink is used for coordinating handovers and managing interference by allowing chiplets to share channel state information directly, without going through a central memory controller.
- Mermaid Diagram:
```mermaid
sequenceDiagram
    participant Antenna
    participant Fronthaul_Chiplet
    participant Crosslink
    participant Processor_Chiplet_1
    participant Processor_Chiplet_2
    Antenna->>Fronthaul_Chiplet: RF Data In
    Fronthaul_Chiplet->>Crosslink: Distribute User Data (User A)
    Crosslink->>Processor_Chiplet_1: Route Data for User A
    Fronthaul_Chiplet->>Crosslink: Distribute User Data (User B)
    Crosslink->>Processor_Chiplet_2: Route Data for User B
    Note over Processor_Chiplet_1, Processor_Chiplet_2: Independent Demodulation
    Processor_Chiplet_1->>Crosslink: Share Channel State Info
    Crosslink->>Processor_Chiplet_2: Forward CSI for Interference Mitigation
```
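The work distribution and CSI sharing in 3.2 can be sketched as follows. The static hash assignment and the broadcast policy are assumptions chosen for illustration; a real baseband scheduler would weigh load and channel conditions.

```python
# Sketch of the fronthaul chiplet's user-to-chiplet assignment and of
# peer-to-peer CSI sharing over the crosslink (no central memory controller).

N_CHIPLETS = 8   # assumed size of the processing-chiplet array

def chiplet_for_user(user_id: int) -> int:
    """Static assignment: all of a user's resource blocks go to one chiplet."""
    return user_id % N_CHIPLETS

def share_csi(src_chiplet: int, csi: dict) -> list[tuple[int, dict]]:
    """Broadcast channel state info to every peer chiplet so each can run
    its own interference-mitigation step."""
    return [(c, csi) for c in range(N_CHIPLETS) if c != src_chiplet]
```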
Axis 4: Integration with Emerging Tech
4.1. Derivative: AI-Managed Predictive Caching
- Enabling Description: The passive crosslink controller on the primary GPU chiplet incorporates a lightweight, hardware-accelerated neural network. This AI model is trained to predict future memory access patterns based on the instruction stream from the CPU. When a memory access request is received, the model predicts which chiplet will be needed next. It then speculatively issues a pre-fetch command across the passive crosslink to that chiplet, moving the anticipated data into its last-level cache before it is explicitly requested. This reduces effective memory latency.
- Mermaid Diagram:
```mermaid
flowchart TD
    A[CPU sends Memory Request Addr_X] --> B{Primary Chiplet Receives}
    B --> C[Passive Crosslink Controller]
    C --> D{AI Predictive Model}
    D -- "Predicts next request is Addr_Y on Chiplet 3" --> E[Issue Speculative Prefetch for Addr_Y]
    C -- "Request Addr_X is on Chiplet 2" --> F[Route Request for Addr_X]
    E --> G(Crosslink)
    F --> G
    G --> H[Chiplet 2]
    G --> I[Chiplet 3]
    H --> J[Return Data_X]
    I -- "Cache Data_Y for future use" --> K(LLC on Chiplet 3)
    J --> C
    C --> L[Return Data_X to CPU]
```
4.2. Derivative: IoT-Monitored Thermal-Aware Routing
- Enabling Description: Each GPU chiplet integrates a grid of thermal and voltage sensors (IoT sensors). These sensors provide a real-time thermal map of each die to the primary chiplet's crosslink controller. When routing a memory access request, the controller consults this live thermal data. If the target caching chiplet is approaching a thermal throttle point, the controller can temporarily offload a portion of its cache lines to a cooler, neighboring chiplet, and redirect the memory request accordingly. This dynamic thermal management balances performance and system longevity.
- Mermaid Diagram:
```mermaid
stateDiagram-v2
    [*] --> Idle
    Idle --> Routing: Memory Request
    Routing: Entry / Consult Thermal Map
    Routing --> Normal_Path: Target Chiplet is Cool
    Routing --> Reroute_Path: Target Chiplet is Hot
    Normal_Path --> Serviced
    Reroute_Path: Action / Migrate hot cache lines
    Reroute_Path --> Serviced: Route to alternate chiplet
    Serviced --> [*]
    state Routing {
        direction LR
        Thermal_OK: check_temp() < threshold
        Thermal_Hot: check_temp() >= threshold
        [*] --> Thermal_OK
        [*] --> Thermal_Hot
    }
```
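The thermal-aware redirect in 4.2 can be sketched as a routing decision over live sensor readings. The threshold value, the neighbor table, and the coolest-neighbor policy are illustrative assumptions.

```python
# Sketch of the 4.2 routing decision: service the request on the target
# chiplet unless its reported temperature exceeds the throttle point, in
# which case redirect to the coolest neighboring chiplet.

THROTTLE_C = 95.0   # assumed throttle threshold in degrees Celsius

def route_target(target: int, temps: dict[int, float],
                 neighbors: dict[int, list[int]]) -> int:
    """Return the chiplet that should service the request."""
    if temps[target] < THROTTLE_C:
        return target
    # Target is hot: migrate/redirect to its coolest neighbor.
    return min(neighbors[target], key=lambda c: temps[c])
```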
Axis 5: The "Inverse" or Failure Mode
5.1. Derivative: Graceful Degradation via Crosslink Fusing
- Enabling Description: The passive crosslink incorporates electronically fusible links on the communication traces connected to each chiplet. A built-in self-test (BIST) routine runs at boot time. If the BIST identifies a faulty GPU chiplet, the primary chiplet sends a high-voltage signal to the fusible links associated with that chiplet, physically and permanently disconnecting it from the crosslink. The system then boots in a degraded mode with fewer active chiplets, remapping the memory address space across the remaining functional units. This provides high system reliability for mission-critical applications.
- Mermaid Diagram:
```mermaid
flowchart TD
    Start --> A{Power-On Self Test}
    A -- All Chiplets OK --> B[Normal Operation]
    A -- Chiplet 3 FAILS --> C{Primary Chiplet Controller}
    C --> D[Send high-voltage signal to fuses for Chiplet 3]
    D --> E[Chiplet 3 Electrically Isolated]
    E --> F[Remap Address Space across remaining Chiplets]
    F --> G[Boot in Degraded Mode]
    B --> End
    G --> End
```
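The boot-time degradation flow in 5.1 can be sketched as a remapping step over BIST results. The function name, the dict-based BIST report, and the dense logical-to-physical remap policy are assumptions for illustration; the irreversible fuse blow is represented only by a comment.

```python
# Sketch of the 5.1 boot flow: chiplets failing BIST are fused off, and the
# address space is remapped densely across the surviving chiplets.

def boot(bist_results: dict[int, bool]) -> dict[int, int]:
    """Return a logical->physical chiplet map over chiplets that passed BIST."""
    survivors = sorted(c for c, ok in bist_results.items() if ok)
    if not survivors:
        raise RuntimeError("no functional chiplets: cannot boot")
    # In hardware, each failed chiplet's fusible links would be blown here
    # (a one-way, permanent disconnection from the crosslink).
    return {logical: phys for logical, phys in enumerate(survivors)}
```

With chiplet 1 failed, logical chiplet 1 is simply rehosted on the next surviving die, so software above the remap sees a contiguous, smaller array.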
Combination Prior Art Scenarios (Integration with Open Standards)
Scenario 1: Combination with RISC-V and TileLink
- Description: The system is implemented within a RISC-V based System-on-Chip (SoC). The "CPU" is a multi-core RISC-V processor cluster (e.g., using the BOOM core design). The "bus" connecting the CPU to the primary GPU chiplet is the open-source TileLink cache-coherent interconnect standard. The GPU chiplet array functions as a TileLink agent. Memory access requests are TileLink transactions. The primary chiplet's crosslink controller is responsible for translating TileLink requests into the internal protocol used across the passive crosslink, making the entire GPU chiplet array appear as a single, coherent TileLink peripheral.
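The translation step in Scenario 1 can be sketched as an address decode from an incoming TileLink-style request to a (chiplet, local address) pair for the internal crosslink protocol. The field names, stripe size, and chiplet count are assumptions for illustration and are not taken from the TileLink specification.

```python
# Sketch of the Scenario 1 crosslink controller's address decode: the address
# of a TileLink-style Get is mapped to the chiplet that services it plus a
# local offset, under an assumed stripe-interleaved layout.

CHIPLETS = 4
STRIPE = 4096   # assumed interleave granularity in bytes

def translate(addr: int, size: int) -> dict:
    """Decode a global address into an internal crosslink request."""
    stripe = addr // STRIPE
    return {
        "chiplet": stripe % CHIPLETS,                              # target die
        "local_addr": (stripe // CHIPLETS) * STRIPE + addr % STRIPE,
        "size": size,
    }
```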
Scenario 2: Combination with OpenCL and SPIR-V
- Description: The multi-chiplet GPU is exposed to software through an OpenCL 3.0-compliant driver. Applications written in OpenCL are compiled into the SPIR-V intermediate representation. The custom GPU driver contains a JIT compiler that translates SPIR-V into machine code native to the chiplet architecture. The driver is responsible for abstracting the distributed nature of the hardware. It manages buffer allocation across the memories of the different chiplets and translates global memory accesses into the appropriate primary chiplet requests as described in the '803 patent's method. To the application, the device appears as a single OpenCL compute device with a large, unified global memory.
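One way the Scenario 2 driver could preserve the single-device abstraction is by splitting an OpenCL NDRange across chiplets before JIT-compiled kernels are dispatched. The even partitioning policy and function name below are assumptions for illustration, not a description of any real driver.

```python
# Sketch of driver-side work partitioning for Scenario 2: a 1D global
# work-item range is split into contiguous (start, count) slices, one per
# chiplet, so the application still sees one OpenCL compute device.

def split_ndrange(global_size: int, n_chiplets: int) -> list[tuple[int, int]]:
    """Return (start, count) work-item slices, one per chiplet."""
    base, extra = divmod(global_size, n_chiplets)
    slices, start = [], 0
    for c in range(n_chiplets):
        count = base + (1 if c < extra else 0)  # spread the remainder evenly
        slices.append((start, count))
        start += count
    return slices
```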
Scenario 3: Combination with CHIPS Alliance Bunch of Wires (BoW)
- Description: The physical layer of the "passive crosslink" is a direct implementation of the Bunch of Wires (BoW) die-to-die interconnect specification from the CHIPS Alliance. The PHY regions on each GPU chiplet are BoW-compliant PHYs. The "passive crosslink" is an interposer with passive traces that conform to the BoW channel specifications for a given process node. The protocol for routing memory requests is layered on top of the BoW physical signaling standard, leveraging an open, industry-vetted standard for the chiplet-to-chiplet electrical interface.