Patent 8307116
Derivative works
Defensive disclosure: derivative variations of each claim designed to render future incremental improvements obvious or non-novel.
This document serves as a defensive publication of technical concepts and variations that build upon, extend, or modify the architecture described in U.S. Patent No. 8,307,116. The purpose is to place these concepts in the public domain, thereby establishing them as prior art.
1. Material & Component Substitutions
1.1. Optical Waveguide Interconnect
- Enabling Description: The physical communication channels are implemented not as metallic wires, but as integrated silicon photonic waveguides. Each processing node (104) incorporates a micro-ring resonator modulator and a germanium photodetector for E-O (Electrical-to-Optical) and O-E (Optical-to-Electrical) conversion. Data is transmitted as wavelength-division multiplexed (WDM) light signals along the horizontal (505) and vertical (503) bus waveguides. The router (118) within each node acts as an optical add-drop multiplexer (OADM). This architecture significantly increases bandwidth and reduces RC delay and power consumption compared to copper interconnects, while maintaining the two-hop routing characteristic. The shared-medium nature is achieved by broadcasting a light signal on a specific wavelength across the waveguide, which can be tapped by any node on that bus.
graph TD
subgraph "Node A (Source)"
P1[Processor Core] --> E1[E-O Modulator]
end
subgraph "Node B (Intermediate)"
P2[Processor Core]
OADM2[Optical Add-Drop Mux]
E2[E-O Modulator]
D2[O-E Detector]
P2 --- D2
E2 --- OADM2
end
subgraph "Node C (Destination)"
P3[Processor Core]
D3[O-E Detector]
P3 --- D3
end
E1 -- Light Signal (λ1) --> WaveguideX[Horizontal Waveguide Bus]
WaveguideX --> OADM2
OADM2 -- Drop λ1 --> D3
OADM2 -- Pass-through --> WaveguideX
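As a concrete illustration of the routing discipline above, the following Python sketch computes the one- or two-hop path and a simple wavelength slot for a transfer over the WDM row/column waveguides. It is illustrative only; the bus names, wavelength-slot policy, and coordinate scheme are assumptions, not taken from the patent.

```python
# Hypothetical sketch of two-hop routing over WDM row/column waveguide buses.
# Bus names, the wavelength-slot policy, and coordinates are illustrative.

def plan_optical_route(src, dst, wavelengths_per_bus=8):
    """Return (bus, index, wavelength slot) hops for a transfer.

    src and dst are (row, col) tuples. Hop 1 modulates onto the source
    row waveguide and is dropped by the node at (src row, dst col), which
    acts as the optical add-drop multiplexer; hop 2 re-modulates onto the
    destination column waveguide.
    """
    (src_row, src_col), (dst_row, dst_col) = src, dst
    hops = []
    if src_col != dst_col:
        hops.append(("row_waveguide", src_row, dst_col % wavelengths_per_bus))
    if src_row != dst_row:
        hops.append(("col_waveguide", dst_col, dst_row % wavelengths_per_bus))
    return hops

if __name__ == "__main__":
    print(plan_optical_route((0, 1), (3, 2)))   # two hops via node (0, 2)
    print(plan_optical_route((0, 1), (0, 3)))   # same row: single hop
```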
1.2. Millimeter-Wave (mmWave) Wireless On-Chip Network
- Enabling Description: The chip substrate is engineered with integrated antennas and transceivers at each node, operating in the 60 GHz band. The horizontal and vertical communication channels are established as line-of-sight wireless links. Beamforming techniques are employed by the node's router/transceiver to direct transmissions along a specific row or column, effectively creating a "wireless bus." This allows multiple destination nodes along that axis to receive the broadcasted packet. This approach eliminates metal routing congestion, reduces crosstalk, and allows for dynamic reconfiguration of the network topology by altering beamforming patterns. The two-hop routing is preserved: one hop along the source node's row and a second hop along the destination's column.
graph TD
subgraph "Node (1,1)"
N11[CPU] -- Data --> T11(Tx/Rx)
end
subgraph "Node (1,4)"
N14[CPU]
T14(Tx/Rx)
end
subgraph "Node (4,4)"
N44[CPU]
T44(Tx/Rx)
end
T11 -- "Wireless Hop 1 (Row 1)" --o T14
T14 -- "Wireless Hop 2 (Column 4)" --> T44
T44 -- Data --> N44[CPU]
style N11 fill:#f9f,stroke:#333,stroke-width:2px
style N44 fill:#ccf,stroke:#333,stroke-width:2px
1.3. Memristive Crossbar for In-Memory Computing Nodes
- Enabling Description: The core logic of each node (104) is replaced or augmented with a memristive crossbar array, enabling processing-in-memory (PIM) capabilities. The multinodal array (102) thus becomes a network of PIM nodes. The shared communication channels (405) are used not only for data routing but also for distributing programming voltages and control signals to configure the memristive states for specific matrix-vector multiplication tasks, common in AI workloads. A data packet from a source node can contain both operands and control signals, which are broadcast along a row bus to configure a set of PIM nodes simultaneously for a parallel computation phase. The second hop can then be used to collect and aggregate results from a column of PIM nodes.
sequenceDiagram
participant Controller
participant Node_1_1 as Source PIM Node
participant Node_1_2 as Dest PIM Node
participant Node_3_2 as Dest PIM Node
Controller->>Node_1_1: Send Data & Opcode
Node_1_1->>Node_1_2: Hop 1 (Horizontal Bus): Broadcast Compute Task
Node_1_2->>Node_3_2: Hop 2 (Vertical Bus): Route intermediate result
Node_3_2-->>Controller: Final result
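A minimal Python sketch of the broadcast-then-reduce dataflow described above, assuming the accumulation dimension of the matrix is partitioned across the PIM nodes in one row. The partitioning and example values are illustrative, and the integer arithmetic stands in for the analog crossbar operation.

```python
# Illustrative only: each PIM node in a row stores a block of matrix columns.
# Hop 1 broadcasts the full input vector on the row bus; each node multiplies
# its block by its share of the vector. Hop 2 sums the partial results along
# the column bus.

def pim_row_compute(node_blocks, input_vector):
    """node_blocks: list of ((lo, hi), matrix_block) per PIM node."""
    partials = []
    for (lo, hi), block in node_blocks:
        x = input_vector[lo:hi]
        partials.append([sum(w * v for w, v in zip(row, x)) for row in block])
    return partials

def column_reduce(partials):
    """Element-wise accumulation of the partial products (hop 2)."""
    return [sum(vals) for vals in zip(*partials)]

# Two PIM nodes jointly hold the 2x2 matrix [[1, 2], [3, 4]].
blocks = [((0, 1), [[1], [3]]), ((1, 2), [[2], [4]])]
print(column_reduce(pim_row_compute(blocks, [5, 6])))   # -> [17, 39]
```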
2. Operational Parameter Expansion
2.1. Cryogenic Superconducting Interconnect for Quantum Controllers
- Enabling Description: The system is implemented using superconducting niobium or similar alloys, operating at cryogenic temperatures (e.g., < 4 Kelvin). The physical communication channels (303) are superconducting transmission lines with near-zero resistance, enabling extremely low-power and high-speed data transfer. The routing devices (118) are based on Josephson junctions, such as Rapid Single Flux Quantum (RSFQ) logic. This architecture is designed as a control plane for a large-scale quantum processor, where each "node" (104) is a classical control and measurement co-processor responsible for a small group of qubits. The two-hop latency guarantee is critical for distributing control pulses and reading out qubit states with minimal decoherence.
graph TD
subgraph "Cryostat (T < 4K)"
A["Control Node 1<br>(RSFQ Logic)"]
B["Control Node 2<br>(RSFQ Logic)"]
C["Control Node 3<br>(RSFQ Logic)"]
D["Control Node 4<br>(RSFQ Logic)"]
A -- Superconducting<br>Row Bus --> B
B -- Superconducting<br>Column Bus --> D
end
QPU[Quantum Processing Unit] <--> A
QPU <--> B
QPU <--> C
QPU <--> D
style QPU fill:#99f,stroke:#333,stroke-width:2px
2.2. Terabit-Scale Wafer-Scale Integration
- Enabling Description: The architecture is scaled up from a single chip to a full 300mm silicon wafer, creating a "wafer-scale engine." The array (102) consists of thousands of nodes (e.g., a 64x64 grid). The communication channels are implemented as on-wafer transmission lines. Due to the long distances, repeaters and re-timers are integrated directly into the bus structure at regular intervals. The number of rows of channels (303) per row of nodes remains equal to the number of nodes in that row (e.g., 64 channels per row of 64 nodes). This maintains the two-hop routing property even at this extreme scale, enabling ultra-low latency communication across the entire wafer for applications like large-scale AI model training or complex physics simulations.
graph LR
subgraph "Wafer-Scale System"
direction LR
subgraph "Row 1"
N11(Node 1,1) -- Bus1.1 --> N12(Node 1,2) -- Bus1.2 --> N1x["..."] -- Bus1.n --> N1n(Node 1,n)
end
subgraph "Row i"
Ni1(Node i,1) -- Busi.1 --> Ni2(Node i,2) -- Busi.2 --> Nix["..."] -- Busi.n --> Nin(Node i,n)
end
subgraph "Row m"
Nm1(Node m,1) -- Busm.1 --> Nm2(Node m,2) -- Busm.2 --> Nmx["..."] -- Busm.n --> Nmn(Node m,n)
end
end
Ni1 -- Vertical Bus j=1 --> Nm1
Ni2 -- Vertical Bus j=2 --> Nm2
Nin -- Vertical Bus j=n --> Nmn
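The sketch below, with an illustrative grid size, checks the scaling property claimed above: row-then-column routing keeps every node pair within two bus hops regardless of array dimensions, and the per-row channel count tracks the number of nodes in the row.

```python
# Illustrative check of the two-hop property at arbitrary grid sizes.
# The 8x8 demo grid is a small stand-in for a 64x64 wafer-scale array.

def hop_count(src, dst):
    """Row hop (if the columns differ) plus column hop (if the rows differ)."""
    (sr, sc), (dr, dc) = src, dst
    return int(sc != dc) + int(sr != dr)

def check_array(rows, cols):
    nodes = [(r, c) for r in range(rows) for c in range(cols)]
    assert all(hop_count(a, b) <= 2 for a in nodes for b in nodes)
    return cols    # channels per row of nodes, per the example above

print(check_array(8, 8))     # every pair reachable in at most two hops
```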
2.3. Radiation-Hardened Variant for Aerospace Applications
- Enabling Description: The system is fabricated using a Silicon-on-Insulator (SOI) process or with radiation-hardening by design (RHBD) techniques, making it resistant to single-event upsets (SEUs) and total ionizing dose (TID) effects in space environments. Each node's router (118) and processor (617) includes triple-modular redundancy (TMR) on its critical state machines and registers. The communication channels feature error-correcting codes (ECC) implemented in hardware. A failed node can be logically bypassed by its neighbors, and the core/node controller (116) can dynamically re-map tasks. The inherent path diversity of the grid, combined with the two-hop routing, allows for rapid rerouting around a damaged node or channel with minimal performance degradation.
graph TD
subgraph "Rad-Hard Node"
CPU1[Core A] -- Vote --> V(Voter)
CPU2[Core B] -- Vote --> V
CPU3[Core C] -- Vote --> V
V -- Corrected Output --> R[Router]
end
R -- "ECC-Protected Bus" --> NeighborNode[Neighbor Node]
3. Cross-Domain Applications
3.1. Automotive Sensor Fusion and ADAS Control
- Enabling Description: The multinodal array is implemented as the central processing unit for an Advanced Driver-Assistance System (ADAS). Each node (104) is a specialized processing core: some are DSPs for radar signal processing, others are GPU-like for camera image recognition, and others are control-oriented CPUs for decision-making. For example, Node_LIDAR processes raw point cloud data, Node_CAMERA processes video streams, and Node_RADAR processes object tracking. These sensor nodes broadcast their processed object lists onto their respective row buses. A central "Fusion & Planning" node can subscribe to these broadcasts. It receives data from the LIDAR node in one hop (on its column bus) and from the CAMERA node in two hops (row bus, then column bus). This guarantees low-latency data aggregation for critical path planning and emergency braking decisions.
sequenceDiagram
participant LIDAR_Node
participant RADAR_Node
participant CAMERA_Node
participant FUSION_Node
LIDAR_Node ->> Row_Bus_1: Broadcast(LidarObjects)
CAMERA_Node ->> Row_Bus_2: Broadcast(CameraObjects)
RADAR_Node ->> Row_Bus_1: Broadcast(RadarObjects)
Note over FUSION_Node: Receives broadcasts on Column_Bus_3
FUSION_Node ->> FUSION_Node: Process & Fuse Data
FUSION_Node ->> Actuator_Control: Issue Drive Commands
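A small Python sketch of the latency reasoning above; the per-hop latency figure, sensor placements, and fusion-node position are assumptions used only to show how the two-hop bound caps sensor-to-fusion delivery time.

```python
# Illustrative latency budget: with at most two bus hops per transfer, the
# staleness of any sensor's object list at the fusion node is bounded.
HOP_NS = 15                                     # assumed per-hop bus latency
SENSORS = {"lidar": (0, 0), "camera": (1, 4), "radar": (0, 3)}
FUSION = (3, 2)

def hops(src, dst):
    (sr, sc), (dr, dc) = src, dst
    return int(sc != dc) + int(sr != dr)

worst = max(hops(pos, FUSION) for pos in SENSORS.values()) * HOP_NS
print(f"worst-case sensor-to-fusion delivery: {worst} ns")   # 2 hops x 15 ns
```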
3.2. Distributed Genomics Sequencing Analysis Pipeline
- Enabling Description: The architecture is used to accelerate the Burrows-Wheeler Aligner (BWA) or similar sequence alignment algorithms. The reference genome is partitioned and distributed across the main memory (120) accessible by different node groups. Raw sequencing reads (data 202) are streamed into the array. Nodes in the first few rows perform initial quality control and k-mer counting. The results are passed via row/column buses to subsequent rows of nodes that perform alignment against their assigned genome partitions. The final row of nodes performs a "reduce" operation, aggregating alignment scores from all partitions to identify variants. The two-hop multicast capability is used to efficiently distribute a single read to all nodes responsible for a chromosome, drastically reducing data movement overhead.
flowchart TD
A[Raw DNA Reads] --> B(Node Group 1: Pre-processing & QC)
B -- Hop 1: Horizontal Broadcast --> C{Node Group 2: Parallel Alignment}
C -- Hop 2: Vertical Aggregation --> D(Node Group 3: Variant Calling)
D --> E[Final Alignment Map]
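A toy Python sketch of the distribution step above; the chromosome-to-node partition map and the scoring stub are illustrative placeholders for the index shards and the alignment kernel.

```python
# Hop 1: multicast one read to every node holding a shard of the target
# chromosome's index. Hop 2: reduce the per-shard scores on the column bus.

partition = {                          # chromosome -> alignment nodes (row, col)
    "chr1": [(2, 0), (2, 1), (2, 2)],
    "chr2": [(3, 0), (3, 1)],
}

def align_stub(read, node):
    """Placeholder score; a real node would run its alignment kernel."""
    return sum(ord(base) for base in read) % 50 + node[1]

def process_read(read, chrom):
    targets = partition[chrom]                        # row-bus multicast
    best = max(align_stub(read, node) for node in targets)
    return best                                       # column-bus reduction

print(process_read("ACGTACGT", "chr1"))
```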
3.3. Smart Fabric and Wearable Sensor Networks
- Enabling Description: The "nodes" are miniaturized, flexible silicon dies or other micro-controllers woven into a textile. The "physical communication channels" are conductive threads forming a grid pattern within the fabric. Each node could contain a sensor (e.g., temperature, strain, EKG). A master node (e.g., near the garment's power source) can query the entire sensor grid. For instance, to read all temperature sensors in a specific region, it sends a multicast request along a row bus (first hop). The relevant nodes reply on their respective column buses (second hop). This allows for rapid, low-power polling of a large-area sensor surface, making it suitable for medical monitoring garments or athletic performance tracking apparel.
graph TD
subgraph "Smart Fabric Grid"
direction TB
N11(T°) -- w11 --> N12(EKG) -- w12 --> N13(Acc)
N21(T°) -- w21 --> N22(EKG) -- w22 --> N23(Acc)
N31(T°) -- w31 --> N32(EKG) -- w32 --> N33(Acc)
N11 -- v11 --> N21 -- v21 --> N31
N12 -- v12 --> N22 -- v22 --> N32
N13 -- v13 --> N23 -- v23 --> N33
end
Master[Master Controller] -- Query --> N11
N11 -- Hop 1 (Row Bus) --> N12
N12 -- Hop 2 (Col Bus) --> Master
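A sketch of the polling exchange described above, with a made-up sensor map: the query fans out on one row bus and each matching node answers on its own column bus.

```python
# Illustrative grid of woven sensor nodes: (row, col) -> (type, last reading).
GRID = {
    (0, 0): ("temp", 36.5), (0, 1): ("ekg", 72), (0, 2): ("accel", 0.1),
    (1, 0): ("temp", 36.8), (1, 1): ("ekg", 70), (1, 2): ("accel", 0.3),
}

def poll_row(row, sensor_type):
    """Hop 1: multicast the query on the row bus; hop 2: matching nodes
    reply on their column buses back toward the master."""
    return {pos: value for pos, (kind, value) in GRID.items()
            if pos[0] == row and kind == sensor_type}

print(poll_row(0, "temp"))      # -> {(0, 0): 36.5}
```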
4. Integration with Emerging Technologies
4.1. AI-Driven Predictive Routing and Thermal Management
- Enabling Description: A lightweight neural network (NN) is implemented within the core/node controller (116) or distributed across dedicated subnodes (301). This NN monitors traffic patterns and thermal sensor data from each node (104). It predicts future congestion hotspots and thermal throttling events. Based on these predictions, it dynamically adjusts routing tables or channel priorities to proactively steer traffic away from congested or hot regions, even if it means occasionally taking a non-minimal (more than two hops) but faster path. This allows the network to adapt to workload phases (e.g., from compute-bound to memory-bound) and maintain higher overall system throughput.
graph TD
subgraph "Node Array"
N1(Node 1) -- Traffic/Temp Data --> C
N2(Node 2) -- Traffic/Temp Data --> C
N3(Node 3) -- Traffic/Temp Data --> C
Nx["..."] -- "..." --> C
end
subgraph "Controller"
C(Core Controller)
NN[AI Prediction Engine]
C -- Live Data --> NN
NN -- "Predictive Routing<br> & Power Gating" --> C
C -- "Update Routing Tables" --> R1(Router 1)
C -- "Update Routing Tables" --> R2(Router 2)
C -- "Update Routing Tables" --> R3(Router 3)
end
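A minimal stand-in for the prediction loop above: an exponentially weighted moving average (rather than a neural network) estimates per-node load, and the router picks whichever of the two minimal two-hop paths avoids a predicted hotspot. The threshold, smoothing factor, and names are assumptions.

```python
ALPHA = 0.2
predicted_load = {}            # node -> smoothed utilization/thermal estimate

def update(node, observed_util):
    prev = predicted_load.get(node, observed_util)
    predicted_load[node] = ALPHA * observed_util + (1 - ALPHA) * prev

def choose_route(src, dst, hot_threshold=0.8):
    """Both minimal routes take two hops; steer around a hot intermediate node."""
    (sr, sc), (dr, dc) = src, dst
    via_row_first = (sr, dc)   # row bus first, then destination column bus
    via_col_first = (dr, sc)   # column bus first, then destination row bus
    if predicted_load.get(via_row_first, 0.0) > hot_threshold:
        return ("col_first", via_col_first)
    return ("row_first", via_row_first)

update((0, 3), 0.95)                     # node (0, 3) is predicted hot
print(choose_route((0, 0), (2, 3)))      # -> ('col_first', (2, 0))
```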
4.2. IoT Sensor Hub with On-Chip Event Processing
- Enabling Description: The multinodal architecture is used in an edge IoT gateway. Each node is connected to an external sensor (e.g., camera, microphone, temperature sensor). The shared communication channels (405) allow for efficient event-driven communication. When a sensor node detects an event (e.g., motion detected by Node_A), it broadcasts an alert packet on its row bus. Other nodes on the same row can listen and correlate this event with their own data (e.g., Node_B hears a sound). A "fusion" node on a different row but same column can receive both alerts within two hops and trigger a higher-level action, such as transmitting a compressed video stream to the cloud. This avoids flooding a central processor with raw sensor data.
sequenceDiagram
participant Sensor_A as Node (1,1)
participant Sensor_B as Node (1,3)
participant Fusion_Node as Node (4,1)
participant Cloud
Sensor_A->>Row Bus 1: MOTION_DETECTED event
Sensor_B->>Row Bus 1: SOUND_DETECTED event
Fusion_Node->>Column Bus 1: Listen for events
Note right of Fusion_Node: Receives MOTION event
Fusion_Node->>Fusion_Node: Correlate events
Fusion_Node->>Cloud: Send High-Priority Alert
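A sketch of the fusion node's correlation step; the time window, event names, and trigger rule are assumptions chosen only to illustrate how alerts received over the two-hop fabric can be combined into a single high-level action.

```python
WINDOW_US = 500
events = []                              # (timestamp_us, kind, source node)

def on_alert(ts, kind, src):
    """Called when an alert arrives on the fusion node's column bus."""
    events.append((ts, kind, src))
    recent = {k for t, k, _ in events if ts - t <= WINDOW_US}
    if {"MOTION", "SOUND"} <= recent:
        return "SEND_HIGH_PRIORITY_ALERT"
    return "IDLE"

print(on_alert(100, "MOTION", (1, 1)))   # IDLE
print(on_alert(300, "SOUND", (1, 3)))    # correlated -> high-priority alert
```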
4.3. Hardware-Accelerated Distributed Ledger
- Enabling Description: The multinodal array functions as a dedicated hardware platform for a private blockchain or distributed ledger technology (DLT). Each node (104) acts as a validator, containing a hardware cryptographic engine. When a new transaction is introduced by one node, it uses the shared bus mechanism to multicast the transaction to all other nodes in its row (first hop). These nodes, in turn, forward it along their respective columns to reach every node in the array within two hops. This provides a highly efficient and low-latency gossip protocol for transaction dissemination. Consensus algorithms (e.g., a simplified pBFT) are then executed in hardware across the nodes, with voting messages also using the two-hop broadcast mechanism.
flowchart LR
subgraph Node_A
A[Propose Block]
end
subgraph Row_1_Nodes
B[Node B]
C[Node C]
end
subgraph Column_B_Nodes
D[Node D]
end
subgraph Column_C_Nodes
E[Node E]
end
A -- "Hop 1: Multicast on Row 1" --> B & C
B -- "Hop 2: Multicast on Col B" --> D
C -- "Hop 2: Multicast on Col C" --> E
5. The "Inverse" or Failure Mode
5.1. Graceful Degradation to Torus/Mesh Topology
- Enabling Description: The system is designed to handle permanent hardware faults in the shared row/column buses. The core controller (116) includes a built-in self-test (BIST) unit that detects faulty channels at boot-up. If a shared channel (e.g., shared_row_bus_3) is non-functional, the controller reconfigures the routers of the nodes along that row to use their direct point-to-point links (401, 407, 411) to neighbors. The routing algorithm is switched from the two-hop bus-based protocol to a standard dimension-ordered routing (DOR) for a mesh or torus. While this increases average latency beyond two hops, it allows the system to remain operational, albeit in a degraded-performance mode, providing high system availability.
stateDiagram-v2
state "Full Performance (2-Hop)" as S1
state "Degraded Mode (Mesh Routing)" as S2
[*] --> S1: Power-On Self-Test (POST) OK
S1 --> S2: Bus Failure Detected
S2 --> S1: System Reset / Repair
S1 --> [*]: Shutdown
S2 --> [*]: Shutdown
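A sketch of the fallback decision described above; the fault map and path representation are illustrative. When the needed row bus fails self-test, routing drops back to hop-by-hop dimension-ordered (X-then-Y) traversal of the point-to-point links.

```python
def mesh_path(src, dst):
    """Dimension-ordered routing over neighbor links (degraded mode)."""
    (r, c), (dst_r, dst_c) = src, dst
    path = [(r, c)]
    while c != dst_c:
        c += 1 if dst_c > c else -1
        path.append((r, c))
    while r != dst_r:
        r += 1 if dst_r > r else -1
        path.append((r, c))
    return path

def route(src, dst, faulty_row_buses=frozenset()):
    if src[0] not in faulty_row_buses:
        # Normal mode: row-bus hop to (src row, dst col), then column-bus hop.
        return ("two_hop_bus", [src, (src[0], dst[1]), dst])
    return ("mesh_dor", mesh_path(src, dst))

print(route((3, 0), (1, 4)))                             # bus mode, two hops
print(route((3, 0), (1, 4), faulty_row_buses={3}))       # degraded mesh path
```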
5.2. Power-Gated Sub-Array Operation
- Enabling Description: The multinodal array is divided into quadrants or "power domains." During periods of low activity, the core/node controller (116) can completely power-gate one or more of these quadrants to save static leakage power. The routing devices (118) at the boundaries of the active quadrants are aware of the powered-down regions. If a packet needs to be routed to a destination in a powered-down quadrant, it is either buffered until the quadrant is re-activated, or the routing protocol dynamically calculates a path around the inactive region using only active nodes and channels, effectively treating the powered-down section as a physical obstacle.
graph TD
subgraph Active_Quadrant_1
A1(Node) -- B1 --- B2(Node)
end
subgraph Active_Quadrant_2
A2(Node) -- B3 --- B4(Node)
end
subgraph "Powered-Down Quadrant 3"
style "Powered-Down Quadrant 3" fill:#ddd,stroke:#333,stroke-dasharray: 5 5
A3(...)
B5(...)
end
subgraph "Powered-Down Quadrant 4"
style "Powered-Down Quadrant 4" fill:#ddd,stroke:#333,stroke-dasharray: 5 5
A4(...)
B6(...)
end
A1 -- Active Link --> A2
A1 -. Rerouted Link .-> B4
A1 -. Inactive Link .-> A3
A2 -. Inactive Link .-> A4
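A sketch of the boundary-router policy described above: buffer a packet whose destination lies in a gated quadrant, otherwise forward normally. The quadrant map, array size, and buffering policy are assumptions.

```python
QUAD_SIZE = 4                            # an 8x8 array split into 2x2 quadrants
powered_off = {(1, 1)}                   # gated quadrant (row block, col block)

def quadrant(node):
    return (node[0] // QUAD_SIZE, node[1] // QUAD_SIZE)

def dispatch(packet, dst, hold_queue):
    """Boundary-router policy: buffer traffic bound for a gated quadrant."""
    if quadrant(dst) in powered_off:
        hold_queue.append((packet, dst))     # release when the quadrant wakes
        return "buffered"
    return "forwarded"

queue = []
print(dispatch("pkt-A", (6, 6), queue))      # destination quadrant is off
print(dispatch("pkt-B", (2, 3), queue))      # normal two-hop forwarding
```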
6. Combination Prior Art Scenarios
6.1. Combination with RISC-V ISA and TileLink Protocol
- Enabling Description: The processing nodes (104) are implemented as RISC-V CPU cores, utilizing the open-standard instruction set architecture. The network interface of each node is designed to be compliant with the open TileLink cache-coherency protocol. The two-hop, shared-bus architecture of the '116 patent is used as the physical transport layer for TileLink messages (e.g., Get, Grant, Probe, Release). A Probe message, which must be broadcast to all sharers of a cache line, can be efficiently implemented using a single row or column broadcast (one hop) to a subset of nodes, or a two-hop sequence for a full-chip broadcast, significantly outperforming a serialized point-to-point mesh for snoopy coherence traffic.
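A sketch of the probe fan-out decision described above; the sharer-tracking granularity and the plan format are assumptions for illustration, not details of the TileLink specification.

```python
def plan_probe(home, sharers):
    """Choose the broadcast plan for delivering a Probe to all sharers."""
    if not sharers:
        return []
    sharer_rows = {r for r, _ in sharers}
    if sharer_rows == {home[0]}:
        return [("row_broadcast", home[0])]              # one hop suffices
    # Otherwise a row broadcast plus column forwards: two hops in total.
    cols = sorted({c for _, c in sharers})
    return [("row_broadcast", home[0])] + [("col_broadcast", c) for c in cols]

print(plan_probe((1, 1), {(1, 3), (1, 5)}))              # single-hop plan
print(plan_probe((1, 1), {(0, 2), (3, 2), (2, 6)}))      # two-hop plan
```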
6.2. Combination with AMBA AXI4-Stream Protocol
- Enabling Description: The routers (118) and node processors (617) are designed with standard AMBA AXI4-Stream interfaces. This allows for the seamless integration of third-party IP cores (e.g., a hardware video encoder, a DMA engine) as subnodes (301) within a larger node. The '116 network fabric acts as a high-speed, low-latency AXI4-Stream switch. The TDEST signal in the AXI4-Stream protocol is used to encode the destination node's (X, Y) coordinates, and the TLAST signal indicates the end of a packet. The shared communication channels (507, 509) can directly transport AXI4-Stream packets, enabling a "plug-and-play" environment for diverse IP blocks on a single chip.
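A sketch of the coordinate packing described above; the 4-bit-per-axis field widths are assumptions, not values from the AMBA specification or the patent.

```python
X_BITS = 4                               # assumed field widths within TDEST
Y_BITS = 4

def encode_tdest(x, y):
    """Pack the destination node's (X, Y) grid coordinates into TDEST."""
    assert 0 <= x < (1 << X_BITS) and 0 <= y < (1 << Y_BITS)
    return (y << X_BITS) | x

def decode_tdest(tdest):
    return (tdest & ((1 << X_BITS) - 1), tdest >> X_BITS)

assert decode_tdest(encode_tdest(5, 12)) == (5, 12)
print(hex(encode_tdest(5, 12)))          # -> 0xc5
```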
6.3. Combination with DDS (Data Distribution Service) Standard
- Enabling Description: The on-chip network is configured to function as a hardware-accelerated middleware layer implementing the OMG Data Distribution Service (DDS) standard. Each processing node (104) runs a lightweight DDS participant. DDS "Topics" are mapped to multicast addresses on the network. When a node publishes data to a Topic, the node controller (116) translates this into a multicast packet (609) with the bitmask (609(2)) set to target all subscribing nodes. The two-hop architecture guarantees that data is delivered to all subscribers on-chip with predictable, low-latency timing, which is a key requirement for real-time systems that use DDS, such as robotics and industrial control. This offloads the DDS communication from software to a dedicated hardware fabric.
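A sketch of the topic-to-bitmask translation described above; the topic names, grid width, and bit ordering are illustrative, with the mask standing in for the multicast bitmask field (609(2)).

```python
GRID_COLS = 4
subscriptions = {                        # DDS topic -> subscribing nodes (row, col)
    "imu_samples": {(0, 1), (2, 3)},
    "motor_cmds": {(1, 0)},
}

def node_bit(row, col):
    return 1 << (row * GRID_COLS + col)

def multicast_mask(topic):
    """Bitmask of all on-chip subscribers for a published topic."""
    return sum(node_bit(r, c) for r, c in subscriptions.get(topic, ()))

print(bin(multicast_mask("imu_samples")))    # -> 0b100000000010
```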