Patent 8438120

Derivative works

Defensive disclosure: derivative variations of each claim designed to render future incremental improvements obvious or non-novel.

Defensive Disclosure and Prior Art Generation

Document ID: DPD-8438120-20260501
Publication Date: 2026-05-01
Relates To: U.S. Patent 8,438,120 ("Machine learning hyperparameter estimation")
Abstract: This document discloses a series of derivative methods, systems, and applications related to the core teachings of U.S. Patent 8,438,120. The purpose of this disclosure is to place into the public domain a comprehensive set of variations, extensions, and alternative embodiments of an elitist, sampling-based hyperparameter optimization algorithm, thereby rendering them obvious to a person having ordinary skill in the art. The following disclosures expand upon the core method of iteratively updating a target hyperparameter vector estimate by using a selected "best-so-far" vector from the present and all previous iterations.


A. Derivative Variations on the Core Method

A1. Material & Component Substitution

1. Quasi-Random Sampling for Search Space Exploration
  • Enabling Description: The core method's reliance on pseudo-random sampling (drawing a random sample) can lead to clustering and non-uniform coverage of the hyperparameter search space. This variation replaces the pseudo-random number generator (PRNG) with a deterministic, low-discrepancy sequence generator, specifically a Sobol sequence or Halton sequence. For a d-dimensional hyperparameter space, the i-th sample vector X_t_i in iteration t is generated using the i-th point from the d-dimensional Sobol sequence. This ensures a more systematic and uniform exploration of the search space, which is particularly effective for high-dimensional hyperparameter vectors and can lead to faster convergence by avoiding redundant sampling in previously explored regions. The rest of the algorithm, including the selection of the elite vector E_t and the weighted update, remains unchanged.
  • Mermaid Diagram (an illustrative code sketch follows the diagram):
    graph TD
        A(Start Iteration t) --> B{Generate N Samples};
        B --> B1[Use Sobol Sequence Generator];
        B1 --> C{"Evaluate S(X_t_i) for all N samples"};
        C --> D{Identify Best Current Sample X_t_best};
        D --> E{"Compare S(X_t_best) with S(E_{t-1})"};
        E --> F{Select Global Best E_t};
        F --> G{Update Target Vector v_t using E_t};
        G --> H(End Iteration t);
    
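  • Illustrative Code Sketch: a minimal sketch of the quasi-random sampling substitution, assuming a continuous search space bounded per dimension; the scipy-based generator and the helper names are illustrative assumptions, not the patented method itself:
    import numpy as np
    from scipy.stats import qmc  # low-discrepancy sequence generators

    def sobol_samples(lower, upper, n_samples, seed=0):
        """Draw n_samples quasi-random vectors covering the box [lower, upper]."""
        lower, upper = np.asarray(lower, float), np.asarray(upper, float)
        sampler = qmc.Sobol(d=lower.size, scramble=True, seed=seed)
        unit = sampler.random(n_samples)        # points in [0, 1)^d
        return qmc.scale(unit, lower, upper)    # rescale to the search box

    # Example: 8 candidate vectors for (learning_rate, momentum)
    X_t = sobol_samples(lower=[1e-4, 0.5], upper=[1e-1, 0.99], n_samples=8)
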
2. Alternative Weighting Functions Based on Non-Euclidean Metrics
  • Enabling Description: The weighting function W in claim 10 is based on a normalized Euclidean distance. This variation substitutes the Euclidean metric with alternative distance or similarity functions to better handle different hyperparameter topologies.
    • Variant A (Manhattan Distance): For hyperparameters where dimensions are largely independent, the squared difference (X_ij - E_j)^2 is replaced with the absolute difference |X_ij - E_j|. This L1-norm is less sensitive to large outliers in a single dimension.
    • Variant B (Cosine Similarity): For high-dimensional sparse vectors (e.g., tuning feature selection hyperparameters), the weighting function W is defined as the cosine similarity between the sample vector X_t_i and the elite vector E_t. This measures the orientation rather than the magnitude of the vectors, focusing the search on vectors pointing in a similar direction to the best-so-far solution.
  • Mermaid Diagram (an illustrative code sketch follows the diagram):
    flowchart TD
        subgraph Update Step for Target v_t
            direction LR
            Sample(Sample Vector X_t_i)
            Elite(Elite Vector E_t)
            WeightFunc{Weighting Function W}
            UpdateEq[Update v_t Formula]
    
            Sample -- Pass to --> WeightFunc
            Elite -- Pass to --> WeightFunc
            WeightFunc -- "W(X_t_i, E_t)" --> UpdateEq
        end
    
        subgraph Weighting Function Implementations
            direction TB
            W1["Normalized Euclidean Distance (Claim 10)"]
            W2["Manhattan Distance (L1-Norm)"]
            W3[Cosine Similarity]
        end
    
        WeightFunc --- W1
        WeightFunc --- W2
        WeightFunc --- W3
    
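  • Illustrative Code Sketch: a minimal sketch of the interchangeable weighting functions, assuming sample and elite vectors are NumPy arrays; the exponential form and the scale parameter are illustrative assumptions and do not reproduce the exact normalization of claim 10:
    import numpy as np

    def w_euclidean(x, e, scale=1.0):
        # Baseline: weight decays with Euclidean (L2) distance to the elite vector E_t.
        return np.exp(-np.linalg.norm(x - e) / scale)

    def w_manhattan(x, e, scale=1.0):
        # Variant A: L1 distance, less sensitive to a single outlying dimension.
        return np.exp(-np.sum(np.abs(x - e)) / scale)

    def w_cosine(x, e, eps=1e-12):
        # Variant B: orientation-only similarity for high-dimensional sparse vectors.
        return float(np.dot(x, e) / (np.linalg.norm(x) * np.linalg.norm(e) + eps))
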
3. Distributed State Management for Elite Vector
  • Enabling Description: In a large-scale, distributed computing environment, storing the elite vector E_t on a single node creates a single point of failure. This variation implements the state management (storage of E_t and v_t) using a distributed in-memory data grid or key-value store like Redis, Hazelcast, or Apache Ignite. The E_t vector and its performance score S(E_t) are stored as a key-value pair. Worker nodes performing the S(X_t_i) evaluation read the current E_{t-1} from the distributed store. The main controller process performs an atomic Compare-And-Swap (CAS) operation to update E_t only if a new sample X_t_i has a better score S(X_t_i) > S(E_{t-1}). This ensures consistency and fault tolerance.
  • Mermaid Diagram (an illustrative code sketch follows the diagram):
    sequenceDiagram
        participant Controller
        participant WorkerNodes
        participant DistributedCache as (Redis/Ignite)
    
        Controller->>DistributedCache: Set E_0 (initial elite vector)
        loop Iteration t
            Controller->>WorkerNodes: Dispatch Sample Generation Task
            WorkerNodes-->>WorkerNodes: Generate X_t_i, Evaluate S(X_t_i)
            WorkerNodes->>DistributedCache: Read S(E_{t-1})
            alt S(X_t_i) > S(E_{t-1})
                WorkerNodes->>DistributedCache: Atomic UPDATE E_t = X_t_i
            end
            Controller->>DistributedCache: Read all X_t_i and final E_t
            Controller-->>Controller: Calculate and update v_t
        end
    
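  • Illustrative Code Sketch: a minimal sketch of the optimistic compare-and-swap update against Redis using the redis-py client; the key name, the JSON encoding, and the assumption that the vector is a plain Python list are illustrative choices:
    import json
    import redis

    r = redis.Redis()  # connection parameters omitted

    def try_update_elite(x_vector, score, key="elite"):
        """Replace the stored elite vector only if the new score is strictly better."""
        with r.pipeline() as pipe:
            while True:
                try:
                    pipe.watch(key)                  # optimistic lock on the elite record
                    raw = pipe.get(key)
                    current = json.loads(raw) if raw else {"score": float("-inf")}
                    if score <= current["score"]:
                        pipe.unwatch()
                        return False                 # the existing elite is still better
                    pipe.multi()
                    pipe.set(key, json.dumps({"vector": x_vector, "score": score}))
                    pipe.execute()                   # raises WatchError if the key changed
                    return True
                except redis.WatchError:
                    continue                         # another worker won the race; retry
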

A2. Operational Parameter Expansion

1. Industrial-Scale Optimization for Foundation Models
  • Enabling Description: This disclosure describes the application of the method to tune the vast number of hyperparameters in a large language or vision foundation model (e.g., >100 billion parameters). The hyperparameter vector X includes not just scalar values like learning rate but also architectural choices, such as the number of attention heads, layer dimensions, and activation functions, which are encoded numerically. The performance evaluation S(X) involves a partial training run of the massive model on a multi-petabyte dataset, executed on a cluster of thousands of TPUs or GPUs. The state E_t is managed via a distributed consensus protocol (e.g., Paxos) to ensure that all compute nodes agree on the current best-known configuration before a new evaluation run is initiated.
  • Mermaid Diagram (an illustrative code sketch follows the diagram):
    graph TD
        subgraph Control_Plane
            A[Optimizer Controller]
            B["State Store (Paxos/Raft)"]
            A -- Manages --> B
        end
    
        subgraph Data_Plane
            C1(TPU/GPU Pod 1)
            C2(TPU/GPU Pod 2)
            C3(...)
            C4(TPU/GPU Pod N)
            C1 -- "S(X) Evaluation" --> D{Partial Training Run}
            C2 -- "S(X) Evaluation" --> D
            C4 -- "S(X) Evaluation" --> D
        end
        A --> C1 & C2 & C4
        D -- Performance Score --> A
        A -- Update E_t --> B
        B -- Read E_{t-1} --> A
    
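  • Illustrative Code Sketch: a minimal sketch of encoding mixed architectural and scalar choices into a numeric hyperparameter vector X; the specific fields, ranges, and activation list are illustrative assumptions:
    import numpy as np

    ACTIVATIONS = ["relu", "gelu", "swish"]            # categorical choice encoded by index

    def encode(config):
        """Map an architecture/optimizer config dict to a flat numeric vector X."""
        return np.array([
            np.log10(config["learning_rate"]),         # scalar searched in log space
            config["num_attention_heads"],             # integer-valued architectural choice
            config["hidden_dim"],
            ACTIVATIONS.index(config["activation"]),
        ], dtype=float)

    def decode(x):
        """Inverse mapping used before launching a partial training run."""
        return {
            "learning_rate": 10 ** x[0],
            "num_attention_heads": int(round(x[1])),
            "hidden_dim": int(round(x[2])),
            "activation": ACTIVATIONS[int(round(x[3])) % len(ACTIVATIONS)],
        }
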
2. On-Device Tuning for TinyML Applications
  • Enabling Description: The method is adapted for resource-constrained embedded systems and microcontrollers (MCUs). The algorithm is implemented using 8-bit or 16-bit integer arithmetic to reduce memory and power consumption. The random sampling is performed over a quantized and heavily constrained hyperparameter space. The performance function S(X) is the inference accuracy and latency measured directly on the MCU using a small, representative validation dataset stored in flash memory. This allows a device, such as a smart sensor, to self-tune its onboard anomaly detection model in the field without requiring a connection to the cloud. The "best-so-far" E_t vector is persisted to non-volatile memory to survive power cycles.
  • Mermaid Diagram (an illustrative code sketch follows the diagram):
    stateDiagram-v2
        state "On-Device Optimizer" as Optimizer {
            [*] --> Idle
            Idle --> Sampling: Power On / Trigger
            Sampling: Generate quantized X_t_i
            Sampling --> Evaluating
            Evaluating: Run inference on local data
            Evaluating --> Updating: All samples evaluated
            Updating: Identify E_t, Update v_t
            Updating --> Idle: Iteration complete
            Updating --> Persist: Write E_t to NVM
            Persist --> Idle
        }
    
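  • Illustrative Code Sketch: a minimal sketch of sampling an 8-bit quantized hyperparameter space; the parameter names, ranges, and linear dequantization are illustrative assumptions (an MCU port would use fixed-point C rather than NumPy):
    import numpy as np

    rng = np.random.default_rng(42)

    # Each hyperparameter is stored as one unsigned byte and mapped to a physical range.
    BOUNDS = {"detection_threshold": (0.0, 1.0), "window_length": (8, 256)}

    def sample_quantized(n):
        """Return n sample vectors, one uint8 code per hyperparameter."""
        return rng.integers(0, 256, size=(n, len(BOUNDS)), dtype=np.uint8)

    def dequantize(codes):
        """Map one vector of uint8 codes back to physical hyperparameter values."""
        return [lo + (hi - lo) * (int(c) / 255.0)
                for c, (lo, hi) in zip(codes, BOUNDS.values())]
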

A3. Cross-Domain Application

1. Aerospace: Adaptive GNC for Deep Space Probes
  • Enabling Description: The method is used to perform in-flight optimization of a spacecraft's Guidance, Navigation, and Control (GNC) system. The hyperparameter vector X consists of PID controller gains, Kalman filter process noise parameters (Q), and reaction wheel control allocation parameters. The performance function S(X) is a multi-objective function evaluated in simulation onboard the spacecraft, rewarding low fuel consumption, high pointing accuracy, and minimal actuator stress. The elitist mechanism E_t ensures that a known, stable GNC configuration is always preserved, preventing the system from converging to an unsafe state while exploring new parameter sets to compensate for hardware degradation over a multi-year mission.
  • Mermaid Diagram (an illustrative code sketch follows the diagram):
    flowchart LR
        subgraph On-Board Flight Computer
            GNC[GNC System]
            SIM[Physics Simulator]
            OPT["Optimizer (Method of '120)"]
            State[Telemetry Data]
    
            OPT -- Sample HP Vector X --> SIM
            SIM -- Simulated Performance --> OPT
            OPT -- Best HP Vector E_t --> GNC
            GNC -- Controls --> Actuators
            Actuators -- State --> State
            State -- Inputs --> GNC & SIM
        end
    
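  • Illustrative Code Sketch: a minimal sketch of the multi-objective performance function S(X), assuming the onboard simulator returns fuel use, pointing error, and actuator stress; the weights and field names are illustrative assumptions:
    def gnc_score(sim_result, w_fuel=0.4, w_pointing=0.4, w_actuator=0.2):
        """Higher is better: penalize fuel use, pointing error, and actuator stress."""
        return -(w_fuel * sim_result["fuel_kg"]
                 + w_pointing * sim_result["pointing_error_arcsec"]
                 + w_actuator * sim_result["actuator_stress_rms"])
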
2. AgTech: Real-Time Vision Model Tuning for Smart Harvesters
  • Enabling Description: A smart harvester uses the method to continuously tune the hyperparameters of its onboard computer vision model, which differentiates between ripe produce, unripe produce, and foreign objects. The hyperparameter vector X includes image augmentation parameters (brightness, contrast ranges), confidence thresholds, and non-maximum suppression (NMS) thresholds. S(X) is the F1-score of the classifier, evaluated on a small, continuously updated dataset labeled by a human supervisor via a remote interface. The "best-so-far" vector E_t allows the harvester to maintain robust performance as environmental conditions like sunlight, shadows, and humidity change throughout the day.
  • Mermaid Diagram (an illustrative code sketch follows the diagram):
    sequenceDiagram
        participant Supervisor
        participant HarvesterVisionSystem
        participant Optimizer
    
        loop Continuous Operation
            HarvesterVisionSystem->>Optimizer: Request new HPs
            Optimizer-->>HarvesterVisionSystem: Provide v_t
            HarvesterVisionSystem->>HarvesterVisionSystem: Classify produce using v_t
            Supervisor->>HarvesterVisionSystem: Provide corrections (labels)
            HarvesterVisionSystem->>Optimizer: Send new labeled data
            Optimizer->>Optimizer: Run one iteration, update E_t and v_t
        end
    
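  • Illustrative Code Sketch: a minimal sketch of the S(X) evaluation as a macro F1-score over the supervisor-labeled batch; the vision-model interface (set_thresholds, classify) is an illustrative assumption:
    from sklearn.metrics import f1_score

    def evaluate(model, x, images, labels):
        """Apply candidate thresholds from x and score them on labeled field data."""
        model.set_thresholds(confidence=x[0], nms=x[1])   # assumed model API
        predictions = [model.classify(img) for img in images]
        return f1_score(labels, predictions, average="macro")
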
3. Consumer Electronics: Personalized Active Noise Cancellation (ANC)
  • Enabling Description: In high-end headphones, the method optimizes the coefficients of the adaptive filters used for active noise cancellation. The hyperparameter vector X defines parameters for the ANC algorithm, such as the filter order, step size of the adaptive algorithm (e.g., LMS/NLMS), and leakage factors. The performance S(X) is a measure of the noise reduction achieved, calculated by comparing the signal from an internal microphone (inside the earcup) with the signal from an external microphone. This optimization runs in the background on the headphone's DSP, continuously adapting the ANC profile to the specific user's ear shape and the ambient noise environment, preserving the best-found profile E_t as the user's personal default.
  • Mermaid Diagram (an illustrative code sketch follows the diagram):
    graph TD
        ExtMic[External Mic] --> DSP
        IntMic[Internal Mic] --> DSP
        Speaker --> IntMic
        DSP -- Controls --> Speaker
        subgraph DSP
            ANC[ANC Filter Algorithm]
            OPT["Optimizer (Method of '120)"]
            ANC -- Error Signal --> OPT
            OPT -- Updates HP Vector --> ANC
        end
    
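  • Illustrative Code Sketch: a minimal sketch of a leaky NLMS update whose step size, leakage factor, and filter order are the kinds of quantities carried in the hyperparameter vector X; this is a generic adaptive-filter update, not the patented optimization method:
    import numpy as np

    def nlms_step(weights, x_ref, desired, mu=0.1, leak=1e-4, eps=1e-8):
        """One leaky NLMS update; x_ref is the reference-mic block, desired the target sample."""
        y = np.dot(weights, x_ref)               # anti-noise filter output
        e = desired - y                           # residual noise at the internal mic
        weights = (1.0 - mu * leak) * weights + mu * e * x_ref / (eps + np.dot(x_ref, x_ref))
        return weights, e
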

A4. Integration with Emerging Tech

1. AI-Driven Meta-Optimization
  • Enabling Description: The optimization method itself is wrapped by a higher-level meta-learning agent, such as a reinforcement learning (RL) agent. The '120 method's own parameters (N - sample size, ρ - elite fraction) are the "actions" that the RL agent can take. The "state" is the convergence history of the hyperparameter search (e.g., the rate of improvement of S(E_t)). The "reward" is high for fast convergence and low for stagnation. The RL agent learns a policy to dynamically adjust N and ρ during the optimization run, effectively learning how to best run the search algorithm for a given class of problems.
  • Mermaid Diagram (an illustrative code sketch follows the diagram):
    flowchart TD
        subgraph MetaLearner["Meta-Learner (RL Agent)"]
            A["Observe State: Convergence Rate of S(E_t)"]
            B["Select Action: Adjust N, ρ"]
            C["Receive Reward: + for improvement, - for stagnation"]
            A --> B --> C --> A
        end
        subgraph HP_Optimizer["HP Optimizer ('120 Method)"]
            D["Run Iteration with current N, ρ"]
            E["Update E_t, v_t"]
            D --> E
        end
        MetaLearner -- "Action: Set N, ρ" --> HP_Optimizer
        HP_Optimizer -- "State: S(E_t) history" --> MetaLearner
    
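  • Illustrative Code Sketch: a minimal sketch of a meta-controller adapting N and ρ from the observed history of S(E_t); the rule shown is a simple heuristic stand-in for a learned RL policy, and the bounds are illustrative assumptions:
    def adjust_meta_parameters(n_samples, rho, elite_scores, patience=3):
        """Grow the search when S(E_t) stagnates; shrink it while progress is fast."""
        stagnating = (len(elite_scores) > patience
                      and elite_scores[-1] <= elite_scores[-1 - patience])
        if stagnating:
            n_samples = min(2 * n_samples, 1024)   # explore more broadly
            rho = min(0.5, rho * 1.5)
        else:
            n_samples = max(16, n_samples // 2)    # spend less per iteration
            rho = max(0.05, rho * 0.75)
        return n_samples, rho
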
2. Blockchain-Audited Hyperparameter Search for Regulated AI
  • Enabling Description: For AI models in regulated fields like medicine or finance, this variation provides a tamper-proof audit trail of the model tuning process. The hyperparameter optimization process is controlled by a decentralized application (DApp). In each iteration t, the controller hashes the set of sampled vectors X_t and their performance scores, along with the resulting E_t and v_t. This hash is stored on a public or private blockchain (e.g., Ethereum, Hyperledger Fabric) as part of a transaction. The full data is stored off-chain (e.g., in IPFS) and linked by the on-chain hash. This creates an immutable, time-stamped record, allowing a regulator to perfectly reconstruct and verify the entire optimization history that led to the final "optimal" hyperparameters.
  • Mermaid Diagram (an illustrative code sketch follows the diagram):
    sequenceDiagram
        participant Optimizer
        participant IPFS
        participant Blockchain
    
        Optimizer->>Optimizer: Run iteration t, get {X_t, S(X_t), E_t, v_t}
        Optimizer->>IPFS: Store Data_t = {X_t, S(X_t), E_t, v_t}
        IPFS-->>Optimizer: Return DataHash_t
        Optimizer->>Blockchain: Call SmartContract.recordIteration(t, DataHash_t)
        Blockchain-->>Optimizer: Transaction Confirmed
    
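  • Illustrative Code Sketch: a minimal sketch of producing the per-iteration content hash; ipfs_store and record_on_chain are hypothetical placeholders for the chosen IPFS client and smart-contract binding, and the data are assumed to be JSON-serializable:
    import hashlib
    import json

    def audit_iteration(t, samples, scores, elite, target, ipfs_store, record_on_chain):
        """Hash the iteration record, pin the full data off-chain, anchor the hash on-chain."""
        record = {"iteration": t, "samples": samples, "scores": scores,
                  "elite": elite, "target": target}
        payload = json.dumps(record, sort_keys=True).encode("utf-8")
        data_hash = hashlib.sha256(payload).hexdigest()
        ipfs_store(payload)               # hypothetical: persist the full record off-chain
        record_on_chain(t, data_hash)     # hypothetical: submit the anchoring transaction
        return data_hash
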

A5. The "Inverse" or Failure Mode

1. Graceful Degradation with Safe-Mode Reversion
  • Enabling Description: To enhance robustness, the optimizer is augmented with a "safe mode" mechanism. A known-good, stable hyperparameter vector (E_safe) is pre-configured. The optimizer monitors the performance S(E_t). If the performance drops below a critical threshold for k consecutive iterations (indicating instability or noisy evaluations), or if the optimization process fails to improve S(E_t) for a much larger number of iterations M, the system automatically discards the current state (E_t, v_t) and reverts to using E_safe. This prevents the system from deploying a poorly performing model discovered during an anomalous optimization period.
  • Mermaid Diagram (an illustrative code sketch follows the diagram):
    stateDiagram-v2
        state "Optimizing" as Optimizing
        state "Safe Mode" as SafeMode
        [*] --> Optimizing
        Optimizing --> Optimizing: S(E_t) improves or is stable
        Optimizing --> SafeMode: S(E_t) drops below threshold for k iterations
        SafeMode --> Optimizing: Manual Reset / Trigger
    
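  • Illustrative Code Sketch: a minimal sketch of the reversion test; critical_threshold, k, and M correspond to the quantities named in the description above:
    def should_revert(elite_scores, critical_threshold, k, M):
        """True if the optimizer should discard (E_t, v_t) and fall back to E_safe."""
        recent = elite_scores[-k:]
        below = len(recent) == k and all(s < critical_threshold for s in recent)
        stagnant = len(elite_scores) > M and elite_scores[-1] <= elite_scores[-1 - M]
        return below or stagnant
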

B. Combination Prior Art Scenarios

1. Integration with MLflow for MLOps Standardization
  • Enabling Description: The hyperparameter optimization method is implemented as a Python class that integrates with the open-source MLflow platform. An MLflowOptim class is created. Upon initialization, it starts a parent MLflow run. In each iteration t, a nested run is created. Within the nested run, each sampled hyperparameter vector X_t_i is logged via mlflow.log_params(), and its performance S(X_t_i) is logged via mlflow.log_metric(). The best-so-far vector E_t is saved as a tagged artifact (e.g., a YAML file) at the end of each iteration's nested run. This allows a data scientist to use the standard MLflow UI to track, compare, and visualize the entire optimization history, comparing the efficacy of different weighting functions or sampling strategies.
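  • Illustrative Code Sketch: a minimal sketch of the logging pattern, assuming sample_iteration() and evaluate() are provided elsewhere; the MLflow calls shown (start_run, log_params, log_metric, log_dict) are standard tracking APIs, while the run and artifact names are illustrative assumptions:
    import mlflow

    def run_search(sample_iteration, evaluate, n_iterations):
        with mlflow.start_run(run_name="hp-search"):                      # parent run
            elite, elite_score = None, float("-inf")
            for t in range(n_iterations):
                with mlflow.start_run(run_name=f"iteration-{t}", nested=True):
                    for i, x in enumerate(sample_iteration(t)):
                        mlflow.log_params({f"x{j}_sample{i}": v for j, v in enumerate(x)})
                        score = evaluate(x)
                        mlflow.log_metric(f"score_sample{i}", score)
                        if score > elite_score:
                            elite, elite_score = x, score
                    mlflow.log_dict({"elite": [float(v) for v in elite],
                                     "score": elite_score}, "elite_vector.yaml")
            return elite, elite_score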
2. Orchestration on Kubernetes for Scalable Execution
  • Enabling Description: The method is containerized and orchestrated on the open-source Kubernetes platform for massive parallelism. A CustomResourceDefinition (CRD) for HyperparameterSearch is created. A user submits a YAML manifest defining the search space and the container image for the model evaluation function. A custom Kubernetes controller watches for these resources. For each iteration, the controller launches N Kubernetes Jobs, each responsible for evaluating one sample X_t_i. The results are written to a shared persistent volume. The controller pod reads the results, calculates E_t and v_t, and then launches the jobs for the next iteration. This architecture leverages Kubernetes's native scheduling, fault tolerance, and scalability for hyperparameter tuning.
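  • Illustrative Code Sketch: a minimal sketch of launching one evaluation Job per sample with the official Kubernetes Python client; the image name, namespace, and the HP_VECTOR environment-variable contract are illustrative assumptions:
    import json
    from kubernetes import client, config

    def launch_evaluation_jobs(samples, iteration, image="registry.example/hp-eval:latest",
                               namespace="hp-search"):
        config.load_kube_config()            # or load_incluster_config() inside the cluster
        batch = client.BatchV1Api()
        for i, x in enumerate(samples):
            container = client.V1Container(
                name="evaluate",
                image=image,
                env=[client.V1EnvVar(name="HP_VECTOR", value=json.dumps(list(x)))],
            )
            job = client.V1Job(
                metadata=client.V1ObjectMeta(name=f"hp-eval-{iteration}-{i}"),
                spec=client.V1JobSpec(
                    template=client.V1PodTemplateSpec(
                        spec=client.V1PodSpec(containers=[container], restart_policy="Never"))),
            )
            batch.create_namespaced_job(namespace=namespace, body=job)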
3. Optimization of ONNX Models for Framework Agnosticism
  • Enabling Description: The method is used to optimize models represented in the open ONNX (Open Neural Network Exchange) format. The hyperparameter vector X includes parameters that are not tied to a specific framework like TensorFlow or PyTorch, such as the number of nodes in a specific Gemm (General Matrix Multiply) layer or the kernel shape in a Conv (Convolution) operator. The evaluation function S(X) takes a vector X, programmatically modifies a base ONNX model graph according to X, and then executes the resulting model using an ONNX-compliant runtime (e.g., ONNX Runtime). This decouples the optimization algorithm from the model training framework, allowing the same process to tune models destined for diverse deployment environments.
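  • Illustrative Code Sketch: a minimal sketch of the framework-agnostic evaluation path; modify_graph (which rewrites Gemm/Conv attributes according to X) and accuracy() are assumed to be supplied by the surrounding system:
    import onnx
    import onnxruntime as ort

    def evaluate_onnx(base_model_path, x, modify_graph, validation_batch, accuracy):
        """Apply hyperparameter vector x to the base ONNX graph and score the result."""
        model = onnx.load(base_model_path)
        model = modify_graph(model, x)               # assumed: edits the graph per x
        onnx.checker.check_model(model)              # validate the modified graph
        session = ort.InferenceSession(model.SerializeToString())
        input_name = session.get_inputs()[0].name
        outputs = session.run(None, {input_name: validation_batch})
        return accuracy(outputs[0])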
