Patent 8438120

Derivative works

Defensive disclosure: derivative variations of each claim designed to render future incremental improvements obvious or non-novel.

Defensive Disclosure and Prior Art Generation

Document ID: DPD-8438120-20260501
Publication Date: 2026-05-01
Relates To: U.S. Patent 8,438,120 ("Machine learning hyperparameter estimation")
Abstract: This document discloses a series of derivative methods, systems, and applications related to the core teachings of U.S. Patent 8,438,120. The purpose of this disclosure is to place into the public domain a comprehensive set of variations, extensions, and alternative embodiments of an elitist, sampling-based hyperparameter optimization algorithm, thereby rendering them obvious to a person having ordinary skill in the art. The following disclosures expand upon the core method of iteratively updating a target hyperparameter vector estimate by using a selected "best-so-far" vector from the present and all previous iterations.


A. Derivative Variations on the Core Method

A1. Material & Component Substitution

1. Quasi-Random Sampling for Search Space Exploration
  • Enabling Description: The core method's reliance on pseudo-random sampling (drawing a random sample) can lead to clustering and non-uniform coverage of the hyperparameter search space. This variation replaces the pseudo-random number generator (PRNG) with a deterministic, low-discrepancy sequence generator, specifically a Sobol sequence or Halton sequence. For a d-dimensional hyperparameter space, the i-th sample vector X_t_i in iteration t is generated using the i-th point from the d-dimensional Sobol sequence. This ensures a more systematic and uniform exploration of the search space, which is particularly effective for high-dimensional hyperparameter vectors and can lead to faster convergence by avoiding redundant sampling in previously explored regions. The rest of the algorithm, including the selection of the elite vector E_t and the weighted update, remains unchanged.
  • Mermaid Diagram (an illustrative code sketch follows the diagram):
    graph TD
        A(Start Iteration t) --> B{Generate N Samples};
        B --> B1[Use Sobol Sequence Generator];
        B1 --> C{"Evaluate S(X_t_i) for all N samples"};
        C --> D{Identify Best Current Sample X_t_best};
        D --> E{"Compare S(X_t_best) with S(E_{t-1})"};
        E --> F{Select Global Best E_t};
        F --> G{Update Target Vector v_t using E_t};
        G --> H(End Iteration t);
    
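  • Illustrative Code Sketch: a minimal sketch of the quasi-random sampling substitution, assuming a continuous search space bounded per dimension; the scipy-based generator and the helper names are illustrative assumptions, not the patented method itself:
    import numpy as np
    from scipy.stats import qmc  # low-discrepancy sequence generators

    def sobol_samples(lower, upper, n_samples, seed=0):
        """Draw n_samples quasi-random vectors covering the box [lower, upper]."""
        lower, upper = np.asarray(lower, float), np.asarray(upper, float)
        sampler = qmc.Sobol(d=lower.size, scramble=True, seed=seed)
        unit = sampler.random(n_samples)        # points in [0, 1)^d
        return qmc.scale(unit, lower, upper)    # rescale to the search box

    # Example: 8 candidate vectors for (learning_rate, momentum)
    X_t = sobol_samples(lower=[1e-4, 0.5], upper=[1e-1, 0.99], n_samples=8)
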
2. Alternative Weighting Functions Based on Non-Euclidean Metrics
  • Enabling Description: The weighting function W in claim 10 is based on a normalized Euclidean distance. This variation substitutes the Euclidean metric with alternative distance or similarity functions to better handle different hyperparameter topologies.
    • Variant A (Manhattan Distance): For hyperparameters where dimensions are largely independent, the squared difference (X_ij - E_j)^2 is replaced with the absolute difference |X_ij - E_j|. This L1-norm is less sensitive to large outliers in a single dimension.
    • Variant B (Cosine Similarity): For high-dimensional sparse vectors (e.g., tuning feature selection hyperparameters), the weighting function W is defined as the cosine similarity between the sample vector X_t_i and the elite vector E_t. This measures the orientation rather than the magnitude of the vectors, focusing the search on vectors pointing in a similar direction to the best-so-far solution.
  • Mermaid Diagram (an illustrative code sketch follows the diagram):
    flowchart TD
        subgraph Update Step for Target v_t
            direction LR
            Sample(Sample Vector X_t_i)
            Elite(Elite Vector E_t)
            WeightFunc{Weighting Function W}
            UpdateEq[Update v_t Formula]
    
            Sample -- Pass to --> WeightFunc
            Elite -- Pass to --> WeightFunc
            WeightFunc -- "W(X_t_i, E_t)" --> UpdateEq
        end
    
        subgraph Weighting Function Implementations
            direction TB
            W1["Normalized Euclidean Distance (Claim 10)"]
            W2["Manhattan Distance (L1-Norm)"]
            W3[Cosine Similarity]
        end
    
        WeightFunc --- W1
        WeightFunc --- W2
        WeightFunc --- W3
    
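  • Illustrative Code Sketch: a minimal sketch of the interchangeable weighting functions, assuming sample and elite vectors are NumPy arrays; the exponential form and the scale parameter are illustrative assumptions and do not reproduce the exact normalization of claim 10:
    import numpy as np

    def w_euclidean(x, e, scale=1.0):
        # Baseline: weight decays with Euclidean (L2) distance to the elite vector E_t.
        return np.exp(-np.linalg.norm(x - e) / scale)

    def w_manhattan(x, e, scale=1.0):
        # Variant A: L1 distance, less sensitive to a single outlying dimension.
        return np.exp(-np.sum(np.abs(x - e)) / scale)

    def w_cosine(x, e, eps=1e-12):
        # Variant B: orientation-only similarity for high-dimensional sparse vectors.
        return float(np.dot(x, e) / (np.linalg.norm(x) * np.linalg.norm(e) + eps))
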
3. Distributed State Management for Elite Vector
  • Enabling Description: In a large-scale, distributed computing environment, storing the elite vector E_t on a single node creates a single point of failure. This variation implements the state management (storage of E_t and v_t) using a distributed in-memory data grid or key-value store like Redis, Hazelcast, or Apache Ignite. The E_t vector and its performance score S(E_t) are stored as a key-value pair. Worker nodes performing the S(X_t_i) evaluation read the current E_{t-1} from the distributed store. The main controller process performs an atomic Compare-And-Swap (CAS) operation to update E_t only if a new sample X_t_i has a better score S(X_t_i) > S(E_{t-1}). This ensures consistency and fault tolerance.
  • Mermaid Diagram (an illustrative code sketch follows the diagram):
    sequenceDiagram
        participant Controller
        participant WorkerNodes
        participant DistributedCache as (Redis/Ignite)
    
        Controller->>DistributedCache: Set E_0 (initial elite vector)
        loop Iteration t
            Controller->>WorkerNodes: Dispatch Sample Generation Task
            WorkerNodes-->>WorkerNodes: Generate X_t_i, Evaluate S(X_t_i)
            WorkerNodes->>DistributedCache: Read S(E_{t-1})
            alt S(X_t_i) > S(E_{t-1})
                WorkerNodes->>DistributedCache: Atomic UPDATE E_t = X_t_i
            end
            Controller->>DistributedCache: Read all X_t_i and final E_t
            Controller-->>Controller: Calculate and update v_t
        end
    
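  • Illustrative Code Sketch: a minimal sketch of the optimistic compare-and-swap update against Redis using the redis-py client; the key name, the JSON encoding, and the assumption that the vector is a plain Python list are illustrative choices:
    import json
    import redis

    r = redis.Redis()  # connection parameters omitted

    def try_update_elite(x_vector, score, key="elite"):
        """Replace the stored elite vector only if the new score is strictly better."""
        with r.pipeline() as pipe:
            while True:
                try:
                    pipe.watch(key)                  # optimistic lock on the elite record
                    raw = pipe.get(key)
                    current = json.loads(raw) if raw else {"score": float("-inf")}
                    if score <= current["score"]:
                        pipe.unwatch()
                        return False                 # the existing elite is still better
                    pipe.multi()
                    pipe.set(key, json.dumps({"vector": x_vector, "score": score}))
                    pipe.execute()                   # raises WatchError if the key changed
                    return True
                except redis.WatchError:
                    continue                         # another worker won the race; retry
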

A2. Operational Parameter Expansion

1. Industrial-Scale Optimization for Foundation Models
  • Enabling Description: This disclosure describes the application of the method to tune the vast number of hyperparameters in a large language or vision foundation model (e.g., >100 billion parameters). The hyperparameter vector X includes not just scalar values like learning rate but also architectural choices, such as the number of attention heads, layer dimensions, and activation functions, which are encoded numerically. The performance evaluation S(X) involves a partial training run of the massive model on a multi-petabyte dataset, executed on a cluster of thousands of TPUs or GPUs. The state E_t is managed via a distributed consensus protocol (e.g., Paxos) to ensure that all compute nodes agree on the current best-known configuration before a new evaluation run is initiated.
  • Mermaid Diagram (an illustrative code sketch follows the diagram):
    graph TD
        subgraph Control_Plane
            A[Optimizer Controller]
            B["State Store (Paxos/Raft)"]
            A -- Manages --> B
        end
    
        subgraph Data_Plane
            C1(TPU/GPU Pod 1)
            C2(TPU/GPU Pod 2)
            C3(...)
            C4(TPU/GPU Pod N)
            C1 -- "S(X) Evaluation" --> D{Partial Training Run}
            C2 -- "S(X) Evaluation" --> D
            C4 -- "S(X) Evaluation" --> D
        end
        A --> C1 & C2 & C4
        D -- Performance Score --> A
        A -- Update E_t --> B
        B -- Read E_{t-1} --> A
    
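  • Illustrative Code Sketch: a minimal sketch of encoding mixed architectural and scalar choices into a numeric hyperparameter vector X; the specific fields, ranges, and activation list are illustrative assumptions:
    import numpy as np

    ACTIVATIONS = ["relu", "gelu", "swish"]            # categorical choice encoded by index

    def encode(config):
        """Map an architecture/optimizer config dict to a flat numeric vector X."""
        return np.array([
            np.log10(config["learning_rate"]),         # scalar searched in log space
            config["num_attention_heads"],             # integer-valued architectural choice
            config["hidden_dim"],
            ACTIVATIONS.index(config["activation"]),
        ], dtype=float)

    def decode(x):
        """Inverse mapping used before launching a partial training run."""
        return {
            "learning_rate": 10 ** x[0],
            "num_attention_heads": int(round(x[1])),
            "hidden_dim": int(round(x[2])),
            "activation": ACTIVATIONS[int(round(x[3])) % len(ACTIVATIONS)],
        }
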
2. On-Device Tuning for TinyML Applications
  • Enabling Description: The method is adapted for resource-constrained embedded systems and microcontrollers (MCUs). The algorithm is implemented using 8-bit or 16-bit integer arithmetic to reduce memory and power consumption. The random sampling is performed over a quantized and heavily constrained hyperparameter space. The performance function S(X) is the inference accuracy and latency measured directly on the MCU using a small, representative validation dataset stored in flash memory. This allows a device, such as a smart sensor, to self-tune its onboard anomaly detection model in the field without requiring a connection to the cloud. The "best-so-far" E_t vector is persisted to non-volatile memory to survive power cycles.
  • Mermaid Diagram (an illustrative code sketch follows the diagram):
    stateDiagram-v2
        state "On-Device Optimizer" as Optimizer {
            [*] --> Idle
            Idle --> Sampling: Power On / Trigger
            Sampling: Generate quantized X_t_i
            Sampling --> Evaluating
            Evaluating: Run inference on local data
            Evaluating --> Updating: All samples evaluated
            Updating: Identify E_t, Update v_t
            Updating --> Idle: Iteration complete
            Updating --> Persist: Write E_t to NVM
            Persist --> Idle
        }
    
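  • Illustrative Code Sketch: a minimal sketch of sampling an 8-bit quantized hyperparameter space; the parameter names, ranges, and linear dequantization are illustrative assumptions (an MCU port would use fixed-point C rather than NumPy):
    import numpy as np

    rng = np.random.default_rng(42)

    # Each hyperparameter is stored as one unsigned byte and mapped to a physical range.
    BOUNDS = {"detection_threshold": (0.0, 1.0), "window_length": (8, 256)}

    def sample_quantized(n):
        """Return n sample vectors, one uint8 code per hyperparameter."""
        return rng.integers(0, 256, size=(n, len(BOUNDS)), dtype=np.uint8)

    def dequantize(codes):
        """Map one vector of uint8 codes back to physical hyperparameter values."""
        return [lo + (hi - lo) * (int(c) / 255.0)
                for c, (lo, hi) in zip(codes, BOUNDS.values())]
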

A3. Cross-Domain Application

1. Aerospace: Adaptive GNC for Deep Space Probes
  • Enabling Description: The method is used to perform in-flight optimization of a spacecraft's Guidance, Navigation, and Control (GNC) system. The hyperparameter vector X consists of PID controller gains, Kalman filter process noise parameters (Q), and reaction wheel control allocation parameters. The performance function S(X) is a multi-objective function evaluated in simulation onboard the spacecraft, rewarding low fuel consumption, high pointing accuracy, and minimal actuator stress. The elitist mechanism E_t ensures that a known, stable GNC configuration is always preserved, preventing the system from converging to an unsafe state while exploring new parameter sets to compensate for hardware degradation over a multi-year mission.
  • Mermaid Diagram (an illustrative code sketch follows the diagram):
    flowchart LR
        subgraph On-Board Flight Computer
            GNC[GNC System]
            SIM[Physics Simulator]
            OPT["Optimizer (Method of '120)"]
            State[Telemetry Data]
    
            OPT -- Sample HP Vector X --> SIM
            SIM -- Simulated Performance --> OPT
            OPT -- Best HP Vector E_t --> GNC
            GNC -- Controls --> Actuators
            Actuators -- State --> State
            State -- Inputs --> GNC & SIM
        end
    
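  • Illustrative Code Sketch: a minimal sketch of the multi-objective performance function S(X), assuming the onboard simulator returns fuel use, pointing error, and actuator stress; the weights and field names are illustrative assumptions:
    def gnc_score(sim_result, w_fuel=0.4, w_pointing=0.4, w_actuator=0.2):
        """Higher is better: penalize fuel use, pointing error, and actuator stress."""
        return -(w_fuel * sim_result["fuel_kg"]
                 + w_pointing * sim_result["pointing_error_arcsec"]
                 + w_actuator * sim_result["actuator_stress_rms"])
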
2. AgTech: Real-Time Vision Model Tuning for Smart Harvesters
  • Enabling Description: A smart harvester uses the method to continuously tune the hyperparameters of its onboard computer vision model, which differentiates between ripe produce, unripe produce, and foreign objects. The hyperparameter vector X includes image augmentation parameters (brightness, contrast ranges), confidence thresholds, and non-maximum suppression (NMS) thresholds. S(X) is the F1-score of the classifier, evaluated on a small, continuously updated dataset labeled by a human supervisor via a remote interface. The "best-so-far" vector E_t allows the harvester to maintain robust performance as environmental conditions like sunlight, shadows, and humidity change throughout the day.
  • Mermaid Diagram (an illustrative code sketch follows the diagram):
    sequenceDiagram
        participant Supervisor
        participant HarvesterVisionSystem
        participant Optimizer
    
        loop Continuous Operation
            HarvesterVisionSystem->>Optimizer: Request new HPs
            Optimizer-->>HarvesterVisionSystem: Provide v_t
            HarvesterVisionSystem->>HarvesterVisionSystem: Classify produce using v_t
            Supervisor->>HarvesterVisionSystem: Provide corrections (labels)
            HarvesterVisionSystem->>Optimizer: Send new labeled data
            Optimizer->>Optimizer: Run one iteration, update E_t and v_t
        end
    
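  • Illustrative Code Sketch: a minimal sketch of the S(X) evaluation as a macro F1-score over the supervisor-labeled batch; the vision-model interface (set_thresholds, classify) is an illustrative assumption:
    from sklearn.metrics import f1_score

    def evaluate(model, x, images, labels):
        """Apply candidate thresholds from x and score them on labeled field data."""
        model.set_thresholds(confidence=x[0], nms=x[1])   # assumed model API
        predictions = [model.classify(img) for img in images]
        return f1_score(labels, predictions, average="macro")
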
3. Consumer Electronics: Personalized Active Noise Cancellation (ANC)
  • Enabling Description: In high-end headphones, the method optimizes the coefficients of the adaptive filters used for active noise cancellation. The hyperparameter vector X defines parameters for the ANC algorithm, such as the filter order, step size of the adaptive algorithm (e.g., LMS/NLMS), and leakage factors. The performance S(X) is a measure of the noise reduction achieved, calculated by comparing the signal from an internal microphone (inside the earcup) with the signal from an external microphone. This optimization runs in the background on the headphone's DSP, continuously adapting the ANC profile to the specific user's ear shape and the ambient noise environment, preserving the best-found profile E_t as the user's personal default.
  • Mermaid Diagram (an illustrative code sketch follows the diagram):
    graph TD
        ExtMic[External Mic] --> DSP
        IntMic[Internal Mic] --> DSP
        Speaker --> IntMic
        DSP -- Controls --> Speaker
        subgraph DSP
            ANC[ANC Filter Algorithm]
            OPT["Optimizer (Method of '120)"]
            ANC -- Error Signal --> OPT
            OPT -- Updates HP Vector --> ANC
        end
    
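  • Illustrative Code Sketch: a minimal sketch of a leaky NLMS update whose step size, leakage factor, and filter order are the kinds of quantities carried in the hyperparameter vector X; this is a generic adaptive-filter update, not the patented optimization method:
    import numpy as np

    def nlms_step(weights, x_ref, desired, mu=0.1, leak=1e-4, eps=1e-8):
        """One leaky NLMS update; x_ref is the reference-mic block, desired the target sample."""
        y = np.dot(weights, x_ref)               # anti-noise filter output
        e = desired - y                           # residual noise at the internal mic
        weights = (1.0 - mu * leak) * weights + mu * e * x_ref / (eps + np.dot(x_ref, x_ref))
        return weights, e
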

A4. Integration with Emerging Tech

1. AI-Driven Meta-Optimization
  • Enabling Description: The optimization method itself is wrapped by a higher-level meta-learning agent, such as a reinforcement learning (RL) agent. The '120 method's own parameters (N - sample size, ρ - elite fraction) are the "actions" that the RL agent can take. The "state" is the convergence history of the hyperparameter search (e.g., the rate of improvement of S(E_t)). The "reward" is high for fast convergence and low for stagnation. The RL agent learns a policy to dynamically adjust N and ρ during the optimization run, effectively learning how to best run the search algorithm for a given class of problems.
  • Mermaid Diagram (an illustrative code sketch follows the diagram):
    flowchart TD
        subgraph MetaLearner["Meta-Learner (RL Agent)"]
            A["Observe State: Convergence Rate of S(E_t)"]
            B["Select Action: Adjust N, ρ"]
            C["Receive Reward: + for improvement, - for stagnation"]
            A --> B --> C --> A
        end
        subgraph HP_Optimizer["HP Optimizer ('120 Method)"]
            D["Run Iteration with current N, ρ"]
            E["Update E_t, v_t"]
            D --> E
        end
        MetaLearner -- "Action: Set N, ρ" --> HP_Optimizer
        HP_Optimizer -- "State: S(E_t) history" --> MetaLearner
    
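  • Illustrative Code Sketch: a minimal sketch of a meta-controller adapting N and ρ from the observed history of S(E_t); the rule shown is a simple heuristic stand-in for a learned RL policy, and the bounds are illustrative assumptions:
    def adjust_meta_parameters(n_samples, rho, elite_scores, patience=3):
        """Grow the search when S(E_t) stagnates; shrink it while progress is fast."""
        stagnating = (len(elite_scores) > patience
                      and elite_scores[-1] <= elite_scores[-1 - patience])
        if stagnating:
            n_samples = min(2 * n_samples, 1024)   # explore more broadly
            rho = min(0.5, rho * 1.5)
        else:
            n_samples = max(16, n_samples // 2)    # spend less per iteration
            rho = max(0.05, rho * 0.75)
        return n_samples, rho
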
2. Blockchain-Audited Hyperparameter Search for Regulated AI
  • Enabling Description: For AI models in regulated fields like medicine or finance, this variation provides a tamper-proof audit trail of the model tuning process. The hyperparameter optimization process is controlled by a decentralized application (DApp). In each iteration t, the controller hashes the set of sampled vectors X_t and their performance scores, along with the resulting E_t and v_t. This hash is stored on a public or private blockchain (e.g., Ethereum, Hyperledger Fabric) as part of a transaction. The full data is stored off-chain (e.g., in IPFS) and linked by the on-chain hash. This creates an immutable, time-stamped record, allowing a regulator to perfectly reconstruct and verify the entire optimization history that led to the final "optimal" hyperparameters.
  • Mermaid Diagram (an illustrative code sketch follows the diagram):
    sequenceDiagram
        participant Optimizer
        participant IPFS
        participant Blockchain
    
        Optimizer->>Optimizer: Run iteration t, get {X_t, S(X_t), E_t, v_t}
        Optimizer->>IPFS: Store Data_t = {X_t, S(X_t), E_t, v_t}
        IPFS-->>Optimizer: Return DataHash_t
        Optimizer->>Blockchain: Call SmartContract.recordIteration(t, DataHash_t)
        Blockchain-->>Optimizer: Transaction Confirmed
    
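  • Illustrative Code Sketch: a minimal sketch of producing the per-iteration content hash; ipfs_store and record_on_chain are hypothetical placeholders for the chosen IPFS client and smart-contract binding, and the data are assumed to be JSON-serializable:
    import hashlib
    import json

    def audit_iteration(t, samples, scores, elite, target, ipfs_store, record_on_chain):
        """Hash the iteration record, pin the full data off-chain, anchor the hash on-chain."""
        record = {"iteration": t, "samples": samples, "scores": scores,
                  "elite": elite, "target": target}
        payload = json.dumps(record, sort_keys=True).encode("utf-8")
        data_hash = hashlib.sha256(payload).hexdigest()
        ipfs_store(payload)               # hypothetical: persist the full record off-chain
        record_on_chain(t, data_hash)     # hypothetical: submit the anchoring transaction
        return data_hash
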

A5. The "Inverse" or Failure Mode

1. Graceful Degradation with Safe-Mode Reversion
  • Enabling Description: To enhance robustness, the optimizer is augmented with a "safe mode" mechanism. A known-good, stable hyperparameter vector (E_safe) is pre-configured. The optimizer monitors the performance S(E_t). If the performance drops below a critical threshold for k consecutive iterations (indicating instability or noisy evaluations), or if the optimization process fails to improve S(E_t) for a much larger number of iterations M, the system automatically discards the current state (E_t, v_t) and reverts to using E_safe. This prevents the system from deploying a poorly performing model discovered during an anomalous optimization period.
  • Mermaid Diagram (an illustrative code sketch follows the diagram):
    stateDiagram-v2
        state "Optimizing" as Optimizing
        state "Safe Mode" as SafeMode
        [*] --> Optimizing
        Optimizing --> Optimizing: S(E_t) improves or is stable
        Optimizing --> SafeMode: S(E_t) drops below threshold for k iterations
        SafeMode --> Optimizing: Manual Reset / Trigger
    
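  • Illustrative Code Sketch: a minimal sketch of the reversion test; critical_threshold, k, and M correspond to the quantities named in the description above:
    def should_revert(elite_scores, critical_threshold, k, M):
        """True if the optimizer should discard (E_t, v_t) and fall back to E_safe."""
        recent = elite_scores[-k:]
        below = len(recent) == k and all(s < critical_threshold for s in recent)
        stagnant = len(elite_scores) > M and elite_scores[-1] <= elite_scores[-1 - M]
        return below or stagnant
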

B. Combination Prior Art Scenarios

1. Integration with MLflow for MLOps Standardization
  • Enabling Description: The hyperparameter optimization method is implemented as a Python class that integrates with the open-source MLflow platform. An MLflowOptim class is created. Upon initialization, it starts a parent MLflow run. In each iteration t, a nested run is created. Within the nested run, each sampled hyperparameter vector X_t_i is logged via mlflow.log_params(), and its performance S(X_t_i) is logged via mlflow.log_metric(). The best-so-far vector E_t is saved as a tagged artifact (e.g., a YAML file) at the end of each iteration's nested run. This allows a data scientist to use the standard MLflow UI to track, compare, and visualize the entire optimization history, comparing the efficacy of different weighting functions or sampling strategies.
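  • Illustrative Code Sketch: a minimal sketch of the logging pattern, assuming sample_iteration() and evaluate() are provided elsewhere; the MLflow calls shown (start_run, log_params, log_metric, log_dict) are standard tracking APIs, while the run and artifact names are illustrative assumptions:
    import mlflow

    def run_search(sample_iteration, evaluate, n_iterations):
        with mlflow.start_run(run_name="hp-search"):                      # parent run
            elite, elite_score = None, float("-inf")
            for t in range(n_iterations):
                with mlflow.start_run(run_name=f"iteration-{t}", nested=True):
                    for i, x in enumerate(sample_iteration(t)):
                        mlflow.log_params({f"x{j}_sample{i}": v for j, v in enumerate(x)})
                        score = evaluate(x)
                        mlflow.log_metric(f"score_sample{i}", score)
                        if score > elite_score:
                            elite, elite_score = x, score
                    mlflow.log_dict({"elite": [float(v) for v in elite],
                                     "score": elite_score}, "elite_vector.yaml")
            return elite, elite_score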
2. Orchestration on Kubernetes for Scalable Execution
  • Enabling Description: The method is containerized and orchestrated on the open-source Kubernetes platform for massive parallelism. A CustomResourceDefinition (CRD) for HyperparameterSearch is created. A user submits a YAML manifest defining the search space and the container image for the model evaluation function. A custom Kubernetes controller watches for these resources. For each iteration, the controller launches N Kubernetes Jobs, each responsible for evaluating one sample X_t_i. The results are written to a shared persistent volume. The controller pod reads the results, calculates E_t and v_t, and then launches the jobs for the next iteration. This architecture leverages Kubernetes's native scheduling, fault tolerance, and scalability for hyperparameter tuning.
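  • Illustrative Code Sketch: a minimal sketch of launching one evaluation Job per sample with the official Kubernetes Python client; the image name, namespace, and the HP_VECTOR environment-variable contract are illustrative assumptions:
    import json
    from kubernetes import client, config

    def launch_evaluation_jobs(samples, iteration, image="registry.example/hp-eval:latest",
                               namespace="hp-search"):
        config.load_kube_config()            # or load_incluster_config() inside the cluster
        batch = client.BatchV1Api()
        for i, x in enumerate(samples):
            container = client.V1Container(
                name="evaluate",
                image=image,
                env=[client.V1EnvVar(name="HP_VECTOR", value=json.dumps(list(x)))],
            )
            job = client.V1Job(
                metadata=client.V1ObjectMeta(name=f"hp-eval-{iteration}-{i}"),
                spec=client.V1JobSpec(
                    template=client.V1PodTemplateSpec(
                        spec=client.V1PodSpec(containers=[container], restart_policy="Never"))),
            )
            batch.create_namespaced_job(namespace=namespace, body=job)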
3. Optimization of ONNX Models for Framework Agnosticism
  • Enabling Description: The method is used to optimize models represented in the open ONNX (Open Neural Network Exchange) format. The hyperparameter vector X includes parameters that are not tied to a specific framework like TensorFlow or PyTorch, such as the number of nodes in a specific Gemm (General Matrix Multiply) layer or the kernel shape in a Conv (Convolution) operator. The evaluation function S(X) takes a vector X, programmatically modifies a base ONNX model graph according to X, and then executes the resulting model using an ONNX-compliant runtime (e.g., ONNX Runtime). This decouples the optimization algorithm from the model training framework, allowing the same process to tune models destined for diverse deployment environments.
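  • Illustrative Code Sketch: a minimal sketch of the framework-agnostic evaluation path; modify_graph (which rewrites Gemm/Conv attributes according to X) and accuracy() are assumed to be supplied by the surrounding system:
    import onnx
    import onnxruntime as ort

    def evaluate_onnx(base_model_path, x, modify_graph, validation_batch, accuracy):
        """Apply hyperparameter vector x to the base ONNX graph and score the result."""
        model = onnx.load(base_model_path)
        model = modify_graph(model, x)               # assumed: edits the graph per x
        onnx.checker.check_model(model)              # validate the modified graph
        session = ort.InferenceSession(model.SerializeToString())
        input_name = session.get_inputs()[0].name
        outputs = session.run(None, {input_name: validation_batch})
        return accuracy(outputs[0])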
