Patent 8438120
Derivative works
Defensive disclosure: derivative variations of each claim designed to render future incremental improvements obvious or non-novel.
Defensive Disclosure and Prior Art Generation
Document ID: DPD-8438120-20260501
Publication Date: 2026-05-01
Relates To: U.S. Patent 8,438,120 ("Machine learning hyperparameter estimation")
Abstract: This document discloses a series of derivative methods, systems, and applications related to the core teachings of U.S. Patent 8,438,120. The purpose of this disclosure is to place into the public domain a comprehensive set of variations, extensions, and alternative embodiments of an elitist, sampling-based hyperparameter optimization algorithm, thereby rendering them obvious to a person having ordinary skill in the art. The following disclosures expand upon the core method of iteratively updating a target hyperparameter vector estimate by using a selected "best-so-far" vector from the present and all previous iterations.
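A minimal sketch of the iterative elitist loop summarized in the abstract, assuming a simple Gaussian sampler and an illustrative update rule that moves the target estimate toward the elite vector (the patent's exact weighting formula is not reproduced here):

```python
import random

def optimize(score, dim, iterations=50, n_samples=20, step=0.5, seed=0):
    """Elitist sampling loop: sample around v_t, score, keep the
    best-so-far E_t across all iterations, then nudge v_t toward E_t."""
    rng = random.Random(seed)
    v = [rng.random() for _ in range(dim)]      # target hyperparameter estimate v_t
    elite, elite_score = list(v), score(v)      # best-so-far E_t and S(E_t)
    for _ in range(iterations):
        for _ in range(n_samples):
            x = [vi + rng.gauss(0, step) for vi in v]
            sx = score(x)
            if sx > elite_score:                # elitist: compare against all prior iterations
                elite, elite_score = x, sx
        # illustrative update: move v_t toward the elite vector
        v = [vi + step * (ei - vi) for vi, ei in zip(v, elite)]
    return elite, elite_score
```

Running it on a toy objective (e.g., the negative squared distance to a known optimum) shows the best-so-far score improving monotonically, which is the defining property of the elitist mechanism.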
A. Derivative Variations on the Core Method
A1. Material & Component Substitution
1. Quasi-Random Sampling for Search Space Exploration
- Enabling Description: The core method's reliance on pseudo-random sampling (drawing a random sample) can lead to clustering and non-uniform coverage of the hyperparameter search space. This variation replaces the pseudo-random number generator (PRNG) with a deterministic, low-discrepancy sequence generator, specifically a Sobol sequence or Halton sequence. For a `d`-dimensional hyperparameter space, the `i`-th sample vector `X_t_i` in iteration `t` is generated using the `i`-th point from the `d`-dimensional Sobol sequence. This ensures a more systematic and uniform exploration of the search space, which is particularly effective for high-dimensional hyperparameter vectors and can lead to faster convergence by avoiding redundant sampling in previously explored regions. The rest of the algorithm, including the selection of the elite vector `E_t` and the weighted update, remains unchanged.
- Mermaid Diagram:
```mermaid
graph TD
    A(Start Iteration t) --> B{Generate N Samples}
    B --> B1[Use Sobol Sequence Generator]
    B1 --> C{"Evaluate S(X_t_i) for all N samples"}
    C --> D{Identify Best Current Sample X_t_best}
    D --> E{"Compare S(X_t_best) with S(E_{t-1})"}
    E --> F{Select Global Best E_t}
    F --> G{Update Target Vector v_t using E_t}
    G --> H(End Iteration t)
```
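A self-contained Halton generator, a low-discrepancy cousin of the Sobol sequence (production code might instead use `scipy.stats.qmc`, which provides both), can stand in for the PRNG:

```python
PRIMES = (2, 3, 5, 7, 11, 13, 17, 19)  # one co-prime base per dimension

def halton(index, base):
    """Radical inverse of `index` in `base`: the index-th 1-D Halton value in [0, 1)."""
    f, r = 1.0, 0.0
    while index > 0:
        f /= base
        r += f * (index % base)
        index //= base
    return r

def halton_point(index, dim):
    """The index-th point of the dim-dimensional Halton sequence (index >= 1)."""
    return tuple(halton(index, PRIMES[d]) for d in range(dim))
```

Mapping `halton_point(i, d)` through the per-dimension bounds yields the sample vector `X_t_i`; the elitist selection and weighted update are unchanged.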
2. Alternative Weighting Functions Based on Non-Euclidean Metrics
- Enabling Description: The weighting function `W` in claim 10 is based on a normalized Euclidean distance. This variation substitutes the Euclidean metric with alternative distance or similarity functions to better handle different hyperparameter topologies.
  - Variant A (Manhattan Distance): For hyperparameters where dimensions are largely independent, the squared difference `(X_ij - E_j)^2` is replaced with the absolute difference `|X_ij - E_j|`. This L1-norm is less sensitive to large outliers in a single dimension.
  - Variant B (Cosine Similarity): For high-dimensional sparse vectors (e.g., tuning feature-selection hyperparameters), the weighting function `W` is defined as the cosine similarity between the sample vector `X_t_i` and the elite vector `E_t`. This measures the orientation rather than the magnitude of the vectors, focusing the search on vectors pointing in a similar direction to the best-so-far solution.
- Mermaid Diagram:
```mermaid
flowchart TD
    subgraph UpdateStep["Update Step for Target v_t"]
        direction LR
        Sample(Sample Vector X_t_i)
        Elite(Elite Vector E_t)
        WeightFunc{Weighting Function W}
        UpdateEq[Update v_t Formula]
        Sample -- Pass to --> WeightFunc
        Elite -- Pass to --> WeightFunc
        WeightFunc -- "W(X_t_i, E_t)" --> UpdateEq
    end
    subgraph Implementations["Weighting Function Implementations"]
        direction TB
        W1["Normalized Euclidean Distance (Claim 10)"]
        W2["Manhattan Distance (L1-Norm)"]
        W3[Cosine Similarity]
    end
    WeightFunc --- W1
    WeightFunc --- W2
    WeightFunc --- W3
```
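The three weighting choices can be sketched as follows; the `1 / (1 + d)` decay is an illustrative normalization, not the exact formula of claim 10:

```python
import math

def w_euclidean(x, e):
    """Euclidean-distance weighting: 1.0 at x == e, decaying with distance."""
    d = math.sqrt(sum((xi - ei) ** 2 for xi, ei in zip(x, e)))
    return 1.0 / (1.0 + d)

def w_manhattan(x, e):
    """Variant A: L1-norm, less sensitive to a single-dimension outlier."""
    d = sum(abs(xi - ei) for xi, ei in zip(x, e))
    return 1.0 / (1.0 + d)

def w_cosine(x, e):
    """Variant B: cosine similarity, measuring orientation rather than magnitude."""
    dot = sum(xi * ei for xi, ei in zip(x, e))
    nx = math.sqrt(sum(xi * xi for xi in x))
    ne = math.sqrt(sum(ei * ei for ei in e))
    return dot / (nx * ne) if nx and ne else 0.0
```

All three are drop-in replacements for `W` in the update step, since each maps a (sample, elite) pair to a scalar weight.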
3. Distributed State Management for Elite Vector
- Enabling Description: In a large-scale, distributed computing environment, storing the elite vector `E_t` on a single node creates a single point of failure. This variation implements the state management (storage of `E_t` and `v_t`) using a distributed in-memory data grid or key-value store such as Redis, Hazelcast, or Apache Ignite. The `E_t` vector and its performance score `S(E_t)` are stored as a key-value pair. Worker nodes performing the `S(X_t_i)` evaluation read the current `E_{t-1}` from the distributed store. The main controller process performs an atomic compare-and-swap (CAS) operation to update `E_t` only if a new sample `X_t_i` has a better score, `S(X_t_i) > S(E_{t-1})`. This ensures consistency and fault tolerance.
- Mermaid Diagram:
```mermaid
sequenceDiagram
    participant Controller
    participant WorkerNodes
    participant DistributedCache as Redis/Ignite
    Controller->>DistributedCache: Set E_0 (initial elite vector)
    loop Iteration t
        Controller->>WorkerNodes: Dispatch sample generation task
        WorkerNodes-->>WorkerNodes: Generate X_t_i, evaluate S(X_t_i)
        WorkerNodes->>DistributedCache: Read S(E_{t-1})
        alt S(X_t_i) > S(E_{t-1})
            WorkerNodes->>DistributedCache: Atomic UPDATE E_t = X_t_i
        end
        Controller->>DistributedCache: Read all X_t_i and final E_t
        Controller-->>Controller: Calculate and update v_t
    end
```
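A single-process sketch of the atomic best-so-far update, using a lock to stand in for the store's CAS primitive (Redis WATCH/MULTI transactions or a data grid's conditional replace would play this role in a real deployment):

```python
import threading

class EliteStore:
    """In-memory stand-in for a distributed key-value store holding
    (E_t, S(E_t)) with an atomic compare-and-swap update."""
    def __init__(self, elite, score):
        self._lock = threading.Lock()
        self._elite, self._score = elite, score

    def read(self):
        """Snapshot of the current (E_{t-1}, S(E_{t-1})) pair."""
        with self._lock:
            return self._elite, self._score

    def cas_update(self, candidate, candidate_score):
        """Install candidate as E_t only if it beats the stored score."""
        with self._lock:
            if candidate_score > self._score:
                self._elite, self._score = candidate, candidate_score
                return True
            return False
```

Because the check and the write happen under one critical section, two workers racing to publish improvements cannot overwrite a better elite with a worse one.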
A2. Operational Parameter Expansion
1. Industrial-Scale Optimization for Foundation Models
- Enabling Description: This disclosure describes the application of the method to tune the vast number of hyperparameters in a large language or vision foundation model (e.g., more than 100 billion parameters). The hyperparameter vector `X` includes not just scalar values like the learning rate but also architectural choices, such as the number of attention heads, layer dimensions, and activation functions, which are encoded numerically. The performance evaluation `S(X)` involves a partial training run of the massive model on a multi-petabyte dataset, executed on a cluster of thousands of TPUs or GPUs. The state `E_t` is managed via a distributed consensus protocol (e.g., Paxos) to ensure that all compute nodes agree on the current best-known configuration before a new evaluation run is initiated.
- Mermaid Diagram:
```mermaid
graph TD
    subgraph Control_Plane
        A[Optimizer Controller]
        B["State Store (Paxos/Raft)"]
        A -- Manages --> B
    end
    subgraph Data_Plane
        C1(TPU/GPU Pod 1)
        C2(TPU/GPU Pod 2)
        C3(...)
        C4(TPU/GPU Pod N)
        C1 -- "S(X) Evaluation" --> D{Partial Training Run}
        C2 -- "S(X) Evaluation" --> D
        C4 -- "S(X) Evaluation" --> D
    end
    A --> C1 & C2 & C4
    D -- Performance Score --> A
    A -- Update E_t --> B
    B -- Read E_{t-1} --> A
```
2. On-Device Tuning for TinyML Applications
- Enabling Description: The method is adapted for resource-constrained embedded systems and microcontrollers (MCUs). The algorithm is implemented using 8-bit or 16-bit integer arithmetic to reduce memory and power consumption. The random sampling is performed over a quantized and heavily constrained hyperparameter space. The performance function `S(X)` is the inference accuracy and latency measured directly on the MCU using a small, representative validation dataset stored in flash memory. This allows a device, such as a smart sensor, to self-tune its onboard anomaly-detection model in the field without requiring a connection to the cloud. The best-so-far vector `E_t` is persisted to non-volatile memory to survive power cycles.
- Mermaid Diagram:
```mermaid
stateDiagram-v2
    state "On-Device Optimizer" as Optimizer {
        [*] --> Idle
        Idle --> Sampling: Power On / Trigger
        Sampling: Generate quantized X_t_i
        Sampling --> Evaluating
        Evaluating: Run inference on local data
        Evaluating --> Updating: All samples evaluated
        Updating: Identify E_t, Update v_t
        Updating --> Idle: Iteration complete
        Updating --> Persist: Write E_t to NVM
        Persist --> Idle
    }
```
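A sketch of the 8-bit fixed-point representation, assuming a fixed scale of 127 mapping the parameter range [-1.0, 1.0] onto signed 8-bit integers (scale and range are illustrative assumptions):

```python
import random

SCALE = 127  # fixed-point scale: [-1.0, 1.0] maps onto the signed 8-bit range

def quantize(x):
    """Float -> int8, saturating at the type limits."""
    return max(-128, min(127, int(round(x * SCALE))))

def dequantize(q):
    """int8 -> float in roughly [-1.0, 1.0]."""
    return q / SCALE

def sample_quantized(v_q, spread_q, rng):
    """One sample vector drawn entirely in integer arithmetic around v_q."""
    return [max(-128, min(127, q + rng.randint(-spread_q, spread_q)))
            for q in v_q]
```

On an actual MCU the same logic would run in C with `int8_t`, but the arithmetic, saturation, and the roughly 0.8% quantization granularity are identical.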
A3. Cross-Domain Application
1. Aerospace: Adaptive GNC for Deep Space Probes
- Enabling Description: The method is used to perform in-flight optimization of a spacecraft's Guidance, Navigation, and Control (GNC) system. The hyperparameter vector `X` consists of PID controller gains, Kalman filter process-noise parameters (Q), and reaction-wheel control allocation parameters. The performance function `S(X)` is a multi-objective function evaluated in simulation onboard the spacecraft, rewarding low fuel consumption, high pointing accuracy, and minimal actuator stress. The elitist mechanism `E_t` ensures that a known, stable GNC configuration is always preserved, preventing the system from converging to an unsafe state while exploring new parameter sets to compensate for hardware degradation over a multi-year mission.
- Mermaid Diagram:
```mermaid
flowchart LR
    subgraph FlightComputer["On-Board Flight Computer"]
        GNC[GNC System]
        SIM[Physics Simulator]
        OPT["Optimizer (Method of '120)"]
        State[Telemetry Data]
        OPT -- Sample HP Vector X --> SIM
        SIM -- Simulated Performance --> OPT
        OPT -- Best HP Vector E_t --> GNC
        GNC -- Controls --> Actuators
        Actuators -- State --> State
        State -- Inputs --> GNC & SIM
    end
```
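One common way to realize such a multi-objective `S(X)` is a weighted-sum scalarization of the three simulated costs; the weights below are illustrative assumptions, not values from the patent:

```python
def gnc_score(fuel_used, pointing_error, actuator_stress,
              w_fuel=0.4, w_point=0.4, w_stress=0.2):
    """Weighted-sum scalarization: each argument is a non-negative cost
    from the onboard simulation, and a higher (less negative) score is better."""
    cost = (w_fuel * fuel_used
            + w_point * pointing_error
            + w_stress * actuator_stress)
    return -cost
```

Negating the combined cost keeps the convention used throughout this disclosure that the optimizer maximizes `S(X)`.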
2. AgTech: Real-Time Vision Model Tuning for Smart Harvesters
- Enabling Description: A smart harvester uses the method to continuously tune the hyperparameters of its onboard computer-vision model, which differentiates between ripe produce, unripe produce, and foreign objects. The hyperparameter vector `X` includes image-augmentation parameters (brightness and contrast ranges), confidence thresholds, and non-maximum suppression (NMS) thresholds. `S(X)` is the F1-score of the classifier, evaluated on a small, continuously updated dataset labeled by a human supervisor via a remote interface. The best-so-far vector `E_t` allows the harvester to maintain robust performance as environmental conditions such as sunlight, shadows, and humidity change throughout the day.
- Mermaid Diagram:
```mermaid
sequenceDiagram
    participant Supervisor
    participant HarvesterVisionSystem
    participant Optimizer
    loop Continuous Operation
        HarvesterVisionSystem->>Optimizer: Request new HPs
        Optimizer-->>HarvesterVisionSystem: Provide v_t
        HarvesterVisionSystem->>HarvesterVisionSystem: Classify produce using v_t
        Supervisor->>HarvesterVisionSystem: Provide corrections (labels)
        HarvesterVisionSystem->>Optimizer: Send new labeled data
        Optimizer->>Optimizer: Run one iteration, update E_t and v_t
    end
```
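The `S(X)` named above is the standard F1-score, computable directly from the supervisor's correction counts (true positives, false positives, false negatives):

```python
def f1_score(tp, fp, fn):
    """F1 = harmonic mean of precision and recall, used here as S(X)."""
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)
```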
3. Consumer Electronics: Personalized Active Noise Cancellation (ANC)
- Enabling Description: In high-end headphones, the method optimizes the coefficients of the adaptive filters used for active noise cancellation. The hyperparameter vector `X` defines parameters for the ANC algorithm, such as the filter order, the step size of the adaptive algorithm (e.g., LMS/NLMS), and leakage factors. The performance `S(X)` is a measure of the noise reduction achieved, calculated by comparing the signal from an internal microphone (inside the earcup) with the signal from an external microphone. This optimization runs in the background on the headphone's DSP, continuously adapting the ANC profile to the specific user's ear shape and ambient noise environment, preserving the best-found profile `E_t` as the user's personal default.
- Mermaid Diagram:
```mermaid
graph TD
    ExtMic[External Mic] --> DSP
    IntMic[Internal Mic] --> DSP
    Speaker --> IntMic
    DSP -- Controls --> Speaker
    subgraph DSP
        ANC[ANC Filter Algorithm]
        OPT["Optimizer (Method of '120)"]
        ANC -- Error Signal --> OPT
        OPT -- Updates HP Vector --> ANC
    end
```
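A single NLMS update step, sketched to make concrete which scalars (`mu`, the step size, and `leak`, the leakage factor) the optimizer would place in `X`; parameter names are illustrative:

```python
def nlms_step(weights, x, desired, mu=0.5, leak=0.0, eps=1e-8):
    """One normalized-LMS update of the adaptive-filter weights.

    x       -- reference samples (external mic), one per filter tap
    desired -- target sample (internal mic)
    Returns the updated weights and the instantaneous error.
    """
    y = sum(w * xi for w, xi in zip(weights, x))   # filter output
    err = desired - y                              # residual noise at the ear
    norm = sum(xi * xi for xi in x) + eps          # input-power normalization
    new_w = [(1.0 - leak) * w + mu * err * xi / norm
             for w, xi in zip(weights, x)]
    return new_w, err
```

Iterating this step drives the error toward zero for a stationary input; the hyperparameter search then tunes `mu` and `leak` for the user's actual acoustic environment.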
A4. Integration with Emerging Tech
1. AI-Driven Meta-Optimization
- Enabling Description: The optimization method itself is wrapped by a higher-level meta-learning agent, such as a reinforcement learning (RL) agent. The '120 method's own parameters (`N`, the sample size, and `ρ`, the elite fraction) are the "actions" the RL agent can take. The "state" is the convergence history of the hyperparameter search (e.g., the rate of improvement of `S(E_t)`). The "reward" is high for fast convergence and low for stagnation. The RL agent learns a policy to dynamically adjust `N` and `ρ` during the optimization run, effectively learning how best to run the search algorithm for a given class of problems.
- Mermaid Diagram:
```mermaid
flowchart TD
    subgraph MetaLearner["Meta-Learner (RL Agent)"]
        A["Observe State: Convergence Rate of S(E_t)"]
        B["Select Action: Adjust N, ρ"]
        C["Receive Reward: + for improvement, - for stagnation"]
        A --> B --> C --> A
    end
    subgraph HP_Optimizer["HP Optimizer ('120 Method)"]
        D["Run Iteration with current N, ρ"]
        E[Update E_t, v_t]
        D --> E
    end
    MetaLearner -- "Action: Set N, ρ" --> HP_Optimizer
    HP_Optimizer -- "State: S(E_t) history" --> MetaLearner
```
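A deliberately simple heuristic, shown only to make the action space (`N`, `ρ`) concrete; a trained RL policy would replace this rule, and the bounds and scaling factors here are assumptions:

```python
def adjust_params(n, rho, improvement,
                  n_bounds=(10, 500), rho_bounds=(0.05, 0.5)):
    """Stand-in policy: widen the search (more samples, smaller elite
    fraction) on stagnation; narrow it when S(E_t) is improving."""
    if improvement > 0:
        n = max(n_bounds[0], int(n * 0.9))      # exploit: fewer samples
        rho = min(rho_bounds[1], rho * 1.1)     # keep a broader elite set
    else:
        n = min(n_bounds[1], int(n * 1.5))      # explore: more samples
        rho = max(rho_bounds[0], rho * 0.8)     # focus on the very best
    return n, rho
```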
2. Blockchain-Audited Hyperparameter Search for Regulated AI
- Enabling Description: For AI models in regulated fields such as medicine or finance, this variation provides a tamper-proof audit trail of the model-tuning process. The hyperparameter optimization process is controlled by a decentralized application (DApp). In each iteration `t`, the controller hashes the set of sampled vectors `X_t` and their performance scores, along with the resulting `E_t` and `v_t`. This hash is stored on a public or private blockchain (e.g., Ethereum, Hyperledger Fabric) as part of a transaction. The full data is stored off-chain (e.g., in IPFS) and linked by the on-chain hash. This creates an immutable, time-stamped record, allowing a regulator to reconstruct and verify the entire optimization history that led to the final "optimal" hyperparameters.
- Mermaid Diagram:
```mermaid
sequenceDiagram
    participant Optimizer
    participant IPFS
    participant Blockchain
    Optimizer->>Optimizer: Run iteration t, get {X_t, S(X_t), E_t, v_t}
    Optimizer->>IPFS: Store Data_t = {X_t, S(X_t), E_t, v_t}
    IPFS-->>Optimizer: Return DataHash_t
    Optimizer->>Blockchain: Call SmartContract.recordIteration(t, DataHash_t)
    Blockchain-->>Optimizer: Transaction Confirmed
```
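The per-iteration hashing step can be sketched with a canonical JSON encoding and SHA-256; the record fields mirror the disclosure, while the function name is illustrative:

```python
import hashlib
import json

def iteration_digest(t, samples, scores, elite, target):
    """Deterministic SHA-256 digest of one iteration's record; this hex
    string is what the smart-contract call would anchor on-chain."""
    record = {"t": t, "samples": samples, "scores": scores,
              "elite": elite, "target": target}
    # sort_keys + fixed separators gives a canonical byte encoding,
    # so the same record always hashes to the same digest
    blob = json.dumps(record, sort_keys=True, separators=(",", ":"))
    return hashlib.sha256(blob.encode("utf-8")).hexdigest()
```

Because the encoding is canonical, a regulator who re-fetches the off-chain data from IPFS can recompute the digest and compare it byte-for-byte with the on-chain value.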
A5. The "Inverse" or Failure Mode
1. Graceful Degradation with Safe-Mode Reversion
- Enabling Description: To enhance robustness, the optimizer is augmented with a "safe mode" mechanism. A known-good, stable hyperparameter vector (`E_safe`) is pre-configured. The optimizer monitors the performance `S(E_t)`. If the performance drops below a critical threshold for `k` consecutive iterations (indicating instability or noisy evaluations), or if the optimization process fails to improve `S(E_t)` for a much larger number of iterations `M`, the system automatically discards the current state (`E_t`, `v_t`) and reverts to using `E_safe`. This prevents the system from deploying a poorly performing model discovered during an anomalous optimization period.
- Mermaid Diagram:
```mermaid
stateDiagram-v2
    state "Optimizing" as Optimizing
    state "Safe Mode" as SafeMode
    [*] --> Optimizing
    Optimizing --> Optimizing: S(E_t) improves or is stable
    Optimizing --> SafeMode: S(E_t) drops below threshold for k iterations
    SafeMode --> Optimizing: Manual Reset / Trigger
```
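The reversion logic can be sketched as a small monitor driven by the two trigger conditions named above (`k` consecutive sub-threshold iterations, or `M` iterations without improvement):

```python
class SafeModeMonitor:
    """Tracks S(E_t) and signals when the optimizer should discard
    (E_t, v_t) and revert to the pre-configured E_safe."""
    def __init__(self, threshold, k, m):
        self.threshold, self.k, self.m = threshold, k, m
        self.below = 0                 # consecutive iterations below threshold
        self.stagnant = 0              # iterations without improvement
        self.best = float("-inf")      # best S(E_t) seen so far

    def update(self, score):
        """Feed one iteration's S(E_t); returns True on revert."""
        self.below = self.below + 1 if score < self.threshold else 0
        if score > self.best:
            self.best, self.stagnant = score, 0
        else:
            self.stagnant += 1
        return self.below >= self.k or self.stagnant >= self.m
```

The caller reverts to `E_safe` (and typically resets the monitor) whenever `update` returns True.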
B. Combination Prior Art Scenarios
1. Integration with MLflow for MLOps Standardization
- Enabling Description: The hyperparameter optimization method is implemented as a Python class that integrates with the open-source MLflow platform. An `MLflowOptim` class is created. Upon initialization, it starts a parent MLflow run. In each iteration `t`, a nested run is created. Within the nested run, each sampled hyperparameter vector `X_t_i` is logged via `mlflow.log_params()`, and its performance `S(X_t_i)` is logged via `mlflow.log_metric()`. The best-so-far vector `E_t` is saved as a tagged artifact (e.g., a YAML file) at the end of each iteration's nested run. This allows a data scientist to use the standard MLflow UI to track, compare, and visualize the entire optimization history, comparing the efficacy of different weighting functions or sampling strategies.
2. Orchestration on Kubernetes for Scalable Execution
- Enabling Description: The method is containerized and orchestrated on the open-source Kubernetes platform for massive parallelism. A `CustomResourceDefinition` (CRD) for `HyperparameterSearch` is created. A user submits a YAML manifest defining the search space and the container image for the model evaluation function. A custom Kubernetes controller watches for these resources. For each iteration, the controller launches `N` Kubernetes Jobs, each responsible for evaluating one sample `X_t_i`. The results are written to a shared persistent volume. The controller pod reads the results, calculates `E_t` and `v_t`, and then launches the jobs for the next iteration. This architecture leverages Kubernetes's native scheduling, fault tolerance, and scalability for hyperparameter tuning.
3. Optimization of ONNX Models for Framework Agnosticism
- Enabling Description: The method is used to optimize models represented in the open ONNX (Open Neural Network Exchange) format. The hyperparameter vector `X` includes parameters that are not tied to a specific framework such as TensorFlow or PyTorch, for example the number of nodes in a specific `Gemm` (General Matrix Multiply) layer or the kernel shape in a `Conv` (Convolution) operator. The evaluation function `S(X)` takes a vector `X`, programmatically modifies a base ONNX model graph according to `X`, and then executes the resulting model using an ONNX-compliant runtime (e.g., ONNX Runtime). This decouples the optimization algorithm from the model-training framework, allowing the same process to tune models destined for diverse deployment environments.