Patent 8438120
Prior art
Earlier patents, publications, and products that may anticipate or render the claims unpatentable.
Prior Art Analysis for U.S. Patent 8,438,120
The following analysis details the most relevant prior art cited against U.S. Patent 8,438,120. The evaluation focuses on non-patent literature cited on the face of the patent, assessing its potential to anticipate the claims under 35 U.S.C. § 102. Anticipation requires a single prior art reference to disclose every element of a claimed invention.
The core invention of the '120 patent is an iterative method for determining machine learning hyperparameters. Its key feature, as recited in independent claim 1, is "selecting, from the random sample of hyperparameter vectors, a hyperparameter vector producing a best result in the present and any previous iterations," and using this "best-so-far" vector to update the distribution estimate for the next iteration. This introduction of "elitism," or memory, into a cross-entropy-like method for hyperparameter tuning is the central inventive concept.
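To make the claimed mechanism concrete, the iterative loop described above can be sketched in Python. This is an illustrative reconstruction from the claim language, not the patent's actual embodiment: the Gaussian sampling model, the smoothing factor `alpha`, and the variance-shrink rate are assumptions.

```python
import random

def elitist_ce_search(evaluate, mean, std, n_samples=20, n_iters=10, alpha=0.7):
    """Sketch of the claimed method: sample hyperparameter vectors from a
    parameterized distribution, keep the single best vector seen across ALL
    iterations, and use it to update the distribution estimate."""
    best_vec, best_score = None, float("-inf")
    for _ in range(n_iters):
        # Step 1: draw a random sample of hyperparameter vectors.
        samples = [[random.gauss(m, s) for m, s in zip(mean, std)]
                   for _ in range(n_samples)]
        # Step 2: select the vector producing the best result in the
        # present AND any previous iteration ("elitism" / memory).
        for vec in samples:
            score = evaluate(vec)
            if score > best_score:
                best_score, best_vec = score, vec
        # Step 3: shift the distribution estimate toward the best-so-far
        # vector and narrow the search around it (assumed update rule).
        mean = [alpha * b + (1 - alpha) * m for b, m in zip(best_vec, mean)]
        std = [s * 0.9 for s in std]
    return best_vec, best_score
```

Because the best-so-far vector can never be lost, the reported score is monotonically non-decreasing across iterations; that is the "memory facility" the patent contrasts with the baseline CE method.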
1. Mannor et al., "The cross entropy method for classification"
- Full Citation: Mannor, S., Peleg, D., & Rubinstein, R. (2005). The cross entropy method for classification. Proceedings of the 22nd International Conference on Machine Learning (ICML '05), 561-568.
- Publication Date: August 7, 2005.
- Brief Description: This paper applies the Cross-Entropy (CE) method to machine learning classification, specifically to search for the optimal set of support vectors (SVs) in a Support Vector Machine (SVM). The goal is to produce a classifier with similar performance to a standard SVM but with a much smaller number of support vectors (i.e., a sparser solution). The CE method is used to solve this combinatorial optimization problem.
- Anticipation Analysis (35 U.S.C. § 102):
- This reference is highly relevant but unlikely to anticipate the claims of the '120 patent.
- The '120 patent itself distinguishes its invention from this paper by stating that Mannor et al. use the CE algorithm to search the space of support vectors, while determining hyperparameter values (like the 'C' value in an SVM) through a "simple grid search". Research confirms this; the paper states, "The value of hyperparameter C for each algorithm was set as the minimizer of the errors on the test set," which is separate from the CE method applied to find the support vectors.
- Because Mannor et al. do not apply the iterative, random sampling CE method to the problem of determining hyperparameters, they do not teach a core element of claim 1. The reference applies a similar optimization technique to a different part of the machine learning problem (feature/data point selection, not control parameter tuning). Therefore, it does not anticipate claims 1, 12, 13, or 14, which are all predicated on a method for determining hyperparameters.
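For contrast, the "simple grid search" that Mannor et al. use to set the SVM hyperparameter C can be sketched as follows. The candidate grid and the scoring callback are illustrative assumptions, not values from the paper; the point is that every candidate is evaluated exhaustively, with no iterative resampling or distribution update.

```python
def grid_search_C(train_and_score, c_grid=(0.01, 0.1, 1.0, 10.0, 100.0)):
    """Sketch of a simple grid search over the SVM hyperparameter C:
    evaluate a fixed grid of candidate values and keep the best one."""
    best_c, best_score = None, float("-inf")
    for c in c_grid:
        score = train_and_score(c)  # e.g., held-out accuracy for this C
        if score > best_score:
            best_c, best_score = c, score
    return best_c, best_score
```

This is the structural difference the '120 patent relies on: the grid is fixed in advance, whereas the claimed method repeatedly draws new random samples from an updated distribution.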
2. De Boer et al., "A Tutorial on the Cross-Entropy Method"
- Full Citation: de Boer, P. T., Kroese, D. P., Mannor, S., & Rubinstein, R. Y. (2005). A Tutorial on the Cross-Entropy Method. Annals of Operations Research, 134(1), 19-67.
- Publication Date: February 2005.
- Brief Description: This paper is a comprehensive tutorial on the CE method, explaining its application to both rare-event simulation and combinatorial/continuous optimization. It details the standard two-step iterative process: (1) generate a random sample of solutions based on a parameterized probability distribution, and (2) update the distribution's parameters using a subset of the best-performing ("elite") samples from the current generation to steer subsequent sampling toward better regions of the search space.
- Anticipation Analysis (35 U.S.C. § 102):
- This reference is also highly relevant but unlikely to anticipate the claims.
- The standard CE method described in this tutorial updates its parameters based on the elite samples of the current iteration. The key inventive step in claim 1 of the '120 patent is the concept of "elitism," where the single best solution found across all iterations is preserved and used to guide the search. This "best-so-far" or "elitist" preservation is a common variant in evolutionary algorithms but is not inherent to the baseline CE method described by De Boer et al. The '120 patent explicitly proposes "to include a memory facility into the algorithm by preserving samples which produce a good result," suggesting this is an addition to the standard CE method.
- Because De Boer et al. describe a method that updates based on the current population's elite, not the single best historical performer, it does not disclose a key limitation of claim 1 and therefore does not anticipate the claims.
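The distinction can be seen in a sketch of the baseline CE update as the tutorial describes it: each iteration refits the sampling distribution to the elite subset of the current generation only, with no best-so-far memory carried between iterations. The Gaussian parameterization and the constants below are assumptions for illustration.

```python
import random

def standard_ce_optimize(evaluate, mean, std, n_samples=50, n_elite=5, n_iters=10):
    """Sketch of the baseline CE method: refit the distribution to the
    elite samples of the CURRENT iteration only (no historical memory)."""
    dims = len(mean)
    for _ in range(n_iters):
        # Step 1: generate a random sample from the current distribution.
        samples = [[random.gauss(m, s) for m, s in zip(mean, std)]
                   for _ in range(n_samples)]
        # Step 2: rank this generation and keep its top n_elite samples.
        elite = sorted(samples, key=evaluate, reverse=True)[:n_elite]
        # Refit mean and std to the elite of this iteration alone; a
        # previous iteration's best solution can be forgotten entirely.
        mean = [sum(v[d] for v in elite) / n_elite for d in range(dims)]
        std = [max(1e-6, (sum((v[d] - mean[d]) ** 2 for v in elite)
                          / n_elite) ** 0.5) for d in range(dims)]
    return mean
```

Note that nothing in this loop preserves the single best sample across iterations; that omission is precisely the limitation of claim 1 that the tutorial does not disclose.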
3. Raaijmakers, "Sentiment classification with interpolated information diffusion kernels"
- Full Citation: Raaijmakers, S. (2007). Sentiment classification with interpolated information diffusion kernels. Proceedings of the 1st International Workshop on Data Mining and Audience Intelligence for Advertising (ADKDD'07), 34-39.
- Publication Date: August 12, 2007.
- Brief Description: This paper, authored by the inventor of the '120 patent, presents a method for document sentiment classification using a specific type of machine learning kernel. The focus is on the application of information diffusion kernels to this task.
- Anticipation Analysis (35 U.S.C. § 102):
- This reference warrants scrutiny because it is the inventor's own work published near the priority date, but it is unlikely to anticipate the claims.
- The '120 patent claims priority to an application filed on April 25, 2007, well before March 16, 2013, so pre-AIA 35 U.S.C. § 102 governs rather than the AIA provision § 102(b)(1)(A). The paper was published in August 2007, after the claimed priority date; if that priority claim is valid, the paper does not qualify as prior art at all. Even measured against a later actual U.S. filing date, an inventor's own publication less than one year before filing is not a statutory bar under pre-AIA § 102(b).
- Even if it were considered prior art, a review of the paper shows its focus is on the classification method itself, not on a general method for hyperparameter optimization. It does not appear to explicitly describe the iterative, elitist, cross-entropy-based optimization method that is the subject of the '120 patent claims. Therefore, it does not anticipate the claims.
Generated 4/30/2026, 11:58:14 PM