Prior art — US Patent 11275900

Analysis of Prior Art for U.S. Patent 11,275,900

Washington D.C. - An analysis of the prior art cited against U.S. Patent 11,275,900, titled "Systems and methods for automatically assigning one or more labels to discussion topics shown in online forums on the dark web," identifies several published patent applications that are relevant to assessing novelty and non-obviousness. The patent, assigned to Skysong Innovations LLC, details a system that classifies dark web forum discussions using machine learning while addressing challenges such as class imbalance and the maintenance of tag hierarchies.

The core of the invention lies in its multi-step process: accessing a ground truth dataset from deep web forums with a predefined tag hierarchy, extracting features from new discussion topics, applying a machine classifier to generate a prediction list of tags with probabilities, and then refining this list based on the tag hierarchy and probability thresholds.
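
The final refinement step can be illustrated with a short sketch. It is not the patented method: the tag names, the PARENT mapping, the 0.5 threshold, and the rule of adding every ancestor of a retained tag are all assumptions chosen for illustration, since the text above says only that the list is refined using the tag hierarchy and probability thresholds.

```python
from typing import Dict, List, Optional

# Hypothetical tag hierarchy: child tag -> parent tag (None marks a root tag).
PARENT: Dict[str, Optional[str]] = {
    "security": None,
    "malware": "security",
    "ransomware": "malware",
    "marketplace": None,
}


def refine_predictions(probs: Dict[str, float], threshold: float = 0.5) -> List[str]:
    """Keep tags at or above the threshold, then add all of their ancestors.

    Assumed rule for illustration only: a retained tag implies every tag
    above it in the hierarchy, so the final list stays hierarchy-consistent.
    """
    kept = {tag for tag, p in probs.items() if p >= threshold}
    for tag in list(kept):
        parent = PARENT.get(tag)
        while parent is not None:
            kept.add(parent)
            parent = PARENT.get(parent)
    return sorted(kept)


# Hypothetical classifier output for one discussion topic.
example = {"ransomware": 0.82, "malware": 0.40, "security": 0.30, "marketplace": 0.10}
refined = refine_predictions(example)
print(refined)
```

In this sketch only "ransomware" clears the threshold, and its ancestors "malware" and "security" are added so that the final tag list respects the hierarchy.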

Below is an examination of the most relevant prior art references and their potential impact on the claims of the '900 patent.

Key Prior Art and Potential Anticipation

U.S. Patent Application Publication No. 2018/0082211 A1

Full Citation: US 2018/0082211 A1
Publication Date: March 22, 2018
Filing Date: September 19, 2016
Assignee: International Business Machines Corporation
Brief Description: This application describes a method for generating ground truth data for machine learning-based quality assessment of corpora. It involves receiving a corpus of data, identifying a subset of records, presenting these to a user for labeling, and using these labels to train a machine learning model to assess the quality of the entire corpus.
Potential Anticipation: This reference appears relevant to the initial stages of the process described in the '900 patent, particularly the creation and use of a "ground truth dataset." Claim 1 of the '900 patent calls for "access[ing] a ground truth dataset generated from deep web forum information." US 2018/0082211 A1 does not specifically mention the "deep web," but its detailed disclosure on creating a labeled "ground truth" for a machine learning classifier could be argued to disclose this foundational step of claim 1. Because anticipation requires a single reference to disclose every element of a claim, the reference would also need to be shown to teach the remaining steps; otherwise it bears mainly on obviousness. Its labeling and training methods are fundamental to the field of machine learning and could be viewed as covering the process of generating the dataset used in the '900 patent.

U.S. Patent Application Publication No. 2017/0032276 A1

Full Citation: US 2017/0032276 A1
Publication Date: February 2, 2017
Filing Date: July 29, 2015
Assignee: AGT International GmbH
Brief Description: This publication details a system for data fusion and classification, particularly in scenarios with imbalanced datasets. It discloses methods for training a classifier by adjusting for the imbalance, which can include over-sampling minority classes or under-sampling majority classes.
Potential Anticipation: Several claims of the '900 patent address the issue of "class imbalance." For instance, claim 6 describes creating "synthetic data samples" to add to the ground truth dataset to address this imbalance. US 2017/0032276 A1 focuses on classification with imbalanced datasets and may disclose techniques like SMOTE (Synthetic Minority Over-sampling Technique), which is explicitly mentioned in the '900 patent's specification. If it does, the reference could undermine the novelty of claim 6. Creating synthetic sample points to balance a dataset is a known technique in machine learning, and this reference could provide evidence of its prior disclosure.
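
For readers unfamiliar with synthetic over-sampling, the following is a simplified sketch of the interpolation idea behind SMOTE. It is neither the '900 patent's implementation nor that of US 2017/0032276 A1. The function name smote_like, the parameters k and seed, and the sample vectors are assumptions made for illustration.

```python
import random
from typing import List, Sequence

Vector = Sequence[float]


def squared_distance(a: Vector, b: Vector) -> float:
    """Squared Euclidean distance between two feature vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def smote_like(minority: List[Vector], n_new: int, k: int = 2, seed: int = 0) -> List[List[float]]:
    """Create n_new synthetic points by interpolating between a minority
    sample and one of its k nearest minority-class neighbors."""
    if len(minority) < 2:
        raise ValueError("need at least two minority samples to interpolate")
    rng = random.Random(seed)
    synthetic = []
    for _ in range(n_new):
        base_idx = rng.randrange(len(minority))
        base = minority[base_idx]
        others = [m for i, m in enumerate(minority) if i != base_idx]
        neighbors = sorted(others, key=lambda m: squared_distance(base, m))[:k]
        neighbor = rng.choice(neighbors)
        gap = rng.random()  # position along the segment from base to neighbor
        synthetic.append([b + gap * (n - b) for b, n in zip(base, neighbor)])
    return synthetic


# Hypothetical minority-class feature vectors.
minority_samples = [[0.0, 0.0], [1.0, 0.0], [0.0, 1.0]]
new_points = smote_like(minority_samples, n_new=5)
print(new_points)
```

Each synthetic point lies on a line segment between two existing minority samples, so the minority class is enlarged without duplicating records exactly.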

U.S. Patent Application Publication No. 2013/0097103 A1

Full Citation: US 2013/0097103 A1
Publication Date: April 18, 2013
Filing Date: October 14, 2011
Assignee: International Business Machines Corporation
Brief Description: This application presents techniques for generating balanced and class-independent training data from an unlabeled dataset. It involves clustering the unlabeled data, identifying clusters that are close to labeled data of a minority class, and then using these clusters to augment the training data.
Potential Anticipation: Similar to the previous reference, US 2013/0097103 A1 is highly relevant to the claims of the '900 patent that deal with imbalanced data. Claim 7 of the '900 patent outlines supplementing the ground truth dataset by finding and adding "top similar documents" using an "elastic search similarity score" for minority class samples. The methods described in US 2013/0097103 A1 for generating balanced training data from unlabeled sets by identifying similar data points (through clustering) could be seen as anticipating the process outlined in claim 7. The concept of leveraging unlabeled data to enhance a training set for imbalanced classes is a central theme in this prior art.
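
The general idea of supplementing a minority class with its most similar unlabeled documents can be sketched as follows. This sketch substitutes a plain bag-of-words cosine similarity for the Elasticsearch similarity score recited in claim 7, and it does not reproduce the clustering approach of US 2013/0097103 A1. The function names and sample texts are hypothetical.

```python
import math
from collections import Counter
from typing import List, Tuple


def bag_of_words(text: str) -> Counter:
    """Lowercased whitespace tokens with their counts."""
    return Counter(text.lower().split())


def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two token-count vectors."""
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0


def top_similar(seed: str, unlabeled: List[str], top_n: int = 2) -> List[Tuple[float, str]]:
    """Return the top_n unlabeled documents most similar to a minority-class seed."""
    seed_vec = bag_of_words(seed)
    scored = [(cosine(seed_vec, bag_of_words(doc)), doc) for doc in unlabeled]
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return scored[:top_n]


# Hypothetical minority-class sample and unlabeled pool.
minority_seed = "selling ransomware builder kit"
pool = [
    "new ransomware builder for sale",
    "best pizza recipe thread",
    "ransomware kit tutorial",
]
matches = top_similar(minority_seed, pool)
for score, doc in matches:
    print(round(score, 3), doc)
```

Documents returned this way would be candidates for adding to the training set under the seed's label, which is the augmentation pattern both the claim and the reference are described as addressing.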

Summary

The prior art for U.S. Patent 11,275,900 indicates that the foundational concepts of using machine learning for text classification, creating ground truth datasets, and addressing class imbalance were known in the field before the invention. While the '900 patent applies these concepts to the specific domain of dark web forums, the underlying techniques described in the cited references could pose challenges to the novelty and non-obviousness of certain claims. A detailed, claim-by-claim analysis by a patent examiner or a court would be required to determine the ultimate validity of the patent's claims in light of this prior art.