Obviousness — US Patent 10751029

Obviousness Analysis under 35 U.S.C. § 103

This analysis identifies combinations of prior art references that would render claims of US10751029 obvious to a person having ordinary skill in the art (POSA). The motivation to combine these references is rooted in established principles of neural network design and the known benefits of specific architectural components for improving performance, efficiency, and training stability in image analysis tasks.

Prior Art References

The patent US10751029 itself references the following:

Huang, G., Liu, Z., Weinberger, K. Q., van der Maaten, L.: Densely connected convolutional networks. In: IEEE CVPR. vol. 1-2, p. 3 (2017) (DenseNet): This reference describes Densely Connected Convolutional Networks (DenseNet), a feed-forward neural network architecture in which, within a dense block, each layer receives the feature maps of all preceding layers as input and passes its own feature maps to all subsequent layers. DenseNets are reported to alleviate the vanishing gradient problem, strengthen feature propagation, encourage feature reuse, and substantially reduce the number of parameters, while achieving state-of-the-art accuracy on several image recognition benchmarks.
Ioffe, S., Szegedy, C.: Batch normalization: Accelerating deep network training by reducing internal covariate shift. In: Proceedings of the 32nd International Conference on Machine Learning. pp. 448-456. ICML'15, JMLR (2015) (Batch Normalization): This paper introduces Batch Normalization, a technique that accelerates deep neural network training by normalizing each layer's inputs over a mini-batch to zero mean and unit variance and then applying a learned scale and shift. It is presented as a remedy for internal covariate shift, allowing higher learning rates and less careful initialization, which leads to faster convergence and, in some cases, improved generalization.
Nair, V., Hinton, G. E.: Rectified linear units improve restricted boltzmann machines. In: Proceedings of the 27th international conference on machine learning (ICML-10). pp. 807-814 (2010) (ReLU): This reference describes Rectified Linear Units (ReLU), an activation function used in neural networks. ReLUs are known to preserve information about relative intensities as information travels through multiple layers of feature detectors.
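
To make the two building blocks above concrete, the following is a minimal sketch of batch normalization followed by ReLU on a small mini-batch of feature vectors. It is illustrative only: the function names, the scalar `gamma` and `beta` (the paper learns one pair per feature), and the sample values are assumptions, not taken from the patent or the references.

```python
import math


def batch_norm(batch, gamma=1.0, beta=0.0, eps=1e-5):
    """Normalize each feature over the mini-batch, then scale and shift.

    gamma and beta are scalars here for brevity (an assumption); in the
    Batch Normalization paper they are learned per feature.
    """
    n = len(batch)
    num_features = len(batch[0])
    out = [[0.0] * num_features for _ in range(n)]
    for j in range(num_features):
        column = [row[j] for row in batch]
        mean = sum(column) / n
        var = sum((x - mean) ** 2 for x in column) / n  # mini-batch variance
        for i in range(n):
            out[i][j] = gamma * (column[i] - mean) / math.sqrt(var + eps) + beta
    return out


def relu(batch):
    """Rectified linear unit: negative values become zero."""
    return [[max(0.0, x) for x in row] for row in batch]


# Hypothetical mini-batch: 3 samples, 2 features on very different scales.
batch = [[1.0, 10.0], [3.0, 30.0], [5.0, 50.0]]
normalized = batch_norm(batch)
activated = relu(normalized)
```

After normalization both features have approximately zero mean and unit variance despite their different input scales, and ReLU then zeroes the below-average entries.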

Obviousness Combinations and Motivation

The independent claims of US10751029 focus on a computer-implemented method and system for ultrasonic image analysis that involves deriving feature representations using neural networks, determining a quality assessment value and an image property (e.g., view category) based on these features, and associating these outputs with the images. The training method also describes using a neural network comprising a feature extracting neural network, an image property specific neural network, and a quality assessment value specific neural network, where the feature extracting neural network can include a commonly defined first feature extracting neural network (e.g., CNN) and a second feature extracting neural network (e.g., RNN).

A POSA in the field of deep learning for medical image analysis in 2018 (the relevant date for this analysis) would have been motivated to combine the disclosed prior art references as follows:

1. Combination of DenseNet + Batch Normalization + ReLU for Feature Extraction (Claims 1, 5, 6, 13, 16, 17, 20)

Claims: Claims 1 and 13 (methods of analysis and training, respectively) broadly cover deriving one or more extracted feature representations from ultrasound images. Claims 5, 6, 16, and 17 specifically mention the use of a "commonly defined first feature extracting neural network" that "may include a convolutional neural network" and that can derive a "first feature representation." The patent explicitly states that the "commonly defined first feature extracting neural networks (e.g., 304, 306, and 308) may include convolutional neural networks" and that "each of the neural networks 304, 306, and 308 may be implemented as a seven-layer DenseNet model" using specific hyper-parameters including "batch-normalization layer" and "Rectified Linear layer (ReLU)".
Motivation:
- DenseNet for feature extraction in CNNs: DenseNets were a prominent convolutional neural network architecture in 2017, known for their efficiency in feature reuse and improved gradient flow, leading to higher accuracy and fewer parameters in image recognition tasks. A POSA would naturally consider DenseNet for any image-based feature extraction, including medical images like ultrasound. The patent itself references DenseNet.
- Batch Normalization for stable and faster training: Batch Normalization, introduced in 2015, was widely adopted as a standard practice in deep learning to accelerate training, improve stability, and allow for higher learning rates by reducing internal covariate shift. Incorporating batch normalization into any deep neural network, including a DenseNet for feature extraction, would be an obvious choice to improve training efficiency and model performance.
- ReLU for efficient non-linearity: ReLU, described in the cited 2010 reference, was by 2018 a standard activation function, known for its computational efficiency and its ability to mitigate vanishing gradients compared with saturating activation functions such as sigmoid and tanh. Its use in convolutional neural networks for feature extraction was standard practice.
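Obviousness: DenseNet, batch normalization, and ReLU were each published before 2018, and the DenseNet reference itself composes its layers from batch normalization, ReLU, and convolution, so the three techniques had already been combined in the prior art. The patent describes using the same combination in the first feature extracting neural network (CNNs 304, 306, 308). Combining these well-known and complementary techniques to build an efficient and robust convolutional neural network for image feature extraction would therefore have been obvious to a POSA.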
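
The dense connectivity pattern relied on in this combination can be sketched in a few lines. The example below is a toy under stated assumptions: it operates on a single feature vector rather than on image feature maps, each layer is a hypothetical stand-in (ReLU followed by a random linear map) for a real batch-normalization/ReLU/convolution unit, and `num_layers` and `growth_rate` are illustrative values, not the hyper-parameters recited in the patent.

```python
import random


def relu(vec):
    return [max(0.0, x) for x in vec]


def make_layer(in_channels, growth_rate, rng):
    """Hypothetical stand-in for one DenseNet layer.

    A real layer would apply batch normalization, ReLU, and a convolution;
    here it is ReLU followed by a random linear map, for illustration only.
    """
    weights = [[rng.uniform(-0.1, 0.1) for _ in range(in_channels)]
               for _ in range(growth_rate)]

    def layer(features):
        activated = relu(features)
        return [sum(w * x for w, x in zip(row, activated)) for row in weights]

    return layer


def dense_block(x, num_layers, growth_rate, seed=0):
    """Each layer sees the concatenation of the input and all earlier outputs."""
    rng = random.Random(seed)
    features = list(x)
    for _ in range(num_layers):
        layer = make_layer(len(features), growth_rate, rng)
        new_features = layer(features)
        features = features + new_features  # concatenate, never overwrite
    return features


x = [0.5, -1.0, 2.0, 0.0]
out = dense_block(x, num_layers=3, growth_rate=2)
# Output width is len(x) + num_layers * growth_rate = 4 + 3 * 2 = 10.
```

Because earlier features are concatenated rather than replaced, the original input remains directly available to every later layer, which is the feature reuse property the DenseNet reference emphasizes.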

2. Combination of Convolutional Neural Networks (DenseNet with BN and ReLU) + Recurrent Neural Networks (LSTM) for Spatio-temporal Feature Extraction (Claims 7, 8, 18, 19)

Claims: Claims 7 and 18 describe deriving extracted feature representations by "inputting the first feature representations into a second feature extracting neural network to generate respective second feature representations," where "the one or more extracted feature representations may include the second feature representations." Claims 8 and 19 specify that "the second feature extracting neural network may be a recurrent neural network," particularly an LSTM. The patent further states that the LSTM layer operates on the outputs of the DenseNet networks of multiple frames, extracting encodings of both spatial and temporal patterns.
Motivation:
- Combining CNNs and RNNs for spatio-temporal data: By 2018, it was well-known in the field of deep learning that CNNs are highly effective at extracting spatial features from images, while RNNs, particularly LSTMs, excel at processing sequential data and capturing temporal dependencies. For analyzing video data or "cine" ultrasound images, where both spatial information within each frame and temporal relationships between frames are crucial, combining these architectures would be a natural and obvious approach. The patent explicitly states that the "set of ultrasound images received may represent a video or cine and may be a temporally ordered set of ultrasound images."
- LSTM for capturing temporal patterns: LSTMs were a prevalent and effective type of RNN for handling sequences and were known to address the vanishing gradient problem in traditional RNNs. Their use for extracting temporal features from sequences of spatially-encoded data (e.g., features from CNNs) was a well-established technique in video analysis and other spatio-temporal tasks. The patent mentions that features extracted by LSTM networks "may be encodings of both spatial and temporal patterns of a multitude of echo frames."
Obviousness: The sequential application of a CNN (e.g., a DenseNet with BN and ReLU) to extract per-frame spatial features, followed by an RNN (e.g., an LSTM) to capture temporal dependencies across a sequence of these features, was a standard and obvious approach for video analysis and understanding in 2018.
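
The per-frame encoder followed by a recurrent layer can be sketched as below. This is a minimal illustration, not the patent's implementation: `encode_frame` is a hypothetical stand-in for the per-frame CNN, the LSTM cell uses random weights and omits bias terms, and all sizes and sample values are assumptions.

```python
import math
import random


def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))


def encode_frame(frame):
    """Hypothetical stand-in for a per-frame CNN: pixels -> 2-d feature vector."""
    return [sum(frame) / len(frame), max(frame)]


def make_lstm(input_size, hidden_size, seed=0):
    """Random weights for the four LSTM gates (biases omitted for brevity)."""
    rng = random.Random(seed)

    def rows():
        return [[rng.uniform(-0.5, 0.5) for _ in range(input_size + hidden_size)]
                for _ in range(hidden_size)]

    return {"i": rows(), "f": rows(), "g": rows(), "o": rows()}


def lstm_step(params, x, h, c):
    """One LSTM time step on input x with previous hidden state h and cell state c."""
    z = x + h  # concatenated [input, previous hidden state]

    def gate(name, fn):
        return [fn(sum(w * v for w, v in zip(row, z))) for row in params[name]]

    i = gate("i", sigmoid)    # input gate
    f = gate("f", sigmoid)    # forget gate
    g = gate("g", math.tanh)  # candidate cell values
    o = gate("o", sigmoid)    # output gate
    c_new = [fk * ck + ik * gk for fk, ck, ik, gk in zip(f, c, i, g)]
    h_new = [ok * math.tanh(ck) for ok, ck in zip(o, c_new)]
    return h_new, c_new


def encode_cine(frames, hidden_size=3):
    """Spatial features per frame, then temporal aggregation across frames."""
    params = make_lstm(input_size=2, hidden_size=hidden_size)
    h = [0.0] * hidden_size
    c = [0.0] * hidden_size
    for frame in frames:
        h, c = lstm_step(params, encode_frame(frame), h, c)
    return h


# Hypothetical cine: three temporally ordered frames of three pixels each.
cine = [[0.1, 0.2, 0.3], [0.2, 0.4, 0.6], [0.9, 0.1, 0.5]]
features = encode_cine(cine)
```

The final hidden state depends on the order of the frames, which is what distinguishes this arrangement from simply averaging per-frame features.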

3. Combining Quality Assessment and Image Property (View Category) Prediction in a Shared Neural Network (Claims 1, 2, 9, 10, 13, 14, 20)

Claims: Claims 1 and 13 describe determining both a "quality assessment value" and an "image property" (which "may be a view category") based on the derived feature representations. Claims 9 and 20 further specify that determining the quality assessment value and the image property may involve inputting the feature representations into a "quality assessment value specific neural network" and an "image property specific neural network" respectively. The patent also notes that these could be "commonly defined neural subnetworks."
Motivation:
- Efficiency and shared representations: The patent itself provides motivation for combining these tasks in a shared network: "a highly shared neural network may yield faster processing time compared to using a separate quality assessment and image property assessment," and "the joint training of the two modalities may prevent the neural network from overfitting the label from either modality." From a machine learning perspective, if two tasks (quality assessment and view categorization) rely on similar underlying features, it is a well-known and often advantageous practice to share early layers of a neural network to extract common features. This reduces model complexity, improves computational efficiency, and can lead to better generalization by forcing the shared layers to learn more robust and universally useful representations.
- Multi-task learning: Training a single neural network to perform multiple related tasks simultaneously (multi-task learning) was a recognized technique to improve performance on individual tasks by leveraging shared information. Both image quality and view category are properties of an ultrasound image, suggesting a strong correlation in their underlying visual features.
Obviousness: Given the known benefits of shared network architectures and multi-task learning for efficiency and performance, it would be obvious for a POSA to design a neural network that jointly extracts features and then branches into separate heads for quality assessment and view category prediction. The patent's own stated motivations reinforce this as an obvious design choice.
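
A shared feature extractor that branches into two task-specific heads, trained with a single combined loss, can be sketched as follows. Everything named here is an assumption for illustration: `shared_features` is a hypothetical stand-in for the shared trunk, the head weights are arbitrary, and the weighted sum of cross-entropy and squared error is one common multi-task loss, not necessarily the loss used in the patent.

```python
import math


def shared_features(frame):
    """Hypothetical shared feature extractor (stand-in for the CNN/RNN trunk)."""
    mean = sum(frame) / len(frame)
    spread = max(frame) - min(frame)
    return [mean, spread]


def softmax(logits):
    m = max(logits)  # subtract the max for numerical stability
    exps = [math.exp(v - m) for v in logits]
    total = sum(exps)
    return [e / total for e in exps]


def view_head(features, weights):
    """Image-property head: probabilities over view categories."""
    logits = [sum(w * f for w, f in zip(row, features)) for row in weights]
    return softmax(logits)


def quality_head(features, weights):
    """Quality head: a single score squashed into (0, 1)."""
    z = sum(w * f for w, f in zip(weights, features))
    return 1.0 / (1.0 + math.exp(-z))


def joint_loss(view_probs, view_label, quality, quality_target, alpha=0.5):
    """Weighted sum of cross-entropy (view) and squared error (quality)."""
    ce = -math.log(view_probs[view_label])
    se = (quality - quality_target) ** 2
    return alpha * ce + (1.0 - alpha) * se


# Hypothetical frame, head weights, and labels.
frame = [0.1, 0.4, 0.8, 0.3]
feats = shared_features(frame)  # computed once, used by both heads
view_probs = view_head(feats, [[1.0, -1.0], [0.5, 0.5], [-1.0, 1.0]])
quality = quality_head(feats, [2.0, 1.0])
loss = joint_loss(view_probs, view_label=0, quality=quality, quality_target=0.75)
```

Because both heads consume the same `feats`, the expensive feature extraction runs once per input, and minimizing the combined loss pushes the shared layers toward features useful for both tasks.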

Therefore, the methods and systems described in US10751029, particularly those pertaining to the architecture and training of the neural network for joint quality assessment and view categorization, would have been obvious to a POSA in light of the cited prior art references.