Obviousness — US Patent 7502518

Obviousness Analysis of US Patent 7502518 under 35 U.S.C. § 103

This analysis identifies combinations of prior art references that would render the independent claims of US Patent 7502518 obvious to a person having ordinary skill in the art (PHOSITA). The primary inventive concept of US7502518 lies in correcting the quantization width based jointly on a level of visual attention and a level of perceptual distortion (how easily image quality distortion is perceived), in order to overcome the limitations of prior art that considered these factors individually or inadequately.

Common Motivation for Combining Prior Art

A PHOSITA in the field of image coding, seeking to improve perceived image quality and coding efficiency, would be aware of existing techniques that adjust quantization parameters based on human visual system characteristics. The patent itself identifies a problem with the "first literature" (TM5) and the "second literature" (JP2003284071A) when applied individually:

TM5 (First Literature): Reduces quantization distortion in flat areas (where distortion is easily perceived) by making quantization steps smaller. However, it "relatively increases the quantization step for the focused area, thereby deteriorating the subjective image quality of the focused area."
JP2003284071A (Second Literature): Enhances image quality in focused areas by decreasing the quantization step. However, it "relatively increases the quantization step of the flat background thereby strengthening the quantization distortion in the flat background which is easily perceived."

This explicit articulation of the shortcomings of single-factor approaches provides a strong motivation for a PHOSITA to combine the concepts to address both problems simultaneously. The goal would be to enhance the image quality in visually focused areas without unduly degrading the quality in flat, easily distorted background areas. This would lead to an improved overall subjective image quality, which is a common objective in image coding.

Analysis of Independent Claims

Claims 1 and 3 (Apparatus and Method - Multiplication-based Correction)

Claim 1: An apparatus for coding an image, comprising: a setting unit that sets a quantization width for each coded block; a visual attention calculating unit that calculates a level of visual attention for a first element; a perceptual distortion calculating unit that calculates a level of perceptual distortion for a second element whose distorted image quality is easily visually perceived; a correcting unit that corrects the quantization width to a value obtained as a result of a multiplication in which a product of the level of visual attention and the level of distorted precision is multiplied by the quantization width set by the setting unit; and a quantizing unit that quantizes the image data based on the corrected quantization width.

Claim 3: A method for coding an image, comprising: setting a quantization width for each coded block; calculating for each coded block a level of visual attention for a first element; calculating for each coded block a level of perceptual distortion for a second element whose distorted image quality is easily visually perceived; correcting the quantization width to a value obtained as a result of a multiplication in which a product of the level of visual attention and the level of distorted precision is multiplied by the quantization width set by the setting; and quantizing the image data based on the corrected quantization width.

Combination: JP2003284071A (second literature) in view of TM5 (first literature).

Rationale:

Setting and Quantizing Unit/Step: The general concept of setting a quantization width (or step) for each coded block and then quantizing based on it is fundamental to image compression and is broadly disclosed by various prior art references, including US5291282A, US6272177B1, JPH10164581A, US6295375B1, US6792152B1, and explicitly in TM5 and JP2003284071A.
Visual Attention Calculation: JP2003284071A explicitly teaches a method that "enhances the image quality of the focused area by relatively decreasing the quantization step of the focused area." Because the focused area must be identified before its quantization step can be decreased, this teaching corresponds to calculating a "level of visual attention" for a first element. US7274741B2 also provides general methods for generating a "user attention model."
Perceptual Distortion Calculation: TM5 teaches "calculating an activity of an input image and correcting the quantization step so that the quantization step for flat areas is made smaller, considering the human visual characteristics that human visual system is more sensitive to distortions in flat areas." This "activity" directly serves as a "level of perceptual distortion" for a second element (e.g., flat areas) whose distorted image quality is easily visually perceived.
Correcting by Multiplication based on both Visual Attention and Perceptual Distortion:
- As discussed in the "Common Motivation," both JP2003284071A and TM5 demonstrate deficiencies when applied in isolation, particularly creating issues in areas not prioritized by their respective single-factor approaches (focused areas in TM5, flat backgrounds in JP2003284071A).
- A PHOSITA would be motivated to combine the teachings of these two references to address these known problems. To avoid the drawbacks of each method, a PHOSITA would seek a correction mechanism that simultaneously accounts for both factors.
- Multiplying a factor derived from visual attention (e.g., G' in US7502518's description) and a factor derived from perceptual distortion (e.g., N_act in US7502518's description) to yield a combined coefficient (GA) which then adjusts the base quantization width (QP = GA × QP') is a straightforward and common mathematical approach for combining multiple influencing factors in adaptive coding. The patent's description notes that G' takes values around one (smaller for higher attention, larger for lower), and N_act also takes values around one (smaller for flat, larger for complex). Multiplying these values allows both factors to modulate the quantization step. This mathematical operation would be an obvious choice for a PHOSITA looking to integrate two independent adjustment factors into a single corrective action.
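The arithmetic behind this rationale can be sketched in a few lines of Python. This is a minimal illustration, not the patent's implementation: the function names, the base value QP' = 20, and the per-block values of G' and N_act are hypothetical and chosen only to show how the single-factor corrections compare with the combined product GA = G' × N_act.

```python
def correct_single_factor(qp_base: float, factor: float) -> float:
    """Single-factor correction: QP = factor * QP' (activity only or attention only)."""
    return factor * qp_base


def correct_combined(qp_base: float, g_attention: float, n_act: float) -> float:
    """Combined correction: QP = GA * QP', with GA = G' * N_act."""
    ga = g_attention * n_act
    return ga * qp_base


# Hypothetical blocks given as (G', N_act). Both factors sit around one:
# G' < 1 for high attention, N_act < 1 for flat (low-variance) content.
BLOCKS = {
    "focused, textured": (0.7, 1.5),
    "background, flat": (1.3, 0.6),
}
QP_BASE = 20.0  # hypothetical QP' produced by the setting unit

if __name__ == "__main__":
    for name, (g, n) in BLOCKS.items():
        print(
            f"{name}: activity-only={correct_single_factor(QP_BASE, n):.1f}, "
            f"attention-only={correct_single_factor(QP_BASE, g):.1f}, "
            f"combined={correct_combined(QP_BASE, g, n):.1f}"
        )
```

With these assumed values, the activity-only correction raises the focused block's QP from 20 to 30, and the attention-only correction raises the flat background's QP from 20 to 26. The combined product yields roughly 21 and 15.6 respectively, which is the behavior the combination is argued to achieve.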

Therefore, claims 1 and 3, which describe an apparatus and method for correcting quantization width by multiplying the initial width by a product of visual attention and perceptual distortion levels, would be obvious in light of JP2003284071A in combination with TM5.

Claims 2 and 4 (Apparatus and Method - Addition-based Logarithmic Correction)

Claim 2: An apparatus for coding an image, comprising: a setting unit that sets a quantization width for each coded block; a visual attention calculating unit that calculates a level of visual attention for a first element; a perceptual distortion calculating unit that calculates a level of perceptual distortion for a second element whose distorted image quality is easily visually perceived; a correcting unit that corrects the quantization width to a value obtained as a result of an addition in which the quantization width is added to a product of a predetermined positive real number and a predetermined logarithm of a product of the level of visual attention and the level of perceptual distortion, the base of the logarithm being a real number equal to or larger than one; and a quantizing unit that quantizes the image data based on the corrected quantization width.

Claim 4: A method for coding an image, comprising: setting a quantization width for each coded block; calculating for each coded block a level of visual attention for a first element; calculating for each coded block a level of perceptual distortion for a second element whose distorted image quality is easily visually perceived; correcting the quantization width to a value obtained as a result of an addition in which the quantization width is added to a product of a predetermined positive real number and a predetermined logarithm of a product of the level of visual attention and the level of perceptual distortion, the base of the logarithm being a real number equal to or larger than one; and quantizing the image data based on the corrected quantization width.

Combination: JP2003284071A (second literature) in view of TM5 (first literature), and general knowledge of adaptive quantization techniques.

Rationale:

Setting, Visual Attention, Perceptual Distortion, and Quantizing: The elements for setting quantization width, calculating visual attention, calculating perceptual distortion, and quantizing are the same as discussed for Claims 1 and 3, and are taught by JP2003284071A and TM5.
Correcting by Addition and Logarithm: The core motivation for combining visual attention and perceptual distortion remains the same: to overcome the identified shortcomings of the prior art. While Claims 1 and 3 specify a multiplication, Claims 2 and 4 describe an additive, logarithmic correction (e.g., QP = K × log_L(GA) + QP', as described in US7502518's modification).
- In image and video coding, various mathematical functions are employed to model human visual perception and apply adaptive quantization. Logarithmic relationships are well-known to represent human perceptual responses (e.g., to luminance changes or contrast).
- A PHOSITA, when optimizing an adaptive quantization scheme that considers multiple perceptual factors, would routinely explore different mathematical combinations, including linear addition, multiplication, and non-linear functions such as logarithms. The patent itself presents this logarithmic formulation as a "first modification" with "a similar characteristic" to the multiplicative approach, suggesting that it is an alternative formulation rather than a fundamentally distinct inventive step.
- The choice of the positive constant K and of the logarithm base L (a real number equal to or larger than one, per the claims) would be within the purview of routine experimentation for a PHOSITA optimizing the system for desired perceptual quality and compression efficiency.
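The additive form can be sketched in the same way as the multiplicative one. The Python below is an illustration under stated assumptions, not the patent's implementation: the function name and the defaults K = 6 and L = 2 are hypothetical. The sketch also requires a base strictly greater than one, because a logarithm of base exactly one is undefined, even though the claim language reads "equal to or larger than one."

```python
import math


def correct_logarithmic(qp_base: float, g_attention: float, n_act: float,
                        k: float = 6.0, log_base: float = 2.0) -> float:
    """Additive correction: QP = K * log_L(GA) + QP', with GA = G' * N_act."""
    if k <= 0:
        raise ValueError("K must be a positive real number")
    if log_base <= 1:
        # Assumption of this sketch: base 1 is rejected because log base 1
        # is mathematically undefined.
        raise ValueError("this sketch requires a logarithm base greater than one")
    ga = g_attention * n_act
    if ga <= 0:
        raise ValueError("GA must be positive")
    return k * math.log(ga, log_base) + qp_base


if __name__ == "__main__":
    for ga in (0.5, 1.0, 2.0):
        print(f"GA={ga}: QP={correct_logarithmic(20.0, ga, 1.0):.1f}")
```

Under these assumptions, GA = 1 leaves QP' unchanged, GA < 1 lowers it, and GA > 1 raises it. The direction of the adjustment therefore matches the multiplicative form, which is consistent with the patent's description of the modification as having "a similar characteristic."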

Therefore, claims 2 and 4, which describe an apparatus and method for correcting quantization width using an addition-based logarithmic function of combined visual attention and perceptual distortion levels, would be obvious in light of JP2003284071A in combination with TM5, and the general knowledge in the art regarding perceptual coding models and mathematical functions for adaptive quantization.

Claim 5 (Apparatus - Specific Visual Attention and Perceptual Distortion Details, Multiplication-based Correction)

Claim 5: An apparatus for coding an image, comprising: a setting unit that sets quantization width for each coded block; a visual attention calculating unit that calculates for each coded block a level of visual attention for a first element, the level of visual attention taking a smaller value than one in a coded block with a higher level than other levels of coded blocks, and taking a larger value than one in a coded block with a smaller level of coded block than other levels of coded blocks, levels of coded blocks being set based on at least one of elements including value, color saturation of an average color value of pixels and hue of an average color value of pixels; a perceptual distortion calculating unit that calculates for each coded block a level of perceptual distortion for a second element whose distorted image quality is visually perceived, the level of perceptual distortion taking a smaller value than one when the variance of the input image of the coded block is less than average, and taking a larger value than one when the variance of the input image of the coded block is more than average; a correcting unit that corrects the quantization width to a value obtained as a result of a multiplication in which a product of the level of visual attention and the level of distorted precision is multiplied by the quantization width set by the setting unit; and a quantizing unit that quantizes the image data based on the corrected quantization width.

Combination: JP2003284071A (second literature) in view of TM5 (first literature), further in view of US7274741B2, US5291282A, and general knowledge of image processing.

Rationale:

Setting, Correcting by Multiplication, and Quantizing: These aspects are made obvious by the combination of JP2003284071A and TM5, as discussed for Claim 1.
Specific Visual Attention Calculation: Claim 5 specifies calculating visual attention where it "takes a smaller value than one in a coded block with a higher level... and taking a larger value than one... with a smaller level... based on at least one of elements including value, color saturation of an average color value of pixels and hue of an average color value of pixels."
- JP2003284071A teaches adjusting the quantization step based on a "focused area."
- US7502518's detailed description clarifies that visual attention can be based on "focal level, coloring, position in the image frame, and motion." For "coloring," it explicitly mentions "red (r), flesh color (sk), and with respect to the difference with an average color in the frame (cd)," further specifying "value V and color saturation S of an average color value of pixels in a block represented in HSV color model" for red, and "hue H of an average color value of pixels in a block represented in HSV color model" for flesh color.
- US7274741B2 focuses on generating a "comprehensive user attention model" using various factors. These types of image features (color, luminance, position, motion) were well-known indicators of visual saliency or attention in the art. The specific assignment of smaller values for higher attention and larger values for lower attention is a design choice to make the subsequent multiplication (GA = G' × N_act) work as intended, where a smaller GA leads to a smaller QP (higher quality).
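To make the scaling convention concrete, the following Python sketch derives a factor around one from the HSV representation of a block's average color. It is purely illustrative: this section does not reproduce the patent's formulas, so the attention level (saturation times value) and the mapping to G' are hypothetical stand-ins. The sketch uses only value and saturation, which satisfies the claim's "at least one of" language, and it ignores hue.

```python
import colorsys
from typing import Sequence, Tuple

RGB = Tuple[int, int, int]


def attention_level(block: Sequence[RGB]) -> float:
    """Hypothetical attention level: saturation * value of the block's average color."""
    n = len(block)
    # Average each 8-bit channel and scale to [0, 1], as colorsys expects.
    r = sum(p[0] for p in block) / (255.0 * n)
    g = sum(p[1] for p in block) / (255.0 * n)
    b = sum(p[2] for p in block) / (255.0 * n)
    _hue, sat, val = colorsys.rgb_to_hsv(r, g, b)
    return sat * val


def attention_factor(level: float, avg_level: float) -> float:
    """Hypothetical mapping to G': below one when level exceeds the frame average."""
    denom = 2.0 * level + avg_level
    if denom == 0.0:
        return 1.0  # no attention signal anywhere: leave QP' unchanged
    return (level + 2.0 * avg_level) / denom


if __name__ == "__main__":
    red_block = [(255, 0, 0)] * 16       # vivid red: high assumed attention
    gray_block = [(128, 128, 128)] * 16  # neutral gray: low assumed attention
    levels = [attention_level(red_block), attention_level(gray_block)]
    avg = sum(levels) / len(levels)
    for name, lvl in zip(("red", "gray"), levels):
        print(f"{name}: level={lvl:.2f}, G'={attention_factor(lvl, avg):.2f}")
```

The mapping here keeps G' between 0.5 and 2 and equal to one at the frame average, so multiplying it into GA lowers QP for above-average blocks and raises it for below-average ones. Any monotonically decreasing mapping centered on one would serve the same purpose, which supports the point that the scaling is a design choice.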
Specific Perceptual Distortion Calculation: Claim 5 specifies the perceptual distortion "taking a smaller value than one when the variance of the input image of the coded block is less than average, and taking a larger value than one when the variance of the input image of the coded block is more than average."
- TM5 teaches using "activity" to reduce the quantization step for "flat areas" (where distortion is easily perceived). US7502518's description explicitly states that "an activity indicating a degree of complexity of the image in the macroblock is used to indicate the level of perceptual distortion," and that this activity is calculated based on "Vy which is the variance of the input image signal of the pertinent macroblock."
- US5291282A also describes varying quantization width based on "activity."
- The relationship between image complexity (variance) and perceptual distortion (easier to perceive in flat areas) was well-established in the art. Setting the activity (N_act) to be smaller than one for flatter (less variance) and larger than one for more complex (more variance) areas, as described in US7502518, is a standard way to scale such a factor for use in an adaptive quantization formula.
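A variance-based factor of this kind can be sketched as follows. The normalization shown is the one commonly associated with TM5 adaptive quantization, N_act = (2·act + avg_act) / (act + 2·avg_act); the rest is a simplification for illustration. In particular, the sketch uses a single variance per block (in the spirit of the "Vy" quoted above) rather than TM5's minimum over sub-blocks, and the function names and sample blocks are hypothetical.

```python
from statistics import pvariance
from typing import Sequence


def block_activity(block: Sequence[Sequence[int]]) -> float:
    """Simplified activity: 1 + population variance of the block's pixel values."""
    pixels = [p for row in block for p in row]
    return 1.0 + pvariance(pixels)


def normalized_activity(act: float, avg_act: float) -> float:
    """TM5-style normalization; the result lies between 0.5 and 2."""
    return (2.0 * act + avg_act) / (act + 2.0 * avg_act)


if __name__ == "__main__":
    flat = [[128] * 4 for _ in range(4)]                          # zero variance
    textured = [[0, 255, 0, 255], [255, 0, 255, 0]] * 2           # high variance
    acts = [block_activity(flat), block_activity(textured)]
    avg_act = sum(acts) / len(acts)
    for name, act in zip(("flat", "textured"), acts):
        print(f"{name}: N_act={normalized_activity(act, avg_act):.3f}")
```

A block whose activity equals the average gets N_act = 1, a flatter block gets a value below one, and a more complex block gets a value above one, matching the behavior recited in Claim 5.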

Thus, the specific details for calculating visual attention based on color features (value, saturation, hue) and perceptual distortion based on image variance, as described in Claim 5, represent selections from well-known image processing techniques that a PHOSITA would routinely apply when implementing the broader concepts of visual attention and perceptual distortion taught by JP2003284071A and TM5, respectively. The scaling of these values (e.g., less than one for high attention/flat, greater than one for low attention/complex) is a straightforward design choice to achieve the desired effect when combined via multiplication.

Therefore, claim 5 would be obvious in light of JP2003284071A and TM5, further in view of US7274741B2, US5291282A, and general knowledge in image processing for calculating specific perceptual characteristics.