Obviousness — US Patent 7624138

US Patent 7624138, titled "Method and apparatus for efficient integer transform," describes a method and apparatus for performing efficient integer transforms of content data, particularly for compression and decompression of audio, images, and video. The core of the invention, as outlined in the claims and disclosure, involves using a sequence of specific Single-Instruction-Multiple-Data (SIMD) operations: primarily a "multiply-add instruction" and a "horizontal-add instruction." [cite: "a method and apparatus for efficient integer transforms of content data include generating, in response to executing a multiply-add instruction, a plurality of sums of product pairs within a destination data storage device.", "adjacent summed-product pairs may be added in response to executing a horizontal-add instruction."]

The patent itself identifies several relevant pieces of prior art and general knowledge, which form the basis for an obviousness analysis under 35 U.S.C. § 103.

Identified Prior Art and General Knowledge:

Prior Art 1 (PA-PackedData): "A Set of Instructions for Operating on Packed Data," filed on Aug. 31, 1995, application number 521,360. This reference describes a packed instruction set (e.g., set 140) that includes instructions for supporting "a packed add operation" and "a packed multiply operation" on packed data formats. [cite: "Packed instruction set 140 may also include instructions for supporting: a pack operation, an unpack operation, a packed add operation, a packed subtract operation, a packed multiply operation, a packed shift operation, a packed compare operation, a population count operation, and a set of packed logical operations (including packed AND, packed ANDNOT, packed OR, and packed XOR) as described in “A Set of Instructions for Operating on Packed Data,” filed on Aug. 31, 1995, application number 521,360."] It teaches the general concept of processing multiple data elements in parallel using SIMD operations.
Prior Art 2 (PA-Filtering): "An Apparatus and Method for Efficient Filtering and Convolution of Content Data," filed on Oct. 29, 2001, application Ser. No. 09/952,891. This application is explicitly referenced in US7624138 and is also noted as a priority document for US7624138. [cite: "Priority claimed from US09/952,891"] It teaches a packed instruction set that "may also include one or more instructions for supporting: ... a horizontal-add instruction for adding adjacent bytes, words and doublewords, two word values, two words to produce a 16-bit result, two quadwords to produce a quadword result." [cite: "Packed instruction set 140 may also include one or more instructions for supporting: a move data operation; a data shuffle operation for organizing data within a data storage device; a horizontal-add instruction for adding adjacent bytes, words and doublewords, two word values, two words to produce a 16-bit result, two quadwords to produce a quadword result; and a register merger operation as are described in “An Apparatus and Method for Efficient Filtering and Convolution of Content Data,” filed on Oct. 29, 2001, application Ser. No. 09/952,891."] The title of this reference indicates its applicability to efficient processing of content data, similar to the objective of US7624138.
General Knowledge: The patent itself states that "Discrete transforms such as the discrete cosine transform (DCT) used in prior compression techniques have made use of floating-point or fixed-point number representations to approximate real irrational coefficients." [cite: "Discrete transforms such as the discrete cosine transform (DCT) used in prior compression techniques have made use of floating-point or fixed-point number representations to approximate real irrational coefficients."] It also highlights the known benefits of "integer transforms," which "have integer basis components and permit coefficients to be accurately represented by integers" [cite: "integer transforms which have integer basis components and permit coefficients to be accurately represented by integers."] and can be implemented efficiently. Furthermore, the "multiply-add" or "multiply-accumulate" (MAC) operation, which combines a multiplication and an addition in a single instruction, was well known in the art as a common and efficient feature of digital signal processing (DSP) architectures.
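The distinction between an integer transform and a rounded DCT coefficient can be illustrated with a short sketch. The matrix below is the 4-point integer core transform used in H.264, chosen here only as a familiar example of an integer basis; it is an assumption for illustration and is not quoted from US7624138. The sample values and the 15-bit fixed-point format are likewise hypothetical.

```python
import math

# Illustrative integer basis: the 4-point core transform matrix used in H.264.
# ASSUMPTION: chosen for illustration only; not taken from the patent.
C = [
    [1,  1,  1,  1],
    [2,  1, -1, -2],
    [1, -1, -1,  1],
    [1, -2,  2, -1],
]

def integer_transform(x):
    """Matrix-vector product using integer arithmetic only."""
    return [sum(c * v for c, v in zip(row, x)) for row in C]

x = [10, 20, 30, 40]        # hypothetical sample values
y = integer_transform(x)    # [100, -70, 0, -10], exact, no rounding

# By contrast, a DCT basis value such as cos(pi/8) is irrational, so a
# fixed-point implementation must round it (here to 15 fractional bits,
# a hypothetical format).
dct_coeff = math.cos(math.pi / 8)
fixed_point = round(dct_coeff * 2**15)
rounding_error = abs(fixed_point / 2**15 - dct_coeff)
```

Every coefficient in the integer basis is represented exactly, whereas the fixed-point DCT coefficient carries a small but nonzero rounding error.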

Obviousness Analysis based on 35 U.S.C. § 103:

A person having ordinary skill in the art (POSITA) in the field of computer architecture, instruction set design, and digital signal processing for multimedia applications, facing the challenge of efficiently implementing integer transforms, would have been motivated to combine the teachings of PA-PackedData and PA-Filtering.

Combination of PA-PackedData and PA-Filtering:

Deriving the Multiply-Add Instruction: PA-PackedData teaches the existence of both a "packed multiply operation" and a "packed add operation" for processing packed data elements in parallel. [cite: "Packed instruction set 140 may also include instructions for supporting: a pack operation, an unpack operation, a packed add operation, a packed subtract operation, a packed multiply operation, a packed shift operation, a packed compare operation, a population count operation, and a set of packed logical operations (including packed AND, packed ANDNOT, packed OR, and packed XOR) as described in “A Set of Instructions for Operating on Packed Data,” filed on Aug. 31, 1995, application number 521,360."] In DSP applications, which include transforms for multimedia content, the multiplication of data by coefficients (as described in US7624138) is frequently followed by an accumulation (addition). The combination of a multiply and an add into a single "multiply-add" or "multiply-accumulate" (MAC) instruction is a well-established architectural optimization to reduce instruction count, improve pipeline efficiency, and enhance performance. Therefore, it would have been obvious for a POSITA, seeking to optimize computations involving packed data (as taught by PA-PackedData), to implement a "packed multiply-add instruction" by combining the known packed multiply and packed add operations. The patent itself mentions support for "multiply-add and/or multiply-subtract operations" within its packed instruction set. [cite: "Packed instruction set 140 includes instructions for supporting multiply-add and/or multiply-subtract operations."]
Incorporating the Horizontal-Add Instruction: PA-Filtering directly teaches the "horizontal-add instruction for adding adjacent bytes, words and doublewords" within packed data registers. [cite: "Packed instruction set 140 may also include one or more instructions for supporting: a move data operation; a data shuffle operation for organizing data within a data storage device; a horizontal-add instruction for adding adjacent bytes, words and doublewords, two word values, two words to produce a 16-bit result, two quadwords to produce a quadword result; and a register merger operation as are described in “An Apparatus and Method for Efficient Filtering and Convolution of Content Data,” filed on Oct. 29, 2001, application Ser. No. 09/952,891."] Transforms, filters, and convolutions often require summing intermediate products across a data vector or matrix. A horizontal-add instruction is well suited to this purpose because it adds adjacent elements of a packed register, enabling efficient accumulation of results within that register.
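The two operations discussed above can be modeled in scalar form as follows. This is a minimal sketch under stated assumptions: packed registers are represented as Python lists, lane widths and overflow behavior are ignored, and the function names (packed_multiply, multiply_add, horizontal_add) are hypothetical labels rather than instruction mnemonics from either reference. The multiply-add model assumes that each adjacent pair of products is summed, consistent with the patent's "sums of product pairs" language.

```python
def packed_multiply(a, b):
    """Lane-wise products of two packed operands (modeled as lists)."""
    return [x * y for x, y in zip(a, b)]

def horizontal_add(v):
    """Add adjacent lanes: (v0 + v1, v2 + v3, ...)."""
    return [v[i] + v[i + 1] for i in range(0, len(v), 2)]

def multiply_add(a, b):
    """Fused model: multiply lanes, then sum each adjacent product pair."""
    return [a[i] * b[i] + a[i + 1] * b[i + 1] for i in range(0, len(a), 2)]

# Hypothetical operand values, for illustration only.
data   = [1, 2, 3, 4, 5, 6, 7, 8]
coeffs = [2, 1, 1, -1, 1, 1, -2, 1]

pair_sums = multiply_add(data, coeffs)   # [4, -1, 11, -6]
totals = horizontal_add(pair_sums)       # [3, 5]

# The fused result equals a packed multiply followed by an adjacent add.
unfused = horizontal_add(packed_multiply(data, coeffs))
```

In this model, the fused multiply-add yields the same values as a packed multiply followed by an addition of adjacent products, which is the equivalence the combination argument relies on.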

Motivation for Combination:

A POSITA, skilled in optimizing multimedia processing on SIMD architectures, would have been strongly motivated to combine these elements to achieve the "efficient integer transform processing of content data" claimed by US7624138. The patent itself identifies the problem that discrete transforms "used in prior compression techniques have made use of floating-point or fixed-point number representations to approximate real irrational coefficients" [cite: "Discrete transforms such as the discrete cosine transform (DCT) used in prior compression techniques have made use of floating-point or fixed-point number representations to approximate real irrational coefficients."] and proposes "integer transforms" as a solution. [cite: "integer transforms which have integer basis components and permit coefficients to be accurately represented by integers."]

To implement these desired integer transforms efficiently on a processor with SIMD capabilities (as described in PA-PackedData), a POSITA would naturally consider:

Performance Improvement: Combining multiplication and addition into a single instruction (packed multiply-add) directly addresses the computational core of transforms, which are rich in multiply-accumulate operations. This reduces the number of instructions executed and improves data throughput.
Completing Transform Computations: After initial packed multiply-add operations generate "sums of product pairs" (often representing partial sums or intermediate terms of a transform), these partial results frequently need to be summed together horizontally to produce the final transform coefficients. The "horizontal-add instruction" from PA-Filtering provides precisely the mechanism to perform these necessary intra-register summations efficiently. The applicability of PA-Filtering to "Efficient Filtering and Convolution" further suggests its relevance to the broader domain of signal processing that includes transforms.
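The two considerations above can be sketched as a single pipeline: one multiply-add followed by one horizontal-add produces two transform outputs at a time. The sketch below is illustrative only. It assumes an eight-lane register modeled as a Python list, ignores lane widths and overflow, and uses the H.264 4-point integer matrix as a stand-in integer basis; none of these details, nor the function names, are taken from the patent or the cited references.

```python
# ASSUMPTION: illustrative integer basis (H.264 4-point core transform).
C = [
    [1,  1,  1,  1],
    [2,  1, -1, -2],
    [1, -1, -1,  1],
    [1, -2,  2, -1],
]

def multiply_add(a, b):
    """Model: multiply lanes, then sum each adjacent product pair."""
    return [a[i] * b[i] + a[i + 1] * b[i + 1] for i in range(0, len(a), 2)]

def horizontal_add(v):
    """Model: add adjacent lanes (v0 + v1, v2 + v3, ...)."""
    return [v[i] + v[i + 1] for i in range(0, len(v), 2)]

def transform_via_packed_ops(x):
    """Compute C @ x two rows at a time using the modeled packed ops."""
    out = []
    for r in range(0, 4, 2):
        data = x + x                    # input duplicated across 8 lanes
        coeffs = C[r] + C[r + 1]        # two coefficient rows side by side
        pair_sums = multiply_add(data, coeffs)    # 4 sums of product pairs
        out.extend(horizontal_add(pair_sums))     # 2 finished outputs
    return out

def transform_direct(x):
    """Reference: plain matrix-vector product."""
    return [sum(c * v for c, v in zip(row, x)) for row in C]

x = [10, 20, 30, 40]                    # hypothetical sample values
result = transform_via_packed_ops(x)    # [100, -70, 0, -10]
```

Under these assumptions, each pair of outputs takes two modeled operations, and the result matches the direct matrix-vector product.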

Thus, the combination of a packed multiply-add instruction (a straightforward evolution from the packed multiply and packed add of PA-PackedData for performance in DSP) and the explicitly taught horizontal-add instruction of PA-Filtering, applied to the known problem of efficient integer transforms for content data, would have been obvious to a POSITA at the time of the invention. The predictable result would be a more efficient implementation of integer transforms, fulfilling a recognized need in multimedia processing.