Obviousness — US Patent 9832017

Obviousness Analysis under 35 U.S.C. § 103 for US9832017

This analysis considers independent claims 1, 6, and 13 of US Patent 9832017 and evaluates whether their subject matter would have been obvious to a person having ordinary skill in the art (POSITA) as of the patent's priority date of September 30, 2002, in light of the identified prior art references.

A POSITA in 2002-2003 would have been familiar with digital multimedia capture (cameras, microphones), digital data processing, networking, database management, and the then-emerging technologies of speech-to-text conversion and image recognition. The problems described in the background of US9832017, namely the difficulty of manually indexing a vast number of digital media files and the need for improved search and retrieval, would have strongly motivated a POSITA to seek automated solutions for content organization and tagging.

Combination of Prior Art for Obviousness

The following combination of prior art references would render claims 1, 6, and 13 of US9832017 obvious:

Primary Reference: US7778438B2 (Malone): "Method for Multi-Media Recognition, Data Conversion, Creation of Metatags, Storage and Search Retrieval"
Secondary Reference 1: US5995936A (Brais): "Report generation system and method for capturing prose, audio, and video by voice command and automatically linking sound and image to formatted text locations"
Secondary Reference 2: US5737491A (Eastman Kodak): "Electronic imaging system capable of image capture, local wireless transmission and voice recognition"
Secondary Reference 3: US6996251B2 (Malone): "Forensic Communication Apparatus and Method" (as a foundational system for secure capture and transmission)
Secondary Reference 4: US5828809A (Matsushita): "Method and apparatus for extracting indexing information from digital video data"

Motivation to Combine:

A POSITA would be motivated to combine these references to create an integrated system that automates the generation of searchable metadata for multimedia content, thereby addressing the challenges of managing and retrieving large digital media libraries (as articulated in the background of US9832017).

Motivation from US7778438B2 (Malone): This patent, an ancestor of US9832017, explicitly teaches "multi-media recognition, data conversion, creation of metatags, storage and search retrieval" [cite: The "Prior art" section of US9832017]. A POSITA would recognize its value as a framework for automated tagging: it specifically mentions a "speech-to-text algorithm" for audio files and "image recognition and identification" for still or moving images [cite: The "Description" section of US9832017].
Motivation from Brais (US5995936A): Brais describes capturing "audio, and video by voice command" and "automatically linking sound and image to formatted text locations" [cite: The "Prior art" section of US9832017]. This provides a clear teaching and motivation for using captured audio to generate textual descriptions (via speech-to-text) and associating these with visual media, reinforcing and elaborating on the audio-to-text conversion aspect of US7778438B2.
Motivation from Eastman Kodak (US5737491A): Eastman Kodak discloses an "electronic imaging system capable of image capture, local wireless transmission and voice recognition" [cite: The "Prior art" section of US9832017]. This provides the motivation to integrate image capture (camera) and audio capture (microphone for voice recognition) into a single device, capable of both local processing and transmission, forming the basis of the "capture device" in US9832017.
Motivation from US6996251B2: As a foundational patent in the same family, US6996251B2 teaches the capture of information, creation of a "composite data set" with "meta data," and secure transmission to a "storage facility" [cite: The "Prior art" section of US9832017]. This provides the overall system architecture for transmitting and storing multimedia and associated metadata, whether processed locally or remotely.
Motivation from Matsushita (US5828809A): Matsushita teaches "extracting indexing information from digital video data" [cite: The "Prior art" section of US9832017]. This directly provides the motivation for using image recognition techniques to generate descriptive tags from the visual content itself, complementing the audio-derived tags and further enriching the searchability of the media.

The combination of these known elements and techniques, driven by the desire to automate and enhance multimedia organization and retrieval, would have been apparent to a POSITA.

Obviousness of Independent Claims

Claim 1: Remote Storage and Processing System

Claim 1 describes a system where image and audio information are captured, stored digitally within a capture device, transmitted to a remote network location, and then processed at the remote location to create text and image recognition context tags, which are associated with the digital image and stored in a database.

Microphone and Camera (Capture Device): Eastman Kodak (US5737491A) explicitly teaches an "electronic imaging system capable of image capture... and voice recognition" [cite: The "Prior art" section of US9832017], covering the combined microphone and camera. US6996251B2 also describes a "capture device" for gathering "still pictures, moving pictures, audio" [cite: The "Description" section of US9832017].
Processing and Storing Digitally in Capture Device: Both Eastman Kodak and US6996251B2 imply local storage of captured digital media. Processing and storing audio through the "first data converter", and processing and storing images in a digital format in the camera, are standard functions of such capture devices.
Transmitting to Remote Location: US6996251B2 details a "capture device" transmitting "meta data" and a "composite element" to a "storage facility" over a "communications medium" [cite: The "Description" section of US9832017], covering the transmission of digital audio and images to a remote network location. Eastman Kodak also mentions "local wireless transmission" [cite: The "Prior art" section of US9832017].
Remote System with Receiver and Database: US6996251B2 clearly teaches a "storage facility" at a remote node on the network with a receiver and a "mass storage" mechanism (database) [cite: The "Description" section of US9832017]. An additional reference cited in US9832017, Carnegie Mellon (US5835667A), further describes a "searchable digital video library" [cite: The "Prior art" section of US9832017].
System Data Converter for Tag Creation (Audio-to-Text & Image Recognition):
- Audio-to-text tag: Brais (US5995936A) teaches "automatically linking sound and image to formatted text locations" [cite: The "Prior art" section of US9832017], which implies speech-to-text conversion to generate text from audio. US7778438B2 explicitly discloses a "speech-to-text algorithm" for creating context tags from audio files [cite: The "Description" section of US9832017].
- Image recognition tag: Matsushita (US5828809A) describes "extracting indexing information from digital video data" [cite: The "Prior art" section of US9832017], directly covering image recognition for tagging. US7778438B2 also explicitly teaches "image recognition and identification" to create context tags from still or moving images [cite: The "Description" section of US9832017].
- Associating tags with digital image: Brais teaches "linking sound and image to formatted text" [cite: The "Prior art" section of US9832017]. US7778438B2 describes storing data elements with a "set of searchable tags specific to that image, video, audio or other media" [cite: The "Description" section of US9832017].
Database storing digital image in association with tags: This is taught by US7778438B2's storage of "data element 102, the context element 110, and meta data 106 that now includes a set of searchable tags" [cite: The "Description" section of US9832017], and by Carnegie Mellon's searchable digital video library [cite: The "Prior art" section of US9832017].
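
The element mapping above can be read as a simple data flow: capture, transmit, convert to tags at the remote node, and store the image with its tags. The following is a minimal sketch of that flow, offered only as an illustration of the architecture as summarized in this analysis. It is not the claim language and not the implementation of any cited reference. All names (Capture, TaggedImage, RemoteSystem, speech_to_text, recognize_image) are hypothetical, and the two recognition functions are trivial stand-ins for real speech-to-text and image recognition engines.

```python
from dataclasses import dataclass, field


@dataclass
class Capture:
    """Image and audio stored digitally on the capture device."""
    image: bytes
    audio: bytes


@dataclass
class TaggedImage:
    """A digital image stored in association with its context tags."""
    image: bytes
    tags: set[str] = field(default_factory=set)


def speech_to_text(audio: bytes) -> list[str]:
    # Stand-in for a real speech-to-text engine (assumption): the audio
    # bytes are treated as an already-known UTF-8 transcript.
    return audio.decode("utf-8").lower().split()


def recognize_image(image: bytes) -> list[str]:
    # Stand-in for a real image recognition engine (assumption): a fixed
    # lookup from byte signatures to labels.
    known = {b"\x01": "face", b"\x02": "vehicle"}
    return [label for signature, label in known.items() if signature in image]


class RemoteSystem:
    """Receiver, system data converter, and database at the remote node."""

    def __init__(self) -> None:
        self.database: list[TaggedImage] = []

    def receive(self, capture: Capture) -> TaggedImage:
        # Create audio-derived and image-derived tags, associate them with
        # the image, and store the result in the database.
        tags = set(speech_to_text(capture.audio)) | set(recognize_image(capture.image))
        record = TaggedImage(image=capture.image, tags=tags)
        self.database.append(record)
        return record

    def search(self, term: str) -> list[TaggedImage]:
        # Retrieve every stored image whose tag set contains the term.
        return [record for record in self.database if term.lower() in record.tags]
```

In this sketch, a call such as RemoteSystem().receive(Capture(image=..., audio=...)) plays the role of the transmission step, and search() stands for retrieval from the database by tag.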

Therefore, a POSITA would find it obvious to combine the functionalities of multimedia capture and transmission (Eastman Kodak, US6996251B2) with the automated multimedia recognition and tagging capabilities (Brais, Matsushita, US7778438B2) within a remote storage and retrieval system (US6996251B2, Carnegie Mellon) to solve the problem of organizing and searching digital media.

Claim 6: Local Storage and Processing System within a Capture Device

Claim 6 describes a capture device with internal storage where captured image and audio are associated. A media data converter within the device converts the audio to text tags and performs image recognition for image tags, which are then associated with the digital image and stored internally. Claim 10 further adds transmission of this locally tagged content to a remote database.

Capture Device with Internal Storage, Microphone, and Camera: As discussed for Claim 1, Eastman Kodak (US5737491A) provides an electronic imaging system with image capture and voice recognition capabilities [cite: The "Prior art" section of US9832017], implying an integrated device. US9832017 itself notes that "Variations of the system include placing the ability to enter tags on the data capture device itself... by means of a keypad, a touch screen or voice recognition software" [cite: The "Detailed Description" section of US9832017], which suggests that local tagging was regarded as a straightforward variation. Because this statement is the patent's own disclosure rather than prior art, it serves here only as corroboration of the teachings of the cited references.
Processing and Storing Digitally in Internal Storage: Standard for digital capture devices.
Combiner for Association: Brais's "automatically linking sound and image" [cite: The "Prior art" section of US9832017] demonstrates the concept of associating multimedia elements. US6996251B2 describes creating a "composite data set" [cite: The "Description" section of US9832017].
Media Data Converter for Tag Creation (within Capture Device):
- Given Eastman Kodak's "voice recognition" capability within an "electronic imaging system" [cite: The "Prior art" section of US9832017], and the explicit teachings of speech-to-text (Brais, US7778438B2) and image recognition (Matsushita, US7778438B2) for tag generation, it would have been obvious for a POSITA to implement these conversion steps locally on a capture device. The motivation would have been to enhance the user experience by providing immediate tagging, to reduce the bandwidth needed to transmit raw audio/video for remote processing, and to enable offline search. The increasing processing power of portable devices by the priority date would have made such local processing feasible.
Internal storage storing with associated tags: This is a logical extension of local processing and tagging.
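
The bandwidth rationale in the mapping above can be illustrated with a small comparison of what must be transmitted under each design. This is a hedged sketch only: the function names (local_tags, payload_sizes) are hypothetical, the transcript and image labels are assumed to come from on-device recognition engines that are not modeled here, and the sketch assumes that the raw audio need not be transmitted once it has been converted to tags.

```python
import json


def local_tags(transcript: str, labels: list[str]) -> list[str]:
    """Merge audio-derived and image-derived tags on the device itself."""
    words = transcript.lower().split()
    return sorted(set(words) | {label.lower() for label in labels})


def payload_sizes(image: bytes, audio: bytes, tags: list[str]) -> dict[str, int]:
    """Compare bytes transmitted for remote tagging and for local tagging."""
    tag_bytes = len(json.dumps(tags).encode("utf-8"))
    return {
        # Remote tagging: the raw audio must travel with the image.
        "remote_tagging": len(image) + len(audio),
        # Local tagging (assumption): only the text tags travel with the image.
        "local_tagging": len(image) + tag_bytes,
    }
```

Whether the saving is significant in practice depends on the audio encoding and on whether the original audio is retained and sent anyway, which this sketch does not address.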

Claim 10 (dependent on Claim 6) adds the transmission of the locally processed and tagged data to a remote database. This is explicitly taught by US6996251B2, which involves a "capture device" sending data to a "storage facility" [cite: The "Description" section of US9832017].

Therefore, combining the integrated capture device with local processing (Eastman Kodak, US9832017's own described variations) with the specific tagging methods (Brais, Matsushita, US7778438B2) would make Claim 6 obvious.

Claim 13: Local Capture and Tagging System

Claim 13 is very similar to Claim 6, also describing a system for capturing image and audio information with internal storage and a "second data converter" performing the audio-to-text and image recognition tagging locally, storing the digital image with associated tags in internal storage.

The obviousness analysis for Claim 6 directly applies to Claim 13. The "second data converter" functions identically to the "media data converter" in Claim 6, performing local audio-to-text and image recognition to create searchable tags. The motivation to implement these functionalities locally within the capture device (as discussed for Claim 6) remains the same.

In summary, the combination of these prior art references, particularly the explicit teachings of multimedia recognition and metatag creation in US7778438B2, the integrated capture and voice recognition capabilities of Eastman Kodak, and the automated linking of audio to text in Brais, would have made the claimed inventions of US9832017 obvious to a POSITA at the time of invention.