Obviousness — US Patent 10721066

US Patent 10721066 describes methods for capturing image and audio information, processing it to create searchable tags, and storing it either locally or remotely. The patent's priority date is September 30, 2002. An analysis under 35 U.S.C. § 103 for obviousness considers whether the claimed invention would have been obvious to a person having ordinary skill in the art (POSITA) at the time of the invention, given the prior art.

The background of US10721066 highlights several problems: the difficulty of manually describing and indexing numerous media files, the challenge of transferring files between computers, limited access to locally stored files, and the loss of searchable tags when media files are shared. The claimed invention purports to address these issues through automated tagging, remote storage, and secure transmission.

Based on a review of prior art references cited within US10721066 and published before its priority date, many individual elements of the claims were known in the art. The motivation to combine these elements would stem from the desire to overcome the recognized problems associated with managing and utilizing digital multimedia, as articulated in the patent's background.

Obviousness Analysis for Independent Claims

Claim 1: Remote Storage Method

Claim 1 describes a method for capturing image and audio information; capturing location and time; combining the captured information into a composite data set; encrypting it; transmitting it to a remote network; and, at the remote location, decrypting it, converting the audio to text tags, creating image recognition tags, and storing those tags with the digital image and the captured data.
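To make the ordering of Claim 1's steps concrete, the following is a minimal Python sketch of the device-side and remote-side halves of the claimed method. It is not taken from US10721066 or any cited reference. The function names, the ZIP-based composite (an assumption echoing the ZIP example discussed below for the composite data set), and the injected `encrypt`, `decrypt`, `speech_to_text`, and `recognize` callables are hypothetical placeholders. `toy_xor` is a reversible stand-in so the sketch runs with the standard library alone; it is not encryption.

```python
import io
import json
import zipfile
from dataclasses import dataclass, field
from typing import Callable, Dict, List

# Hypothetical signatures for the injected steps; none come from the patent.
Cipher = Callable[[bytes], bytes]
SpeechToText = Callable[[bytes], str]
ImageRecognizer = Callable[[bytes], List[str]]


@dataclass
class StoredRecord:
    """What the remote facility keeps: image, captured data, and generated tags."""
    image: bytes
    audio: bytes
    metadata: Dict[str, object]
    text_tags: List[str] = field(default_factory=list)
    image_tags: List[str] = field(default_factory=list)


def toy_xor(data: bytes) -> bytes:
    """NOT encryption: a reversible placeholder so the sketch needs no third-party library."""
    return bytes(b ^ 0x5A for b in data)


def device_side(image: bytes, audio: bytes, lat: float, lon: float,
                timestamp: str, encrypt: Cipher) -> bytes:
    """Capture-device steps: combine everything into one composite data set, then encrypt."""
    buf = io.BytesIO()
    with zipfile.ZipFile(buf, "w", zipfile.ZIP_DEFLATED) as zf:
        zf.writestr("image.jpg", image)
        zf.writestr("audio.wav", audio)
        zf.writestr("meta.json", json.dumps({"lat": lat, "lon": lon, "time": timestamp}))
    return encrypt(buf.getvalue())  # the ciphertext is what would be transmitted


def remote_side(payload: bytes, decrypt: Cipher,
                speech_to_text: SpeechToText, recognize: ImageRecognizer) -> StoredRecord:
    """Remote-network steps: decrypt, unpack, generate tags, and store them together."""
    with zipfile.ZipFile(io.BytesIO(decrypt(payload))) as zf:
        image = zf.read("image.jpg")
        audio = zf.read("audio.wav")
        metadata = json.loads(zf.read("meta.json"))
    transcript = speech_to_text(audio)
    return StoredRecord(
        image=image,
        audio=audio,
        metadata=metadata,
        # Hypothetical simplification: each distinct lowercase word becomes a text tag.
        text_tags=sorted(set(transcript.lower().split())),
        image_tags=recognize(image),
    )


if __name__ == "__main__":
    sent = device_side(b"<jpeg bytes>", b"<wav bytes>", 40.7, -74.0,
                       "2002-06-01T12:00:00", encrypt=toy_xor)
    record = remote_side(sent, decrypt=toy_xor,
                         speech_to_text=lambda _audio: "Sunset at the beach",
                         recognize=lambda _img: ["ocean", "sky"])
    print(record.metadata, record.text_tags, record.image_tags)
```

Splitting the sketch into `device_side` and `remote_side` mirrors the claim's division of labor: tag generation happens only after decryption at the remote location, which is the arrangement the prior-art combination for Claim 1 has to reach.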

Primary Reference:

US6240212B1 (May 29, 2001): "Multimedia data capture, storage, and retrieval system." This patent teaches a system for capturing, storing, and retrieving multimedia data, which inherently involves a capture device (e.g., camera and microphone), processing captured information, and transmission to a remote storage facility on a network for storage and retrieval.

Combination with Secondary References:

Capturing Location and Time Information:
- US6078308A (June 20, 2000): "Digital camera with integrated global positioning system." This reference clearly teaches integrating GPS into a digital camera to capture location information. Time information is an intrinsic component of digital media capture.
- Motivation: It would have been obvious to a POSITA to integrate GPS and timestamping capabilities (as shown in US6078308A) into the multimedia capture device described in US6240212B1. This provides valuable contextual metadata that enhances the organization and searchability of multimedia files, directly addressing the problem of poorly organized computer files mentioned in the background of US10721066.
Combining into a Composite Data Set and Encryption:
- US6173076B1 (January 9, 2001): "Method and apparatus for secure digital image capture and transmission." This patent teaches secure digital image capture and transmission, implying the use of encryption.
- Motivation: Combining multiple data types (image, audio, location, time) into a single composite data set for efficient transmission and storage is a common data management technique (e.g., using compression algorithms like ZIP, as mentioned in US10721066). Securing this data during transmission to a remote location is a fundamental concern for data integrity and privacy. Therefore, it would have been obvious to a POSITA to combine the data into a composite set and encrypt it prior to transmission, utilizing known security methods as taught by US6173076B1, especially when transmitting over a network.
Converting Digital Audio to Text Tags and Creating Image Recognition Tags:
- US5710834A (January 20, 1998): "Method and apparatus for converting audible content to a retrievable electronic document." This patent directly teaches converting audio to text for searchable documents.
- US6148301A (November 14, 2000): "System and method for generating image metadata from audio input." This reference explicitly links audio input to the generation of image metadata.
- US6233342B1 (May 15, 2001): "System and method for object recognition and feature extraction." This patent teaches object recognition techniques that can be used to generate descriptive tags from images.
- US5812779A (September 22, 1998): "System and method for organizing and retrieving images based on image features and associated text." This patent demonstrates the use of image features and associated text for retrieval.
- Motivation: Given the stated problems of manual indexing and the loss of searchability when files are shared, it would have been obvious to a POSITA to automate the creation of searchable "context tags" at the remote storage facility (as part of the data conversion process of US6240212B1). This automation would leverage existing speech-to-text technology (US5710834A, US6148301A) for audio data and image recognition technology (US6233342B1) for visual data, while US5812779A shows that image features and associated text were already used together for retrieval. This addresses the recognized need for better organization and retrieval of large multimedia collections, a need US10721066 itself acknowledges when it states a preference for automated tagging "as image and voice recognition improve." Because that statement comes from the patent itself rather than the prior art, it corroborates the need but cannot by itself supply the motivation to combine.

Conclusion for Claim 1: Claim 1 would have been obvious over the combination of US6240212B1, US6078308A, US6173076B1, US5710834A, US6148301A, and US6233342B1, with the motivation to improve multimedia content management, searchability, and security.

Claim 6: Local Storage Method in Capture Device

Claim 6 describes a method for capturing image and audio, capturing location and time, converting audio to text tags, creating image recognition tags, and storing them with the digital image and captured data in the internal storage of the capture device.

Primary Reference:

US5602458A (February 11, 1997): "Rechargeable camera with image, audio and text data storage and retrieval capabilities." This patent describes a camera capable of storing images, audio, and associated text data, and retrieving them, all within the device's internal storage.

Combination with Secondary References:

Capturing Location and Time Information:
- US6078308A (June 20, 2000): "Digital camera with integrated global positioning system." This reference teaches integrating GPS into a digital camera to capture location information, with time information being a standard feature.
- Motivation: It would have been obvious to a POSITA to integrate GPS and timestamping into a multimedia capture device like that of US5602458A to automatically enrich locally stored media with contextual metadata, thus improving local organization and retrieval.
Converting Digital Audio to Text Tags and Creating Image Recognition Tags:
- US5710834A (January 20, 1998): "Method and apparatus for converting audible content to a retrievable electronic document." This teaches audio-to-text conversion for searchable documents.
- US5546145A (August 13, 1996): "Camera on-board voice recognition." This patent explicitly teaches voice recognition on the camera itself.
- US6148301A (November 14, 2000): "System and method for generating image metadata from audio input." This directly teaches generating image metadata from audio input.
- US6233342B1 (May 15, 2001): "System and method for object recognition and feature extraction." This patent teaches object recognition, which can generate image-based tags.
- Motivation: Since US5602458A already provides for storing "text data" with images and audio, automating the generation of that text data would have been an obvious improvement. A POSITA would have been motivated to use known speech-to-text technologies (US5710834A, US5546145A, US6148301A) for captured audio and image recognition technology (US6233342B1) for captured images to create searchable tags automatically on the device. This addresses the problem of manual indexing and yields richer tags with less manual effort, enhancing local search and retrieval of media files. US10721066 itself suggests placing "the ability to enter tags on the data capture device itself" using "voice recognition software," although, as part of the patent's own disclosure, this statement corroborates rather than supplies the motivation to combine.

Conclusion for Claim 6: Claim 6 would have been obvious over the combination of US5602458A, US6078308A, US5710834A, US5546145A, US6148301A, and US6233342B1, motivated by the desire to automate local multimedia tagging and improve on-device search and organization.

Claim 13: Local Storage Method with Two Data Converters

Claim 13 is very similar to Claim 6, further specifying a "first data converter" for audio conversion and a "second data converter" for converting audio to text tags and creating image recognition tags.

Primary Reference:

US5602458A (February 11, 1997): "Rechargeable camera with image, audio and text data storage and retrieval capabilities." As with Claim 6, this provides the foundational camera with internal multimedia and text storage.

Combination with Secondary References:

Capturing Location and Time Information:
- US6078308A (June 20, 2000): "Digital camera with integrated global positioning system." This teaches integrating GPS into a digital camera.
- Motivation: As explained for Claim 6, it would have been obvious to a POSITA to incorporate GPS and timestamping into the capture device of US5602458A to enhance the contextual information of locally stored media.
Explicit "Second Data Converter" for Text and Image Recognition Tags:
- US5710834A (January 20, 1998): "Method and apparatus for converting audible content to a retrievable electronic document." This describes an audio-to-text converter, which functions as a "second data converter" for processing audio into text tags.
- US6233342B1 (May 15, 2001): "System and method for object recognition and feature extraction." This provides the image recognition capability for generating tags.
- US6148301A (November 14, 2000): "System and method for generating image metadata from audio input." This directly supports generating metadata from audio.
- Motivation: While US5602458A supports text data storage, specifying a "second data converter" for generating text tags from audio and image recognition tags simply describes a modular implementation of known functionality within a capture device. It would have been obvious to a POSITA to employ distinct processing modules or "converters" (in hardware or software), one for the initial audio conversion and another for speech-to-text conversion (as in US5710834A) and image recognition (as in US6233342B1), to create searchable tags automatically. This modular approach is a standard engineering design choice for clarity, efficiency, and maintainability; it further addresses the problem of manual indexing and improves local searchability.
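The modular-design argument can be illustrated with a minimal Python sketch in which the two converters are separate components wired together by one capture routine. What Claim 13's "first data converter" does beyond "audio conversion" is not spelled out in this analysis; the sketch assumes, purely for illustration, that it packs raw captured samples into WAV-encoded digital audio using the standard-library `wave` module. The class names, the injected `speech_to_text` and `recognize` callables, and the dict standing in for internal storage are all hypothetical.

```python
import io
import wave
from dataclasses import dataclass
from typing import Callable, Dict, List, Sequence


class FirstDataConverter:
    """Hypothetical first converter: packs raw 16-bit mono samples into WAV bytes."""

    def __init__(self, sample_rate: int = 8000) -> None:
        self.sample_rate = sample_rate

    def convert(self, samples: Sequence[int]) -> bytes:
        frames = b"".join(s.to_bytes(2, "little", signed=True) for s in samples)
        buf = io.BytesIO()
        with wave.open(buf, "wb") as w:
            w.setnchannels(1)
            w.setsampwidth(2)  # 2 bytes per sample = 16-bit audio
            w.setframerate(self.sample_rate)
            w.writeframes(frames)
        return buf.getvalue()


@dataclass
class SecondDataConverter:
    """Hypothetical second converter: derives searchable tags from audio and image."""
    speech_to_text: Callable[[bytes], str]
    recognize: Callable[[bytes], List[str]]

    def tags(self, digital_audio: bytes, image: bytes) -> List[str]:
        words = self.speech_to_text(digital_audio).lower().split()
        return sorted(set(words) | set(self.recognize(image)))


def capture_and_store(storage: Dict[str, dict], key: str, image: bytes,
                      samples: Sequence[int], metadata: Dict[str, object],
                      first: FirstDataConverter, second: SecondDataConverter) -> None:
    """Run both converters and keep everything together in on-device storage (a dict here)."""
    digital_audio = first.convert(samples)
    storage[key] = {
        "image": image,
        "audio": digital_audio,
        "metadata": metadata,  # e.g. location and time captured with the image
        "tags": second.tags(digital_audio, image),
    }
```

Folding both classes into a single component would not change what ends up in storage, which mirrors the design-choice argument above: the split is an organizational decision rather than a new function.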

Conclusion for Claim 13: Claim 13 would have been obvious over the combination of US5602458A, US6078308A, US5710834A, US6148301A, and US6233342B1. The division into "first" and "second" data converters merely reflects a conventional design choice for implementing known functions within a system, which would have been obvious to a POSITA.