Prior art — US Patent 12236947

Prior Art Analysis for U.S. Patent 12,236,947

An analysis of the prior art cited during the prosecution of U.S. Patent 12,236,947, "Flexible-format voice command," reveals several key references that the patent examiner considered. Reviewing them helps clarify what the United States Patent and Trademark Office (USPTO) treated as the novel contribution of the '947 patent. The sections below describe the most relevant cited references and their potential relationship to the claims of the '947 patent.

Anticipation under 35 U.S.C. § 102 requires that a single prior art reference disclose each and every element of a claimed invention. For each cited reference, the analysis identifies the claims it most plausibly bears on and notes which claim elements it does not appear to disclose, reflecting the examiner's likely considerations.

U.S. Patent Application Publication No. US2013/0297319A1

Full Citation: Kim, Yongsin. "Mobile device having at least one microphone sensor and method for controlling the same." U.S. Patent Application Publication No. US2013/0297319A1, published November 7, 2013. Filed May 1, 2012.
Brief Description: This patent application describes a mobile device that uses a microphone to detect a user's voice command. The system can activate and control functions based on the recognized voice command. It discusses using a voice trigger to initiate voice recognition.
Potential Anticipation of Claims: This reference was likely considered in relation to the foundational aspects of voice command processing.
- Claims 1 and 18: The '319 application describes receiving an audio input and causing a system to act on a command within that input. However, it does not appear to disclose simultaneously receiving and processing a video input to identify a visual characteristic of the user and thereby determine the user's intent, which is a key limitation of claims 1 and 18 of the '947 patent.
- Claim 17: Similarly, while the '319 application deals with voice commands, it does not seem to incorporate the state of a dialog between the user and the system to determine if an utterance is a command.

U.S. Patent Application Publication No. US2019/0069017A1

Full Citation: "Methods and systems for enhancing set-top box capabilities." U.S. Patent Application Publication No. US2019/0069017A1, published February 28, 2019. Filed August 31, 2017. Assignee: Rovi Guides, Inc.
Brief Description: This application details methods for enhancing the functionality of a set-top box, including through voice commands. It discusses receiving user inputs, which can include voice, to control the device.
Potential Anticipation of Claims: This reference addresses voice control in a specific consumer electronics context.
- Claims 1 and 18: The '017 application discloses receiving audio commands to control a system. However, like the '319 application, it does not appear to teach the combination of audio and video processing in which a visual characteristic of the user is identified to help determine whether an utterance is a command.
- Claim 17: The concept of using the state of a dialog is not a central feature of the '017 application's disclosure in the way it is claimed in the '947 patent.

U.S. Patent No. 10,388,272B1

Full Citation: "Training speech recognition systems using word sequences." U.S. Patent No. 10,388,272B1, issued August 20, 2019. Filed December 4, 2018. Assignee: Sorenson IP Holdings, LLC.
Brief Description: This patent focuses on the training of speech recognition systems. It describes methods for improving the accuracy of these systems by using specific word sequences during the training process.
Potential Anticipation of Claims: This patent is relevant to the underlying speech recognition technology but less so to the specific multi-modal command determination method of the '947 patent. Its focus is on the training of the language model, which is a component of the system described in the '947 patent but not the core of the independent claims. It does not appear to disclose the claimed method of using combined audio and video inputs at the time of command issuance to determine user intent.

U.S. Patent Application Publication No. US2020/0310842A1

Full Citation: "System for User Sentiment Tracking." U.S. Patent Application Publication No. US2020/0310842A1, published October 1, 2020. Filed March 27, 2019. Assignee: Electronic Arts Inc.
Brief Description: This application describes a system for tracking user sentiment, which can involve analyzing voice data to determine a user's emotional state. This can be used to adapt a system's behavior.
Potential Anticipation of Claims:
- Claims 1, 8, and 18: This reference is particularly relevant because it discusses analyzing user states, which could be interpreted to include visual cues. However, the '842 application focuses primarily on sentiment analysis, not on determining whether an utterance is a "command directed to a system" from combined audio and video inputs, which includes identifying a "visual characteristic associated with the user uttering the first utterance." The '947 patent's claims are more specific in applying this multi-modal analysis to the command-intent problem.
- Claim 17: The use of dialog state is not a primary teaching of this reference in the context of identifying a command.

U.S. Patent Application Publication No. US2020/0329297A1

Full Citation: "Automated control of noise reduction or noise masking." U.S. Patent Application Publication No. US2020/0329297A1, published October 15, 2020. Filed April 12, 2019. Assignee: Bose Corporation.
Brief Description: This application is focused on audio processing, specifically the control of noise reduction and masking technologies. It may involve analyzing audio to distinguish speech from background noise.
Potential Anticipation of Claims: While this reference deals with advanced audio processing, it does not appear to disclose the multi-modal, intent-determining aspects of the '947 patent's independent claims. Its teachings are ancillary to the core invention claimed in the '947 patent.

In summary, the cited prior art establishes the background for voice command and speech recognition technologies, but none of the examined references appears to disclose the full combination of features recited in the independent claims of U.S. Patent 12,236,947. Two features stand out as missing: the use of video input to identify a user's visual characteristics, in conjunction with audio processing, to determine whether an utterance is a system-directed command; and the further use of dialog context for that determination. This suggests that, in the examiner's view, the novelty of the '947 patent resided in this specific multi-modal approach to discerning user intent.
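
The single-reference rule of § 102 can be illustrated with a small claim-chart sketch. The element labels below are simplified paraphrases of the limitations discussed in this analysis, not the literal claim language, and the disclosed-element sets are assumptions that encode only what this analysis says about the '319 and '017 applications. The function names are hypothetical.

```python
from typing import Dict, Set

# Assumption: simplified labels paraphrasing the limitations discussed above;
# these are NOT the literal claim elements of the '947 patent.
CLAIMS: Dict[str, Set[str]] = {
    "1/18": {"audio_command", "video_visual_characteristic"},
    "17": {"audio_command", "dialog_state"},
}

# Assumption: elements each reference appears to disclose, per this analysis.
REFERENCES: Dict[str, Set[str]] = {
    "US2013/0297319A1": {"audio_command"},
    "US2019/0069017A1": {"audio_command"},
}


def missing_elements(claim: Set[str], disclosed: Set[str]) -> Set[str]:
    """Return the claim elements the reference does not disclose."""
    return claim - disclosed


def anticipates(claim: Set[str], disclosed: Set[str]) -> bool:
    """A single reference anticipates only if no claim element is missing."""
    return not missing_elements(claim, disclosed)


if __name__ == "__main__":
    for ref, disclosed in REFERENCES.items():
        for claim_id, elements in CLAIMS.items():
            gaps = sorted(missing_elements(elements, disclosed))
            verdict = anticipates(elements, disclosed)
            print(f"{ref} vs claim {claim_id}: anticipates={verdict}, missing={gaps}")
```

Under these assumed inputs, each reference leaves at least one element of every charted claim undisclosed, which matches the conclusion above that no single reference appears to anticipate the independent claims. The remaining references could be charted the same way once their disclosed elements are identified.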