Obviousness — US Patent 9502025

To assess the obviousness of US patent 9502025 under 35 U.S.C. § 103, we must consider whether the claimed subject matter, as a whole, would have been obvious to a person having ordinary skill in the art (PHOSITA) at the time the invention was made (priority date: 2009-11-10), in light of the prior art. This involves identifying potential combinations of prior art references and articulating a reason why a PHOSITA would have combined them to arrive at the claimed invention.

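The patent itself incorporates by reference numerous co-pending applications and issued U.S. patents, largely from the original assignee, VoiceBox Technologies Corp., indicating a rich prior art landscape in natural language processing (NLP) and multi-modal voice services. Incorporation by reference does not by itself make a document prior art, however: each reference must independently predate the 2009-11-10 priority date. Because these references appear to be commonly assigned, pre-AIA 35 U.S.C. § 103(c) must also be considered. Commonly owned subject matter that qualifies as prior art only under § 102(e), (f), or (g) cannot preclude patentability under § 103, so references that were published or issued before the priority date, and especially more than one year before it (pre-AIA § 102(b)), are the strongest candidates.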

Key Prior Art References Cited in US9502025:

US 12/389,678: "System and Method for Processing Multi-Modal Device Interactions in a Natural Language Voice Services Environment" (Filed Feb. 20, 2009). Teaches multi-modal interactions, voice-click modules, ASR, and conversational language processors (CLP) for intent determination.
US 12/127,343: "System and Method for an Integrated, Multi-Modal, Multi-Device Natural Language Voice Services Environment" (Filed May 27, 2008). Teaches an integrated, multi-modal, multi-device natural language voice services environment, including a constellation model for shared knowledge across devices.
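US 7,640,160 (issued from US 11/197,504): "Systems and Methods for Responding to Natural Language Speech Utterance" (Issued Dec. 29, 2009; filed Aug. 4, 2005). Teaches multi-pass speech recognition (ASR) with primary and secondary engines and dynamic recognition grammars. Because it issued after the priority date, it would be relied on as of its filing date under pre-AIA § 102(e), or as of any earlier publication.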
US 11/212,693: "Mobile Systems and Methods of Supporting Natural Language Human-Machine Interactions" (Filed Aug. 29, 2005). Teaches intent determination using knowledge-enhanced speech recognition, long-term and short-term semantic knowledge, and personalized/general/environmental cognitive models.
US 11/580,926: "System and Method for a Cooperative Conversational Voice User Interface" (Filed Oct. 16, 2006). Teaches engaging users in cooperative conversations to resolve intent.
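US Provisional Patent Application Ser. No. 61/259,827: "System and Method for Hybrid Processing in a Natural Language Voice Services Environment" (Filed Nov. 10, 2009). Teaches hybrid processing where multiple devices cooperatively determine intent and process multi-modal interactions. Note, however, that this provisional was filed on the 2009-11-10 priority date itself; because it does not predate the claimed invention, it cannot serve as prior art, and its hybrid processing teachings would need to be found in earlier-filed or earlier-published references.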
US 7,398,209 (issued from US 10/452,147): "Systems and Methods for Responding to Natural Language Speech Utterance" (Issued Jul. 8, 2008; filed Jun. 2, 2003). Explicitly discusses "natural language services and subscriptions" and "purchase options" for services.
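As a first-pass screen, the filing dates listed above can be compared against the 2009-11-10 priority date. The Python sketch below encodes only the dates stated in this section. It does not decide under which subsection of pre-AIA § 102 a reference qualifies, does not account for publication dates, and does not address common ownership, so it is a sanity check under those stated limits rather than a legal determination.

```python
from datetime import date

# Priority date of US9502025, as stated above.
PRIORITY_DATE = date(2009, 11, 10)

# Filing dates as listed above (issue dates are not used in this screen).
REFERENCES = {
    "US 12/389,678": date(2009, 2, 20),
    "US 12/127,343": date(2008, 5, 27),
    "US 7,640,160": date(2005, 8, 4),
    "US 11/212,693": date(2005, 8, 29),
    "US 11/580,926": date(2006, 10, 16),
    "US 61/259,827": date(2009, 11, 10),
    "US 7,398,209": date(2003, 6, 2),
}


def predates_priority(filed: date, priority: date = PRIORITY_DATE) -> bool:
    """A reference must be filed strictly before the priority date;
    a same-day filing does not predate the claimed invention."""
    return filed < priority


def screen(refs: dict[str, date]) -> tuple[list[str], list[str]]:
    """Split references into (filed before priority date, not filed before it)."""
    candidates, excluded = [], []
    for name, filed in refs.items():
        (candidates if predates_priority(filed) else excluded).append(name)
    return candidates, excluded


if __name__ == "__main__":
    candidates, excluded = screen(REFERENCES)
    print("Filed before priority date:", candidates)
    print("Not filed before priority date:", excluded)
```

Run as written, only the provisional application filed on the priority date itself fails the strict before-the-priority-date test.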

Motivation for a Person Having Ordinary Skill in the Art (PHOSITA) to Combine References:

The problem statement in US9502025 highlights limitations of existing voice user interfaces, stating they "fall short in utilizing information distributed across different domains, devices, and applications in order to resolve natural language voice-based inputs" and "suffer from being constrained to a finite set of applications" (Definitions section). A PHOSITA would be motivated to overcome these limitations by extending the capabilities of existing natural language voice service environments to new, user-desired applications and services, especially those involving digital content and e-commerce.

The patent itself describes its invention as generally operating "in a voice services environment that includes one or more electronic devices that can receive multi-modal natural language device interactions" and mentions "cooperating with other devices in the hybrid processing environment to search for or otherwise identify the content requested for dedication" (Definitions section). This suggests that the patentee treats the core natural language processing and hybrid environment components as known, with the asserted contribution lying in their application to a new service. The patent's own description is not itself prior art, however, so each of these components must still be located in a reference that predates the priority date.

Obviousness Analysis of Independent Claims:

Independent Claim 1: Computer-implemented method

This claim describes a method involving:

Receiving a multi-modal interaction with a content dedication request.
ASR generating a first interpretation of the utterance.
A conversational language processor determining intent using various inputs (non-speech, prior requests, knowledge).
Identifying the content for dedication based on intent.
Receiving a dedication message (utterance) from the user.
Identifying the recipient.
Sending a request to a content dedication system for transaction processing and delivery.
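To make the claimed sequence concrete, the following Python sketch arranges Claim 1's steps as a pipeline of conventional components. All names are hypothetical; the ASR and conversational language processor are trivial stand-ins, and the order in which determine_intent weighs a non-speech selection, a catalog match, and prior requests is an illustrative assumption rather than something the claim or the cited references prescribe.

```python
from dataclasses import dataclass, field
from typing import Callable, Optional


@dataclass
class MultiModalInteraction:
    utterance: str                                       # spoken dedication request
    non_speech: list[str] = field(default_factory=list)  # e.g. an item the user tapped


@dataclass
class DedicationOrder:
    content: str
    message: str
    recipient: str


def asr(utterance: str) -> str:
    """Stand-in ASR: returns a normalized 'first interpretation' of the utterance."""
    return " ".join(utterance.lower().split())


def determine_intent(interpretation: str, non_speech: list[str],
                     prior_requests: list[str], catalog: list[str]) -> Optional[str]:
    """Stand-in conversational language processor resolving the content meant.

    Assumed order of evidence: an explicit non-speech selection, then a catalog
    title named in the utterance, then the most recent prior request.
    """
    if non_speech:
        return non_speech[0]
    for title in catalog:
        if title.lower() in interpretation:
            return title
    return prior_requests[-1] if prior_requests else None


def dedication_flow(interaction: MultiModalInteraction,   # element 1: received interaction
                    message: str,                         # element 5: assumed already transcribed
                    recipient_name: str, address_book: dict[str, str],
                    prior_requests: list[str], catalog: list[str],
                    send: Callable[[DedicationOrder], str]) -> str:
    interpretation = asr(interaction.utterance)                   # element 2
    content = determine_intent(interpretation, interaction.non_speech,
                               prior_requests, catalog)           # elements 3-4
    if content is None:
        raise ValueError("could not identify content to dedicate")
    recipient = address_book[recipient_name]                      # element 6
    order = DedicationOrder(content, message, recipient)
    return send(order)                                            # element 7
```

The sketch is structural: each step consumes the output of a component that the Reasoning below attributes to a prior-art reference, which is the basis for characterizing the claim as a combination of known elements.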

Combination for Obviousness:
A PHOSITA would find Independent Claim 1 obvious by combining the teachings of US 12/127,343, US 12/389,678, US 7,640,160, US 11/212,693, US 11/580,926, US Provisional Patent Application Ser. No. 61/259,827, and US 7,398,209.

Reasoning:

Receiving multi-modal interaction and ASR (elements 1 & 2): US 12/389,678 and US 12/127,343 extensively describe systems for receiving multi-modal device interactions, including natural language utterances and non-voice inputs, and processing them in a voice services environment. US 7,640,160 specifically teaches Automatic Speech Recognizers (ASR) to generate interpretations of utterances.
Conversational language processor determining intent (element 3): US 12/389,678 and US 11/212,693 detail conversational language processors that analyze utterances and accompanying non-voice interactions to determine intent, leveraging shared knowledge, prior interactions (cognitive models), and context. US 11/580,926 further teaches engaging users in cooperative conversations to resolve intent or ambiguity. The patent itself states the conversational language processor "may determine a most likely context for the interaction from the preliminary interpretation of the utterance, any accompanying non-speech inputs in the multi-modal interaction that relate to the utterance, contexts associated with prior requests, short-term and long-term shared knowledge, or any other suitable information for interpreting the multi-modal interaction" (Definitions section).
Identifying content for dedication (element 4): The conversational language processor in the cited prior art is shown to have capabilities for searching data repositories and cooperating with other devices in a hybrid environment to retrieve information or identify content (e.g., US 12/127,343 for "content, services, applications, intent determination capabilities" and US Provisional Patent Application Ser. No. 61/259,827 for "cooperative hybrid processing"). The patent states, "the conversational language processor may search one or more data repositories that contain content information to identify content matching criteria contained in the content dedication request" (Definitions section). Extending a known content search capability to identify content for dedication is a straightforward application.
Receiving dedication message and identifying recipient (elements 5 & 6): These are specific inputs for the dedication service. Given a conversational voice user interface (US 11/580,926), a PHOSITA would find it obvious to prompt a user for and receive a natural language utterance (the dedication message) and recipient information (e.g., from an address book context, as described in US9502025's definitions). The patent states, "the content dedication application may further prompt the user to identify a recipient of the dedication, wherein the user may provide any suitable multi-modal input that includes information identifying the recipient of the content dedication" (Definitions section).
Sending request to content dedication system for transaction processing and delivery (element 7): US 7,398,209 discusses natural language services and subscriptions, including purchase options, which suggests transaction processing through a natural language interface. Combining the established NLP environment with known e-commerce capabilities for digital content (e.g., the billing system that US9502025's definitions include in the content dedication system) is common commercial practice. The "content dedication system" itself, as described in US9502025, is primarily a logical construct for coordinating these known functionalities (NLP, content identification, billing, delivery).
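The element-by-element mapping above can be captured as a simple claim chart. The Python sketch below records, for each element of Claim 1, the references relied on in this analysis, and reports (a) elements left without support once a given set of references is excluded and (b) elements that rest on a single reference. The set-of-strings encoding is an illustrative assumption, and the usage excludes US Provisional Ser. No. 61/259,827 on the assumption that it does not qualify because it was filed on the 2009-11-10 priority date.

```python
# Claim 1 elements mapped to the references relied on above (illustrative encoding).
CLAIM_1_CHART: dict[int, set[str]] = {
    1: {"US 12/389,678", "US 12/127,343"},                   # receive multi-modal interaction
    2: {"US 7,640,160"},                                     # ASR first interpretation
    3: {"US 12/389,678", "US 11/212,693", "US 11/580,926"},  # CLP intent determination
    4: {"US 12/127,343", "US 61/259,827"},                   # identify content for dedication
    5: {"US 11/580,926"},                                    # receive dedication message
    6: {"US 11/580,926"},                                    # identify recipient
    7: {"US 7,398,209"},                                     # request to content dedication system
}


def unsupported(chart: dict[int, set[str]], excluded: set[str]) -> list[int]:
    """Elements with no remaining reference once `excluded` is removed."""
    return sorted(el for el, refs in chart.items() if not refs - excluded)


def sole_support(chart: dict[int, set[str]]) -> dict[int, str]:
    """Elements that rest on exactly one reference, mapped to that reference."""
    return {el: next(iter(refs)) for el, refs in chart.items() if len(refs) == 1}


if __name__ == "__main__":
    print(unsupported(CLAIM_1_CHART, excluded={"US 61/259,827"}))  # []
    print(sole_support(CLAIM_1_CHART))
```

With the provisional excluded, every element still has at least one supporting reference (element 4 remains supported by US 12/127,343), while elements 2, 5, 6, and 7 each depend on a single reference, so the teachings of those references carry the most weight.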

Motivation for Combination: A PHOSITA would be motivated to extend the existing natural language processing and multi-modal interaction capabilities (as taught by US 12/127,343, US 12/389,678, etc.) to offer new, commercially desirable services, such as facilitating the dedication of digital content. Once the content is purchasable, the reason to integrate transaction processing (as taught by US 7,398,209) is self-evident. The patent's own problem statement articulates the need for voice user interfaces to move beyond being "constrained to a finite set of applications" and to utilize "information distributed across different domains, devices, and applications," directly motivating such an extension. The hybrid processing environment (US Provisional Patent Application Ser. No. 61/259,827) provides the architectural framework for such an extension.

Independent Claim 12: System

This claim describes a system including:

A voice-enabled client device to receive multi-modal NL interactions.
A content dedication system with a processor and memory, configured to:
- Recognize a request to dedicate content.
- Identify content from NL utterance.
- Process a transaction.
- Use NL processing to customize content with dedication.
- Deliver customized content.
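The division of labor in Claim 12 can be sketched as a content dedication system composed of pluggable components. In the Python sketch below, the billing and delivery interfaces are hypothetical typing.Protocol classes, the keyword test in recognize_request stands in for natural language intent determination, and "customization" is reduced to attaching the dedication text; none of these names or behaviors come from US9502025.

```python
from dataclasses import dataclass
from typing import Optional, Protocol


class BillingSystem(Protocol):
    def charge(self, user: str, content_id: str, amount: float) -> bool: ...


class DeliveryChannel(Protocol):
    def send(self, recipient: str, payload: dict) -> None: ...


@dataclass
class ContentDedicationSystem:
    billing: BillingSystem
    delivery: DeliveryChannel
    catalog: dict[str, float]  # hypothetical: content title -> price

    def recognize_request(self, interpretation: str) -> bool:
        # Stand-in for NL intent determination: a simple keyword test.
        return "dedicate" in interpretation.lower()

    def identify_content(self, interpretation: str) -> Optional[str]:
        # Stand-in for content identification: first catalog title named in the request.
        text = interpretation.lower()
        return next((title for title in self.catalog if title.lower() in text), None)

    def handle(self, user: str, interpretation: str,
               dedication: str, recipient: str) -> str:
        if not self.recognize_request(interpretation):
            return "not a dedication request"
        content = self.identify_content(interpretation)
        if content is None:
            return "content not found"
        if not self.billing.charge(user, content, self.catalog[content]):  # process transaction
            return "payment declined"
        payload = {"content": content, "dedication": dedication}  # customize with dedication
        self.delivery.send(recipient, payload)                      # deliver to recipient
        return "delivered"
```

Using structural interfaces for billing and delivery mirrors the argument in this analysis that these are interchangeable, conventional services that the system merely coordinates.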

Combination for Obviousness:
A PHOSITA would find Independent Claim 12 obvious by combining the system architectures of US 12/127,343 (integrated, multi-modal, multi-device environment), US Provisional Patent Application Ser. No. 61/259,827 (hybrid processing with client devices, virtual router, and voice-enabled server), and the functional elements described in US 12/389,678, US 7,640,160, US 11/212,693, and US 7,398,209.

Reasoning:

Voice-enabled client device (element 1): US 12/127,343 and US 12/389,678 describe such devices capable of receiving multi-modal natural language interactions. FIG. 1 of US9502025 likewise illustrates an "exemplary voice-enabled device 100" as a component of the broader system, consistent with its being conventional.
Content dedication system (element 2): This system is depicted in FIG. 6 of US9502025 as comprising a virtual router 660, a voice-enabled server 640, and a billing system 638.
- Virtual Router and Voice-Enabled Server: The virtual router 260 and voice-enabled server 240, along with their functions in coordinating hybrid processing and natural language processing, are explicitly detailed in US Provisional Patent Application Ser. No. 61/259,827 and illustrated in FIG. 2 of US9502025 as part of the known "exemplary system for hybrid processing in a natural language voice service environment."
- Billing System: Integrating a billing system to process transactions is well known in e-commerce. US 7,398,209 teaches methods for "natural language services and subscriptions" that involve purchase options, which suggests an accompanying billing system. The patent itself states the content dedication system "may further include a billing system for processing transactions relating to the content dedication service" (Definitions section).
Instructions for content dedication (sub-elements):
- Recognize dedication request & Identify content: These capabilities flow directly from the ASR (US 7,640,160) and conversational language processor's intent determination (US 12/389,678, US 11/212,693) operating within the hybrid processing environment (US Provisional Patent Application Ser. No. 61/259,827).
- Process transaction: This is the function of the billing system, integrated with the natural language service as motivated above by US 7,398,209.
- Use NL processing to customize content with dedication: "Customizing" the content with a dedication utterance (e.g., inserting encoded audio, verbally annotating, or transcribing to text metadata) combines standard digital content manipulation (e.g., audio editing, metadata tagging) with the NL capabilities of recording an utterance and transcribing it (ASR from US 7,640,160). The patent describes, "the content dedication system may then insert the encoded audio corresponding to the dedication utterance within the dedicated content, verbally annotate the dedicated content with the encoded audio, and/or transcribe the dedication utterance into a textual annotation for the dedicated content" (Definitions section). These are known technical operations applied to the specific "dedication utterance" input.
- Deliver customized content: Content delivery is a well-established technical field. Systems for delivering digital content are ubiquitous.
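The three customization options quoted above (inserting the encoded dedication audio, verbally annotating the content with it, and transcribing it into a textual annotation) are ordinary media and metadata operations. The Python sketch below models them on a hypothetical, immutable Content record. Splicing inserted audio at the start of the content and treating "verbal annotation" as a separately attached audio track are assumptions made for illustration, and the transcript is taken as already produced by an ASR step.

```python
from dataclasses import dataclass, replace
from enum import Enum


class Mode(Enum):
    INSERT_AUDIO = "insert"          # splice the dedication audio into the content itself
    VERBAL_ANNOTATION = "annotate"   # attach the dedication audio alongside the content
    TEXT_ANNOTATION = "transcribe"   # store the transcribed dedication as metadata


@dataclass(frozen=True)
class Content:
    title: str
    segments: tuple[bytes, ...]                  # encoded audio chunks, in playback order
    annotations: tuple[bytes, ...] = ()          # separately attached audio notes
    metadata: tuple[tuple[str, str], ...] = ()   # (key, value) text tags


def customize(content: Content, dedication_audio: bytes,
              transcript: str, mode: Mode) -> Content:
    """Return a new Content carrying the dedication; the original is left unchanged."""
    if mode is Mode.INSERT_AUDIO:
        # Assumption: the dedication plays before the content.
        return replace(content, segments=(dedication_audio,) + content.segments)
    if mode is Mode.VERBAL_ANNOTATION:
        return replace(content, annotations=content.annotations + (dedication_audio,))
    if mode is Mode.TEXT_ANNOTATION:
        return replace(content, metadata=content.metadata + (("dedication", transcript),))
    raise ValueError(f"unsupported mode: {mode}")
```

Returning a new record rather than mutating the original reflects the idea that the dedicated copy sent to the recipient is distinct from the catalog item; the claim itself does not specify this.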

Motivation for Combination: A PHOSITA, aiming to build a more comprehensive and commercially viable natural language voice service (as motivated by the problem statement in US9502025), would find it obvious to integrate existing NLP components (ASR and the conversational language processor from the '160 patent and the '678 and '693 applications) and the hybrid processing architecture (from the '827 provisional) with a billing system (suggested by the '209 patent and general e-commerce knowledge) to offer a content dedication service. Customizing content by associating a recorded or transcribed dedication with it is a direct way to implement the "dedication" aspect of such a service using known techniques.

Independent Claim 19: Non-transitory computer readable medium

This claim recites a non-transitory computer-readable medium storing instructions that, when executed, perform the steps of:

Detecting a multi-modal device interaction with a content dedication request.
Identifying content from natural language utterances.
Processing transactions for the content.
Using natural language processing to customize content for recipients.
Delivering customized content to recipients.

Combination for Obviousness:
The obviousness analysis for Independent Claim 19 follows from those for Independent Claims 1 and 12. Its steps parallel the functions recited in Claim 12 (identifying content, processing a transaction, customizing content with the dedication, and delivering it), expressed as stored instructions rather than as a method or system. The prior art references (specifically US 12/127,343, US 12/389,678, US 7,640,160, US 11/212,693, US 11/580,926, US Provisional Patent Application Ser. No. 61/259,827, and US 7,398,209) teach all the underlying processing steps.

Reasoning:
The motivation remains the same: to extend sophisticated natural language processing and hybrid computing environments to a commercially appealing application like content dedication, incorporating transaction capabilities. Storing these instructions on a non-transitory computer-readable medium is a conventional practice for implementing computer-implemented methods and would be obvious to a PHOSITA.

Conclusion on Obviousness:

Given the extensive prior art cited within US9502025 itself, particularly patents and applications from the same assignee dealing with integrated, multi-modal, multi-device natural language voice services, hybrid processing, and even natural language services with subscriptions, the claimed invention appears to be an application of these known technologies to a new, but logically related, domain of "content dedication." A person having ordinary skill in the art, motivated by the stated problems of limited application scope in existing voice interfaces and the desire to expand commercial offerings, would have been motivated to combine these existing components and techniques to provide a natural language content dedication service, including the identification of content and recipients, transaction processing, and the customization of content with a dedication utterance.