Obviousness — US Patent 12417756

Obviousness Analysis of U.S. Patent 12,417,756 under 35 U.S.C. § 103

This analysis examines whether the claimed invention in U.S. Patent 12,417,756 would have been obvious to a Person Having Ordinary Skill in the Art (PHOSITA) before the effective filing date of the claimed invention, based on the prior art cited in the patent's prosecution history. An invention is considered obvious under 35 U.S.C. § 103 if the differences between the claimed invention and the prior art are such that the claimed invention as a whole would have been obvious to a PHOSITA.

A PHOSITA in this field would typically possess a graduate degree in computer science or electrical engineering and have several years of experience in speech processing, machine learning, and digital signal processing. This individual would be familiar with technologies such as speech synthesis, voice conversion, speaker identification, and the application of machine learning models to audio data.

The independent claims (1, 7, and 14) of the '756 patent cover a process that can be broken down into the following key steps:

1. Analyze a first user's speech using machine learning to extract that user's specific accent features.
2. Analyze a second user's speech to generate characteristics of that user's distinct "natural voice."
3. Synthesize a modified version of the second user's speech that combines the first user's accent with the second user's preserved natural voice.
4. Provide this modified speech as audio output in real time.
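
For readers who find a data-flow view easier to map onto the references, the steps above can be sketched in Python as follows. This is a minimal illustration only, not the patent's implementation or claim language: the class name AccentConversionPipeline, its fields, the dict-based feature containers, and the separation of an enrollment sample from the live stream are all assumptions introduced here. The three models are injected as plain callables because the sketch does not attempt to show how any of them would actually be built.

```python
from dataclasses import dataclass
from typing import Callable, Dict, Iterable, Iterator, Sequence

# Toy stand-in for a short chunk of audio samples (hypothetical type alias).
Frame = Sequence[float]
Features = Dict[str, float]


@dataclass(frozen=True)
class AccentConversionPipeline:
    """Hypothetical wiring of the four steps; every model is injected."""

    # Analyzes the FIRST user's speech and returns accent features.
    accent_model: Callable[[Iterable[Frame]], Features]
    # Analyzes the SECOND user's speech and returns natural-voice characteristics.
    voice_model: Callable[[Iterable[Frame]], Features]
    # Combines one frame of the second user's speech with both feature sets.
    synthesizer: Callable[[Frame, Features, Features], Frame]

    def run(
        self,
        first_user_sample: Iterable[Frame],
        second_user_sample: Iterable[Frame],
        second_user_live: Iterable[Frame],
    ) -> Iterator[Frame]:
        # First step: extract the first user's accent features.
        accent = self.accent_model(first_user_sample)
        # Second step: characterize the second user's natural voice.
        voice = self.voice_model(second_user_sample)
        # Third and fourth steps: synthesize frame by frame and yield each
        # result immediately, which approximates real-time output.
        for frame in second_user_live:
            yield self.synthesizer(frame, accent, voice)
```

Viewed this way, the obviousness question becomes which reference supplies each injected component, and whether a PHOSITA would have had reason to wire them together in this arrangement.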

Several combinations of the cited prior art references could be argued to render these claims obvious.

Combination 1: US 2024/0161764 A1 (Dell) in view of US 2020/0193971 A1 (i2x GmbH)

This combination presents a strong argument for obviousness.

Primary Reference: US 2024/0161764 A1 ("Accent personalization for speakers and listeners")
The Dell '764 application serves as the base reference. It explicitly teaches the core concept of the '756 patent: modifying a speaker's accent in real time to match a listener's accent and thereby improve comprehension. In the context of the '756 patent's claims, the Dell "listener" is the "first user," and the Dell "speaker" is the "second user." The '764 application therefore discloses capturing a second user's speech and modifying its accent to mimic that of a first user. A PHOSITA would have understood that for this system to be commercially viable, the output must sound natural rather than robotic, which in turn suggests preserving the speaker's core vocal identity.
Secondary Reference: US 2020/0193971 A1 ("System and methods for accent and dialect modification")
The i2x '971 application explicitly addresses a known challenge in voice modification: preserving the speaker's unique voice. It teaches a system for transforming speech to a target accent while specifically aiming to "maintain the speaker's voice identity."
Motivation to Combine and Obviousness:
A PHOSITA starting with the system taught by Dell ('764) would have been motivated to improve the quality and naturalness of the accent-modified audio output. A common and predictable problem in speech synthesis and conversion is the loss of the original speaker's vocal characteristics, which makes the output sound artificial. The PHOSITA would therefore have looked for known techniques that solve this problem. The i2x ('971) application provides an explicit solution by teaching how to maintain the speaker's voice identity during accent modification.

Combining these teachings would have been a matter of applying a known technique (preserving speaker identity from i2x) to an existing system (listener-based accent conversion from Dell) to achieve a predictable and desired result (a more natural-sounding accent-modified output). Therefore, it would have been obvious to combine Dell '764 and i2x '971 to arrive at the invention claimed in the '756 patent.

Combination 2: US 2020/0193971 A1 (i2x GmbH) in view of US 2024/0161764 A1 (Dell)

This presents an alternative but equally strong argument.

Primary Reference: US 2020/0193971 A1 (i2x GmbH)
The i2x '971 application teaches a system for modifying a speaker's accent to a "target accent" while preserving the speaker's voice identity. The reference thus discloses most of the technical steps claimed in the '756 patent: analyzing a second user's speech, preserving its natural voice characteristics, and modifying the accent.
Secondary Reference: US 2024/0161764 A1 (Dell)
The missing element in the i2x reference is the specific nature of the "target accent." The i2x system is described in the context of a call center, where the target might be a "standard" or more neutral accent. The Dell '764 application teaches a specific and advantageous application: making the target accent the accent of the listener in a conversation to maximize clarity for that specific listener.
Motivation to Combine and Obviousness:
A PHOSITA tasked with improving the i2x ('971) system for real-time communication (e.g., video conferencing, as taught by US 11,134,217 B1) would have sought ways to make the accent modification more effective. Instead of converting all speakers to a single, pre-defined target accent, it would have been an obvious and logical improvement to dynamically adapt the target accent to that of the other participant(s) in the conversation. The Dell ('764) reference provides the blueprint for this improvement: analyze the listener's accent and use it as the target for conversion.

The motivation is to advance the primary goal of the system: improving communication clarity. Personalizing the accent conversion for the listener makes the system more effective at that goal. This combination represents the use of a known technique (listener-based targeting from Dell) to improve a known system (accent modification with voice preservation from i2x).
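
The difference between a single pre-defined target accent and a listener-derived target accent, as this analysis characterizes the two references, can be illustrated with a short Python sketch. It is a hedged illustration under stated assumptions and is not drawn from either reference: the function name choose_target_accents, the participant identifiers, the accent labels, and the "neutral" fallback are all hypothetical.

```python
from typing import Dict, Mapping, Optional


def choose_target_accents(
    speaker: str,
    detected_accents: Mapping[str, Optional[str]],
    preset: str = "neutral",
) -> Dict[str, str]:
    """Pick a target accent for each listener in a conversation.

    Listener-based targeting: each listener's own detected accent is the
    target. When no accent was detected (None), fall back to the single
    pre-defined target, which corresponds to the fixed-target behavior.
    """
    return {
        listener: (accent or preset)
        for listener, accent in detected_accents.items()
        if listener != speaker  # the speaker is not their own listener
    }
```

For example, with a speaker "alice" and listeners "bob" (detected accent "en-GB") and "carol" (no accent detected), the function returns "en-GB" for bob and the preset for carol. The change from fixed to listener-based targeting is confined to how the target is selected, which is consistent with the argument that it is a modular improvement to an existing conversion system.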

Conclusion

The independent claims of U.S. Patent 12,417,756 appear to be obvious over combinations of the prior art cited during its prosecution. The prior art, particularly the Dell '764 and i2x '971 publications, already established the key inventive concepts: real-time accent conversion to match a listener's accent and the need to preserve the speaker's natural voice identity during that conversion. Combining these teachings would have been a predictable step for a Person Having Ordinary Skill in the Art seeking to improve the quality and effectiveness of real-time communication systems. The claimed invention is a synthesis of known elements from the prior art, solving a known problem to achieve a predictable result. Together, this high degree of overlap and the clear motivation to combine prior art elements form a strong basis for the pending Post-Grant Review (PGR2026-00033).