Prior art — US Patent 12131745

Most Relevant Prior Art for US Patent 12131745

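This analysis covers the most relevant prior art cited in US patent 12131745, with a brief description of each reference and an assessment of potential anticipation under 35 U.S.C. § 102. It focuses on references that address accent conversion, speech modification, or phonetic processing using machine learning, particularly neural networks, and methods for aligning speech components. Because anticipation under § 102 requires a single reference to disclose every element of a claim, each assessment notes both what the reference appears to disclose and which claimed features it does not.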

1. US20220358903A1 - Real-Time Accent Conversion Model

Full Citation: US20220358903A1 (Sanas.ai Inc., published November 10, 2022)
Publication/Filing Date: This application was filed on May 6, 2021.
Brief Description: This patent application describes techniques for real-time accent conversion using machine-learning algorithms. It involves receiving speech content in a first accent, deriving a linguistic representation, and then synthesizing audio data representing the speech content in a second accent. The conversion is performed with low latency (e.g., 50-700 milliseconds).
Potential Anticipation (35 U.S.C. § 102): This reference appears to disclose the general concept of real-time accent conversion using machine learning models, as recited in independent claims 1, 8, and 15 of US12131745. Specifically, it describes receiving input speech in a first accent and outputting a synthesized version in a second accent. Its explicit mention of "an updated set of phonemes associated with the second accent" also suggests phonetic transformation. Claims 1 (system), 8 (method), and 15 (computer-readable medium) might therefore be anticipated to the extent that they claim a general real-time accent conversion system using machine learning for phonetic transformation. However, US12131745 specifically claims generating phonetic embedding vectors and performing differentiable alignment by "jointly maximizing a cosine distance between the first phonetic embedding vectors and the second phonetic embedding vectors." US20220358903A1 is not shown here to disclose that mechanism, so it is a key distinction that would need further analysis before full anticipation could be found.
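
To make the distinguishing claim language more concrete, the following minimal Python sketch computes cosine similarity and cosine distance between embedding vectors and derives a soft alignment from them. It is not taken from US12131745 or US20220358903A1. The function names, the toy two-dimensional vectors, the softmax weighting, and the temperature parameter are all assumptions made for illustration, and the claimed training objective may differ.

```python
import math

def cosine_similarity(u, v):
    """Cosine of the angle between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0.0 or norm_v == 0.0:
        raise ValueError("cosine similarity is undefined for a zero vector")
    return dot / (norm_u * norm_v)

def cosine_distance(u, v):
    """Conventional cosine distance: 1 minus cosine similarity."""
    return 1.0 - cosine_similarity(u, v)

def soft_alignment(source_frames, target_frames, temperature=0.1):
    """Row-wise softmax over pairwise cosine similarities.

    Row i is a probability distribution over target frames for source
    frame i. Because it is a smooth function of the embeddings, a
    weighting of this kind is one way an alignment can be made
    differentiable (an illustrative assumption, not the patented method).
    """
    weights = []
    for s in source_frames:
        scores = [cosine_similarity(s, t) / temperature for t in target_frames]
        peak = max(scores)  # subtract the max for numerical stability
        exps = [math.exp(x - peak) for x in scores]
        total = sum(exps)
        weights.append([e / total for e in exps])
    return weights

# Hypothetical toy embeddings: two source frames, three target frames.
first_embeddings = [[1.0, 0.0], [0.0, 1.0]]
second_embeddings = [[0.9, 0.1], [0.1, 0.9], [0.0, 1.0]]
alignment = soft_alignment(first_embeddings, second_embeddings)
```

In this toy case the first source frame places most of its weight on the first target frame and the second source frame on the last target frame, since those pairs point in nearly the same direction.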

2. US10176819B2 - Phonetic posteriorgrams for many-to-one voice conversion

Full Citation: US10176819B2 (The Chinese University of Hong Kong, published January 8, 2019)
Publication/Filing Date: The priority date for this patent is July 11, 2016.
Brief Description: This patent describes a method for converting speech using phonetic posteriorgrams (PPGs). It involves obtaining target speech, generating a PPG based on acoustic features (potentially using a speaker-independent automatic speech recognition system), and generating a mapping between the PPG and segments of the target speech. The method emphasizes non-parallel training data and aims for voice conversion.
Potential Anticipation (35 U.S.C. § 102): This patent anticipates aspects of generating numerical representations of speech content (phonetic content, phonetic embedding vectors) and using them for conversion. The PPGs described "capture the posterior probabilities of each phonetic class for each specific time frame of one utterance," which is akin to phonetic embedding vectors capturing phonetic characteristics. It also describes a "many-to-one voice conversion" which implies a source and target. While it doesn't explicitly mention "maximizing cosine distance for differentiable alignment," it does deal with aligning phonetic information. Therefore, elements of claims 1, 8, and 15 that involve generating phonetic representations (first phonetic embedding vectors) from input audio and then performing a conversion to a target accent could be considered anticipated.
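
The quoted definition of a PPG (posterior probabilities of each phonetic class for each time frame) can be illustrated with a short Python sketch. This is a toy under stated assumptions, not the method of US10176819B2: the class inventory, the per-frame scores, and the function names are hypothetical, and a real system would obtain the scores from a speaker-independent speech recognition model.

```python
import math

# Hypothetical phonetic class inventory (illustrative only).
PHONETIC_CLASSES = ["sil", "AA", "IY", "S"]

def softmax(logits):
    """Turn raw per-class scores into probabilities that sum to 1."""
    peak = max(logits)  # subtract the max for numerical stability
    exps = [math.exp(x - peak) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]

def posteriorgram(frame_logits):
    """Build a frames-by-classes matrix of posterior probabilities."""
    return [softmax(frame) for frame in frame_logits]

def most_likely_classes(ppg):
    """Pick the highest-probability phonetic class for each frame."""
    return [
        PHONETIC_CLASSES[max(range(len(row)), key=row.__getitem__)]
        for row in ppg
    ]

# Hypothetical acoustic-model scores for three time frames.
frame_logits = [
    [4.0, 0.5, 0.1, 0.2],
    [0.3, 3.5, 0.8, 0.1],
    [0.2, 0.4, 0.3, 3.9],
]
ppg = posteriorgram(frame_logits)
labels = most_likely_classes(ppg)
```

Each row of `ppg` is a probability distribution over phonetic classes for one frame, which is why a PPG carries phonetic content without being tied to a particular speaker's spectral features.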

3. US10186251B1 - Voice conversion using deep neural network with intermediate voice training

Full Citation: US10186251B1 (Oben, Inc., published January 22, 2019)
Publication/Filing Date: The priority date for this patent is August 6, 2015.
Brief Description: This patent focuses on voice conversion using deep neural networks with intermediate voice training. While the detailed description isn't fully available in the provided snippets, the title suggests the use of deep neural networks for voice conversion, which broadly aligns with the machine learning model used in US12131745.
Potential Anticipation (35 U.S.C. § 102): Given the limited information, it's difficult to pinpoint exact anticipation. However, the use of "deep neural networks" for "voice conversion" could potentially anticipate the "trained accent conversion neural network" mentioned in claims 1, 8, and 15, especially regarding the general application of neural networks for transforming speech characteristics. Further details on the specific training and conversion mechanisms would be needed for a more precise assessment.

4. US20220122579A1 - End-to-end speech conversion

Full Citation: US20220122579A1 (Google LLC, published April 21, 2022)
Publication/Filing Date: The priority date for this patent is February 21, 2019.
Brief Description: This patent application describes end-to-end speech conversion. While details are not extensively provided in the snippets, "end-to-end speech conversion" implies a system that takes speech as input and produces converted speech as output, likely encompassing accent conversion or similar speech attribute modification.
Potential Anticipation (35 U.S.C. § 102): The broad concept of "end-to-end speech conversion" could anticipate claims 1, 8, and 15 of US12131745 in terms of the overall goal of converting speech characteristics from a source to a target. However, the specific methodology of "differentiable alignment by jointly maximizing a cosine distance between the first phonetic embedding vectors and the second phonetic embedding vectors" is not explicitly mentioned in the available snippets for this reference and would be a distinguishing feature.

5. US20040148161A1 - Normalization of speech accent

Full Citation: US20040148161A1 (Das Sharmistha S., published July 29, 2004)
Publication/Filing Date: The filing date is January 28, 2003.
Brief Description: This patent application describes a system and method for normalizing speech accent to produce substantially unaccented or less-heavily accented speech. It modifies characteristics of input signals representing accented speech to form output signals representing the same speech with less or no accent. The normalization can involve adjusting parameters like voice onset time, vowel duration, and word stop-release time.
Potential Anticipation (35 U.S.C. § 102): This application directly addresses accent normalization, which is a form of accent conversion. It discloses the general idea of converting a source accent to a target accent (implicitly a "normalized" or "unaccented" target). The described techniques for modifying speech characteristics (e.g., adjusting voice onset time) differ from the phonetic embedding vector approach of US12131745, but the fundamental purpose of accent conversion is the same. Claims 1, 8, and 15 are therefore anticipated only as far as their overarching goal of "accent conversion" is concerned. The specific "differentiable alignment by jointly maximizing a cosine distance" using phonetic embedding vectors would be a distinguishing feature.

6. US20230352001A1 - Voice attribute conversion using speech to speech

Full Citation: US20230352001A1 (Meaning.Team, Inc., published November 2, 2023)
Publication/Filing Date: The filing date is April 28, 2022.
Brief Description: This patent application describes a computer-implemented method for near real-time adaptation of voice attributes, including accent, using a speech-to-speech (S2S) machine learning model. Source audio content with a source voice attribute is fed into a trained S2S ML model to obtain target audio content with a target voice attribute. The source and target audio have the same lexical content and are time-synchronized.
Potential Anticipation (35 U.S.C. § 102): This reference is highly relevant as it explicitly discusses "accent and/or voice identity of source audio content... adapted to a target accent and/or target voice identity in target audio" using an "S2S ML model" for "near real-time adaptation." The mention of "time-synchronized" content suggests an alignment process. This directly anticipates the core functionality of accent conversion in claims 1, 8, and 15. The use of a machine learning model to transform voice attributes from a source to a target, with the output preserving linguistic content, strongly aligns with the independent claims. The specific differentiator for US12131745 would remain the explicit "differentiable alignment by jointly maximizing a cosine distance between the first phonetic embedding vectors and the second phonetic embedding vectors."

7. US20230335107A1 - Reference-Free Foreign Accent Conversion System and Method

Full Citation: US20230335107A1 (The Texas A&M University, published October 19, 2023)
Publication/Filing Date: The priority date for this patent is August 24, 2020.
Brief Description: This patent application describes a reference-free foreign accent conversion system and method. The snippets do not provide extensive detail, but the "reference-free" aspect suggests a system that does not require parallel speech data for training, which could be a different approach from systems that rely on paired samples.
Potential Anticipation (35 U.S.C. § 102): The overall concept of a "foreign accent conversion system and method" directly anticipates the general purpose of US12131745. Depending on the details of its methodology, particularly how it handles the alignment and conversion process without explicit reference data, it could potentially anticipate aspects of claims 1, 8, and 15. However, the specific "differentiable alignment by jointly maximizing a cosine distance between the first phonetic embedding vectors and the second phonetic embedding vectors" using explicitly defined phonetic embedding vectors might be a distinguishing feature.

8. US20230223006A1 - Voice conversion method and related device

Full Citation: US20230223006A1 (Huawei Technologies Co., Ltd., published July 13, 2023)
Publication/Filing Date: The priority date for this patent is September 21, 2020.
Brief Description: This patent application describes a voice conversion method and related device. The available snippets provide no further detail, but the title indicates that it relates to transforming voice characteristics from one form to another.
Potential Anticipation (35 U.S.C. § 102): As with other voice conversion patents, the general concept of "voice conversion" could be considered broadly anticipatory to the extent that it encompasses accent conversion. The specifics of how this patent achieves conversion and alignment would determine its direct relevance to the unique claims of US12131745 regarding phonetic embedding vectors and cosine distance maximization for differentiable alignment.

9. US20210193160A1 - Method and apparatus for voice conversion and storage medium

Full Citation: US20210193160A1 (Ubtech Robotics Corp Ltd., published June 24, 2021)
Publication/Filing Date: The priority date for this patent is December 24, 2019.
Brief Description: This patent application describes a method and apparatus for voice conversion and a storage medium. As with several other general voice conversion references, the available information is limited to the title.
Potential Anticipation (35 U.S.C. § 102): The general concept of "voice conversion" by an "apparatus" or through a "method" could broadly anticipate the system, method, and computer-readable medium claims (1, 8, 15) of US12131745 concerning the fundamental act of converting speech characteristics. The unique elements of phonetic embedding vectors, cosine distance, and differentiable alignment in US12131745 would need to be considered for detailed analysis.