Patent 11948550
Prior art
Earlier patents, publications, and products that may anticipate or render the claims unpatentable.
Active provider: Google · gemini-2.5-flash
The search results provide summaries and sometimes claims of the cited patents, which will be helpful for the "brief description" and "potential anticipation" sections.
Let's break down the analysis for each identified relevant prior art.
1. US10163451B2 - Accent translation
- Full Citation: US10163451B2, titled "Accent translation," assigned to Amazon Technologies, Inc.
- Publication/Filing Date: Publication date: 2018-12-25. Priority date: 2016-12-21. This is prior art to US11948550 (priority date 2021-05-06).
- Brief Description: This patent describes an accent translation model that adjusts audio characteristics of input audio from a first accent to resemble those of a second accent. It involves performing voice recognition analysis to identify letters, phonemes, words, and other units of speech and then adjusting audio characteristics for these identified portions. The accent translation is performed on speech captured by audio components, such as a microphone.
- Potential Anticipation (35 U.S.C. § 102):
- Shared elements with US11948550: Both patents deal with receiving speech content from a microphone, identifying a first accent, translating it to a second accent, and outputting the converted speech. Both also mention processing at the phoneme level.
- Distinguishing features of US11948550: US11948550 specifically claims deriving a non-text linguistic representation and synthesizing audio data by mapping non-text linguistic representations of different phonemes for pronunciation change. While US10163451B2 mentions identifying phonemes and adjusting audio characteristics, it doesn't explicitly detail the "non-text linguistic representation" as the core intermediate for conversion or the direct mapping of different phonemes for different pronunciations as defined in US11948550 claims. The description of US10163451B2 implies adjusting audio characteristics based on identified phonemes, which could be interpreted as modifying existing phonemes' acoustic features rather than mapping to different phonemes to reflect pronunciation shifts (e.g., 'th' to 'd' in Indian English vs. SAE).
- Claims potentially anticipated: Claims 1, 11, and 19 of US11948550 describe the overarching system, non-transitory computer-readable medium, and method, respectively. US10163451B2 might anticipate elements of these claims related to general accent translation, receiving speech, and outputting converted speech. However, the specific non-text linguistic representation and mapping of different phonemes for different pronunciations steps, as explicitly defined in US11948550 claims, would likely differentiate it. For example, Claim 1: "...derive a non-text linguistic representation of the set of phonemes associated with a first pronunciation... synthesize... mapping at least a first non-text linguistic representation of a first phoneme... to a second non-text linguistic representation of a second phoneme of an updated set of phonemes associated with a second pronunciation... wherein the first and second phonemes are different phonemes." This specific phoneme mapping for pronunciation difference might not be explicitly present in US10163451B2's abstract.
2. CN111462769A - End-to-end accent conversion method
- Full Citation: CN111462769A, titled "End-to-end accent conversion method," assigned to 深圳市声希科技有限公司 (Shenzhen Voice-X Technology Co., Ltd.).
- Publication/Filing Date: Publication date: 2020-07-28. Priority date: 2020-03-30. This is prior art to US11948550 (priority date 2021-05-06).
- Brief Description: This patent describes an end-to-end accent conversion method. While specific details from the abstract aren't fully available in English, "end-to-end" often implies avoiding intermediate representations like text, which aligns with US11948550's non-text approach. Another document discussing "Non-parallel Accent Transfer based on Fine-grained Controllable Accent Modeling" (a non-patent citation, but highly relevant to the concept) mentions an "end-to-end accent conversion approach" that converts non-native accented into native-accented speech without native reference audio during conversion, using independently trained neural networks including a speaker encoder, a multi-speaker TTS model, an accented ASR model, and a neural vocoder. It aims to model prosodic characteristics, like speaking rate and duration, for more native-sounding output.
- Potential Anticipation (35 U.S.C. § 102):
- Shared elements with US11948550: The "end-to-end accent conversion" directly targets the core concept of US11948550. If this method indeed uses non-text linguistic representations for accent conversion and handles pronunciation changes without a STT-TTS bottleneck, it could be highly anticipatory. The concept of "end-to-end" aligns with US11948550's goal of low latency and preserving nuances, by avoiding STT-TTS. The non-patent citation elaborates that it "is the first model that is able to convert non-native accented into native-accented speech without any guidance from native reference audio during conversion phase" and uses "four independently trained neural networks: a speaker encoder, a multi-speaker TTS model, an accented ASR model and a neural vocoder." The use of a TTS model within the "end-to-end" framework could be a point of distinction, as US11948550 explicitly states its VC engine does "not need to predict and generate output speech as a midpoint for the conversion," functioning "more quickly than alternatives such as a STT-TTS approach" (description, col. 9, lines 40-44).
- Distinguishing features of US11948550: The core distinction for US11948550 lies in its explicit "non-text linguistic representation" and the direct mapping of "different phonemes" (Claim 1) for pronunciation. If CN111462769A or related "end-to-end" approaches still implicitly or explicitly rely on text-based intermediates or don't perform the direct phoneme-to-phoneme mapping for pronunciation change as specified, there could be a distinction. The non-patent citation indicates use of "a multi-speaker TTS model," which might imply an intermediate text-like step, distinguishing it from the specific non-text claims of US11948550.
- Claims potentially anticipated: This patent could potentially anticipate claims 1, 11, and 19, particularly their broader scope on "real-time accent conversion" and the use of machine learning algorithms for deriving linguistic representation and synthesizing audio data. The "end-to-end" aspect could anticipate the low latency and continuous conversion aspects of claims 9, 18, and 20. However, the specific non-text linguistic representation and the mapping of different phonemes as the mechanism for pronunciation change would be key to distinguishing.
3. CN112382267A - Method, apparatus, device and storage medium for converting accents
- Full Citation: CN112382267A, titled "Method, apparatus, device and storage medium for converting accents," assigned to 北京有竹居网络技术有限公司 (Beijing Youzhuju Network Technology Co., Ltd.).
- Publication/Filing Date: Publication date: 2021-02-19. Priority date: 2020-11-13. This is prior art to US11948550 (priority date 2021-05-06).
- Brief Description: Similar to CN111462769A, the title directly indicates "converting accents," making it highly relevant. Without a detailed English abstract, it's hard to ascertain the precise technical approach. However, given its publication date and explicit focus on accent conversion, it is likely to be considered strong prior art.
- Potential Anticipation (35 U.S.C. § 102):
- Shared elements with US11948550: The broad scope of converting accents using a method, apparatus, device, and storage medium is directly analogous to the claims of US11948550.
- Distinguishing features of US11948550: Similar to the discussion for CN111462769A, the precise nature of the "linguistic representation" (text-based or non-text-based) and the specific mechanism for handling pronunciation differences (e.g., direct phoneme mapping) would be critical for differentiation. If it relies on a traditional STT-TTS pipeline or only modifies acoustic features without changing phonemes for pronunciation, US11948550 could distinguish itself.
- Claims potentially anticipated: This patent could potentially anticipate claims 1, 11, and 19 regarding the general concept of accent conversion. Further analysis of its full text would be required to determine anticipation of the specific technical details like "non-text linguistic representation" and explicit "phoneme mapping for different pronunciations."
4. US10614826B2 - System and method for voice-to-voice conversion
- Full Citation: US10614826B2, titled "System and method for voice-to-voice conversion," assigned to Modulate, Inc.
- Publication/Filing Date: Publication date: 2020-04-07. Priority date: 2017-05-24. This is prior art to US11948550 (priority date 2021-05-06).
- Brief Description: This patent describes a system and method for voice-to-voice conversion. Voice conversion often focuses on changing speaker identity (timbre, pitch, etc.) rather than accent (which includes pronunciation changes). The background of US11948550 explicitly distinguishes itself from "voice conversion methods that attempt to adjust the audio characteristics (e.g., pitch, intonation, melody, stress) of a first speaker's voice to more closely resemble the audio characteristics of a second speaker's voice," stating "this type of approach does not account for the different pronunciations of certain sounds that are inherent to a given accent."
- Potential Anticipation (35 U.S.C. § 102):
- Shared elements with US11948550: Both patents generally involve converting aspects of a received voice.
- Distinguishing features of US11948550: The primary distinction for US11948550 is its explicit focus on accent conversion, which includes changing pronunciations through mapping of different phonemes in a non-text linguistic representation. If US10614826B2 primarily focuses on voice identity (timbre, pitch) while preserving the original accent's pronunciation, it would not anticipate US11948550's core innovation regarding accent-specific pronunciation changes. The abstract of US9613620B2 (which is a similar type of voice conversion patent, though not directly cited in US11948550's family citations) states it "may determine a given representation configured to associate the first voice characteristics with the second voice characteristics. The device may provide an output indicative of pronunciations of the one or more speech sounds of the first voice according to the second voice characteristics based on the given representation." This sounds like it might touch on pronunciation, but the emphasis is usually on making the first voice sound like the second speaker, not necessarily changing the underlying accent phonemes for pronunciation differences. Given US11948550's explicit distinguishing of "voice conversion" from "accent conversion," it's likely this patent would not fully anticipate the pronunciation-focused claims of US11948550.
- Claims potentially anticipated: It might anticipate very broad aspects of receiving speech and generating modified speech. However, it's unlikely to anticipate the claims specifying "first accent," "second accent," "non-text linguistic representation," and the specific "mapping of different phonemes for different pronunciations."
I should also consider US20140365216A1 (Apple Inc.) "System and method for user-specified pronunciation of words for speech synthesis and recognition" and US20150170642A1 (Google Inc.) "Identifying substitute pronunciations." These deal with pronunciation, which is a key part of accent conversion.
5. US20140365216A1 - System and method for user-specified pronunciation of words for speech synthesis and recognition
- Full Citation: US20140365216A1, titled "System and method for user-specified pronunciation of words for speech synthesis and recognition," assigned to Apple Inc.
- Publication/Filing Date: Publication date: 2014-12-11. Priority date: 2013-06-07. This is prior art to US11948550 (priority date 2021-05-06).
- Brief Description: This patent focuses on allowing users to specify pronunciations for words for speech synthesis and recognition. This is more about customizing how a system handles specific words or phrases, not necessarily real-time, end-to-end accent conversion for continuous speech, which is the focus of US11948550.
- Potential Anticipation (35 U.S.C. § 102): While it touches on "pronunciation," it seems to be in the context of user-defined rules or dictionaries for speech synthesis/recognition rather than the ML-driven, accent-specific, non-text linguistic mapping for continuous accent conversion. It might anticipate the idea of handling different pronunciations, but not the specific mechanism or the real-time, low-latency, end-to-end accent conversion as claimed by US11948550. It likely falls into the category of "STT-TTS" or rule-based systems that US11948550 distinguishes itself from.
6. US20150170642A1 - Identifying substitute pronunciations
- Full Citation: US20150170642A1, titled "Identifying substitute pronunciations," assigned to Google Inc.
- Publication/Filing Date: Publication date: 2015-06-18. Priority date: 2013-12-17. This is prior art to US11948550 (priority date 2021-05-06).
- Brief Description: This patent describes identifying substitute pronunciations, likely for improving speech recognition or synthesis. Similar to US20140365216A1, this appears to be more about handling variations in pronunciation within a speech system, rather than the real-time conversion of a continuous speech stream from one accent to another using a non-text linguistic representation.
- Potential Anticipation (35 U.S.C. § 102): Again, while "pronunciation" is a common theme, the specific method of US11948550 (non-text linguistic representation, ML models, direct phoneme mapping for accent change, real-time) is likely distinguishable.
Now, let me confirm the exact claims I should cite for potential anticipation for US11948550.
Claim 1 is a system claim, Claim 11 is a non-transitory computer-readable medium claim, and Claim 19 is a method claim. These are the independent claims. Claims 2-10 depend on Claim 1, Claims 12-18 on Claim 11, and Claims 20-22 on Claim 19. A dependent claim incorporates every limitation of its parent, so if an independent claim is anticipated, a dependent claim is also potentially anticipated only when the limitation it adds is likewise present in the prior art. Since the task is to identify which claims of US11948550 each prior art reference potentially anticipates, the analysis lists the independent claims (1, 11, 19) and focuses on their core elements, noting dependent claims only where they add elements that also appear in the prior art.
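The claim structure described above can be sketched as a small lookup. This is only an illustrative data structure built from the claim ranges stated in this analysis; the names `INDEPENDENT_CLAIMS` and `independent_parent` are hypothetical, and the sketch treats every dependent claim as rooted at its independent claim without modeling chains between dependent claims.

```python
from typing import Dict, Optional

# Independent claim -> range of claims that depend on it,
# taken from the claim ranges stated in this analysis.
INDEPENDENT_CLAIMS: Dict[int, range] = {
    1: range(2, 11),    # claims 2-10 (system)
    11: range(12, 19),  # claims 12-18 (computer-readable medium)
    19: range(20, 23),  # claims 20-22 (method)
}


def independent_parent(claim: int) -> Optional[int]:
    """Return the independent claim a given claim is rooted at, or None."""
    if claim in INDEPENDENT_CLAIMS:
        return claim
    for parent, dependents in INDEPENDENT_CLAIMS.items():
        if claim in dependents:
            return parent
    return None


# The low-latency claims mentioned in this analysis (9, 18, 20)
# each trace back to a different independent claim.
latency_parents = [independent_parent(c) for c in (9, 18, 20)]
```

A lookup like this makes it easy to see that a reference addressing the low-latency claims would still need to disclose every element of the corresponding independent claim.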
The core novelty of US11948550, as stated in its "Overview" and "Detailed Description," lies in:
- Direct accent conversion (changing pronunciation), not just voice characteristics.
- Using non-text linguistic representation to avoid STT-TTS, preserving nuances.
- Achieving real-time, low latency.
- Specifically, mapping a first non-text linguistic representation of a first phoneme to a second non-text linguistic representation of a different second phoneme to reflect pronunciation change. (Claim 1)
Considering these points, the prior art most likely to anticipate would be those that also implement accent conversion by directly changing pronunciation via a non-text intermediate in real-time.
Based on the summaries:
- US10163451B2 ("Accent translation") seems to involve adjusting audio characteristics based on phonemes, which could encompass pronunciation changes. It doesn't explicitly state "non-text linguistic representation" or "mapping different phonemes," but the general idea of accent translation on a phonetic level is close.
- CN111462769A ("End-to-end accent conversion method") and CN112382267A ("Method... for converting accents") are very strong candidates due to their titles explicitly mentioning "accent conversion" and "end-to-end" (for CN111462769A). The description of the related non-patent literature on "end-to-end accent conversion" (citing Wang et al. (2021) and others) uses PPG (phonetic posteriorgrams) which are non-text linguistic representations, and explicitly mentions converting "non-native-accented into native-accented speech" and modeling "prosodic characteristics." This is very close to what US11948550 claims. The key difference I noted was the mention of a "multi-speaker TTS model" in the non-patent reference, which could imply a text-like intermediate, a point US11948550 specifically avoids.
I will focus on the three strongest candidates: US10163451B2, CN111462769A, and CN112382267A. I will briefly mention US10614826B2 and the Apple/Google patents as less direct, but still relevant to the general field.
Final check on the provided patent document to ensure no internal contradictions. The provided document for US11948550 is the granted patent. I will interpret the citations directly from the "Patent Citations" section.
Let's refine the "potential anticipation" for each of the top three based on the claims of US11948550.
The key elements of Claim 1 (and implicitly 11 and 19) of US11948550 are:
- Training a first ML algorithm with first accent audio data.
- Applying first ML algorithm to received speech to derive a non-text linguistic representation of phonemes associated with a first pronunciation.
- Synthesizing, using a second ML algorithm trained with first and second accent audio data, fourth audio data by mapping a first non-text linguistic representation of a first phoneme to a second non-text linguistic representation of a second phoneme of an updated set of phonemes associated with a second pronunciation that is different from the first pronunciation, wherein the first and second phonemes are different phonemes.
- Converting synthesized audio data into a synthesized version of received speech content having the second accent, comprising the updated set of phonemes associated with the second pronunciation.
- Real-time, low latency (claims 9, 18, 20).
This "mapping different phonemes" to change pronunciation, within a "non-text linguistic representation" and in "real-time," is the specific inventive step.
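Because anticipation under § 102 requires a single reference to disclose every element of a claim, the comparison can be pictured as a set check. The sketch below is a simplified illustration, not a legal test: the element labels are hypothetical paraphrases of the list above, and the disclosed set for US10163451B2 reflects only what this analysis infers from its summary, not a reading of its full text.

```python
from typing import Dict, FrozenSet, Set

# Hypothetical, simplified labels for the key elements of Claim 1.
CLAIM_1_ELEMENTS: FrozenSet[str] = frozenset({
    "receive_speech_first_accent",
    "derive_non_text_representation",
    "map_to_different_phoneme",
    "output_speech_second_accent",
})

# What each reference appears to disclose according to this analysis
# (an assumption based on summaries only).
DISCLOSED: Dict[str, Set[str]] = {
    "US10163451B2": {
        "receive_speech_first_accent",
        "output_speech_second_accent",
        "phoneme_level_processing",
    },
}


def missing_elements(reference: str) -> Set[str]:
    """Claim elements the reference does not appear to disclose."""
    return set(CLAIM_1_ELEMENTS) - DISCLOSED[reference]


def anticipates(reference: str) -> bool:
    """True only if every claim element is disclosed by the one reference."""
    return not missing_elements(reference)


gaps = missing_elements("US10163451B2")
```

Under these assumptions the reference leaves two elements undisclosed, which is why the analysis speaks of potential rather than full anticipation.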
I need to make sure I state that it potentially anticipates, as a full anticipation analysis would require reviewing the full text of each cited patent and a detailed claim comparison.
Clarification on USPTO database search: The prompt asks to "Search the USPTO database for the specific patent number 11948550. Do not return results for similar numbers." My initial instruction was to use the provided patent text as authoritative, which is already the result of such a search. So I will simply confirm that I'm analyzing the provided patent.
Here is an analysis of the most relevant prior art for US Patent 11948550, "Real-time accent conversion model," based on the patent's own citations and a review of publicly available abstracts and summaries. The analysis focuses on publications prior to US11948550's priority date of May 6, 2021.
US Patent 11948550 describes a system and method for real-time accent conversion. Its core inventive features include:
- Receiving speech content with a first accent.
- Deriving a non-text linguistic representation of the speech content using a first machine-learning (ML) algorithm (e.g., an ASR engine) trained with audio data of the first accent.
- Synthesizing audio data representative of the speech content having a second accent, using a second ML algorithm (e.g., a VC engine). This synthesis involves mapping a first non-text linguistic representation of a first phoneme to a second non-text linguistic representation of a second, different phoneme to reflect a change in pronunciation.
- Converting this synthesized audio data into a listenable version of the speech, all performed with very low latency (e.g., 50-700 ms) for real-time communication.
- A key distinction is avoiding a speech-to-text (STT) to text-to-speech (TTS) pipeline, which can lose nuances like prosody and emotion and introduce significant latency.
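The phoneme-mapping step listed above can be pictured with a toy example. This is a minimal sketch under strong assumptions, not the patented method: it stands in for a "non-text linguistic representation" with per-frame phoneme probabilities, and the mapping table `PHONEME_MAP` and all function names are hypothetical. A trained model would learn context-dependent mappings rather than apply a fixed table.

```python
from typing import Dict, List

# Toy stand-in for a non-text linguistic representation:
# one dict per audio frame, mapping phoneme label -> probability.
Frame = Dict[str, float]

# Hypothetical mapping from a first-accent phoneme to a different
# second-accent phoneme (illustration only).
PHONEME_MAP: Dict[str, str] = {"d": "th"}


def map_frame(frame: Frame, phoneme_map: Dict[str, str]) -> Frame:
    """Move probability mass from each source phoneme to its mapped target."""
    out: Frame = {}
    for phoneme, prob in frame.items():
        target = phoneme_map.get(phoneme, phoneme)
        out[target] = out.get(target, 0.0) + prob
    return out


def convert(frames: List[Frame], phoneme_map: Dict[str, str]) -> List[Frame]:
    """Apply the phoneme mapping frame by frame, with no text intermediate."""
    return [map_frame(frame, phoneme_map) for frame in frames]


def top_phonemes(frames: List[Frame]) -> List[str]:
    """Most probable phoneme in each frame."""
    return [max(frame, key=frame.get) for frame in frames]


frames = [{"d": 0.9, "t": 0.1}, {"ih": 1.0}, {"s": 0.8, "z": 0.2}]
converted = convert(frames, PHONEME_MAP)
```

The point of the sketch is that the conversion operates on the intermediate representation itself: no transcript is produced, so information carried alongside the phonemes is not forced through a text bottleneck.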
Based on these differentiating aspects, the most relevant prior art citations are analyzed below:
Most Relevant Prior Art for US11948550
1. US10163451B2
- Full Citation: US10163451B2, "Accent translation," assigned to Amazon Technologies, Inc.
- Publication/Filing Date: Published on December 25, 2018; Priority Date: December 21, 2016. This predates the priority date of US11948550 (May 6, 2021) and is therefore prior art.
- Brief Description: This patent describes a system and method for "accent translation" where an accent translation model adjusts the audio characteristics of input audio from a first accent to more closely resemble those of a second accent. It involves performing voice recognition analysis to identify letters, phonemes, words, and other units of speech and then adjusting audio characteristics for these identified portions. Speech spoken by a user is captured by a microphone and translated from a first accent to a second accent.
- Potential Anticipation (35 U.S.C. § 102):
- This patent broadly discloses the concept of "accent translation" between a first and second accent, including receiving speech from a microphone and outputting converted speech. It also mentions processing at the phoneme level for adjustment.
- However, US10163451B2 does not explicitly detail the use of a "non-text linguistic representation" as the primary intermediate format, nor does it explicitly describe the specific mechanism of mapping a first non-text linguistic representation of a first phoneme to a second non-text linguistic representation of a different second phoneme associated with a second pronunciation, which is a key distinguishing feature of US11948550.
- Claims potentially anticipated: US10163451B2 appears to disclose the elements of claims 1, 11, and 19 of US11948550 that relate to the general idea of receiving speech in a first accent and converting it to a second accent using an accent translation model. Because anticipation requires a single reference to disclose every claim element, the specific "non-text linguistic representation" and "mapping of different phonemes" for pronunciation differences would be points of distinction for US11948550.
2. CN111462769A
- Full Citation: CN111462769A, "End-to-end accent conversion method," assigned to 深圳市声希科技有限公司 (Shenzhen Voice-X Technology Co., Ltd.).
- Publication/Filing Date: Published on July 28, 2020; Priority Date: March 30, 2020. This predates the priority date of US11948550 (May 6, 2021) and is therefore prior art.
- Brief Description: This patent describes an "end-to-end accent conversion method." While a detailed English abstract for this specific Chinese patent isn't readily available in the search results, related non-patent literature discusses "end-to-end accent conversion" using phonetic posteriorgrams (PPG), a non-text linguistic representation. It aims to convert non-native accented speech to native accented speech, modeling prosodic characteristics, and potentially uses neural networks including a speaker encoder, a multi-speaker TTS model, an accented ASR model, and a neural vocoder.
- Potential Anticipation (35 U.S.C. § 102):
- The "end-to-end accent conversion" directly aligns with the primary objective of US11948550. The use of non-text linguistic representations (like PPGs) for accent transfer, as suggested by related works, is also highly relevant to US11948550's claims. The ability to model prosodic characteristics further supports its relevance in preserving speech nuances, similar to US11948550's objective to avoid loss from STT-TTS.
- A potential point of distinction could be the mention of a "multi-speaker TTS model" in some "end-to-end" frameworks, which US11948550 explicitly states its voice conversion engine avoids by operating on encoded linguistic data rather than generating output speech as a midpoint (like an STT-TTS approach).
- Claims potentially anticipated: This patent could potentially anticipate claims 1, 11, and 19 of US11948550, especially regarding the broad concept of real-time accent conversion, the use of ML algorithms, and the aim for low-latency, continuous conversion (e.g., claims 9, 18, 20). The specific details of the non-text linguistic representation and the explicit mapping of different phonemes for pronunciation changes would need a full textual comparison for a definitive assessment of anticipation.
3. CN112382267A
- Full Citation: CN112382267A, "Method, apparatus, device and storage medium for converting accents," assigned to 北京有竹居网络技术有限公司 (Beijing Youzhuju Network Technology Co., Ltd.).
- Publication/Filing Date: Published on February 19, 2021; Priority Date: November 13, 2020. This predates the priority date of US11948550 (May 6, 2021) and is therefore prior art.
- Brief Description: The title explicitly states a "Method, apparatus, device and storage medium for converting accents," indicating a direct focus on the same problem addressed by US11948550. Without a detailed English abstract, the specific technical approach remains unknown. However, its explicit problem domain makes it highly relevant.
- Potential Anticipation (35 U.S.C. § 102):
- The broad scope of converting accents through a method, apparatus, device, and storage medium is directly analogous to the claims of US11948550.
- Similar to CN111462769A, the precise nature of the "linguistic representation" used (text-based or non-text-based) and the specific mechanism for altering pronunciation (e.g., direct phoneme mapping versus only acoustic feature adjustment) would be critical to determine if it fully anticipates the specific claims of US11948550.
- Claims potentially anticipated: This patent could potentially anticipate claims 1, 11, and 19 of US11948550, which describe the general system, non-transitory computer-readable medium, and method for accent conversion. A full analysis of its technical details would be needed to assess anticipation of the specific elements related to "non-text linguistic representation" and "mapping of different phonemes for different pronunciations."
Other Relevant Prior Art (Less Direct)
US10614826B2: "System and method for voice-to-voice conversion," assigned to Modulate, Inc. (Published April 7, 2020). While it addresses "voice conversion," US11948550 explicitly distinguishes itself from voice conversion methods that only adjust audio characteristics (pitch, intonation) without accounting for pronunciation differences inherent to accents. Therefore, it is less likely to anticipate the specific pronunciation-changing aspects of US11948550.
US20140365216A1: "System and method for user-specified pronunciation of words for speech synthesis and recognition," assigned to Apple Inc. (Published December 11, 2014). This patent focuses on user-defined pronunciation rules for words rather than a comprehensive, real-time, ML-driven accent conversion for continuous speech that dynamically maps phonemes based on learned linguistic representations.
US20150170642A1: "Identifying substitute pronunciations," assigned to Google Inc. (Published June 18, 2015). Similar to the Apple patent, this likely deals with handling pronunciation variations within speech systems for improved recognition or synthesis, rather than real-time, end-to-end accent conversion for continuous speech using non-text linguistic representations to specifically alter pronunciation between accents.
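The date comparisons that qualify each reference as prior art can be reproduced with a short check. This is a simplified sketch using only the publication dates and the priority date listed in this analysis; the function names are hypothetical, and a real § 102 analysis also considers filing dates and statutory exceptions that are not modeled here.

```python
from datetime import date
from typing import Dict

# Priority date of US11948550 as stated in this analysis.
PRIORITY_DATE = date(2021, 5, 6)

# Publication dates as listed in this analysis.
PUBLICATIONS: Dict[str, date] = {
    "US10163451B2": date(2018, 12, 25),
    "CN111462769A": date(2020, 7, 28),
    "CN112382267A": date(2021, 2, 19),
    "US10614826B2": date(2020, 4, 7),
    "US20140365216A1": date(2014, 12, 11),
    "US20150170642A1": date(2015, 6, 18),
}


def predates(publication: date, priority: date = PRIORITY_DATE) -> bool:
    """True if the reference was published before the priority date."""
    return publication < priority


def days_before(publication: date, priority: date = PRIORITY_DATE) -> int:
    """Number of days between publication and the priority date."""
    return (priority - publication).days


all_predate = all(predates(d) for d in PUBLICATIONS.values())
closest = min(PUBLICATIONS, key=lambda ref: days_before(PUBLICATIONS[ref]))
```

Sorting by `days_before` shows which reference was published closest to the priority date, which is useful when a date is disputed or an exception might apply.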
Generated 5/29/2026, 5:56:29 PM