Patent 11120271
Obviousness
Combinations of prior art that suggest the claimed invention would have been obvious under 35 U.S.C. § 103.
Obviousness Analysis of U.S. Patent No. 11,120,271
A determination of obviousness under 35 U.S.C. § 103 requires a factual inquiry into the scope and content of the prior art, the differences between the prior art and the claims at issue, and the level of ordinary skill in the pertinent art. The central question is whether "the differences between the subject matter sought to be patented and the prior art are such that the subject matter as a whole would have been obvious at the time the invention was made to a person having ordinary skill in the art." The Supreme Court in KSR International Co. v. Teleflex Inc. emphasized a flexible approach, moving away from a rigid "teaching, suggestion, or motivation" (TSM) test. An invention can be deemed obvious if it represents a predictable variation of prior art elements, or if there was a known problem for which there was an obvious solution.
Based on the provided prior art, the claims of U.S. Patent No. 11,120,271 ("the '271 patent") appear vulnerable to an obviousness challenge. The core of the invention lies in using spatiotemporal data from a video feed to identify "semantic elements" and "semantic contexts" to generate and display context-aware augmentations, often in an interactive manner. The prior art, particularly when combined, teaches all the key elements of this process.
A person of ordinary skill in the art (POSA) at the time of the invention (with a priority date of February 28, 2014) would have been a computer scientist or engineer with experience in computer vision, machine learning, and interactive media. This individual would have been familiar with techniques for object recognition in video, data overlay, and user interface design for interactive applications.
Combination 1: Sato (U.S. Patent No. 9,736,547) in view of Tseng et al. (U.S. Patent No. 10,791,281)
Motivation to Combine:
Sato discloses a system for identifying objects in a video stream, such as players in a sports broadcast, and overlaying related information like statistics. This directly addresses the "augmentation" aspect of the '271 patent. However, Sato's system is primarily described as a passive display of information. A person of ordinary skill in the art, recognizing the growing trend of interactive user experiences in media consumption, would have been motivated to enhance Sato's system to allow for user engagement. Tseng et al. provides the clear solution by teaching an interactive augmented reality system for live events where users can interact with superimposed digital information. The motivation would be to improve the user experience of Sato's system by making the displayed information interactive, thereby increasing user engagement and providing a more dynamic and personalized viewing experience, a well-established goal in the field of digital media. This combination represents a predictable union of known technologies to achieve a desirable and expected result.
Mapping to Claim Elements (Illustrative Independent Claim):
- "receiving...video data...comprising video content and spatiotemporal data": Sato discloses receiving a video stream, which inherently contains spatiotemporal data (the position of players and objects over time). Tseng et al. also describes capturing and tracking object positions in a live event.
- "determining...one or more semantic elements": Sato teaches identifying objects in video frames, which corresponds to identifying "semantic elements" like players.
- "determining...one or more semantic contexts": Sato's system, by identifying a player, inherently determines a basic context (e.g., "player X is on the field"). The '271 patent's concept of "semantic context" appears to require a more granular level of analysis; to the extent the claims demand richer contextual inference than Sato's basic identification provides, this is the element of the mapping most likely to require a supplemental reference.
- "determining...an augmentation for each respective semantic element": Sato's system displays statistics for identified players, which is a form of augmentation.
- "generating...augmented video content": Sato's system superimposes information on the video, thereby generating augmented content.
- "enabling user interaction with the augmentation": This element, while not explicitly detailed in Sato, is the core teaching of Tseng et al. A POSA would find it obvious to apply the interactive features of Tseng et al. to the augmented data provided by Sato's system. For example, allowing a user to tap on a player (the "semantic element" identified by a Sato-like system) to bring up more detailed statistics or replays (the "interactive augmentation" taught by Tseng et al.).
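To make the combined pipeline concrete, the following is a minimal, purely hypothetical sketch of the architecture the claim mapping describes: a Sato-style passive overlay keyed to detected elements, with a Tseng-style tap interaction layered on top. None of the names, data structures, or logic below are drawn from any cited reference's actual disclosure; they are illustrative assumptions only.

```python
from dataclasses import dataclass

@dataclass
class SemanticElement:
    """A recognized object in a frame (e.g., a player), with its screen position."""
    element_id: str
    label: str
    bbox: tuple  # (x, y, width, height) in pixels

@dataclass
class Augmentation:
    """An overlay tied to a semantic element, optionally interactive."""
    element_id: str
    summary: str          # always-visible overlay (the Sato-style passive display)
    detail: str           # shown only on user interaction (the Tseng-style layer)
    expanded: bool = False

def build_augmentations(elements, stats_db):
    """Map each detected element to an overlay built from a stats lookup."""
    return [
        Augmentation(
            element_id=e.element_id,
            summary=f"{e.label}: {stats_db.get(e.element_id, {}).get('points', 0)} pts",
            detail=str(stats_db.get(e.element_id, {})),
        )
        for e in elements
    ]

def handle_tap(augmentations, elements, tap_xy):
    """Toggle the detailed view for whichever element's bounding box was tapped."""
    x, y = tap_xy
    for e in elements:
        bx, by, bw, bh = e.bbox
        if bx <= x <= bx + bw and by <= y <= by + bh:
            for a in augmentations:
                if a.element_id == e.element_id:
                    a.expanded = not a.expanded
                    return a
    return None
```

For example, tapping inside a detected player's bounding box would expand that player's overlay, which is the kind of "predictable union" of the two references' teachings the motivation-to-combine argument relies on.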
Combination 2: Kondo (U.S. Patent No. 9,992,544) in view of Levin et al. (US 2018/0278972 A1)
Motivation to Combine:
Kondo describes a system that goes beyond simple object recognition to identify complex events and actions in a video, such as a "pick and roll" in basketball, based on patterns of movement (spatiotemporal data). This aligns with the '271 patent's focus on determining "semantic context" from spatiotemporal data. Kondo's system then provides information related to these recognized events. However, the augmentation described is general. Levin et al., on the other hand, teaches a system for personalizing a media experience by tailoring the displayed information and augmentations based on a user's profile and past interactions.
A person of ordinary skill in the art would be motivated to combine these teachings to create a more sophisticated and engaging user experience. Kondo provides the powerful capability of deep, contextual understanding of the event (e.g., "this is a pick and roll play"). Levin et al. provides the mechanism to make the information presented about that event highly relevant to the individual user. The motivation would be to leverage the detailed event recognition of Kondo to provide the personalized augmentations described by Levin et al. For instance, if a user is a known fan of a particular player (a "user context" from Levin), and that player is involved in a "pick and roll" (a "semantic context" from Kondo), the system could display specialized statistics or highlights related to that player's performance in that specific type of play. This combination would be a logical step to enhance the value and engagement of the augmented video experience.
Mapping to Claim Elements (Illustrative Independent Claim):
- "receiving...video data...comprising video content and spatiotemporal data": Both Kondo and Levin et al. operate on video data from events, which includes spatiotemporal information.
- "determining...one or more semantic elements": Kondo explicitly teaches recognizing elements like players and the ball.
- "determining...one or more semantic contexts": Kondo's core contribution is the recognition of complex actions and plays (e.g., a "pick and roll"), which is a clear example of determining a "semantic context" from spatiotemporal data.
- "determining...an augmentation...based at least in part on...user context": This is directly taught by Levin et al., which describes tailoring augmentations based on user profiles and preferences. A POSA would find it obvious to use the user context from Levin et al. to select from the potential augmentations available for the specific game event identified by Kondo.
- "generating...augmented video content": Both references describe the display of information overlaid on the video. The combination would result in the generation of personalized augmented video content.
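Again purely for illustration, a crude sketch of what this combination might look like: a Kondo-style event detector that flags a screen/pick from two players' trajectories, feeding a Levin-style selector that picks an overlay based on a user profile. The detection heuristic, function names, and data shapes are hypothetical assumptions, not the implementations actually disclosed in Kondo or Levin et al.

```python
import math

def detect_screen_events(traj_a, traj_b, contact_dist=1.0):
    """Flag frame indices where two offensive players converge within
    `contact_dist` -- a crude stand-in for recognizing a screen/pick
    from spatiotemporal trajectory data (the Kondo-style step)."""
    events = []
    for t, (pa, pb) in enumerate(zip(traj_a, traj_b)):
        if math.dist(pa, pb) <= contact_dist:
            events.append(t)
    return events

def personalize_augmentation(event_label, involved_players, user_profile):
    """Choose an overlay for a recognized event, preferring the user's
    favorite player (the Levin-style user-context step)."""
    favorite = user_profile.get("favorite_player")
    if favorite in involved_players:
        return f"{event_label}: highlight reel for {favorite}"
    return f"{event_label}: team stats"
```

In this sketch, the event recognizer supplies the "semantic context" and the profile lookup supplies the "user context"; chaining the two is the logical step the motivation-to-combine argument characterizes as predictable.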
Conclusion
The claims of U.S. Patent No. 11,120,271, as understood from the provided information, appear to cover a system that integrates known concepts from the prior art in a predictable manner. The core ideas of identifying objects and events in video, overlaying contextual information, and making that information interactive and personalized were all present in the art before the patent's priority date. While the '271 patent may describe a particularly sophisticated and well-integrated system, the combination of references like Sato and Tseng et al., or Kondo and Levin et al., would have provided a person of ordinary skill in the art with both the components and the motivation to create the claimed invention. The novelty and non-obviousness of the patent likely reside in specific, detailed implementations of the spatiotemporal data analysis and the generation of "semantic contexts," which may not be fully captured in the high-level descriptions of the prior art. A more definitive conclusion would require a detailed comparison of the specific claim limitations with the disclosures in the prior art, along with an analysis of the arguments made during prosecution that persuaded the examiner of the patent's non-obviousness.