Obviousness — US Patent 11935082

To establish obviousness under 35 U.S.C. § 103 for US Patent 11935082, it is necessary to identify prior art references that, when combined, would have made the claimed invention obvious to a person having ordinary skill in the art (PHOSITA) at the time of the invention (i.e., before the priority date of August 30, 2012). The patent itself cites and incorporates several relevant academic works that predate this priority date, offering strong grounds for an obviousness analysis. A PHOSITA in this field would likely possess expertise in data mining, machine learning, geographic information systems (GIS), and social network analysis.

The independent claims of US Patent 11935082 generally describe:

Claims 1 and 9 (Venue-based): A system/method for identifying geographic clusters of venues by storing venue check-in data, generating a pairwise venue similarity matrix based on both geographical distance and social distance (derived from common venue visitors), and then clustering venues using this matrix.
Claims 17 and 20 (Sub-region-based): A system/method for identifying geographic clusters of sub-regions (e.g., census tracts) by storing venue check-in data for venues within those sub-regions, generating check-in intensity vectors for the sub-regions, creating a pairwise sub-region similarity matrix based on these vectors, and then clustering sub-regions.

Prior Art References for Obviousness Analysis:

Cheng et al., "Exploring millions of footprints in location sharing services," AAAI ICWSM, 2011 (hereinafter "Cheng"): This paper describes the collection and analysis of check-in data from location-based social networks like Foursquare, often shared via Twitter. It details methods for extracting venue names, IDs, and categories from such data.
Blei and Frazier, "Distance dependent Chinese restaurant processes," J. Mach. Learn. Res., November 2011 (hereinafter "Blei"): This reference introduces the Distance Dependent Chinese Restaurant Process (ddCRP), a non-parametric Bayesian method for clustering non-exchangeable data using a pairwise similarity matrix. It explicitly notes that the similarity matrix is a "flexible way to specify prior assumptions about the strength of relationships between pairs."
Ghosh et al., "Spatial distance dependent Chinese restaurant processes for image segmentation," Neural Information Processing Systems, 2011 (hereinafter "Ghosh 2011"): This work extends the ddCRP to hierarchical modeling and incorporates spatial information, further indicating the use of spatial data in clustering with ddCRP.
General knowledge in the art: Prior to 2012, clustering algorithms (e.g., spectral clustering, k-means, hierarchical clustering) and the concept of combining multiple similarity metrics (e.g., geographic and social) for data analysis were well-established.

Obviousness of Claims 1 and 9 (Venue-Based Clustering)

Claim 1 (System) and Claim 9 (Method) elements:

A computer system/method.
Storing venue check-in data from multiple venue visitors for multiple venues in a geographic region (from apps, POS, ratings, reviews).
Generating a check-in intensity vector for each venue (Claim 9).
Generating a pairwise venue similarity matrix, where each score is based on both geographical distance and social distance.
Social distance is determined based on whether common venue visitors (or groups of visitors) visit the pair of venues.
Identifying two or more geographic clusters of venues based on this matrix.

Combination and Motivation:

Data Collection and Check-in Intensity: Cheng (2011) explicitly teaches the collection of venue check-in data from location-based social services (e.g., Foursquare via Twitter), including user IDs, venue IDs, and categories. [0164-0165] This directly provides the "venue check-in data from multiple venue visitors for multiple venues" as recited in the claims. A PHOSITA would readily understand how to process this raw check-in data to quantify user activity at venues, thereby generating "check-in intensity vectors" for each venue as described in the patent.
Social Similarity: The patent explicitly states that "social similarity is assessed based on whether common users visit (or check-into) the venues." Given the user-venue check-in data from Cheng (2011), calculating social similarity (e.g., via cosine or Jaccard similarity of check-in intensity vectors) between pairs of venues based on common users would be a routine application of known similarity measures in data analysis. The patent itself suggests using cosine or Jaccard similarity. [0080-0081]
Geographical Distance: The patent discloses computing "a geographical distance d(i,j) based on, for example, the GPS coordinates (latitude and longitude) for the venues i,j." Given that location-based services inherently rely on geographical coordinates (e.g., GPS from mobile devices), obtaining such data for venues would be obvious from Cheng (2011) and general knowledge of location services.
Combining Social and Geographical Distance into a Similarity Matrix: Blei (2011) introduces the ddCRP, which uses a "similarity matrix A = {a_ij}" to cluster non-exchangeable data, and Ghosh (2011) extends it to incorporate spatial data. The patent's own description indicates that "Almost always, the geographical proximity of venues is a factor in grouping venues into a cluster" and that "venues are grouped based on the social similarity of the venues." A PHOSITA, wanting to discover meaningful "neighborhood clusters" (as described in the patent's abstract and the provisional application's title, "Utilizing social media to understand the dynamics of a city") from location-based social media data (Cheng 2011), would be motivated to combine these two inherently relevant factors, social interaction and geographic proximity, into a unified similarity metric. The patent itself outlines methods for combining them, such as a(i, j) = g × s(i, j) + α, where s(i, j) is the social similarity and the term is zero if the venues are beyond the m closest neighbors, or a decay function of geographical distance [0084, 0086-0087]. These specific formulations would be obvious design choices for a PHOSITA implementing a combined similarity metric.
Identifying Clusters: The patent refers to using "spectral clustering" or "other graph-based clustering algorithms besides spectral clustering" such as "hierarchical clustering, density-based clustering, centroid-based clustering such as k-means, distribution or model based clustering such as Gaussian mixture models, graph partition clustering, social network community detection, graph layout-based clustering, and others." [0094-0096] These are all well-known clustering techniques in the art and would be obvious to apply to a similarity matrix derived as above.
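To illustrate how routine the steps above are, the following is a minimal Python sketch, not the patent's implementation. It builds per-venue check-in intensity vectors from (user, venue) records, scores social similarity with cosine similarity, and keeps a score only when the second venue is among the m geographically nearest neighbors of the first. All function names, the sample records, and the coordinates are hypothetical, and the nearest-neighbor cutoff is only one plausible reading of the combination described above.

```python
import math
from collections import defaultdict

def intensity_vectors(checkins, users):
    """Map each venue to a vector of per-user check-in counts."""
    index = {u: k for k, u in enumerate(users)}
    vectors = defaultdict(lambda: [0] * len(users))
    for user, venue in checkins:
        vectors[venue][index[user]] += 1
    return dict(vectors)

def cosine(a, b):
    """Cosine similarity of two count vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return dot / (na * nb) if na and nb else 0.0

def haversine_km(p, q):
    """Great-circle distance in km between two (lat, lon) points in degrees."""
    lat1, lon1, lat2, lon2 = map(math.radians, (*p, *q))
    h = (math.sin((lat2 - lat1) / 2) ** 2
         + math.cos(lat1) * math.cos(lat2) * math.sin((lon2 - lon1) / 2) ** 2)
    return 2 * 6371.0 * math.asin(math.sqrt(h))

def similarity_matrix(vectors, coords, m):
    """Assumed combination: social similarity, zeroed beyond the m nearest venues."""
    venues = sorted(vectors)
    sim = {}
    for i in venues:
        others = [v for v in venues if v != i]
        nearest = sorted(others, key=lambda v: haversine_km(coords[i], coords[v]))[:m]
        for j in others:
            sim[(i, j)] = cosine(vectors[i], vectors[j]) if j in nearest else 0.0
    return sim

# Hypothetical sample data: (user, venue) check-ins and venue GPS coordinates.
users = ["u1", "u2", "u3"]
checkins = [("u1", "cafe"), ("u1", "bar"), ("u2", "cafe"), ("u2", "bar"), ("u3", "gym")]
coords = {"cafe": (40.4406, -79.9959), "bar": (40.4410, -79.9950), "gym": (40.5000, -80.1000)}

vectors = intensity_vectors(checkins, users)
sim = similarity_matrix(vectors, coords, m=1)
```

In this sample, the cafe and the bar share both of their visitors and sit close together, so their score is near 1, while the gym shares no visitors with either and scores 0. Because the nearest-neighbor relation is not symmetric, a matrix built this way may need to be symmetrized before it is passed to a method such as spectral clustering.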

Therefore, the combination of Cheng (2011) providing the necessary data, Blei (2011) and Ghosh (2011) providing the framework for distance-dependent clustering with similarity matrices, and general knowledge of combining metrics and clustering algorithms, would render Claims 1 and 9 obvious to a PHOSITA. The motivation would be to create more accurate and meaningful geographic groupings of venues by leveraging both the social and physical characteristics of urban spaces revealed by check-in data.
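The final clustering step can likewise be illustrated with a short Python sketch. This is a deliberately simple stand-in, not spectral clustering and not the patent's method: it treats every venue pair whose similarity meets a threshold as a graph edge and returns the connected components as clusters. The function name, the threshold, and the sample scores are assumptions chosen for illustration.

```python
def threshold_clusters(items, sim, threshold):
    """Cluster items as connected components of the graph whose edges
    are the pairs with similarity >= threshold."""
    neighbors = {i: set() for i in items}
    for (i, j), score in sim.items():
        if score >= threshold:
            neighbors[i].add(j)
            neighbors[j].add(i)

    clusters, seen = [], set()
    for start in items:
        if start in seen:
            continue
        # Depth-first traversal collects everything reachable from start.
        stack, component = [start], set()
        while stack:
            node = stack.pop()
            if node in component:
                continue
            component.add(node)
            stack.extend(neighbors[node] - component)
        seen |= component
        clusters.append(sorted(component))
    return clusters

# Hypothetical pairwise similarity scores between four venues.
items = ["bar", "cafe", "gym", "pool"]
sim = {("cafe", "bar"): 0.9, ("gym", "pool"): 0.8, ("cafe", "gym"): 0.1}

clusters = threshold_clusters(items, sim, threshold=0.5)
print(clusters)  # [['bar', 'cafe'], ['gym', 'pool']]
```

Any of the clustering techniques named above could be substituted for this step; the only input each one requires is the pairwise similarity matrix.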

Obviousness of Claims 17 and 20 (Sub-region-Based Clustering)

Claim 17 (System) and Claim 20 (Method) elements:

A computer system/method.
Storing venue check-in data from multiple venue visitors for multiple venues in a geographic region, where each venue is located in one of multiple sub-regions.
Generating a check-in intensity vector for each sub-region (based on check-ins to venues in that sub-region).
Generating a pairwise sub-region similarity matrix (based on the similarity of their check-in intensity vectors).
Identifying two or more geographic clusters of sub-regions based on this matrix.

Combination and Motivation:

Data Collection and Assignment to Sub-regions: Cheng (2011) teaches gathering venue check-in data. The patent itself states that "TIGER/Line municipal boundary Shapefiles published by the United States Census Bureau were used to assign venues to their proper local administrative unit (e.g. city or town)." This indicates that assigning venues to defined geographic sub-regions (such as census tracts or municipal boundaries) was a known preprocessing step for location data before the priority date.
Sub-region Check-in Intensity Vector: Once venues are assigned to sub-regions (as described in the patent and using data from Cheng 2011), aggregating the check-in data from all venues within a sub-region to form a "check-in intensity vector for each of multiple sub-regions" is a straightforward data aggregation step. The patent describes that "the elements of the check-in count vector would show the cumulative number of times that the venue visitors checked into venues in the various geographic sub-regions over a period of time." This is a direct application of the venue-level check-in intensity concept (from Claims 1/9) but aggregated to the sub-region level.
Pairwise Sub-region Similarity Matrix: Similar to the venue-based clustering, once check-in intensity vectors are generated for sub-regions, generating a pairwise similarity matrix between these sub-regions based on the similarity of these vectors would be an obvious extension. This leverages the same principles of social similarity (common users visiting venues within a sub-region) as applied in Claims 1 and 9. The patent states, "the elements of the pairwise similarity matrix would correspond to the similarity score between pairs of geographic sub-regions."
Identifying Clusters of Sub-regions: Applying known clustering algorithms (as identified for Claims 1 and 9) to this sub-region similarity matrix to "identify two or more geographic clusters of sub-regions" is a direct and obvious application of standard clustering techniques.
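The aggregation and similarity steps above can be sketched in a few lines of Python. This is a hedged illustration under stated assumptions, not the patent's implementation: the venue-to-sub-region mapping is taken as already known, each sub-region's vector holds per-user check-in counts summed over its venues, and cosine similarity is used as the vector similarity. All names and sample records are hypothetical.

```python
import math
from collections import defaultdict

def subregion_vectors(checkins, venue_to_subregion, users):
    """Sum per-user check-in counts over all venues in each sub-region."""
    index = {u: k for k, u in enumerate(users)}
    vectors = defaultdict(lambda: [0] * len(users))
    for user, venue in checkins:
        vectors[venue_to_subregion[venue]][index[user]] += 1
    return dict(vectors)

def cosine(a, b):
    """Cosine similarity of two count vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return dot / (na * nb) if na and nb else 0.0

def pairwise_similarity(vectors):
    """Return the sorted sub-region labels and their similarity matrix."""
    regions = sorted(vectors)
    matrix = [[cosine(vectors[r], vectors[s]) for s in regions] for r in regions]
    return regions, matrix

# Hypothetical sample data: check-ins and an assumed venue-to-tract mapping.
users = ["u1", "u2", "u3"]
venue_to_subregion = {"cafe": "tract_A", "bar": "tract_A",
                      "gym": "tract_B", "park": "tract_B"}
checkins = [("u1", "cafe"), ("u2", "bar"), ("u1", "bar"),
            ("u3", "gym"), ("u1", "park")]

vectors = subregion_vectors(checkins, venue_to_subregion, users)
regions, matrix = pairwise_similarity(vectors)
```

Here tract_A aggregates to [2, 1, 0] and tract_B to [1, 0, 1], so the two tracts are linked only through the one visitor they share. The resulting matrix can be passed to any standard clustering routine in the same way as the venue-level matrix.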

Motivation for Combination:
A PHOSITA, having developed or understood the methods for clustering individual venues, would be motivated to extend this analysis to larger, administratively defined geographic units (sub-regions). This is a common practice in fields like urban planning, demography, and market analysis, where insights are often needed at a broader scale than individual venues. The patent itself describes this as an "other embodiment" where "the system could be used to cluster sub-regions... such as census tracts, school districts, or some other geographic regions with defined boundaries." [0215-0216] This indicates it was a recognized and straightforward generalization of the venue-level clustering. The motivation is to provide insights at a more aggregated, policy-relevant level, such as "how a municipality allocates its resources, such as the location of fire stations, police stations, schools, polling stations, bus stops, etc."

Thus, the combination of Cheng (2011) for data, the concept of social/geographic similarity and clustering from Blei (2011) and Ghosh (2011), coupled with the routine practice of aggregating data to administrative geographic units, would render Claims 17 and 20 obvious.