Traditional methods for studying the activity dynamics of people and
their social interactions in cities require time-consuming and
resource-intensive observations and surveys. Dynamic online trails from
geosocial networks (e.g. Twitter, Instagram, and Flickr) have been
increasingly used as proxies for human activity in studies of mobility
behavior, spatial interaction, and social connectivity, among others.
Social media records incorporate geo-tags, timestamps, textual
components, user-profile attributes and points-of-interest (POI)
features, which respectively address spatial, temporal, topical,
demographic, and contextual dimensions of human activity. While the
information contained in social media data is complex and
high-dimensional, few studies exploit the combined
potential of their information layers. This article introduces a
framework that considers multiple dimensions (i.e. spatial, temporal,
topical, and demographic) of information from social media data, and
combines Geo-Self-Organizing Maps (GeoSOMs) with
contiguity-constrained hierarchical clustering to identify homogeneous
regions of social interaction in cities. Drawing on the discovered regions,
we build a Factorization Machine-based model to estimate appropriate
locations for new POIs in different urban contexts. Using geo-referenced
Twitter records and Foursquare data from Amsterdam, Boston, and
Jakarta, we evaluate the potential of machine learning techniques in
discovering knowledge about the geography of social dynamics from
unstructured and high-dimensional social web data. Moreover, we
demonstrate that the discovered homogeneous regions are significant
predictors of new POI locations.
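
To illustrate the kind of model referred to above, the following is a minimal sketch of how a standard second-order Factorization Machine scores one feature vector. It is not the implementation used in this work: the feature layout (a one-hot region indicator followed by a one-hot POI category), the weights, and the function names `fm_predict` and `fm_predict_naive` are all hypothetical and chosen only for illustration.

```python
from itertools import combinations


def fm_predict(x, w0, w, V):
    """Second-order Factorization Machine score for one feature vector.

    x  : list of n feature values
    w0 : global bias
    w  : list of n linear weights
    V  : n x k list of latent factor rows, one row per feature
    """
    n, k = len(x), len(V[0])
    linear = sum(w[i] * x[i] for i in range(n))
    pairwise = 0.0
    for f in range(k):
        # Linear-time identity: sum over pairs = 0.5 * ((sum)^2 - sum of squares)
        s = sum(V[i][f] * x[i] for i in range(n))
        s_sq = sum((V[i][f] * x[i]) ** 2 for i in range(n))
        pairwise += 0.5 * (s * s - s_sq)
    return w0 + linear + pairwise


def fm_predict_naive(x, w0, w, V):
    """Same score via the explicit sum over feature pairs (for checking)."""
    n = len(x)
    score = w0 + sum(w[i] * x[i] for i in range(n))
    for i, j in combinations(range(n), 2):
        dot = sum(a * b for a, b in zip(V[i], V[j]))
        score += dot * x[i] * x[j]
    return score


# Hypothetical candidate: region 0 (of 2) and POI category 1 (of 2), both one-hot.
x = [1.0, 0.0, 0.0, 1.0]
w0 = 0.1
w = [0.2, -0.1, 0.3, 0.05]
V = [[0.5, 0.1], [0.2, -0.3], [-0.4, 0.6], [0.3, 0.2]]
score = fm_predict(x, w0, w, V)
```

In this sketch the interaction between a region and a POI category is captured by the dot product of their latent factor rows, which is one way a region membership could act as a predictor of POI suitability.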
Affiliation
Web Information Systems group, Delft University of Technology