Spatial crime distribution and prediction for sporting events using social media
Sporting events attract high volumes of people, which in turn leads to increased use of social media. In addition, research shows that sporting events may trigger violent behavior that can lead to crime. This study analyses the spatial relationships between crime occurrences, demographic, socio-economic and environmental variables, together with geo-located Twitter messages and their 'violent' subsets. The analysis compares basketball and hockey game days and non-game days. Moreover, this research aims to analyze crime prediction models using historical crime data as a basis and then introducing tweets and additional variables in their role as covariates of crime. First, this study investigates the spatial distribution of and correlation between crime and tweets during the same temporal periods. Feature selection models are applied in order to identify the best explanatory variables. Then, we apply a localized kernel density estimation model for crime prediction during basketball and hockey games, and on non-game days. Findings from this study show that Twitter data, and a subset of violent tweets, are useful in building prediction models for the seven investigated crime types for home and away sporting events, and non-game days, with different levels of improvement.
Global multi-layer network of human mobility
The recent availability of geo-localized data capturing individual human activity, together with statistical data on international migration, has opened up unprecedented opportunities for the study of global mobility. In this paper, we consider it from the perspective of a multi-layer complex network, built using a combination of three datasets: Twitter, Flickr and official migration data. Those datasets provide different, but equally important insights into global mobility: while the first two highlight short-term visits of people from one country to another, the last one (migration) shows the long-term mobility perspective, when people relocate for good. The main purpose of the paper is to emphasize the importance of this multi-layer approach capturing both aspects of human mobility at the same time. On the one hand, we show that although the general properties of different layers of the global mobility network are similar, there are important quantitative differences among them. On the other hand, we demonstrate that consideration of mobility from a multi-layer perspective can reveal important global spatial patterns in a way more consistent with those observed in other available relevant sources of international connections, in comparison to the spatial structure inferred from each network layer taken separately.
Why GPS makes distances bigger than they are
Global navigation satellite systems such as the Global Positioning System (GPS) are among the most important sensors for movement analysis. GPS is widely used to record the trajectories of vehicles, animals and human beings. However, all GPS movement data are affected by both measurement and interpolation errors. In this article we show that measurement error causes a systematic bias in distances recorded with a GPS; the distance between two points recorded with a GPS is, on average, bigger than the true distance between these points. This systematic 'overestimation of distance' becomes relevant if the influence of interpolation error can be neglected, which in practice is the case for movement sampled at high frequencies. We provide a mathematical explanation of this phenomenon and illustrate that it functionally depends on the autocorrelation of GPS measurement error. We argue that this autocorrelation can be interpreted as a quality measure for movement data recorded with a GPS. If there is a strong autocorrelation between any two consecutive position estimates, they have very similar error. This error cancels out when average speed, distance or direction is calculated along the trajectory. Based on our theoretical findings we introduce a novel approach to determine the autocorrelation in real-world GPS movement data sampled at high frequencies. We apply our approach to pedestrian trajectories and car trajectories. We found that the measurement error in the data was strongly spatially and temporally autocorrelated and give a quality estimate of the data. Most importantly, our findings are not limited to GPS alone. The systematic bias and its implications are bound to occur in any movement data collected with absolute positioning if interpolation error can be neglected.
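The overestimation effect can be illustrated with a small Monte Carlo sketch. This is not the authors' derivation; it assumes zero-mean Gaussian position error with standard deviation `sigma` on each axis and a single correlation coefficient `rho` between the errors of two consecutive fixes. The function name and all parameters are hypothetical.

```python
import math
import random


def mean_measured_distance(true_dist, sigma, rho, trials=20000, seed=0):
    """Average measured distance between two fixes that are `true_dist` apart.

    Assumption: each axis of each fix carries Gaussian error (std `sigma`),
    and the errors of the two fixes are correlated with coefficient `rho`.
    """
    rng = random.Random(seed)
    scale = math.sqrt(1.0 - rho * rho)
    total = 0.0
    for _ in range(trials):
        # Error of the first fix.
        e1x, e1y = rng.gauss(0, sigma), rng.gauss(0, sigma)
        # Error of the second fix, correlated with the first.
        e2x = rho * e1x + scale * rng.gauss(0, sigma)
        e2y = rho * e1y + scale * rng.gauss(0, sigma)
        # Measured displacement = true displacement + difference of errors.
        dx = true_dist + e2x - e1x
        dy = e2y - e1y
        total += math.hypot(dx, dy)
    return total / trials


if __name__ == "__main__":
    for rho in (0.0, 0.95, 1.0):
        print(rho, mean_measured_distance(10.0, 3.0, rho))
```

Under these assumptions the average measured distance exceeds the true distance when the errors are independent, and the excess shrinks as `rho` approaches 1, because nearly identical errors cancel in the difference between the two fixes.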
An uncertainty and sensitivity analysis approach for GIS-based multicriteria landslide susceptibility mapping
GIS-based multicriteria decision analysis (MCDA) methods are increasingly being used in landslide susceptibility mapping. However, the uncertainties that are associated with MCDA techniques may significantly impact the results. This may sometimes lead to inaccurate outcomes and undesirable consequences. This article introduces a new GIS-based MCDA approach. We illustrate the consequences of applying different MCDA methods within a decision-making process through uncertainty analysis. Three GIS-MCDA methods in conjunction with Monte Carlo simulation (MCS) and Dempster-Shafer theory are analyzed for landslide susceptibility mapping (LSM) in the Urmia lake basin in Iran, which is highly susceptible to landslide hazards. The methodology comprises three stages. First, the LSM criteria are ranked and a sensitivity analysis is implemented to simulate error propagation based on the MCS. The resulting weights are expressed through probability density functions. Accordingly, within the second stage, three MCDA methods, namely analytical hierarchy process (AHP), weighted linear combination (WLC) and ordered weighted average (OWA), are used to produce the landslide susceptibility maps. In the third stage, accuracy assessments are carried out and the uncertainties of the different results are measured. We compare the accuracies of the three MCDA methods based on (1) the Dempster-Shafer theory and (2) a validation of the results using an inventory of known landslides and their respective coverage based on object-based image analysis of IRS-ID satellite images. The results of this study reveal that through the integration of GIS and MCDA models, it is possible to identify strategies for choosing an appropriate method for LSM. Furthermore, our findings indicate that the integration of MCDA and MCS can significantly improve the accuracy of the results. In LSM, the AHP method performed best, while the OWA reveals better performance in the reliability assessment. The WLC operation yielded poor results.
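As a rough illustration of two of the aggregation rules named above, the sketch below scores one map cell with a weighted linear combination and with a plain Yager-style ordered weighted average, then perturbs the WLC weights in a simple Monte Carlo loop. The criterion values, weights, noise model and function names are all assumptions for illustration; the article's own weighting scheme and error model are not reproduced here.

```python
import random


def wlc(values, weights):
    """Weighted linear combination: each weight belongs to a criterion."""
    return sum(v * w for v, w in zip(values, weights))


def owa(values, order_weights):
    """Ordered weighted average: each weight belongs to a rank position
    (largest value first), not to a particular criterion."""
    ranked = sorted(values, reverse=True)
    return sum(v * w for v, w in zip(ranked, order_weights))


def monte_carlo_wlc(values, weights, rel_sd=0.1, runs=1000, seed=0):
    """WLC scores under randomly perturbed, renormalised weights.

    Assumption: multiplicative Gaussian noise on each weight.
    """
    rng = random.Random(seed)
    scores = []
    for _ in range(runs):
        noisy = [max(1e-9, w * (1 + rng.gauss(0, rel_sd))) for w in weights]
        total = sum(noisy)
        scores.append(wlc(values, [w / total for w in noisy]))
    return scores


if __name__ == "__main__":
    cell = [0.2, 0.8, 0.5]          # hypothetical standardised criteria
    print(wlc(cell, [0.5, 0.3, 0.2]))
    print(owa(cell, [1.0, 0.0, 0.0]))  # behaves like a maximum
    scores = monte_carlo_wlc(cell, [0.5, 0.3, 0.2])
    print(min(scores), max(scores))
```

The spread of the Monte Carlo scores gives a simple per-cell indication of how sensitive the susceptibility value is to uncertainty in the weights.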
The GeoViz Toolkit: Using component-oriented coordination methods for geographic visualization and analysis
In this paper we present the GeoViz Toolkit, an open-source, internet-delivered program for geographic visualization and analysis that features a diverse set of software components which can be flexibly combined by users who do not have programming expertise. The design and architecture of the GeoViz Toolkit allows us to address three key research challenges in geovisualization: allowing end users to create their own geovisualization and analysis component set on-the-fly, integrating geovisualization methods with spatial analysis methods, and making geovisualization applications sharable between users. Each of these tasks necessitates a robust yet flexible approach to inter-tool coordination. The coordination strategy we developed for the GeoViz Toolkit, called Introspective Observer Coordination, leverages and combines key advances in software engineering from the last decade: automatic introspection of objects, software design patterns, and reflective invocation of methods.
Identifying Regions Based on Flexible User Defined Constraints
The identification of regions is both a computational and conceptual challenge. Even with growing computational power, regionalization algorithms must rely on heuristic approaches in order to find solutions. Therefore, the constraints and evaluation criteria that define a region must be translated into an algorithm that can efficiently and effectively navigate the solution space to find the best solution. One limitation of many existing regionalization algorithms is a requirement that the number of regions be selected a priori. The max-p algorithm, introduced in Duque et al. (2012), does not have this requirement, and thus the number of regions is an output of, not an input to, the algorithm. In this paper we extend the max-p algorithm to allow for greater flexibility in the constraints available to define a feasible region, placing the focus squarely on the multidimensional characteristics of a region. We also modify technical aspects of the algorithm to provide greater flexibility in its ability to search the solution space. Using synthetic spatial and attribute data we are able to show the algorithm's broad ability to identify regions in maps of varying complexity. We also conduct a large scale computational experiment to identify parameter settings that result in the greatest solution accuracy under various scenarios. The rules of thumb identified from the experiment produce maps that correctly assign areas to their "true" region with 94 percent average accuracy, with nearly 50 percent of the simulations reaching 100 percent accuracy.
A new method for discovering behavior patterns among animal movements
Advanced satellite tracking technologies enable biologists to track animal movements at fine spatial and temporal scales. The resultant data present opportunities and challenges for understanding animal behavioral mechanisms. In this paper, we develop a new method to elucidate animal movement patterns from tracking data. Here, we propose the notion of continuous behavior patterns as a concise representation of popular migration routes and underlying sequential behaviors during migration. Each stage in the pattern is characterized in terms of space (i.e., the places traversed during movements) and time (i.e. the time spent in those places); that is, the behavioral state corresponding to a stage is inferred according to the spatiotemporal and sequential context. Hence, the pattern may be interpreted predictably. We develop a candidate generation and refinement framework to derive all continuous behavior patterns from raw trajectories. In the framework, we first define the representative spots to denote the underlying potential behavioral states that are extracted from individual trajectories according to the similarity of relaxed continuous locations in certain distinct time intervals. We determine the common behaviors of multiple individuals according to the spatiotemporal proximity of representative spots and apply a projection-based extension approach to generate candidate sequential behavior sequences as candidate patterns. Finally, the candidate generation procedure is combined with a refinement procedure to derive continuous behavior patterns. We apply an ordered processing strategy to accelerate candidate refinement. The proposed patterns and discovery framework are evaluated through conceptual experiments on both real GPS-tracking and large synthetic datasets.
Movement analysis of free-grazing domestic ducks in Poyang Lake, China: a disease connection
Previous work suggests domestic poultry are important contributors to the emergence and transmission of highly pathogenic avian influenza throughout Asia. In Poyang Lake, China, domestic duck production cycles are synchronized with arrival and departure of thousands of migratory wild birds in the area. During these periods, high densities of juvenile domestic ducks are in close proximity to migratory wild ducks, increasing the potential for the virus to be transmitted and subsequently disseminated via migration. In this paper, we use GPS dataloggers and dynamic Brownian bridge models to describe movements and habitat use of free-grazing domestic ducks in the Poyang Lake basin and identify specific areas that may have the highest risk of H5N1 transmission between domestic and wild birds. Specifically, we determine relative use by free-grazing domestic ducks of natural wetlands, which are the most heavily used areas by migratory wild ducks, and of rice paddies, which provide habitat for resident wild ducks and lower densities of migratory wild ducks. To our knowledge, this is the first movement study on domestic ducks, and our data show potential for free-grazing domestic ducks from farms located near natural wetlands to come in contact with wild waterfowl, thereby increasing the risk for disease transmission. This study provides an example of the importance of movement ecology studies in understanding dynamics such as disease transmission on a complicated landscape.
A grammar for interpreting geo-analytical questions as concept transformations
Geographic Question Answering (GeoQA) systems can automatically answer questions phrased in natural language. Potentially this may enable data analysts to make use of geographic information without requiring any GIS skills. However, going beyond the retrieval of existing geographic facts on particular places remains a challenge. Current systems usually cannot handle geo-analytical questions that require GIS analysis procedures to arrive at answers. To enable this, GeoQA systems need to interpret questions in terms of a transformation that can be implemented in a GIS workflow. To this end, we propose a novel approach to question parsing that interprets questions in terms of core concepts of spatial information and their functional roles in a context-free grammar. The core concepts help model spatial information in questions independently from implementation formats, and their functional roles indicate how concepts are transformed and used in a workflow. Using our parser, geo-analytical questions can be converted into expressions of concept transformations corresponding to abstract GIS workflows. We developed our approach on a corpus of 309 GIS-related questions and tested it on an independent source of 134 test questions including workflows. The evaluation results show high precision and recall on a gold standard of concept transformations.
A comparison of multiple indicator kriging and area-to-point Poisson kriging for mapping patterns of herbivore species abundance in Kruger National Park, South Africa
Kruger National Park (KNP), South Africa, provides protected habitats for the unique animals of the African savannah. For the past 40 years, annual aerial surveys of herbivores have been conducted to aid management decisions based on (1) the spatial distribution of species throughout the park and (2) total species populations in a year. The surveys are extremely time consuming and costly. For many years, the whole park was surveyed, but in 1998 a transect survey approach was adopted. This is cheaper and less time consuming but leaves gaps in the data spatially. Also the distance method currently employed by the park only gives estimates of total species populations but not their spatial distribution. We compare the ability of multiple indicator kriging and area-to-point Poisson kriging to accurately map species distribution in the park. A leave-one-out cross-validation approach indicates that multiple indicator kriging makes poor estimates of the number of animals, particularly the few large counts, as the indicator variograms for such high thresholds are pure nugget. Poisson kriging was applied to the prediction of two types of abundance data: spatial density and proportion of a given species. Both Poisson approaches had standardized mean absolute errors (St. MAEs) of animal counts at least an order of magnitude lower than multiple indicator kriging. The spatial density, Poisson approach (1), gave the lowest St. MAEs for the most abundant species and the proportion, Poisson approach (2), did for the least abundant species. Incorporating environmental data into Poisson approach (2) further reduced St. MAEs.
Towards an Integrated Science of Movement: Converging Research on Animal Movement Ecology and Human Mobility Science
There is long-standing scientific interest in understanding purposeful movement by animals and humans. Traditionally, collecting data on individual moving entities was difficult and time-consuming, limiting scientific progress. The growth of location-aware and other geospatial technologies for capturing, managing and analyzing moving objects data is shattering these limitations, leading to revolutions in animal movement ecology and human mobility science. Despite parallel transitions towards massive individual-level data collected automatically via sensors, there is little scientific cross-fertilization across the animal and human divide. There are potential synergies from converging these separate domains towards an integrated science of movement. This paper discusses the data-driven revolutions in animal movement ecology and human mobility science, their contrasting worldviews and, as examples of complementarity, transdisciplinary questions that span both fields. We also identify research challenges that should be met to develop an integrated science of movement trajectories.
Ecological metrics and methods for GPS movement data
The growing field of movement ecology uses high resolution movement data to analyze animal behavior across multiple scales: from individual foraging decisions to population-level space-use patterns. These analyses contribute to various subfields of ecology (behavioral, disease, landscape, resource, and wildlife) and facilitate novel exploration in fields ranging from conservation planning to public health. Despite the growing availability and general accessibility of animal movement data, much potential remains for the analytical methods of movement ecology to be incorporated in all types of geographic analyses. This review provides for the Geographical Information Sciences (GIS) community an overview of the most common movement metrics and methods of analysis employed by animal ecologists. Through illustrative applications, we emphasize the potential for movement analyses to promote transdisciplinary GIS/wildlife-ecology research.
Using Multiple Scale Spatio-Temporal Patterns for Validating Spatially Explicit Agent-Based Models
Spatially explicit agent-based models (ABMs) have been widely utilized to simulate the dynamics of spatial processes that involve the interactions of individual agents. The assumptions embedded in the ABMs may be responsible for uncertainty in the model outcomes. To ensure the reliability of the outcomes in terms of their space-time patterns, model validation should be performed. In this paper, we propose the use of multiple scale spatio-temporal patterns for validating spatially explicit ABMs. We evaluated several specifications of vector-borne disease transmission models by comparing space-time patterns of model outcomes to observations at multiple scales via the sum of root mean square error (RMSE) measurement. The results indicate that specifications of the spatial configurations of residential area and immunity status of individual humans are of importance to reproduce observed patterns of dengue outbreaks at multiple space-time scales. Our approach to using multiple scale spatio-temporal patterns can help not only to understand the dynamic associations between model specifications and model outcomes, but also to validate spatially explicit ABMs.
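A minimal sketch of a summed multi-scale RMSE follows. It covers only spatial aggregation of a single snapshot on a square grid, which is a simplification of the space-time comparison described above; the block sizes, grid layout and function names are assumptions.

```python
import math


def aggregate(grid, factor):
    """Sum a 2D grid into non-overlapping factor x factor blocks.

    Assumes both grid dimensions are divisible by `factor`.
    """
    rows, cols = len(grid), len(grid[0])
    return [
        [
            sum(grid[r + i][c + j] for i in range(factor) for j in range(factor))
            for c in range(0, cols, factor)
        ]
        for r in range(0, rows, factor)
    ]


def rmse(a, b):
    """Root mean square error between two grids of equal shape."""
    sq = [(x - y) ** 2 for ra, rb in zip(a, b) for x, y in zip(ra, rb)]
    return math.sqrt(sum(sq) / len(sq))


def multiscale_rmse(simulated, observed, factors=(1, 2, 4)):
    """Sum of RMSE values computed at each aggregation level."""
    return sum(
        rmse(aggregate(simulated, f), aggregate(observed, f)) for f in factors
    )


if __name__ == "__main__":
    observed = [[1, 0, 0, 0], [0, 0, 0, 0], [0, 0, 0, 0], [0, 0, 0, 0]]
    simulated = [[0, 1, 0, 0], [0, 0, 0, 0], [0, 0, 0, 0], [0, 0, 0, 0]]
    print(multiscale_rmse(simulated, observed))
```

In the usage example the simulated case sits one cell away from the observed one, so the error appears only at the finest scale and vanishes once cells are aggregated into 2 x 2 blocks. A model that gets coarse patterns right but fine patterns wrong is therefore penalised less than one that is wrong at every scale.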
From 2SFCA to i2SFCA: integration, derivation and validation
Uneven distributions of population and service providers lead to geographic disparity in access for residents and varying workload for staff in facilities. The former can be captured by spatial accessibility in the traditional two-step floating catchment area (2SFCA) method; and the latter can be measured by potential crowdedness in the newly developed inverted 2SFCA (or i2SFCA) method. Residents-based accessibility and facility crowdedness are two sides of the same coin in examining the geographic variability of resource allocation. This short research note derives the formulations of both methods to solidify their theoretical foundation, and uses a case study to validate both. By doing so, the 2SFCA and i2SFCA are fully integrated into one conceptual framework, derived with extensions to the Huff model, and validated by empirical data.
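The two measures can be sketched side by side. The version below assumes a binary catchment (a facility counts if it lies within a threshold distance `d0`), which is one common choice; the note's own derivation may use a more general distance-decay function. Function names and the sample data are hypothetical.

```python
def _within(d, d0):
    """Binary distance decay: 1 inside the catchment, 0 outside."""
    return 1.0 if d <= d0 else 0.0


def two_sfca(demand, supply, dist, d0):
    """Accessibility A_i for each demand location.

    dist[i][j] is the distance from demand location i to facility j.
    Assumes every facility has some demand inside its catchment.
    """
    n, m = len(demand), len(supply)
    # Step 1: supply-to-demand ratio of each facility.
    ratio = [
        supply[j] / sum(demand[k] * _within(dist[k][j], d0) for k in range(n))
        for j in range(m)
    ]
    # Step 2: sum the ratios of facilities reachable from each location.
    return [
        sum(ratio[j] * _within(dist[i][j], d0) for j in range(m))
        for i in range(n)
    ]


def inverted_two_sfca(demand, supply, dist, d0):
    """Potential crowdedness C_j for each facility (roles swapped).

    Assumes every demand location has some facility inside its catchment.
    """
    n, m = len(demand), len(supply)
    # Step 1: demand-to-supply ratio of each demand location.
    ratio = [
        demand[i] / sum(supply[l] * _within(dist[i][l], d0) for l in range(m))
        for i in range(n)
    ]
    # Step 2: sum the ratios of demand locations within reach of each facility.
    return [
        sum(ratio[i] * _within(dist[i][j], d0) for i in range(n))
        for j in range(m)
    ]


if __name__ == "__main__":
    demand = [100, 200, 300]
    supply = [10, 20]
    dist = [[1, 5], [2, 2], [6, 1]]
    print(two_sfca(demand, supply, dist, d0=3))
    print(inverted_two_sfca(demand, supply, dist, d0=3))
```

With this formulation the demand-weighted sum of accessibility equals total supply, and the supply-weighted sum of crowdedness equals total demand, which is one way to see the two measures as mirror images of each other.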
An Evaluation of Geo-located Twitter Data for Measuring Human Migration
This study evaluates the spatial patterns of flows generated from geo-located Twitter data to measure human migration. Using geo-located tweets continuously collected in the U.S. from 2013 to 2015, we identified Twitter users who migrated per changes in county-of-residence every two years and compared the Twitter-estimated county-to-county migration flows with the ones from the U.S. Internal Revenue Service (IRS). To evaluate the spatial patterns of Twitter migration flows when representing the IRS counterparts, we developed a normalized difference representation index to visualize and identify those counties of over-/under-representations in the Twitter estimates. Further, we applied a multidimensional spatial scan statistic approach based on a Poisson process model to detect pairs of origin and destination regions where the over-/under-representativeness occurred. The results suggest that Twitter migration flows tend to under-represent the IRS estimates in regions with a large population and over-represent them in metropolitan regions adjacent to tourist attractions. This study demonstrated that geo-located Twitter data could be a sound statistical proxy for measuring human migration. Given that the spatial patterns of Twitter-estimated migration flows vary significantly across the geographic space, related studies will benefit from our approach by identifying those regions where data calibration is necessary.
Qualitative GIS and the Visualization of Narrative Activity Space Data
Qualitative activity space data, i.e. qualitative data associated with the routine locations and activities of individuals, are recognized as increasingly useful by researchers in the social and health sciences for investigating the influence of environment on human behavior. However, there has been little research on techniques for exploring qualitative activity space data. This research illustrates the theoretical principles of combining qualitative and quantitative data and methodologies within the context of GIS, using visualization as the means of inquiry. Through the use of a prototype implementation of a visualization system for qualitative activity space data, and its application in a case study of urban youth, we show how these theoretical methodological principles are realized in applied research. The visualization system uses a variety of visual variables to simultaneously depict multiple qualitative and quantitative attributes of individuals' activity spaces. The visualization is applied to explore the activity spaces of a sample of urban youth participating in a study on the geographic and social contexts of adolescent substance use. Examples demonstrate how the visualization may be used to explore individual activity spaces to generate hypotheses, investigate statistical outliers, and explore activity space patterns among subject subgroups.
How many days are enough?: capturing routine human mobility
Wedding mobile phone sensor technology to the study of human spatial behaviour has great potential. The ubiquity of Global Positioning Systems (GPS) technology has made gathering data about human mobility simpler, more precise, and with higher fidelity, providing minute-by-minute records of the locations of cohorts of dozens of participants. While this data provides a strong basis for Geographic Information Science research, it also constitutes an invasion of the participants' privacy and can provide more information than researchers require to answer their questions. As an ethical and practical consideration, researchers should gather only as much data as they need. In this paper, we take three weeks of GPS traces from over a hundred student participants in mobile phone-based tracking studies and show that fewer than 14 days of data are needed to establish complete activity spaces. We define 'complete' as the point at which marginal information gains become negligible according to a pairwise temporal analysis of the Kullback-Leibler (KL) divergence of the spatial (bivariate) histogram through time. For the fixed level of information difference observable in the data, impacts due to individual variability, population composition, and spatial resolution are evident. However, all populations at each level of resolution examined in the paper converged to low divergence levels within a matter of days, and to negligible information gain in less than two weeks. The methods described in the paper represent a novel metric useful to understand the interaction between measurements and information in human mobility.
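One way to picture the stopping criterion is to track how much each additional day changes the cumulative spatial histogram. The sketch below is an assumption-laden illustration rather than the paper's procedure: it bins planar coordinates into square cells of size `cell`, and it measures KL(before || after) so that every bin with mass in the earlier histogram also has mass in the later one. All names are hypothetical.

```python
import math
from collections import Counter


def histogram(points, cell):
    """Normalised bivariate histogram of (x, y) points on a square grid."""
    counts = Counter(
        (math.floor(x / cell), math.floor(y / cell)) for x, y in points
    )
    total = sum(counts.values())
    return {k: v / total for k, v in counts.items()}


def kl_divergence(p, q):
    """KL(p || q) in nats; requires q to cover every bin where p has mass."""
    return sum(pv * math.log(pv / q[k]) for k, pv in p.items())


def daily_information_gain(days, cell):
    """KL divergence between the cumulative histogram before and after
    each additional day of data (one value per day after the first)."""
    gains = []
    seen = list(days[0])
    for day in days[1:]:
        before = histogram(seen, cell)
        seen.extend(day)
        after = histogram(seen, cell)
        gains.append(kl_divergence(before, after))
    return gains


if __name__ == "__main__":
    days = [
        [(0.5, 0.5), (1.5, 0.5)],
        [(0.5, 0.5), (1.5, 0.5)],   # same places again: no new information
        [(5.5, 5.5), (0.5, 0.5)],   # a new place appears
    ]
    print(daily_information_gain(days, cell=1.0))
```

A day that revisits known places in the same proportions yields zero gain, while a day that adds a new place yields a positive value; data collection could stop once the gains stay below a chosen threshold.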
Measuring the temporal instability of land change using the Flow matrix
Enhancing Areal Interpolation Frameworks through Dasymetric Refinement to Create Consistent Population Estimates across Censuses
To assess micro-scale population dynamics effectively, demographic variables should be available over temporally consistent small area units. However, fine-resolution census boundaries often change between survey years. This research advances areal interpolation methods with dasymetric refinement to create accurate consistent population estimates in 1990 and 2000 (source zones) within tract boundaries of the 2010 census (target zones) for five demographically distinct counties in the U.S. Three levels of dasymetric refinement of source and target zones are evaluated. First, residential parcels are used as a binary ancillary variable prior to regular areal interpolation methods. Second, Expectation Maximization (EM) and its data-extended version leverage housing types of residential parcels as a related ancillary variable. Finally, a third refinement strategy to mitigate the overestimation effect of large residential parcels in rural areas uses road buffers and developed land cover classes. Results suggest the effectiveness of all three levels of dasymetric refinement in reducing estimation errors. They provide a first insight into the potential accuracy improvement achievable in varying geographic and demographic settings but also through the combination of different refinement strategies in parts of a study area. Such improved consistent population estimates are the basis for advanced spatio-temporal demographic research.
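The first refinement level, a binary residential mask applied before areal weighting, can be sketched as follows. The sketch assumes the source and target zones have already been intersected into pieces, each carrying its total area and its residential area; the data layout, key names and numbers are hypothetical, and the EM-based and road-buffer refinements are not shown.

```python
from collections import defaultdict


def areal_interpolation(source_pop, pieces, weight_key="area"):
    """Allocate source-zone population to target zones.

    Each piece is the intersection of one source and one target zone.
    With weight_key="area" this is plain areal weighting; with
    weight_key="residential" only residential land receives population.
    """
    # Total weight of each source zone across all of its pieces.
    totals = defaultdict(float)
    for p in pieces:
        totals[p["source"]] += p[weight_key]

    estimates = defaultdict(float)
    for p in pieces:
        if totals[p["source"]] > 0:
            share = p[weight_key] / totals[p["source"]]
            estimates[p["target"]] += source_pop[p["source"]] * share
    return dict(estimates)


if __name__ == "__main__":
    source_pop = {"A": 1000}
    pieces = [
        {"source": "A", "target": "T1", "area": 50.0, "residential": 10.0},
        {"source": "A", "target": "T2", "area": 50.0, "residential": 30.0},
    ]
    print(areal_interpolation(source_pop, pieces))
    print(areal_interpolation(source_pop, pieces, weight_key="residential"))
```

Both variants preserve the source total; the mask only changes how that total is split between target zones.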
The geography of elderly minority populations in the United States
"Because minority populations often have greater needs for health care and fewer resources to pay for it, it is important to assess the demand for services. This paper takes an initial step in that direction by focusing upon the geographic distribution of elderly minority populations in the United States. The study is carried out at several spatial scales, and it is concluded that elderly minority populations tend to be even more segregated than their non-elderly counterparts."
Reasoning cartographic knowledge in deep learning-based map generalization with explainable AI
Cartographic map generalization involves complex rules, and a full automation has still not been achieved, despite many efforts over the past few decades. Pioneering studies show that some map generalization tasks can be partially automated by deep neural networks (DNNs). However, DNNs are still used as black-box models in previous studies. We argue that integrating explainable AI (XAI) into a DL-based map generalization process can give more insights to develop and refine the DNNs by understanding what cartographic knowledge exactly is learned. Following an XAI framework for an empirical case study, visual analytics and quantitative experiments were applied to explain the importance of input features regarding the prediction of a pre-trained ResU-Net model. This experimental case study finds that the XAI-based visualization results can easily be interpreted by human experts. With the proposed XAI workflow, we further find that the DNN pays more attention to the building boundaries than the interior parts of the buildings. We thus suggest that boundary intersection over union is a better evaluation metric than commonly used intersection over union in qualifying raster-based map generalization results. Overall, this study shows the necessity and feasibility of integrating XAI as part of future DL-based map generalization development frameworks.
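The difference between the two metrics can be seen on a toy raster. The sketch below uses a simplified boundary definition (mask pixels with at least one 4-neighbour outside the mask, i.e. a one-pixel band); published boundary IoU definitions use a configurable band width, and the study's exact setup is not reproduced. Masks are represented as sets of (row, col) pixels, and all names are hypothetical.

```python
def iou(a, b):
    """Intersection over union of two pixel sets."""
    union = a | b
    return len(a & b) / len(union) if union else 1.0


def boundary(mask):
    """Pixels of the mask with at least one 4-neighbour outside it."""
    neighbours = ((1, 0), (-1, 0), (0, 1), (0, -1))
    return {
        (r, c)
        for r, c in mask
        if any((r + dr, c + dc) not in mask for dr, dc in neighbours)
    }


def boundary_iou(a, b):
    """IoU restricted to the one-pixel boundary bands of both masks."""
    return iou(boundary(a), boundary(b))


if __name__ == "__main__":
    # A 10 x 10 building footprint and a prediction shifted by one column.
    truth = {(r, c) for r in range(10) for c in range(10)}
    pred = {(r, c + 1) for r, c in truth}
    print(iou(truth, pred), boundary_iou(truth, pred))
```

For a one-column shift the region-based score stays high because the interior pixels dominate, while the boundary-based score drops sharply, which is why a boundary-focused metric is more sensitive to the outline errors that matter in building generalization.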