osmenrich.Rmd
library(osmenrich)
osmenrich is an R package to easily enrich geocoded data (latitude/longitude) with geographic features from OpenStreetMap (OSM). It is designed to work with the sf and osmdata packages: it relies on sf for the manipulation of simple features (i.e. real-world objects) and on osmdata for querying OpenStreetMap data (i.e. geographical data).
This vignette:
How does osmenrich work?
Often a user is interested in retrieving information about the location and the closeness of real-world objects. If the objects in a dataset have geocoded data (latitude/longitude), then this package enables the user to interact with these objects and enrich them with information about other objects around them. We call the objects in the dataset "reference points" and the objects we are interested in retrieving "feature points".
Therefore, if a dataset contains geocoded data, one can use this package to extract information about the real-world objects around each object in the data, compute their distance or travel duration from those objects, and then enrich the dataset with this information. The result is a tidy sf dataset.
To do this, the package needs to connect to a server containing OpenStreetMap data and to one (or more) servers containing routing engines, which are used to compute durations and distances.
osmenrich
If you already have the remotes package, please make sure you have the latest version, or at least version 2.2, because of recent changes in GitHub's naming of branches (running install.packages("remotes") installs the latest version).
Once this is done, osmenrich can be installed from GitHub with the remotes package:
remotes::install_github("sodascience/osmenrich@main")
and then load it in the usual way:
library(osmenrich)
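The version requirement on remotes mentioned above can also be checked from R before running the install command. The snippet below is a minimal sketch and not part of osmenrich or remotes: the helper name `remotes_needs_update` is hypothetical, and the `2.2` threshold is simply the one quoted in the text.

```r
# Hypothetical helper (not provided by osmenrich or remotes): returns TRUE
# when remotes is missing or older than the minimum version.
remotes_needs_update <- function(min_version = "2.2") {
  if (!requireNamespace("remotes", quietly = TRUE)) {
    return(TRUE)
  }
  utils::packageVersion("remotes") < min_version
}

if (remotes_needs_update()) {
  message("Please run install.packages(\"remotes\") before installing osmenrich.")
}
```

The helper only reports the situation and leaves the actual installation to the user, so it is safe to run in a non-interactive session.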
Out of the box, osmenrich uses public remote servers to retrieve OSM data and to compute distances/durations from reference points to feature points.
As stated above, osmenrich makes use of an OSM server and one or more OSRM servers to retrieve OSM data (feature points) and to calculate metrics such as distances and durations. The available OSM feature points can be found by:
1. Visiting the OSM wiki: https://wiki.openstreetmap.org/wiki/Map_features.
2. Loading the osmdata package (library(osmdata)) and calling the functions available_features() and available_tags().
Basic data enrichment works without setting up any of these servers locally, thanks to publicly available servers. However, one or more local servers are required for large data enrichment tasks, for tasks that compute durations between reference points and feature points, and for tasks that compute custom distances or durations between these points (such as walking or cycling distances).
We created a GitHub repository hosting the instructions and the docker_compose.yml files needed to set up these servers.
To help users choose the right setup for their needs, we provide some use cases and their respective recommended setup:
overpass (OSM) server. The OSRM connection will rely on public servers (only car distances available!)
docker_compose.yml to set up both the overpass (OSM) and all three OSRM servers.
Let’s enrich a spatial (sf) dataset (sf_example) with the number of waste baskets within a radius of 500 meters of each point in the dataset:
# Import libraries
library(tidyverse)
library(sf)
library(osmenrich)
# Create an example dataset to enrich
sf_example <-
  tribble(
    ~person, ~lat, ~lon,
    "Alice", 52.12, 5.09,
    "Bob", 52.13, 5.08,
  ) %>%
  sf::st_as_sf(
    coords = c("lon", "lat"),
    crs = 4326
  )
# Print it
sf_example
#> Simple feature collection with 2 features and 1 field
#> geometry type: POINT
#> dimension: XY
#> bbox: xmin: 5.08 ymin: 52.12 xmax: 5.09 ymax: 52.13
#> CRS: EPSG:4326
#> # A tibble: 2 x 2
#> person geometry
#> * <chr> <POINT [°]>
#> 1 Alice (5.09 52.12)
#> 2 Bob (5.08 52.13)
To enrich the sf_example dataset with “waste baskets” in a 500m radius, we create a query using the enrich_osm() function. This function uses the bounding box created by the points present in the example dataset and searches for the specified key = "amenity" and value = "waste_basket". We also add a custom name for the newly created column and specify the radius (r) used in the search.
# Simple OSMEnrich query
sf_example_enriched <- sf_example %>%
  enrich_osm(
    name = "waste_baskets",
    key = "amenity",
    value = "waste_basket",
    r = 500
  )
#> Downloading data for waste_baskets... Done.
#> Downloaded 147 points, 0 lines, 0 polygons, 0 mlines, 0 mpolygons.
#> Computing distance matrix for waste_baskets...Done.
sf_example_enriched
#> Simple feature collection with 2 features and 2 fields
#> geometry type: POINT
#> dimension: XY
#> bbox: xmin: 5.08 ymin: 52.12 xmax: 5.09 ymax: 52.13
#> geographic CRS: WGS 84
#> # A tibble: 2 x 3
#> person geometry waste_baskets
#> * <chr> <POINT [°]> <int>
#> 1 Alice (5.09 52.12) 75
#> 2 Bob (5.08 52.13) 1
The waste_baskets column now contains, for Alice and for Bob, the number of waste baskets found within a 500 meter radius.
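To make the meaning of this count concrete, the sketch below reproduces the idea with sf alone, without contacting any server. The four feature points are invented for illustration (they are not real OSM data), the column name `n_features` is hypothetical, and `st_is_within_distance()` is used here only as a stand-in for the counting step; it is not necessarily how enrich_osm() works internally.

```r
library(sf)

# Reference points: same coordinates as sf_example
ref_points <- st_as_sf(
  data.frame(
    person = c("Alice", "Bob"),
    lon = c(5.09, 5.08),
    lat = c(52.12, 52.13)
  ),
  coords = c("lon", "lat"),
  crs = 4326
)

# Made-up feature points (assumption: NOT real waste baskets from OSM)
feature_points <- st_as_sf(
  data.frame(
    lon = c(5.091, 5.089, 5.080, 5.120),
    lat = c(52.121, 52.119, 52.131, 52.120)
  ),
  coords = c("lon", "lat"),
  crs = 4326
)

# For each reference point, list the feature points within 500 meters,
# then count them
within_r <- st_is_within_distance(ref_points, feature_points, dist = 500)
ref_points$n_features <- lengths(within_r)
ref_points
```

Each reference point gets one number, which is the same shape of result that the enriched column has in the example above.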
Using the example dataset sf_example from the previous example, we continue with a more advanced enrichment. Here, we use a number of additional arguments to refine our initial “waste_baskets” query. We add the following:
type = "points": we specify that we are interested only in retrieving points from OSM. In this example it makes no difference, but when querying different types of objects it can help reduce the amount of data retrieved.
distance = "distance_by_foot": we are no longer interested in just the number of points in a certain area (given by the radius r); we now want to use the walking distances from each point to the waste baskets within radius r.
kernel = "uniform": we can specify the kernel function used to summarize the retrieved features (in this example, waste baskets). Kernels convert distance or duration vectors to single numbers, giving a certain weight to certain distances. This package also supports custom kernel functions.
In this example, we use a local instance of the OSRM server to query the walking distances (distance = "distance_by_foot"). Follow the instructions in the osmenrich Docker repository section to set it up. Out of the box, this package supports querying only driving distances; if you are interested in distances or durations for other means of transportation, you will need to set up local OSRM instances.
# Specify the address of local OSRM instance
# options(osrm.server = "http://localhost:<port>/")
options(osrm.server = "http://localhost:8080/")
# You can specify also the address of the Overpass (OSM) instance
# osmdata::set_overpass_url("http://localhost:<port>/api/interpreter")
osmdata::set_overpass_url("http://localhost:8888/api/interpreter")
# Advanced OSMEnrich query
sf_example_advanced <- sf_example %>%
  enrich_osm(
    name = "waste_baskets",
    key = "amenity",
    value = "waste_basket",
    type = "points",
    distance = "distance_by_foot",
    kernel = "uniform",
    r = 100
  )
sf_example_advanced
#> Simple feature collection with 2 features and 4 fields
#> geometry type: POINT
#> dimension: XY
#> bbox: xmin: 5.08 ymin: 52.12 xmax: 5.09 ymax: 52.13
#> CRS: EPSG:4326
#> # A tibble: 2 x 5
#> person id val geometry waste_baskets
#> * <chr> <dbl> <int> <POINT [°]> <int>
#> 1 Alice 1 5 (5.09 52.12) 1
#> 2 Bob 2 2 (5.08 52.13) 0
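The role of a kernel, as described earlier, can be illustrated in plain R. This is a sketch under stated assumptions: the function names `uniform_weight` and `parabola_weight` and the distance values are invented for illustration, and the parabolic form `1 - (d / r)^2` is one common choice that may differ from the kernel definitions shipped with osmenrich.

```r
# Hypothetical distances (in meters) from one reference point to five
# feature points
d <- c(20, 50, 90, 120, 300)
r <- 100

# Uniform weights: every feature within r counts as 1, all others as 0
uniform_weight <- function(d, r) as.numeric(d <= r)

# Parabolic weights: nearby features count more, and the weight drops to 0 at r
parabola_weight <- function(d, r) pmax(0, 1 - (d / r)^2)

# Reduce each weight vector to a single number per reference point
n_uniform <- sum(uniform_weight(d, r))    # plain count of features within r
n_parabola <- sum(parabola_weight(d, r))  # distance-weighted total
n_uniform
n_parabola
```

With a uniform kernel the result is an integer count, which matches the integer waste_baskets column in the output above; a distance-weighted kernel would generally produce non-integer values.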
For a more advanced example in which osmenrich is put to use with other packages, please refer to this tutorial.