library(osmenrich)

Introduction

osmenrich is an R package to easily enrich geocoded data (latitude/longitude) with geographic features from OpenStreetMap (OSM). It is designed to work with the sf and osmdata packages: it builds on sf for the manipulation of simple features (i.e. real-world objects) and on osmdata for querying OpenStreetMap data (i.e. geographical data).

This vignette:

  • Explains the basic functioning of the package
  • Shows how to use the package
  • Explains the differences between the use cases of this package
  • Directs the user to the right use case for more examples

How does osmenrich work?

Often a user is interested in retrieving information about the location and the closeness of real-world objects. If the objects in a dataset have geocoded data (latitude/longitude), then this package enables the user to interact with these objects and enrich them with information about other objects around them. We call the objects in the dataset "reference points", while the objects we are interested in retrieving are "feature points".

Therefore, if a dataset contains geocoded data, with this package one can extract information about the real-world objects around each of the objects contained in the data, compute their distance/duration from those objects, and then enrich the dataset with this information. The result is a tidy sf dataset.

To do this, the package needs to connect to a server containing OpenStreetMap data and one (or more) servers containing routing engines - used to compute durations and distances.
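
The enrichment idea described above can be sketched offline with sf alone. This is only an illustration of the concept, not the package's internal implementation: the feature points below are made up, and straight-line distances stand in for what a routing engine would return.

```r
library(sf)

# Reference points: the objects already present in the dataset
reference <- st_as_sf(
  data.frame(
    person = c("Alice", "Bob"),
    lat = c(52.12, 52.13),
    lon = c(5.09, 5.08)
  ),
  coords = c("lon", "lat"),
  crs = 4326
)

# Feature points: hypothetical locations invented for this sketch
features <- st_as_sf(
  data.frame(
    lat = c(52.121, 52.119, 52.131),
    lon = c(5.091, 5.089, 5.081)
  ),
  coords = c("lon", "lat"),
  crs = 4326
)

# Distance matrix in meters: one row per reference point,
# one column per feature point
dist_m <- st_distance(reference, features)
dist_m <- matrix(as.numeric(dist_m), nrow = nrow(reference))

# Enrich the reference points with the number of features within 500 m
reference$n_features <- rowSums(dist_m <= 500)
reference
```

The result is still an sf object, with one extra column summarising the nearby feature points for each reference point.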

How to use osmenrich

osmenrich is installed from GitHub with the remotes package. Due to recent changes in GitHub’s naming of branches, please make sure you have the latest version of remotes, or at least version 2.2 (install.packages("remotes")).

Once you have done this, install osmenrich with:

remotes::install_github("sodascience/osmenrich@main")

and then load it in the usual way:

library(osmenrich)

Out of the box, osmenrich uses public remote servers to retrieve OSM data and to compute distances/durations from reference points to feature points.

Use of remote and local servers

As stated above, osmenrich makes use of an OSM server and one or more OSRM servers to retrieve OSM data (feature points) and to calculate metrics such as distances and durations. The available OSM feature points can be found by:

  1. Visiting the OSM wiki: https://wiki.openstreetmap.org/wiki/Map_features.
  2. Loading the osmdata package (library(osmdata)) and calling the functions available_features() and available_tags().

The basic data enrichment works without having to set up any of these servers locally, thanks to publicly available servers. However, setting up one or more of these servers is required for large data enrichment tasks, for tasks involving the computation of durations between reference points and feature points, and for the computation of custom distances or durations between these points (such as walking or cycling distances).

We created a GitHub repository hosting the instructions and the docker_compose.yml files needed to set up these servers.

Package use cases

To direct users to the right setup for their needs, we provide some use cases and their respective recommended setups:

  • Base Example
    • Target use-case: the user only wants to enrich the data with nearby features. The user is not interested in distances or durations as measured by OSRM servers.
    • Size of data to be requested: limited
    • Setup servers via Docker: not necessary. The connection relies on public servers.
    • Link to example: Basic Example
  • Normal Example
    • Target use-case: the user only wants to enrich the data with features for a large area. The user might be interested in distances but does not require a specific metric (i.e. car vs. foot vs. bike)
    • Size of data to be requested: theoretically unlimited
    • Setup servers via Docker: set up only the overpass (OSM) server. The OSRM connection will rely on public servers (only car distances available!)
    • Link to example: Normal Example
  • Advanced Example
    • Target use-case: the user wants to enrich adding features for a large area and/or is interested in specific metrics for distances/durations (i.e. foot or bike)
    • Size of data to be requested: theoretically unlimited
    • Setup servers via Docker: use the docker_compose.yml to set up both the overpass (OSM) server and all three OSRM servers.
    • Link to example: Advanced Example - Vignette

Examples

Base Example

Let’s enrich a spatial (sf) dataset (sf_example) with the number of waste baskets in a radius of 500 meters from each of the points in the dataset:

# Import libraries
library(tidyverse)
library(sf)
library(osmenrich)

# Create an example dataset to enrich
sf_example <-
  tribble(
    ~person,  ~lat,   ~lon,
    "Alice",  52.12,  5.09,
    "Bob",    52.13,  5.08,
  ) %>%
  sf::st_as_sf(
    coords = c("lon", "lat"),
    crs = 4326
  )

# Print it
sf_example
#> Simple feature collection with 2 features and 1 field
#> geometry type:  POINT
#> dimension:      XY
#> bbox:           xmin: 5.08 ymin: 52.12 xmax: 5.09 ymax: 52.13
#> CRS:            EPSG:4326
#> # A tibble: 2 x 2
#>   person     geometry
#> * <chr>   <POINT [°]>
#> 1 Alice  (5.09 52.12)
#> 2 Bob    (5.08 52.13)

To enrich the sf_example dataset with “waste baskets” in a 500m radius, we create a query using the enrich_osm() function. This function uses the bounding box created by the points present in the example dataset and searches for the specified key = "amenity" and value = "waste_basket". We also add a custom name for the newly created column and specify the radius (r) used in the search.

# Simple OSMEnrich query
sf_example_enriched <- sf_example %>%
  enrich_osm(
    name = "waste_baskets",
    key = "amenity",
    value = "waste_basket",
    r = 500
  )
#> Downloading data for waste_baskets... Done.
#> Downloaded 147 points, 0 lines, 0 polygons, 0 mlines, 0 mpolygons.
#> Computing distance matrix for waste_baskets...Done.

sf_example_enriched
#> Simple feature collection with 2 features and 2 fields
#> geometry type:  POINT
#> dimension:      XY
#> bbox:           xmin: 5.08 ymin: 52.12 xmax: 5.09 ymax: 52.13
#> geographic CRS: WGS 84
#> # A tibble: 2 x 3
#>   person     geometry waste_baskets
#> * <chr>   <POINT [°]>         <int>
#> 1 Alice  (5.09 52.12)            75
#> 2 Bob    (5.08 52.13)             1

The waste_baskets column now contains, for Alice and for Bob, the number of waste baskets within a 500 meter radius.
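
To see what a key/value search over the points' bounding box looks like, the sketch below builds a comparable Overpass query by hand with osmdata. It is an assumption-laden illustration: enrich_osm() may construct its query differently (for example, by extending the box by the radius), and the query is only built here, not sent to a server.

```r
library(sf)
library(osmdata)

# Recreate the example dataset so the sketch is self-contained
sf_example <- st_as_sf(
  data.frame(
    person = c("Alice", "Bob"),
    lat = c(52.12, 52.13),
    lon = c(5.09, 5.08)
  ),
  coords = c("lon", "lat"),
  crs = 4326
)

# Bounding box of the reference points: xmin, ymin, xmax, ymax
bb <- st_bbox(sf_example)

# Build (but do not run) an Overpass query for waste baskets in that box
query <- opq(bbox = as.numeric(bb))
query <- add_osm_feature(query, key = "amenity", value = "waste_basket")

# Sending the query requires a connection to an Overpass server:
# waste_baskets <- osmdata_sf(query)
```

Running osmdata_sf() on such a query returns the matching OSM objects, which play the role of the feature points in the enrichment.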

Normal Example

Normal enrichment example

Using the example dataset sf_example specified in the previous example, we continue with a more advanced enrichment example. Here, we use a number of additional arguments to refine our initial “waste_baskets” query. We add the following:

  • type: "points": we specify that we are interested only in retrieving points from OSM. In this example it makes no difference, but when querying different types of objects it can reduce the amount of data retrieved.
  • distance: "distance_by_foot": we are no longer interested in just the number of points in a certain area (given by the radius r); we now want the summary to be based on the walking distances from each point to the waste_baskets within radius r.
  • kernel: "uniform": we can specify the kernel function used to summarize the retrieved features (in this example waste_baskets). Kernels convert distance or duration vectors to single numbers, giving certain weights to certain distances. This package also supports the use of custom kernel functions.

In this example, we make use of a local instance of the OSRM server to query the walking distances (distance = "distance_by_foot"). Follow the instructions in the osmenrich Docker repository to set it up. Out of the box, this package supports querying only driving distances. If you are interested in distances or durations for other means of transportation, you will need to set up local OSRM instances.

# Specify the address of local OSRM instance
# options(osrm.server = "http://localhost:<port>/")
options(osrm.server = "http://localhost:8080/")
# You can specify also the address of the Overpass (OSM) instance
# osmdata::set_overpass_url("http://localhost:<port>/api/interpreter")
osmdata::set_overpass_url("http://localhost:8888/api/interpreter")

# Advanced OSMEnrich query
sf_example_advanced <- sf_example %>%
  enrich_osm(
    name = "waste_baskets",
    key = "amenity",
    value = "waste_basket",
    type = "points",
    distance = "distance_by_foot",
    kernel = "uniform",
    r = 100
  )

sf_example_advanced
#> Simple feature collection with 2 features and 4 fields
#> geometry type:  POINT
#> dimension:      XY
#> bbox:           xmin: 5.08 ymin: 52.12 xmax: 5.09 ymax: 52.13
#> CRS:            EPSG:4326
#> # A tibble: 2 x 5
#>   person    id   val     geometry waste_baskets
#> * <chr>  <dbl> <int>  <POINT [°]>         <int>
#> 1 Alice      1     5 (5.09 52.12)             1
#> 2 Bob        2     2 (5.08 52.13)             0
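
To make the role of a kernel more concrete, here is a minimal base R sketch. The two weighting functions (uniform_weight and parabola_weight) are hypothetical illustrations written for this vignette, not the package's own kernels, and the distances are made up. Each function turns a vector of distances into weights, which are then summed to a single number per reference point.

```r
# Hypothetical kernels: weight each distance d (in meters) given a radius r

# Every feature within the radius counts fully; features beyond it count as 0
uniform_weight <- function(d, r) {
  as.numeric(d <= r)
}

# Nearby features count more; the weight falls to 0 at the radius
parabola_weight <- function(d, r) {
  ifelse(d <= r, 1 - (d / r)^2, 0)
}

# Made-up distances from one reference point to four feature points
d <- c(20, 50, 80, 150)
r <- 100

# Summarise the distance vector into a single number
n_uniform <- sum(uniform_weight(d, r))
n_parabola <- sum(parabola_weight(d, r))

n_uniform
n_parabola
```

With a uniform weighting the summary is simply a count of the features within the radius, while a parabola-shaped weighting discounts features that lie further away.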

Advanced Example

For a more advanced example in which osmenrich is put to use with other packages, please refer to this tutorial.