Subscribe to R-spatial feed
R Spatial software blogs and ideas 2020-06-17T13:01:58+00:00 Jekyll v3.8.7
Updated: 11 hours 32 min ago

Plotting and subsetting stars objects

Thu, 03/22/2018 - 01:00

Plotting and subsetting stars objects

Thu, 03/22/2018 - 01:00

RSAGA 1.0.0

Tue, 03/20/2018 - 01:00

sf 0.6-0 news

Mon, 01/08/2018 - 01:00

[view raw Rmd]

Version 0.6-0 of the sf package (an R package for handling vector geometries in R) has been released to CRAN. It contains several innovations, summarized in the NEWS file. This blog post will illustrate some of these further.

Ring directions

Consider the following two polygons:

library(sf) ## Loading required package: methods ## Linking to GEOS 3.5.1, GDAL 2.1.2, proj.4 4.9.3 p1 = rbind(c(0,0), c(1,0), c(1,1), c(0,1), c(0,0)) p2 = 0.5 * p1 + 0.25 (pol1 = st_polygon(list(p1, p2))) ## POLYGON ((0 0, 1 0, 1 1, 0 1, 0 0), (0.25 0.25, 0.75 0.25, 0.75 0.75, 0.25 0.75, 0.25 0.25)) (pol2 = st_polygon(list(p1, p2[5:1,]))) ## POLYGON ((0 0, 1 0, 1 1, 0 1, 0 0), (0.25 0.25, 0.25 0.75, 0.75 0.75, 0.75 0.25, 0.25 0.25)) opar = par(mfrow = c(1, 2), mar = rep(0, 4)) plot(pol1, col = grey(.8), rule = "winding") plot(pol2, col = grey(.8), rule = "winding")

Although the simple feature standard describes that all secondary rings indicate holes, it also specifies that outer rings should be counter clockwise and inner rings (holes) clockwise. It doesn’t specify that polygons for which the hole has the same ring direction as the outer ring are invalid - and they aren’t. But how should software deal with them? In prior sf versions, plot (and ggplot) would take the winding rule, requiring holes to have the opposite direction as outer rings. This has been changed into evenodd as default, which plots both cases with holes:

library(sf) p1 = rbind(c(0,0), c(1,0), c(1,1), c(0,1), c(0,0)) p2 = 0.5 * p1 + 0.25 (pol1 = st_polygon(list(p1, p2))) ## POLYGON ((0 0, 1 0, 1 1, 0 1, 0 0), (0.25 0.25, 0.75 0.25, 0.75 0.75, 0.25 0.75, 0.25 0.25)) (pol2 = st_polygon(list(p1, p2[5:1,]))) ## POLYGON ((0 0, 1 0, 1 1, 0 1, 0 0), (0.25 0.25, 0.25 0.75, 0.75 0.75, 0.75 0.25, 0.25 0.25)) opar = par(mfrow = c(1, 2), mar = rep(0, 4)) plot(pol1, col = grey(.8)) # rule = "evenodd" plot(pol2, col = grey(.8)) # rule = "evenodd"

In addition, st_sfc and st_read gained a parameter check_ring_dir, by default FALSE, which when TRUE will check every ring and revert them to counter clockwise for outer, and clockwise for inner (hole) rings. By default this is FALSE because it is an expensive operation for large datasets.

Higher-order geometry differences

This was reported here; two nice graphs are available from ?st_difference

set.seed(131) m = rbind(c(0,0), c(1,0), c(1,1), c(0,1), c(0,0)) p = st_polygon(list(m)) n = 100 l = vector("list", n) for (i in 1:n) l[[i]] = p + 10 * runif(2) s = st_sfc(l) d = st_difference(s) # sequential differences: s1, s2-s1, s3-s2-s1, ... i = st_intersection(s) # all intersections plot(s, col = sf.colors(categorical = TRUE, alpha = .5)) title("overlapping squares")

par(mfrow = c(1, 2), mar = c(0,0,2.4,0)) plot(d, col = sf.colors(categorical = TRUE, alpha = .5)) title("non-overlapping differences") plot(i, col = sf.colors(categorical = TRUE, alpha = .5)) title("non-overlapping intersections")

Spherical geometry

All geometric operations (area, length, intersects, intersection, union etc) provided by the GEOS library assume two-dimensional coordinates. If your data have geographic (longitude-latitude) coordinates, this is may be quite OK when your area is small and close to the equator, otherwise it is not. One way out is to project the data using a suitable projection, the other is to use spherical geometry: algorithms that compute on the sphere (or, more precisely, on the spheroid. This has a number of advantages:

  • it is easy, you have no-worries about which projection to choose
  • it is always correct

however, it comes at some computational cost.

Spherical geometry functions were formerly taken from R package geosphere; with the new sf they use package lwgeom, which interfaces liblwgeom, the library that is also used by PostGIS. (and the development of which was funded by palantir).

Liblwgeom functions st_make_valid, st_geohash, st_split, which were formerly in sf have now been moved to lwgeom. Other functions in lwgeom enable the following functions to work with geographic coordinates: st_length, st_area st_distance, st_is_within_distance, st_segmentize.

Where geosphere could only compute distances between points, st_distance now computes distances between arbitrary simple feature geometries. st_distance is clearly slower when computed on a spheroid than when computed on the sphere. For point data, faster results are obtained when we assume the Earth is a sphere:

n = 2000 df = data.frame(x = runif(n), y = runif(n)) pts = st_as_sf(df, coords = c("x", "y")) system.time(x0 <- st_distance(pts)) ## user system elapsed ## 3.456 0.008 3.479 st_crs(pts) = 4326 # spheroid system.time(x1 <- st_distance(pts)) ## user system elapsed ## 5.564 0.016 5.594 st_crs(pts) = "+proj=longlat +ellps=sphere" # sphere ## Warning: st_crs<- : replacing crs does not reproject data; use st_transform ## for that system.time(x2 <- st_distance(pts)) ## user system elapsed ## 1.220 0.024 1.246 system.time(x3 <- dist(as.matrix(df))) ## user system elapsed ## 0.012 0.000 0.010 Hausdorff and Frechet distance

For two-dimensional (flat) geometries, st_distance now has the option of computing Hausdorff distances, and (if sf was linked to GEOS 3.7.0) Frechet distances.


For two-dimensional (flat) geometries, st_snap is now available; we refer to the PostGIS documentation for examples what it does.

join to largest matching feature

This feature was reported and illustrated here:

sf::st_join with “largest=TRUE” now joins to the single largest intersecting feature: #rspatial #rstats

— Edzer Pebesma (@edzerpebesma) December 3, 2017

polygon geometries have zero length

Function st_length now returns zero for non-linear geometries including polygons. For length of polygon rings, st_cast to MULTILINESTRING first.

printing coordinates now honors digits setting

Printing of geometries, as well as st_as_text now use the default digits of R:

st_point(c(1/3, 1/6)) ## POINT (0.3333333 0.1666667) options(digits = 3) st_point(c(1/3, 1/6)) ## POINT (0.333 0.167) st_as_text(st_point(c(1/3, 1/6)), digits = 16) ## [1] "POINT (0.3333333333333333 0.1666666666666667)"

Before sf 0.6, as.character was used, which used around 16 digits.

sf 0.6-0 news

Mon, 01/08/2018 - 01:00

Spatiotemporal arrays for R - blog one

Thu, 11/23/2017 - 01:00

Tidy storm trajectories

Mon, 08/28/2017 - 02:00

Setting up large scale OSM environments for R using Osmosis and PostgreSQL with PostGIS

Fri, 07/14/2017 - 02:00

Importing OpenStreetMap (OSM) data into R can sometimes be rather difficult, especially when it comes to processing large datasets. There are some packages that aim at easy integration of OSM data, including the very versatile osmar package, that allows you to scrape data directly in R via the OSM API. Packages like osmar,rgdal or sf also offer build-in functions to read the spatial data formats that OSM data comes along with.

However, these packages reach their limits when it comes to larger datasets and running the programmes on weak machines. I want to introduce an easy way to set up an environment to process large OSM datasets in R, using the Java application Osmosis and the open-source database PostgreSQL with the PostGIS extension for spatial data.

This tutorial was created using a Asus Zenbook UX32LA with a i5-4200U CPU, 8 GB RAM and a 250 GB SSD, running on Windows

  1. The data used has a size of 1.9 GB (unzipped). Under this setting, OSM data import using osmar, rgdal and sf takes up several hours, if not days, especially if you want to continue using your system. The following steps thus show a way to set up larger spatial data environments using the PostgreSQL database scheme and how to easily import and set up this data in R.
Getting OSM data

The place to extract large OSM datasets is the file dump Planet.osm, which can be found here:

Here, we can download all available OSM data or search for extracts from our area of interest. I am interested in downloading the most recent OSM data for Greater London, which for instance is provided by Geofabrik. This archive offers OSM data for predefined layers like countries, states or urban areas. The London data that I will be using in this tutorial can be found here:

I download the file greater-london-latest.osm.pbf, conatining the complete dataset for the Greater London area. Note that this file is updated regularly.

Setting up PostgreSQL and PostGIS

We now need to download and install PostgreSQL with the PostGIS extension. A detailed explanation on how to install PostgreSQL can be found here:

Make sure you note username, password and the port you use for the installation. After PostgreSQL is installed on the system, PostGIS can be added as described here:

Now, open pgAdmin to set up your database. We can create new databases by clicking on Object – Create – Database, inserting a name of your choice, e.g. London OSM.

My new database London OSM is now in place and can be prepared for data import. We have to create two extensions to our database, using a SQL script. We navigate into the new database and open the script command line by clicking on Object – CREATE Script and execute two commands:


These extensions should now show up when openng the Extensions path in our London OSM database.

Setting up Osmosis and importing the data

The tool connecting our dataset with PostgreSQL is called Osmosis. It is a command line Java application and can be used to read, write and manipulate OSM data. The latest stable version including detailed installation information for different OS can be found here: (Note that Osmosis requires the Java Runtime Environment, which can be downloaded at

If you are using Windows, you can navigate into the Osmosis installation folder, e.g. C:\Program Files (x86)\osmosis\bin\, and open osmosis.bat. Double clicking this file opens the Osmosis command line. To keep the Osmosis window open, create a shortcut to the osmosis.bat file, open its properties and add cmd /k at the beginning of the target in the shortcut tab. The Osmosis output should look like this:

We now have to prepare our PostgreSQL database for the OSM data import (courtesy of Stackexchange user zehpunktbarron). Navigate back into pgAdmin and the OSM London database and create a new script via Object – CREATE Script. Now, execute the SQL code that you find in two of the files that you created when installing Osmosis. First execute the code from [PATH_TO_OSMOSIS]\script\pgsnapshot_schema_0.6.sql and afterwards the code from [PATH_TO_OSMOSIS]\script\pgsnapshot_schema_0.6_linestring.sql.

Now, add indices to the database to better process the data. Execute the following SQL commands in the script:

  • CREATE INDEX idx_nodes_tags ON nodes USING GIN(tags);
  • CREATE INDEX idx_ways_tags ON ways USING GIN(tags);
  • CREATE INDEX idx_relations_tags ON relations USING GIN(tags);

We have now successfully prepared our database for the OSM import. Open Osmosis and run the following command to import the previously downloaded .pbf file:

"[PATH_TO_OSMOSIS]\bin\osmosis" --read-pbf file="[PATH_TO_OSM_FILE]\greater-london-latest.osm.pbf" --write-pgsql host="localhost" database="London OSM" user="YOUR_USERNAME" password="YOUR_PASSWORD"

Note that if the .pbf file is larger, this process might take a while – also depending on the specs of your system. If the data import was successful, this should give you an output that looks like this:

Accessing PostgreSQL databases in R

Our freshly imported database is now ready to be accessed via R. Connecting to the PostgreSQL database requires the R package RPostgreSQL. First, we load the PostgreSQL driver and connect to the database using our credentials:

require(RPostgreSQL) # LOAD POSTGRESQL DRIVER driver <- dbDriver("PostgreSQL") # CREATE CONNECTION TO THE POSTGRESQL DATABASE # THE CONNECTION VARIABLE WILL BE USED FOR ALL FURTHER OPERATIONS connection <- dbConnect(driver, dbname = "London OSM", host = "localhost", port = 5432, user = "YOUR_USERNAME", password = "YOUR_PASSWORD")

We can now check, whether we have successfully established a connection to our database using a simple command:

dbExistsTable(connection, "lines") ## [1] TRUE

We have now set up the environment to load OSM data into R flawlessly. Note that queries using RPostgreSQL are written in the SQL syntax. Further information on the use of the RPostgreSQL package can be found here:

Creating spatial data frames in R

In the last step of this tutorial we will explore how to put the accessed data to work and how to properly establish the geographical reference. We first load data into the R environment, using a RPostgreSQL query. The following query creates a data.frame with all available OSM point data. We use the PostGIS command ST_AsText on the wkb_geometry column to return the Well Known Text (WKT) geometries and save it in the newly created column geom. After that, we delete the now redundant wkb_geometry column.

#LOAD POINT DATA FROM OSM DATABASE points <- dbGetQuery(connection, "SELECT * , ST_AsText(wkb_geometry) AS geom from points") points$wkb_geometry <- NULL

The points data frame contains all available OSM point data, including the several different tagging schemes, which can be further explored looking at OSMs’ map features:

head(points) ## ogc_fid osm_id name barrier highway ## 1 1 1 Prime Meridian of the World <NA> <NA> ## 2 2 99941 <NA> lift_gate <NA> ## 3 3 101831 <NA> <NA> crossing ## 4 4 101833 <NA> <NA> crossing ## 5 5 101839 <NA> <NA> traffic_signals ## 6 6 101843 <NA> <NA> traffic_signals ## ref address is_in place man_made ## 1 <NA> <NA> <NA> <NA> <NA> ## 2 <NA> <NA> <NA> <NA> <NA> ## 3 <NA> <NA> <NA> <NA> <NA> ## 4 <NA> <NA> <NA> <NA> <NA> ## 5 <NA> <NA> <NA> <NA> <NA> ## 6 <NA> <NA> <NA> <NA> <NA> ## other_tags ## 1 "historic"=>"memorial","memorial"=>"stone" ## 2 <NA> ## 3 "crossing"=>"traffic_signals","crossing_ref"=>"pelican" ## 4 "crossing"=>"island" ## 5 <NA> ## 6 <NA> ## geom ## 1 POINT(-0.0014863 51.4779481) ## 2 POINT(-0.1553793 51.5231639) ## 3 POINT(-0.1470438 51.5356116) ## 4 POINT(-0.1588224 51.5350894) ## 5 POINT(-0.1526586 51.5375096) ## 6 POINT(-0.163653 51.534922)

Now, to get the geometry working, we can transform the data frame into a spatial data frame using the sf package. Note that I have to set a coordinate reference system (CRS), in this case the WGS84 projection:

#SET COORDINATE SYSTEM require(sf) points <- st_as_sf(points, wkt="geom") %>% `st_crs<-`(4326)

We can now scrape our dataset for the data we are looking for, e.g. all bicycle parking spots (see Since sf data is stored in spatial data frames, we can easily create a subset containing our desired points - e.g. using the filter function from the dplyr package and str_detect from stringr:

require(dplyr) require(stringr) #EXTRACTING ALL POINTS TAGGED 'BUS_STOP' bikepark <- points %>% filter(str_detect(other_tags, "bicycle_parking"))

Additionally to all bike parking spots, we also want to include all explicitly marked bicycle routes, as found in the lines data. These can be extracted from OSM relation data via the cycleway tag:

We can contrast cycleways from the regular road network by also selecting the most common road types (see Note that after our final PostgreSQL query, we close the connection using the dbDisconnect command in order to not overload the driver.

#LOAD RELATION DATA lines <- dbGetQuery(connection, "SELECT * , ST_AsText(wkb_geometry) AS geom from lines") lines$wkb_geometry <- NULL dbDisconnect(connection) ## [1] TRUE #SET CRS lines <- st_as_sf(lines, wkt="geom") %>% `st_crs<-`(4326) #SUBSET CYCLEWAYS cycleways <- lines %>% filter(highway=="cycleway") #SUBSET OTHER STREETS streets <- lines %>% filter(highway=="motorway" | highway=="trunk" | highway=="primary" | highway=="secondary" | highway=="tertiary")

Having created subsets for bicycle parking spots, cycleways and regular roads. We finally plot our data using ggplot2 and the geom_sf function:

require(ggplot2) #PLOTTING ALL BUS STOPS ggplot(bikepark) + geom_sf(data=streets,aes(colour="lightgrey")) + #Requires development version of ggplot2: devtools::install_github("tidyverse/ggplot2") geom_sf(data=cycleways,aes(colour="turquoise")) + geom_sf(data=bikepark,aes(colour="turquoise4"),shape=".") + coord_sf(crs = st_crs(bikepark)) + ggtitle("Biking infrastructure (parking + cycleways) in London") + scale_colour_manual("",values = c("lightgrey","turquoise","turquoise4"),labels=c("Other Roads","Cycleways","Bike Parking")) + theme_void() + theme(legend.position="bottom")

UseR! 2017 Spatial Tutorials

Thu, 06/29/2017 - 02:00

This year’s UseR! (next week Tuesday!!) will have two spatial tutorials:

1. Geospatial visualization using R (morning)

Bashkar Karambelkar posted earlier that attendees should download his docker image by Jul 1, by

docker pull bhaskarvk/rgeodataviz 2. Spatial data in R: new directions (afternoon)

My tutorial deals mostly with sf, the material (html, R markdown) is found at here:

I’m looking forward to it!

Related posts:

UseR! 2016 tutorial

Spatial indexes coming to sf

Thu, 06/22/2017 - 02:00

Spatial indexes give you fast results on spatial queries, such as finding whether pairs of geometries intersect or touch, or finding their intersection. They reduce the time to get results from quadratic in the number of geometries to linear in the number of geometries. A recent commit brings spatial indexes to sf for the binary logical predicates (intersects, touches, crosses, within, contains, contains_properly, overlaps, covers, covered_by), as well as the binary predicates that yield geometries (intersection, union, difference, sym_difference).

The spatial join function st_join using a logical predicate to join features, and aggregate or summarise using union to union aggregated feature geometries, are also affected by this speedup.


There have been attempts to use spatial planar indices, including enhancement issue sfr:76. In rgeos, GEOS STRtrees were used in rgeos/src/rgeos_poly2nb.c, which is mirrored in a modern Rcpp setting sf/src/geos.cpp, around lines 276 and 551. The STRtree is constructed by building envelopes (bounding boxes) of input entities, which are then queried for intersection with envelopes of another set of entities (in rgeos, R functions gUnarySTRtreeQuery and gBinarySTRtreeQuery). The use case was to find neighbours of all the about 90,000 US Census entities in Los Angeles, via spdep::poly2nb(), which received an argument to enter the candidate neighbours found by Unary querying the STRtree of entities by the same entities.


A simple benchmark shows the obvious: st_intersects without spatial index behaves quadratic in the number of geometries (black line), and is much faster for the case where a spatial index is created, stronger so for larger number of polygons:

The polygon datasets used are simple checker boards with square polygons (showing a nice Moiré pattern):

The black small square polygons are essentially matched to the red ones; the number of polygons along the x axis is the number of a single geometry set (black).

To show that the behaviour of intersects and intersection is indeed linear in the number of polygons, we show runtimes for both, as a function of the number of polygons (where intersection was divided by 10 for scaling purposes):


Spatial indexes are available in the GEOS library used by sf, through the functions starting with STRtree. The algorithm implements a Sort-Tile-Recursive R-tree, according to the JTS documentation described in P. Rigaux, Michel Scholl and Agnes Voisard. Spatial Databases With Application To GIS. Morgan Kaufmann, San Francisco, 2002.

The sf implementation (some commits to follow this one) excludes some binary operations. st_distance, st_relate, and st_relate_pattern, as these all need to go through all combinations, rather than a subset found by checking for overlapping bounding boxes. st_equals_exact and st_equals are excluded because they do not have an implementation for prepared geometries. st_disjoint could benefit from the search tree, but needs a dedicated own implementation.

On which argument is an index built?

The R-tree is built on the first argument (x), and used to match all geometries over the second argument (y) of binary functions. This could give runtime differences, but for instance for the dataset that triggered this development in sfr:394, we see hardly any difference:

library(sf) # Linking to GEOS 3.5.1, GDAL 2.1.3, proj.4 4.9.2, lwgeom 2.3.2 r15302 load("test_intersection.Rdata") nrow(test) # [1] 16398 nrow(hsg2) # [1] 6869 system.time(int1 <- st_intersection(test, hsg2)) # user system elapsed # 105.712 0.040 105.758 system.time(int2 <- st_intersection(hsg2, test)) # user system elapsed # 107.756 0.060 107.822 # Warning messages: # 1: attribute variables are assumed to be spatially constant throughout all geometries # 2: attribute variables are assumed to be spatially constant throughout all geometries

The resulting feature sets int1 and int2 are identical, only the order of the features (records) and of the attribute columns (variables) differs. (Runtime without index is 35 minutes, 20 times as long.)

Is the spatial index always built?

In the current implemenation it is always built of logical predicates when argument prepare = TRUE, which means by default. This made it easier to run benchmarks, and I strongly doubt anyone ever sets prepare = FALSE. This may change, to have them always built.

It would be nice to also have them on st_relate and st_relate_pattern, e.g. for rook or queen neighborhood selections (sfr:234), but this still requires some work, since two non-intersecting geometries have a predictable but not a constant relationship.

What about prepared geometries?

Prepared geometries in GEOS are essentially indexes over single geometries and not over sets of geometries; they speed things up in particular when single geometries are very complex, and only for a single geometry to single geometry comparison. The spatial indexes are indexes over collections of geometries; they make a cheap preselection based on bounding boxes before the expensive pairwise comparison takes place.

Script used

The followinig script was used to create the benchmark plots.

library(sf) sizes = c(10, 20, 50, 100, 160, 200) res = matrix(NA, length(sizes), 4) for (i in seq_along(sizes)) { g1 = st_make_grid(st_polygon(list(rbind(c(0,0),c(0,1),c(1,1),c(0,1),c(0,0)))), n = sizes[i]) * sizes[i] g2 = g1 + c(.5,.5) res[i, 1] = system.time(i1 <- st_intersects(g1, g2))[1] res[i, 2] = system.time(i2 <- st_intersects(g1, g2, prepare = FALSE))[1] res[i, 3] = system.time(i1 <- st_intersection(g1, g2))[1] res[i, 4] = identical(i1, i2) } plot(sizes^2, res[,2], type = 'b', ylab = 'time [s]', xlab = '# of polygons') lines(sizes^2, res[,1], type = 'b', col = 'red') legend("topleft", lty = c(1,1), col = c(1,2), legend = c("st_intersects without index", "st_intersects with spatial index")) plot(sizes^2, res[,3]/10, type = 'b', ylab = 'time [s]', xlab = '# of polygons') lines(sizes^2, res[,1], type = 'b', col = 'red') legend("topleft", lty = c(1,1), col = c(1,2), legend = c("st_intersection * 0.1", "st_intersects"))