## Error message

• Deprecated function: ini_set(): Use of mbstring.http_input is deprecated in include_once() (line 654 of /home/spatiala/public_html/geostat-course.org/sites/default/settings.php).
• Deprecated function: ini_set(): Use of mbstring.http_output is deprecated in include_once() (line 655 of /home/spatiala/public_html/geostat-course.org/sites/default/settings.php).
R Spatial software blogs and ideas 2020-06-17T13:01:58+00:00 Jekyll v3.8.7 https://avatars1.githubusercontent.com/u/25086656
Updated: 9 hours 26 min ago

### In r-spatial, the Earth is no longer flat

Wed, 06/17/2020 - 02:00

Summary: Package sf is undergoing a major change: all operations on geographical coordinates (degrees longitude, latitude) will use the s2 package, which interfaces the S2 geometry library for spherical geometry.

sf up to 0.9-x uses mostly a flat Earth model

Suppose you work with package sf, have loaded some data

library(sf) ## Linking to GEOS 3.8.0, GDAL 3.0.4, PROJ 7.0.0 nc = read_sf(system.file("gpkg/nc.gpkg", package="sf"))

then chances are large that after running e.g.

i = st_intersects(nc)

you’ve run into the following message

## although coordinates are longitude/latitude, st_intersects assumes that they are planar

It indicates that

• your data are in geographical coordinates, degrees longitude/latitude indicating a position on the globe, and
• you are carrying out an operation that assumes these data lie in a flat plane, where one degree longitude equals one degree latitude, irrespective where you are on the world

This means that your data are assumed implicitly to be projected to the equirectangular projection, looking like this:

library(rnaturalearth) ne <- countries110 %>% st_as_sf() %>% st_geometry() plot(ne, axes = TRUE)

A number of operations in sf were actually carried out using ellipsoidal geometries, these included

• computing the area of polygons, which used lwgeom::st_geod_area
• computing length of lines, which used lwgeom::st_geod_length
• computing distance between features, which used lwgeom::st_geod_distance, and
• segmentizing lines along great circles, which used lwgeom::st_geod_segmentize

but for all other operations, despite the degree symbols, everything happens just as if they were in the equivalent equirectangular projection:

st_transform(ne, "+proj=eqc") %>% plot(axes = TRUE)

As an example, consider the polygon from POLYGON((-150 -65, 0 -62, 120 -78, -150 -65)), drawn as a line in this projection

corresponds to this polygon when drawn in a stereographic polar projection:

and does not include the South Pole:

pol = st_as_sfc("POLYGON((-150 -65, 0 -62, 120 -78, -150 -65))") pole = st_as_sfc("POINT(0 -90)") st_contains(pol, pole) ## Sparse geometry binary predicate list of length 1, where the predicate was contains' ## 1: (empty)

(with sf 0.9-x, setting crs to 4326 will have no effect other than printing the familiar warning message)

The functions in sf up to 0.9-x that assume a flat Earth include:

• all binary predicates (intersects, touches, covers, contains, equals, equals_exact, relate, …)
• all geometry generating operators (centroid, intersection, union, difference, sym_difference)
• st_sample
• nearest functions: nearest_point, nearest_feature
• functions or methods using these: st_filter, st_join, agreggate, [, …

In addition to this, a number of ugly “hacks” needed to make things work include:

• polygons and lines crossing the antimeridian (longitude +/- 180) had to be cut in two, using sf::st_wrap_dateline
• polygons containing e.g. the South Pole needed to pass through (-180,-90) and (180,90)
• raster data with longitude ranging from 0 to 360 could not be properly combined with -180,180 data
• for orthographic projections (Earth seen from space, or “spinning globes”), selecting countries on the “visible” half of the Earth was very ugly
sf 1.0: Goodbye flat Earth, welcome S2 spherical geometry

From version 1.0 on, wherever possible, when handling geographic coordinates package sf uses the S2 geometry library for spatial operations. This library was written by Google, and empowers critical parts of Google Earth, Google Maps, Google Earth Engine and Google Bigquery GIS. The s2 R package is a complete rewrite of the original s2 package by Ege Rubak (still on CRAN), mostly done by Dewey Dunnington and Edzer Pebesma.

S2 geometry assumes that straight lines between points on the globe are not formed by straight lines in the equirectangular projection, but by great circles: the shortest path over the sphere. For the polygon above this would give:

where the light green polygon is the “straight line polygon” in equirectangular. Based on the great circle lines, this polygon now contains the South Pole:

pol = st_as_sfc("POLYGON((-150 -65, 0 -62, 120 -78, -150 -65))", crs = 4326) pole = st_as_sfc("POINT(0 -90)", crs = 4326) st_contains(pol, pole) ## Sparse geometry binary predicate list of length 1, where the predicate was contains' ## 1: 1 Where did the ellipsoid go?

Although we know that an ellipsoid better approximates the Earths’ shape, computations done with s2 are all on a spere. The ellipsoidal functions in package lwgeom (lwgeom::st_geod_area, lwgeom::st_geod_length, lwgeom::st_geod_distance, and lwgeom::st_geod_segmentize) can, for now, still be called when setting argument use_lwgeom=TRUE to sf::st_area() etc., but in order to reduce the complexity of maintaining dependencies of package sf this will most likely be deprecated, in which case users will have to call these functions directly when needed.

The difference between ellipsoidal and spherical computations is roughly up to 0.5%; here, for areas it is

units::set_units(1) - mean(st_area(nc) / st_area(nc, use_lwgeom = TRUE)) # difference to ellipsoidal ## 6.446441e-05 [1]

In calls to s2 measures, the radius of the Earth can be specified:

st_area(nc[1,]) # default radius: 6371010 m ## 1137107793 [m^2] st_area(nc[1,], radius = units::set_units(1, m)) # unit sphere ## 2.801464e-05 [m^2] Where did my equirectangular logic go?

You can always get back the old behaviour by projecting your geographic coordinates to the equirectangular projection, and working from there, e.g. by

nc = st_transform(nc, "+proj=eqc") How to test this?

The new s2 package can be installed by

remotes::install_github("r-spatial/s2")

For sf support, as shown above, you now need to install the s2 branch:

remotes::install_github("r-spatial/sf", ref = "s2")

What is next?

This is all part of the R-global R Consortium ISC project, the proposal of which can be found here. Actually using the S2 library is new to us, and we still need to learn much more about

• how to use the S2 library effectively
• what is faster, what is slower than e.g. GEOS, and why
• how to effectively use the S2 indexing structures

In follow-up blog posts we will elaborate on

• differences between geometrical operations in the flat plane and on the sphere
• special features of the S2 library: S2Cells, S2Caps, the full polygon, the virtue of half-open polygons, …
• how all this can be beneficial for spatial analysis, and e.g. with handling raster data

### R spatial follows GDAL and PROJ development

Tue, 03/17/2020 - 01:00
GDAL and PROJ

GDAL and PROJ (formerly proj.4) are two libraries that form the basis, if not foundations, for most open source geospatial software, including most R packages (sf, sp, rgdal, and all their dependencies). The dependency for package sf is for instance pictured here:

Briefly:

• PROJ provides methods for coordinate representation, conversion (projection) and transformation, and
• GDAL allows reading and writing of spatial raster and vector data in a standardised form, and provides a high-level interface to PROJ for these data structures, including the representation of coordinate reference systems (CRS)
gdalbarn

Motivated by the need for higher precision handling of coordinate transformations and the wish to support for a better description of coordinate reference systems (WKT2), a succesful fundraising helped the implementation of a large number of changes in GDAL and PROJ, most notably:

• PROJ changes from (mostly) a projection library into a full geodetic library, taking care of different representations of the shape of the Earth (datums)
• PROJ now has the ability to choose between different transformation paths (pipelines), and can report the precision obtained by each
• rather than distributing datum transformation grids to local users, PROJ (7.0.0 and higher) offers access to an on-line distribution network (CDN) of free transformation grids, thereby allowing for local caching of portions of grids
• PROJ respects authorities (such as EPSG) for determining whether coordinate pairs refer to longitude-latitude (such as 3857), or latitude-longitude (such as 4326)
• GDAL offers the ability to handle coordinate pairs authority-compliant (lat-long for 4326), or “traditional” GIS-compliant (long-lat for 4326)
• use of so-called PROJ4-strings (like +proj=longlat +datum=WGS84) are discouraged, they no longer offer sufficient description of coordinate reference systems; use of +init=epsg:XXXX leads to warnings
• PROJ offers access to a large number of vertical reference systems and reference systems of authorities different from EPSG
crs objects in sf

Pre-0.9 versions of sf used crs (coordinate reference system) objects represented as lists with two components, epsg (possibly set as NA) and proj4string:

library(sf) # Linking to GEOS 3.8.0, GDAL 3.0.2, PROJ 6.2.1 st_crs(4326) # Coordinate Reference System: # EPSG: 4326 # proj4string: "+proj=longlat +datum=WGS84 +no_defs"

now, with sf >= 0.9, crs objects are lists with two components, input and wkt:

library(sf) ## Linking to GEOS 3.8.0, GDAL 3.0.2, PROJ 6.2.1 (x = st_crs(4326)) ## Coordinate Reference System: ## User input: EPSG:4326 ## wkt: ## GEOGCRS["WGS 84", ## DATUM["World Geodetic System 1984", ## ELLIPSOID["WGS 84",6378137,298.257223563, ## LENGTHUNIT["metre",1]]], ## PRIMEM["Greenwich",0, ## ANGLEUNIT["degree",0.0174532925199433]], ## CS[ellipsoidal,2], ## AXIS["geodetic latitude (Lat)",north, ## ORDER[1], ## ANGLEUNIT["degree",0.0174532925199433]], ## AXIS["geodetic longitude (Lon)",east, ## ORDER[2], ## ANGLEUNIT["degree",0.0174532925199433]], ## USAGE[ ## SCOPE["unknown"], ## AREA["World"], ## BBOX[-90,-180,90,180]], ## ID["EPSG",4326]]

where a $method allows for retrieving the epsg and proj4string values: x$epsg ## [1] 4326 x$proj4string ## [1] "+proj=longlat +datum=WGS84 +no_defs" but this means that packages that hard-code for instance x[["proj4string"]] ## NULL now fail to get the result wanted; NULL is not a value that would have occurred in legacy code. Regretably, assignment to a crs object component still works, as the objects are lists, so not all downstream legacy code will fail x$proj4string <- "+proj=longlat +ellps=intl" x$proj4string ## Warning in $.crs(x, proj4string): old-style crs object found: please update ## code ## [1] "+proj=longlat +ellps=intl +no_defs"

Package maintainers and authors of production scripts will need to review their use of crs objects.

Many external data sources provide a WKT CRS directly and as such do not have an “input” field. In such cases, the input field is filled with the CRS name, which is a user-readable representation

st = stars::read_stars(system.file("tif/L7_ETMs.tif", package = "stars")) st_crs(st)$input ## [1] "UTM Zone 25, Southern Hemisphere" but this representation can not be used as input to a CRS: st_crs(st_crs(st)$input) ## Error in st_crs.character(st_crs(st)$input): invalid crs: UTM Zone 25, Southern Hemisphere however wkt fields obviously can be used as input: st_crs(st_crs(st)$wkt) == st_crs(st) ## [1] TRUE CRS objects in sp

When equiped with a new ( >= 1.5.6) rgdal version, sp’s CRS objects carry a comment field with the WKT representation of a CRS:

# install.packages("rgdal", repos="http://R-Forge.R-project.org") library(sp) (x = CRS("+init=epsg:4326")) # or better: CRS(SRS_string='EPSG:4326') ## CRS arguments: +proj=longlat +datum=WGS84 +no_defs cat(comment(x), "\n") ## GEOGCRS["WGS 84", ## DATUM["World Geodetic System 1984", ## ELLIPSOID["WGS 84",6378137,298.257223563, ## LENGTHUNIT["metre",1]], ## ID["EPSG",6326]], ## PRIMEM["Greenwich",0, ## ANGLEUNIT["degree",0.0174532925199433], ## ID["EPSG",8901]], ## CS[ellipsoidal,2], ## AXIS["longitude",east, ## ORDER[1], ## ANGLEUNIT["degree",0.0174532925199433, ## ID["EPSG",9122]]], ## AXIS["latitude",north, ## ORDER[2], ## ANGLEUNIT["degree",0.0174532925199433, ## ID["EPSG",9122]]], ## USAGE[ ## SCOPE["unknown"], ## AREA["World"], ## BBOX[-90,-180,90,180]]]

and it is this WKT representation that is used to communicate with GDAL and PROJ when using packages rgdal or sf. At present, rgdal generates many warnings about discarded PROJ string keys, intended to alert package maintainers and script authors to the need to review code. It is particularly egregious to assign to the CRS object projargs slot directly, and this is unfortunately seem in much code in packages.

Coercion from CRS objects to crs and back

Because workflows often need to combine packages using sp and sf representations, coercion methods from CRS to crs have been updated to use the WKT information; from sp to sf one can use

(x2 <- st_crs(x)) ## Coordinate Reference System: ## User input: WGS 84 ## wkt: ## GEOGCRS["WGS 84", ## DATUM["World Geodetic System 1984", ## ELLIPSOID["WGS 84",6378137,298.257223563, ## LENGTHUNIT["metre",1]], ## ID["EPSG",6326]], ## PRIMEM["Greenwich",0, ## ANGLEUNIT["degree",0.0174532925199433], ## ID["EPSG",8901]], ## CS[ellipsoidal,2], ## AXIS["longitude",east, ## ORDER[1], ## ANGLEUNIT["degree",0.0174532925199433, ## ID["EPSG",9122]]], ## AXIS["latitude",north, ## ORDER[2], ## ANGLEUNIT["degree",0.0174532925199433, ## ID["EPSG",9122]]], ## USAGE[ ## SCOPE["unknown"], ## AREA["World"], ## BBOX[-90,-180,90,180]]]

The sp CRS constructor has been provided with an additional argument SRS_string= which accepts WKT, among other representations

(x3 <- CRS(SRS_string = x2$wkt)) ## CRS arguments: +proj=longlat +datum=WGS84 +no_defs but also (x4 <- as(x2, "CRS")) ## CRS arguments: +proj=longlat +datum=WGS84 +no_defs uses the WKT information when present. all.equal(x, x3) ## [1] TRUE all.equal(x, x4) ## [1] TRUE Axis order R-spatial packages have, for the past 25 years, pretty much assumed that two-dimensional data are XY-ordered, or longitude-latitude. Geodesists, on the other hand, typically use $$(\phi,\lambda)$$, or latitude-longitude, as coordinate pairs; the PROJ logo is now PR$$\phi$$J. If we use geocentric coordinates, there is no logical ordering. Axis direction may also vary; the y-axis index of images typically increases when going south. As pointed out in sf/#1033, there are powers out there that will bring us spatial data with (latitude,longitude) as (X,Y) coordinates. Even stronger, officially, EPSG:4326 has axis order latitude, longitude (see WKT description above). Package sf by default uses a switch in GDAL that brings everything in the old, longitude-latitude order, but data may come in in another ordering. This can now be controlled (to some extent), as st_axis_order can be used to query, and set whether axis ordering is “GIS style” (longitude,latitude; non-authority compliant) or “authority compliant” (often: latitude,longitude): pt = st_sfc(st_point(c(0, 60)), crs = 4326) st_axis_order() # query default: FALSE means interpret pt as (longitude latitude) ## [1] FALSE st_transform(pt, 3857)[[1]] ## POINT (0 8399738) (old_value = st_axis_order(TRUE)) ## [1] FALSE # now interpret pt as (latitude longitude), as EPSG:4326 prescribes: st_axis_order() # query current value ## [1] TRUE st_transform(pt, 3857)[[1]] ## POINT (6679169 0) st_axis_order(old_value) # set back to old value sf::plot is sensitive to this and will swap axis if needed, but for instance ggplot2::geom_sf is not yet aware of this. Workflows using sp/rgdal should expect “GIS style” axis order to be preserved rgdal::get_enforce_xy() ## [1] TRUE pt_sp <- as(pt, "Spatial") coordinates(pt_sp) ## coords.x1 coords.x2 ## [1,] 0 60 coordinates(spTransform(pt_sp, CRS(SRS_string="EPSG:3857"))) ## Warning in showSRID(SRS_string, format = "PROJ", multiline = "NO"): Discarded ## ellps WGS 84 in CRS definition: +proj=merc +a=6378137 +b=6378137 +lat_ts=0 ## +lon_0=0 +x_0=0 +y_0=0 +k=1 +units=m +nadgrids=@null +wktext +no_defs ## Warning in showSRID(SRS_string, format = "PROJ", multiline = "NO"): Discarded ## datum WGS_1984 in CRS definition ## coords.x1 coords.x2 ## [1,] 0 8399738 Further reading ### Spatial networks in R with sf and tidygraph Thu, 09/26/2019 - 02:00 Spatial networks in R with sf and tidygraph Lucas van der Meer, Robin Lovelace & Lorena Abad September 26, 2019 Introduction Street networks, shipping routes, telecommunication lines, river bassins. All examples of spatial networks: organized systems of nodes and edges embedded in space. For most of them, these nodes and edges can be associated with geographical coordinates. That is, the nodes are geographical points, and the edges geographical lines. Such spatial networks can be analyzed using graph theory. Not for nothing, Leonhard Eulers famous work on the Seven Bridges of Köningsberg, which laid the foundations of graph theory and network analysis, was in essence a spatial problem. In R, there are advanced, modern tools for both the analysis of spatial data and networks. Furthermore, several packages have been developed that cover (parts of) spatial network analysis. As stated in this github issue and this tweet, concensus on how best to represent and analyse spatial networks has proved elusive. This blogpost demonstrates an approach to spatial networks that starts with a set of geographic lines, and leads to an object ready to be used for network analysis. Along the way, we will see that there are still steps that need to be taken before the process of analyzing spatial networks in R is user friendly and efficient. Existing R packages for spatial networks Although R was originally designed as a language for statistical computing, an active ‘R-spatial’ ecosystem has evolved. Powerful and high performance packages for spatial data analysis have been developed, thanks largely to interfaces to mature C/C++ libraries such as GDAL, GEOS and PROJ, notably in the package sf (see section 1.5 of Geocomputation with R for a brief history). Likewise, a number of packages for graph representation and analysis have been developed, notably tidygraph, which is based on igraph. Both sf and tidygraph support the tibble class and the broader ‘tidy’ approach to data science, which involves data processing pipelines, type stability and a convention of representing everything as a data frame (well a tibble, which is a data frame with user friendly default settings). In sf, this means storing spatial vector data as objects of class sf, which are essentially the same as a regular data frame (or tibble), but with an additional ‘sticky’ list column containing a geometry for each feature (row), and attributes such as bounding box and CRS. Tidygraph stores networks in objects of class tbl_graph. A tbl_graph is an igraph object, but enables the user to manipulate both the edges and nodes elements as if they were data frames also. Both sf and tidygraph are relatively new packages (first released on CRAN in 2016 and 2017, respectively). It is unsurprising, therefore, that they have yet to be combined to allow a hybrid, tibble-based representation of spatial networks. Nevertheless, a number of approaches have been developed for representing spatial networks, and some of these are in packages that have been published on CRAN. stplanr, for instance, contains the SpatialLinesNetwork class, which works with both the sp (a package for spatial data analysis launched in 2005) and sf packages. dodgr is a more recent package that provides analytical tools for street networks, with a focus on directed graphs (that can have direction-dependent weights, e.g. representing a one-way street). Other packages seeking to implement spatial networks in R include spnetwork, a package that defined a class system combining sp and igraph, and shp2graph, which provides tools to switch between sp and igraph objects. Each package has its merits that deserve to be explored in more detail (possibly in a subsequent blog post). The remainder of this post outlines an approach that combines sf and igraph objects in a tidygraph object. Set-up The following code chunk will install the packages used in this post: # We'll use remotes to install packages, install it if needs be: if(!"remotes" %in% installed.packages()) { install.packages("remotes") } cran_pkgs = c( "sf", "tidygraph", "igraph", "osmdata", "dplyr", "tibble", "ggplot2", "units", "tmap", "rgrass7", "link2GI", "nabor" ) remotes::install_cran(cran_pkgs) library(sf) library(tidygraph) library(igraph) library(dplyr) library(tibble) library(ggplot2) library(units) library(tmap) library(osmdata) library(rgrass7) library(link2GI) library(nabor) Getting the data As an example, we use the street network of the city center of Münster, Germany. We will get the data from OpenStreetMap. Packages like dodgr have optimized their code for such data, however considering that we want to showcase this workflow for any source of data, we will generate an object of class sf containing only LINESTRING geometries. However, streets that form loops, are returned by osmdata as polygons, rather than lines. These, we will convert to lines, using the osm_poly2line function. One additional variable, the type of street, is added to show that the same steps can be used for sf objects that contain any number of additional variables. muenster <- opq(bbox = c(7.61, 51.954, 7.636, 51.968)) %>% add_osm_feature(key = 'highway') %>% osmdata_sf() %>% osm_poly2line() muenster_center <- muenster$osm_lines %>% select(highway) muenster_center ## Simple feature collection with 2233 features and 1 field ## geometry type: LINESTRING ## dimension: XY ## bbox: xmin: 7.603762 ymin: 51.94823 xmax: 7.641418 ymax: 51.97241 ## epsg (SRID): 4326 ## proj4string: +proj=longlat +datum=WGS84 +no_defs ## First 10 features: ## highway geometry ## 4064462 primary LINESTRING (7.624554 51.955... ## 4064463 primary LINESTRING (7.626498 51.956... ## 4064467 residential LINESTRING (7.630898 51.955... ## 4064474 primary LINESTRING (7.61972 51.9554... ## 4064476 primary LINESTRING (7.619844 51.954... ## 4064482 tertiary LINESTRING (7.616395 51.957... ## 4064485 service LINESTRING (7.63275 51.9603... ## 4984982 secondary LINESTRING (7.614156 51.967... ## 4985138 cycleway LINESTRING (7.61525 51.9673... ## 4985140 residential LINESTRING (7.616774 51.968... ggplot(data = muenster_center) + geom_sf()

From sf to tbl_graph: a step wise approach Step 1: Clean the network

To perform network analysis, we need a network with a clean topology. In theory, the best way to clean up the network topology is by manual editing, but this can be very labour intensive and time consuming, mainly for large networks. The v.clean toolset from the GRASS GIS software provides automated functionalities for this task, and is therefore a popular instrument within the field of spatial network analysis. As far as we know, there is no R equivalent for this toolset, but fortunately, the rgrass7 and link2GI packages enable us to easily ‘bridge’ to GRASS GIS. Obviously, this requires to have GRASS GIS installed on your computer. For an in depth description of combining R with open source GIS software, see Chapter 9 of Geocomputation with R. Take into account that the linking process may take up some time, especially on Windows operating systems. Also, note that there have been some large changes recently to the rgrass7 package, which enabled a better integration with sf. However, it also means that the code below will not work when using an older version of rgrass7, so make sure to update if needed.

Here, we will clean the network topology by breaking lines at intersections and also breaking lines that form a collapsed loop. This will be followed by a removal of duplicated geometry features. Once done, we will read the data back into R, and convert again into an sf object with LINESTRING geometry. For this we use rgrass7, which requires some set-up code that can be seen in the source code of this post.

# Add data to GRASS spatial database writeVECT( SDF = muenster_center, vname = 'muenster_center', v.in.ogr_flags = 'overwrite' ) # Execute the v.clean tool execGRASS("g.proj", flags = c("c", "quiet"), proj4 = proj4) execGRASS( cmd = 'v.clean', input = 'muenster_center', output = 'muenster_cleaned', tool = 'break', flags = c('overwrite', 'c') ) # Read back into R use_sf() muenster_center <- readVECT('muenster_cleaned') %>% rename(geometry = geom) %>% select(-cat) muenster_center ## Simple feature collection with 4667 features and 1 field ## geometry type: LINESTRING ## dimension: XY ## bbox: xmin: 7.603762 ymin: 51.94823 xmax: 7.641418 ymax: 51.97241 ## epsg (SRID): 4326 ## proj4string: +proj=longlat +datum=WGS84 +no_defs ## First 10 features: ## highway geometry ## 1 service LINESTRING (7.63275 51.9603... ## 2 secondary LINESTRING (7.614156 51.967... ## 3 cycleway LINESTRING (7.61525 51.9673... ## 4 footway LINESTRING (7.629304 51.967... ## 5 steps LINESTRING (7.627696 51.965... ## 6 service LINESTRING (7.633612 51.965... ## 7 residential LINESTRING (7.630564 51.957... ## 8 service LINESTRING (7.613545 51.960... ## 9 cycleway LINESTRING (7.619781 51.957... ## 10 residential LINESTRING (7.62373 51.9643... Step 2: Give each edge a unique index

The edges of the network, are simply the linestrings in the data. Each of them gets a unique index, which can later be related to their start and end node.

edges <- muenster_center %>% mutate(edgeID = c(1:n())) edges ## Simple feature collection with 4667 features and 2 fields ## geometry type: LINESTRING ## dimension: XY ## bbox: xmin: 7.603762 ymin: 51.94823 xmax: 7.641418 ymax: 51.97241 ## epsg (SRID): 4326 ## proj4string: +proj=longlat +datum=WGS84 +no_defs ## First 10 features: ## highway geometry edgeID ## 1 service LINESTRING (7.63275 51.9603... 1 ## 2 secondary LINESTRING (7.614156 51.967... 2 ## 3 cycleway LINESTRING (7.61525 51.9673... 3 ## 4 footway LINESTRING (7.629304 51.967... 4 ## 5 steps LINESTRING (7.627696 51.965... 5 ## 6 service LINESTRING (7.633612 51.965... 6 ## 7 residential LINESTRING (7.630564 51.957... 7 ## 8 service LINESTRING (7.613545 51.960... 8 ## 9 cycleway LINESTRING (7.619781 51.957... 9 ## 10 residential LINESTRING (7.62373 51.9643... 10 Step 3: Create nodes at the start and end point of each edge

The nodes of the network, are the start and end points of the edges. The locations of these points can be derived by using the st_coordinates function in sf. When given a set of linestrings, this function breaks down each of them into the points they are built up from. It returns a matrix with the X and Y coordinates of those points, and additionally an integer indicator L1 specifying to which line a point belongs. These integer indicators correspond to the edge indices defined in step 1. That is, if we convert the matrix into a data.frame or tibble, group the features by the edge index, and only keep the first and last feature of each group, we have the start and end points of the linestrings.

nodes <- edges %>% st_coordinates() %>% as_tibble() %>% rename(edgeID = L1) %>% group_by(edgeID) %>% slice(c(1, n())) %>% ungroup() %>% mutate(start_end = rep(c('start', 'end'), times = n()/2)) nodes ## # A tibble: 9,334 x 4 ## X Y edgeID start_end ## <dbl> <dbl> <dbl> <chr> ## 1 7.63 52.0 1 start ## 2 7.63 52.0 1 end ## 3 7.61 52.0 2 start ## 4 7.61 52.0 2 end ## 5 7.62 52.0 3 start ## 6 7.62 52.0 3 end ## 7 7.63 52.0 4 start ## 8 7.63 52.0 4 end ## 9 7.63 52.0 5 start ## 10 7.63 52.0 5 end ## # … with 9,324 more rows Step 4: Give each node a unique index

Each of the nodes in the network needs to get a unique index, such that they can be related to the edges. However, we need to take into account that edges can share either startpoints and/or endpoints. Such duplicated points, that have the same X and Y coordinate, are one single node, and should therefore get the same index. Note that the coordinate values as displayed in the tibble are rounded, and may look the same for several rows, even when they are not. We can use the group_indices function in dplyr to give each group of unique X,Y-combinations a unique index.

nodes <- nodes %>% mutate(xy = paste(.$X, .$Y)) %>% mutate(nodeID = group_indices(., factor(xy, levels = unique(xy)))) %>% select(-xy) nodes ## # A tibble: 9,334 x 5 ## X Y edgeID start_end nodeID ## <dbl> <dbl> <dbl> <chr> <int> ## 1 7.63 52.0 1 start 1 ## 2 7.63 52.0 1 end 2 ## 3 7.61 52.0 2 start 3 ## 4 7.61 52.0 2 end 4 ## 5 7.62 52.0 3 start 5 ## 6 7.62 52.0 3 end 6 ## 7 7.63 52.0 4 start 7 ## 8 7.63 52.0 4 end 8 ## 9 7.63 52.0 5 start 9 ## 10 7.63 52.0 5 end 10 ## # … with 9,324 more rows Step 5: Combine the node indices with the edges

Now each of the start and endpoints have been assigned a node ID in step 4, so that we can add the node indices to the edges. In other words, we can specify for each edge, in which node it starts, and in which node it ends.

source_nodes <- nodes %>% filter(start_end == 'start') %>% pull(nodeID) target_nodes <- nodes %>% filter(start_end == 'end') %>% pull(nodeID) edges = edges %>% mutate(from = source_nodes, to = target_nodes) edges ## Simple feature collection with 4667 features and 4 fields ## geometry type: LINESTRING ## dimension: XY ## bbox: xmin: 7.603762 ymin: 51.94823 xmax: 7.641418 ymax: 51.97241 ## epsg (SRID): 4326 ## proj4string: +proj=longlat +datum=WGS84 +no_defs ## First 10 features: ## highway geometry edgeID from to ## 1 service LINESTRING (7.63275 51.9603... 1 1 2 ## 2 secondary LINESTRING (7.614156 51.967... 2 3 4 ## 3 cycleway LINESTRING (7.61525 51.9673... 3 5 6 ## 4 footway LINESTRING (7.629304 51.967... 4 7 8 ## 5 steps LINESTRING (7.627696 51.965... 5 9 10 ## 6 service LINESTRING (7.633612 51.965... 6 11 12 ## 7 residential LINESTRING (7.630564 51.957... 7 13 14 ## 8 service LINESTRING (7.613545 51.960... 8 15 16 ## 9 cycleway LINESTRING (7.619781 51.957... 9 17 18 ## 10 residential LINESTRING (7.62373 51.9643... 10 19 20 Step 6: Remove duplicate nodes

Having added the unique node ID’s to the edges data, we don’t need the duplicated start and endpoints anymore. After removing them, we end up with a tibble in which each row represents a unique, single node. This tibble can be converted into an sf object, with POINT geometries.

nodes <- nodes %>% distinct(nodeID, .keep_all = TRUE) %>% select(-c(edgeID, start_end)) %>% st_as_sf(coords = c('X', 'Y')) %>% st_set_crs(st_crs(edges)) nodes ## Simple feature collection with 3329 features and 1 field ## geometry type: POINT ## dimension: XY ## bbox: xmin: 7.603762 ymin: 51.94823 xmax: 7.641418 ymax: 51.97241 ## epsg (SRID): 4326 ## proj4string: +proj=longlat +datum=WGS84 +no_defs ## # A tibble: 3,329 x 2 ## nodeID geometry ## <int> <POINT [°]> ## 1 1 (7.63275 51.9603) ## 2 2 (7.631843 51.96061) ## 3 3 (7.614156 51.96724) ## 4 4 (7.613797 51.96723) ## 5 5 (7.61525 51.96736) ## 6 6 (7.615148 51.96745) ## 7 7 (7.629304 51.96712) ## 8 8 (7.629308 51.9673) ## 9 9 (7.627696 51.96534) ## 10 10 (7.62765 51.96534) ## # … with 3,319 more rows Step 7: Convert to tbl_graph

The first six steps led to one sf object with LINESTRING geometries, representing the edges of the network, and one sf object with POINT geometries, representing the nodes of the network. The tbl_graph function allows us to convert these two into a tbl_graph object. There are two tricky parts in this step that need to be highlighted. One, is that the columns containing the indices of the source and target nodes should either be the first two columns of the sf object, or be named ‘to’ and ‘from’, respectively. Secondly, inside the tbl_graph function, these columns are converted into a two-column matrix. However, an sf object has a so-called ‘sticky geometry’, which means that the geometry column sticks to the attributes whenever specific columns are selected. Therefore, the matrix created inside tbl_graph has three columns instead of two, and that causes an error. Therefore, we first need to convert the sf object to a regular data.frame or tibble, before we can construct a tbl_graph. In the end, this doesn’t matter, since both the nodes and edges will be ‘integrated’ into an igraph structure, and loose their specific sf characteristics.

graph = tbl_graph(nodes = nodes, edges = as_tibble(edges), directed = FALSE) graph ## # A tbl_graph: 3329 nodes and 4667 edges ## # ## # An undirected multigraph with 34 components ## # ## # Node Data: 3,329 x 2 (active) ## nodeID geometry ## <int> <POINT [°]> ## 1 1 (7.63275 51.9603) ## 2 2 (7.631843 51.96061) ## 3 3 (7.614156 51.96724) ## 4 4 (7.613797 51.96723) ## 5 5 (7.61525 51.96736) ## 6 6 (7.615148 51.96745) ## # … with 3,323 more rows ## # ## # Edge Data: 4,667 x 5 ## from to highway geometry edgeID ## <int> <int> <fct> <LINESTRING [°]> <int> ## 1 1 2 service (7.63275 51.9603, 7.631843 51.96061) 1 ## 2 3 4 secondary (7.614156 51.96724, 7.613797 51.96723) 2 ## 3 5 6 cycleway (7.61525 51.96736, 7.615165 51.96744, 7.615… 3 ## # … with 4,664 more rows Step 8: Putting it together

To make the approach more convenient, we can combine all steps above into a single function, that takes a cleaned sf object with LINESTRING geometries as input, and returns a spatial tbl_graph.

sf_to_tidygraph = function(x, directed = TRUE) { edges <- x %>% mutate(edgeID = c(1:n())) nodes <- edges %>% st_coordinates() %>% as_tibble() %>% rename(edgeID = L1) %>% group_by(edgeID) %>% slice(c(1, n())) %>% ungroup() %>% mutate(start_end = rep(c('start', 'end'), times = n()/2)) %>% mutate(xy = paste(.$X, .$Y)) %>% mutate(nodeID = group_indices(., factor(xy, levels = unique(xy)))) %>% select(-xy) source_nodes <- nodes %>% filter(start_end == 'start') %>% pull(nodeID) target_nodes <- nodes %>% filter(start_end == 'end') %>% pull(nodeID) edges = edges %>% mutate(from = source_nodes, to = target_nodes) nodes <- nodes %>% distinct(nodeID, .keep_all = TRUE) %>% select(-c(edgeID, start_end)) %>% st_as_sf(coords = c('X', 'Y')) %>% st_set_crs(st_crs(edges)) tbl_graph(nodes = nodes, edges = as_tibble(edges), directed = directed) } sf_to_tidygraph(muenster_center, directed = FALSE) ## # A tbl_graph: 3329 nodes and 4667 edges ## # ## # An undirected multigraph with 34 components ## # ## # Node Data: 3,329 x 2 (active) ## nodeID geometry ## <int> <POINT [°]> ## 1 1 (7.63275 51.9603) ## 2 2 (7.631843 51.96061) ## 3 3 (7.614156 51.96724) ## 4 4 (7.613797 51.96723) ## 5 5 (7.61525 51.96736) ## 6 6 (7.615148 51.96745) ## # … with 3,323 more rows ## # ## # Edge Data: 4,667 x 5 ## from to highway geometry edgeID ## <int> <int> <fct> <LINESTRING [°]> <int> ## 1 1 2 service (7.63275 51.9603, 7.631843 51.96061) 1 ## 2 3 4 secondary (7.614156 51.96724, 7.613797 51.96723) 2 ## 3 5 6 cycleway (7.61525 51.96736, 7.615165 51.96744, 7.615… 3 ## # … with 4,664 more rows Combining the best of both worlds

Having the network stored in the tbl_graph structure, with a geometry list column for both the edges and nodes, enables us to combine the wide range of functionalities in sf and tidygraph, in a way that fits neatly into the tidyverse.

With the activate() verb, we specify if we want to manipulate the edges or the nodes. Then, most dplyr verbs can be used in the familiar way, also when directly applied to the geometry list column. For example, we can add a variable describing the length of each edge, which, later, we use as a weight for the edges.

graph <- graph %>% activate(edges) %>% mutate(length = st_length(geometry)) graph ## # A tbl_graph: 3329 nodes and 4667 edges ## # ## # An undirected multigraph with 34 components ## # ## # Edge Data: 4,667 x 6 (active) ## from to highway geometry edgeID length ## <int> <int> <fct> <LINESTRING [°]> <int> [m] ## 1 1 2 service (7.63275 51.9603, 7.631843 51.96061) 1 71.2778… ## 2 3 4 seconda… (7.614156 51.96724, 7.613797 51.967… 2 24.7146… ## 3 5 6 cycleway (7.61525 51.96736, 7.615165 51.9674… 3 12.2303… ## 4 7 8 footway (7.629304 51.96712, 7.629304 51.967… 4 20.0122… ## 5 9 10 steps (7.627696 51.96534, 7.62765 51.9653… 5 3.2926… ## 6 11 12 service (7.633612 51.96548, 7.633578 51.965… 6 7.4291… ## # … with 4,661 more rows ## # ## # Node Data: 3,329 x 2 ## nodeID geometry ## <int> <POINT [°]> ## 1 1 (7.63275 51.9603) ## 2 2 (7.631843 51.96061) ## 3 3 (7.614156 51.96724) ## # … with 3,326 more rows

With one flow of pipes, we can ‘escape’ the graph structure, turn either the edges or nodes back into real sf objects, and, for example, summarise the data based on a specific variable.

graph %>% activate(edges) %>% as_tibble() %>% st_as_sf() %>% group_by(highway) %>% summarise(length = sum(length)) ## Simple feature collection with 17 features and 2 fields ## geometry type: MULTILINESTRING ## dimension: XY ## bbox: xmin: 7.603762 ymin: 51.94823 xmax: 7.641418 ymax: 51.97241 ## epsg (SRID): 4326 ## proj4string: +proj=longlat +datum=WGS84 +no_defs ## # A tibble: 17 x 3 ## highway length geometry ## * <fct> [m] <MULTILINESTRING [°]> ## 1 corridor 9.377… ((7.620506 51.96262, 7.620542 51.96266), (7.6205… ## 2 cycleway 19275.917… ((7.619683 51.95395, 7.619641 51.95378, 7.619559… ## 3 footway 42822.070… ((7.640529 51.95325, 7.640528 51.95323, 7.640531… ## 4 path 8193.341… ((7.624007 51.95379, 7.624223 51.95378, 7.624253… ## 5 pedestrian 11399.337… ((7.620362 51.95471, 7.620477 51.9547), (7.62012… ## 6 primary 3574.842… ((7.625556 51.95272, 7.625594 51.95284, 7.625714… ## 7 primary_li… 184.385… ((7.617285 51.96609, 7.617286 51.96624, 7.617295… ## 8 residential 22729.748… ((7.614509 51.95351, 7.614554 51.95346), (7.6230… ## 9 secondary 4460.683… ((7.629784 51.95446, 7.629842 51.95444), (7.6312… ## 10 secondary_… 160.708… ((7.635309 51.95946, 7.635705 51.95948), (7.6349… ## 11 service 27027.822… ((7.624803 51.95393, 7.625072 51.95393), (7.6158… ## 12 steps 1321.841… ((7.634423 51.9546, 7.634438 51.95462), (7.61430… ## 13 tertiary 4356.365… ((7.607112 51.94991, 7.607126 51.94992, 7.607183… ## 14 tertiary_l… 43.856… ((7.623592 51.96612, 7.623568 51.96611, 7.623468… ## 15 track 389.866… ((7.610671 51.95778, 7.610571 51.95759, 7.610585… ## 16 unclassifi… 612.816… ((7.634492 51.95613, 7.634689 51.95611), (7.6343… ## 17 <NA> 3161.883… ((7.634374 51.95579, 7.634545 51.95575, 7.634662…

Switching back to sf objects is useful as well when plotting the network, in a way that preserves its spatial properties.

ggplot() + geom_sf(data = graph %>% activate(edges) %>% as_tibble() %>% st_as_sf()) + geom_sf(data = graph %>% activate(nodes) %>% as_tibble() %>% st_as_sf(), size = 0.5)

Or, alternatively, in only a few lines of code, plot the network as an interactive map. On this page, the interactive map might show as an image, but here, you should be able to really interact with it!

tmap_mode('view') tm_shape(graph %>% activate(edges) %>% as_tibble() %>% st_as_sf()) + tm_lines() + tm_shape(graph %>% activate(nodes) %>% as_tibble() %>% st_as_sf()) + tm_dots() + tmap_options(basemaps = 'OpenStreetMap')

All nice and well, but these are not things that we necessarily need the graph representation for. The added value of tidygraph, is that it opens the door to the functions of the igraph library, all specifically designed for network analysis, and enables us to use them inside a ‘tidy’ workflow. To cover them all, we would need to write a book, but let’s at least show a few examples below.

Centrality measures

Centraltity measures describe the importances of nodes in the network. The simplest of those measures is the degree centrality: the number of edges connected to a node. Another example is the betweenness centrality, which, simply stated, is the number of shortest paths that pass through a node. In tidygraph, we can calculate these and many other centrality measures, and simply add them as a variable to the nodes.

The betweenness centrality can also be calculated for edges. In that case, it specifies the number of shortest paths that pass through an edge.

graph <- graph %>% activate(nodes) %>% mutate(degree = centrality_degree()) %>% mutate(betweenness = centrality_betweenness(weights = length)) %>% activate(edges) %>% mutate(betweenness = centrality_edge_betweenness(weights = length)) graph ## # A tbl_graph: 3329 nodes and 4667 edges ## # ## # An undirected multigraph with 34 components ## # ## # Edge Data: 4,667 x 7 (active) ## from to highway geometry edgeID length betweenness ## <int> <int> <fct> <LINESTRING [°]> <int> [m] <dbl> ## 1 1 2 service (7.63275 51.9603, 7.6318… 1 71.2778… 90271 ## 2 3 4 second… (7.614156 51.96724, 7.61… 2 24.7146… 32946 ## 3 5 6 cyclew… (7.61525 51.96736, 7.615… 3 12.2303… 12409 ## 4 7 8 footway (7.629304 51.96712, 7.62… 4 20.0122… 22043 ## 5 9 10 steps (7.627696 51.96534, 7.62… 5 3.2926… 83380 ## 6 11 12 service (7.633612 51.96548, 7.63… 6 7.4291… 17119 ## # … with 4,661 more rows ## # ## # Node Data: 3,329 x 4 ## nodeID geometry degree betweenness ## <int> <POINT [°]> <dbl> <dbl> ## 1 1 (7.63275 51.9603) 3 117041 ## 2 2 (7.631843 51.96061) 3 94899 ## 3 3 (7.614156 51.96724) 3 32881 ## # … with 3,326 more rows ggplot() + geom_sf(data = graph %>% activate(edges) %>% as_tibble() %>% st_as_sf(), col = 'grey50') + geom_sf(data = graph %>% activate(nodes) %>% as_tibble() %>% st_as_sf(), aes(col = betweenness, size = betweenness)) + scale_colour_viridis_c(option = 'inferno') + scale_size_continuous(range = c(0,4))

ggplot() + geom_sf(data = graph %>% activate(edges) %>% as_tibble() %>% st_as_sf(), aes(col = betweenness, size = betweenness)) + scale_colour_viridis_c(option = 'inferno') + scale_size_continuous(range = c(0,4))

Shortest paths

A core part of spatial network analysis is generally finding the path between two nodes that minimizes either the travel distance or travel time. In igraph, there are several functions that can be used for this purpose, and since a tbl_graph is just a subclass of an igraph object, we can directly input it into every function in the igraph package.

The function distances, for example, returns a numeric matrix containing the distances of the shortest paths between every possible combination of nodes. It will automatically choose a suitable algorithm to calculate these shortest paths.

distances <- distances( graph = graph, weights = graph %>% activate(edges) %>% pull(length) ) distances[1:5, 1:5] ## [,1] [,2] [,3] [,4] [,5] ## [1,] 0.00000 71.27789 1670.22046 1694.9351 1596.80669 ## [2,] 71.27789 0.00000 1619.80567 1644.5203 1546.39190 ## [3,] 1670.22046 1619.80567 0.00000 24.7146 77.64829 ## [4,] 1694.93506 1644.52027 24.71460 0.0000 102.36289 ## [5,] 1596.80669 1546.39190 77.64829 102.3629 0.00000

The function ‘shortest_paths’ not only returns distances, but also the indices of the nodes and edges that make up the path. When we relate them to their corresponding geometry columns, we get the spatial representation of the shortest paths. Instead of doing this for all possible combinations of nodes, we can specify from and to which nodes we want to calculate the shortest paths. Here, we will show an example of a shortest path from one node to another, but it is just as well possible to do the same for one to many, many to one, or many to many nodes. Whenever the graph is weighted, the Dijkstra algoritm will be used under the hood. Note here that we have to define the desired output beforehand: vpath means that only the nodes (called vertices in igraph) are returned, epath means that only the edges are returned, and both returns them both.

from_node <- graph %>% activate(nodes) %>% filter(nodeID == 3044) %>% pull(nodeID) to_node <- graph %>% activate(nodes) %>% filter(nodeID == 3282) %>% pull(nodeID) path <- shortest_paths( graph = graph, from = from_node, to = to_node, output = 'both', weights = graph %>% activate(edges) %>% pull(length) ) path$vpath ## [[1]] ## + 50/3329 vertices, from 2ee524d: ## [1] 3044 3043 1581 1076 1059 1058 1270 1489 609 608 1549 1550 2998 2448 ## [15] 2057 1528 1529 1530 1531 1532 1533 1534 1535 1536 1537 1538 1539 1468 ## [29] 1469 1470 1471 1704 1916 1476 23 24 3064 740 743 1053 1490 1447 ## [43] 1446 1445 3052 1444 1443 3051 1442 3282 path$epath ## [[1]] ## + 49/4667 edges from 2ee524d: ## [1] 3043--3044 1581--3043 1076--1581 1059--1076 1058--1059 1058--1270 ## [7] 1270--1489 609--1489 608-- 609 608--1549 1549--1550 1550--2998 ## [13] 2448--2998 2057--2448 1528--2057 1528--1529 1529--1530 1530--1531 ## [19] 1531--1532 1532--1533 1533--1534 1534--1535 1535--1536 1536--1537 ## [25] 1537--1538 1538--1539 1468--1539 1468--1469 1469--1470 1470--1471 ## [31] 1471--1704 1704--1916 1476--1916 23--1476 23-- 24 24--3064 ## [37] 740--3064 740-- 743 743--1053 1053--1490 1447--1490 1446--1447 ## [43] 1445--1446 1445--3052 1444--3052 1443--1444 1443--3051 1442--3051 ## [49] 1442--3282

A subgraph of the original graph can now be created, containing only those nodes and edges that are part of the calculated path. Note here that something inconvenient happens: igraph seems to have it’s own underlying structure for attaching indices to nodes, which makes that the from and to columns in the egdes data don’t match with our nodeID column in the nodes data anymore, but instead simply refer to the row numbers of the nodes data.

path_graph <- graph %>% subgraph.edges(eids = path$epath %>% unlist()) %>% as_tbl_graph() path_graph ## # A tbl_graph: 50 nodes and 49 edges ## # ## # An undirected simple graph with 1 component ## # ## # Node Data: 50 x 4 (active) ## nodeID geometry degree betweenness ## <int> <POINT [°]> <dbl> <dbl> ## 1 23 (7.62837 51.96292) 4 425585 ## 2 24 (7.628317 51.96308) 3 499622 ## 3 608 (7.626629 51.95707) 3 196573 ## 4 609 (7.626621 51.95699) 3 390732 ## 5 740 (7.629458 51.96364) 3 158948 ## 6 743 (7.629787 51.96392) 3 156575 ## # … with 44 more rows ## # ## # Edge Data: 49 x 7 ## from to highway geometry edgeID length betweenness ## <int> <int> <fct> <LINESTRING [°]> <int> [m] <dbl> ## 1 1 2 reside… (7.62837 51.96292, 7.628… 12 18.6254… 426879 ## 2 3 4 reside… (7.626629 51.95707, 7.62… 355 8.2067… 198070 ## 3 8 9 reside… (7.626552 51.95648, 7.62… 643 6.5847… 149777 ## # … with 46 more rows This subgraph can now be analyzed, for example by calculating the total length of the path, … path_graph %>% activate(edges) %>% as_tibble() %>% summarise(length = sum(length)) ## # A tibble: 1 x 1 ## length ## [m] ## 1 1362.259 … and plotted. ggplot() + geom_sf(data = graph %>% activate(edges) %>% as_tibble() %>% st_as_sf(), col = 'darkgrey') + geom_sf(data = graph %>% activate(nodes) %>% as_tibble() %>% st_as_sf(), col = 'darkgrey', size = 0.5) + geom_sf(data = path_graph %>% activate(edges) %>% as_tibble() %>% st_as_sf(), lwd = 1, col = 'firebrick') + geom_sf(data = path_graph %>% activate(nodes) %>% filter(nodeID %in% c(from_node, to_node)) %>% as_tibble() %>% st_as_sf(), size = 2) However, often we will be interested in shortest paths between geographical points that are not necessarily nodes in the network. For example, we might want to calculate the shortest path from the railway station of Münster to the cathedral. muenster_station <- st_point(c(7.6349, 51.9566)) %>% st_sfc(crs = 4326) muenster_cathedral <- st_point(c(7.626, 51.962)) %>% st_sfc(crs = 4326) ggplot() + geom_sf(data = graph %>% activate(edges) %>% as_tibble() %>% st_as_sf(), col = 'darkgrey') + geom_sf(data = graph %>% activate(nodes) %>% as_tibble() %>% st_as_sf(), col = 'darkgrey', size = 0.5) + geom_sf(data = muenster_station, size = 2, col = 'firebrick') + geom_sf(data = muenster_cathedral, size = 2, col = 'firebrick') + geom_sf_label(data = muenster_station, aes(label = 'station'), nudge_x = 0.004) + geom_sf_label(data = muenster_cathedral, aes(label = 'cathedral'), nudge_x = 0.005) To find the route on the network, we must first identify the nearest points on the network. The nabor package has a well performing function to do so. It does, however, require the coordinates of the origin and destination nodes to be given in a matrix. # Coordinates of the origin and destination node, as matrix coords_o <- muenster_station %>% st_coordinates() %>% matrix(ncol = 2) coords_d <- muenster_cathedral %>% st_coordinates() %>% matrix(ncol = 2) # Coordinates of all nodes in the network nodes <- graph %>% activate(nodes) %>% as_tibble() %>% st_as_sf() coords <- nodes %>% st_coordinates() # Calculate nearest points on the network. node_index_o <- knn(data = coords, query = coords_o, k = 1) node_index_d <- knn(data = coords, query = coords_d, k = 1) node_o <- nodes[node_index_o$nn.idx, ] node_d <- nodes[node_index_d$nn.idx, ] Like before, we use the ID to calculate the shortest path, and plot it: path <- shortest_paths( graph = graph, from = node_o$nodeID, # new origin to = node_d$nodeID, # new destination output = 'both', weights = graph %>% activate(edges) %>% pull(length) ) path_graph <- graph %>% subgraph.edges(eids = path$epath %>% unlist()) %>% as_tbl_graph() ggplot() + geom_sf(data = graph %>% activate(edges) %>% as_tibble() %>% st_as_sf(), col = 'darkgrey') + geom_sf(data = graph %>% activate(nodes) %>% as_tibble() %>% st_as_sf(), col = 'darkgrey', size = 0.5) + geom_sf(data = path_graph %>% activate(edges) %>% as_tibble() %>% st_as_sf(), lwd = 1, col = 'firebrick') + geom_sf(data = muenster_station, size = 2) + geom_sf(data = muenster_cathedral, size = 2) + geom_sf_label(data = muenster_station, aes(label = 'station'), nudge_x = 0.004) + geom_sf_label(data = muenster_cathedral, aes(label = 'cathedral'), nudge_x = 0.005)

It worked! We calculated a path from the rail station to the centre, a common trip taken by tourists visiting Muenster. Clearly this is not a finished piece of work but the post has demonstrated what is possible. Future functionality should look to make spatial networks more user friendly, including provision of ‘weighting profiles’, batch routing and functions that reduce the number of steps needed to work with spatial network data in R.

For alternative approaches and further reading, the following resources are recommended:

### Processing satellite image collections in R with the gdalcubes package

Thu, 07/18/2019 - 02:00
The problem

Scientists working with collections and time series of satellite imagery quickly run into some of the following problems:

• Images from different areas of the world have different spatial reference systems (e.g., UTM zones).
• The pixel size of a single image sometimes differs among its spectral bands / variables.
• Spatially adjacent image tiles often overlap.
• Time series of images are often irregular when the area of interest covers spatial areas larger than the extent of a single image.
• Images from different data products or different satellites are distributed in diverse data formats and structures.

GDAL and the rgdal R package can solve most of these difficulties by reading all relevant data formats and implementing image warping to reproject, rescale, resample, and crop images. However, GDAL does not know about image time series and hence there is a lot of manual work needed before data scientists can actually work with these data. Instead of collections of images, data users in many cases desire a regular data structure such as a four-dimensional raster data cube with dimensions x, y, time, and band.

In R, there is currently no implementation to build regular data cubes from image collections. The stars package provides a generic implementation for processing raster and vector data cubes with an arbitrary number of dimensions, but assumes that the data are already organized as an array.

Introduction and overview of gdalcubes

This blog post introduces the gdalcubes R package, aiming at making the work with collections and time series of satellite imagery easier and more interactive.

The core features of the package are:

• build regular dense data cubes from large satellite image collections based on a user-defined data cube view (spatiotemporal extent, resolution, and map projection of the cube)
• apply and chaining operations on data cubes
• allow for the execution of user-defined functions on data cubes
• export data cubes as netCDF files, making it easy to process further, e.g., with stars or raster.

Technically, the R package is a relatively lightweight wrapper around the gdalcubes C++ library. The library strongly builds on GDAL, netCDF, and SQLite (the full list of C++ libraries used by gdalcubes is found here).

This blog post focuses on how to use the R package gdalcubes, more details about the underlying ideas can be found in our recent open access paper in the datacube special issue of the DATA journal.

Installation

The package can be installed directly from CRAN:

install.packages("gdalcubes")

Some features and functions used in this blog post are still in the development version (which will be submitted to CRAN as version 0.2.0), which currently needs a source install:

# install.packages("remotes") remotes::install_git("https://github.com/appelmar/gdalcubes_R", ref="dev", args="--recursive")

If this fails with error messages like “no rule to make target …”, please read here.

Demo dataset

We use a collection of 180 Landsat 8 surface reflectance images, covering a small part of the Brazilian Amazon forest. If you would like to work with the dataset on your own (and maybe reproduce some parts of this blog post), you have two options:

Option A: Download the dataset with full resolution (30 meter pixel size) here (67 gigabytes compressed; 190 gigabytes unzipped)

Option B: Download a downsampled version of the dataset that contains all images at a coarse spatial resolution (300 meter pixel size) here (740 megabytes compressed; 2 gigabytes unzipped)

Except for the spatial resolution of the images, the datasets are identical, meaning that all code samples work for both options but result images may look blurred when using option B. Here is the code to download and unzip the dataset to your current working directory (which might take some time):

After extraction of the zip archive, we get one directory per image, where each image contains 10 GeoTIFF files representing the spectral bands and additional per-pixel quality information. As a first step, we must scan all available images once, extract some metadata (e.g. acquisition date/time and spatial extents of images and how the files relate to bands), and store this information in a simple image collection index file. This file does not store any pixel values but only metadata and references to where images can be found.

First, we simply need to find available GeoTIFF files in all subdirectories of our demo dataset:

files = list.files("L8_Amazon", recursive = TRUE, full.names = TRUE, pattern = ".tif") length(files) ## [1] 1800 head(files) ## [1] "L8_Amazon/LC082260632014071901T1-SC20190715045926/LC08_L1TP_226063_20140719_20170421_01_T1_pixel_qa.tif" ## [2] "L8_Amazon/LC082260632014071901T1-SC20190715045926/LC08_L1TP_226063_20140719_20170421_01_T1_radsat_qa.tif" ## [3] "L8_Amazon/LC082260632014071901T1-SC20190715045926/LC08_L1TP_226063_20140719_20170421_01_T1_sr_aerosol.tif" ## [4] "L8_Amazon/LC082260632014071901T1-SC20190715045926/LC08_L1TP_226063_20140719_20170421_01_T1_sr_band1.tif" ## [5] "L8_Amazon/LC082260632014071901T1-SC20190715045926/LC08_L1TP_226063_20140719_20170421_01_T1_sr_band2.tif" ## [6] "L8_Amazon/LC082260632014071901T1-SC20190715045926/LC08_L1TP_226063_20140719_20170421_01_T1_sr_band3.tif"

To understand the structure of particular data products, the package comes with a set of predefined rules (called collection formats) that define how required metadata can be derived from the data. These include formats for some Sentinel-2, MODIS, and Landsat products. We can list all available formats with:

The “L8_SR” format is what we need for our demo dataset. Next, we must tell gdalcubes to scan the files and build an image collection. Below, we create an image collection from the set of GeoTIFF files, using the “L8_SR” collection format and store the resulting image collection under “L8.db”.

L8.col = create_image_collection(files, "L8_SR", "L8.db") L8.col ## A GDAL image collection object, referencing 180 images with 17 bands ## Images: ## name left top bottom ## 1 LC08_L1TP_226063_20140719_20170421_01_T1 -54.15776 -3.289862 -5.392073 ## 2 LC08_L1TP_226063_20140820_20170420_01_T1 -54.16858 -3.289828 -5.392054 ## 3 LC08_L1GT_226063_20160114_20170405_01_T2 -54.16317 -3.289845 -5.392064 ## 4 LC08_L1TP_226063_20160724_20170322_01_T1 -54.16317 -3.289845 -5.392064 ## 5 LC08_L1TP_226063_20170609_20170616_01_T1 -54.17399 -3.289810 -5.392044 ## 6 LC08_L1TP_226063_20170711_20170726_01_T1 -54.15506 -3.289870 -5.392083 ## right datetime ## 1 -52.10338 2014-07-19T00:00:00 ## 2 -52.11418 2014-08-20T00:00:00 ## 3 -52.10878 2016-01-14T00:00:00 ## 4 -52.10878 2016-07-24T00:00:00 ## 5 -52.11958 2017-06-09T00:00:00 ## 6 -52.09798 2017-07-11T00:00:00 ## srs ## 1 +proj=utm +zone=22 +datum=WGS84 +units=m +no_defs ## 2 +proj=utm +zone=22 +datum=WGS84 +units=m +no_defs ## 3 +proj=utm +zone=22 +datum=WGS84 +units=m +no_defs ## 4 +proj=utm +zone=22 +datum=WGS84 +units=m +no_defs ## 5 +proj=utm +zone=22 +datum=WGS84 +units=m +no_defs ## 6 +proj=utm +zone=22 +datum=WGS84 +units=m +no_defs ## [ omitted 174 images ] ## ## Bands: ## name offset scale unit nodata image_count ## 1 AEROSOL 0 1 180 ## 2 B01 0 1 -9999.000000 180 ## 3 B02 0 1 -9999.000000 180 ## 4 B03 0 1 -9999.000000 180 ## 5 B04 0 1 -9999.000000 180 ## 6 B05 0 1 -9999.000000 180 ## 7 B06 0 1 -9999.000000 180 ## 8 B07 0 1 -9999.000000 180 ## 9 EVI 0 1 -9999.000000 0 ## 10 MSAVI 0 1 -9999.000000 0 ## 11 NBR 0 1 -9999.000000 0 ## 12 NBR2 0 1 -9999.000000 0 ## 13 NDMI 0 1 -9999.000000 0 ## 14 NDVI 0 1 -9999.000000 0 ## 15 PIXEL_QA 0 1 180 ## 16 RADSAT_QA 0 1 180 ## 17 SAVI 0 1 -9999.000000 0

Internally, the output file is a simple SQLite database. Please notice that our collection does not contain data for all possible bands (see image_count column). Depending on particular data download requests, Landsat 8 surface reflectance data may come e.g. with some post-processed bands (like vegetation indexes) that can be used if available.

Creating and processing data cubes

To create a raster data cube, we need (i) an image collection and (ii) a data cube view, defining how we look at the data, i.e., at which spatiotemporal resolution, window, and spatial reference system. For a quick look at the data, we define a cube view with 1km x 1km pixel size, yearly temporal resolution, covering the full spatiotemporal extent of the image collection, and using the web mercator spatial reference system.

v.overview = cube_view(extent=L8.col, dt="P1Y", dx=1000, dy=1000, srs="EPSG:3857", aggregation = "median", resampling = "bilinear") raster_cube(L8.col, v.overview) ## A GDAL data cube proxy object ## Dimensions: ## name low high size chunk_size ## 1 t 2013.0 2019.0 7 16 ## 2 y -764014.4 -205014.4 559 256 ## 3 x -6582280.1 -5799280.1 783 256 ## ## Bands: ## name type offset scale nodata unit ## 1 AEROSOL float64 0 1 NaN ## 2 B01 float64 0 1 NaN ## 3 B02 float64 0 1 NaN ## 4 B03 float64 0 1 NaN ## 5 B04 float64 0 1 NaN ## 6 B05 float64 0 1 NaN ## 7 B06 float64 0 1 NaN ## 8 B07 float64 0 1 NaN ## 9 EVI float64 0 1 NaN ## 10 MSAVI float64 0 1 NaN ## 11 NBR float64 0 1 NaN ## 12 NBR2 float64 0 1 NaN ## 13 NDMI float64 0 1 NaN ## 14 NDVI float64 0 1 NaN ## 15 PIXEL_QA float64 0 1 NaN ## 16 RADSAT_QA float64 0 1 NaN ## 17 SAVI float64 0 1 NaN

As specified in our data cube view, the time dimension of the resulting data cube only has 7 values, representing years from 2013 to 2019. The aggregation parameter in the data cube view defines how values from multiple images in the same year shall be combined. In contrast, the selected resampling algorithm is applied when reprojecting and rescaling individual images.

If we are interested in a smaller area at higher temporal resolution, we simply need to define a data cube view with different parameters, including a specific spatiotemporal extent by passing a list as extent argument to cube_view. Below, we define a data cube view for a 100km x 100km area with 50m pixel size at monthly temporal resolution.

v.subarea = cube_view(extent=list(left=-6320000, right=-6220000, bottom=-600000, top=-500000, t0="2014-01-01", t1="2018-12-31"), dt="P1M", dx=50, dy=50, srs="EPSG:3857", aggregation = "median", resampling = "bilinear") raster_cube(L8.col, v.subarea) ## A GDAL data cube proxy object ## Dimensions: ## name low high size chunk_size ## 1 t 201401 201812 60 16 ## 2 y -600000 -500000 2000 256 ## 3 x -6320000 -6220000 2000 256 ## ## Bands: ## name type offset scale nodata unit ## 1 AEROSOL float64 0 1 NaN ## 2 B01 float64 0 1 NaN ## 3 B02 float64 0 1 NaN ## 4 B03 float64 0 1 NaN ## 5 B04 float64 0 1 NaN ## 6 B05 float64 0 1 NaN ## 7 B06 float64 0 1 NaN ## 8 B07 float64 0 1 NaN ## 9 EVI float64 0 1 NaN ## 10 MSAVI float64 0 1 NaN ## 11 NBR float64 0 1 NaN ## 12 NBR2 float64 0 1 NaN ## 13 NDMI float64 0 1 NaN ## 14 NDVI float64 0 1 NaN ## 15 PIXEL_QA float64 0 1 NaN ## 16 RADSAT_QA float64 0 1 NaN ## 17 SAVI float64 0 1 NaN

The raster_cube function always returns a proxy object, meaning that neither any expensive computations nor any data reads from disk are started. Instead, gdalcubes delays the execution until the data is really needed when users call plot(), or write_ncdf(). However, the result of our call to raster_cube can be passed to data cube operators. For example, the code below drops all bands except the visible RGB bands and, again, returns a proxy object.

L8.cube.rgb = select_bands(raster_cube(L8.col, v.overview), c("B02","B03","B04")) L8.cube.rgb ## A GDAL data cube proxy object ## Dimensions: ## name low high size chunk_size ## 1 t 2013.0 2019.0 7 16 ## 2 y -764014.4 -205014.4 559 256 ## 3 x -6582280.1 -5799280.1 783 256 ## ## Bands: ## name type offset scale nodata unit ## 1 B02 float64 0 1 NaN ## 2 B03 float64 0 1 NaN ## 3 B04 float64 0 1 NaN

Calling plot() will eventually start computationn, and hence might take some time:

system.time(plot(L8.cube.rgb, rgb=3:1, zlim=c(0,1200)))

## user system elapsed ## 16.367 0.605 17.131 Chaining data cube operations

For the remaining examples, we use multiple threads to process data cubes by setting:

We can chain many of the provided data cube operators (e.g., using the pipe %>%). The following code will derive the median values of the RGB bands over time, producing a single RGB overview image for our selected subarea.

suppressPackageStartupMessages(library(magrittr)) # use the pipe raster_cube(L8.col, v.subarea) %>% select_bands(c("B02","B03","B04")) %>% reduce_time("median(B02)", "median(B03)", "median(B04)") %>% plot(rgb=3:1, zlim=c(0,1200))

Implemented data cube operators include:

• apply_pixel apply one or more arithmetic expressions on individual data cube pixels, e.g., to derive vegetation indexes.
• reduce_time apply on or more reducers over pixel time series.
• reduce_time apply on or more reducers over spatial slices.
• select_bands subset available bands.
• window_time apply an aggregation function or convolution kernel over moving time series windows.
• join_bands combines the bands of two indentically shaped data cubes.
• filter_pixel filter pixels by a logical expression on band values, e.g., select all pixels with NDVI larger than 0.
• write_ncdf export a data cube as a netCDF file.

In a second example, we compute the normalied difference vegetation index (NDVI) with apply_pixel and derive its maximum values over time:

suppressPackageStartupMessages(library(viridis)) # use colors from viridis package raster_cube(L8.col, v.subarea) %>% select_bands(c("B04","B05")) %>% apply_pixel(c("(B05-B04)/(B05+B04)"), names="NDVI") %>% reduce_time("max(NDVI)") %>% plot(zlim=c(-0.2,1), col=viridis, key.pos=1)

User-defined functions

Previous examples used character expressions to define reducer and arithmetic functions. Operations like apply_pixel and filter_pixel take character arguments to define the expressions. The reason for this is that expressions are translated to C++ functions and all computations then are purely C++. However, to give users more flexibility and allow the definition of user-defined functions, reduce_time and apply_pixel also allow to pass arbitrary R functions as an argument. In the example below, we derive the 0.25, and 0.75 quantiles over NDVI time series. There is of course no limitation what the provided reducer function does and it is thus possible to use functions from other packages.

v.16d = cube_view(view=v.overview, dt="P16D") raster_cube(L8.col, v.16d) %>% select_bands(c("B04", "B05")) %>% apply_pixel(c("(B05-B04)/(B05+B04)"), names="NDVI") %>% reduce_time(names = c("q1","q3"), FUN = function(x) { quantile(x["NDVI",], probs = c(0.25, 0.75), na.rm = TRUE) }) %>% plot(col=viridis, zlim=c(-0.2,1), key.pos=1)

However, there are some things, users need to keep in mind when working with user-defined functions:

1. Users should provide names of the output bands and make sure that the function always return the same number of elements.
2. When executed, the function runs in a new R session, meaning that it cannot access variables in the current worskspace and packages must be loaded within the function if needed.
3. Ideally, users should carefully check for errors. A frequent cause for errors is the presence of NA values, which are abundant in raster data cubes from irregular image collections.
4. In the current version, only apply_pixel and reduce_time allow for passing user-defined functions.
Interfacing with stars

The stars package is much more generic and supports higher dimensional arrays and hence supports e.g. data from climate model output. It also does not assume data to be orthorectified, i.e. it works also with curvilinear grids and hence supports data as from Sentinel-5P. In contrast, gdalcubes concentrates on multispectral image time series (4d) only.

gdalcubes currently comes with a simple as_stars() function, writing a data cube as a (temporary) netCDF file, which is then opened by read_stars. The stars object holds bands as attributes. If needed (e.g. for ggplot below), st_redimension converts attributes to a new dimension.

suppressPackageStartupMessages(library(ggplot2)) suppressPackageStartupMessages(library(dplyr)) suppressPackageStartupMessages(library(stars)) x = raster_cube(L8.col, v.overview) %>% select_bands(c("B02","B03","B04")) %>% as_stars() x ## stars object with 3 dimensions and 3 attributes ## attribute(s), summary of first 1e+05 cells: ## B02 B03 B04 ## Min. : 123.8 Min. : 174.1 Min. : 112.1 ## 1st Qu.: 209.9 1st Qu.: 397.3 1st Qu.: 247.2 ## Median : 295.0 Median : 497.9 Median : 380.3 ## Mean : 457.6 Mean : 653.0 Mean : 563.5 ## 3rd Qu.: 476.7 3rd Qu.: 686.5 3rd Qu.: 662.6 ## Max. :9419.0 Max. :9780.7 Max. :10066.2 ## NA's :79497 NA's :79497 NA's :79497 ## dimension(s): ## from to offset delta refsys point ## x 1 783 -6582280 1000 +proj=merc +a=6378137 +b=... FALSE ## y 1 559 -205014 -1000 +proj=merc +a=6378137 +b=... FALSE ## time 1 7 NA NA POSIXct FALSE ## values ## x NULL [x] ## y NULL [y] ## time 2013-01-01,...,2019-01-01 ggplot() + geom_stars(data = slice(st_redimension(x), "time", 5)) + facet_wrap(~new_dim) + theme_void() + coord_fixed() + scale_fill_viridis(limits=c(0, 2000))

Future work

There are many things, which we didn’t cover in this blog post like applying masks during the construction of data cubes. More importantly, a question we are very much interested in at the moment is how far we can go with gdalcubes and stars in cloud computing envrionments, where huge image collections such as the full Sentinel-2 archive are already stored. This will be the topic of another blog post.

We also have a lot of ideas for future developments but no real schedule. If you would like to contribute, get involved, or if you have further ideas, please get in touch! Development and discussion takes place on GitHub (R package on GitHub, C++ library on GitHub ).

### mapedit 0.5.0 and Leaflet.pm

Sun, 03/31/2019 - 01:00

In our last post mapedit and leaflet.js > 1.0 we discussed remaining tasks for the RConsortium funded project mapedit. mapedit 0.5.0 fixes a couple of lingering issues, but primarily focuses on bringing the power of Leaflet.pm as an alternate editor. Leaflet.draw, the original editor in mapedit provided by leaflet.extras, is a wonderful tool but struggles with snapping and those pesky holes that we commonly face in geospatial tasks. Depending on the task, a user might prefer to continue using Leaflet.draw, so we will maintain full support for both editors. We’ll spend the rest of the post demonstrating where Leaflet.pm excels to help illustrate when you might want to choose editor = "leafpm".

Install/Update

At a minimum, to follow along with the rest of this post, please update mapedit and install the new standalone package leafpm. While we are it, we highly recommend updating your other geospatial dependencies.

install.packages(c("sf", "leaflet", "leafpm", "mapview", "mapedit")) # lwgeom is optional but nice when working with holes in leaflet.pm # install.packages("lwgeom") Holes

mapedit now supports holes. Let’s look at a quick example in which we add, edit, and delete holes.

library(sf) library(leaflet) library(mapview) library(mapedit) library(leafpm) # make a contrived polygon with holes for testing outer1 = matrix(c(0,0,10,0,10,10,0,10,0,0),ncol=2, byrow=TRUE) hole1 = matrix(c(1,1,1,2,2,2,2,1,1,1),ncol=2, byrow=TRUE) hole2 = matrix(c(5,5,5,6,6,6,6,5,5,5),ncol=2, byrow=TRUE) outer2 = matrix(c(11,0,11,1,12,1,12,0,11,0),ncol=2, byrow=TRUE) pts1 = list(outer1, hole1, hole2) pts2 = list(outer2) pl1 = st_sf(geom = st_sfc(st_polygon(pts1))) pl2 = st_sf(geom = st_sfc(st_polygon(pts2))) mpl = st_sf(geom = st_combine(rbind(pl1, pl2)), crs=4326) tst = editFeatures(mpl, editor = "leafpm") # look at our creation mapview(tst)

Please note that right mouse click deletes vertexes. For a more real world application franconia[5,] from mapview has a hole. Try to edit it with the following code.

library(sf) library(leaflet) library(mapview) library(mapedit) library(leafpm) editFeatures(franconia[5,], editor="leafpm") Snapping

Leaflet.pm gives us a very pleasant snapping experience, so if you want to snap, set editor = "leafpm" and snap away. Snapping is particular important when drawing/digitizing features from scratch. Here is how it looks with the example from above.

Snapping is enabled by default.

Fixes For Lingering Issues GeoJSON Precision

Robin Lovelace discovered that at leaflet zoom level > 17 we lose coordinate precision. Of course, this is not good enough, so we will prioritize a fix as discussed in issue. Hopefully, this leaflet.js pull request will make this fix fairly straightforward.

I am happy to report that we have found a solution for the loss of precision. Please let us know if you discover any remaining problems.

Mulitlinestring Editing

Leaflet.js and multilinestrings don’t get along as Tim Appelhans reported in issue. For complete support of sf, mapedit should work with multilinestring, so we have promoted this to issue 62.

We backed into a solution with MULTILINESTRING since Leaflet.pm’s approach fits better with MULTI* features. As an example, let’s edit one of the trails from mapview.

library(sf) library(leaflet) library(mapview) library(mapedit) library(leafpm) editFeatures(trails[4,], editor="leafpm")

Conclusion and Thanks

As of this post we have reached the end of the extremely generous RConsortium funding of mapedit. Although the funding is over, we still expect to actively maintain and improve mapedit. One feature that we had hoped to implement as part of the mapedit toolset was editing of feature attributes. This turned out to be very ambitious, and unfortunately we were not able to implement a satisfactory solution for this feature during the funding period. We plan, however, to develop a solution. Your participation, ideas, and feedback are as vital as ever, so please continue to engage. Thanks to all those who have contributed so far and thanks to all open source contributors in the R and JavaScript communities.

### Wrapping up the stars project

Sat, 03/30/2019 - 01:00
Summary

This is the fourth blog on the stars project, an it completes the R-Consortium funded project for spatiotemporal tidy arrays with R. It reports on the current status of the project, and current development directions. Although this project ends, with the release of stars 0.3 on CRAN, the adoption, update, enthusiasm and participation in the development of the stars project have really only started, and will without doubt increase and continue.

Status

The stars package has now five vignettes (called “Articles” on the pkgdown site) that explain its main features. Besides writing these vignettes, a lot of work over the past few months went into

• writing support for stars_proxy objects, objects for which the metadata has been read but for which the payload is still on disk. This allows handling raster files or data cubes that do not fit into memory. Manipulating them uses lazy evaluation: only when pixel values are really needed they are read and processed: this is for instance when a plot is needed, or results are to be written with write_stars. In case of plotting, no more pixels are processed than can be seen on the device.
• making rectilinear and curvilinear grids work, by better parsing NetCDF files directly (rather than through GDAL), reading their bounds, and by writing conversions to sf objects so that they can be plotted;
• writing a tighter integration with GDAL, e.g. for warping grids, contouring grids, and rasterizing polygons;
• supporting 360-day and 365-day (noleap) calendars, which are used often in climate model data;
• providing an off-cran starsdata package, with around 1 Gb of real imagery, too large for submitting to CRAN or GitHub, but used for testing and demonstration;
• resolving issues (we’re at 154 now) and managing pull requests;
• adding stars support to gstat, a package for spatial and spatiotemporal geostatistical modelling, interpolation and simulation.

I have used stars and sf successfully last week in a two-day course at Munich Re on Spatial Data Science with R (online material), focusing on data handling and geostatistics. Both packages worked out beautifully (with a minor amount of rough edges), in particular in conjunction with each other and with the tidyverse.

Further resources on the status of the project are found in

• the video of my rstudio::conf presentation on “Spatial data science in the Tidyverse”
• chapter 4 of the Spatial Data Science book (under development)
Future

Near future development will entail experiments with very large datasets, such as the entire Sentinel-2 archive. We secured earlier some funding from the R Consortium for doing this, and first outcomes will be presented shortly in a follow-up blog. A large challenge here is the handling of multi-resolution imagery, imagery spread over different coordinate reference systems (e.g., crossing multiple UTM zones) and the temporal resampling needed to form space-time raster cubes. This is being handled gracefully by the gdalcubes C++ library and R package developed by Marius Appel. The gdalcubes package has been submitted to CRAN.

Earlier stars blogs

### Drawing beautiful maps programmatically with R, sf and ggplot2 — Part 3: Layouts

Thu, 10/25/2018 - 02:00

view raw Rmd

This tutorial is the third part in a series of three:

After the presentation of the basic map concepts, and the flexible approach in layer implemented in ggplot2, this part illustrates how to achieve complex layouts, for instance with map insets, or several maps combined. Depending on the visual information that needs to be displayed, maps and their corresponding data might need to be arranged to create easy to read graphical representations. This tutorial will provide different approaches to arranges maps in the plot, in order to make the information portrayed more aesthetically appealing, and most importantly, convey the information better.

Getting started

Many R packages are available from CRAN, the Comprehensive R Archive Network, which is the primary repository of R packages. The full list of packages necessary for this series of tutorials can be installed with:

install.packages(c("cowplot", "googleway", "ggplot2", "ggrepel", "ggspatial", "libwgeom", "sf", "rworldmap", "rworldxtra"))

We start by loading the basic packages necessary for all maps, i.e. ggplot2 and sf. We also suggest to use the classic dark-on-light theme for ggplot2 (theme_bw), which is more appropriate for maps:

library("ggplot2") theme_set(theme_bw()) library("sf")

The package rworldmap provides a map of countries of the entire world; a map with higher resolution is available in the package rworldxtra. We use the function getMap to extract the world map (the resolution can be set to "low", if preferred):

library("rworldmap") library("rworldxtra") world <- getMap(resolution = "high") class(world) ## [1] "SpatialPolygonsDataFrame" ## attr(,"package") ## [1] "sp"

The world map is available as a SpatialPolygonsDataFrame from the package sp; we thus convert it to a simple feature using st_as_sf from package sf:

world <- st_as_sf(world) class(world) ## [1] "sf" "data.frame" General concepts

There are 2 solutions to combine sub-maps:

• using Grobs (graphic objects, allow plots only in plot region, based on coordinates), which directly use ggplot2
• using ggdraw (allows plots anywhere, including outer margins, based on relative position) from package cowplot

Example illustrating the difference between the two, and their use:

(g1 <- qplot(0:10, 0:10))

(g1_void <- g1 + theme_void() + theme(panel.border = element_rect(colour = "black", fill = NA)))

Graphs from ggplot2 can be saved, like any other R object. They can then be reused later in other functions.

Using grobs, and annotation_custom:

g1 + annotation_custom( grob = ggplotGrob(g1_void), xmin = 0, xmax = 3, ymin = 5, ymax = 10 ) + annotation_custom( grob = ggplotGrob(g1_void), xmin = 5, xmax = 10, ymin = 0, ymax = 3 )

Using ggdraw (note: used to build on top of initial plot; could be left empty to arrange subplots on a grid; plots are “filled” with their plots, unless the plot itself has a constrained ratio, like a map):

ggdraw(g1) + draw_plot(g1_void, width = 0.25, height = 0.5, x = 0.02, y = 0.48) + draw_plot(g1_void, width = 0.5, height = 0.25, x = 0.75, y = 0.09)

Several maps side by side or on a grid

Having a way show in a visualization, a specific area can be very useful. Many scientists usually create maps for each specific area individually. This is fine, but there are simpler ways to display what is needed for a report, or publication.

This exmaple is using two maps side by side, including the legend of the first one. It illustrates how to use a custom grid, which can be made a lot more complex with different elements.

First, simplify REGION for the legend:

levels(worldREGION)[7] <- "South America" Prepare the subplots, #1 world map: (gworld <- ggplot(data = world) + geom_sf(aes(fill = REGION)) + geom_rect(xmin = -102.15, xmax = -74.12, ymin = 7.65, ymax = 33.97, fill = NA, colour = "black", size = 1.5) + scale_fill_viridis_d(option = "plasma") + theme(panel.background = element_rect(fill = "azure"), panel.border = element_rect(fill = NA))) And #2 Gulf map : (ggulf <- ggplot(data = world) + geom_sf(aes(fill = REGION)) + annotate(geom = "text", x = -90, y = 26, label = "Gulf of Mexico", fontface = "italic", color = "grey22", size = 6) + coord_sf(xlim = c(-102.15, -74.12), ylim = c(7.65, 33.97), expand = FALSE) + scale_fill_viridis_d(option = "plasma") + theme(legend.position = "none", axis.title.x = element_blank(), axis.title.y = element_blank(), panel.background = element_rect(fill = "azure"), panel.border = element_rect(fill = NA))) The command ggplotGrob signals to ggplot to take each created map, and how to arrange each map. The argument coord_equal can specify the length, ylim, and width, xlim, for the entire plotting area. Where as in annotation_custom, each maps’ xmin, xmax, ymin, and ymax can be specified to allow for complete customization. #Creating a faux empty data frame df <- data.frame() plot1<-ggplot(df) + geom_point() + xlim(0, 10) + ylim(0, 10) plot2<-ggplot(df) + geom_point() + xlim(0, 10) + ylim(0, 10) ggplot() + coord_equal(xlim = c(0, 3.3), ylim = c(0, 1), expand = FALSE) + annotation_custom(ggplotGrob(plot1), xmin = 0, xmax = 1.5, ymin = 0, ymax = 1) + annotation_custom(ggplotGrob(plot2), xmin = 1.5, xmax = 3, ymin = 0, ymax = 1) + theme_void() Below is the final map, using the same methodology as the exmaple plot above. Using ggplot to arrange maps, allows for easy and quick plotting in one function of R code. ggplot() + coord_equal(xlim = c(0, 3.3), ylim = c(0, 1), expand = FALSE) + annotation_custom(ggplotGrob(gworld), xmin = 0, xmax = 2.3, ymin = 0, ymax = 1) + annotation_custom(ggplotGrob(ggulf), xmin = 2.3, xmax = 3.3, ymin = 0, ymax = 1) + theme_void() In the second approach, using cowplot::plot_grid to arrange ggplot figures, is quite versatile. Any ggplot figure can be arranged just like the figure above. There are many commands that allow for the map to have different placements, such as nrow=1 means that the figure will only occupy one row and multiple columns, and ncol=1 means the figure will be plotted on one column and multiple rows. The command rel_widths establishes the width of each map, meaning that the first map gworld will have a relative width of 2.3, and the map ggulf has the relative width of 1. library("cowplot") theme_set(theme_bw()) plot_grid(gworld, ggulf, nrow = 1, rel_widths = c(2.3, 1)) Some other commands can adjust the position of the figures such as adding align=v to align vertically, and align=h to align horiztonally. Note also the existence of get_legend (cowplot), and that the legend can be used as any object. This map can be save using,ggsave: ggsave("grid.pdf", width = 15, height = 5) Map insets For map insets directly on the background map, both solutions are viable (and one might prefer one or the other depending on relative or absolute coordinates). Map example using map of the 50 states of the US, including Alaska and Hawaii (note: not to scale for the latter), using reference projections for US maps. First map (continental states) use a 10/6 figure: usa <- subset(world, ADMIN == "United States of America") ## US National Atlas Equal Area (2163) ## http://spatialreference.org/ref/epsg/us-national-atlas-equal-area/ (mainland <- ggplot(data = usa) + geom_sf(fill = "cornsilk") + coord_sf(crs = st_crs(2163), xlim = c(-2500000, 2500000), ylim = c(-2300000, 730000))) Alaska map (note: datum = NA removes graticules and coordinates): ## Alaska: NAD83(NSRS2007) / Alaska Albers (3467) ## http://www.spatialreference.org/ref/epsg/3467/ (alaska <- ggplot(data = usa) + geom_sf(fill = "cornsilk") + coord_sf(crs = st_crs(3467), xlim = c(-2400000, 1600000), ylim = c(200000, 2500000), expand = FALSE, datum = NA)) Hawaii map: ## Hawaii: Old Hawaiian (4135) ## http://www.spatialreference.org/ref/epsg/4135/ (hawaii <- ggplot(data = usa) + geom_sf(fill = "cornsilk") + coord_sf(crs = st_crs(4135), xlim = c(-161, -154), ylim = c(18, 23), expand = FALSE, datum = NA)) Using ggdraw from cowplot (tricky to define exact positions; note the use of the ratios of the inset, combined with the ratio of the plot): (ratioAlaska <- (2500000 - 200000) / (1600000 - (-2400000))) ## [1] 0.575 (ratioHawaii <- (23 - 18) / (-154 - (-161))) ## [1] 0.7142857 ggdraw(mainland) + draw_plot(alaska, width = 0.26, height = 0.26 * 10/6 * ratioAlaska, x = 0.05, y = 0.05) + draw_plot(hawaii, width = 0.15, height = 0.15 * 10/6 * ratioHawaii, x = 0.3, y = 0.05) This plot can be saved using ggsave: ggsave("map-us-ggdraw.pdf", width = 10, height = 6) The same kind of plot can be created using grobs, with ggplotGrob, (note the use of xdiff/ydiff and arbitrary ratios): mainland + annotation_custom( grob = ggplotGrob(alaska), xmin = -2750000, xmax = -2750000 + (1600000 - (-2400000))/2.5, ymin = -2450000, ymax = -2450000 + (2500000 - 200000)/2.5 ) + annotation_custom( grob = ggplotGrob(hawaii), xmin = -1250000, xmax = -1250000 + (-154 - (-161))*120000, ymin = -2450000, ymax = -2450000 + (23 - 18)*120000 ) This plot can be saved using ggsave: ggsave("map-inset-grobs.pdf", width = 10, height = 6) The print command can also be used place multiple maps in one plotting area. To specify where each plot is displayed with the print function, the argument viewport needs to include the maximum width and height of each map, and the minimum x and y coordinates of where the maps are located in the plotting area. The argument just will make a position on how the secondary maps will be displayed. All maps are defaulted the same size, until the sizes are adjusted with width and height. vp <- viewport(width = 0.37, height = 0.10, x = 0.20, y =0.25, just = c("bottom")) vp1<- viewport(width = 0.37, height = 0.10, x = 0.35, y =0.25, just = c("bottom")) Theprint function uses the previous specifications that were listed in each plots’ respective viewport, with vp=. print(mainland) print(alaska, vp=vp) print(hawaii, vp=vp1) Several maps connected with arrows To bring about a more lively map arrangement, arrows can be used to bring the viewers eyes to specific areas in the plot. The next example will create a map with zoomed in areas, pointed to by arrows. Firstly, we will create our main map, and then our zoomed in areas. Site coordinates, same as Tutorial #1: sites <- st_as_sf(data.frame(longitude = c(-80.15, -80.1), latitude = c(26.5, 26.8)), coords = c("longitude", "latitude"), crs = 4326, agr = "constant") Mainlaind map of Florida, #1: (florida <- ggplot(data = world) + geom_sf(fill = "antiquewhite1") + geom_sf(data = sites, size = 4, shape = 23, fill = "darkred") + annotate(geom = "text", x = -85.5, y = 27.5, label = "Gulf of Mexico", color = "grey22", size = 4.5) + coord_sf(xlim = c(-87.35, -79.5), ylim = c(24.1, 30.8)) + xlab("Longitude")+ ylab("Latitude")+ theme(panel.grid.major = element_line(colour = gray(0.5), linetype = "dashed", size = 0.5), panel.background = element_rect(fill = "aliceblue"), panel.border = element_rect(fill = NA))) A map for site A is created by layering the map and points we created earlier. ggplot layers geom_sf objects and plot them spatially. (siteA <- ggplot(data = world) + geom_sf(fill = "antiquewhite1") + geom_sf(data = sites, size = 4, shape = 23, fill = "darkred") + coord_sf(xlim = c(-80.25, -79.95), ylim = c(26.65, 26.95), expand = FALSE) + annotate("text", x = -80.18, y = 26.92, label= "Site A", size = 6) + theme_void() + theme(panel.grid.major = element_line(colour = gray(0.5), linetype = "dashed", size = 0.5), panel.background = element_rect(fill = "aliceblue"), panel.border = element_rect(fill = NA))) A map for site B: (siteB <- ggplot(data = world) + geom_sf(fill = "antiquewhite1") + geom_sf(data = sites, size = 4, shape = 23, fill = "darkred") + coord_sf(xlim = c(-80.3, -80), ylim = c(26.35, 26.65), expand = FALSE) + annotate("text", x = -80.23, y = 26.62, label= "Site B", size = 6) + theme_void() + theme(panel.grid.major = element_line(colour = gray(0.5), linetype = "dashed", size = 0.5), panel.background = element_rect(fill = "aliceblue"), panel.border = element_rect(fill = NA))) Coordinates of the two arrows will need to be specified before plotting. The argumemnts x1, and x2 will plot the arrow line from a specific starting x-axis location,x1, and ending in a specific x-axis,x2. The same applies for y1 and y2, with the y-axis respectively: arrowA <- data.frame(x1 = 18.5, x2 = 23, y1 = 9.5, y2 = 14.5) arrowB <- data.frame(x1 = 18.5, x2 = 23, y1 = 8.5, y2 = 6.5) Final map using (ggplot only). The argument geom_segment, will be the coordinates created in the previous script, to plot line segments ending with an arrow using arrow=arrow(): ggplot() + coord_equal(xlim = c(0, 28), ylim = c(0, 20), expand = FALSE) + annotation_custom(ggplotGrob(florida), xmin = 0, xmax = 20, ymin = 0, ymax = 20) + annotation_custom(ggplotGrob(siteA), xmin = 20, xmax = 28, ymin = 11.25, ymax = 19) + annotation_custom(ggplotGrob(siteB), xmin = 20, xmax = 28, ymin = 2.5, ymax = 10.25) + geom_segment(aes(x = x1, y = y1, xend = x2, yend = y2), data = arrowA, arrow = arrow(), lineend = "round") + geom_segment(aes(x = x1, y = y1, xend = x2, yend = y2), data = arrowB, arrow = arrow(), lineend = "round") + theme_void() This plot can be saved using ggsave: ggsave("florida-sites.pdf", width = 10, height = 7) ggdraw could also be used for a similar result, with the argument draw_plot: ggdraw(xlim = c(0, 28), ylim = c(0, 20)) + draw_plot(florida, x = 0, y = 0, width = 20, height = 20) + draw_plot(siteA, x = 20, y = 11.25, width = 8, height = 8) + draw_plot(siteB, x = 20, y = 2.5, width = 8, height = 8) + geom_segment(aes(x = x1, y = y1, xend = x2, yend = y2), data = arrowA, arrow = arrow(), lineend = "round") + geom_segment(aes(x = x1, y = y1, xend = x2, yend = y2), data = arrowB, arrow = arrow(), lineend = "round") ### Drawing beautiful maps programmatically with R, sf and ggplot2 — Part 2: Layers Thu, 10/25/2018 - 02:00 view raw Rmd This tutorial is the second part in a series of three: In the previous part, we presented general concepts with a map with little information (country borders only). The modular approach of ggplot2 allows to successively add additional layers, for instance study sites or administrative delineations, as will be illustrated in this part. Getting started Many R packages are available from CRAN, the Comprehensive R Archive Network, which is the primary repository of R packages. The full list of packages necessary for this series of tutorials can be installed with: install.packages(c("cowplot", "googleway", "ggplot2", "ggrepel", "ggspatial", "libwgeom", "sf", "rworldmap", "rworldxtra")) We start by loading the basic packages necessary for all maps, i.e. ggplot2 and sf. We also suggest to use the classic dark-on-light theme for ggplot2 (theme_bw), which is more appropriate for maps: library("ggplot2") theme_set(theme_bw()) library("sf") The package rworldmap provides a map of countries of the entire world; a map with higher resolution is available in the package rworldxtra. We use the function getMap to extract the world map (the resolution can be set to "low", if preferred): library("rworldmap") library("rworldxtra") world <- getMap(resolution = "high") class(world) ## [1] "SpatialPolygonsDataFrame" ## attr(,"package") ## [1] "sp" The world map is available as a SpatialPolygonsDataFrame from the package sp; we thus convert it to a simple feature using st_as_sf from package sf: world <- st_as_sf(world) class(world) ## [1] "sf" "data.frame" Adding additional layers: an example with points and polygons Field sites (point data) We start by defining two study sites, according to their longitude and latitude, stored in a regular data.frame: (sites <- data.frame(longitude = c(-80.144005, -80.109), latitude = c(26.479005, 26.83))) ## longitude latitude ## 1 -80.14401 26.47901 ## 2 -80.10900 26.83000 The quickest way to add point coordinates is with the general-purpose function geom_point, which works on any X/Y coordinates, of regular data points (i.e. not geographic). As such, we can adjust all characteristics of points (e.g. color of the outline and the filling, shape, size, etc.), for all points, or using grouping from the data (i.e defining their “aesthetics”). In this example, we add the two points as diamonds (shape = 23), filled in dark red (fill = "darkred") and of bigger size (size = 4): ggplot(data = world) + geom_sf() + geom_point(data = sites, aes(x = longitude, y = latitude), size = 4, shape = 23, fill = "darkred") + coord_sf(xlim = c(-88, -78), ylim = c(24.5, 33), expand = FALSE) A better, more flexible alternative is to use the power of sf: Converting the data frame to a sf object allows to rely on sf to handle on the fly the coordinate system (both projection and extent), which can be very useful if the two objects (here world map, and sites) are not in the same projection. To achieve the same result, the projection (here WGS84, which is the CRS code #4326) has to be a priori defined in the sf object: (sites <- st_as_sf(sites, coords = c("longitude", "latitude"), crs = 4326, agr = "constant")) ## Simple feature collection with 2 features and 0 fields ## geometry type: POINT ## dimension: XY ## bbox: xmin: -80.14401 ymin: 26.479 xmax: -80.109 ymax: 26.83 ## epsg (SRID): 4326 ## proj4string: +proj=longlat +datum=WGS84 +no_defs ## geometry ## 1 POINT (-80.14401 26.47901) ## 2 POINT (-80.109 26.83) ggplot(data = world) + geom_sf() + geom_sf(data = sites, size = 4, shape = 23, fill = "darkred") + coord_sf(xlim = c(-88, -78), ylim = c(24.5, 33), expand = FALSE) Note that coord_sf has to be called after all geom_sf calls, as to supersede any former input. States (polygon data) It would be informative to add finer administrative information on top of the previous map, starting with state borders and names. The package maps (which is automatically installed and loaded with ggplot2) provides maps of the USA, with state and county borders, that can be retrieved and converted as sf objects: library("maps") states <- st_as_sf(map("state", plot = FALSE, fill = TRUE)) head(states) ## Simple feature collection with 6 features and 1 field ## geometry type: MULTIPOLYGON ## dimension: XY ## bbox: xmin: -124.3834 ymin: 30.24071 xmax: -71.78015 ymax: 42.04937 ## epsg (SRID): 4326 ## proj4string: +proj=longlat +datum=WGS84 +no_defs ## geometry ID ## 1 MULTIPOLYGON (((-87.46201 3... alabama ## 2 MULTIPOLYGON (((-114.6374 3... arizona ## 3 MULTIPOLYGON (((-94.05103 3... arkansas ## 4 MULTIPOLYGON (((-120.006 42... california ## 5 MULTIPOLYGON (((-102.0552 4... colorado ## 6 MULTIPOLYGON (((-73.49902 4... connecticut State names are part of this data, as the ID variable. A simple (but not necessarily optimal) way to add state name is to compute the centroid of each state polygon as the coordinates where to draw their names. Centroids are computed with the function st_centroid, their coordinates extracted with st_coordinates, both from the package sf, and attached to the state object: states <- cbind(states, st_coordinates(st_centroid(states))) Note the warning, which basically says that centroid coordinates using longitude/latitude data (i.e. WGS84) are not exact, which is perfectly fine for our drawing purposes. State names, which are not capitalized in the data from maps, can be changed to title case using the function toTitleCase from the package tools: library("tools") statesID <- toTitleCase(states$ID) head(states) ## Simple feature collection with 6 features and 3 fields ## geometry type: MULTIPOLYGON ## dimension: XY ## bbox: xmin: -124.3834 ymin: 30.24071 xmax: -71.78015 ymax: 42.04937 ## epsg (SRID): 4326 ## proj4string: +proj=longlat +datum=WGS84 +no_defs ## ID X Y geometry ## 1 Alabama -86.83042 32.80316 MULTIPOLYGON (((-87.46201 3... ## 2 Arizona -111.66786 34.30060 MULTIPOLYGON (((-114.6374 3... ## 3 Arkansas -92.44013 34.90418 MULTIPOLYGON (((-94.05103 3... ## 4 California -119.60154 37.26901 MULTIPOLYGON (((-120.006 42... ## 5 Colorado -105.55251 38.99797 MULTIPOLYGON (((-102.0552 4... ## 6 Connecticut -72.72598 41.62566 MULTIPOLYGON (((-73.49902 4... To continue adding to the map, state data is directly plotted as an additional sf layer using geom_sf. In addition, state names will be added using geom_text, declaring coordinates on the X-axis and Y-axis, as well as the label (from ID), and a relatively big font size. ggplot(data = world) + geom_sf() + geom_sf(data = states, fill = NA) + geom_text(data = states, aes(X, Y, label = ID), size = 5) + coord_sf(xlim = c(-88, -78), ylim = c(24.5, 33), expand = FALSE) We can move the state names slightly to be able to read better “South Carolina” and “Florida”. For this, we create a new variable nudge_y, which is -1 for all states (moved slightly South), 0.5 for Florida (moved slightly North), and -1.5 for South Carolina (moved further South): states$nudge_y <- -1 states$nudge_y[states$ID == "Florida"] <- 0.5 states$nudge_y[states$ID == "South Carolina"] <- -1.5

To improve readability, we also draw a rectangle behind the state name, using the function geom_label instead of geom_text, and plot the map again.

ggplot(data = world) + geom_sf() + geom_sf(data = states, fill = NA) + geom_label(data = states, aes(X, Y, label = ID), size = 5, fontface = "bold", nudge_y = states$nudge_y) + coord_sf(xlim = c(-88, -78), ylim = c(24.5, 33), expand = FALSE) Counties (polygon data) County data are also available from the package maps, and can be retrieved with the same approach as for state data. This time, only counties from Florida are retained, and we compute their area using st_area from the package sf: counties <- st_as_sf(map("county", plot = FALSE, fill = TRUE)) counties <- subset(counties, grepl("florida", counties$ID)) counties$area <- as.numeric(st_area(counties)) head(counties) ## Simple feature collection with 6 features and 2 fields ## geometry type: MULTIPOLYGON ## dimension: XY ## bbox: xmin: -85.98951 ymin: 25.94926 xmax: -80.08804 ymax: 30.57303 ## epsg (SRID): 4326 ## proj4string: +proj=longlat +datum=WGS84 +no_defs ## geometry ID area ## 292 MULTIPOLYGON (((-82.66062 2... florida,alachua 2498863359 ## 293 MULTIPOLYGON (((-82.04182 3... florida,baker 1542466064 ## 294 MULTIPOLYGON (((-85.40509 3... florida,bay 1946587533 ## 295 MULTIPOLYGON (((-82.4257 29... florida,bradford 818898090 ## 296 MULTIPOLYGON (((-80.94747 2... florida,brevard 2189682999 ## 297 MULTIPOLYGON (((-80.89018 2... florida,broward 3167386973 County lines can now be added in a very simple way, using a gray outline: ggplot(data = world) + geom_sf() + geom_sf(data = counties, fill = NA, color = gray(.5)) + coord_sf(xlim = c(-88, -78), ylim = c(24.5, 33), expand = FALSE) We can also fill in the county using their area to visually identify the largest counties. For this, we use the “viridis” colorblind-friendly palette, with some transparency: ggplot(data = world) + geom_sf() + geom_sf(data = counties, aes(fill = area)) + scale_fill_viridis_c(trans = "sqrt", alpha = .4) + coord_sf(xlim = c(-88, -78), ylim = c(24.5, 33), expand = FALSE) Cities (point data) To make a more complete map of Florida, main cities will be added to the map. We first prepare a data frame with the five largest cities in the state of Florida, and their geographic coordinates: flcities <- data.frame(state = rep("Florida", 5), city = c("Miami", "Tampa", "Orlando", "Jacksonville", "Sarasota"), lat = c(25.7616798, 27.950575, 28.5383355, 30.3321838, 27.3364347), lng = c(-80.1917902, -82.4571776, -81.3792365, -81.655651, -82.5306527)) Instead of looking up coordinates manually, the package googleway provides a function google_geocode, which allows to retrieve geographic coordinates for any address, using the Google Maps API. Unfortunately, this requires a valid Google API key (follow instructions here to get a key, which needs to include “Places” for geocoding). Once you have your API key, you can run the following code to automatically retrieve geographic coordinates of the five cities: library("googleway") key <- "put_your_google_api_key_here" # real key needed flcities <- data.frame(state = rep("Florida", 5), city = c("Miami", "Tampa", "Orlando", "Jacksonville", "Sarasota")) coords <- apply(flcities, 1, function(x) { google_geocode(address = paste(x["city"], x["state"], sep = ", "), key = key) }) flcities <- cbind(flcities, do.call(rbind, lapply(coords, geocode_coordinates))) We can now convert the data frame with coordinates to sf format: (flcities <- st_as_sf(flcities, coords = c("lng", "lat"), remove = FALSE, crs = 4326, agr = "constant")) ## Simple feature collection with 5 features and 4 fields ## Attribute-geometry relationship: 4 constant, 0 aggregate, 0 identity ## geometry type: POINT ## dimension: XY ## bbox: xmin: -82.53065 ymin: 25.76168 xmax: -80.19179 ymax: 30.33218 ## epsg (SRID): 4326 ## proj4string: +proj=longlat +datum=WGS84 +no_defs ## state city lat lng geometry ## 1 Florida Miami 25.76168 -80.19179 POINT (-80.19179 25.76168) ## 2 Florida Tampa 27.95058 -82.45718 POINT (-82.45718 27.95058) ## 3 Florida Orlando 28.53834 -81.37924 POINT (-81.37924 28.53834) ## 4 Florida Jacksonville 30.33218 -81.65565 POINT (-81.65565 30.33218) ## 5 Florida Sarasota 27.33643 -82.53065 POINT (-82.53065 27.33643) We add both city locations and names on the map: ggplot(data = world) + geom_sf() + geom_sf(data = counties, fill = NA, color = gray(.5)) + geom_sf(data = flcities) + geom_text(data = flcities, aes(x = lng, y = lat, label = city), size = 3.9, col = "black", fontface = "bold") + coord_sf(xlim = c(-88, -78), ylim = c(24.5, 33), expand = FALSE) This is not really satisfactory, as the names overlap on the points, and they are not easy to read on the grey background. The package ggrepel offers a very flexible approach to deal with label placement (with geom_text_repel and geom_label_repel), including automated movement of labels in case of overlap. We use it here to “nudge” the labels away from land into the see, and connect them to the city locations: library("ggrepel") ggplot(data = world) + geom_sf() + geom_sf(data = counties, fill = NA, color = gray(.5)) + geom_sf(data = flcities) + geom_text_repel(data = flcities, aes(x = lng, y = lat, label = city), fontface = "bold", nudge_x = c(1, -1.5, 2, 2, -1), nudge_y = c(0.25, -0.25, 0.5, 0.5, -0.5)) + coord_sf(xlim = c(-88, -78), ylim = c(24.5, 33), expand = FALSE) Final map For the final map, we put everything together, having a general background map based on the world map, with state and county delineations, state labels, main city names and locations, as well as a theme adjusted with titles, subtitles, axis labels, and a scale bar: library("ggspatial") ggplot(data = world) + geom_sf(fill = "antiquewhite1") + geom_sf(data = counties, aes(fill = area)) + geom_sf(data = states, fill = NA) + geom_sf(data = sites, size = 4, shape = 23, fill = "darkred") + geom_sf(data = flcities) + geom_text_repel(data = flcities, aes(x = lng, y = lat, label = city), fontface = "bold", nudge_x = c(1, -1.5, 2, 2, -1), nudge_y = c(0.25, -0.25, 0.5, 0.5, -0.5)) + geom_label(data = states, aes(X, Y, label = ID), size = 5, fontface = "bold", nudge_y = states$nudge_y) + scale_fill_viridis_c(trans = "sqrt", alpha = .4) + annotation_scale(location = "bl", width_hint = 0.4) + annotation_north_arrow(location = "bl", which_north = "true", pad_x = unit(0.75, "in"), pad_y = unit(0.5, "in"), style = north_arrow_fancy_orienteering) + coord_sf(xlim = c(-88, -78), ylim = c(24.5, 33), expand = FALSE) + xlab("Longitude") + ylab("Latitude") + ggtitle("Observation Sites", subtitle = "(2 sites in Palm Beach County, Florida)") + theme(panel.grid.major = element_line(color = gray(0.5), linetype = "dashed", size = 0.5), panel.background = element_rect(fill = "aliceblue"))

This example fully demonstrates that adding layers on ggplot2 is relatively straightforward, as long as the data is properly stored in an sf object. Adding additional layers would simply follow the same logic, with additional calls to geom_sf at the right place in the ggplot2 sequence.

### Drawing beautiful maps programmatically with R, sf and ggplot2 — Part 1: Basics

Thu, 10/25/2018 - 02:00

view raw Rmd

Beautiful maps in a beautiful world

Maps are used in a variety of fields to express data in an appealing and interpretive way. Data can be expressed into simplified patterns, and this data interpretation is generally lost if the data is only seen through a spread sheet. Maps can add vital context by incorporating many variables into an easy to read and applicable context. Maps are also very important in the information world because they can quickly allow the public to gain better insight so that they can stay informed. It’s critical to have maps be effective, which means creating maps that can be easily understood by a given audience. For instance, maps that need to be understood by children would be very different from maps intended to be shown to geographers.

Knowing what elements are required to enhance your data is key into making effective maps. Basic elements of a map that should be considered are polygon, points, lines, and text. Polygons, on a map, are closed shapes such as country borders. Lines are considered to be linear shapes that are not filled with any aspect, such as highways, streams, or roads. Finally, points are used to specify specific positions, such as city or landmark locations. With that in mind, one need to think about what elements are required in the map to really make an impact, and convey the information for the intended audience. Layout and formatting are the second critical aspect to enhance data visually. The arrangement of these map elements and how they will be drawn can be adjusted to make a maximum impact.

A solution using R and its ecosystem of packages

Current solutions for creating maps usually involves GIS software, such as ArcGIS, QGIS, eSpatial, etc., which allow to visually prepare a map, in the same approach as one would prepare a poster or a document layout. On the other hand, R, a free and open-source software development environment (IDE) that is used for computing statistical data and graphic in a programmable language, has developed advanced spatial capabilities over the years, and can be used to draw maps programmatically.

R is a powerful and flexible tool. R can be used from calculating data sets to creating graphs and maps with the same data set. R is also free, which makes it easily accessible to anyone. Some other advantages of using R is that it has an interactive language, data structures, graphics availability, a developed community, and the advantage of adding more functionalities through an entire ecosystem of packages. R is a scriptable language that allows the user to write out a code in which it will execute the commands specified.

Using R to create maps brings these benefits to mapping. Elements of a map can be added or removed with ease — R code can be tweaked to make major enhancements with a stroke of a key. It is also easy to reproduce the same maps for different data sets. It is important to be able to script the elements of a map, so that it can be re-used and interpreted by any user. In essence, comparing typical GIS software and R for drawing maps is similar to comparing word processing software (e.g. Microsoft Office or LibreOffice) and a programmatic typesetting system such as LaTeX, in that typical GIS software implement a WYSIWIG approach (“What You See Is What You Get”), while R implements a WYSIWYM approach (“What You See Is What You Mean”).

The package ggplot2 implements the grammar of graphics in R, as a way to create code that make sense to the user: The grammar of graphics is a term used to breaks up graphs into semantic components, such as geometries and layers. Practically speaking, it allows (and forces!) the user to focus on graph elements at a higher level of abstraction, and how the data must be structured to achieve the expected outcome. While ggplot2 is becoming the de facto standard for R graphs, it does not handle spatial data specifically. The current state-of-the-art of spatial objects in R relies on Spatial classes defined in the package sp, but the new package sf has recently implemented the “simple feature” standard, and is steadily taking over sp. Recently, the package ggplot2 has allowed the use of simple features from the package sf as layers in a graph1. The combination of ggplot2 and sf therefore enables to programmatically create maps, using the grammar of graphics, just as informative or visually appealing as traditional GIS software.

Drawing beautiful maps programmatically with R, sf and ggplot2

This tutorial is the first part in a series of three:

In this part, we will cover the fundamentals of mapping using ggplot2 associated to sf, and presents the basics elements and parameters we can play with to prepare a map.

Getting started

Many R packages are available from CRAN, the Comprehensive R Archive Network, which is the primary repository of R packages. The full list of packages necessary for this series of tutorials can be installed with:

install.packages(c("cowplot", "googleway", "ggplot2", "ggrepel", "ggspatial", "libwgeom", "sf", "rworldmap", "rworldxtra"))

We start by loading the basic packages necessary for all maps, i.e. ggplot2 and sf. We also suggest to use the classic dark-on-light theme for ggplot2 (theme_bw), which is appropriate for maps:

library("ggplot2") theme_set(theme_bw()) library("sf")

The package rworldmap provides a map of countries of the entire world; a map with higher resolution is available in the package rworldxtra. We use the function getMap to extract the world map (the resolution can be set to "low", if preferred):

library("rworldmap") library("rworldxtra") world <- getMap(resolution = "high") class(world) ## [1] "SpatialPolygonsDataFrame" ## attr(,"package") ## [1] "sp"

The world map is available as a SpatialPolygonsDataFrame from the package sp; we thus convert it to a simple feature using st_as_sf from package sf:

world <- st_as_sf(world) class(world) ## [1] "sf" "data.frame" General concepts illustrated with the world map Data and basic plot (ggplot and geom_sf)

First, let us start with creating a base map of the world using ggplot2. This base map will then be extended with different map elements, as well as zoomed in to an area of interest. We can check that the world map was properly retrieved and converted into an sf object, and plot it with ggplot2:

ggplot(data = world) + geom_sf()

This call nicely introduces the structure of a ggplot call: The first part ggplot(data = world) initiates the ggplot graph, and indicates that the main data is stored in the world object. The line ends up with a + sign, which indicates that the call is not complete yet, and each subsequent line correspond to another layer or scale. In this case, we use the geom_sf function, which simply adds a geometry stored in a sf object. By default, all geometry functions use the main data defined in ggplot(), but we will see later how to provide additional data.

Note that layers are added one at a time in a ggplot call, so the order of each layer is very important. All data will have to be in an sf format to be used by ggplot2; data in other formats (e.g. classes from sp) will be manually converted to sf classes if necessary.

Title, subtitle, and axis labels (ggtitle, xlab, ylab)

A title and a subtitle can be added to the map using the function ggtitle, passing any valid character string (e.g. with quotation marks) as arguments. Axis names are absent by default on a map, but can be changed to something more suitable (e.g. “Longitude” and “Latitude”), depending on the map:

ggplot(data = world) + geom_sf() + xlab("Longitude") + ylab("Latitude") + ggtitle("World map", subtitle = paste0("(", length(unique(world\$NAME)), " countries)"))

Map color (geom_sf)

In many ways, sf geometries are no different than regular geometries, and can be displayed with the same level of control on their attributes. Here is an example with the polygons of the countries filled with a green color (argument fill), using black for the outline of the countries (argument color):

ggplot(data = world) + geom_sf(color = "black", fill = "lightgreen")

The package ggplot2 allows the use of more complex color schemes, such as a gradient on one variable of the data. Here is another example that shows the population of each country. In this example, we use the “viridis” colorblind-friendly palette for the color gradient (with option = "plasma" for the plasma variant), using the square root of the population (which is stored in the variable POP_EST of the world object):

ggplot(data = world) + geom_sf(aes(fill = POP_EST)) + scale_fill_viridis_c(option = "plasma", trans = "sqrt")

Projection and extent (coord_sf)

The function coord_sf allows to deal with the coordinate system, which includes both projection and extent of the map. By default, the map will use the coordinate system of the first layer that defines one (i.e. scanned in the order provided), or if none, fall back on WGS84 (latitude/longitude, the reference system used in GPS). Using the argument crs, it is possible to override this setting, and project on the fly to any projection. This can be achieved using any valid PROJ4 string (here, the European-centric ETRS89 Lambert Azimuthal Equal-Area projection):

ggplot(data = world) + geom_sf() + coord_sf(crs = "+proj=laea +lat_0=52 +lon_0=10 +x_0=4321000 +y_0=3210000 +ellps=GRS80 +units=m +no_defs ")

Spatial Reference System Identifier (SRID) or an European Petroleum Survey Group (EPSG) code are available for the projection of interest, they can be used directly instead of the full PROJ4 string. The two following calls are equivalent for the ETRS89 Lambert Azimuthal Equal-Area projection, which is EPSG code 3035:

ggplot(data = world) + geom_sf() + coord_sf(crs = "+init=epsg:3035") ggplot(data = world) + geom_sf() + coord_sf(crs = st_crs(3035))

The extent of the map can also be set in coord_sf, in practice allowing to “zoom” in the area of interest, provided by limits on the x-axis (xlim), and on the y-axis (ylim). Note that the limits are automatically expanded by a fraction to ensure that data and axes don’t overlap; it can also be turned off to exactly match the limits provided with expand = FALSE:

ggplot(data = world) + geom_sf() + coord_sf(xlim = c(-102.15, -74.12), ylim = c(7.65, 33.97), expand = FALSE)

Scale bar and North arrow (package ggspatial)

Several packages are available to create a scale bar on a map (e.g. prettymapr, vcd, ggsn, or legendMap). We introduce here the package ggspatial, which provides easy-to-use functions…

scale_bar that allows to add simultaneously the north symbol and a scale bar into the ggplot map. Five arguments need to be set manually: lon, lat, distance_lon, distance_lat, and distance_legend. The location of the scale bar has to be specified in longitude/latitude in the lon and lat arguments. The shaded distance inside the scale bar is controlled by the distance_lon argument. while its width is determined by distance_lat. Additionally, it is possible to change the font size for the legend of the scale bar (argument legend_size, which defaults to 3). The North arrow behind the “N” north symbol can also be adjusted for its length (arrow_length), its distance to the scale (arrow_distance), or the size the N north symbol itself (arrow_north_size, which defaults to 6). Note that all distances (distance_lon, distance_lat, distance_legend, arrow_length, arrow_distance) are set to "km" by default in distance_unit; they can also be set to nautical miles with “nm”, or miles with “mi”.

library("ggspatial") ggplot(data = world) + geom_sf() + annotation_scale(location = "bl", width_hint = 0.5) + annotation_north_arrow(location = "bl", which_north = "true", pad_x = unit(0.75, "in"), pad_y = unit(0.5, "in"), style = north_arrow_fancy_orienteering) + coord_sf(xlim = c(-102.15, -74.12), ylim = c(7.65, 33.97)) ## Scale on map varies by more than 10%, scale bar may be inaccurate

Note the warning of the inaccurate scale bar: since the map use unprojected data in longitude/latitude (WGS84) on an equidistant cylindrical projection (all meridians being parallel), length in (kilo)meters on the map directly depends mathematically on the degree of latitude. Plots of small regions or projected data will often allow for more accurate scale bars.

Country names and other names (geom_text and annotate)

The world data set already contains country names and the coordinates of the centroid of each country (among more information). We can use this information to plot country names, using world as a regular data.frame in ggplot2. We first check the country name information:

head(world[, c("NAME", "LON", "LAT")]) ## Simple feature collection with 6 features and 3 fields ## geometry type: MULTIPOLYGON ## dimension: XY ## bbox: xmin: -70.06164 ymin: -18.0314 xmax: 74.89231 ymax: 60.48075 ## epsg (SRID): 4326 ## proj4string: +proj=longlat +datum=WGS84 +no_defs ## NAME LON LAT geometry ## 1 Aruba -69.97345 12.51678 MULTIPOLYGON (((-69.87609 1... ## 2 Afghanistan 66.00845 33.83627 MULTIPOLYGON (((71.02458 38... ## 3 Angola 17.56405 -12.32934 MULTIPOLYGON (((13.98233 -5... ## 4 Anguilla -63.05667 18.22432 MULTIPOLYGON (((-63.0369 18... ## 5 Albania 20.05399 41.14258 MULTIPOLYGON (((20.06496 42... ## 6 Aland 19.94429 60.22851 MULTIPOLYGON (((19.91892 60...

The function geom_text can be used to add a layer of text to a map using geographic coordinates. The function requires the data needed to enter the country names, which is the same data as the world map. Again, we have a very flexible control to adjust the text at will on many aspects:

• The size (argument size);
• The alignment, which is centered by default on the coordinates provided. The text can be adjusted horizontally or vertically using the arguments hjust and vjust, which can either be a number between 0 (right/bottom) and 1 (top/left) or a character (“left”, “middle”, “right”, “bottom”, “center”, “top”). The text can also be offset horizontally or vertically with the argument nudge_x and nudge_y;
• The font of the text, for instance its color (argument color) or the type of font (fontface);
• The overlap of labels, using the argument check_overlap, which removes overlapping text. Alternatively, when there is a lot of overlapping labels, the package ggrepel provides a geom_text_repel function that moves label around so that they do not overlap.

Additionally, the annotate function can be used to add a single character string at a specific location, as demonstrated here to add the Gulf of Mexico:

ggplot(data = world) + geom_sf() + geom_text(aes(LON, LAT, label = NAME), size = 4, hjust = "left", color = "darkblue", fontface = "bold", check_overlap = TRUE) + annotate(geom = "text", x = -90, y = 26, label = "Gulf of Mexico", fontface = "italic", color = "grey22", size = 6) + coord_sf(xlim = c(-102.15, -74.12), ylim = c(7.65, 33.97), expand = FALSE)

Final map

Now to make the final touches, the theme of the map can be edited to make it more appealing. We suggested the use of theme_bw for a standard theme, but there are many other themes that can be selected from (see for instance ?ggtheme in ggplot2, or the package ggthemes which provide several useful themes). Moreover, specific theme elements can be tweaked to get to the final outcome:

• Position of the legend: Although not used in this example, the argument legend.position allows to automatically place the legend at a specific location (e.g. "topright", "bottomleft", etc.);
• Grid lines (graticules) on the map: by using panel.grid.major and panel.grid.minor, grid lines can be adjusted. Here we set them to a gray color and dashed line type to clearly distinguish them from country borders lines;
• Map background: the argument panel.background can be used to color the background, which is the ocean essentially, with a light blue;
• Many more elements of a theme can be adjusted, which would be too long to cover here. We refer the reader to the documentation for the function theme.
ggplot(data = world) + geom_sf(fill = "antiquewhite1") + geom_text(aes(LON, LAT, label = NAME), size = 4, hjust = "left", color = "darkblue", fontface = "bold", check_overlap = TRUE) + annotate(geom = "text", x = -90, y = 26, label = "Gulf of Mexico", fontface = "italic", color = "grey22", size = 6) + annotation_scale(location = "bl", width_hint = 0.5) + annotation_north_arrow(location = "bl", which_north = "true", pad_x = unit(0.75, "in"), pad_y = unit(0.5, "in"), style = north_arrow_fancy_orienteering) + coord_sf(xlim = c(-102.15, -74.12), ylim = c(7.65, 33.97), expand = FALSE) + xlab("Longitude") + ylab("Latitude") + ggtitle("Map of the Gulf of Mexico and the Caribbean Sea") + theme(panel.grid.major = element_line(color = gray(.5), linetype = "dashed", size = 0.5), panel.background = element_rect(fill = "aliceblue"))

Saving the map with ggsave

The final map now ready, it is very easy to save it using ggsave. This function allows a graphic (typically the last plot displayed) to be saved in a variety of formats, including the most common PNG (raster bitmap) and PDF (vector graphics), with control over the size and resolution of the outcome. For instance here, we save a PDF version of the map, which keeps the best quality, and a PNG version of it for web purposes:

ggsave("map.pdf") ggsave("map_web.png", width = 6, height = 6, dpi = "screen")
1. Note: Support of sf objects is available since version 3.0.0 of ggplot2, recently released on CRAN.

### RSAGA 1.0.0

Sat, 10/20/2018 - 02:00

RSAGA 1.0.0 has been released on CRAN. The RSAGA package provides an interface between R and the open-source geographic information system SAGA, which offers a variety of geoscientific methods to analyse spatial data. SAGA GIS is supported on Windows, Linux, Mac OS and FreeBSD and is available from sourceforge.

After a long break in the development, RSAGA now supports SAGA GIS 2.3 LTS to 6.2.0. The main issue with the maintenance of RSAGA lies in the changing names of the parameters used by the SAGA command line program. These parameters are essential to provide the user with wrapper functions like rsaga.slope.asp.curv and rsaga.wetness.index. We were able to update all parameter names, while keeping the support for older SAGA versions.

RSAGA is able to find SAGA installations on Windows, Linux and Mac OS automatically again with the build-in function rsaga.env. For new users we would like to refer to the updated vignette.

Further development will go into the integration of Travis CI to test new SAGA versions automatically. We would like to reduce the effort of maintaining the list of parameter names. A solution could be the collaboration with the recently started Rsagacmd project, which uses the html help files of SAGA GIS to get the parameter names.

The development of RSAGA is now integrated into the r-spatial.org community. If you find any bugs please report them at our new development repository r-spatial/RSAGA. We are happy about every project that involves RSAGA. Please help us with bug reports, new ideas and feedback. Stay tuned for further updates.

### Quantities for R – Ready for a CRAN release

Fri, 08/31/2018 - 02:00

### RSAGA 1.0.0

Mon, 08/20/2018 - 02:00

### mapedit and leaflet.js &gt; 1.0

Sun, 07/15/2018 - 02:00

### mapedit and leaflet.js &gt; 1.0

Sun, 07/15/2018 - 02:00

### Data wrangling operations with quantities

Wed, 06/27/2018 - 02:00

### Data wrangling operations with quantities

Wed, 06/27/2018 - 02:00

### RSAGA 1.0.0

Wed, 06/20/2018 - 02:00

### Geocomputation with R: workshop at eRum

Thu, 05/31/2018 - 02:00

### Geocomputation with R: workshop at eRum

Thu, 05/31/2018 - 02:00

### RSAGA 1.0.0

Sun, 05/20/2018 - 02:00