To access all pages and datasets consider registering first. Already registered? Login here. Forgot your password?

Change Font Size

Change Screen

Change Profile

Change Layouts

Change Menu Styles



Stay informed on our latest news!

Syndicate content

GEOSTAT tutorials

GEOSTAT software

Software iconList of FOSS software used in this course and installation instructions. Follow these instructions to prepare and customize the software before the beginning of the course.

Literature used

ASDAR bookList of books / lecture notes used in this course. See also: CRAN Task View: Analysis of Spatial Data.


« February 2017 »

R spatial

Syndicate content
R Spatial software blogs and ideas
Updated: 38 min 54 sec ago

mapedit - interactively edit spatial data in R

29 January 2017 - 7:00pm

[view raw Rmd]

The R ecosystem offers a powerful set of packages for geospatial analysis. For a comprehensive list see the CRAN Task View: Analysis of Spatial Data. Yet, many geospatial workflows require interactivity for smooth uninterrupted completion. With new tools, such as htmlwidgets, shiny, and crosstalk, we can now inject this useful interactivity without leaving the R environment. In the first phase of the mapedit project, we have focused on experimenting and creating proof of concepts for the following three objectives:

  1. drawing, editing, and deleting features,

  2. selecting and querying of features and map regions,

  3. editing attributes.

Install mapedit

To run the code in the following discussion, please install with devtools::install_github. Please be aware that the current functionality is strictly a proof of concept, and the API will change rapidly and dramatically.

devtools::install_github("bhaskarvk/leaflet.extras") devtools::install_github("r-spatial/mapedit") Drawing, Editing, Deleting Features

We would like to set up an easy process for CRUD (create, read, update, and delete) of map features. The function edit_map demonstrates a first step toward this goal.

Proof of Concept 1 | Draw on Blank Map

To see how we might add some features, let’s start with a blank map, and then feel free to draw, edit, and delete with the Leaflet.Draw toolbar on the map. Once finished drawing simply press “Done”.

library(leaflet) library(mapedit) what_we_created <- leaflet() %>% addTiles() %>% edit_map()

edit_map returns a list with drawn, edited, deleted, and finished features as GeoJSON. In this case, if we would like to see our finished creation we can focus on what_we_created$finished. Since this is GeoJSON, the easiest way to see what we just created will be to use the addGeoJSON function from leaflet. This works well with polylines, polygons, rectangles, and points, but circles will be treated as points without some additional code. In future versions of the API it is likely that mapedit will return simple features gemometries rather than geojson by default.

leaflet() %>% addTiles() %>% addGeoJSON(what_we_created$finished) Proof of Concept 2 | Edit and Delete Existing Features

As an extension of the first proof of concept, we might like to edit and/or delete existing features. Let’s play Donald Trump for this exercise and use the border between Mexico and the United States for California and Arizona. For the sake of the example, let’s use a simplified polyline as our border. As we have promised we want to build a wall, but if we could just move the border a little in some places, we might be able to ease construction.

library(sf) # simplified border for purpose of exercise border <- st_as_sfc( "LINESTRING(-109.050197582692 31.3535554844322, -109.050197582692 31.3535554844322, -111.071681957692 31.3723176640684, -111.071681957692 31.3723176640684, -114.807033520192 32.509681296831, -114.807033520192 32.509681296831, -114.741115551442 32.750242384668, -114.741115551442 32.750242384668, -117.158107738942 32.5652527715121, -117.158107738942 32.5652527715121)" ) %>% st_set_crs(4326) # plot quickly for visual inspection plot(border)

Since we are Trump, we can do what we want, so let’s edit the line to our liking. We will use mapview for our interactive map since it by default gives us an OpenTopoMap layer and the develop branch includes preliminary simple features support. With our new border and fence, we will avoid the difficult mountains and get a little extra beachfront.

# use develop branch of mapview with simple features support # devtools::install_github("environmentalinformatics-marburg/mapview@develop") library(mapview) new_borders <- mapview(border)@map %>% edit_map("border")

Now, we can quickly inspect our new borders and then send the coordinates to the wall construction company.

leaflet() %>% addTiles() %>% fitBounds(-120, 35, -104, 25) %>% addGeoJSON(new_borders$drawn)


If you played enough with the border example, you might notice a couple of glitches and missing functionality. This is a good time for a reminder that this is alpha and intended as a proof of concept. Please provide feedback, so that we can insure a quality final product. In this case, the older version of Leaflet.Draw in RStudio Viewer has some bugs, so clicking an existing point creates a new one rather than allowing editing of that point. Also, the returned list from edit_map has no knowledge of the provided features.

Selecting Regions

The newest version of leaflet provides crosstalk support, but support is currently limited to addCircleMarkers. This functionality is enhanced by the sf use of list columns and integration with dplyr verbs. Here is a quick example with the breweries91 data from mapview.

library(crosstalk) library(mapview) library(sf) library(shiny) library(dplyr) # convert breweries91 from mapview into simple features # and add a Century column that we will use for selection brew_sf <- st_as_sf(breweries91) %>% mutate(century = floor(founded/100)*100) %>% filter(! %>% mutate(id=1:n()) pts <- SharedData$new(brew_sf, key = ~id, group = "grp1") ui <- fluidPage( fluidRow( column(4, filter_slider(id="filterselect", label="Century Founded", sharedData=pts, column=~century, step=50)), column(6, leafletOutput("leaflet1")) ), h4("Selected points"), verbatimTextOutput("selectedpoints") ) server <- function(input, output, session) { # unfortunatly create SharedData again for scope pts <- SharedData$new(brew_sf, key = ~id, group = "grp1") lf <- leaflet(pts) %>% addTiles() %>% addMarkers() not_rendered <- TRUE # hack to only draw leaflet once output$leaflet1 <- renderLeaflet({ if(req(not_rendered,cancelOutput=TRUE)) { not_rendered <- FALSE lf } }) output$selectedpoints <- renderPrint({ df <- pts$data(withSelection = TRUE) cat(nrow(df), "observation(s) selected\n\n") str(dplyr::glimpse(df)) }) } shinyApp(ui, server)

With mapedit, we would like to enhance the geospatial crosstalk integration to extend beyond leaflet::addCircleMarkers. In addition, we would like to provide an interactive interface to the geometric operations of sf, such as st_intersects(), st_difference(), and st_contains().

Proof of Concept 3

As a select/query proof of concept, assume we want to interactively select some US states for additional analysis. We will build off Bhaskar Karambelkar’s leaflet projection example using Bob Rudis albersusa package.

# use @bhaskarvk USA Albers with leaflet code # #devtools::install_github("hrbrmstr/albersusa") library(albersusa) library(sf) library(leaflet) library(mapedit) spdf <- usa_composite() %>% st_as_sf() pal <- colorNumeric( palette = "Blues", domain = spdf$pop_2014 ) bounds <- c(-125, 24 ,-75, 45) (lf <- leaflet( options= leafletOptions( worldCopyJump = FALSE, crs=leafletCRS( crsClass="L.Proj.CRS", code='EPSG:2163', proj4def='+proj=laea +lat_0=45 +lon_0=-100 +x_0=0 +y_0=0 +a=6370997 +b=6370997 +units=m +no_defs', resolutions = c(65536, 32768, 16384, 8192, 4096, 2048,1024, 512, 256, 128) ))) %>% fitBounds(bounds[1], bounds[2], bounds[3], bounds[4]) %>% setMaxBounds(bounds[1], bounds[2], bounds[3], bounds[4]) %>% addPolygons( data=spdf, weight = 1, color = "#000000", # adding group necessary for identification group = ~iso_3166_2, fillColor=~pal(pop_2014), fillOpacity=0.7, label=~stringr::str_c(name,' ', format(pop_2014, big.mark=",")), labelOptions= labelOptions(direction = 'auto')#, #highlightOptions = highlightOptions( # color='#00ff00', bringToFront = TRUE, sendToBack = TRUE) ) ) # test out select_map with albers example select_map( lf, style_false = list(weight = 1), style_true = list(weight = 4) )

The select_map() function will return a data.frame with an id/group column and a selected column. select_map() will work with nearly all leaflet overlays and offers the ability to customize the styling of selected and unselected features.

Editing Attributes

A common task in geospatial analysis involves editing or adding feature attributes. While much of this can be accomplished in the R console, an interactive UI on a reference map can often help perform this task. Mapbox’s provides a good reference point for some of the features we would like to provide in mapedit.

Proof of Concept 4

As a proof of concept, we made a Shiny app that thinly wraps a slightly modified Currently, we will have to pretend that there is a mechanism to load R feature data onto the map, since this functionality does not yet exist.

library(shiny) edited_features <- runGitHub( "", "timelyportfolio", ref="shiny" )


mapedit hopes to add useful interactivity to your geospatial workflows by leveraging powerful new functionality in R with the interactivity of HTML, JavaScript, and CSS. mapedit will be better with your feedback, requests, bug reports, use cases, and participation. We will report on progress periodically with blog posts on this site, and we will develop openly on the mapedit Github repo.

sf - plot, graticule, transform, units, cast, is

11 January 2017 - 7:00pm

[view raw Rmd]

This year began with the R Consortium blog on simple features:

#rstats A new post by Edzer Pebesma reviews the status of the R Consortium’s Simple Features project:

— Joseph Rickert (@RStudioJoe) January 3, 2017

This blog post describes changes of sf 0.2-8 and upcoming 0.2-9, compared to 0.2-7, in more detail.

Direct linking to Proj.4

Since 0.2-8, sf links directly to the Proj.4 library:

library(sf) ## Linking to GEOS 3.5.0, GDAL 2.1.0, proj.4 4.9.2

before that, it would use the projection interface of GDAL, which uses Proj.4, but exposes only parts of it. The main reason for switching to Proj.4 is the ability for stronger error checking. For instance, where GDAL would interpret any unrecognized field for +datum as WGS84:

# sf 0.2-7: > st_crs("+proj=longlat +datum=NAD26") $epsg [1] NA $proj4string [1] "+proj=longlat +ellps=WGS84 +no_defs" attr(,"class") [1] "crs"

Now, with sf 0.2-8 we get a proper error in case of an unrecognized +datum field:

t = try(st_crs("+proj=longlat +datum=NAD26")) attr(t, "condition") ## <simpleError in make_crs(x): invalid crs: +proj=longlat +datum=NAD26, reason: unknown elliptical parameter name> plotting

The default plot method for sf objects (simple features with attributes, or data.frames with a simple feature geometry list-column) now plots the set of maps, one for each attribute, with automatic color scales:

nc = st_read(system.file("gpkg/nc.gpkg", package="sf"), quiet = TRUE) plot(nc)

well, that is all there is, basically. For plotting a single map, select the appropriate attribute


or only the geometry:



Package sf gained a function st_graticule to generate graticules, grids formed by lines with constant longitude or latitude. Suppose we want to project nc to the state plane, and plot it with a longitude latitude graticule in NAD27 (the original datum of nc):

nc_sp = st_transform(nc["SID79"], 32119) # NC state plane, m plot(nc_sp, graticule = st_crs(nc), axes = TRUE)

The underlying function, st_graticule, can be used directly to generate a simple object with graticules, but is rather meant to be used by plotting functions that benefit from a graticule in the background, such as plot or ggplot. The function provides the end points of graticules and the angle at which they end; an example for using Lambert equal area on the USA is found in the help page of st_graticule:

The default plotting method for simple features with longitude/latitude coordinates is the equirectangular projection, (also called geographic projection, or equidistant cylindrical (eqc) projection) which linearly maps longitude and latitude into \(x\) and \(y\), transforming \(y\) such that in the center of the map 1 km easting equals 1 km northing. This is also the default for sp::plot, sp::spplot and ggplot2::coord_quickmap. The official Proj.4 transformation for this is found here.

We can obtain e.g. a plate carrée projection (with one degree latitude equaling one degree longitude) with

caree = st_crs("+proj=eqc") plot(st_transform(nc[1], caree), graticule = st_crs(nc), axes=TRUE, lon = -84:-76)

and we see indeed that the lon/lat grid is formed of squares.

The usual R plot for nc obtained by

plot(nc[1], graticule = st_crs(nc), axes = TRUE)

corrects for latitude. The equivalent, officially projected map is obtained by using the eqc projection with the correct latitude:

mean(st_bbox(nc)[c(2,4)]) ## [1] 35.23582 eqc = st_crs("+proj=eqc +lat_ts=35.24") plot(st_transform(nc[1], eqc), graticule = st_crs(nc), axes=TRUE)

so that in the center of these (identical) maps, 1 km east equals 1 km north.

geosphere and units support

sf now uses functions in package geosphere to compute distances or areas on the sphere. This is only possible for points and not for arbitrary feature geometries:

centr = st_centroid(nc) ## Warning in st_centroid.sfc(st_geometry(x)): st_centroid does not give ## correct centroids for longitude/latitude data st_distance(centr[c(1,10)])[1,2] ## 34093.21 m

As a comparison, we can compute distances in two similar projections, each having a different measurement unit:

centr.sp = st_transform(centr, 32119) # NC state plane, m (m <- st_distance(centr.sp[c(1,10)])[1,2]) ## 34097.54 m centr.ft = st_transform(centr, 2264) # NC state plane, US feet (ft <- st_distance(centr.ft[c(1,10)])[1,2]) ## 111868.3 US_survey_foot

and we see that the units are reported, by using package units. To verify that the distances are equivalent, we can compute

ft/m ## 1 1

which does automatic unit conversion before computing the ratio. (Here, 1 1 should be read as one, unitless (with unit 1)).

For spherical distances, sf uses geosphere::distGeo. It passes on the parameters of the datum, as can be seen from

st_distance(centr[c(1,10)])[1,2] # NAD27 ## 34093.21 m st_distance(st_transform(centr, 4326)[c(1,10)])[1,2] # WGS84 ## 34094.28 m

Other measures come with units too, e.g. st_area

st_area(nc[1:5,]) ## Units: m^2 ## [1] 1137388604 611077263 1423489919 694546292 1520740530

units vectors can be coerced to numeric by

as.numeric(st_area(nc[1:5,])) ## [1] 1137388604 611077263 1423489919 694546292 1520740530 type casting

With help from Mike Sumner and Etienne Racine, we managed to get a working st_cast, which helps converting one geometry in another.

casting individual geometries (sfg)

Casting individual geometries will close polygons when needed:

st_point(c(0,1)) %>% st_cast("MULTIPOINT") ## MULTIPOINT(0 1) st_linestring(rbind(c(0,1), c(5,6))) %>% st_cast("MULTILINESTRING") ## MULTILINESTRING((0 1, 5 6)) st_linestring(rbind(c(0,0), c(1,0), c(1,1))) %>% st_cast("POLYGON") ## POLYGON((0 0, 1 0, 1 1, 0 0))

and will warn on loss of information:

st_linestring(rbind(c(0,1), c(5,6))) %>% st_cast("POINT") ## Warning in st_cast.LINESTRING(., "POINT"): point from first coordinate only ## POINT(0 1) st_multilinestring(list(matrix(1:4,2), matrix(1:6,,2))) %>% st_cast("LINESTRING") ## Warning in st_cast.MULTILINESTRING(., "LINESTRING"): keeping first ## linestring only ## LINESTRING(1 3, 2 4) casting sets of geometries (sfc)

Casting sfc objects can group or ungroup geometries:

# group: st_sfc(st_point(0:1), st_point(2:3), st_point(4:5)) %>% st_cast("MULTIPOINT", ids = c(1,1,2)) ## Geometry set for 2 features ## geometry type: MULTIPOINT ## dimension: XY ## bbox: xmin: 0 ymin: 1 xmax: 4 ymax: 5 ## epsg (SRID): NA ## proj4string: NA ## MULTIPOINT(0 1, 2 3) ## MULTIPOINT(4 5) # ungroup: st_sfc(st_multipoint(matrix(1:4,,2))) %>% st_cast("POINT") ## Geometry set for 2 features ## geometry type: POINT ## dimension: XY ## bbox: xmin: 1 ymin: 3 xmax: 2 ymax: 4 ## epsg (SRID): NA ## proj4string: NA ## POINT(1 3) ## POINT(2 4)

st_cast with no to argument will convert mixes of GEOM and MULTIGEOM to MULTIGEOM, where GEOM is POINT, LINESTRING or POLYGON, e.g.

st_sfc( st_multilinestring(list(matrix(5:8,,2))), st_linestring(matrix(1:4,2)) ) %>% st_cast() ## Geometry set for 2 features ## geometry type: MULTILINESTRING ## dimension: XY ## bbox: xmin: 1 ymin: 3 xmax: 6 ymax: 8 ## epsg (SRID): NA ## proj4string: NA ## MULTILINESTRING((5 7, 6 8)) ## MULTILINESTRING((1 3, 2 4))

or unpack geometry collections:

x <- st_sfc( st_multilinestring(list(matrix(5:8,,2))), st_point(c(2,3)) ) %>% st_cast("GEOMETRYCOLLECTION") x ## Geometry set for 2 features ## geometry type: GEOMETRYCOLLECTION ## dimension: XY ## bbox: xmin: 2 ymin: 3 xmax: 6 ymax: 8 ## epsg (SRID): NA ## proj4string: NA ## GEOMETRYCOLLECTION(MULTILINESTRING((5 7, 6 8))) ## GEOMETRYCOLLECTION(POINT(2 3)) x %>% st_cast() ## Geometry set for 2 features ## geometry type: GEOMETRY ## dimension: XY ## bbox: xmin: 2 ymin: 3 xmax: 6 ymax: 8 ## epsg (SRID): NA ## proj4string: NA ## MULTILINESTRING((5 7, 6 8)) ## POINT(2 3) casting on sf objects

The casting of sf objects works in principle identical, except that for ungrouping, attributes are repeated (and might give rise to warning messages),

# ungroup: st_sf(a = 1, geom = st_sfc(st_multipoint(matrix(1:4,,2)))) %>% st_cast("POINT") ## Warning in st_cast.sf(., "POINT"): repeating attributes for all sub- ## geometries for which they may not be constant ## Simple feature collection with 2 features and 1 field ## geometry type: POINT ## dimension: XY ## bbox: xmin: 1 ymin: 3 xmax: 2 ymax: 4 ## epsg (SRID): NA ## proj4string: NA ## c.1..1. geom ## 1 1 POINT(1 3) ## 2 1 POINT(2 4)

and for grouping, attributes are aggregated, which requires an aggregation function

# group: st_sf(a = 1:3, geom = st_sfc(st_point(0:1), st_point(2:3), st_point(4:5))) %>% st_cast("MULTIPOINT", ids = c(1,1,2), FUN = mean) ## Simple feature collection with 2 features and 2 fields ## geometry type: MULTIPOINT ## dimension: XY ## bbox: xmin: 0 ymin: 1 xmax: 4 ymax: 5 ## epsg (SRID): NA ## proj4string: NA ## a geom ## 1 1 1.5 MULTIPOINT(0 1, 2 3) ## 2 2 3 MULTIPOINT(4 5) type selection

In case we have a mix of geometry types, we can select those of a particular geometry type by the new helper function st_is. As an example we create a mix of polygons, lines and points:

g = st_makegrid(n=c(2,2), offset = c(0,0), cellsize = c(2,2)) s = st_sfc(st_polygon(list(rbind(c(1,1), c(2,1),c(2,2),c(1,2),c(1,1))))) i = st_intersection(st_sf(a=1:4, geom = g), st_sf(b = 2, geom = s)) ## Warning in st_intersection(st_sf(a = 1:4, geom = g), st_sf(b = 2, geom = ## s)): attribute variables are assumed to be spatially constant throughout ## all geometries i ## Simple feature collection with 4 features and 2 fields ## geometry type: GEOMETRY ## dimension: XY ## bbox: xmin: 1 ymin: 1 xmax: 2 ymax: 2 ## epsg (SRID): NA ## proj4string: NA ## a b geometry ## 1 1 2 POLYGON((2 2, 2 1, 1 1, 1 2... ## 2 2 2 LINESTRING(2 2, 2 1) ## 3 3 2 LINESTRING(1 2, 2 2) ## 4 4 2 POINT(2 2)

and can select using dplyr::filter, or directly using st_is:

filter(i, st_is(geometry, c("POINT"))) ## Simple feature collection with 1 feature and 2 fields ## geometry type: GEOMETRY ## dimension: XY ## bbox: xmin: 1 ymin: 1 xmax: 2 ymax: 2 ## epsg (SRID): NA ## proj4string: NA ## a b geometry ## 1 4 2 POINT(2 2) filter(i, st_is(geometry, c("POINT", "LINESTRING"))) ## Simple feature collection with 3 features and 2 fields ## geometry type: GEOMETRY ## dimension: XY ## bbox: xmin: 1 ymin: 1 xmax: 2 ymax: 2 ## epsg (SRID): NA ## proj4string: NA ## a b geometry ## 1 2 2 LINESTRING(2 2, 2 1) ## 2 3 2 LINESTRING(1 2, 2 2) ## 3 4 2 POINT(2 2) st_is(i, c("POINT", "LINESTRING")) ## [1] FALSE TRUE TRUE TRUE

OpenEO: a GDAL for Earth Observation Analytics

28 November 2016 - 7:00pm

Earth observation data, or satellite imagery, is one of the richest sources to find out how our Earth is changing. The amount of Earth observation data we collect Today has become too large to analyze on a single computer. Although most of the Earth observation data is available for free, practical difficulties we are currently facing when we try to analyze it seriously constrains the potential benefits for citizens, industry, scientists, or society. How did we get here?

GIS: the 80’s

To understand the current difficulty when analyzing big Earth observation data analysis, let us look how geographic information systems (GIS) developed over the past decades. In the early days, they would be isolated structures:

where one would get things done in isolation, without any chance of verifying or comparing it with another system: these were expensive systems, hard to set up and maintain, and (with the exception of GRASS) closed databases and closed source software.

File formats: the 90’s

In the 90’s, file formats came up: several systems started supporting various file formats, and dedicated programs that would do certain file format conversions became available. This made many new things possible, such as the integration of S-Plus with Arc-Info, but to fully realize this, each software would have to implement drivers for every file format. Developing new applications or introducing a new file format are both difficult in this model.

GDAL: the 00’s

Then came GDAL! This Geospatial Data Abstraction Layer is a software library that reads and writes raster and vector data. Instead of having to write drivers for each file format, application developers needed to only write a GDAL client driver. When proposing a new file format, instead of having to convince many application developers to support it, only a GDAL driver for the new format was required to realize quick adoption. Instead of many-to-many links, only many-to-one links were needed:

R and python suddenly became strong GIS and spatial modelling tools. ArcGIS users could suddenly deal with the weird data formats from hydrologists, meteorologists, and climate scientists. Developing innovative applications and introducing new file formats became attractive.

For analyzing big Earth observation data, GDAL has its limitations, including:

  • it has weak data semantics: the data model does not include observation time or band wavelength (color), but addresses bands by dataset name and band number,
  • raster datasets cannot tell whether pixels refer to points, cells with constant value, or cells with an aggregated value; most regridding methods silently assume the least likely option (points),
  • the library cannot do much processing, meaning that clients that do the processing and use GDAL for reading and writing need to be close to where the data is, which is far away from the user’s computer.
Big Earth Observation data: the 10’s

We currently see a plethora of cloud environments that all try to solve the problem of how to effectively analyze Earth Observation data that are too large to download and process locally. It very much reminds of the isolated GIS of the 80’s, in the sense that strongly differing systems have been built with large efforts, and when solving a certain problem in one system it is practically impossible to try to solve it in another system too. Several systems work with data processing back-ends that are not open source. The following figure only shows a few systems for illustration:

Open Earth Observation Science

For open science, open data is a necessary but not sufficient condition. In order to fight the reproducibility crisis, analysis procedures need to be fully transparent and reproducible. To get there, it needs to be simple to execute a given analysis on two different data centers to verify that they yield identical results. Today, this sounds rather utopian. On the other hand, we believe that one of the reasons that data science has taken off is that the software making it possible (e.g. python, R, spark) was able to hide irrelevant details and had reached at a stage where it was transparent, stable, and robust.

Google Earth Engine is an excellent example of an interface that is simple, in the sense that users can directly address sensors and observation times and work with composites rather than having to comb through the raw data consisting of large collections of files (“scenes”). The computational back-end however is not transparent, it lets the user execute functions as far as they are provided by the API, but not inspect the source code of these function, or modify them.

How then can we make progress out of a world of currently incompatible, isolated Earth Observation data center efforts? Learning from the past, the solution might be a central interface between users who are allowed to think in high-level objects (“Landsat 7 over Western Europe, 2005-2015”) and a set of computational Earth Observation data centers that agree on supporting this interface. A GDAL for Earth Observation Analytics, so to speak. We’ll call it OpenEO.

Open EO Interface: the 20’s

The following figure shows schematically how this could look like: Users write scripts in a neutral language that addresses the data and operations, but ignores the computer architecture of the back-end. The back-end carries out the computations and returns results. It needs to be able to identify users (e.g. verify its permission to use it), but also to tell which data they provide and which computational operations they afford.

More in detail, and following the GDAL model closely, the architecture contains of

  • a client and back-end neutral set of API’s for both sides, defining which functions users can call, and which functions back-ends have to obey to,
  • a driver for each back-end that translates the neutral requirements into the platform specific offerings, or information that a certain function is not available (e.g. creating new datasets in a write-only back-end)
  • a driver for each client that binds the interfaces to the particular front-end such as R or python

With an architecture like this, it would not only become easy and attractive for users to change from one back-end to the other, but also make it much easier to choose between back-ends, because the value propositions of the back-end providers are now comparable, on paper as well as in practice.

A rough idea of the architecture is shown in this figure:

One of the drivers will interface collections of scenes in a back-end, or on the local hard drive. GDAL will remain playing an important role in back-ends if the back-end uses files in a format supported by GDAL, and in the front-end if the (small) results of big computations are fetched.

The way forward

We, as the authors of this piece, have started to work on these ideas in current activities, projects, and proposals. We are planning to submit a proposal to the EC H2020 call EO-2-2017: EO Big Data Shift. We hope that the call writers (and the reviewers) have the same problem in mind as what we explain above.

We are publishing this article because we believe this work has to be done, and only a concerted effort can realize it. When you agree and would like to participate, please get in touch with us. When successful, it will in the longer run benefit science, industry, and all of us.

Earlier blogs related to this topic

Earlier posts related to this:

Simple features now on CRAN

1 November 2016 - 7:00pm

Submitting a package to CRAN is always an adventure, submitting a package with lots of external dependencies even more so. A week ago I submitted the simple features for R package to CRAN, and indeed, hell broke loose! Luckily, the people behind CRAN are extremely patient, friendly and helpful, but they let your code test on a big server farm with computers in 13 different flavors.

Of course we test code on linux and windows after every code push to github, but that feels like talking to a machine, like remotely compiling and testing. CRAN feels different: you first need to manually confirm that you did your utmost best to solve problems, and then there is a person telling you everything still remaining! Of course, this is of incredible help, and a big factor in the R community’s sustainability.

Package sf is somewhat special in that it links to GEOS and GDAL, and in particular GDAL links, depending on how it is installed, to many (77 in my case) other libraries, each with their own versions. After first submission of sf 0.2-0, I ran into the following issues with my code.

sf 0.2-0
  • I had to change all links starting with into A direct link to a units vignette on CRAN had to be removed.
  • some of the tests gave very different output, because my default testing platforms (laptop, travis) have PostGIS, and CRAN machines don’t; I changed this so that testing without PostGIS (CRAN) is now most silent
  • the tests still output differences in GDAL and GEOS versions, but that was considered OK.

That was it! The good message

Thanks, on CRAN now. Best -k

arrived! Party time! Too early. In the evening (CRAN never sleeps) an email arrived, mentioning:

This worked for my incoming checks, but just failed for the regular checks, with Error in loadNamespace(name) : there is no package called 'roxygen2' Calls: :: ... tryCatch -> tryCatchList -> tryCatchOne -> <Anonymous> Execution halted For some reason roxygen2 is not working. Is it installed? ERROR: configuration failed for package ‘sf’

with lots of helpful hints. Indeed, my package generated Rcpp files and manual pages dynamically during install; this requires Rcpp and roxygen2 to be available unconditionally and they aren’t.

So I sat down and worked on 0.2-1, to address this. Before I could do that, an email from Oxford (Brian Ripley) arrived, telling me that sf had caused some excitement in the multi-flavor server farm:

Here, it should be noted (again) that the only two NOTEs were due to the excellent work of Jeroen Ooms who compiled GDAL and many other libraries for rwinlib, and prepared sf for downloading and using them. The rest was my contribution.

In addition, an issue was raised by Dirk Eddelbuettel, telling me that his Rcpp reverse check farm had noted him that sf required GDAL 2.0 or later, but not by properly checking its version but by generating plane compile errors. The horror, the horror.

sf 0.2-1: roxygen, Rcpp, SQLITE on Solaris

sf 0.2-1 tried to address the Rcpp and roxygen2 problems: I took their generation out of the configure and scripts. I added all automatically derived files to the github repo, to get everything in sync. Worked:

Thanks, on CRAN now. [Tonight I'll know for sure ...] Best -k

… and no emails in the evening.

Also, the errors on Solaris platforms were caused by the SQLITE library not being present, hence GeoPackage not being available as a GDAL driver. As a consequence, I had to set back the examples reading a GeoPackage polygons file to one where a shapefile is read. Bummer.

Another issue was related to relying on GDAL 2.1 features without testing for it; this was rather easily solved by conditional compiling.

This gave:

meaning SOME improvement, but where do these UBSAN reports suddenly come from!?

sf 0.2-2: byte swapping, memory alignment

Over-optimistically as I am, I had commented that life is too short to do byte swapping when reading WKB. Welcome to the CRAN server farm. Solaris-Spark is big-endian. Although the R code reading WKB does read non-native endian, this would have required to rewrite all tests, so I added byte swapping to the C++ code, using a helpful SO post.

The UBSAN issues all listed something like:

UBSAN: don't assume that pointers can point anywhere to, in valid memory; wkb.cpp:185:39: runtime error: load of misaligned address 0x6150001d0e29 for type 'uint32_t', which requires 4 byte alignment 0x6150001d0e29: note: pointer points here 00 00 00 01 06 00 00 00 01 00 00 00 01 03 00 00 00 01 00 00 00 1b 00 00 00 00 00 00 a0 41 5e 54

Sounds scary? What I had done was for every coordinate use a double pointer, pointing it to the right place in the WKB byte stream, then copy its value, and move it 8 bytes. I love this *d++ expression. But you can’t do this anymore! Although the code worked on my machines, you can’t put a double pointer to any location and assume it’ll work everywhere. The solution was to memcpy the relevant bytes to a double value on the stack, and copy that into the Rcpp::NumericVector.

To be done:

All these changes have brought me here:

where you see that linux and Windows compile (all NOTEs indicate that the library is too large, which is unavoidable with GDAL) and that errors are Mac related:

r-devel-macos-x86_64-clang ** testing if installed package can be loaded Error in dyn.load(file, DLLpath = DLLpath, ...) : unable to load shared object '/Users/ripley/R/packages/tests-devel/sf.Rcheck/sf/libs/': dlopen(/Users/ripley/R/packages/tests-devel/sf.Rcheck/sf/libs/, 6): Symbol not found: _H5T_NATIVE_DOUBLE_g Referenced from: /Users/ripley/R/packages/tests-devel/sf.Rcheck/sf/libs/ Expected in: flat namespace in /Users/ripley/R/packages/tests-devel/sf.Rcheck/sf/libs/

which indicates a problem with GDAL not linking to the HDF5 library (unsolved).

The second:

r-release-osx-x86_64-mavericks checking gdal-config usability... yes configure: GDAL: 1.11.4 checking GDAL version >= 2.0.0... yes

indicates that

  • this platform still runs GDAL 1.x, so needs to be upgraded and that
  • my check for GDAL version present on the system still does not work!
CRAN flavors

CRAN flavors is a great asset that teaches you problems of all kinds at an early stage. Without it, users would have run at some stage into problems that are now caught up front. Thanks to the tremendous effort of the CRAN team!!

UPDATE, Dec 21, 2016
  • Thanks to Roger Bivand, Simon Urbanek and Brian Ripley for constructive help, the MacOSX-mavericks binary build is now on CRAN, the issue is described here