R Spatial software blogs and ideas

Geocomputation with R: workshop at eRum

Thu, 05/31/2018 - 02:00

RSAGA 1.0.0

Sun, 05/20/2018 - 02:00

Plotting and subsetting stars objects

Thu, 03/22/2018 - 01:00

RSAGA 1.0.0

Tue, 03/20/2018 - 01:00

sf 0.6-0 news

Mon, 01/08/2018 - 01:00

Spatiotemporal arrays for R - blog one

Thu, 11/23/2017 - 01:00

Tidy storm trajectories

Mon, 08/28/2017 - 02:00

Setting up large scale OSM environments for R using Osmosis and PostgreSQL with PostGIS

Fri, 07/14/2017 - 02:00

Importing OpenStreetMap (OSM) data into R can sometimes be rather difficult, especially when it comes to processing large datasets. Some packages aim at easy integration of OSM data, including the very versatile osmar package, which lets you retrieve data directly in R via the OSM API. Packages like osmar, rgdal or sf also offer built-in functions to read the spatial data formats that OSM data comes in.

However, these packages reach their limits with larger datasets, especially on less powerful machines. In this post I introduce an easy way to set up an environment for processing large OSM datasets in R, using the Java application Osmosis and the open-source database PostgreSQL with the PostGIS extension for spatial data.

This tutorial was created using an Asus Zenbook UX32LA with an i5-4200U CPU, 8 GB RAM and a 250 GB SSD, running on Windows 10. The data used has a size of 1.9 GB (unzipped). Under this setting, OSM data import using osmar, rgdal and sf takes several hours, if not days, especially if you want to keep using your system at the same time. The following steps thus show a way to set up larger spatial data environments using the PostgreSQL database scheme, and how to easily import and work with this data in R.
Getting OSM data

The place to extract large OSM datasets is the file dump Planet.osm, which can be found here: https://wiki.openstreetmap.org/wiki/Planet.osm

Here, we can download all available OSM data or search for extracts from our area of interest. I am interested in downloading the most recent OSM data for Greater London, which for instance is provided by Geofabrik. This archive offers OSM data for predefined layers like countries, states or urban areas. The London data that I will be using in this tutorial can be found here: http://download.geofabrik.de/europe/great-britain/england/greater-london.html

I download the file greater-london-latest.osm.pbf, containing the complete dataset for the Greater London area. Note that this file is updated regularly.

Setting up PostgreSQL and PostGIS

We now need to download and install PostgreSQL with the PostGIS extension. A detailed explanation on how to install PostgreSQL can be found here: https://wiki.postgresql.org/wiki/Detailed_installation_guides

Make sure you note username, password and the port you use for the installation. After PostgreSQL is installed on the system, PostGIS can be added as described here: https://wiki.openstreetmap.org/wiki/PostGIS/Installation

Now, open pgAdmin to set up your database. We can create a new database by clicking on Object – Create – Database and inserting a name of your choice, e.g. London OSM.

My new database London OSM is now in place and can be prepared for the data import. We have to add two extensions to our database using an SQL script. We navigate into the new database, open the script command line by clicking on Object – CREATE Script, and execute two commands:

  • CREATE EXTENSION postgis;
  • CREATE EXTENSION hstore;

These extensions should now show up when opening the Extensions path in our London OSM database.

Setting up Osmosis and importing the data

The tool connecting our dataset with PostgreSQL is called Osmosis. It is a command line Java application and can be used to read, write and manipulate OSM data. The latest stable version including detailed installation information for different OS can be found here: https://wiki.openstreetmap.org/wiki/Osmosis (Note that Osmosis requires the Java Runtime Environment, which can be downloaded at https://java.com/download/)

If you are using Windows, you can navigate into the Osmosis installation folder, e.g. C:\Program Files (x86)\osmosis\bin\, and open osmosis.bat. Double clicking this file opens the Osmosis command line. To keep the Osmosis window open, create a shortcut to the osmosis.bat file, open its properties and add cmd /k at the beginning of the target in the shortcut tab. The Osmosis output should look like this:

We now have to prepare our PostgreSQL database for the OSM data import (courtesy of Stackexchange user zehpunktbarron). Navigate back into pgAdmin and the OSM London database and create a new script via Object – CREATE Script. Now, execute the SQL code that you find in two of the files that you created when installing Osmosis. First execute the code from [PATH_TO_OSMOSIS]\script\pgsnapshot_schema_0.6.sql and afterwards the code from [PATH_TO_OSMOSIS]\script\pgsnapshot_schema_0.6_linestring.sql.

Now, add indices to the database to better process the data. Execute the following SQL commands in the script:

  • CREATE INDEX idx_nodes_tags ON nodes USING GIN(tags);
  • CREATE INDEX idx_ways_tags ON ways USING GIN(tags);
  • CREATE INDEX idx_relations_tags ON relations USING GIN(tags);

We have now successfully prepared our database for the OSM import. Open Osmosis and run the following command to import the previously downloaded .pbf file:

"[PATH_TO_OSMOSIS]\bin\osmosis" --read-pbf file="[PATH_TO_OSM_FILE]\greater-london-latest.osm.pbf" --write-pgsql host="localhost" database="London OSM" user="YOUR_USERNAME" password="YOUR_PASSWORD"

Note that for larger .pbf files this process might take a while, depending also on the specs of your system. If the data import was successful, you should see output that looks like this:

Accessing PostgreSQL databases in R

Our freshly imported database is now ready to be accessed via R. Connecting to the PostgreSQL database requires the R package RPostgreSQL. First, we load the PostgreSQL driver and connect to the database using our credentials:

require(RPostgreSQL)

# LOAD POSTGRESQL DRIVER
driver <- dbDriver("PostgreSQL")

# CREATE CONNECTION TO THE POSTGRESQL DATABASE
# THE CONNECTION VARIABLE WILL BE USED FOR ALL FURTHER OPERATIONS
connection <- dbConnect(driver, dbname = "London OSM", host = "localhost",
                        port = 5432, user = "YOUR_USERNAME", password = "YOUR_PASSWORD")

We can now check whether we have successfully established a connection to our database using a simple command:

dbExistsTable(connection, "lines")
## [1] TRUE

We have now set up the environment to load OSM data into R flawlessly. Note that queries using RPostgreSQL are written in the SQL syntax. Further information on the use of the RPostgreSQL package can be found here: https://www.r-bloggers.com/using-postgresql-in-r-a-quick-how-to/
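
To give a flavour of that syntax, here are two simple calls reusing the connection object from above (just a sketch; the points table is the one queried later in this post):

# list the tables available through this connection
dbListTables(connection)

# a plain SQL query passed through RPostgreSQL: count the imported OSM points
dbGetQuery(connection, "SELECT count(*) FROM points")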

Creating spatial data frames in R

In the last step of this tutorial we will explore how to put the accessed data to work and how to properly establish the geographical reference. We first load data into the R environment using an RPostgreSQL query. The following query creates a data.frame with all available OSM point data. We use the PostGIS command ST_AsText on the wkb_geometry column to return the Well-Known Text (WKT) geometries and save them in the newly created column geom. After that, we delete the now redundant wkb_geometry column.

# LOAD POINT DATA FROM OSM DATABASE
points <- dbGetQuery(connection, "SELECT * , ST_AsText(wkb_geometry) AS geom from points")
points$wkb_geometry <- NULL

The points data frame contains all available OSM point data, including the several different tagging schemes, which can be further explored by looking at OSM's map features: https://wiki.openstreetmap.org/wiki/Map_Features

head(points)
##   ogc_fid osm_id                        name   barrier         highway
## 1       1      1 Prime Meridian of the World      <NA>            <NA>
## 2       2  99941                        <NA> lift_gate            <NA>
## 3       3 101831                        <NA>      <NA>        crossing
## 4       4 101833                        <NA>      <NA>        crossing
## 5       5 101839                        <NA>      <NA> traffic_signals
## 6       6 101843                        <NA>      <NA> traffic_signals
##    ref address is_in place man_made
## 1 <NA>    <NA>  <NA>  <NA>     <NA>
## 2 <NA>    <NA>  <NA>  <NA>     <NA>
## 3 <NA>    <NA>  <NA>  <NA>     <NA>
## 4 <NA>    <NA>  <NA>  <NA>     <NA>
## 5 <NA>    <NA>  <NA>  <NA>     <NA>
## 6 <NA>    <NA>  <NA>  <NA>     <NA>
##                                                 other_tags
## 1               "historic"=>"memorial","memorial"=>"stone"
## 2                                                     <NA>
## 3   "crossing"=>"traffic_signals","crossing_ref"=>"pelican"
## 4                                      "crossing"=>"island"
## 5                                                     <NA>
## 6                                                     <NA>
##                           geom
## 1 POINT(-0.0014863 51.4779481)
## 2 POINT(-0.1553793 51.5231639)
## 3 POINT(-0.1470438 51.5356116)
## 4 POINT(-0.1588224 51.5350894)
## 5 POINT(-0.1526586 51.5375096)
## 6 POINT(-0.163653 51.534922)

Now, to get the geometry working, we can transform the data frame into a spatial data frame using the sf package. Note that I have to set a coordinate reference system (CRS), in this case WGS84 (EPSG:4326):

# SET COORDINATE SYSTEM
require(sf)
points <- st_as_sf(points, wkt = "geom") %>% `st_crs<-`(4326)

We can now query our dataset for the data we are looking for, e.g. all bicycle parking spots (see https://wiki.openstreetmap.org/wiki/Tag:amenity%3Dbicycle_parking). Since sf data is stored in spatial data frames, we can easily create a subset containing our desired points, e.g. using the filter function from the dplyr package and str_detect from stringr:

require(dplyr)
require(stringr)

# EXTRACT ALL POINTS TAGGED 'BICYCLE_PARKING'
bikepark <- points %>% filter(str_detect(other_tags, "bicycle_parking"))

In addition to all bike parking spots, we also want to include all explicitly marked bicycle routes, as found in the lines data. These can be extracted from the OSM line data via the cycleway tag: https://wiki.openstreetmap.org/wiki/Key:cycleway

We can contrast cycleways with the regular road network by also selecting the most common road types (see https://wiki.openstreetmap.org/wiki/Key:highway). Note that after our final PostgreSQL query, we close the connection using the dbDisconnect command in order not to overload the driver.

# LOAD LINE DATA
lines <- dbGetQuery(connection, "SELECT * , ST_AsText(wkb_geometry) AS geom from lines")
lines$wkb_geometry <- NULL
dbDisconnect(connection)
## [1] TRUE

# SET CRS
lines <- st_as_sf(lines, wkt = "geom") %>% `st_crs<-`(4326)

# SUBSET CYCLEWAYS
cycleways <- lines %>% filter(highway == "cycleway")

# SUBSET OTHER STREETS
streets <- lines %>% filter(highway == "motorway" | highway == "trunk" | highway == "primary" |
                            highway == "secondary" | highway == "tertiary")

Having created subsets for bicycle parking spots, cycleways and regular roads, we finally plot our data using ggplot2 and the geom_sf function:

require(ggplot2)

# PLOT BIKING INFRASTRUCTURE (PARKING + CYCLEWAYS)
ggplot(bikepark) +
  geom_sf(data = streets, aes(colour = "lightgrey")) +
  # Requires development version of ggplot2: devtools::install_github("tidyverse/ggplot2")
  geom_sf(data = cycleways, aes(colour = "turquoise")) +
  geom_sf(data = bikepark, aes(colour = "turquoise4"), shape = ".") +
  coord_sf(crs = st_crs(bikepark)) +
  ggtitle("Biking infrastructure (parking + cycleways) in London") +
  scale_colour_manual("", values = c("lightgrey", "turquoise", "turquoise4"),
                      labels = c("Other Roads", "Cycleways", "Bike Parking")) +
  theme_void() +
  theme(legend.position = "bottom")

UseR! 2017 Spatial Tutorials

Thu, 06/29/2017 - 02:00

This year’s UseR! (next week Tuesday!!) will have two spatial tutorials:

1. Geospatial visualization using R (morning)

Bhaskar Karambelkar posted earlier that attendees should download his docker image by Jul 1, using:

docker pull bhaskarvk/rgeodataviz

2. Spatial data in R: new directions (afternoon)

My tutorial deals mostly with sf; the material (html, R markdown) can be found here: https://edzer.github.io/UseR2017/

I’m looking forward to it!

Related posts:

UseR! 2016 tutorial

Spatial indexes coming to sf

Thu, 06/22/2017 - 02:00

Spatial indexes give you fast results on spatial queries, such as finding whether pairs of geometries intersect or touch, or finding their intersection. They reduce the time to get results from quadratic in the number of geometries to linear in the number of geometries. A recent commit brings spatial indexes to sf for the binary logical predicates (intersects, touches, crosses, within, contains, contains_properly, overlaps, covers, covered_by), as well as the binary operations that yield geometries (intersection, union, difference, sym_difference).

The spatial join function st_join, which uses a logical predicate to join features, and aggregate or summarise, which use union to merge aggregated feature geometries, also benefit from this speedup.
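
As a rough illustration of the calls that benefit, here is a small sketch with made-up point data (the grid size and column names are invented for this example):

library(sf)

# 1000 random points with a made-up attribute z
pts <- st_as_sf(data.frame(x = runif(1000), y = runif(1000), z = rnorm(1000)),
                coords = c("x", "y"))

# a 10 x 10 polygon grid covering the points
grid <- st_sf(id = 1:100, geometry = st_make_grid(pts, n = c(10, 10)))

ix     <- st_intersects(grid, pts)                   # sparse result of a binary predicate
joined <- st_join(pts, grid, join = st_intersects)   # predicate-driven spatial join
zmean  <- aggregate(pts["z"], by = grid, FUN = mean) # union-based spatial aggregation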

Antecedents

There have been earlier attempts to use planar spatial indexes, including enhancement issue sfr:76. In rgeos, GEOS STRtrees were used in rgeos/src/rgeos_poly2nb.c, which is mirrored in a modern Rcpp setting in sf/src/geos.cpp, around lines 276 and 551. The STRtree is constructed by building envelopes (bounding boxes) of the input entities, which are then queried for intersection with the envelopes of another set of entities (in rgeos, the R functions gUnarySTRtreeQuery and gBinarySTRtreeQuery). The use case was to find the neighbours of all of the roughly 90,000 US Census entities in Los Angeles via spdep::poly2nb(), which received an argument to take the candidate neighbours found by querying the STRtree of the entities with those same entities.

Benchmark

A simple benchmark shows the obvious: st_intersects without a spatial index behaves quadratically in the number of geometries (black line), and is much faster when a spatial index is created, the more so for larger numbers of polygons:

The polygon datasets used are simple checker boards with square polygons (showing a nice Moiré pattern):

The small black square polygons are essentially matched to the red ones; the number of polygons along the x axis is the number in a single geometry set (black).

To show that the behaviour of intersects and intersection is indeed linear in the number of polygons, we show runtimes for both, as a function of the number of polygons (where intersection was divided by 10 for scaling purposes):

Implementation

Spatial indexes are available in the GEOS library used by sf through the functions starting with STRtree. The algorithm implements a Sort-Tile-Recursive R-tree; according to the JTS documentation it is described in P. Rigaux, Michel Scholl and Agnes Voisard, Spatial Databases with Application to GIS, Morgan Kaufmann, San Francisco, 2002.

The sf implementation (with some commits to follow this one) excludes some binary operations: st_distance, st_relate, and st_relate_pattern, as these all need to go through all combinations rather than the subset found by checking for overlapping bounding boxes. st_equals_exact and st_equals are excluded because they do not have an implementation for prepared geometries. st_disjoint could benefit from the search tree, but needs its own dedicated implementation.

On which argument is an index built?

The R-tree is built on the first argument (x) and used to match all geometries of the second argument (y) of binary functions. This could give runtime differences, but for instance for the dataset that triggered this development in sfr:394 we see hardly any difference:

library(sf)
# Linking to GEOS 3.5.1, GDAL 2.1.3, proj.4 4.9.2, lwgeom 2.3.2 r15302
load("test_intersection.Rdata")
nrow(test)
# [1] 16398
nrow(hsg2)
# [1] 6869
system.time(int1 <- st_intersection(test, hsg2))
#    user  system elapsed
# 105.712   0.040 105.758
system.time(int2 <- st_intersection(hsg2, test))
#    user  system elapsed
# 107.756   0.060 107.822
# Warning messages:
# 1: attribute variables are assumed to be spatially constant throughout all geometries
# 2: attribute variables are assumed to be spatially constant throughout all geometries

The resulting feature sets int1 and int2 are identical; only the order of the features (records) and of the attribute columns (variables) differs. (Runtime without the index is 35 minutes, 20 times as long.)

Is the spatial index always built?

In the current implementation it is always built for the logical predicates when the argument prepare = TRUE, which is the default. This made it easier to run benchmarks, and I strongly doubt anyone ever sets prepare = FALSE. This may change, so that the index is always built.
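
Concretely, the flag looks like this in a call (using the g1 and g2 grids from the benchmark script at the end of this post):

st_intersects(g1, g2)                   # prepare = TRUE is the default: the index is used
st_intersects(g1, g2, prepare = FALSE)  # without the index, the benchmark's black line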

It would be nice to also have them on st_relate and st_relate_pattern, e.g. for rook or queen neighborhood selections (sfr:234), but this still requires some work, since two non-intersecting geometries have a predictable but not a constant relationship.

What about prepared geometries?

Prepared geometries in GEOS are essentially indexes over single geometries and not over sets of geometries; they speed things up in particular when single geometries are very complex, and only for a single geometry to single geometry comparison. The spatial indexes are indexes over collections of geometries; they make a cheap preselection based on bounding boxes before the expensive pairwise comparison takes place.

Script used

The following script was used to create the benchmark plots.

library(sf)
sizes = c(10, 20, 50, 100, 160, 200)
res = matrix(NA, length(sizes), 4)
for (i in seq_along(sizes)) {
    # unit square, gridded and scaled to a sizes[i] x sizes[i] checker board
    g1 = st_make_grid(st_polygon(list(rbind(c(0,0), c(0,1), c(1,1), c(1,0), c(0,0)))),
                      n = sizes[i]) * sizes[i]
    g2 = g1 + c(.5, .5)
    res[i, 1] = system.time(i1 <- st_intersects(g1, g2))[1]
    res[i, 2] = system.time(i2 <- st_intersects(g1, g2, prepare = FALSE))[1]
    res[i, 3] = system.time(i3 <- st_intersection(g1, g2))[1]
    res[i, 4] = identical(i1, i2)
}
plot(sizes^2, res[,2], type = 'b', ylab = 'time [s]', xlab = '# of polygons')
lines(sizes^2, res[,1], type = 'b', col = 'red')
legend("topleft", lty = c(1,1), col = c(1,2),
       legend = c("st_intersects without index", "st_intersects with spatial index"))
plot(sizes^2, res[,3]/10, type = 'b', ylab = 'time [s]', xlab = '# of polygons')
lines(sizes^2, res[,1], type = 'b', col = 'red')
legend("topleft", lty = c(1,1), col = c(1,2),
       legend = c("st_intersection * 0.1", "st_intersects"))

mapedit - updates in 0.2.0

Fri, 06/09/2017 - 02:00

[view raw Rmd]

mapedit has progressed substantially since the introduction to mapedit post. mapedit 0.2.0 offers improvements and incorporates changes based on much appreciated feedback from the R geospatial community. mapedit is still in rapid development, but the API is stabilizing. We are targeting a CRAN release prior to useR 2017, and Tim Appelhans will demonstrate mapedit in his useR talk.

In this post, we will highlight some of the recent improvements and changes to mapedit. These updates can be categorized as

  1. better integration with [simple features](https://github.com/edzer/sfr) and

  2. addition of Shiny modules.

Install

We are moving quickly, so please install the development versions of mapview, leaflet.extras, and mapedit as shown below.

devtools::install_github("r-spatial/mapview@develop")
devtools::install_github("bhaskarvk/leaflet.extras")
devtools::install_github("r-spatial/mapedit")

Simple Features

The R geo community is rapidly embracing the RConsortium-sponsored sf package, and mapedit plans to fully adopt and incorporate simple features, like leaflet, mapview, geojsonio, plotly, and ggplot2. sf can greatly improve geospatial workflows in R. mapedit now returns simple features by default with editMap() and includes a new function selectFeatures() for interactive selection of simple features. Let's take a quick look at this new functionality.

editMap returns sf

editMap() looks the same, but the output is very different.

library(mapedit)
library(mapview)
library(sf)

crud <- editMap(mapview())

Now, since the return value is simple features and mapview added addFeatures(), we can see the drawn features with a one-liner. This collaboration greatly increases the efficiency of the editing workflow.

mapview(crud$finished)

selectFeatures makes selecting features easy

Let’s use the sf example with North Carolina county data to give us some simple features to select with the new selectFeatures().

library(mapview)
library(mapedit)
library(sf)

nc <- st_read(system.file("shape/nc.shp", package="sf"))
selected <- selectFeatures(nc)

As before, we can now take advantage of mapview to plot our selection.

mapview(selected)

We also changed the underlying selectMap() function to use the RStudio Viewer by default. This allows us to include selectMap() in a workflow or pipeline.

library(mapview)
library(mapedit)
library(sf)

nc <- st_read(system.file("shape/nc.shp", package="sf"))

selectFeatures(nc) %>%
  st_union() %>%
  mapview()

Stay tuned for an editFeatures() equivalent.

Shiny Modules

The original editMap() and selectMap() provided useful functionality. However, they are limited to standalone applications. For even greater integration in an interactive geospatial workflow, Shiny modules allow a user to incorporate edit and select functionality in a broader application context. Let's see a couple of examples of this concept.

Select as Shiny Module

In this example, we will demonstrate analysis of the quakes data in R along with some helpful sf functions. The app will build a grid for selecting quakes and then plot the selection together with a comparative density plot.

First we will convert the quakes to simple features.

library(sf)

# make the coordinates a numeric matrix
qk_mx <- data.matrix(quakes[,2:1])

# convert the coordinates to a multipoint feature
qk_mp <- st_multipoint(qk_mx)

# convert the multipoint feature to sf
qk_sf <- st_sf(st_cast(st_sfc(qk_mp), "POINT"), quakes, crs=4326)

Now let's use the very helpful sf::st_make_grid() function, and then filter the grid to only those cells that contain quake points.

# make a grid
grd <- st_set_crs(st_make_grid(qk_sf), 4326)

# only keep grid polygons that contain at least one quake point
grd <- grd[which(sapply(st_contains(st_sf(grd), qk_sf), length) > 0)]

With our grid, we can build a Shiny app for some interactive analysis of quake magnitude.

library(mapview)
library(mapedit)
library(shiny)
library(leaflet)

ui <- fluidPage(
  fluidRow(
    column(
      6,
      h3("Select Grid"),
      # our new select module ui
      selectModUI("selectmap")
    ),
    column(
      6,
      h3("Selected Quakes"),
      plotOutput("selectplot")
    )
  ),
  fluidRow(
    h3("Magnitude Distribution of Selected Quakes"),
    plotOutput("quakestat", height=200)
  )
)

server <- function(input, output, session) {
  # our new select module
  g_sel <- callModule(
    selectMod,
    "selectmap",
    leaflet() %>%
      addTiles() %>%
      addFeatures(st_sf(grd), layerId = ~seq_len(length(grd)))
  )

  rv <- reactiveValues(intersect=NULL, selectgrid=NULL)

  observe({
    # the select module returns a reactive
    # so let's use it to find the intersection
    # of selected grid with quakes points
    gs <- g_sel()
    rv$selectgrid <- st_sf(
      grd[as.numeric(gs[which(gs$selected==TRUE),"id"])]
    )
    if(length(rv$selectgrid) > 0) {
      rv$intersect <- st_intersection(rv$selectgrid, qk_sf)
    } else {
      rv$intersect <- NULL
    }
  })

  output$selectplot <- renderPlot({
    plot(qk_mp, col="gray")
    if(!is.null(rv$intersect)) {
      plot(rv$intersect, pch=19, col="black", add=TRUE)
    }
    plot(st_union(rv$selectgrid), add=TRUE)
  })

  output$quakestat <- renderPlot({
    plot(
      stats::density(qk_sf$mag),
      col="gray30", ylim=c(0,1.2), main = NA
    )
    if(!is.null(rv$intersect) && nrow(rv$intersect) > 0) {
      lines(stats::density(rv$intersect$mag), col="red", lwd=2)
    }
  })
}

shinyApp(ui, server)

Edit as Shiny Module

Since we have the quake data, we will use it to show the edit module in a simple application. Instead of the grid, let’s draw polygons to select quakes.

# run select demo for the quake data
# we will need the qk_sf
# to test
# plot(qk_sf)
library(mapedit)
library(mapview)
library(shiny)

ui <- fluidPage(
  fluidRow(
    # edit module ui
    column(6, editModUI("editor")),
    column(
      6,
      h3("Boxplot of Depth"),
      plotOutput("selectstat")
    )
  )
)

server <- function(input, output, session) {
  # edit module returns sf
  edits <- callModule(editMod, "editor", mapview(qk_sf)@map)

  output$selectstat <- renderPlot({
    req(edits()$finished)
    qk_intersect <- st_intersection(edits()$finished, qk_sf)
    req(nrow(qk_intersect) > 0)
    boxplot(
      list(
        all = as.numeric(qk_sf$depth),
        selected = as.numeric(qk_intersect$depth)
      ),
      xlab = "depth"
    )
  })
}

shinyApp(ui, server)

Next Steps

The progress made thus far depended entirely on feedback received. Please help us by providing feedback, ideas, and use cases. As mentioned earlier, we aim for an initial CRAN release before useR 2017 on July 4, 2017. We do not anticipate any breaking API changes before release. Rather, we plan to spend time on documentation, examples, and tests.

RConsortium

mapedit and many of its dependency packages are funded by the RConsortium. Thanks so much to all those who have contributed to this fantastic organization. Also, thanks to all those open source contributors in the R community.

First openEO hackathon report

Wed, 04/19/2017 - 02:00
Introduction

As announced earlier,

First #OpenEO hackaton will be held @ ifgi on Mar 23-24. Let me know if you’re interested to participate! https://t.co/v6cr7cnasQ

— Edzer Pebesma (@edzerpebesma) January 20, 2017

the first openEO hackathon was held a few weeks ago. A short report follows.

OpenEO

OpenEO is an initiative to define an open API for accessing cloud-based Earth observation data processing engines, as explained in this blog post.

Five participants took part in the hackathon: Christoph Paulik from TU Wien, and the following people from ifgi: Meng Lu, Daniel Nuest, Marius Appel and me.

The EODC had been so kind as to set us up with access to a virtual machine in their data center, which nicely illustrates the problem with remote sensing data: there is a lot of it, and it is big. Part of the archive we looked at was found here:

xxxxxxxx@openeo1:~$ df -h /eodc/products/
Filesystem               Size  Used Avail Use% Mounted on
XX.XXX.XX.XXX:/products  1.4P  1.3P  159T  89% /eodc/products

How big is 1.3 petabyte? I often try to relate it to the last fixed-size digital medium we used, the CD-R:

How many? Mmm.

1.3 * 1024^5 / (700 * 1024^2)
## [1] 1994092

2 Million: a stack of 2.4 km, or 10 km when we put them in a case.
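
(For the record, those stack heights follow from roughly 1.2 mm per bare disc and about 5 mm per slim case; the thicknesses are my assumptions, not from the original post:)

2e6 * 1.2 / 1000  # metres of stacked bare CD-Rs: 2400 m
2e6 * 5   / 1000  # metres when each disc sits in a slim case: 10000 m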

APIs

We first talked quite a while about what RESTful APIs are, how they work, their verbs (GET, PUT, POST, etc.), resources, what microservices are, and how they can be standardised, e.g. following the Open API initiative. Then we tried to define a simple challenge that uses REST and uses Earth observation data.

Challenge

For the challenge, we didn't want to work with the large amounts of data straight away. Instead, we tried to keep the challenge as simple as possible, namely adding two bands from a particular Sentinel-2 scene and returning the result as a GeoTIFF.

We worked in three teams:

  • team 1 worked on a JavaScript solution, with the imagery in a SciDB backend
  • team 2 worked on a Python solution,
  • team 3 worked on a pure R solution.

After around 3-4 hours, all teams had managed to get a RESTful service doing this:

  • team 1 had the most manpower, but also the most work to do; their solution was in the end the fastest, because the SciDB backend is highly scalable, using around 24 cores

  • team 2, Christoph's Python solution, uses flask for web requests, gdal from osgeo to read data, and numpy to add the two images; the resulting image was not georeferenced.

  • team 3 (Edzer) used the plumber R package to set up web services, and rgdal and sp to read and write raster maps. Solution found here.
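
For illustration only (the actual hackathon solution is linked above), here is a minimal plumber sketch of the idea; the endpoint name and file parameter are made up, and the raster package is used instead of rgdal/sp for brevity:

# plumber.R
library(raster)

#* Add the first two bands of a multi-band GeoTIFF and return the sum as a GeoTIFF
#* @param file path to a multi-band GeoTIFF available on the server (made-up parameter)
#* @serializer contentType list(type = "image/tiff")
#* @get /addbands
function(file) {
  b   <- brick(file)                          # read all bands
  out <- b[[1]] + b[[2]]                      # add band 1 and band 2
  tf  <- tempfile(fileext = ".tif")
  writeRaster(out, tf)                        # write the result as GeoTIFF
  readBin(tf, "raw", n = file.info(tf)$size)  # return the raw file contents
}

# run with: plumber::plumb("plumber.R")$run(port = 8000)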

Road ahead

We discussed at quite some length what it will take to realize the OpenEO ambition. Adding two bands is easy; handling large collections of scenes is not. My respect for what the people from Google Earth Engine have realized has grown; their award from ASPRS is more than deserved. Nevertheless, it is isolated, closed source, and practically impossible to verify.

We also drafted service calls, discussed coordinate reference systems of tile/scene collections, and looked at covjson. And one of the most attractive ideas (for some of us): submitting your Python or R function to the imagery server and having it run, in parallel, over the scenes, returning the requested summary.

Certain terms (scene, tile, pixel) recurred many times, seemingly stretching across the different back-ends used, but with slight differences in meaning - a huge challenge for an API intending to be useful to many. Discussing the API design, we struggled with the user perspective: is the user the analyst, or the developer? Whom do we accommodate with the terms we use?

Reproducible assignments with R-markdown

Thu, 04/13/2017 - 02:00
Introduction

I tweeted earlier today about

Evaluating 22 assignments of my MSc course that students handed in as html, and Rmd + raw data. I can reproduce each of them!

— Edzer Pebesma (@edzerpebesma) April 13, 2017

for my course Analysis of Spatio-Temporal Data in our MSc program Geoinformatics. Barry kindly answered:

@edzerpebesma It would be nice to see a write-up of how you get students to this level of competence.

— Barry Rowlingson (@geospacedman) April 13, 2017

Well, here you go!

The course

The course has as its goal to get familiar with what spatio-temporal data is, and how you can analyse it, with emphasis on using R. Analysis methods focus on spatial statistical methods (point patterns, geostatistics, lattice data) and the analysis of movement data. The course consists of lectures and exercise meetings; for the exercise meetings students were supposed to hand in short assignments. The larger final assignment is individual, is like a small reproducible research paper, and forms the basis for the final grade.

Instead of (or: in addition to) working with pre-cooked examples that use data from a particular package, loaded by data(dataset), I encouraged students from the beginning to look for datasets themselves, and import them in R. This way they learn, and learn from one another

  • what datasets look like in the wild
  • how they are imported in R
  • the hoops they have to go through to get time or date variables right, e.g. by using as.POSIXct or strptime and
  • troubles that coordinates may give, e.g. their projections or degrees-minutes-seconds notation.

as well as challenges related to applying methods:

  • time series analysis does not work if you have very short time series
  • periodicities are often absent in yearly data, but present in sub-yearly (daily/weekly/monthly)
  • spatial correlation is hard to assess with a handful of observations
  • not all point-referenced datasets are meaningful input for point pattern analysis, or for geostatistical analysis
  • most point pattern software assumes Cartesian coordinates, but does not check for this
Reproducing

I’ve asked students to hand in assignments as R markdown files so that I can not only see what they did, but also redo what they did. I’ve asked them specifically to

  • work in a separate directory for each assignment, using lastname.assignmentNumber as directory name
  • include the R markdown file, data sets, and the output (html or pdf)
  • not set paths in the R markdown files
  • not use absolute paths to data files
  • not use install.packages in their R markdown file
  • zip this directory and submit it to the moodle system
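
For illustration, a chunk from a hypothetical assignment that follows these rules: the data file is read with a relative path, and nothing is installed from within the document (file and column names are made up):

library(sf)

# read the data file shipped inside the zipped assignment directory (relative path only)
obs <- read.csv("observations.csv")

# turn it into a spatial object; the lon/lat column names are hypothetical
obs <- st_as_sf(obs, coords = c("lon", "lat"), crs = 4326)

plot(st_geometry(obs))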

I’ve also demonstrated for the whole group how things work when I reproduce this, and showed them where things go wrong. For reproducing, I have to

  • download the zip, and unzip it
  • go into the directory
  • double-click the R-markdown file, so rstudio starts in the right directory
  • click Knit to run the R markdown file and create the output html

For the final assignment, several students ended up with data sets larger than the maximum allowed upload size; they then gave me a download link instead. Several students saved the result of a long computation to a .RData file, commented out the long-running section, and loaded the .RData instead, to limit run time while writing the assignment. They warned me in case run times were long.

If anyone can think of a simpler workflow (for me, or for the students) I’d very much like to hear it!

Spring School on “Statistical analysis of hyperspectral and high-dimensional remote-sensing data using R”: a report

Sat, 03/25/2017 - 01:00

The Spring School on “Statistical analysis of hyperspectral and high-dimensional remote-sensing data using R”, held at the University of Jena, March 13-17, 2017, was organized by the GIScience group, led by Prof. Alexander Brenning and two researchers from the GIScience research group, Patrick Schratz and Dr. Jannes Münchow.

The school brought together a diverse group of 28 researchers (e.g. from geoscience, forestry, and environmental studies) at different scientific levels (graduate students, PhD students, postdocs, professors) from all over the world, from as far away as Chile, Peru, Turkey, and Bosnia & Herzegovina. Overall, eight German and 16 non-German participants (20 male, 8 female) took part in this event. During five days the participants were introduced to the theoretical background of hyperspectral remote sensing data and learned in numerous hands-on sessions how to analyse and illustrate spatial data in R. The Spring School was organized within the LIFE Healthy Forest project and supported by the Michael Stifel Center Jena.

In this short blog-post I will give a quick overview of the many, many things we learned during this intense “spatial stats-and-R-week”.

Participants and organizers of the Spring School on “Statistical analysis of hyperspectral and high-dimensional remote-sensing data using R” in Jena, © H. Petschko

Day 1

On the first day of the summer school the participants obtained a theoretical introduction to hyperspectral remote-sensing data with examples focusing on the application of hyperspectral data in forest research.
Marco Peña from the Alberto Hurtado University in Chile gave a lecture on “Introduction to hyperspectral remote sensing” which brought everyone to the same level.
This very comprehensive introduction was followed by a talk on hyperspectral applications exemplified on a study on forests in the Bialowieza Forest in eastern Poland by Aneta Modzelewska from the Forest Research Institute in Raszyn.
The last talk on the first day was by Dr. Henning Buddenbaum (University of Trier) on “Hyperspectral remote sensing for measuring biochemical leaf parameters in forests”.
Dr. Buddenbaum is involved in the Science Advisory Group – Forests and Natural Ecosystems in the EnMAP mission, a German hyperspectral satellite mission aiming at monitoring and characterising the Earth’s environment globally.

Lecture by Prof. A. Brenning on “Statistical and machine learning in remote sensing”, © H. Petschko

Day 2

The second day was filled with hands-on R sessions. In a first session by Patrick Schratz we learned about his “must know” features of R, namely Rmarkdown, the apply-family and pipes.
This was followed by two sessions focusing on the use of R as a GIS. Dr. Jannes Münchow presented RQGIS, a package he developed that provides an interface between R and QGIS and allows the user to access QGIS algorithms from within R.
Afterwards we were introduced to the R package mapview, by its author, Dr. Tim Appelhans. Mapview is a GIS-like interactive graphing tool that is directly accessible within RStudio (or the web browser, if you are not using RStudio). It is especially helpful if you want to quickly do a visual check whether a certain analysis has produced reasonable results.

Solving R-problems with Dr. Jannes Münchow, © H. Petschko

Day 3

The third day started with a lecture and hands-on session on “Statistical and machine learning in remote sensing” by Prof. Alexander Brenning with a focus on linear discriminant analysis, support vector machine and random forest. A short overview of these statistical modeling methods and the application in R including a comprehensive tutorial can be found here.
In the afternoon, Dr. Thomas Bocklitz presented a very different perspective on the application of spectral data analysis, in histopathology. Afterwards, the participants had a chance to discuss their own research involving spatial modeling techniques, or their R problems, with the group and the experts from the GIScience group in Jena.

Open session during Day 3 of the Spring School to discuss research projects of the participants, © H. Petschko

Day 4

On the fourth day, Patrick Schratz briefly introduced the hsdar package developed by Dr. Lukas Lehnert from the University of Marburg. It can be used for processing and analysing hyperspectral data in R.
In his second session, Prof. Brenning focused further on the assessment of model accuracy (non-spatial and spatial validation methods, variable importance) using the sperrorest package, and on dealing with high dimensionality in linear regression.

Discussing sampling designs with Prof. A. Brenning, © H. Petschko

Introduction to parallel processing in R with Patrick Schratz, © H. Petschko

Day 5 (Thuringian Forest excursion)

On the last day, we visited a monitoring site and a site with tornado damage from 2016 (see images below) in the Thuringian Forest, together with three experts from the official authority "ThüringenForst".
In conclusion, the Spring School was a great event with many fruitful hands-on R sessions, during which the participants could learn helpful tricks in R, how to use R as a GIS, and about statistical and machine learning in R. Hopefully there will be more academic "schools" like this one in the future (maybe even with a thematic focus on geomorphology or natural hazards).

Tornado damage in the Thuringian Forest from September 2016 © P. Schratz

Field trip to the Thuringian Forest, © X. Tagle
