R Spatial software blogs and ideas

Automatic units in axis labels

Thu, 09/29/2016 - 02:00

This blog post concerns the development version of units, installed by

devtools::install_github("edzer/units")

[view raw Rmd]

Have you ever tried to properly add measurement units to R plots? It might go like this:

xlab = parse(text = "temperature ~~ group('[', degree * C, ']')")
ylab = parse(text = "speed ~~ group('[', m * ~~ s^-1, ']')")
par(mar = par("mar") + c(0, .3, 0, 0)) # avoids cutting off the superscript
plot(3 + 1:10 + 2 * rnorm(10), xlab = xlab, ylab = ylab)

The main observation is, of course, that it can be done. However,

  • it looks geeky and not quite intuitive,
  • you would typically postpone this work to just before submitting the paper, or during review
  • you need this so infrequently that you tend to forget how it works.

Although well-written help is found in ?plotmath, all three observations cause frustration.

The original paper describing plotmath is by Paul Murrell and Ross Ihaka. R core member Paul Murrell also wrote package grid, part of base R. Few people use it directly, but without it ggplot2 or lattice could not exist.

Automatic unit handling

The new units CRAN package now makes working with units

  • easier
  • automatic, and
  • less error-prone

Here is an example using mtcars. First, we define the imperial units, building on those known in the udunits2 database:

library(units)
gallon = make_unit("gallon")
consumption = mtcars$mpg * with(ud_units, mi/gallon)
displacement = mtcars$disp * ud_units[["in"]]^3

For displacement, we cannot use the normal lookup in the database

displacement = mtcars$disp * with(ud_units, in)

because in (inch) is also a reserved word in R.
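A possible workaround, assuming ud_units behaves like a regular named list (as the examples above suggest), is to quote the reserved word with backticks:

# backtick quoting makes the reserved word a valid name again;
# this is equivalent to the ud_units[["in"]] lookup used above
displacement = mtcars$disp * with(ud_units, `in`)^3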

We convert these values to SI units by

units(displacement) = with(ud_units, cm^3)
units(consumption) = with(ud_units, km/l)
consumption[1:5]
## Units: km/l
## [1] 8.928017 8.928017 9.693276 9.098075 7.950187

Automatic measurement units in axis labels

We can plot these numeric variables of type units by

par(mar = par("mar") + c(0, .1, 0, 0)) # avoids cutting of brackets at lhs plot(displacement, consumption)

The units automatically appear in axis labels! If we want negative powers instead of division bars, we can set a global option:

units_options(negative_power = TRUE) # division becomes ^-1

Expressions such as

1/displacement[1:10]
## Units: cm^-3
## [1] 0.0003813984 0.0003813984 0.0005650347 0.0002365261 0.0001695104
## [6] 0.0002712166 0.0001695104 0.0004159764 0.0004334073 0.0003641035

automatically convert units, which also happens in plots (note the converted units symbols):

par(mar = par("mar") + c(0, .3, 0, 0)) plot(1/displacement, 1/consumption)

How to do this with ggplot?

We can of course plot these data by dropping units:

library(ggplot2)
ggplot() + geom_point(aes(x = as.numeric(displacement), y = as.numeric(consumption)))

but that doesn’t show us units. Giving the units as variables gives an error:

ggplot() + geom_point(aes(x = displacement, y = consumption))
## Don't know how to automatically pick scale for object of type units. Defaulting to continuous.
## Don't know how to automatically pick scale for object of type units. Defaulting to continuous.
## Error in Ops.units(x, range[1]): both operands of the expression should be "units" objects

(I could make that error go away by letting units drop the requirement that in a comparison both sides should have compatible units, which of course would be wrong.)
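The requirement is easy to demonstrate; a small sketch using the ud_units database from above (results indicated in comments):

with(ud_units, 1 * m) < 200                        # error: both operands should be units objects
with(ud_units, 1 * m) < with(ud_units, 200 * cm)   # TRUE: cm is converted to m, then compared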

We can then go all the way with

ggplot() + geom_point(aes(x = as.numeric(displacement), y = as.numeric(consumption))) +
  xlab(make_unit_label("displacement", displacement)) +
  ylab(make_unit_label("consumption", consumption))

which at least doesn’t cut off the left label, but feels too convoluted and error-prone.

Oh ggplot gurus, who can help us out here? How can we obtain that last plot by

ggplot() + geom_point(aes(x = displacement, y = consumption))

?

Update of Dec 2, 2016

Thanks to ggguru Thomas Lin Pedersen, automatic units in axis labels of ggplots are now provided by CRAN package ggforce:

library(ggforce)
ggplot() + geom_point(aes(x = displacement, y = consumption))

and see this vignette for more examples. In addition to printing units in default axis labels, it allows for on-the-fly unit conversion in ggplot expressions:

dm = with(ud_units, dm)
gallon = with(ud_units, gallon)
mi = with(ud_units, mi)
ggplot() + geom_point(aes(x = displacement, y = consumption)) +
  scale_x_unit(unit = dm^3) +
  scale_y_unit(unit = mi/gallon)


The future of R spatial

Mon, 09/26/2016 - 10:00

Last week’s geostat summer school in Albacete was a lot of fun, with about 60 participants and 10 lecturers. Various courses were given on handling, analyzing and modelling spatial and spatiotemporal data, using open source software. Participants came from all kinds of directions, not only the geosciences but also anthropology, epidemiology, and surprisingly many from biology and ecology. Tom Hengl invited us to discuss the future of spatial and spatiotemporal analysis on day 2:

@edzerpebesma talking about the future of spatial and spatiotemporal analysis at #geostat2016 @uclm_inter pic.twitter.com/yfQL2vb5ii

— Rubén G. Mateo (@RubenGMateo) September 20, 2016

In the background of the screen, you see the first AppVeyor (= Windows) build of sf, the simple features for R package. It means that, thanks to Jeroen Ooms and rwinlib, Windows users can now build binary packages that link to GDAL 2.1, GEOS and Proj.4:

Windows users with Rtools installed can now build and install sfr. Opens the way for others to directly Rcpp into gdal2. Ta2 @opencpu !

— Edzer Pebesma (@edzerpebesma) September 21, 2016

Thanks to the efficient well-known-binary interface of sf, and thanks to using C++ and Rcpp, compared to sp the sf package now reads large feature sets much (18 x) faster into much (4 x) smaller objects (benchmark shapefile provided by Robin Lovelace):

> system.time(r <- rgdal::readOGR(".", "gis.osm_buildings_v06"))
OGR data source with driver: ESRI Shapefile
Source: ".", layer: "gis.osm_buildings_v06"
with 487576 features
It has 6 fields
   user  system elapsed
 90.312   0.744  91.053
> object.size(r)
1556312104 bytes
> system.time(s <- sf::st_read(".", "gis.osm_buildings_v06"))
Reading layer gis.osm_buildings_v06 from data source . using driver "ESRI Shapefile"
features:       487576
fields:         6
converted into: MULTIPOLYGON
proj4string:    +proj=longlat +datum=WGS84 +no_defs
   user  system elapsed
  5.100   0.092   5.191
> object.size(s)
410306448 bytes

Raster data

Currently, R package raster is gradually being ported to C++ for efficiency reasons. For reading and writing data through GDAL, it uses rgdal, so when going through a big (cached) raster in C++ it has to go through C++ \(\rightarrow\) R \(\rightarrow\) rgdal \(\rightarrow\) R \(\rightarrow\) C++ for every chunk of data. The current set of raster classes

> library(raster)
Loading required package: sp
> showClass("Raster")
Virtual Class "Raster" [package "raster"]

Slots:

Name:      title    extent   rotated  rotation     ncols     nrows       crs
Class: character    Extent   logical .Rotation   integer   integer       CRS

Name:    history         z
Class:      list      list

Extends: "BasicRaster"

Known Subclasses:
Class "RasterLayer", directly
Class "RasterBrick", directly
Class "RasterStack", directly
Class ".RasterQuad", directly
Class "RasterLayerSparse", by class "RasterLayer", distance 2
Class ".RasterBrickSparse", by class "RasterBrick", distance 2

has grown somewhat ad hoc, and should be replaced by a single class, sketched after this list, that supports

  • one or more layers (bands, attributes)
  • time as a dimension
  • altitude or depth as a dimension (possibly expressed as pressure level)
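
A minimal sketch of what such a class could look like (a hypothetical name and hypothetical slots, not an actual API):

# hypothetical sketch only: a single class replacing the Raster* family,
# with one or more layers and optional time and vertical dimensions
setClass("RasterCube", representation(
  values   = "array",     # dimensions: x, y, layer, time, vertical
  extent   = "numeric",   # xmin, xmax, ymin, ymax
  crs      = "character", # coordinate reference system
  time     = "POSIXct",   # time stamps, if the cube has a time dimension
  vertical = "numeric"    # altitude, depth, or pressure levels
))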
The future

So, what does the future of R spatial look like?

  1. vector data use simple features, now in package sf
  2. raster data get a single, flexible class that generalizes all Raster* classes now in raster and integrates with simple features
  3. vector and raster data share a clear and consistent interface, no more conflicting function names
  4. raster computing directly links to GDAL, but supports distributed computing back ends provided e.g. by SciDB, Google Earth Engine or rasdaman
  5. spatiotemporal classes in spacetime and trajectories build on simple features or raster
  6. support for measurement units
  7. support for strong typing that encourages meaningful computation (see the sketch below).
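
As a taste of point 7, a small sketch of meaningful computation, using the units package described elsewhere in this feed:

library(units)
speed = 1:5 * with(ud_units, m/s)
duration = 10 * with(ud_units, min)
speed * duration # meaningful: minutes simplify against seconds, giving metres
speed + duration # meaningless: m/s and min are not convertible, so this errors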

Exciting times are ahead of us. We need your help!

Reading well-known-binary into R

Thu, 09/01/2016 - 11:00

This blog post describes ways to read binary simple feature data into R, and compares them.

WKB (well-known-binary) is the (ISO) standard binary serialization for simple features. You often see it printed in hexadecimal notation, e.g. in spatially extended databases such as PostGIS:

postgis=# SELECT 'POINT(1 2)'::geometry;
                  geometry
--------------------------------------------
 0101000000000000000000F03F0000000000000040
(1 row)

where the alternative form is the human-readable text (Well-known text) form:

postgis=# SELECT ST_AsText('POINT(1 2)'::geometry);
 st_astext
------------
 POINT(1 2)
(1 row)

In fact, the WKB is the way databases store features in BLOBs (binary large objects). This means that, unlike well-known text, reading well-known binary involves

  • no loss of precision caused by text <-> binary conversion,
  • no conversion of data needed at all (provided the endianness is native)

As a consequence, it should be possible to do this blazingly fast. Also with R? And large data sets?
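
To get a feel for what such a blob contains, here is a minimal sketch (not how the packages below do it) that decodes the POINT(1 2) WKB shown above by hand:

# hex string from the PostGIS example above
hex = "0101000000000000000000F03F0000000000000040"
wkb = as.raw(strtoi(substring(hex, seq(1, nchar(hex), 2),
                                   seq(2, nchar(hex), 2)), 16L))
con = rawConnection(wkb)
readBin(con, "integer", size = 1)                    # 1: little-endian
readBin(con, "integer", size = 4, endian = "little") # geometry type 1: Point
readBin(con, "double", n = 2, endian = "little")     # the coordinates: 1 2
close(con)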

Three software scenarios

I compared three software implementations:

  1. sf::st_as_sfc (of package sf) using C++ to read WKB
  2. sf::st_as_sfc (of package sf) using pure R to read WKB (but C++ to compute bounding box)
  3. wkb::readWKB (of package wkb) using pure R to read features into sp-compatible objects

Note that the results below were obtained after profiling, and implementing expensive parts in C++.

Three geometries

I created three different (sets of) simple features to compare read performance: one large and simple line, one data set with many small lines, and one multi-part line containing many sub-lines (a construction sketch follows the list):

  1. single LINESTRING with many points: a single LINESTRING with one million nodes (points) is read into a single simple feature
  2. many LINESTRINGs with few points: half a million simple features of type LINESTRING are read, each having two nodes (points)
  3. single MULTILINESTRING with many short lines: a single simple feature of type MULTILINESTRING is read, consisting of half a million line segments, each line segment consisting of two points.
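
A rough sketch of how such test geometries can be constructed, using the constructors described in the “Simple features for R, part 2” post below (the actual benchmark code is in demo(bm_wkb)):

library(sf)
pts = cbind(runif(1e6), runif(1e6))                      # a million random points
case1 = LINESTRING(pts)                                  # 1: one line with many points
segs = lapply(1:5e5, function(i) pts[c(2*i - 1, 2*i), ])
case2 = lapply(segs, LINESTRING)                         # 2: many two-point LINESTRINGs
case3 = MULTILINESTRING(segs)                            # 3: one multi-part line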

A reproducible demo-script is found in the sf package here, and can be run by

devtools::install_github("edzer/sfr") demo(bm_wkb)

Reported run times are in seconds, and were obtained by system.time().

single LINESTRING with many points

expression                        user system elapsed
sf::st_as_sfc(.)                 0.032  0.000   0.031
sf::st_as_sfc(., pureR = TRUE)   0.096  0.012   0.110
wkb::readWKB(.)                  8.276  0.000   8.275

We see that for this case both sf implementations are comparable; this is due to the fact that the whole line of 16 Mb is read into R with a single readBin call: C++ can’t do this much faster.

I suspect wkb::readWKB is slower here because instead of reading a complete matrix in one step it makes a million calls to readPoint, and then merges the points read in R. This adds a few million function calls. Since only a single Line is created, not much overhead from sp can take place here.

Function calls, as John Chambers explains in Extending R, have a constant overhead of about 1000 instructions. Having lots of them may become expensive, if each of them does relatively little.
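
A rough way to observe this overhead yourself (timings are machine-dependent):

f = function(x) x + 1
x = as.numeric(1:1e6)
system.time(for (i in x) f(i)) # a million tiny function calls
system.time(x + 1)             # one vectorized call doing the same work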

many LINESTRINGs with few points

expression                          user system elapsed
sf::st_as_sfc(.)                   1.244  0.000   1.243
sf::st_as_sfc(., pureR = TRUE)    55.004  0.056  55.063
wkb::readWKB(.)                  257.092  0.192 257.291

Here we see a strong performance gain of the C++ implementation: all the object creation is done in C++, without R function calls. wkb::readWKB slowness may be largely due to overhead caused by sp: creating Line and Lines objects, object validation, computing bounding box.

I made the C++ and “pureR” implementations considerably faster by moving the bounding box calculation to C++. The C++ implementation was further optimized by moving the type check to C++: if a mix of types is read from a set of WKB objects, sfc will coerce them to a single type (e.g., a set of LINESTRING and MULTILINESTRING will be coerced to all MULTILINESTRING.)

single MULTILINESTRING with many short lines

expression                         user system elapsed
sf::st_as_sfc(.)                  0.348  0.000   0.348
sf::st_as_sfc(., pureR = TRUE)   24.088  0.008  24.100
wkb::readWKB(.)                  87.072  0.004  87.074

Here we see again the cost of function calls: both “pureR” in sf and wkb::readWKB are much slower due to the many function calls; the latter also due to object management and validation in sp.

Discussion

Reading well-known binary spatial data into R can be done pretty elegantly in pure R, but in many scenarios is much faster in C++. We observe speed gains of up to a factor of 250.

Book review: Extending R

Wed, 08/17/2016 - 02:00

“Extending R”, by John M. Chambers. Paperback $69.95, May 24, 2016, Chapman and Hall/CRC; 364 pages, 7 B/W illustrations.

R is a free software environment for statistical computing and graphics. It started as a free implementation of the S language, which was then commercially available as S-Plus, and has over the past decade become the lingua franca of statistics: the main language people use to communicate statistical computation. R’s popularity stems partly from the fact that it is free and open source, and partly from the fact that it is easily extensible: through add-on packages that follow a clearly defined structure, new statistical ideas can be implemented, shared, and used by others. Using R, the computational aspects of research can be communicated in a reproducible way, understood by a large audience.

John Chambers is (co-)author of the four leading books – “brown”, “blue”, “white”, “green” – written between 1984 and 1998, that describe the S language as it evolved and as it is now. He designed it, implemented it, and improved it in all its phases. As a member of the R core team, he is the author of the methods package, part of every R installation, which provides the S4 approach to object orientation.

This book, Extending R, appeared as a volume in “The R Series”. The book is organized in four parts:

  1. Understanding R,
  2. Programming with R,
  3. Object-oriented programming, and
  4. Interfaces.

The first part starts with explaining three principles underlying R:

  • Everything that exists in R is an object
  • Everything that happens in R is a function call (demonstrated below)
  • Interfaces to other software are part of R.
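
The second principle is easily demonstrated at the prompt:

`+`(1, 2)   # the expression 1 + 2 is itself a function call
## [1] 3
`<-`(y, 10) # even assignment is a function call
y
## [1] 10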

These principles form the basis for parts II, III and IV. The first chapter introduces them. Chapter two, “Evolution”, describes the history of the S language, from its earliest days until today: the coming and going of S-Plus, and the arrival and current dominance of R. It also describes the evolution of functional S, and the evolution of object-oriented programming in S. Chapter 3, “R in action”, explains a number of basics of R, such as how function calls work, how objects are implemented, and how the R evaluator works.

Part II, “Programming with R”, discusses functions in depth, explains what objects are and how they are managed, and explains what extension packages do to the R environment. It discusses small, medium and large programming exercises, and what they demand.

Part III, “Object-oriented programming”, largely focuses on the difference between functional object-oriented programming (as implemented in S4) and encapsulated object-oriented programming as implemented in reference classes (similar to C++ and Java), and shows examples of the purposes for which each paradigm is most useful.

Part IV, “Interfaces”, explains the potential and challenges of interfacing R with other programming languages. It discusses several of such interfaces, and describes a general framework for creating such interfaces. As instances of this framework it provides interfaces to the Python and Julia languages, and discusses the existing Rcpp framework.

For whom was this book written? It is clearly not an introductory text, nor a how-to or hands-on book for learning how to program in R or write R packages; for those, it refers to the two volumes Advanced R and R packages, both written by Hadley Wickham. For those with a bit of experience in R programming and a general interest in the language, this book offers a number of new insights and a deeper, often evolutionarily motivated understanding.

Not surprisingly, the book also gives clear advice on how software development should take place: object-oriented, with formally defined classes (S4 or reference classes), and it argues why this is a good idea. One of these arguments is the ability to do method dispatch based on more than one argument. This needs all arguments to be evaluated, and hence does not work well with non-standard evaluation. Many R packages currently promoted by Hadley Wickham and many others (“tidyverse”) favor non-standard evaluation, and restrict themselves to S3. I think both arguments have some merit, and would look forward to a good user study that compares the usability of the two approaches.
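
For readers new to multiple dispatch, a minimal S4 sketch with made-up classes:

setClass("Speed", representation(v = "numeric"))
setClass("Time",  representation(t = "numeric"))
setGeneric("combine", function(a, b) standardGeneric("combine"))
# which method runs depends on the classes of *both* arguments:
setMethod("combine", signature("Speed", "Time"),
          function(a, b) a@v * b@t) # speed x time: a distance
setMethod("combine", signature("Speed", "Speed"),
          function(a, b) a@v + b@v) # two speeds: their sum
combine(new("Speed", v = 3), new("Time", t = 2))  # 6
combine(new("Speed", v = 3), new("Speed", v = 2)) # 5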

Measurement units in R now simplify

Tue, 08/16/2016 - 02:00

[view raw Rmd]

I wrote earlier about the units R package in this blog post. Last weekend I was happily surprised by two large pull requests (1, 2) from Thomas Mailund. He discusses his contribution in this blog.

Essentially, the pull requests enable

  • the handling and definition of user-defined units in R, and
  • automatic simplification of units
How it works

Units now have to be created explicitly, e.g. by

library(units)
m = make_unit("m")
s = make_unit("s")
(a = 1:10 * m/s)
## Units: m/s
##  [1]  1  2  3  4  5  6  7  8  9 10

The units of the udunits2 package are no longer loaded automatically; they are in a database (a list) called ud_units, which is lazy-loaded. So, after

rm("m", "s")

two clean solutions to use them are either

(a = 1:10 * ud_units$m / ud_units$s)
## Units: m/s
##  [1]  1  2  3  4  5  6  7  8  9 10

or

(with(ud_units, a <- 1:10 * m / s))
## Units: m/s
##  [1]  1  2  3  4  5  6  7  8  9 10

and one much less clean solution is to first attach the whole database:

attach(ud_units)
## The following object is masked _by_ .GlobalEnv:
##
##     a
## The following object is masked from package:datasets:
##
##     npk
## The following objects are masked from package:base:
##
##     F, T
(a = 1:10 * m / s)
## Units: m/s
##  [1]  1  2  3  4  5  6  7  8  9 10

Simplification

Simplification not only works when identical units appear in both numerator and denominator:

a = 1:10 * m / s
a * (10 * s)
## Units: m
##  [1]  10  20  30  40  50  60  70  80  90 100

but also when the units in the numerator and denominator are convertible:

a = 1:10 * m / s
a * (10 * min)
## Units: m
##  [1]  600 1200 1800 2400 3000 3600 4200 4800 5400 6000
a / (0.1 * km)
## Units: 1/s
##  [1] 0.01 0.02 0.03 0.04 0.05 0.06 0.07 0.08 0.09 0.10

New units

New units can be created on the fly, and are simplified:

apple = make_unit("apple")
euro = make_unit("euro")
(nr = c(5, 10, 15) * apple)
## Units: apple
## [1]  5 10 15
(cost_per_piece = 0.57 * euro / apple)
## 0.57 euro/apple
(cost = nr * cost_per_piece)
## Units: euro
## [1] 2.85 5.70 8.55

Limitations

Two limitations of the current implementation are

  1. automatic conversion of user-defined units into other user-defined units, or to and from units in the ud_units database, is not supported,
  2. non-integer powers are not (or no longer) supported.

Simple features for R, part 2

Mon, 07/18/2016 - 02:00

[view raw Rmd]

What happened so far?
  • in an earlier blog post I introduced the idea of having simple features mapped directly into simple R objects
  • an R Consortium ISC proposal to implement this got granted
  • during UseR! 2016 I presented this proposal (slides), which we followed up with an open discussion on future directions
  • first steps to implement this in the sf package have finished, and are described below

This blog post describes current progress.

Install & test

You can install package sf directly from github:

library(devtools) # maybe install first?
install_github("edzer/sfr", ref = "16e205f54976bee75c72ac1b54f117868b6fafbc")

If you want to try out read.sf, which reads through GDAL 2.0, you also need my fork of the R package rgdal2, installed by

install_github("edzer/rgdal2")

This, obviously, requires that GDAL 2.0 or later is installed, along with its development files.

After installing, a vignette contains some basic operations, and is shown by

library(sf) vignette("basic") How does it work?

Basic design ideas and constraints have been written in this document.

Simple features are one of the following 17 types: Point, LineString, Polygon, MultiPoint, MultiLineString, MultiPolygon, GeometryCollection, CircularString, CompoundCurve, CurvePolygon, MultiCurve, MultiSurface, Curve, Surface, PolyhedralSurface, TIN, and Triangle. Each type can have 2D points (XY), 3D points (XYZ), 2D points with measure (XYM) and 3D points with measure (XYZM). This leads to 17 x 4 = 68 combinations.

The first seven of these are most common, and have been implemented, allowing for XY, XYZ, XYM and XYZM geometries.

Simple feature instances: sfi

A single simple feature is created by calling the constructor function, along with a modifier in case a three-dimensional geometry has measure “M” as its third dimension:

library(sf)
POINT(c(2,3))
## [1] "POINT(2 3)"
POINT(c(2,3,4))
## [1] "POINT Z(2 3 4)"
POINT(c(2,3,4), "M")
## [1] "POINT M(2 3 4)"
POINT(c(2,3,4,5))
## [1] "POINT ZM(2 3 4 5)"

What is printed is the well-known text representation of the object; the data itself is, however, stored as a regular R vector or matrix:

str(POINT(c(2,3,4), "M"))
## Classes 'POINT M', 'sfi'  num [1:3] 2 3 4
str(LINESTRING(rbind(c(2,2), c(3,3), c(3,2))))
##  LINESTRING [1:3, 1:2] 2 3 3 2 3 2
##  - attr(*, "class")= chr [1:2] "LINESTRING" "sfi"

By using the two simple rules that

  1. sets of points are kept in a matrix
  2. other sets are kept in a list

we end up with the following structures, with increasing complexity:

Sets of points (matrix):

str(LINESTRING(rbind(c(2,2), c(3,3), c(3,2))))
##  LINESTRING [1:3, 1:2] 2 3 3 2 3 2
##  - attr(*, "class")= chr [1:2] "LINESTRING" "sfi"
str(MULTIPOINT(rbind(c(2,2), c(3,3), c(3,2))))
##  MULTIPOINT [1:3, 1:2] 2 3 3 2 3 2
##  - attr(*, "class")= chr [1:2] "MULTIPOINT" "sfi"

Sets of sets of points:

str(MULTILINESTRING(list(rbind(c(2,2), c(3,3), c(3,2)), rbind(c(2,1), c(0,0)))))
## List of 2
##  $ : num [1:3, 1:2] 2 3 3 2 3 2
##  $ : num [1:2, 1:2] 2 0 1 0
##  - attr(*, "class")= chr [1:2] "MULTILINESTRING" "sfi"
outer = matrix(c(0,0,10,0,10,10,0,10,0,0), ncol = 2, byrow = TRUE)
hole1 = matrix(c(1,1,1,2,2,2,2,1,1,1), ncol = 2, byrow = TRUE)
hole2 = matrix(c(5,5,5,6,6,6,6,5,5,5), ncol = 2, byrow = TRUE)
str(POLYGON(list(outer, hole1, hole2)))
## List of 3
##  $ : num [1:5, 1:2] 0 10 10 0 0 0 0 10 10 0
##  $ : num [1:5, 1:2] 1 1 2 2 1 1 2 2 1 1
##  $ : num [1:5, 1:2] 5 5 6 6 5 5 6 6 5 5
##  - attr(*, "class")= chr [1:2] "POLYGON" "sfi"

Sets of sets of sets of points:

pol1 = list(outer, hole1, hole2)
pol2 = list(outer + 12, hole1 + 12)
pol3 = list(outer + 24)
mp = MULTIPOLYGON(list(pol1, pol2, pol3))
str(mp)
## List of 3
##  $ :List of 3
##   ..$ : num [1:5, 1:2] 0 10 10 0 0 0 0 10 10 0
##   ..$ : num [1:5, 1:2] 1 1 2 2 1 1 2 2 1 1
##   ..$ : num [1:5, 1:2] 5 5 6 6 5 5 6 6 5 5
##  $ :List of 2
##   ..$ : num [1:5, 1:2] 12 22 22 12 12 12 12 22 22 12
##   ..$ : num [1:5, 1:2] 13 13 14 14 13 13 14 14 13 13
##  $ :List of 1
##   ..$ : num [1:5, 1:2] 24 34 34 24 24 24 24 34 34 24
##  - attr(*, "class")= chr [1:2] "MULTIPOLYGON" "sfi"

Sets of sets of sets of sets of points:

str(GEOMETRYCOLLECTION(list(MULTIPOLYGON(list(pol1, pol2, pol3)), POINT(c(2,3)))))
## List of 2
##  $ :List of 3
##   ..$ :List of 3
##   .. ..$ : num [1:5, 1:2] 0 10 10 0 0 0 0 10 10 0
##   .. ..$ : num [1:5, 1:2] 1 1 2 2 1 1 2 2 1 1
##   .. ..$ : num [1:5, 1:2] 5 5 6 6 5 5 6 6 5 5
##   ..$ :List of 2
##   .. ..$ : num [1:5, 1:2] 12 22 22 12 12 12 12 22 22 12
##   .. ..$ : num [1:5, 1:2] 13 13 14 14 13 13 14 14 13 13
##   ..$ :List of 1
##   .. ..$ : num [1:5, 1:2] 24 34 34 24 24 24 24 34 34 24
##   ..- attr(*, "class")= chr [1:2] "MULTIPOLYGON" "sfi"
##  $ :Classes 'POINT', 'sfi' num [1:2] 2 3
##  - attr(*, "class")= chr [1:2] "GEOMETRYCOLLECTION" "sfi"

where this is of course a worst case: GEOMETRYCOLLECTION objects with simpler elements have less nesting.

Methods for sfi

The following methods have been implemented for sfi objects:

methods(class = "sfi") ## [1] as.WKT format print ## see '?methods' for accessing help and source code Alternatives to this implementation
  1. Package rgdal2 reads point sets not in a matrix, but into a list with numeric vectors named x and y. This is closer to the GDAL (OGR) data model, and would allow for easier disambiguation of the third dimension (m or z) in case of three-dimensional points. It is more difficult to select a single point, and requires validation that the vector lengths are identical. I’m inclined to keep using matrix for point sets.
  2. Currently, POINT Z is of class c("POINT Z", "sfi"). An alternative would be to have it derive from POINT, i.e. give it class c("POINT Z", "POINT", "sfi"). This would make it easier to write methods for XYZ, XYM and XYZM geometries. This may be worth trying out.
Simple feature list columns: sfc

Collections of simple features can be added together into a list. If all elements of this list

  • are of identical type (have identical class), or are a mix of X and MULTIX (with X being one of POINT, LINESTRING or POLYGON)
  • have an identical coordinate reference system

then they can be combined in a sfc object. This object

  • converts, if needed, X into MULTIX (this is also what PostGIS does),
  • registers the coordinate reference system in attributes epsg and proj4string,
  • has the bounding box in attribute bbox, and updates it after subsetting
ls1 = LINESTRING(rbind(c(2,2), c(3,3), c(3,2)))
ls2 = LINESTRING(rbind(c(5,5), c(4,1), c(1,2)))
sfc = sfc(list(ls1, ls2), epsg = 4326)
attributes(sfc)
## $class
## [1] "sfc"
##
## $type
## [1] "LINESTRING"
##
## $epsg
## [1] 4326
##
## $bbox
## xmin xmax ymin ymax
##    1    5    1    5
##
## $proj4string
## [1] "+init=epsg:4326 +proj=longlat +datum=WGS84 +no_defs +ellps=WGS84 +towgs84=0,0,0"
attributes(sfc[1])
## $class
## [1] "sfc"
##
## $type
## [1] "LINESTRING"
##
## $epsg
## [1] 4326
##
## $bbox
## xmin xmax ymin ymax
##    2    3    2    3
##
## $proj4string
## [1] "+init=epsg:4326 +proj=longlat +datum=WGS84 +no_defs +ellps=WGS84 +towgs84=0,0,0"

The following methods have been implemented for sfc simple feature list columns:

methods(class = "sfc") ## [1] bbox format [ summary ## see '?methods' for accessing help and source code data.frames with simple features: sf

Typical spatial data contain attribute values and attribute geometries. When combined in a table, they can be converted into sf objects, e.g. by

roads = data.frame(widths = c(5, 4.5))
roads$geom = sfc
roads.sf = sf(roads)
roads.sf
##   widths                      geom
## 1    5.0 LINESTRING(2 2, 3 3, 3 2)
## 2    4.5 LINESTRING(5 5, 4 1, 1 2)
summary(roads.sf)
##      widths                 geom
##  Min.   :4.500   LINESTRING   :2
##  1st Qu.:4.625   epsg:4326    :0
##  Median :4.750   +init=epsg...:0
##  Mean   :4.750
##  3rd Qu.:4.875
##  Max.   :5.000
attributes(roads.sf)
## $names
## [1] "widths" "geom"
##
## $row.names
## [1] 1 2
##
## $class
## [1] "sf"         "data.frame"
##
## $sf_column
## geom
##    2
##
## $relation_to_geometry
## widths
##   <NA>
## Levels: field lattice entity

Here, the attribute relation_to_geometry allows documenting how attributes relate to the geometry: are they constant over the geometry (field), aggregated over it (lattice), or do they identify individual entities (buildings, parcels, etc.)?

The following methods have been implemented for sf objects:

methods(class = "sf") ## [1] geometry ## see '?methods' for accessing help and source code Coercion to and from sp

Points, MultiPoints, Lines, MultiLines, Polygons and MultiPolygons can be converted between sf and sp, both ways. A round trip is demonstrated by:

df = data.frame(a = 1)
df$geom = sfc(list(mp))
sf = sf(df)
library(methods)
a = as(sf, "Spatial")
class(a)
## [1] "SpatialPolygonsDataFrame"
## attr(,"package")
## [1] "sp"
b = as.sf(a)
all.equal(sf, b) # round-trip sf-sp-sf
## [1] TRUE
a2 = as(a, "SpatialPolygonsDataFrame")
all.equal(a, a2) # round-trip sp-sf-sp
## [1] TRUE

Reading through GDAL

Function read.sf works, if rgdal2 is installed (see above), and reads simple features through GDAL:

(s = read.sf(system.file("shapes/", package = "maptools"), "sids"))[1:5,]
##    AREA PERIMETER CNTY_ CNTY_ID        NAME  FIPS FIPSNO CRESS_ID BIR74
## 0 0.114     1.442  1825    1825        Ashe 37009  37009        5  1091
## 1 0.061     1.231  1827    1827   Alleghany 37005  37005        3   487
## 2 0.143      1.63  1828    1828       Surry 37171  37171       86  3188
## 3  0.07     2.968  1831    1831   Currituck 37053  37053       27   508
## 4 0.153     2.206  1832    1832 Northampton 37131  37131       66  1421
##   SID74 NWBIR74 BIR79 SID79 NWBIR79                               geom
## 0     1      10  1364     0      19 MULTIPOLYGON(((-81.47275543212 ...
## 1     0      10   542     3      12 MULTIPOLYGON(((-81.23989105224 ...
## 2     5     208  3616     6     260 MULTIPOLYGON(((-80.45634460449 ...
## 3     1     123   830     2     145 MULTIPOLYGON(((-76.00897216796 ...
## 4     9    1066  1606     3    1197 MULTIPOLYGON(((-77.21766662597 ...
summary(s)
##       AREA      PERIMETER      CNTY_       CNTY_ID           NAME
##  0.118  : 4   1.307  : 2   1825   : 1   1825   : 1   Alamance : 1
##  0.091  : 3   1.601  : 2   1827   : 1   1827   : 1   Alexander: 1
##  0.143  : 3   1.68   : 2   1828   : 1   1828   : 1   Alleghany: 1
##  0.07   : 2   1.791  : 2   1831   : 1   1831   : 1   Anson    : 1
##  0.078  : 2   0.999  : 1   1832   : 1   1832   : 1   Ashe     : 1
##  0.08   : 2   1      : 1   1833   : 1   1833   : 1   Avery    : 1
##  (Other):84   (Other):90   (Other):94   (Other):94   (Other)  :94
##       FIPS        FIPSNO      CRESS_ID       BIR74        SID74
##  37001  : 1   37001  : 1   1      : 1   1027   : 1   0      :13
##  37003  : 1   37003  : 1   10     : 1   1035   : 1   4      :13
##  37005  : 1   37005  : 1   100    : 1   1091   : 1   1      :11
##  37007  : 1   37007  : 1   11     : 1   11158  : 1   5      :11
##  37009  : 1   37009  : 1   12     : 1   1143   : 1   2      : 8
##  37011  : 1   37011  : 1   13     : 1   1173   : 1   3      : 6
##  (Other):94   (Other):94   (Other):94   (Other):94   (Other):38
##     NWBIR74       BIR79        SID79       NWBIR79               geom
##  736    : 3   10432  : 1   2      :10   1161   : 2   MULTIPOLYGON:100
##  1      : 2   1059   : 1   0      : 9   5      : 2   epsg:NA     :  0
##  10     : 2   1141   : 1   1      : 9   10     : 1
##  1243   : 2   11455  : 1   4      : 9   1023   : 1
##  134    : 2   1157   : 1   5      : 9   1033   : 1
##  930    : 2   1173   : 1   3      : 6   104    : 1
##  (Other):87   (Other):94   (Other):48   (Other):92

This also shows the abbreviation of long geometries when printed or summarized, provided by the format methods.

The following works for me, with PostGIS installed and data loaded:

(s = read.sf("PG:dbname=postgis", "meuse2"))[1:5,] ## zinc geom ## 1 1022 POINT(181072 333611) ## 2 1141 POINT(181025 333558) ## 3 640 POINT(181165 333537) ## 4 257 POINT(181298 333484) ## 5 269 POINT(181307 333330) summary(s) ## zinc geom ## Min. : 113.0 POINT :155 ## 1st Qu.: 198.0 epsg:NA : 0 ## Median : 326.0 +proj=ster...: 0 ## Mean : 469.7 ## 3rd Qu.: 674.5 ## Max. :1839.0 Still to do/to be decided

The following issues need to be decided upon:

  • reproject sf objects through rgdal2? support well-known-text for CRS? or use PROJ.4 directly?
  • when subsetting attributes from an sf object, make geometry sticky (like sp does), or drop geometry and return a data.frame (data.frame behaviour)?

The following things still need to be done:

  • write simple features through GDAL (using rgdal2)
  • using gdal geometry functions in rgdal2
  • extend rgdal2 to also read XYZ, XYM, and XYZM geometries - my feeling is that this will be easier than modifying rgdal
  • reprojection of sf objects
  • link to GEOS, using GEOS functions: GDAL with GEOS enabled (and rgdal2) has some of this, but not for instance rgeos::gRelate
  • develop better and more complete test cases; also check the OGC test suite
  • improve documentation, add tutorial (vignettes, paper)
  • add plot functions (base, grid)
  • explore direct WKB - sf conversion, without GDAL
  • explore how meaningfulness of operations can be verified when for attributes their relation_to_geometry has been specified

Please let me know if you have any comments, suggestions or questions!
