
Re: Is a simple feature -friendly version of spdep being developed?

Tue, 05/16/2017 - 07:27
Hello, I'm following this conversation with great interest. There's a lot
here for me to work with; it's an active pursuit and I don't have much to
offer yet, but I have a couple of inline responses below. I've learnt a lot
about spdep from this discussion, something I've meant to explore for a long
time but my work has never had the same modelling focus.

On Tue, 16 May 2017 at 18:25 Roger Bivand <[hidden email]> wrote:

> On Tue, 16 May 2017, Tiernan Martin wrote:
>
> >> Should the weights be
> >> squirreled away inside the object flowing down the pipe?
> >
> > I'd propose that the functions be written in such a way that the
> > neighbors, weights, and test results can be stored in list cols within
> the
> > data.frame.
> >
> > This way multiple tests (and their inputs) can be stored in a single,
> > easily comprehensible, rectangular data object.
> >
> > Here's some more pretend code to illustrate that concept with regard to
> sf:
> >
> >  nc %>%
> >    group_by(NAME) %>%
> >    nest %>%
> >    mutate(
> >           geometry = map(.x = data, ~.x[['geometry']]),
> >           neighbors = map(.x = data, .f = my_nb), # creates a list col of
> > the neighbors
> >           weights = map(.x = data, .f = add_weights), # creates a weights
> > object (class=wt, is this a sparse matrix in spdep?)
> >           test_moran = map2(.x = data, .y =  weights, .f = my_moran), #
> > creates list of Moran's I and sample kurtosis of x
> >           test_geary = map2(.x = data, .y = weights, .f = my_geary) #
> > creates list of Geary's C and sample kurtosis of x
> >               )
> >
> > #> # A tibble: 100 x 7
> > #>           NAME              data         geometry  neighbors  weights
> > test_moran test_geary
> > #>        <fctr>            <list>           <list>     <list>   <list>
> > <list>     <list>
> > #>  1        Ashe <tibble [1 x 14]> <simple_feature> <list [2]> <S3: wt>
> > <list [2]> <list [2]>
> > #>  2   Alleghany <tibble [1 x 14]> <simple_feature> <list [5]> <S3: wt>
> > <list [2]> <list [2]>
> > #>  3       Surry <tibble [1 x 14]> <simple_feature> <list [4]> <S3: wt>
> > <list [2]> <list [2]>
> > #>  4   Currituck <tibble [1 x 14]> <simple_feature> <list [2]> <S3: wt>
> > <list [2]> <list [2]>
> > #>  5 Northampton <tibble [1 x 14]> <simple_feature> <list [3]> <S3: wt>
> > <list [2]> <list [2]>
> > #>  6    Hertford <tibble [1 x 14]> <simple_feature> <list [6]> <S3: wt>
> > <list [2]> <list [2]>
> > #>  7      Camden <tibble [1 x 14]> <simple_feature> <list [5]> <S3: wt>
> > <list [2]> <list [2]>
> > #>  8       Gates <tibble [1 x 14]> <simple_feature> <list [2]> <S3: wt>
> > <list [2]> <list [2]>
> > #>  9      Warren <tibble [1 x 14]> <simple_feature> <list [5]> <S3: wt>
> > <list [2]> <list [2]>
> > #> 10      Stokes <tibble [1 x 14]> <simple_feature> <list [2]> <S3: wt>
> > <list [2]> <list [2]>
> > #> # ... with 90 more rows
> >
> > Examples of this nested data.frame / list col approach can be found here:
> > http://r4ds.had.co.nz/many-models.html#nested-data
>
> Thanks, interesting but really scary. You get numeric output for local
> Moran's I and local Geary, but what will happen next?
>
> All of these tests assume a correctly specified mean model. Without this
> assumption being met (absence of trends, absence of global spatial
> autocorrelation in the local test case, absence of missing right-hand-side
> variables and absence of incorrect functional forms for these, ...), a map
> pretending to invite "inference" of "hotspots" etc. - collections of
> observations with similar values of the local statistic, will most often
> be highly misleading. This isn't only about whether it is possible to do
> technically, but also about not making it simpler than reason would
> suggest. With a correct mean model, there may not be any (local) spatial
> autocorrelation left. See ?localmoran.sad and ?localmoran.exact, and their
> references, and McMillen (2003) (the McSpatial author). I won't even
> mention the need to adjust p.values for multiple comparisons ...
>
> >
> >> Extending this to adding weights to sf objects may be going too far. It
> >> seems slightly forced to - say - attach a sparse matrix to an sf
> geometry
> >> column as an attribute, or to add a neighbour nested list column when
> >> writing subsetting protection is very far from obvious. Like time series
> >> data (not time stamped observations, but real ordered time series),
> >> spatial data have implicit order that is tidy in a different way than
> tidy
> >> (graph dependencies between observation rows). Maybe mapped features
> with
> >> attached attribute data are "tidier" than a table, because they would
> >> preserve their relative positions?
> >
> > This seems like the crux of the problem. Are you saying that the
> orthogonal
> > "tidyness" of the tidyverse is fundamentally different from the
> underlying
> > relationships of spatial data, and therefore there's a limit to the
> > usefulness of the tidyverse approach to spatial analysis?
>
> Right, this is central and open. Spatial objects can be tidy in the
> tidyverse sense - sf shows this can be done. The open question is whether
> this extends to the analysis of the data.


I think the structures are still wide open, unless you think the goals of sf
encompass everything. The simple features standard is one corner of the
available ways to structure spatial data. It pointedly ignores relations
between objects and relies heavily on 2D-specific optimizations. All the
topology is done "on the fly" and treated as expendable. I personally want
structures that treat simple features and similar formats as expendable, but
that maintain the topology as a first-class citizen. I don't say this because
I think simple features is unimportant, it's just not my main focus. I want
time, depth, temperature, salinity, anything and everything storable on
features, parts, edges, vertices and nodes. Lots of tools do this, or aspects
of it, and none of them can be shoe-horned into simple features, and they
don't have a unifying framework. (It's called "mesh structures", or the
simplicial complex, or "a database" in other contexts, but none of those is
really enough on its own.)
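
As a rough illustration of that "decompose to tables" idea (this is not the
scsf/PRIMITIVE model itself, just a sketch using only sf and dplyr; the
per-vertex depth value is made up):

library(sf)
library(dplyr)

nc <- st_read(system.file("shape/nc.shp", package = "sf"), quiet = TRUE)

# vertex table: one row per coordinate, keeping the indices sf provides
# (for MULTIPOLYGON input, L1/L2 index rings and parts, L3 indexes the feature)
vertices <- as.data.frame(st_coordinates(nc)) %>%
  mutate(vertex_id = row_number())

# feature table: the attributes, keyed by a feature index
features <- nc %>%
  st_set_geometry(NULL) %>%
  mutate(feature_id = row_number())

# anything can now be stored per vertex, e.g. a made-up depth value
vertices <- vertices %>% mutate(depth = rnorm(n()))

# ordinary relational verbs link the tables back together
vertices %>%
  inner_join(features, by = c("L3" = "feature_id")) %>%
  select(vertex_id, X, Y, depth, NAME) %>%
  head()

Once everything sits in tables like these, the feature geometry is just
another join away, and the vertex/edge level is where the extra variables
can live.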

(I'm dismayed when GIS ideas are spoken about as if they are the end of the
story. sf is really great, but it's now more than the standard, and it's
fairly unique among the available tools for the standard itself; most of
those are pretty specialized and different from each other, probably
unavoidably.)

ggplot2 and tidygraph still have a long way to go, but I believe in there is
a language not just of graphics, or of converting to and from igraph, but of
flexible ways of specifying how spatial data are constructed, analysed and
interacted with. ggplot builds geometries, but it has no output other than a
plot and the object that can modify that plot. Select, group-by, arrange,
filter, nest, and normalization into multiple tables are all part of the
powerful ways in which raw data can be given structure. "Spatial" is just one
small part of that structure, and unfortunately the "geographic spatial"
case, with the 2D optimizations baked in by decades-old GIS practice, seems
to have the most sway in discussions.

This community could find a way to bake in the geometries from ggplot
constructions as sf objects, i.e. convert a ggplot into sf, not into a plot.
I think that's a digestible project that would provide some really valuable
insights.
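
As a very rough sketch of what "convert a ggplot into sf" could mean, here
the coordinates computed by a stat are pulled out of the built plot and
rebuilt as sf lines (the dataset and geom are arbitrary, chosen only because
the stat makes new geometry):

library(ggplot2)
library(sf)

# a plot whose statistic computes new geometry (2D density contours)
p <- ggplot(faithful, aes(eruptions, waiting)) + geom_density_2d()

# take the computed coordinates from the built plot instead of rendering it
d <- ggplot_build(p)$data[[1]]

# rebuild each contour piece as a LINESTRING and collect into an sf object
pieces <- split(d, d$group)
geoms  <- lapply(pieces, function(g) st_linestring(cbind(g$x, g$y)))
contours_sf <- st_sf(group = names(pieces), geometry = st_sfc(geoms))
contours_sf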


> From looking around, it seems
> that time series and graphs suffer from the same kinds of questions, and
> it will be interesting to see what stars turns up for spatio-temporal
> data:
>
> https://github.com/edzer/stars
>
> I see that stars wisely does not include trip/trajectory data structures,
> but analysis might involve reading the array-based representation of
> spatio-temporal environmental drivers for an ensemble of buffers around
> trajectories (say animal movement), then modelling.
>
Interesting that you mention this; I agree it's a crux area in modern
spatial work. Tracking has been neglected for years, but it's starting to be
looked at.

I don't see this as terribly difficult; in fact we do this routinely for all
kinds of tracking data, both as-measured and modelled. The difficulty is
having access to the data as well as the motivation in the same place. We
have in-house tools that sit on large collections of remotely sensed time
series data:

https://github.com/AustralianAntarcticDataCentre/raadsync

When you have both the motivation (lots of animal tracking and voyage data)
and the data on hand, it's relatively easy (we have a ready audience in
Southern Ocean ecosystems research here). The first step is to write a
read-data tool as a function of time. Then the problem of matching tracking
data to the right space-time window breaks down easily into the intervals
between the time series of environmental data. At that level you can employ
interpolation between the layers, aggregation to trade resolution for
coverage, and even bring user-defined calculations to the smallish windows
to calculate derived products (rugosity, slope, distance to thresholds,
etc.).
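
A minimal, self-contained sketch of that time-matching step, with synthetic
raster layers standing in for a real remotely sensed collection (all the
objects here - layer_times, env, track - are made up for illustration):

library(raster)

set.seed(1)
# five synthetic daily layers standing in for a remotely sensed time series
layer_times <- seq(as.POSIXct("2017-01-01", tz = "UTC"), by = "1 day",
                   length.out = 5)
env <- stack(lapply(seq_along(layer_times), function(i) {
  r <- raster(nrows = 50, ncols = 100, xmn = 60, xmx = 80, ymn = -55, ymx = -45)
  setValues(r, rnorm(ncell(r), mean = i))
}))

# a toy track: time of fix, longitude, latitude
track <- data.frame(
  time = as.POSIXct("2017-01-02 06:00:00", tz = "UTC") + c(0, 6, 12) * 3600,
  lon  = c(70.1, 70.4, 70.8),
  lat  = c(-49.5, -49.3, -49.0)
)

# index of the layer at or before each fix ...
i0 <- findInterval(track$time, layer_times)
# ... and a 0-1 weight for where the fix sits between the bracketing layers
w <- as.numeric(difftime(track$time, layer_times[i0], units = "secs")) /
     as.numeric(difftime(layer_times[i0 + 1], layer_times[i0], units = "secs"))

# extract at the fix locations from every layer, then interpolate in time
vals <- extract(env, cbind(track$lon, track$lat))
track$env <- (1 - w) * vals[cbind(seq_len(nrow(track)), i0)] +
                   w * vals[cbind(seq_len(nrow(track)), i0 + 1)]
track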

With our modelled track outputs from tripEstimation/SGAT and bsam[1] we can
use the more probabilistic estimates of location as more nuanced "query
buckets" against the same environmental data; it's the same issue of matching
in time, with all the different options you might want. What is hard for us
is to share the tools, because first you need to arrange access to quite a
lot of data before you get to see the magic in action.
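
Continuing the synthetic sketch above (re-using track, env and i0), with
simulated location samples standing in for real SGAT/bsam posterior draws,
the "query bucket" idea is simply: extract at every sample, then summarise.

# simulated posterior samples of location at each time (stand-in for SGAT/bsam)
n_samples <- 200
buckets <- lapply(seq_len(nrow(track)), function(i) {
  cbind(lon = rnorm(n_samples, track$lon[i], 0.25),
        lat = rnorm(n_samples, track$lat[i], 0.25))
})

# per time step: extract the matching layer at every sample, then summarise
track$env_mean <- mapply(function(xy, i) mean(extract(env[[i]], xy), na.rm = TRUE),
                         buckets, i0)
track$env_sd   <- mapply(function(xy, i) sd(extract(env[[i]], xy), na.rm = TRUE),
                         buckets, i0)
track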

I do think that stars provides a great opportunity though, and my thoughts
along
those lines are here:

https://github.com/mdsumner/stars.use.cases/blob/master/Use-cases.rmd#animal-tracking-mcmc-estimation


[1] https://CRAN.R-project.org/package=bsam



> Maybe: "everything should be as tidy as it can be but not tidier"?
>
>
I don't think we've even started with the housekeeping here. Some early
thoughts are here:

http://rpubs.com/cyclemumner/sc-rationale

I'm interested in helping find ways to nest the spdep idioms in "tidy" ways,
but I'm also not sure how valuable that is yet. When you convert the
structures to primitives, the edge and vertex relations are already inherent,
and it's a matter of standard database join/select/filter idioms to link
things up:

http://rpubs.com/cyclemumner/neighbours01

http://rpubs.com/cyclemumner/neighbours02

(Those examples use https://github.com/mdsumner/scsf for the PRIMITIVE
model, but it's really too raw to use yet.) This is all fine and dandy, but
then you still have a list-bag, i.e. a database, of tables with no strong
idioms for treating them as a single object; that's the real, loftier goal
of tidygraph, and it's certainly a clear goal of the tidyverse to be a
general master of data.
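
To make the "standard database idioms" point concrete, here is a rough
sketch (not the approach in those rpubs notes, and not scsf) that recovers
queen-style contiguity purely with join/filter/distinct, assuming shared
boundaries share identical vertices:

library(sf)
library(dplyr)

nc <- st_read(system.file("shape/nc.shp", package = "sf"), quiet = TRUE)

# vertex table keyed by (rounded) coordinate; L3 is the feature index
v <- as.data.frame(st_coordinates(nc)) %>%
  transmute(feature = L3,
            key = paste(round(X, 6), round(Y, 6))) %>%
  distinct()

# a self-join on the vertex key gives pairs of features sharing a vertex,
# i.e. queen-style contiguity recovered with plain relational verbs
nbrs <- v %>%
  inner_join(v, by = "key", suffix = c("_a", "_b")) %>%
  filter(feature_a != feature_b) %>%
  distinct(feature_a, feature_b) %>%
  arrange(feature_a, feature_b)

head(nbrs)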

Is this a serious option worth pursuing for spdep? I don't know, but I'm
pursuing it for my own reasons, as detailed above. I don't think modernizing
spdep would be that difficult. At the least, the limitations and obstacles
faced when using it should be described.

I'm happy to help and I might have a go at it at some point, and I'd
encourage anyone to get in and try it out. With rlang and the imminent
release of the greatly improved dplyr, it's a good time to start.

Cheers, Mike.



> Roger
>
> >
> > On Mon, May 15, 2017 at 5:17 AM Roger Bivand <[hidden email]>
> wrote:
> >
> >> On Sun, 14 May 2017, Tiernan Martin wrote:
> >>
> >>> You’re right about one thing: those of us introduced to the pipe in our
> >>> formative years find it hard to escape (everything is better here
> inside
> >>> the pipe - right?) ;)
> >>
> >> Pipes can be great if they don't involve extra inputs at a later stage
> in
> >> the pipe, but as:
> >>
> >> https://github.com/thomasp85/tidygraph
> >>
> >> suggests, it isn't very easy, and may be forced.
> >>
> >>>
> >>> Thank you for your very informative response. Before reading it, I had
> a
> >>> vague inkling that matrices were part of the challenge of adapting
> these
> >>> tools to work with simple features, but I also thought perhaps all this
> >>> issue needed was a little nudge from a user like me. Your response made
> >> it
> >>> clear that there is a series of technical decisions that need to be
> made,
> >>> and in order to do that I imagine some exploratory testing is in order.
> >> Of
> >>> course, it is possible that the data.frame + list-col foundation is
> just
> >>> not well-suited to the needs of spatial weighting/testing tools, but I
> >>> suspect that there is some valuable insight to be gained from exploring
> >>> this possibility further. And as I stated in my first post, going
> >> forward I
> >>> expect more users will be interested in seeing this sort of integration
> >>> happen.
> >>
> >> I think that there are two tasks - one to create neighbour objects from
> sf
> >> objects, then another to see whether the costs of putting the spatial
> >> weights into a data.frame (or even more challenging, a tibble) to test
> for
> >> spatial autocorrelation or model spatial dependence. Should the weights
> be
> >> squirreled away inside the object flowing down the pipe?
> >>
> >> sf_object %>% add_weights(...) -> sf_object_with_weights
> >> moran.test(sf_object_with_weights, ...)
> >> geary.test(sf_object_with_weights, ...)
> >>
> >> or
> >>
> >> sf_object %>% add_weights(...) %>% moran.test(sf_object_with_weights,
> ...)
> >> -> output
> >>
> >> all in one (output is a htest object). It gets more complex in:
> >>
> >> lm.morantest(lm(formula, sf_object), sf_object_with_weights)
> >>
> >> as we need the weights and the lm output object.
> >>
> >>>
> >>>> How should we go about this technically? spdep is on R-Forge and is
> >> happy
> >>>> there. Its present functionality has to be maintained as it stands, it
> >>> has
> >>>> too many reverse dependencies to break. Should it be re-written from
> >>>> scratch (with more sparse matrices internally for computation)?
> >>>> ...
> >>>> We'd need proof of concept with realistically sized data sets (not yet
> >> NY
> >>>> taxis, but maybe later ...). spdep started as spweights, sptests and
> >>>> spdep, and the first two got folded into the third when stable. If
> >>> weights
> >>>> are the first thing to go for, sfweights is where to go first (and
> port
> >>>> the STRtree envelope intersections for contiguities). It could define
> >> the
> >>>> new classes, and tests and modelling would use them.
> >>>
> >>> Sounds to me like this package would need to be built from the ground
> up,
> >>> possibly following a similar path to the `sp` development process as
> you
> >>> mentioned. Maybe that presents a challenge with regards to resources
> >> (i.e.,
> >>> funding, time, etc.). Perhaps this project is a candidate for future
> >>> proposals to the R Consortium or other funding sources. I hope that
> this
> >>> post is useful in documenting user interest in seeing the `sf` protocol
> >>> integrated into other spatial tools and workflows, and I look forward
> to
> >>> hearing others' thoughts on the matter.
> >>
> >> Making spatial weights from sf objects is just expanding spdep, and is
> >> feasible. There is no point looking for funding, all it needs is
> knowledge
> >> of how things have been done in spdep. Contiguity, knn, distance,
> >> graph-based neighbours are all feasible.
> >>
> >> Extending this to adding weights to sf objects may be going too far. It
> >> seems slightly forced to - say - attach a sparse matrix to an sf
> geometry
> >> column as an attribute, or to add a neighbour nested list column when
> >> writing subsetting protection is very far from obvious. Like time series
> >> data (not time stamped observations, but real ordered time series),
> >> spatial data have implicit order that is tidy in a different way than
> tidy
> >> (graph dependencies between observation rows). Maybe mapped features
> with
> >> attached attribute data are "tidier" than a table, because they would
> >> preserve their relative positions?
> >>
> >> Roger
> >>
> >>>
> >>> Thanks again,
> >>>
> >>> Tiernan
> >>>
> >>>
> >>>
> >>>
> >>>
> >>> On Sat, May 13, 2017 at 1:39 PM Roger Bivand <[hidden email]>
> >> wrote:
> >>>
> >>>> On Fri, 12 May 2017, Tiernan Martin wrote:
> >>>>
> >>>>> Is anyone thinking about creating an adaptation of the `spdep`
> package
> >>>> that
> >>>>> expects sf-class inputs and works well in a pipeline?
> >>>>
> >>>> I assume that "this is not a pipeline" is a joke for insiders (in the
> >>>> pipe)?
> >>>>
> >>>> No, there is your issue on the sfr github repository that is relevant
> >> for
> >>>> contiguous neighbours, but not beyond that:
> >>>>
> >>>> https://github.com/edzer/sfr/issues/234
> >>>>
> >>>> An sf is a data.frame, and as such should "just work", like "Spatial"
> >>>> objects have, in formula/data settings. The problem is (of course) the
> >>>> weights matrix.
> >>>>
> >>>> Should it be a list column (each row has a list nesting two lists,
> first
> >>>> indices, second non-zero weights), or remain separate as it has been
> for
> >>>> 20 years, or become a column-oriented representation (Matrix package)
> -
> >> a
> >>>> nested list like a list column or a listw object is row-oriented. I
> had
> >>>> started thinking about using a sparse column-oriented representation,
> >> but
> >>>> have not implemented functions accepting them instead of listw
> objects.
> >>>>
> >>>> I am very cautious about creating classes for data that were
> data.frame,
> >>>> then sf, and then have the weights built-in. In the simple case it
> would
> >>>> work, but all you have to do is re-order the rows and the link between
> >> the
> >>>> neighbour ids and row order breaks down; the same applies to
> subsetting.
> >>>>
> >>>> The problems to solve first are related to the workflows, and easiest to
> >> look
> >>>> at in the univariate case (Moran, Geary, join-count, Mantel, G, ...)
> for
> >>>> global and local tests. I think that all or almost all of the NSE
> verbs
> >>>> will cause chaos (mutate, select, group) once weights have been
> created.
> >>>> If there is a way to bind the neighbour IDs to the input sf object
> rows,
> >> a
> >>>> list column might be possible, but we have to permit multiple such
> >> columns
> >>>> (Queen, Rook, ...), and ensure that subsetting and row-reordering keep
> >> the
> >>>> graph relations intact (and their modifications, like
> >> row-standardisation)
> >>>>
> >>>> If you've seen any examples of how tidy relates to graphs (adjacency
> >> lists
> >>>> are like nb objects), we could learn from that.
> >>>>
> >>>> How should we go about this technically? spdep is on R-Forge and is
> >> happy
> >>>> there. Its present functionality has to be maintained as it stands, it
> >> has
> >>>> too many reverse dependencies to break. Should it be re-written from
> >>>> scratch (with more sparse matrices internally for computation)?
> >>>>
> >>>> Your example creates weights on the fly for a local G* map, but has
> >> n=100,
> >>>> not say n=90000 (LA census blocks). Using the sf_ geos based methods
> >> does
> >>>> not use STRtrees, which cut the time needed to find contiguous
> >> neighbours
> >>>> from days to seconds. We ought to pre-compute weights, but this messes
> >> up
> >>>> the flow of data, because a second lot of data (the weights) have to
> >> enter
> >>>> the pipe, and be matched with the row-ids.
> >>>>
> >>>> We'd need proof of concept with realistically sized data sets (not yet
> >> NY
> >>>> taxis, but maybe later ...). spdep started as spweights, sptests and
> >>>> spdep, and the first two got folded into the third when stable. If
> >> weights
> >>>> are the first thing to go for, sfweights is where to go first (and
> port
> >>>> the STRtree envelope intersections for contiguities). It could define
> >> the
> >>>> new classes, and tests and modelling would use them.
> >>>>
> >>>> Thanks for starting this discussion.
> >>>>
> >>>> Roger
> >>>>
> >>>>>
> >>>>> I understand that there is skepticism about the wisdom of adopting
> the
> >>>>> “tidyverse” principles throughout the R package ecosystem, and I
> share
> >>>> the
> >>>>> concern that an over-reliance on any single paradigm could reduce the
> >>>>> resilience and diversity of the system as a whole.
> >>>>>
> >>>>> That said, I believe that the enthusiastic adoption of the `sf`
> package
> >>>> and
> >>>>> the package's connections with widely-used tidyverse packages like
> >>>> `dplyr`
> >>>>> and `ggplot2` may result in increased demand for sf-friendly spatial
> >>>>> analysis tools. As an amateur who recently started using R as my
> >> primary
> >>>>> GIS tool, it seems like the tidyverse's preference for dataframes, S3
> >>>>> objects, list columns, and pipeline workflows would be well-suited to
> >> the
> >>>>> field of spatial analysis. Are there some fundamental reasons why the
> >>>>> `spdep` tools cannot (or should not) be adapted to the tidyverse
> >>>> "dialect"?
> >>>>>
> >>>>> Let me put the question in the context of an actual analysis: in
> >> February
> >>>>> 2017, the pop culture infovis website The Pudding (
> >> https://pudding.cool/
> >>>> )
> >>>>> published an analysis of regional preferences for Oscar-nominated
> films
> >>>> in
> >>>>> the US (https://pudding.cool/2017/02/oscars_so_mapped/). A few days
> >> ago,
> >>>>> the author posted a tutorial explaining the method of “regional
> >>>> smoothing”
> >>>>> used to create the article’s choropleths (
> >>>>> https://pudding.cool/process/regional_smoothing/).
> >>>>>
> >>>>> The method relies on several `spdep` functions (
> >>>>>
> >>>>
> >>
> https://github.com/polygraph-cool/smoothing_tutorial/blob/master/smoothing_tutorial.R
> >>>> ).
> >>>>> In the code below, I provide reprex with a smaller dataset included
> in
> >>>> the
> >>>>> `sf` package:
> >>>>>
> >>>>> library(sf)
> >>>>> library(spdep)
> >>>>>
> >>>>> nc <- st_read(system.file("shape/nc.shp", package = "sf"))  # North
> >>>>> Carolina counties
> >>>>> nc_shp <- as(nc,'Spatial')
> >>>>>
> >>>>> coords <- coordinates(nc_shp)
> >>>>> IDs<-row.names(as(nc_shp, "data.frame"))
> >>>>>
> >>>>> knn5 <- knn2nb(knearneigh(coords, k = 5), row.names = IDs)  # find
> the
> >>>>> nearest neighbors for each county
> >>>>> knn5 <- include.self(knn5)
> >>>>>
> >>>>> localGvalues <- localG(x = as.numeric(nc_shp@data$NWBIR74), listw =
> >>>>> nb2listw(knn5, style = "B"), zero.policy = TRUE) # calculate the G
> >> scores
> >>>>> localGvalues <- round(localGvalues,3)
> >>>>>
> >>>>> nc_shp@data$LOCAL_G <- as.numeric(localGvalues)
> >>>>>
> >>>>> p1 <- spplot(nc_shp, c('NWBIR74'))
> >>>>> p2 <- spplot(nc_shp, c('LOCAL_G'))
> >>>>> plot(p1, split=c(1,1,2,2), more=TRUE)
> >>>>> plot(p2, split=c(1,2,2,2), more=TRUE)
> >>>>>
> >>>>> Here’s what I imagine that would look like in a tidyverse pipeline
> >>>> (please
> >>>>> note that this code is for illustrative purposes and will not run):
> >>>>>
> >>>>> library(tidyverse)
> >>>>> library(purrr)
> >>>>> library(sf)
> >>>>> library(sfdep) # this package doesn't exist (yet)
> >>>>>
> >>>>> nc <- st_read(system.file("shape/nc.shp", package = "sf"))
> >>>>>
> >>>>> nc_g <-
> >>>>>  nc %>%
> >>>>>  mutate(KNN = map(.x = geometry, ~ sfdep::st_knn(.x, k = 5,
> >> include.self
> >>>> =
> >>>>> TRUE)),  # find the nearest neighbors for each county
> >>>>>         NB_LIST = map(.x = KNN, ~ sfdep::st_nb_list(.x, style =
> >> 'B')),  #
> >>>>> make a list of the neighbors using the binary method
> >>>>>         LOCAL_G = sfdep::st_localG(x = NWBIR74, listw = NB_LIST,
> >>>>> zero.policy = TRUE),  # calculate the G scores
> >>>>>         LOCAL_G = round(LOCAL_G,3))
> >>>>>
> >>>>> We can see that the (hypothetical) tidyverse version reduces the
> amount
> >>>> of
> >>>>> intermediate objects and wraps the creation of the G scores into a
> >> single
> >>>>> code chunk with clear steps.
> >>>>>
> >>>>> I'd be grateful to hear from the users and developers of the `spdep`
> >> and
> >>>>> `sf` packages about this topic!
> >>>>>
> >>>>> Tiernan Martin
> >>>>>
> >>>>>       [[alternative HTML version deleted]]
> >>>>>
> >>>>> _______________________________________________
> >>>>> R-sig-Geo mailing list
> >>>>> [hidden email]
> >>>>> https://stat.ethz.ch/mailman/listinfo/r-sig-geo
> >>>>
> >>>> --
> >>>> Roger Bivand
> >>>> Department of Economics, Norwegian School of Economics,
> >>>> Helleveien 30, N-5045 Bergen, Norway.
> >>>> voice: +47 55 95 93 55; e-mail:
> >>>> [hidden email]
> >>>> Editor-in-Chief of The R Journal,
> >> https://journal.r-project.org/index.html
> >>>> http://orcid.org/0000-0003-2392-6140
> >>>> https://scholar.google.no/citations?user=AWeghB0AAAAJ&hl=en
> >>>
> >>
> >> --
> >> Roger Bivand
> >> Department of Economics, Norwegian School of Economics,
> >> Helleveien 30, N-5045 Bergen, Norway.
> >> voice: +47 55 95 93 55; e-mail:
> >> [hidden email]
> >> Editor-in-Chief of The R Journal,
> https://journal.r-project.org/index.html
> >> http://orcid.org/0000-0003-2392-6140
> >> https://scholar.google.no/citations?user=AWeghB0AAAAJ&hl=en
> >
>
> --
> Roger Bivand
> Department of Economics, Norwegian School of Economics,
> Helleveien 30, N-5045 Bergen, Norway.
> voice: +47 55 95 93 55; e-mail:
> [hidden email]
> Editor-in-Chief of The R Journal, https://journal.r-project.org/index.html
> http://orcid.org/0000-0003-2392-6140
> https://scholar.google.no/citations?user=AWeghB0AAAAJ&hl=en
> _______________________________________________
> R-sig-Geo mailing list
> [hidden email]
> https://stat.ethz.ch/mailman/listinfo/r-sig-geo
--
Dr. Michael Sumner
Software and Database Engineer
Australian Antarctic Division
203 Channel Highway
Kingston Tasmania 7050 Australia

        [[alternative HTML version deleted]]

_______________________________________________
R-sig-Geo mailing list
[hidden email]
https://stat.ethz.ch/mailman/listinfo/r-sig-geo

Re: Is a simple feature -friendly version of spdep being developed?

Tue, 05/16/2017 - 03:25
On Tue, 16 May 2017, Tiernan Martin wrote:

>> Should the weights be
>> squirreled away inside the object flowing down the pipe?
>
> I'd propose that the functions be written in such a way that the
> neighbors, weights, and test results can be stored in list cols within the
> data.frame.
>
> This way multiple tests (and their inputs) can be stored in a single,
> easily comprehensible, rectangular data object.
>
> Here's some more pretend code to illustrate that concept with regard to sf:
>
>  nc %>%
>    group_by(NAME) %>%
>    nest %>%
>    mutate(
>           geometry = map(.x = data, ~.x[['geometry']]),
>           neighbors = map(.x = data, .f = my_nb), # creates a list col of
> the neighbors
>           weights = map(.x = data, .f = add_weights), # creates a weights
> object (class=wt, is this a sparse matrix in spdep?)
>           test_moran = map2(.x = data, .y =  weights, .f = my_moran), #
> creates list of Moran's I and sample kurtosis of x
>           test_geary = map2(.x = data, .y = weights, .f = my_geary) #
> creates list of Geary's C and sample kurtosis of x
>               )
>
> #> # A tibble: 100 x 7
> #>           NAME              data         geometry  neighbors  weights
> test_moran test_geary
> #>        <fctr>            <list>           <list>     <list>   <list>
> <list>     <list>
> #>  1        Ashe <tibble [1 x 14]> <simple_feature> <list [2]> <S3: wt>
> <list [2]> <list [2]>
> #>  2   Alleghany <tibble [1 x 14]> <simple_feature> <list [5]> <S3: wt>
> <list [2]> <list [2]>
> #>  3       Surry <tibble [1 x 14]> <simple_feature> <list [4]> <S3: wt>
> <list [2]> <list [2]>
> #>  4   Currituck <tibble [1 x 14]> <simple_feature> <list [2]> <S3: wt>
> <list [2]> <list [2]>
> #>  5 Northampton <tibble [1 x 14]> <simple_feature> <list [3]> <S3: wt>
> <list [2]> <list [2]>
> #>  6    Hertford <tibble [1 x 14]> <simple_feature> <list [6]> <S3: wt>
> <list [2]> <list [2]>
> #>  7      Camden <tibble [1 x 14]> <simple_feature> <list [5]> <S3: wt>
> <list [2]> <list [2]>
> #>  8       Gates <tibble [1 x 14]> <simple_feature> <list [2]> <S3: wt>
> <list [2]> <list [2]>
> #>  9      Warren <tibble [1 x 14]> <simple_feature> <list [5]> <S3: wt>
> <list [2]> <list [2]>
> #> 10      Stokes <tibble [1 x 14]> <simple_feature> <list [2]> <S3: wt>
> <list [2]> <list [2]>
> #> # ... with 90 more rows
>
> Examples of this nested data.frame / list col approach can be found here:
> http://r4ds.had.co.nz/many-models.html#nested-data

Thanks, interesting but really scary. You get numeric output for local
Moran's I and local Geary, but what will happen next?

All of these tests assume a correctly specified mean model. Without this
assumption being met (absence of trends, absence of global spatial
autocorrelation in the local test case, absence of missing right-hand-side
variables and absence of incorrect functional forms for these, ...), a map
pretending to invite "inference" of "hotspots" etc. - collections of
observations with similar values of the local statistic, will most often
be highly misleading. This isn't only about whether it is possible to do
technically, but also about not making it simpler than reason would
suggest. With a correct mean model, there may not be any (local) spatial
autocorrelation left. See ?localmoran.sad and ?localmoran.exact, and their
references, and McMillen (2003) (the McSpatial author). I won't even
mention the need to adjust p.values for multiple comparisons ...
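
To illustrate the multiple-comparison point with the nc reprex from earlier
in the thread (this says nothing about the mean-model issue, it only shows
how much the adjustment matters; the 5% cutoff is arbitrary):

library(sf)
library(spdep)

nc <- st_read(system.file("shape/nc.shp", package = "sf"), quiet = TRUE)
nc_sp <- as(nc, "Spatial")

nb <- poly2nb(nc_sp)                 # contiguity neighbours
lw <- nb2listw(nb, style = "W")      # row-standardised weights

lmi <- localmoran(nc_sp$NWBIR74, lw)

p_raw <- lmi[, 5]                              # the Pr() column of localmoran output
p_adj <- p.adjustSP(p_raw, nb, method = "fdr") # adjust using each region's neighbour count

c(raw = sum(p_raw < 0.05), adjusted = sum(p_adj < 0.05))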

>
>> Extending this to adding weights to sf objects may be going too far. It
>> seems slightly forced to - say - attach a sparse matrix to an sf geometry
>> column as an attribute, or to add a neighbour nested list column when
>> writing subsetting protection is very far from obvious. Like time series
>> data (not time stamped observations, but real ordered time series),
>> spatial data have implicit order that is tidy in a different way than tidy
>> (graph dependencies between observation rows). Maybe mapped features with
>> attached attribute data are "tidier" than a table, because they would
>> preserve their relative positions?
>
> This seems like the crux of the problem. Are you saying that the orthogonal
> "tidyness" of the tidyverse is fundamentally different from the underlying
> relationships of spatial data, and therefore there's a limit to the
> usefulness of the tidyverse approach to spatial analysis?

Right, this is central and open. Spatial objects can be tidy in the
tidyverse sense - sf shows this can be done. The open question is whether
this extends to the analysis of the data. From looking around, it seems
that time series and graphs suffer from the same kinds of questions, and
it will be interesting to see what stars turns up for spatio-temporal
data:

https://github.com/edzer/stars

I see that stars wisely does not include trip/trajectory data structures,
but analysis might involve reading the array-based representation of
spatio-temporal environmental drivers for an ensemble of buffers around
trajectories (say animal movement), then modelling.

Maybe: "everything should be as tidy as it can be but not tidier"?

Roger

>
> On Mon, May 15, 2017 at 5:17 AM Roger Bivand <[hidden email]> wrote:
>
>> On Sun, 14 May 2017, Tiernan Martin wrote:
>>
>>> You’re right about one thing: those of us introduced to the pipe in our
>>> formative years find it hard to escape (everything is better here inside
>>> the pipe - right?) ;)
>>
>> Pipes can be great if they don't involve extra inputs at a later stage in
>> the pipe, but as:
>>
>> https://github.com/thomasp85/tidygraph
>>
>> suggests, it isn't very easy, and may be forced.
>>
>>>
>>> Thank you for your very informative response. Before reading it, I had a
>>> vague inkling that matrices were part of the challenge of adapting these
>>> tools to work with simple features, but I also thought perhaps all this
>>> issue needed was a little nudge from a user like me. Your response made
>> it
>>> clear that there is a series of technical decisions that need to be made,
>>> and in order to do that I imagine some exploratory testing is in order.
>> Of
>>> course, it is possible that the data.frame + list-col foundation is just
>>> not well-suited to the needs of spatial weighting/testing tools, but I
>>> suspect that there is some valuable insight to be gained from exploring
>>> this possibility further. And as I stated in my first post, going
>> forward I
>>> expect more users will be interested in seeing this sort of integration
>>> happen.
>>
>> I think that there are two tasks - one to create neighbour objects from sf
>> objects, then another to see whether the costs of putting the spatial
>> weights into a data.frame (or even more challenging, a tibble) to test for
>> spatial autocorrelation or model spatial dependence. Should the weights be
>> squirreled away inside the object flowing down the pipe?
>>
>> sf_object %>% add_weights(...) -> sf_object_with_weights
>> moran.test(sf_object_with_weights, ...)
>> geary.test(sf_object_with_weights, ...)
>>
>> or
>>
>> sf_object %>% add_weights(...) %>% moran.test(sf_object_with_weights, ...)
>> -> output
>>
>> all in one (output is a htest object). It gets more complex in:
>>
>> lm.morantest(lm(formula, sf_object), sf_object_with_weights)
>>
>> as we need the weights and the lm output object.
>>
>>>
>>>> How should we go about this technically? spdep is on R-Forge and is
>> happy
>>>> there. Its present functionality has to be maintained as it stands, it
>>> has
>>>> too many reverse dependencies to break. Should it be re-written from
>>>> scratch (with more sparse matrices internally for computation)?
>>>> ...
>>>> We'd need proof of concept with realistically sized data sets (not yet
>> NY
>>>> taxis, but maybe later ...). spdep started as spweights, sptests and
>>>> spdep, and the first two got folded into the third when stable. If
>>> weights
>>>> are the first thing to go for, sfweights is where to go first (and port
>>>> the STRtree envelope intersections for contiguities). It could define
>> the
>>>> new classes, and tests and modelling would use them.
>>>
>>> Sounds to me like this package would need to be built from the ground up,
>>> possibly following a similar path to the `sp` development process as you
>>> mentioned. Maybe that presents a challenge with regards to resources
>> (i.e.,
>>> funding, time, etc.). Perhaps this project is a candidate for future
>>> proposals to the R Consortium or other funding sources. I hope that this
>>> post is useful in documenting user interest in seeing the `sf` protocol
>>> integrated into other spatial tools and workflows, and I look forward to
>>> hearing others' thoughts on the matter.
>>
>> Making spatial weights from sf objects is just expanding spdep, and is
>> feasible. There is no point looking for funding, all it needs is knowledge
>> of how things have been done in spdep. Contiguity, knn, distance,
>> graph-based neighbours are all feasible.
>>
>> Extending this to adding weights to sf objects may be going too far. It
>> seems slightly forced to - say - attach a sparse matrix to an sf geometry
>> column as an attribute, or to add a neighbour nested list column when
>> writing subsetting protection is very far from obvious. Like time series
>> data (not time stamped observations, but real ordered time series),
>> spatial data have implicit order that is tidy in a different way than tidy
>> (graph dependencies between observation rows). Maybe mapped features with
>> attached attribute data are "tidier" than a table, because they would
>> preserve their relative positions?
>>
>> Roger
>>
>>>
>>> Thanks again,
>>>
>>> Tiernan
>>>
>>>
>>>
>>>
>>>
>>> On Sat, May 13, 2017 at 1:39 PM Roger Bivand <[hidden email]>
>> wrote:
>>>
>>>> On Fri, 12 May 2017, Tiernan Martin wrote:
>>>>
>>>>> Is anyone thinking about creating an adaptation of the `spdep` package
>>>> that
>>>>> expects sf-class inputs and works well in a pipeline?
>>>>
>>>> I assume that "this is not a pipeline" is a joke for insiders (in the
>>>> pipe)?
>>>>
>>>> No, there is your issue on the sfr github repository that is relevant
>> for
>>>> contiguous neighbours, but not beyond that:
>>>>
>>>> https://github.com/edzer/sfr/issues/234
>>>>
>>>> An sf is a data.frame, and as such should "just work", like "Spatial"
>>>> objects have, in formula/data settings. The problem is (of course) the
>>>> weights matrix.
>>>>
>>>> Should it be a list column (each row has a list nesting two lists, first
>>>> indices, second non-zero weights), or remain separate as it has been for
>>>> 20 years, or become a column-oriented representation (Matrix package) -
>> a
>>>> nested list like a list column or a listw object is row-oriented. I had
>>>> started thinking about using a sparse column-oriented representation,
>> but
>>>> have not implemented functions accepting them instead of listw objects.
>>>>
>>>> I am very cautious about creating classes for data that were data.frame,
>>>> then sf, and then have the weights built-in. In the simple case it would
>>>> work, but all you have to do is re-order the rows and the link between
>> the
>>>> neighbour ids and row order breaks down; the same applies to subsetting.
>>>>
>>>> The problems to solve first are related to the workflows, and easiest to
>> look
>>>> at in the univariate case (Moran, Geary, join-count, Mantel, G, ...) for
>>>> global and local tests. I think that all or almost all of the NSE verbs
>>>> will cause chaos (mutate, select, group) once weights have been created.
>>>> If there is a way to bind the neighbour IDs to the input sf object rows,
>> a
>>>> list column might be possible, but we have to permit multiple such
>> columns
>>>> (Queen, Rook, ...), and ensure that subsetting and row-reordering keep
>> the
>>>> graph relations intact (and their modifications, like
>> row-standardisation)
>>>>
>>>> If you've seen any examples of how tidy relates to graphs (adjacency
>> lists
>>>> are like nb objects), we could learn from that.
>>>>
>>>> How should we go about this technically? spdep is on R-Forge and is
>> happy
>>>> there. Its present functionality has to be maintained as it stands, it
>> has
>>>> too many reverse dependencies to break. Should it be re-written from
>>>> scratch (with more sparse matrices internally for computation)?
>>>>
>>>> Your example creates weights on the fly for a local G* map, but has
>> n=100,
>>>> not say n=90000 (LA census blocks). Using the sf_ geos based methods
>> does
>>>> not use STRtrees, which cut the time needed to find contiguous
>> neighbours
>>>> from days to seconds. We ought to pre-compute weights, but this messes
>> up
>>>> the flow of data, because a second lot of data (the weights) have to
>> enter
>>>> the pipe, and be matched with the row-ids.
>>>>
>>>> We'd need proof of concept with realistically sized data sets (not yet
>> NY
>>>> taxis, but maybe later ...). spdep started as spweights, sptests and
>>>> spdep, and the first two got folded into the third when stable. If
>> weights
>>>> are the first thing to go for, sfweights is where to go first (and port
>>>> the STRtree envelope intersections for contiguities). It could define
>> the
>>>> new classes, and tests and modelling would use them.
>>>>
>>>> Thanks for starting this discussion.
>>>>
>>>> Roger
>>>>
>>>>>
>>>>> I understand that there is skepticism about the wisdom of adopting the
>>>>> “tidyverse” principles throughout the R package ecosystem, and I share
>>>> the
>>>>> concern that an over-reliance on any single paradigm could reduce the
>>>>> resilience and diversity of the system as a whole.
>>>>>
>>>>> That said, I believe that the enthusiastic adoption of the `sf` package
>>>> and
>>>>> the package's connections with widely-used tidyverse packages like
>>>> `dplyr`
>>>>> and `ggplot2` may result in increased demand for sf-friendly spatial
>>>>> analysis tools. As an amateur who recently started using R as my
>> primary
>>>>> GIS tool, it seems like the tidyverse's preference for dataframes, S3
>>>>> objects, list columns, and pipeline workflows would be well-suited to
>> the
>>>>> field of spatial analysis. Are there some fundamental reasons why the
>>>>> `spdep` tools cannot (or should not) be adapted to the tidyverse
>>>> "dialect"?
>>>>>
>>>>> Let me put the question in the context of an actual analysis: in
>> February
>>>>> 2017, the pop culture infovis website The Pudding (
>> https://pudding.cool/
>>>> )
>>>>> published an analysis of regional preferences for Oscar-nominated films
>>>> in
>>>>> the US (https://pudding.cool/2017/02/oscars_so_mapped/). A few days
>> ago,
>>>>> the author posted a tutorial explaining the method of “regional
>>>> smoothing”
>>>>> used to create the article’s choropleths (
>>>>> https://pudding.cool/process/regional_smoothing/).
>>>>>
>>>>> The method relies on several `spdep` functions (
>>>>>
>>>>
>> https://github.com/polygraph-cool/smoothing_tutorial/blob/master/smoothing_tutorial.R
>>>> ).
>>>>> In the code below, I provide reprex with a smaller dataset included in
>>>> the
>>>>> `sf` package:
>>>>>
>>>>> library(sf)
>>>>> library(spdep)
>>>>>
>>>>> nc <- st_read(system.file("shape/nc.shp", package = "sf"))  # North
>>>>> Carolina counties
>>>>> nc_shp <- as(nc,'Spatial')
>>>>>
>>>>> coords <- coordinates(nc_shp)
>>>>> IDs<-row.names(as(nc_shp, "data.frame"))
>>>>>
>>>>> knn5 <- knn2nb(knearneigh(coords, k = 5), row.names = IDs)  # find the
>>>>> nearest neighbors for each county
>>>>> knn5 <- include.self(knn5)
>>>>>
>>>>> localGvalues <- localG(x = as.numeric(nc_shp@data$NWBIR74), listw =
>>>>> nb2listw(knn5, style = "B"), zero.policy = TRUE) # calculate the G
>> scores
>>>>> localGvalues <- round(localGvalues,3)
>>>>>
>>>>> nc_shp@data$LOCAL_G <- as.numeric(localGvalues)
>>>>>
>>>>> p1 <- spplot(nc_shp, c('NWBIR74'))
>>>>> p2 <- spplot(nc_shp, c('LOCAL_G'))
>>>>> plot(p1, split=c(1,1,2,2), more=TRUE)
>>>>> plot(p2, split=c(1,2,2,2), more=TRUE)
>>>>>
>>>>> Here’s what I imagine that would look like in a tidyverse pipeline
>>>> (please
>>>>> note that this code is for illustrative purposes and will not run):
>>>>>
>>>>> library(tidyverse)
>>>>> library(purrr)
>>>>> library(sf)
>>>>> library(sfdep) # this package doesn't exist (yet)
>>>>>
>>>>> nc <- st_read(system.file("shape/nc.shp", package = "sf"))
>>>>>
>>>>> nc_g <-
>>>>>  nc %>%
>>>>>  mutate(KNN = map(.x = geometry, ~ sfdep::st_knn(.x, k = 5,
>> include.self
>>>> =
>>>>> TRUE)),  # find the nearest neighbors for each county
>>>>>         NB_LIST = map(.x = KNN, ~ sfdep::st_nb_list(.x, style =
>> 'B')),  #
>>>>> make a list of the neighbors using the binary method
>>>>>         LOCAL_G = sfdep::st_localG(x = NWBIR74, listw = NB_LIST,
>>>>> zero.policy = TRUE),  # calculate the G scores
>>>>>         LOCAL_G = round(LOCAL_G,3))
>>>>>
>>>>> We can see that the (hypothetical) tidyverse version reduces the amount
>>>> of
>>>>> intermediate objects and wraps the creation of the G scores into a
>> single
>>>>> code chunk with clear steps.
>>>>>
>>>>> I'd be grateful to hear from the users and developers of the `spdep`
>> and
>>>>> `sf` packages about this topic!
>>>>>
>>>>> Tiernan Martin
>>>>>
>>>>>       [[alternative HTML version deleted]]
>>>>>
>>>>> _______________________________________________
>>>>> R-sig-Geo mailing list
>>>>> [hidden email]
>>>>> https://stat.ethz.ch/mailman/listinfo/r-sig-geo
>>>>
>>>> --
>>>> Roger Bivand
>>>> Department of Economics, Norwegian School of Economics,
>>>> Helleveien 30, N-5045 Bergen, Norway.
>>>> voice: +47 55 95 93 55; e-mail:
>>>> [hidden email]
>>>> Editor-in-Chief of The R Journal,
>> https://journal.r-project.org/index.html
>>>> http://orcid.org/0000-0003-2392-6140
>>>> https://scholar.google.no/citations?user=AWeghB0AAAAJ&hl=en
>>>
>>
>> --
>> Roger Bivand
>> Department of Economics, Norwegian School of Economics,
>> Helleveien 30, N-5045 Bergen, Norway.
>> voice: +47 55 95 93 55; e-mail:
>> [hidden email]
>> Editor-in-Chief of The R Journal, https://journal.r-project.org/index.html
>> http://orcid.org/0000-0003-2392-6140
>> https://scholar.google.no/citations?user=AWeghB0AAAAJ&hl=en
> --
Roger Bivand
Department of Economics, Norwegian School of Economics,
Helleveien 30, N-5045 Bergen, Norway.
voice: +47 55 95 93 55; e-mail: [hidden email]
Editor-in-Chief of The R Journal, https://journal.r-project.org/index.html
http://orcid.org/0000-0003-2392-6140
https://scholar.google.no/citations?user=AWeghB0AAAAJ&hl=en
_______________________________________________
R-sig-Geo mailing list
[hidden email]
https://stat.ethz.ch/mailman/listinfo/r-sig-geo

Roger Bivand
Department of Economics
Norwegian School of Economics
Helleveien 30
N-5045 Bergen, Norway

Re: Is a simple feature -friendly version of spdep being developed?

Mon, 05/15/2017 - 18:27
> Should the weights be
> squirreled away inside the object flowing down the pipe?

I'd propose that the functions be written in such a way that the
neighbors, weights, and test results can be stored in list cols within the
data.frame.

This way multiple tests (and their inputs) can be stored in a single,
easily comprehensible, rectangular data object.

Here's some more pretend code to illustrate that concept with regard to sf:

  nc %>%
    group_by(NAME) %>%
    nest %>%
    mutate(
           geometry = map(.x = data, ~.x[['geometry']]),
           neighbors = map(.x = data, .f = my_nb), # creates a list col of
the neighbors
           weights = map(.x = data, .f = add_weights), # creates a weights
object (class=wt, is this a sparse matrix in spdep?)
           test_moran = map2(.x = data, .y =  weights, .f = my_moran), #
creates list of Moran's I and sample kurtosis of x
           test_geary = map2(.x = data, .y = weights, .f = my_geary) #
creates list of Geary's C and sample kurtosis of x
               )

#> # A tibble: 100 x 7
#>           NAME              data         geometry  neighbors  weights
test_moran test_geary
#>        <fctr>            <list>           <list>     <list>   <list>
<list>     <list>
#>  1        Ashe <tibble [1 x 14]> <simple_feature> <list [2]> <S3: wt>
<list [2]> <list [2]>
#>  2   Alleghany <tibble [1 x 14]> <simple_feature> <list [5]> <S3: wt>
<list [2]> <list [2]>
#>  3       Surry <tibble [1 x 14]> <simple_feature> <list [4]> <S3: wt>
<list [2]> <list [2]>
#>  4   Currituck <tibble [1 x 14]> <simple_feature> <list [2]> <S3: wt>
<list [2]> <list [2]>
#>  5 Northampton <tibble [1 x 14]> <simple_feature> <list [3]> <S3: wt>
<list [2]> <list [2]>
#>  6    Hertford <tibble [1 x 14]> <simple_feature> <list [6]> <S3: wt>
<list [2]> <list [2]>
#>  7      Camden <tibble [1 x 14]> <simple_feature> <list [5]> <S3: wt>
<list [2]> <list [2]>
#>  8       Gates <tibble [1 x 14]> <simple_feature> <list [2]> <S3: wt>
<list [2]> <list [2]>
#>  9      Warren <tibble [1 x 14]> <simple_feature> <list [5]> <S3: wt>
<list [2]> <list [2]>
#> 10      Stokes <tibble [1 x 14]> <simple_feature> <list [2]> <S3: wt>
<list [2]> <list [2]>
#> # ... with 90 more rows

 Examples of this nested data.frame / list col approach can be found here:
http://r4ds.had.co.nz/many-models.html#nested-data
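
For what it's worth, here is a rough, runnable approximation of the same
idea using existing spdep classes rather than the hypothetical
my_nb()/add_weights()/wt class above (the neighbours and weights are built
once for the whole set, then stored row-wise):

library(sf)
library(spdep)
library(dplyr)

nc <- st_read(system.file("shape/nc.shp", package = "sf"), quiet = TRUE)

nb <- poly2nb(as(nc, "Spatial"))     # contiguity neighbours for all counties
lw <- nb2listw(nb, style = "W")      # row-standardised weights

nc_lc <- nc %>%
  mutate(neighbors = unclass(nb),    # list col: integer ids of each county's neighbours
         weights   = lw$weights)     # list col: the matching weights

nc_lc %>% select(NAME, neighbors, weights) %>% head()
# note: the ids refer to row positions, so reordering or subsetting rows
# silently breaks the graph unless it is rebuilt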

> Extending this to adding weights to sf objects may be going too far. It
> seems slightly forced to - say - attach a sparse matrix to an sf geometry
> column as an attribute, or to add a neighbour nested list column when
> writing subsetting protection is very far from obvious. Like time series
> data (not time stamped observations, but real ordered time series),
> spatial data have implicit order that is tidy in a different way than tidy
> (graph dependencies between observation rows). Maybe mapped features with
> attached attribute data are "tidier" than a table, because they would
> preserve their relative positions?

This seems like the crux of the problem. Are you saying that the orthogonal
"tidyness" of the tidyverse is fundamentally different from the underlying
relationships of spatial data, and therefore there's a limit to the
usefulness of the tidyverse approach to spatial analysis?

On Mon, May 15, 2017 at 5:17 AM Roger Bivand <[hidden email]> wrote:

> On Sun, 14 May 2017, Tiernan Martin wrote:
>
> > You’re right about one thing: those of us introduced to the pipe in our
> > formative years find it hard to escape (everything is better here inside
> > the pipe - right?) ;)
>
> Pipes can be great if they don't involve extra inputs at a later stage in
> the pipe, but as:
>
> https://github.com/thomasp85/tidygraph
>
> suggests, it isn't very easy, and may be forced.
>
> >
> > Thank you for your very informative response. Before reading it, I had a
> > vague inkling that matrices were part of the challenge of adapting these
> > tools to work with simple features, but I also thought perhaps all this
> > issue needed was a little nudge from a user like me. Your response made
> it
> > clear that there is a series of technical decisions that need to be made,
> > and in order to do that I imagine some exploratory testing is in order.
> Of
> > course, it is possible that the data.frame + list-col foundation is just
> > not well-suited to the needs of spatial weighting/testing tools, but I
> > suspect that there is some valuable insight to be gained from exploring
> > this possibility further. And as I stated in my first post, going
> forward I
> > expect more users will be interested in seeing this sort of integration
> > happen.
>
> I think that there are two tasks - one to create neighbour objects from sf
> objects, then another to see whether the costs of putting the spatial
> weights into a data.frame (or even more challenging, a tibble) to test for
> spatial autocorrelation or model spatial dependence. Should the weights be
> squirreled away inside the object flowing down the pipe?
>
> sf_object %>% add_weights(...) -> sf_object_with_weights
> moran.test(sf_object_with_weights, ...)
> geary.test(sf_object_with_weights, ...)
>
> or
>
> sf_object %>% add_weights(...) %>% moran.test(sf_object_with_weights, ...)
> -> output
>
> all in one (output is a htest object). It gets more complex in:
>
> lm.morantest(lm(formula, sf_object), sf_object_with_weights)
>
> as we need the weights and the lm output object.
>
> >
> >> How should we go about this technically? spdep is on R-Forge and is
> happy
> >> there. Its present functionality has to be maintained as it stands, it
> > has
> >> too many reverse dependencies to break. Should it be re-written from
> >> scratch (with more sparse matrices internally for computation)?
> >> ...
> >> We'd need proof of concept with realistically sized data sets (not yet
> NY
> >> taxis, but maybe later ...). spdep started as spweights, sptests and
> >> spdep, and the first two got folded into the third when stable. If
> > weights
> >> are the first thing to go for, sfweights is where to go first (and port
> >> the STRtree envelope intersections for contiguities). It could define
> the
> >> new classes, and tests and modelling would use them.
> >
> > Sounds to me like this package would need to be built from the ground up,
> > possibly following a similar path to the `sp` development process as you
> > mentioned. Maybe that presents a challenge with regards to resources
> (i.e.,
> > funding, time, etc.). Perhaps this project is a candidate for future
> > proposals to the R Consortium or other funding sources. I hope that this
> > post is useful in documenting user interest in seeing the `sf` protocol
> > integrated into other spatial tools and workflows, and I look forward to
> > hearing others' thoughts on the matter.
>
> Making spatial weights from sf objects is just expanding spdep, and is
> feasible. There is no point looking for funding, all it needs is knowledge
> of how things have been done in spdep. Contiguity, knn, distance,
> graph-based neighbours are all feasible.
>
> Extending this to adding weights to sf objects may be going too far. It
> seems slightly forced to - say - attach a sparse matrix to an sf geometry
> column as an attribute, or to add a neighbour nested list column when
> writing subsetting protection is very far from obvious. Like time series
> data (not time stamped observations, but real ordered time series),
> spatial data have implicit order that is tidy in a different way than tidy
> (graph dependencies between observation rows). Maybe mapped features with
> attached attribute data are "tidier" than a table, because they would
> preserve their relative positions?
>
> Roger
>
> >
> > Thanks again,
> >
> > Tiernan
> >
> >
> >
> >
> >
> > On Sat, May 13, 2017 at 1:39 PM Roger Bivand <[hidden email]>
> wrote:
> >
> >> On Fri, 12 May 2017, Tiernan Martin wrote:
> >>
> >>> Is anyone thinking about creating an adaptation of the `spdep` package
> >> that
> >>> expects sf-class inputs and works well in a pipeline?
> >>
> >> I assume that "this is not a pipeline" is a joke for insiders (in the
> >> pipe)?
> >>
> >> No, there is your issue on the sfr github repository that is relevant
> for
> >> contiguous neighbours, but not beyond that:
> >>
> >> https://github.com/edzer/sfr/issues/234
> >>
> >> An sf is a data.frame, and as such should "just work", like "Spatial"
> >> objects have, in formula/data settings. The problem is (of course) the
> >> weights matrix.
> >>
> >> Should it be a list column (each row has a list nesting two lists, first
> >> indices, second non-zero weights), or remain separate as it has been for
> >> 20 years, or become a column-oriented representation (Matrix package) -
> a
> >> nested list like a list column or a listw object is row-oriented. I had
> >> started thinking about using a sparse column-oriented representation,
> but
> >> have not implemented functions accepting them instead of listw objects.
> >>
> >> I am very cautious about creating classes for data that were data.frame,
> >> then sf, and then have the weights built-in. In the simple case it would
> >> work, but all you have to do is re-order the rows and the link between
> the
> >> neighbour ids and row order breaks down; the same applies to subsetting.
> >>
> >> The problems to solve first are related to the workflows, and easiest to
> look
> >> at in the univariate case (Moran, Geary, join-count, Mantel, G, ...) for
> >> global and local tests. I think that all or almost all of the NSE verbs
> >> will cause chaos (mutate, select, group) once weights have been created.
> >> If there is a way to bind the neighbour IDs to the input sf object rows,
> a
> >> list column might be possible, but we have to permit multiple such
> columns
> >> (Queen, Rook, ...), and ensure that subsetting and row-reordering keep
> the
> >> graph relations intact (and their modifications, like
> row-standardisation)
> >>
> >> If you've seen any examples of how tidy relates to graphs (adjacency
> lists
> >> are like nb objects), we could learn from that.
> >>
> >> How should we go about this technically? spdep is on R-Forge and is
> happy
> >> there. Its present functionality has to be maintained as it stands, it
> has
> >> too many reverse dependencies to break. Should it be re-written from
> >> scratch (with more sparse matrices internally for computation)?
> >>
> >> Your example creates weights on the fly for a local G* map, but has
> n=100,
> >> not say n=90000 (LA census blocks). Using the sf_ geos based methods
> does
> >> not use STRtrees, which cut the time needed to find contiguous
> neighbours
> >> from days to seconds. We ought to pre-compute weights, but this messes
> up
> >> the flow of data, because a second lot of data (the weights) have to
> enter
> >> the pipe, and be matched with the row-ids.
> >>
> >> We'd need proof of concept with realistically sized data sets (not yet
> NY
> >> taxis, but maybe later ...). spdep started as spweights, sptests and
> >> spdep, and the first two got folded into the third when stable. If
> weights
> >> are the first thing to go for, sfweights is where to go first (and port
> >> the STRtree envelope intersections for contiguities). It could define
> the
> >> new classes, and tests and modelling would use them.
> >>
> >> Thanks for starting this discussion.
> >>
> >> Roger
> >>
> >>>
> >>> I understand that there is skepticism about the wisdom of adopting the
> >>> “tidyverse” principles throughout the R package ecosystem, and I share
> >> the
> >>> concern that an over-reliance on any single paradigm could reduce the
> >>> resilience and diversity of the system as a whole.
> >>>
> >>> That said, I believe that the enthusiastic adoption of the `sf` package
> >> and
> >>> the package's connections with widely-used tidyverse packages like
> >> `dplyr`
> >>> and `ggplot2` may result in increased demand for sf-friendly spatial
> >>> analysis tools. As an amateur who recently started using R as my
> primary
> >>> GIS tool, it seems like the tidyverse's preference for dataframes, S3
> >>> objects, list columns, and pipeline workflows would be well-suited to
> the
> >>> field of spatial analysis. Are there some fundamental reasons why the
> >>> `spdep` tools cannot (or should not) be adapted to the tidyverse
> >> "dialect"?
> >>>
> >>> Let me put the question in the context of an actual analysis: in
> February
> >>> 2017, the pop culture infovis website The Pudding (
> https://pudding.cool/
> >> )
> >>> published an analysis of regional preferences for Oscar-nominated films
> >> in
> >>> the US (https://pudding.cool/2017/02/oscars_so_mapped/). A few days
> ago,
> >>> the author posted a tutorial explaining the method of “regional
> >> smoothing”
> >>> used to create the article’s choropleths (
> >>> https://pudding.cool/process/regional_smoothing/).
> >>>
> >>> The method relies on several `spdep` functions (
> >>>
> >>
> https://github.com/polygraph-cool/smoothing_tutorial/blob/master/smoothing_tutorial.R
> >> ).
> >>> In the code below, I provide reprex with a smaller dataset included in
> >> the
> >>> `sf` package:
> >>>
> >>> library(sf)
> >>> library(spdep)
> >>>
> >>> nc <- st_read(system.file("shape/nc.shp", package = "sf"))  # North
> >>> Carolina counties
> >>> nc_shp <- as(nc,'Spatial')
> >>>
> >>> coords <- coordinates(nc_shp)
> >>> IDs<-row.names(as(nc_shp, "data.frame"))
> >>>
> >>> knn5 <- knn2nb(knearneigh(coords, k = 5), row.names = IDs)  # find the
> >>> nearest neighbors for each county
> >>> knn5 <- include.self(knn5)
> >>>
> >>> localGvalues <- localG(x = as.numeric(nc_shp@data$NWBIR74), listw =
> >>> nb2listw(knn5, style = "B"), zero.policy = TRUE) # calculate the G
> scores
> >>> localGvalues <- round(localGvalues,3)
> >>>
> >>> nc_shp@data$LOCAL_G <- as.numeric(localGvalues)
> >>>
> >>> p1 <- spplot(nc_shp, c('NWBIR74'))
> >>> p2 <- spplot(nc_shp, c('LOCAL_G'))
> >>> plot(p1, split=c(1,1,2,2), more=TRUE)
> >>> plot(p2, split=c(1,2,2,2), more=TRUE)
> >>>
> >>> Here’s what I imagine that would look like in a tidyverse pipeline
> >> (please
> >>> note that this code is for illustrative purposes and will not run):
> >>>
> >>> library(tidyverse)
> >>> library(purrr)
> >>> library(sf)
> >>> library(sfdep) # this package doesn't exist (yet)
> >>>
> >>> nc <- st_read(system.file("shape/nc.shp", package = "sf"))
> >>>
> >>> nc_g <-
> >>>  nc %>%
> >>>  mutate(KNN = map(.x = geometry, ~ sfdep::st_knn(.x, k = 5,
> include.self
> >> =
> >>> TRUE)),  # find the nearest neighbors for each county
> >>>         NB_LIST = map(.x = KNN, ~ sfdep::st_nb_list(.x, style =
> 'B')),  #
> >>> make a list of the neighbors using the binary method
> >>>         LOCAL_G = sfdep::st_localG(x = NWBIR74, listw = NB_LIST,
> >>> zero.policy = TRUE),  # calculate the G scores
> >>>         LOCAL_G = round(LOCAL_G,3))
> >>>
> >>> We can see that the (hypothetical) tidyverse version reduces the amount
> >> of
> >>> intermediate objects and wraps the creation of the G scores into a
> single
> >>> code chunk with clear steps.
> >>>
> >>> I'd be grateful to hear from the users and developers of the `spdep`
> and
> >>> `sf` packages about this topic!
> >>>
> >>> Tiernan Martin
> >>>
> >>>       [[alternative HTML version deleted]]
> >>>
> >>> _______________________________________________
> >>> R-sig-Geo mailing list
> >>> [hidden email]
> >>> https://stat.ethz.ch/mailman/listinfo/r-sig-geo
> >>
> >> --
> >> Roger Bivand
> >> Department of Economics, Norwegian School of Economics,
> >> Helleveien 30, N-5045 Bergen, Norway.
> >> voice: +47 55 95 93 55; e-mail: [hidden email]
> >> Editor-in-Chief of The R Journal,
> https://journal.r-project.org/index.html
> >> http://orcid.org/0000-0003-2392-6140
> >> https://scholar.google.no/citations?user=AWeghB0AAAAJ&hl=en
> >
>
> --
> Roger Bivand
> Department of Economics, Norwegian School of Economics,
> Helleveien 30, N-5045 Bergen, Norway.
> voice: +47 55 95 93 55; e-mail: [hidden email]
> Editor-in-Chief of The R Journal, https://journal.r-project.org/index.html
> http://orcid.org/0000-0003-2392-6140
> https://scholar.google.no/citations?user=AWeghB0AAAAJ&hl=en
        [[alternative HTML version deleted]]

_______________________________________________
R-sig-Geo mailing list
[hidden email]
https://stat.ethz.ch/mailman/listinfo/r-sig-geo

Re: Is a simple feature -friendly version of spdep being developed?

Mon, 05/15/2017 - 07:16
On Sun, 14 May 2017, Tiernan Martin wrote:

> You’re right about one thing: those of us introduced to the pipe in our
> formative years find it hard to escape (everything is better here inside
> the pipe - right?) ;)

Pipes can be great if they don't involve extra inputs at a later stage in
the pipe, but as:

https://github.com/thomasp85/tidygraph

suggests, it isn't very easy, and may be forced.

>
> Thank you for your very informative response. Before reading it, I had a
> vague inkling that matrices were part of the challenge of adapting these
> tools to work with simple features, but I also thought perhaps all this
> issue needed was a little nudge from a user like me. Your response made it
> clear that there is a series of technical decisions that need to be made,
> and in order to do that I imagine some exploratory testing is in order. Of
> course, it is possible that the data.frame + list-col foundation is just
> not well-suited to the needs of spatial weighting/testing tools, but I
> suspect that there is some valuable insight to be gained from exploring
> this possibility further. And as I stated in my first post, going forward I
> expect more users will be interested in seeing this sort of integration
> happen.

I think that there are two tasks - one to create neighbour objects from sf
objects, then another to see whether the costs of putting the spatial
weights into a data.frame (or, even more challenging, a tibble) are worth it
when testing for spatial autocorrelation or modelling spatial dependence.
Should the weights be squirreled away inside the object flowing down the pipe?

sf_object %>% add_weights(...) -> sf_object_with_weights
moran.test(sf_object_with_weights, ...)
geary.test(sf_object_with_weights, ...)

or

sf_object %>% add_weights(...) %>% moran.test(...) -> output

all in one (output is an htest object). It gets more complex in:

lm.morantest(lm(formula, sf_object), sf_object_with_weights)

as we need the weights and the lm output object.
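
A rough sketch of one way the weights could ride along with the object, using
hypothetical helpers add_weights()/get_weights() (neither exists in spdep or
sf); attributes survive a plain pipe, though not dplyr verbs in general:

library(sf)
library(spdep)
library(magrittr)

add_weights <- function(x, ...) {
  nb <- poly2nb(as(x, "Spatial"), ...)          # contiguity neighbours
  attr(x, "listw") <- nb2listw(nb, style = "W") # stash the weights on the object
  x                                             # keep the sf object flowing
}
get_weights <- function(x) attr(x, "listw")

nc <- st_read(system.file("shape/nc.shp", package = "sf"))
nc_w <- nc %>% add_weights()
moran.test(nc_w$NWBIR74, get_weights(nc_w))
lm.morantest(lm(NWBIR74 ~ BIR74, data = nc_w), get_weights(nc_w)) # both objects still needed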

>
>> How should we go about this technically? spdep is on R-Forge and is happy
>> there. Its present functionality has to be maintained as it stands, it
> has
>> too many reverse dependencies to break. Should it be re-written from
>> scratch (with more sparse matrices internally for computation)?
>> ...
>> We'd need proof of concept with realistically sized data sets (not yet NY
>> taxis, but maybe later ...). spdep started as spweights, sptests and
>> spdep, and the first two got folded into the third when stable. If
> weights
>> are the first thing to go for, sfweights is where to go first (and port
>> the STRtree envelope intersections for contiguities). It could define the
>> new classes, and tests and modelling would use them.
>
> Sounds to me like this package would need to be built from the ground up,
> possibly following a similar path to the `sp` development process as you
> mentioned. Maybe that presents a challenge with regards to resources (i.e.,
> funding, time, etc.). Perhaps project this is a candidate for future
> proposals to the R Consortium or other funding sources. I hope that this
> post is useful in documenting user interest in seeing the `sf` protocol
> integrated into other spatial tools and workflows, and I look forward to
> hearing others' thoughts on the matter.

Making spatial weights from sf objects is just expanding spdep, and is
feasible. There is no point looking for funding; all it needs is knowledge
of how things have been done in spdep. Contiguity, knn, distance and
graph-based neighbours are all feasible.

Extending this to adding weights to sf objects may be going too far. It
seems slightly forced to - say - attach a sparse matrix to an sf geometry
column as an attribute, or to add a nested neighbour list column, when how
to protect it against subsetting is far from obvious. Like time series data
(not time-stamped observations, but real ordered time series), spatial data
have an implicit order, and are "tidy" in a different way from tidy data
(there are graph dependencies between observation rows). Maybe mapped
features with attached attribute data are "tidier" than a table, because
they would preserve their relative positions?
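
A minimal sketch of the nested neighbour list column idea, and of how
subsetting leaves its indices pointing at the wrong rows:

library(sf)
nc <- st_read(system.file("shape/nc.shp", package = "sf"))
nc$nb <- st_touches(nc, nc) # queen-style contiguity as a sparse list of row indices
nc$nb[[1]]                  # neighbours of row 1, as positions in the full table
nc_sub <- nc[1:50, ]        # the list column survives, but its stored positions
                            # no longer refer to rows of nc_sub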

Roger

>
> Thanks again,
>
> Tiernan
>
>
>
>
>
> On Sat, May 13, 2017 at 1:39 PM Roger Bivand <[hidden email]> wrote:
>
>> On Fri, 12 May 2017, Tiernan Martin wrote:
>>
>>> Is anyone thinking about creating an adaptation of the `spdep` package
>> that
>>> expects sf-class inputs and works well in a pipeline?
>>
>> I assume that "this is not a pipeline" is a joke for insiders (in the
>> pipe)?
>>
>> No, there is your issue on the sfr github repository that is relevant for
>> contiguous neighbours, but not beyond that:
>>
>> https://github.com/edzer/sfr/issues/234
>>
>> An sf is a data.frame, and as such should "just work", like "Spatial"
>> objects have, in formula/data settings. The problem is (of course) the
>> weights matrix.
>>
>> Should it be a list column (each row has a list nesting two lists, first
>> indices, second non-zero weights), or remain separate as it has been for
>> 20 years, or become a column-oriented representation (Matrix package) - a
>> nested list like a list column or a listw obeject is row-oriented. I had
>> started thinking about using a sparse column-oriented representation, but
>> have not implemented functions accepting them instead of listw objects.
>>
>> I am very cautious about creating classes for data that were data.frame,
>> then sf, and then have the weights built-in. In the simple case it would
>> work, but all you have to do is re-order the rows and the link between the
>> neighbour ids and row order breaks down; the same applies to subsetting.
>>
>> The problems to solve first are related the workflows, and easiest to look
>> at in the univariate case (Moran, Geary, join-count, Mantel, G, ...) for
>> global and local tests. I think that all or almost all of the NSE verbs
>> will cause chaos (mutate, select, group) once weights have been created.
>> If there is a way to bind the neighour IDs to the input sf object rows, a
>> list column might be possible, but we have to permit multiple such columns
>> (Queen, Rook, ...), and ensure that subsetting and row-reordering keep the
>> graph relations intact (and their modifications, like row-standardisation)
>>
>> If you've seen any examples of how tidy relates to graphs (adjacency lists
>> are like nb objects), we could learn from that.
>>
>> How should we go about this technically? spdep is on R-Forge and is happy
>> there. Its present functionality has to be maintained as it stands, it has
>> too many reverse dependencies to break. Should it be re-written from
>> scratch (with more sparse matrices internally for computation)?
>>
>> Your example creates weights on the fly for a local G* map, but has n=100,
>> not say n=90000 (LA census blocks). Using the sf_ geos based methods does
>> not use STRtrees, which cut the time needed to find contigious neighbours
>> from days to seconds. We ought to pre-compute weights, but this messes up
>> the flow of data, because a second lot of data (the weights) have to enter
>> the pipe, and be matched with the row-ids.
>>
>> We'd need proof of concept with realistically sized data sets (not yet NY
>> taxis, but maybe later ...). spdep started as spweights, sptests and
>> spdep, and the first two got folded into the third when stable. If weights
>> are the first thing to go for, sfweights is where to go first (and port
>> the STRtree envelope intersections for contiguities). It could define the
>> new classes, and tests and modelling would use them.
>>
>> Thanks for starting this discussion.
>>
>> Roger
>>
>>>
>>> I understand that there is skepticism about the wisdom of adopting the
>>> “tidyverse” principles throughout the R package ecosystem, and I share
>> the
>>> concern that an over-reliance on any single paradigm could reduce the
>>> resilience and diversity of the system as a whole.
>>>
>>> That said, I believe that the enthusiastic adoption of the `sf` package
>> and
>>> the package's connections with widely-used tidyverse packages like
>> `dplyr`
>>> and `ggplot2` may result in increased demand for sf-friendly spatial
>>> analysis tools. As an amateur who recently started using R as my primary
>>> GIS tool, it seems like the tidyverse's preference for dataframes, S3
>>> objects, list columns, and pipeline workflows would be well-suited to the
>>> field of spatial analysis. Are there some fundamental reasons why the
>>> `spdep` tools cannot (or should not) be adapted to the tidyverse
>> "dialect"?
>>>
>>> Let me put the question in the context of an actual analysis: in February
>>> 2017, the pop culture infovis website The Pudding (https://pudding.cool/
>> )
>>> published an analysis of regional preferences for Oscar-nominated films
>> in
>>> the US (https://pudding.cool/2017/02/oscars_so_mapped/). A few days ago,
>>> the author posted a tutorial explaining the method of “regional
>> smoothing”
>>> used to create the article’s choropleths (
>>> https://pudding.cool/process/regional_smoothing/).
>>>
>>> The method relies on several `spdep` functions (
>>>
>> https://github.com/polygraph-cool/smoothing_tutorial/blob/master/smoothing_tutorial.R
>> ).
>>> In the code below, I provide reprex with a smaller dataset included in
>> the
>>> `sf` package:
>>>
>>> library(sf)
>>> library(spdep)
>>>
>>> nc <- st_read(system.file("shape/nc.shp", package = "sf"))  # North
>>> Carolina counties
>>> nc_shp <- as(nc,'Spatial')
>>>
>>> coords <- coordinates(nc_shp)
>>> IDs<-row.names(as(nc_shp, "data.frame"))
>>>
>>> knn5 <- knn2nb(knearneigh(coords, k = 5), row.names = IDs)  # find the
>>> nearest neighbors for each county
>>> knn5 <- include.self(knn5)
>>>
>>> localGvalues <- localG(x = as.numeric(nc_shp@data$NWBIR74), listw =
>>> nb2listw(knn5, style = "B"), zero.policy = TRUE) # calculate the G scores
>>> localGvalues <- round(localGvalues,3)
>>>
>>> nc_shp@data$LOCAL_G <- as.numeric(localGvalues)
>>>
>>> p1 <- spplot(nc_shp, c('NWBIR74'))
>>> p2 <- spplot(nc_shp, c('LOCAL_G'))
>>> plot(p1, split=c(1,1,2,2), more=TRUE)
>>> plot(p2, split=c(1,2,2,2), more=TRUE)
>>>
>>> Here’s what I imagine that would look like in a tidyverse pipeline
>> (please
>>> note that this code is for illustrative purposes and will not run):
>>>
>>> library(tidyverse)
>>> library(purrr)
>>> library(sf)
>>> library(sfdep) # this package doesn't exist (yet)
>>>
>>> nc <- st_read(system.file("shape/nc.shp", package = "sf"))
>>>
>>> nc_g <-
>>>  nc %>%
>>>  mutate(KNN = map(.x = geometry, ~ sfdep::st_knn(.x, k = 5, include.self
>> =
>>> TRUE)),  # find the nearest neighbors for each county
>>>         NB_LIST = map(.x = KNN, ~ sfdep::st_nb_list(.x, style = 'B')),  #
>>> make a list of the neighbors using the binary method
>>>         LOCAL_G = sfdep::st_localG(x = NWBIR74, listw = NB_LIST,
>>> zero.policy = TRUE),  # calculate the G scores
>>>         LOCAL_G = round(LOCAL_G,3))
>>>
>>> We can see that the (hypothetical) tidyverse version reduces the amount
>> of
>>> intermediate objects and wraps the creation of the G scores into a single
>>> code chunk with clear steps.
>>>
>>> I'd be grateful to hear from the users and developers of the `spdep` and
>>> `sf` packages about this topic!
>>>
>>> Tiernan Martin
>>>
>>>       [[alternative HTML version deleted]]
>>>
>>> _______________________________________________
>>> R-sig-Geo mailing list
>>> [hidden email]
>>> https://stat.ethz.ch/mailman/listinfo/r-sig-geo
>>
>> --
>> Roger Bivand
>> Department of Economics, Norwegian School of Economics,
>> Helleveien 30, N-5045 Bergen, Norway.
>> voice: +47 55 95 93 55; e-mail: [hidden email]
>> Editor-in-Chief of The R Journal, https://journal.r-project.org/index.html
>> http://orcid.org/0000-0003-2392-6140
>> https://scholar.google.no/citations?user=AWeghB0AAAAJ&hl=en
> --
Roger Bivand
Department of Economics, Norwegian School of Economics,
Helleveien 30, N-5045 Bergen, Norway.
voice: +47 55 95 93 55; e-mail: [hidden email]
Editor-in-Chief of The R Journal, https://journal.r-project.org/index.html
http://orcid.org/0000-0003-2392-6140
https://scholar.google.no/citations?user=AWeghB0AAAAJ&hl=en
_______________________________________________
R-sig-Geo mailing list
[hidden email]
https://stat.ethz.ch/mailman/listinfo/r-sig-geo

Roger Bivand
Department of Economics
Norwegian School of Economics
Helleveien 30
N-5045 Bergen, Norway

Re: Is a simple feature -friendly version of spdep being developed?

Sun, 05/14/2017 - 14:39
You’re right about one thing: those of us introduced to the pipe in our
formative years find it hard to escape (everything is better here inside
the pipe - right?) ;)

Thank you for your very informative response. Before reading it, I had a
vague inkling that matrices were part of the challenge of adapting these
tools to work with simple features, but I also thought perhaps all this
issue needed was a little nudge from a user like me. Your response made it
clear that there is a series of technical decisions that need to be made,
and in order to do that I imagine some exploratory testing is in order. Of
course, it is possible that the data.frame + list-col foundation is just
not well-suited to the needs of spatial weighting/testing tools, but I
suspect that there is some valuable insight to be gained from exploring
this possibility further. And as I stated in my first post, going forward I
expect more users will be interested in seeing this sort of integration
happen.

> How should we go about this technically? spdep is on R-Forge and is happy
> there. Its present functionality has to be maintained as it stands, it
has
> too many reverse dependencies to break. Should it be re-written from
> scratch (with more sparse matrices internally for computation)?
> ...
> We'd need proof of concept with realistically sized data sets (not yet NY
> taxis, but maybe later ...). spdep started as spweights, sptests and
> spdep, and the first two got folded into the third when stable. If
weights
> are the first thing to go for, sfweights is where to go first (and port
> the STRtree envelope intersections for contiguities). It could define the
> new classes, and tests and modelling would use them.

Sounds to me like this package would need to be built from the ground up,
possibly following a similar path to the `sp` development process as you
mentioned. Maybe that presents a challenge with regard to resources (i.e.,
funding, time, etc.). Perhaps this project is a candidate for future
proposals to the R Consortium or other funding sources. I hope that this
post is useful in documenting user interest in seeing the `sf` protocol
integrated into other spatial tools and workflows, and I look forward to
hearing others' thoughts on the matter.

Thanks again,

Tiernan





On Sat, May 13, 2017 at 1:39 PM Roger Bivand <[hidden email]> wrote:

> On Fri, 12 May 2017, Tiernan Martin wrote:
>
> > Is anyone thinking about creating an adaptation of the `spdep` package
> that
> > expects sf-class inputs and works well in a pipeline?
>
> I assume that "this is not a pipeline" is a joke for insiders (in the
> pipe)?
>
> No, there is your issue on the sfr github repository that is relevant for
> contiguous neighbours, but not beyond that:
>
> https://github.com/edzer/sfr/issues/234
>
> An sf is a data.frame, and as such should "just work", like "Spatial"
> objects have, in formula/data settings. The problem is (of course) the
> weights matrix.
>
> Should it be a list column (each row has a list nesting two lists, first
> indices, second non-zero weights), or remain separate as it has been for
> 20 years, or become a column-oriented representation (Matrix package) - a
> nested list like a list column or a listw obeject is row-oriented. I had
> started thinking about using a sparse column-oriented representation, but
> have not implemented functions accepting them instead of listw objects.
>
> I am very cautious about creating classes for data that were data.frame,
> then sf, and then have the weights built-in. In the simple case it would
> work, but all you have to do is re-order the rows and the link between the
> neighbour ids and row order breaks down; the same applies to subsetting.
>
> The problems to solve first are related the workflows, and easiest to look
> at in the univariate case (Moran, Geary, join-count, Mantel, G, ...) for
> global and local tests. I think that all or almost all of the NSE verbs
> will cause chaos (mutate, select, group) once weights have been created.
> If there is a way to bind the neighour IDs to the input sf object rows, a
> list column might be possible, but we have to permit multiple such columns
> (Queen, Rook, ...), and ensure that subsetting and row-reordering keep the
> graph relations intact (and their modifications, like row-standardisation)
>
> If you've seen any examples of how tidy relates to graphs (adjacency lists
> are like nb objects), we could learn from that.
>
> How should we go about this technically? spdep is on R-Forge and is happy
> there. Its present functionality has to be maintained as it stands, it has
> too many reverse dependencies to break. Should it be re-written from
> scratch (with more sparse matrices internally for computation)?
>
> Your example creates weights on the fly for a local G* map, but has n=100,
> not say n=90000 (LA census blocks). Using the sf_ geos based methods does
> not use STRtrees, which cut the time needed to find contigious neighbours
> from days to seconds. We ought to pre-compute weights, but this messes up
> the flow of data, because a second lot of data (the weights) have to enter
> the pipe, and be matched with the row-ids.
>
> We'd need proof of concept with realistically sized data sets (not yet NY
> taxis, but maybe later ...). spdep started as spweights, sptests and
> spdep, and the first two got folded into the third when stable. If weights
> are the first thing to go for, sfweights is where to go first (and port
> the STRtree envelope intersections for contiguities). It could define the
> new classes, and tests and modelling would use them.
>
> Thanks for starting this discussion.
>
> Roger
>
> >
> > I understand that there is skepticism about the wisdom of adopting the
> > “tidyverse” principles throughout the R package ecosystem, and I share
> the
> > concern that an over-reliance on any single paradigm could reduce the
> > resilience and diversity of the system as a whole.
> >
> > That said, I believe that the enthusiastic adoption of the `sf` package
> and
> > the package's connections with widely-used tidyverse packages like
> `dplyr`
> > and `ggplot2` may result in increased demand for sf-friendly spatial
> > analysis tools. As an amateur who recently started using R as my primary
> > GIS tool, it seems like the tidyverse's preference for dataframes, S3
> > objects, list columns, and pipeline workflows would be well-suited to the
> > field of spatial analysis. Are there some fundamental reasons why the
> > `spdep` tools cannot (or should not) be adapted to the tidyverse
> "dialect"?
> >
> > Let me put the question in the context of an actual analysis: in February
> > 2017, the pop culture infovis website The Pudding (https://pudding.cool/
> )
> > published an analysis of regional preferences for Oscar-nominated films
> in
> > the US (https://pudding.cool/2017/02/oscars_so_mapped/). A few days ago,
> > the author posted a tutorial explaining the method of “regional
> smoothing”
> > used to create the article’s choropleths (
> > https://pudding.cool/process/regional_smoothing/).
> >
> > The method relies on several `spdep` functions (
> >
> https://github.com/polygraph-cool/smoothing_tutorial/blob/master/smoothing_tutorial.R
> ).
> > In the code below, I provide reprex with a smaller dataset included in
> the
> > `sf` package:
> >
> > library(sf)
> > library(spdep)
> >
> > nc <- st_read(system.file("shape/nc.shp", package = "sf"))  # North
> > Carolina counties
> > nc_shp <- as(nc,'Spatial')
> >
> > coords <- coordinates(nc_shp)
> > IDs<-row.names(as(nc_shp, "data.frame"))
> >
> > knn5 <- knn2nb(knearneigh(coords, k = 5), row.names = IDs)  # find the
> > nearest neighbors for each county
> > knn5 <- include.self(knn5)
> >
> > localGvalues <- localG(x = as.numeric(nc_shp@data$NWBIR74), listw =
> > nb2listw(knn5, style = "B"), zero.policy = TRUE) # calculate the G scores
> > localGvalues <- round(localGvalues,3)
> >
> > nc_shp@data$LOCAL_G <- as.numeric(localGvalues)
> >
> > p1 <- spplot(nc_shp, c('NWBIR74'))
> > p2 <- spplot(nc_shp, c('LOCAL_G'))
> > plot(p1, split=c(1,1,2,2), more=TRUE)
> > plot(p2, split=c(1,2,2,2), more=TRUE)
> >
> > Here’s what I imagine that would look like in a tidyverse pipeline
> (please
> > note that this code is for illustrative purposes and will not run):
> >
> > library(tidyverse)
> > library(purrr)
> > library(sf)
> > library(sfdep) # this package doesn't exist (yet)
> >
> > nc <- st_read(system.file("shape/nc.shp", package = "sf"))
> >
> > nc_g <-
> >  nc %>%
> >  mutate(KNN = map(.x = geometry, ~ sfdep::st_knn(.x, k = 5, include.self
> =
> > TRUE)),  # find the nearest neighbors for each county
> >         NB_LIST = map(.x = KNN, ~ sfdep::st_nb_list(.x, style = 'B')),  #
> > make a list of the neighbors using the binary method
> >         LOCAL_G = sfdep::st_localG(x = NWBIR74, listw = NB_LIST,
> > zero.policy = TRUE),  # calculate the G scores
> >         LOCAL_G = round(LOCAL_G,3))
> >
> > We can see that the (hypothetical) tidyverse version reduces the amount
> of
> > intermediate objects and wraps the creation of the G scores into a single
> > code chunk with clear steps.
> >
> > I'd be grateful to hear from the users and developers of the `spdep` and
> > `sf` packages about this topic!
> >
> > Tiernan Martin
> >
> >       [[alternative HTML version deleted]]
> >
> > _______________________________________________
> > R-sig-Geo mailing list
> > [hidden email]
> > https://stat.ethz.ch/mailman/listinfo/r-sig-geo
>
> --
> Roger Bivand
> Department of Economics, Norwegian School of Economics,
> Helleveien 30, N-5045 Bergen, Norway.
> voice: +47 55 95 93 55; e-mail: [hidden email]
> Editor-in-Chief of The R Journal, https://journal.r-project.org/index.html
> http://orcid.org/0000-0003-2392-6140
> https://scholar.google.no/citations?user=AWeghB0AAAAJ&hl=en
        [[alternative HTML version deleted]]

_______________________________________________
R-sig-Geo mailing list
[hidden email]
https://stat.ethz.ch/mailman/listinfo/r-sig-geo

Re: Is a simple feature -friendly version of spdep being developed?

Sat, 05/13/2017 - 15:39
On Fri, 12 May 2017, Tiernan Martin wrote:

> Is anyone thinking about creating an adaptation of the `spdep` package that
> expects sf-class inputs and works well in a pipeline?

I assume that "this is not a pipeline" is a joke for insiders (in the
pipe)?

No, there is your issue on the sfr github repository that is relevant for
contiguous neighbours, but not beyond that:

https://github.com/edzer/sfr/issues/234

An sf is a data.frame, and as such should "just work", like "Spatial"
objects have, in formula/data settings. The problem is (of course) the
weights matrix.

Should it be a list column (each row has a list nesting two lists, first
indices, second non-zero weights), or remain separate as it has been for
20 years, or become a column-oriented representation (Matrix package)? A
nested list like a list column, or a listw object, is row-oriented. I had
started thinking about using a sparse column-oriented representation, but
have not implemented functions accepting them instead of listw objects.

I am very cautious about creating classes for data that were data.frame,
then sf, and then have the weights built-in. In the simple case it would
work, but all you have to do is re-order the rows and the link between the
neighbour ids and row order breaks down; the same applies to subsetting.
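
A minimal sketch of that problem: an nb object stores neighbours as integer
row positions, so it silently goes stale once the rows are re-ordered:

library(sf)
library(spdep)
nc <- st_read(system.file("shape/nc.shp", package = "sf"))
nb <- poly2nb(as(nc, "Spatial")) # neighbours indexed by current row position
nc2 <- nc[order(nc$NAME), ]      # re-order the rows alphabetically
nb[[1]]                          # still describes the old first row, not nc2[1, ]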

The problems to solve first are related to the workflows, and easiest to look
at in the univariate case (Moran, Geary, join-count, Mantel, G, ...) for
global and local tests. I think that all or almost all of the NSE verbs
will cause chaos (mutate, select, group) once weights have been created.
If there is a way to bind the neighbour IDs to the input sf object rows, a
list column might be possible, but we have to permit multiple such columns
(Queen, Rook, ...), and ensure that subsetting and row-reordering keep the
graph relations intact (and their modifications, like row-standardisation).

If you've seen any examples of how tidy relates to graphs (adjacency lists
are like nb objects), we could learn from that.

How should we go about this technically? spdep is on R-Forge and is happy
there. Its present functionality has to be maintained as it stands, it has
too many reverse dependencies to break. Should it be re-written from
scratch (with more sparse matrices internally for computation)?

Your example creates weights on the fly for a local G* map, but has n=100,
not, say, n=90000 (LA census blocks). Using the GEOS-based st_* methods in sf
does not use STRtrees, which cut the time needed to find contiguous
neighbours from days to seconds. We ought to pre-compute weights, but this
messes up the flow of data, because a second lot of data (the weights) has to
enter the pipe, and be matched with the row-ids.

We'd need proof of concept with realistically sized data sets (not yet NY
taxis, but maybe later ...). spdep started as spweights, sptests and
spdep, and the first two got folded into the third when stable. If weights
are the first thing to go for, sfweights is where to go first (and port
the STRtree envelope intersections for contiguities). It could define the
new classes, and tests and modelling would use them.

Thanks for starting this discussion.

Roger

>
> I understand that there is skepticism about the wisdom of adopting the
> “tidyverse” principles throughout the R package ecosystem, and I share the
> concern that an over-reliance on any single paradigm could reduce the
> resilience and diversity of the system as a whole.
>
> That said, I believe that the enthusiastic adoption of the `sf` package and
> the package's connections with widely-used tidyverse packages like `dplyr`
> and `ggplot2` may result in increased demand for sf-friendly spatial
> analysis tools. As an amateur who recently started using R as my primary
> GIS tool, it seems like the tidyverse's preference for dataframes, S3
> objects, list columns, and pipeline workflows would be well-suited to the
> field of spatial analysis. Are there some fundamental reasons why the
> `spdep` tools cannot (or should not) be adapted to the tidyverse "dialect"?
>
> Let me put the question in the context of an actual analysis: in February
> 2017, the pop culture infovis website The Pudding (https://pudding.cool/)
> published an analysis of regional preferences for Oscar-nominated films in
> the US (https://pudding.cool/2017/02/oscars_so_mapped/). A few days ago,
> the author posted a tutorial explaining the method of “regional smoothing”
> used to create the article’s choropleths (
> https://pudding.cool/process/regional_smoothing/).
>
> The method relies on several `spdep` functions (
> https://github.com/polygraph-cool/smoothing_tutorial/blob/master/smoothing_tutorial.R).
> In the code below, I provide reprex with a smaller dataset included in the
> `sf` package:
>
> library(sf)
> library(spdep)
>
> nc <- st_read(system.file("shape/nc.shp", package = "sf"))  # North
> Carolina counties
> nc_shp <- as(nc,'Spatial')
>
> coords <- coordinates(nc_shp)
> IDs<-row.names(as(nc_shp, "data.frame"))
>
> knn5 <- knn2nb(knearneigh(coords, k = 5), row.names = IDs)  # find the
> nearest neighbors for each county
> knn5 <- include.self(knn5)
>
> localGvalues <- localG(x = as.numeric(nc_shp@data$NWBIR74), listw =
> nb2listw(knn5, style = "B"), zero.policy = TRUE) # calculate the G scores
> localGvalues <- round(localGvalues,3)
>
> nc_shp@data$LOCAL_G <- as.numeric(localGvalues)
>
> p1 <- spplot(nc_shp, c('NWBIR74'))
> p2 <- spplot(nc_shp, c('LOCAL_G'))
> plot(p1, split=c(1,1,2,2), more=TRUE)
> plot(p2, split=c(1,2,2,2), more=TRUE)
>
> Here’s what I imagine that would look like in a tidyverse pipeline (please
> note that this code is for illustrative purposes and will not run):
>
> library(tidyverse)
> library(purrr)
> library(sf)
> library(sfdep) # this package doesn't exist (yet)
>
> nc <- st_read(system.file("shape/nc.shp", package = "sf"))
>
> nc_g <-
>  nc %>%
>  mutate(KNN = map(.x = geometry, ~ sfdep::st_knn(.x, k = 5, include.self =
> TRUE)),  # find the nearest neighbors for each county
>         NB_LIST = map(.x = KNN, ~ sfdep::st_nb_list(.x, style = 'B')),  #
> make a list of the neighbors using the binary method
>         LOCAL_G = sfdep::st_localG(x = NWBIR74, listw = NB_LIST,
> zero.policy = TRUE),  # calculate the G scores
>         LOCAL_G = round(LOCAL_G,3))
>
> We can see that the (hypothetical) tidyverse version reduces the amount of
> intermediate objects and wraps the creation of the G scores into a single
> code chunk with clear steps.
>
> I'd be grateful to hear from the users and developers of the `spdep` and
> `sf` packages about this topic!
>
> Tiernan Martin
>
> [[alternative HTML version deleted]]
>
> _______________________________________________
> R-sig-Geo mailing list
> [hidden email]
> https://stat.ethz.ch/mailman/listinfo/r-sig-geo
--
Roger Bivand
Department of Economics, Norwegian School of Economics,
Helleveien 30, N-5045 Bergen, Norway.
voice: +47 55 95 93 55; e-mail: [hidden email]
Editor-in-Chief of The R Journal, https://journal.r-project.org/index.html
http://orcid.org/0000-0003-2392-6140
https://scholar.google.no/citations?user=AWeghB0AAAAJ&hl=en
_______________________________________________
R-sig-Geo mailing list
[hidden email]
https://stat.ethz.ch/mailman/listinfo/r-sig-geo

Roger Bivand
Department of Economics
Norwegian School of Economics
Helleveien 30
N-5045 Bergen, Norway

Re: Drawing World map divided into 6 economic regions

Fri, 05/12/2017 - 21:08
On 05/12/2017 12:36 PM, Christofer Bogaso wrote:
> Hi,
>
> I previously posted this in General R forum, however experts there
> suggested to post in this forum.
>
> Below is the description of my issue :
>
> I am trying to draw a World map which is divided into 6 Economic
> regions as available in below link
>
> http://www.worldbank.org/en/about/annual-report/regions
>
> I am aware of various R ways to draw World map based on Countries like
> one available in
>
> http://stackoverflow.com/questions/24136868/plot-map-with-values-for-countries-as-color-in-r
>
> However I do not want to put any individual Country boundaries,
> instead the individual boundaries of 6 Economic regions in the World.
>
> Can you please suggest how can I achieve such a World map. Any pointer
> will be highly appreciated.
>
> Thanks for your time.
>
You need a column in your data that indicates the economic zone, then
you aggregate (aka dissolve in other GIS systems)*.

* The raster package has this function.

I usually use http://www.naturalearthdata.com/downloads/ for world scale
maps. It has a column with the UN defined regions.
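
A rough sketch along those lines (the shapefile path and the region column
name are assumptions - check the attribute table of the download you use):

library(rgdal)
library(raster) # aggregate() dissolves Spatial* polygons by an attribute

world   <- readOGR("ne_110m_admin_0_countries", "ne_110m_admin_0_countries")
regions <- aggregate(world, by = "REGION_WB") # one polygon per region
plot(regions, col = rainbow(nrow(regions)))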

Enjoy,
Alex

_______________________________________________
R-sig-Geo mailing list
[hidden email]
https://stat.ethz.ch/mailman/listinfo/r-sig-geo

Drawing World map divided into 6 economic regions

Fri, 05/12/2017 - 14:36
Hi,

I previously posted this in General R forum, however experts there
suggested to post in this forum.

Below is the description of my issue :

I am trying to draw a world map which is divided into the 6 economic
regions shown at the link below:

http://www.worldbank.org/en/about/annual-report/regions

I am aware of various ways in R to draw a world map based on countries,
like the one shown at

http://stackoverflow.com/questions/24136868/plot-map-with-values-for-countries-as-color-in-r

However, I do not want to show individual country boundaries, only the
boundaries of the 6 economic regions of the world.

Can you please suggest how I can achieve such a world map? Any pointer
will be highly appreciated.

Thanks for your time.

_______________________________________________
R-sig-Geo mailing list
[hidden email]
https://stat.ethz.ch/mailman/listinfo/r-sig-geo

Is a simple feature -friendly version of spdep being developed?

Fri, 05/12/2017 - 13:22
Is anyone thinking about creating an adaptation of the `spdep` package that
expects sf-class inputs and works well in a pipeline?

I understand that there is skepticism about the wisdom of adopting the
“tidyverse” principles throughout the R package ecosystem, and I share the
concern that an over-reliance on any single paradigm could reduce the
resilience and diversity of the system as a whole.

That said, I believe that the enthusiastic adoption of the `sf` package and
the package's connections with widely-used tidyverse packages like `dplyr`
and `ggplot2` may result in increased demand for sf-friendly spatial
analysis tools. As an amateur who recently started using R as my primary
GIS tool, it seems like the tidyverse's preference for dataframes, S3
objects, list columns, and pipeline workflows would be well-suited to the
field of spatial analysis. Are there some fundamental reasons why the
`spdep` tools cannot (or should not) be adapted to the tidyverse "dialect"?

Let me put the question in the context of an actual analysis: in February
2017, the pop culture infovis website The Pudding (https://pudding.cool/)
published an analysis of regional preferences for Oscar-nominated films in
the US (https://pudding.cool/2017/02/oscars_so_mapped/). A few days ago,
the author posted a tutorial explaining the method of “regional smoothing”
used to create the article’s choropleths (
https://pudding.cool/process/regional_smoothing/).

The method relies on several `spdep` functions (
https://github.com/polygraph-cool/smoothing_tutorial/blob/master/smoothing_tutorial.R).
In the code below, I provide a reprex with a smaller dataset included in the
`sf` package:

library(sf)
library(spdep)

nc <- st_read(system.file("shape/nc.shp", package = "sf"))  # North
Carolina counties
nc_shp <- as(nc,'Spatial')

coords <- coordinates(nc_shp)
IDs<-row.names(as(nc_shp, "data.frame"))

knn5 <- knn2nb(knearneigh(coords, k = 5), row.names = IDs)  # find the
nearest neighbors for each county
knn5 <- include.self(knn5)

localGvalues <- localG(x = as.numeric(nc_shp@data$NWBIR74), listw =
nb2listw(knn5, style = "B"), zero.policy = TRUE) # calculate the G scores
localGvalues <- round(localGvalues,3)

nc_shp@data$LOCAL_G <- as.numeric(localGvalues)

p1 <- spplot(nc_shp, c('NWBIR74'))
p2 <- spplot(nc_shp, c('LOCAL_G'))
plot(p1, split=c(1,1,2,2), more=TRUE)
plot(p2, split=c(1,2,2,2), more=TRUE)

Here’s what I imagine that would look like in a tidyverse pipeline (please
note that this code is for illustrative purposes and will not run):

library(tidyverse)
library(purrr)
library(sf)
library(sfdep) # this package doesn't exist (yet)

nc <- st_read(system.file("shape/nc.shp", package = "sf"))

nc_g <-
  nc %>%
  mutate(KNN = map(.x = geometry, ~ sfdep::st_knn(.x, k = 5, include.self =
TRUE)),  # find the nearest neighbors for each county
         NB_LIST = map(.x = KNN, ~ sfdep::st_nb_list(.x, style = 'B')),  #
make a list of the neighbors using the binary method
         LOCAL_G = sfdep::st_localG(x = NWBIR74, listw = NB_LIST,
zero.policy = TRUE),  # calculate the G scores
         LOCAL_G = round(LOCAL_G,3))

We can see that the (hypothetical) tidyverse version reduces the number of
intermediate objects and wraps the creation of the G scores into a single
code chunk with clear steps.

I'd be grateful to hear from the users and developers of the `spdep` and
`sf` packages about this topic!

Tiernan Martin

        [[alternative HTML version deleted]]

_______________________________________________
R-sig-Geo mailing list
[hidden email]
https://stat.ethz.ch/mailman/listinfo/r-sig-geo

Re: Fill raster with information if it intersects with a Polygon for ASCii File

Fri, 05/12/2017 - 10:30
Does the output need to be an ASCII Raster or a text file with center
coordinates of each cell and a value?

#Make a raster of value 1
#Mask that raster with the polygon layer, updatevalue=0 to set the area
outside the polygon to 0
mask()
https://www.rdocumentation.org/packages/raster/versions/2.5-8/topics/mask

To save as an ASCII raster, check writeRaster() and its format/rgdal parameters.

To write a text file with one line per cell (centre coordinates and value),
use rasterToPoints()
https://www.rdocumentation.org/packages/raster/versions/2.5-8/topics/rasterToPoints
Then put the coords in the data.frame and write.csv(), or use writeOGR with
CSV as the output format.
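
A minimal sketch of the mask route (file names, resolution and CRS handling
are placeholders, not a tested workflow):

library(raster)
library(rgdal)

mpa <- readOGR("mpa_polygons.shp", "mpa_polygons") # the polygon shapefile
r <- raster(ext = extent(mpa), resolution = 0.1, crs = proj4string(mpa))
r[] <- 1                           # value 1 everywhere
r <- mask(r, mpa, updatevalue = 0) # 0 outside the polygons
writeRaster(r, "inside_polygon.asc", format = "ascii", overwrite = TRUE)

# or a plain text file with one line per cell centre:
write.csv(as.data.frame(rasterToPoints(r)), "inside_polygon.csv", row.names = FALSE)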


You could also rasterize the polygons, and reclassify. Or make the point
layer from the raster and use over, or intersect to pull values into the
points.

Hope that gives you some options.

-Alex

On 05/12/2017 01:21 AM, Miriam Püts wrote:
> Hi everyone,
>
> I am fairly new at working with raster in R. I have combed through the threats about raster and shape files, but I could find completely what I was looking for.
>
> May I start with the task at hand:
>
> I want to create a raster with geographic choordinates (done with raster function).
> Next I would like to fill this raster with the information, if each cell is intersecting with a shape file (Polygon) or not.
> Next I need to write an ASCii file with the choords, and 1 or 0 for intersecting or not.
>
> The reason why I am doing this is because I am working with Ecospace and I would like to include marine protected areas not by drawing but by reading an ASCii file.
>
> I hope I made clear what I am going for.
>
> Thanks for information!
>
>
> <°)))>< >°)))>< >°)))>< >°)))><
>
> Miriam Püts
> Marine Lebende Ressourcen/ Marine Living Resources
> Thünen-Institut für Seefischerei/ Thünen Institute of Sea fisheries
> Palmaille 9
> 22767 Hamburg (Germany)
>
> Tel:  +49 40 38905-105
> Mail: [hidden email]
>
> _______________________________________________
> R-sig-Geo mailing list
> [hidden email]
> https://stat.ethz.ch/mailman/listinfo/r-sig-geo
>
_______________________________________________
R-sig-Geo mailing list
[hidden email]
https://stat.ethz.ch/mailman/listinfo/r-sig-geo

Fill raster with information if it intersects with a Polygon for ASCii File

Fri, 05/12/2017 - 03:21
Hi everyone,

I am fairly new at working with rasters in R. I have combed through the threads about rasters and shape files, but I could not find exactly what I was looking for.

May I start with the task at hand:

I want to create a raster with geographic coordinates (done with the raster function).
Next I would like to fill this raster with the information whether each cell intersects a shape file (polygon) or not.
Then I need to write an ASCII file with the coordinates, and 1 or 0 for intersecting or not.

The reason I am doing this is that I am working with Ecospace, and I would like to include marine protected areas not by drawing them but by reading an ASCII file.

I hope I made clear what I am going for.

Thanks for information!


<°)))>< >°)))>< >°)))>< >°)))><

Miriam Püts
Marine Lebende Ressourcen/ Marine Living Resources
Thünen-Institut für Seefischerei/ Thünen Institute of Sea fisheries
Palmaille 9
22767 Hamburg (Germany)

Tel:  +49 40 38905-105
Mail: [hidden email]

_______________________________________________
R-sig-Geo mailing list
[hidden email]
https://stat.ethz.ch/mailman/listinfo/r-sig-geo

Re: Problems With Cartography 2 - Help

Thu, 05/11/2017 - 08:25
Andres,

I'm thinking you want to 'unsimplify' the topology prior to your poly2nb by
applying a slight negative gBuffer to the Galapagos polygons: shrink them by
the buffer amount and un-share the boundaries.
I'm trying some code, but am looking at a mix of these as
guidance:

https://www.rdocumentation.org/packages/rgeos/versions/0.3-22/topics/gBuffer
https://gist.github.com/mstrimas/1b4a4b93a9d4a158bce4
https://gis.stackexchange.com/questions/93096/how-to-perform-a-true-gis-clip-of-polygons-layer-using-a-polygon-layer-in-r
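
A very rough sketch of the negative-buffer step (untested; the width is a
guess in map units - metres for this UTM layer - and the Galapagos IDs are
taken from the code quoted below):

library(rgeos)
galapagos <- carto2012u2[carto2012u2$ID %in% c("2001", "2002", "2003"), ]
galapagos_shrunk <- gBuffer(galapagos, byid = TRUE, width = -100)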

Saludos,
Chris

On Wed, May 10, 2017 at 4:27 AM, Andrés Peralta <[hidden email]> wrote:

> Hi to everyone,
>
> I`m working with the second administrative level cartography from Ecuador
> (available in: http://www.ecuadorencifras.gob.ec//documentos/web-inec/
> Cartografia/2015/Clasificador_Geografico/2012/SHP.zip). The shape file is
> called nxcantones.shp. I`ve been having a lot of problems opening and
> working with this cartography in R; but i can open it in Q-GIS and ARCGIS
> without any problem (I have opened it in both programs and saved it again).
> As the cartography has various objects with the same ID - aparently the
> same areas but repeated; we had to run the following sintax in order to
> have one ID in each area:
>
> *carto2012 <- readOGR("CANTONES2012/2012CLEAN.shp", "2012CLEAN",
> stringsAsFactors=F) *
> *proj4string(carto2012) <- CRS("+proj=utm +zone=17 +south +datum=WGS84
> +units=m +no_defs")*
>
> *carto2012x <- unionSpatialPolygons(carto2012, IDs=carto2012$DPA_CANTON)*
> *proj4string(carto2012x) <- CRS("+proj=utm +zone=17 +south +datum=WGS84
> +units=m +no_defs")*
>
>
> *datos <- data.frame(ID=row.names(carto2012x), stringsAsFactors = F)*
> *row.names(datos) <- row.names(carto2012x)*
> *carto2012x2 <- SpatialPolygonsDataFrame(carto2012x, data=datos)*
>
> *carto2012 <- carto2012x2 *
>
> For our work, we had to make a neighborhood matrix of the cartography.
>
> *x.nb <- poly2nb(carto2012u2) # U2 is the cartography after the recode of
> some areas that had to be joined.*
>
> *summary(x.nb)*
>
> This seems to work fine. The problem is we need to connect the three island
> areas (the Galapagos) but when we try to do so, it connects many areas
> along all the cartography. I've tried to do it manually using *edit.nb* but
> it does not work. I´ve also tried the following sintaxes:
>
> x.nb1 <- x.nb
> which(card(x.nb1)==0) #to discover the id of these areas without
> connections
> id <- function(x){which(carto2012u2$ID==x)}
>
> x.nb1[[id(2001)]] = unique(as.integer(sort(c(x.nb1[[id(2001)]],
> id(2002)))))
> x.nb1[[id(2002)]] = unique(as.integer(sort(c(x.nb1[[id(2002)]],
> id(2001)))))
>
> x.nb1[[id(2001)]] = unique(as.integer(sort(c(x.nb1[[id(2001)]],
> id(2003)))))
> x.nb1[[id(2003)]] = unique(as.integer(sort(c(x.nb1[[id(2003)]],
> id(2001)))))
>
> x.nb1[[id(2002)]] = unique(as.integer(sort(c(x.nb1[[id(2002)]],
> id(2003)))))
> x.nb1[[id(2003)]] = unique(as.integer(sort(c(x.nb1[[id(2003)]],
> id(2002)))))
>
> ####
> x.nb1[[id("2001")]] = unique(as.integer(sort(c(x.nb1[[id("2001")]],
> id("2002")))))
> x.nb1[[id("2002")]] = unique(as.integer(sort( id("2001"))))
>
> x.nb1[[id("2001")]] = unique(as.integer(sort(c(x.nb1[[id("2001")]],
> id("2003")))))
> x.nb1[[id("2003")]] = unique(as.integer(sort( id("2001"))))
>
> x.nb1[[id("2002")]] = unique(as.integer(sort(c(x.nb1[[id("2002")]],
> id("2003")))))
> x.nb1[[id("2003")]] = unique(as.integer(sort( id("2002"))))
>
> Any ideas or suggestions? Your help will really be appreciated.
>
>
> --
>
> *             Andrés Peralta*
>         Pre-doctoral Researcher
>      GREDS/EMCONET / ASPB
> <https://www.upf.edu/greds-emconet/en/>   <http://aspb.cat/>
>
>         [[alternative HTML version deleted]]
>
> _______________________________________________
> R-sig-Geo mailing list
> [hidden email]
> https://stat.ethz.ch/mailman/listinfo/r-sig-geo
        [[alternative HTML version deleted]]

_______________________________________________
R-sig-Geo mailing list
[hidden email]
https://stat.ethz.ch/mailman/listinfo/r-sig-geo

Fwd: help - MC simulations at universal co-kriging

Wed, 05/10/2017 - 07:37
Dear r-sig-geo members,

I am trying to perform Monte-Carlo simulations for a universal co-kriging
system. The script I am using works with a small dataset, as it is here
with the meuse dataset. However, my calibration dataset has ~ 37000
observations and four variables (two providing the trend, two to be
modelled). The dataset where I am trying to do the simulation consists of
approximately 2000 locations. Despite using the closest 20-30
observations for local co-kriging, and testing first with just 10 of the
2000 locations, and even with only 2-5 simulations, R does not seem to
finish the task.

Should I change anything else to speed up the calculations, or complement
gstat with another package?

Thanks in advance for your help,

Mercedes



library(sp)     # for the meuse data and coordinates()
library(gstat)  # for variogram(), vgm(), gstat(), fit.lmc(), predict()

data(meuse)
data(meuse.grid)

### check correlation with covariables
cor.test(log10(meuse$lead), log10(meuse$zinc))
cor.test(log10(meuse$lead), (meuse$dist))
cor.test(log10(meuse$zinc), (meuse$dist))

meuse$ltpb <- log10(meuse$lead)
meuse$ltzn <- log10(meuse$zinc)

### Transform to spatial
coordinates(meuse) <- ~ x + y
coordinates(meuse.grid) <- ~ x + y


# experimental variogram
v.ltpb <- variogram(ltpb ~ dist, data=meuse, cutoff=1800, width=200)
plot(v.ltpb, pl=T)
# estimate variogram model form and parameters by eye
m.ltpb <- vgm(0.035,"Sph",800,0.015)
plot(v.ltpb, pl=T, model=m.ltpb)
# fit model parameters
m.ltpb.f <- fit.variogram(v.ltpb, m.ltpb)
plot(v.ltpb, pl=T, model=m.ltpb.f)

### Set  local kriging in the gstat object
g <- gstat(NULL, id = "ltpb", form = ltpb ~ dist, data=meuse, nmax=30)
g <- gstat(g, id = "ltzn", form = ltzn ~ dist, data=meuse, nmax=30)

v.cross <- variogram(g)
str(v.cross)
plot(v.cross, pl=F)

g <- gstat(g, id = "ltpb", model = m.ltpb.f, fill.all=T, nmax=30)

g <- fit.lmc(v.cross, g, correct.diagonal = 1.01)
plot(v.cross, model=g$model)

# Universal co-kriging
k.c <- predict(g, meuse.grid)
str(k.c)
plot.kresults(k.c, "ltpb", meuse, "lead", meuse, "UcK with Zn covariable")

#### try MC simulation
MC <- 100
sims <- predict(g, meuse.grid, nsim = MC)

        [[alternative HTML version deleted]]

_______________________________________________
R-sig-Geo mailing list
[hidden email]
https://stat.ethz.ch/mailman/listinfo/r-sig-geo

Re: Problems With Cartography - Help

Wed, 05/10/2017 - 05:01
Thank you Thierry, it worked perfectly.

Best regards,

Andrés

On Wed, May 10, 2017 at 10:25 AM, Thierry Onkelinx <[hidden email]
> wrote:

> Dear Andres,
>
> You'll need spTransform() to reproject the layer into WGS84
>
> carto2012x <- unionSpatialPolygons(carto2012, IDs=carto2012$DPA_CANTON)
> spTransform(carto2012x, CRS("+proj=longlat"))
>
> Best regards,
>
> ir. Thierry Onkelinx
> Instituut voor natuur- en bosonderzoek / Research Institute for Nature and
> Forest
> team Biometrie & Kwaliteitszorg / team Biometrics & Quality Assurance
> Kliniekstraat 25
> 1070 Anderlecht
> Belgium
>
> To call in the statistician after the experiment is done may be no more
> than asking him to perform a post-mortem examination: he may be able to say
> what the experiment died of. ~ Sir Ronald Aylmer Fisher
> The plural of anecdote is not data. ~ Roger Brinner
> The combination of some data and an aching desire for an answer does not
> ensure that a reasonable answer can be extracted from a given body of data.
> ~ John Tukey
>
> 2017-05-10 10:17 GMT+02:00 Andrés Peralta <[hidden email]>:
>
>> Hi to everyone,
>>
>> I`m working with the second administrative level cartography from Ecuador
>> (available in:
>> http://www.ecuadorencifras.gob.ec//documentos/web-inec/Carto
>> grafia/2015/Clasificador_Geografico/2012/SHP.zip).
>> The shape file is called nxcantones.shp. I`ve been having a lot of
>> problems
>> opening and working with this cartography in R; but i can open it in Q-GIS
>> and ARCGIS without any problem (I have opened it in both programs and
>> saved
>> it again). As the cartography has various objects with the same ID -
>> aparently the same areas but repeated; we had to run the following sintax
>> in order to have one ID in each area:
>>
>> *carto2012 <- readOGR("CANTONES2012/2012CLEAN.shp", "2012CLEAN",
>> stringsAsFactors=F) *
>> *proj4string(carto2012) <- CRS("+proj=utm +zone=17 +south +datum=WGS84
>> +units=m +no_defs")*
>>
>> *carto2012x <- unionSpatialPolygons(carto2012, IDs=carto2012$DPA_CANTON)*
>> *proj4string(carto2012x) <- CRS("+proj=utm +zone=17 +south +datum=WGS84
>> +units=m +no_defs")*
>>
>>
>> *datos <- data.frame(ID=row.names(carto2012x), stringsAsFactors = F)*
>> *row.names(datos) <- row.names(carto2012x)*
>> *carto2012x2 <- SpatialPolygonsDataFrame(carto2012x, data=datos)*
>>
>> *carto2012 <- carto2012x2 *
>>
>> Either using the original (nxcantones) or the modified cartography
>> (carto2012), I can plot the map in R, but I can´t plot it in R Leaflet. It
>> just opens the base map. I seems as the projection is lost in the way or
>> that the cartography has a problem.
>>
>> Any ideas?
>>
>> --
>>
>> *             Andrés Peralta*
>>         Pre-doctoral Researcher
>>      GREDS/EMCONET / ASPB
>> <https://www.upf.edu/greds-emconet/en/>   <http://aspb.cat/>
>>
>>         [[alternative HTML version deleted]]
>>
>> _______________________________________________
>> R-sig-Geo mailing list
>> [hidden email]
>> https://stat.ethz.ch/mailman/listinfo/r-sig-geo
>
>
>

--

*             Andrés Peralta*
        Pre-doctoral Researcher
     GREDS/EMCONET / ASPB
<https://www.upf.edu/greds-emconet/en/>   <http://aspb.cat/>

        [[alternative HTML version deleted]]

_______________________________________________
R-sig-Geo mailing list
[hidden email]
https://stat.ethz.ch/mailman/listinfo/r-sig-geo

Problems With Cartography 2 - Help

Wed, 05/10/2017 - 03:27
Hi to everyone,

I'm working with the second administrative level cartography from Ecuador
(available in: http://www.ecuadorencifras.gob.ec//documentos/web-inec/
Cartografia/2015/Clasificador_Geografico/2012/SHP.zip). The shape file is
called nxcantones.shp. I've been having a lot of problems opening and
working with this cartography in R, but I can open it in Q-GIS and ArcGIS
without any problem (I have opened it in both programs and saved it again).
As the cartography has various objects with the same ID - apparently the
same areas but repeated - we had to run the following syntax in order to
have one ID in each area:

carto2012 <- readOGR("CANTONES2012/2012CLEAN.shp", "2012CLEAN", stringsAsFactors=F)
proj4string(carto2012) <- CRS("+proj=utm +zone=17 +south +datum=WGS84 +units=m +no_defs")

carto2012x <- unionSpatialPolygons(carto2012, IDs=carto2012$DPA_CANTON)
proj4string(carto2012x) <- CRS("+proj=utm +zone=17 +south +datum=WGS84 +units=m +no_defs")

datos <- data.frame(ID=row.names(carto2012x), stringsAsFactors = F)
row.names(datos) <- row.names(carto2012x)
carto2012x2 <- SpatialPolygonsDataFrame(carto2012x, data=datos)

carto2012 <- carto2012x2

For our work, we had to make a neighborhood matrix of the cartography.

x.nb <- poly2nb(carto2012u2) # u2 is the cartography after the recode of some areas that had to be joined

summary(x.nb)

This seems to work fine. The problem is we need to connect the three island
areas (the Galapagos), but when we try to do so, it connects many areas
across the whole cartography. I've tried to do it manually using edit.nb but
it does not work. I've also tried the following syntax:

x.nb1 <- x.nb
which(card(x.nb1)==0) #to discover the id of these areas without connections
id <- function(x){which(carto2012u2$ID==x)}

x.nb1[[id(2001)]] = unique(as.integer(sort(c(x.nb1[[id(2001)]], id(2002)))))
x.nb1[[id(2002)]] = unique(as.integer(sort(c(x.nb1[[id(2002)]], id(2001)))))

x.nb1[[id(2001)]] = unique(as.integer(sort(c(x.nb1[[id(2001)]], id(2003)))))
x.nb1[[id(2003)]] = unique(as.integer(sort(c(x.nb1[[id(2003)]], id(2001)))))

x.nb1[[id(2002)]] = unique(as.integer(sort(c(x.nb1[[id(2002)]], id(2003)))))
x.nb1[[id(2003)]] = unique(as.integer(sort(c(x.nb1[[id(2003)]], id(2002)))))

####
x.nb1[[id("2001")]] = unique(as.integer(sort(c(x.nb1[[id("2001")]],
id("2002")))))
x.nb1[[id("2002")]] = unique(as.integer(sort( id("2001"))))

x.nb1[[id("2001")]] = unique(as.integer(sort(c(x.nb1[[id("2001")]],
id("2003")))))
x.nb1[[id("2003")]] = unique(as.integer(sort( id("2001"))))

x.nb1[[id("2002")]] = unique(as.integer(sort(c(x.nb1[[id("2002")]],
id("2003")))))
x.nb1[[id("2003")]] = unique(as.integer(sort( id("2002"))))

Any ideas or suggestions? Your help will really be appreciated.
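
For reference, a minimal sketch of one way to add the three links
symmetrically by hand - assuming, as in the code above, that the Galápagos
cantons carry the codes 2001, 2002 and 2003 in carto2012u2$ID. The helper
add_link is made up here; it relies on spdep storing a region with no
neighbours as the single integer 0:

library(spdep)

# add a symmetric link between areas i and j of an nb object
add_link <- function(nb, i, j) {
  # drop the 0 that marks "no neighbours" before appending the new link
  nb[[i]] <- sort(unique(c(nb[[i]][nb[[i]] > 0L], as.integer(j))))
  nb[[j]] <- sort(unique(c(nb[[j]][nb[[j]] > 0L], as.integer(i))))
  nb
}

x.nb1 <- x.nb
gal <- sapply(c("2001", "2002", "2003"), function(x) which(carto2012u2$ID == x))
for (p in combn(gal, 2, simplify = FALSE)) x.nb1 <- add_link(x.nb1, p[1], p[2])
summary(x.nb1)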


--

*             Andrés Peralta*
        Pre-doctoral Researcher
     GREDS/EMCONET / ASPB
<https://www.upf.edu/greds-emconet/en/>   <http://aspb.cat/>


Re: Problems With Cartography - Help

Wed, 05/10/2017 - 03:25
Dear Andres,

You'll need spTransform() to reproject the layer into WGS84

carto2012x <- unionSpatialPolygons(carto2012, IDs=carto2012$DPA_CANTON)
spTransform(carto2012x, CRS("+proj=longlat"))
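
A minimal sketch of the full round trip (the name carto2012_ll is made up
here; leaflet needs geographic WGS84 coordinates, and the reprojected object
has to be kept):

library(rgdal)     # spTransform() methods for Spatial* objects
library(leaflet)

carto2012_ll <- spTransform(carto2012x, CRS("+proj=longlat +datum=WGS84"))

leaflet(carto2012_ll) %>%
  addTiles() %>%
  addPolygons(weight = 1)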

Best regards,

ir. Thierry Onkelinx
Instituut voor natuur- en bosonderzoek / Research Institute for Nature and
Forest
team Biometrie & Kwaliteitszorg / team Biometrics & Quality Assurance
Kliniekstraat 25
1070 Anderlecht
Belgium

To call in the statistician after the experiment is done may be no more
than asking him to perform a post-mortem examination: he may be able to say
what the experiment died of. ~ Sir Ronald Aylmer Fisher
The plural of anecdote is not data. ~ Roger Brinner
The combination of some data and an aching desire for an answer does not
ensure that a reasonable answer can be extracted from a given body of data.
~ John Tukey

2017-05-10 10:17 GMT+02:00 Andrés Peralta <[hidden email]>:

> Hi to everyone,
>
> I'm working with the second administrative level cartography from Ecuador
> (available at:
> http://www.ecuadorencifras.gob.ec//documentos/web-inec/
> Cartografia/2015/Clasificador_Geografico/2012/SHP.zip).
> The shapefile is called nxcantones.shp. I've been having a lot of problems
> opening and working with this cartography in R, but I can open it in QGIS
> and ArcGIS without any problem (I have opened it in both programs and saved
> it again). As the cartography has various objects with the same ID -
> apparently the same areas, but repeated - we had to run the following
> syntax in order to have one ID for each area:
>
> carto2012 <- readOGR("CANTONES2012/2012CLEAN.shp", "2012CLEAN",
>                      stringsAsFactors = F)
> proj4string(carto2012) <- CRS(
>   "+proj=utm +zone=17 +south +datum=WGS84 +units=m +no_defs")
>
> carto2012x <- unionSpatialPolygons(carto2012, IDs = carto2012$DPA_CANTON)
> proj4string(carto2012x) <- CRS(
>   "+proj=utm +zone=17 +south +datum=WGS84 +units=m +no_defs")
>
>
> datos <- data.frame(ID = row.names(carto2012x), stringsAsFactors = F)
> row.names(datos) <- row.names(carto2012x)
> carto2012x2 <- SpatialPolygonsDataFrame(carto2012x, data = datos)
>
> carto2012 <- carto2012x2
>
> Either using the original (nxcantones) or the modified cartography
> (carto2012), I can plot the map in R, but I can't plot it in Leaflet. It
> just shows the base map. It seems as if the projection is lost along the
> way, or the cartography has a problem.
>
> Any ideas?
>
> --
>
> *             Andrés Peralta*
>         Pre-doctoral Researcher
>      GREDS/EMCONET / ASPB
> <https://www.upf.edu/greds-emconet/en/>   <http://aspb.cat/>
>
>         [[alternative HTML version deleted]]
>

Problems With Cartography - Help

Wed, 05/10/2017 - 03:17
Hi to everyone,

I'm working with the second administrative level cartography from Ecuador
(available at:
http://www.ecuadorencifras.gob.ec//documentos/web-inec/Cartografia/2015/Clasificador_Geografico/2012/SHP.zip).
The shapefile is called nxcantones.shp. I've been having a lot of problems
opening and working with this cartography in R, but I can open it in QGIS
and ArcGIS without any problem (I have opened it in both programs and saved
it again). As the cartography has various objects with the same ID -
apparently the same areas, but repeated - we had to run the following syntax
in order to have one ID for each area:

library(rgdal)      # readOGR()
library(maptools)   # unionSpatialPolygons()
library(sp)         # CRS(), proj4string(), SpatialPolygonsDataFrame()

carto2012 <- readOGR("CANTONES2012/2012CLEAN.shp", "2012CLEAN",
                     stringsAsFactors = F)
proj4string(carto2012) <- CRS(
  "+proj=utm +zone=17 +south +datum=WGS84 +units=m +no_defs")

carto2012x <- unionSpatialPolygons(carto2012, IDs = carto2012$DPA_CANTON)
proj4string(carto2012x) <- CRS(
  "+proj=utm +zone=17 +south +datum=WGS84 +units=m +no_defs")

datos <- data.frame(ID = row.names(carto2012x), stringsAsFactors = F)
row.names(datos) <- row.names(carto2012x)
carto2012x2 <- SpatialPolygonsDataFrame(carto2012x, data = datos)

carto2012 <- carto2012x2
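
As a quick sanity check after that step (a sketch using the object names
above):

# one feature per canton, all with unique IDs, after the union
any(duplicated(row.names(carto2012x2)))   # expect FALSE

# the CRS should still be attached (UTM zone 17S here)
proj4string(carto2012x2)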

Either using the original (nxcantones) or the modified cartography
(carto2012), I can plot the map in R, but I can't plot it in Leaflet. It
just shows the base map. It seems as if the projection is lost along the
way, or the cartography has a problem.

Any ideas?

--

*             Andrés Peralta*
        Pre-doctoral Researcher
     GREDS/EMCONET / ASPB
<https://www.upf.edu/greds-emconet/en/>   <http://aspb.cat/>


Re: How does sf do rgeos::gUnaryUnion or maptools::unionSpatialPolygons?

Tue, 05/09/2017 - 13:39
On Tue, 9 May 2017, Edzer Pebesma wrote:

>
>
> On 09/05/17 15:32, Roger Bivand wrote:
>> While https://github.com/edzer/sfr/wiki/migrating is very helpful,
>> rgeos::gUnaryUnion is not the typical use case of sf::st_union. The
>> typical use case, from ASDAR 1st edition, chapter 5 and the maptools
>> "combine_maptools" vignette (the shapefiles are shipped with maptools),
>> is grouping features that should belong to the same statistical entity:
>>
>> library(maptools)
>> vignette("combine_maptools")
>>
>> ###################################################
>> ### chunk number 16:
>> ###################################################
>>
>> library(sf)
>> nc90 <- st_read(system.file("shapes/co37_d90.shp", package = "maptools"))
>> st_crs(nc90) <- "+proj=longlat +datum=NAD27"
>> table(table(paste(nc90$ST, nc90$CO, sep="")))
>>
>> The point is that two counties are represented by multiple features of
>> "POLYGON" objects, rather than single "MULTIPOLYGON" objects, in the
>> input shapefile. I've tried doing:
>>
>> ids <- factor(paste(nc90$ST, nc90$CO, sep=""))
>> nc90a <- st_cast(nc90, to="MULTIPOLYGON", ids=as.integer(ids))
>>
>> but:
>>
>>> dim(nc90a)
>> [1] 104   9
>>
>> all turned into "MULTIPOLYGON", but not grouped, even though I think
>> ids= are as they should be:
>>
>>> table(table(as.integer(ids)))
>>
>>  1  2  4
>> 98  1  1
>
> this is at least documented: nc90 is of class `sf`, and st_cast.sf
> has no ids argument; ... is ignored. If it would merge, it'd need
> some guidance what to do with non-geometry feature attributes.
>
OK, thanks! I was hoping one or other of the many contributors to sf would
jump in, given the amount of time you commit to moving sf forward!

>>
>> This may be to avoid dropping data.frame rows. It looks as though I can
>> get there using:
>>
>> nc90a <- st_cast(st_geometry(nc90), to="MULTIPOLYGON", ids=as.integer(ids))
>>
>>> length(nc90a)
>> [1] 100
>>
>> but all are "MULTIPOLYGON", not a mixture of "POLYGON" and
>> "MULTIPOLYGON" features, and I've no idea which is which - that is how
>> the order of nc90a relates to the counties of nc90. How do I associate
>> the features in nc90a with their county ids? Where do i find the ids I
>> gave to st_cast in nc90a?
>
> st_cast uses base::split, which converts ids to a factor,
> and returns a list with the order of the levels of that factor
> (alphabetically?). Neither convenient, nor documented. I guess more
> intuitive would be to squeeze, but keep original order, right?
>
> I'd suggest to use instead
>
> nc90a = aggregate(nc90, list(ids = ids), head, n = 1)
>
> which returns a GEOMETRY sf, with 2 MULTIPOLYGON and 98 POLYGON. You
> could st_cast that to MULTIPOLYGON.
This works and keeps the output as c("sf", "data.frame"):

ids <- factor(paste(nc90$ST, nc90$CO, sep=""))
t0 <- aggregate(nc90, list(ids = ids), head, n = 1)
nc90a <- t0[, c("ids", attr(t0, "sf_column"))]
rm(t0)
# n is an argument to head saying take the first value in each ids-group
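
A quick check of the result (a sketch; the comments give what I would
expect, not verified output):

class(nc90a)   # c("sf", "data.frame")
dim(nc90a)     # 100 rows (one per county), 2 columns: ids + geometry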

>
> Alternatively:
>
> library(dplyr)
> bind_cols(nc90, ids=ids) %>%
> group_by(ids) %>%
> summarise(ST=head(ST,1), CO=head(CO,1), do_union=FALSE)

This works but returns an object of class c("sf", "tbl_df", "tbl", "data.frame"):

library(dplyr)
nc90b <- summarise(group_by(bind_cols(nc90, data.frame(ids=ids)), ids),
   do_union=FALSE)

or equivalently (for released dplyr):

bind_cols(nc90, data.frame(ids=ids)) %>%
         group_by(ids) %>%
         summarise(do_union=FALSE) -> nc90b

This fails in:

all.equal(nc90b, nc90a, check.attributes=FALSE)
# Error in equal_data_frame(target, current, ignore_col_order =
# ignore_col_order,  :
#  Can't join on 'geometry' x 'geometry' because of incompatible types
# (sfc_GEOMETRY, sfc / sfc_GEOMETRY, sfc)

but:

> all.equal(nc90a, nc90b, check.attributes=FALSE)
[1] TRUE

so being c("tbl_df", "tbl") before "data.frame" makes a difference.

Since I always try to check intermediate results, I only found the missing
data.frame() in bind_cols() by stepping through - is that implicit in the
development version of dplyr (or sf)?

At some point the maptools vignette will migrate, when I get that far ...

Thanks again,

Roger

>
> converts straight into MULTIPOLYGON.
>
> I'll update the sp -> sf migration wiki.
>
>>
>> What am I missing? I'm writing here rather than raising an issue on GH
>> because others may want to know too, and Edzer needs others to reply for
>> him if they know the answer.
>>
>> Puzzled,
>>
>> Roger
>>
>
>
--
Roger Bivand
Department of Economics, Norwegian School of Economics,
Helleveien 30, N-5045 Bergen, Norway.
voice: +47 55 95 93 55; e-mail: [hidden email]
Editor-in-Chief of The R Journal, https://journal.r-project.org/index.html
http://orcid.org/0000-0003-2392-6140
https://scholar.google.no/citations?user=AWeghB0AAAAJ&hl=en


Re: How does sf do rgeos::gUnaryUnion or maptools::unionSpatialPolygons?

Tue, 05/09/2017 - 12:11


On 09/05/17 15:32, Roger Bivand wrote:
> While https://github.com/edzer/sfr/wiki/migrating is very helpful,
> rgeos::gUnaryUnion is not the typical use case of sf::st_union. The
> typical use case, from ASDAR 1st edition, chapter 5 and the maptools
> "combine_maptools" vignette (the shapefiles are shipped with maptools),
> is grouping features that should belong to the same statistical entity:
>
> library(maptools)
> vignette("combine_maptools")
>
> ###################################################
> ### chunk number 16:
> ###################################################
>
> library(sf)
> nc90 <- st_read(system.file("shapes/co37_d90.shp", package = "maptools"))
> st_crs(nc90) <- "+proj=longlat +datum=NAD27"
> table(table(paste(nc90$ST, nc90$CO, sep="")))
>
> The point is that two counties are represented by multiple features of
> "POLYGON" objects, rather than single "MULTIPOLYGON" objects, in the
> input shapefile. I've tried doing:
>
> ids <- factor(paste(nc90$ST, nc90$CO, sep=""))
> nc90a <- st_cast(nc90, to="MULTIPOLYGON", ids=as.integer(ids))
>
> but:
>
>> dim(nc90a)
> [1] 104   9
>
> all turned into "MULTIPOLYGON", but not grouped, even though I think
> ids= are as they should be:
>
>> table(table(as.integer(ids)))
>
>  1  2  4
> 98  1  1

this is at least documented: nc90 is of class `sf`, and st_cast.sf
has no ids argument; ... is ignored. If it would merge, it'd need
some guidance what to do with non-geometry feature attributes.

>
> This may be to avoid dropping data.frame rows. It looks as though I can
> get there using:
>
> nc90a <- st_cast(st_geometry(nc90), to="MULTIPOLYGON", ids=as.integer(ids))
>
>> length(nc90a)
> [1] 100
>
> but all are "MULTIPOLYGON", not a mixture of "POLYGON" and
> "MULTIPOLYGON" features, and I've no idea which is which - that is how
> the order of nc90a relates to the counties of nc90. How do I associate
> the features in nc90a with their county ids? Where do i find the ids I
> gave to st_cast in nc90a? st_cast uses base::split, which converts ids to a factor,
and returns a list with the order of the levels of that factor
(alphabetically?). Neither convenient, nor documented. I guess more
intuitive would be to squeeze, but keep original order, right?

I'd suggest to use instead

nc90a = aggregate(nc90, list(ids = ids), head, n = 1)

which returns a GEOMETRY sf, with 2 MULTIPOLYGON and 98 POLYGON. You
could st_cast that to MULTIPOLYGON.
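
A sketch of that last cast (the name nc90m is made up here):

nc90m <- st_cast(nc90a, "MULTIPOLYGON")
table(st_geometry_type(nc90m))   # expect 100 MULTIPOLYGON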

Alternatively:

library(dplyr)
bind_cols(nc90, ids=ids) %>%
        group_by(ids) %>%
        summarise(ST=head(ST,1), CO=head(CO,1), do_union=FALSE)

converts straight into MULTIPOLYGON.

I'll update the sp -> sf migration wiki.

>
> What am I missing? I'm writing here rather than raising an issue on GH
> because others may want to know too, and Edzer needs others to reply for
> him if they know the answer.
>
> Puzzled,
>
> Roger
>

--
Edzer Pebesma
Institute for Geoinformatics  (ifgi),  University of Münster
Heisenbergstraße 2, 48149 Münster, Germany; +49 251 83 33081
Journal of Statistical Software:   http://www.jstatsoft.org/
Computers & Geosciences:   http://elsevier.com/locate/cageo/



How does sf do rgeos::gUnaryUnion or maptools::unionSpatialPolygons?

Tue, 05/09/2017 - 08:32
While https://github.com/edzer/sfr/wiki/migrating is very helpful,
rgeos::gUnaryUnion is not the typical use case of sf::st_union. The
typical use case, from ASDAR 1st edition, chapter 5 and the maptools
"combine_maptools" vignette (the shapefiles are shipped with maptools), is
grouping features that should belong to the same statistical entity:

library(maptools)
vignette("combine_maptools")

###################################################
### chunk number 16:
###################################################

library(sf)
nc90 <- st_read(system.file("shapes/co37_d90.shp", package = "maptools"))
st_crs(nc90) <- "+proj=longlat +datum=NAD27"
table(table(paste(nc90$ST, nc90$CO, sep="")))

The point is that two counties are represented by multiple features of
"POLYGON" objects, rather than single "MULTIPOLYGON" objects, in the input
shapefile. I've tried doing:

ids <- factor(paste(nc90$ST, nc90$CO, sep=""))
nc90a <- st_cast(nc90, to="MULTIPOLYGON", ids=as.integer(ids))

but:

> dim(nc90a)
[1] 104   9

all turned into "MULTIPOLYGON", but not grouped, even though I think ids=
are as they should be:

> table(table(as.integer(ids)))

  1  2  4
98  1  1

This may be to avoid dropping data.frame rows. It looks as though I can
get there using:

nc90a <- st_cast(st_geometry(nc90), to="MULTIPOLYGON", ids=as.integer(ids))

> length(nc90a)
[1] 100

but all are "MULTIPOLYGON", not a mixture of "POLYGON" and "MULTIPOLYGON"
features, and I've no idea which is which - that is, how the order of nc90a
relates to the counties of nc90. How do I associate the features in
nc90a with their county ids? Where do I find the ids I gave to st_cast in
nc90a?

What am I missing? I'm writing here rather than raising an issue on GH
because others may want to know too, and Edzer needs others to reply for
him if they know the answer.

Puzzled,

Roger

--
Roger Bivand
Department of Economics, Norwegian School of Economics,
Helleveien 30, N-5045 Bergen, Norway.
voice: +47 55 95 93 55; e-mail: [hidden email]
Editor-in-Chief of The R Journal, https://journal.r-project.org/index.html
http://orcid.org/0000-0003-2392-6140
https://scholar.google.no/citations?user=AWeghB0AAAAJ&hl=en

