This is an **archive** for **R-sig-geo**, not a forum. It is not possible to post through Nabble: you may not start a new thread nor follow up an existing thread. If you wish to post but are not subscribed to the list, subscribe first **(through the list homepage, not via Nabble)** and post from your subscribed email address. Until 2015-06-20, subscribers could post through Nabble, but the policy was changed because too many non-subscribers misunderstood the interface.

### How to use covariate files for regression kriging

Hi all,

I am new to R and geostatistics. I have raster files (NDVI and DEM) that I want to use as covariates for regression kriging.

Do I need to project them to the same projected coordinate system and resample them to the same cell size before converting them to a text file for use in R?

Thanks

[[alternative HTML version deleted]]

_______________________________________________

R-sig-Geo mailing list

[hidden email]

https://stat.ethz.ch/mailman/listinfo/r-sig-geo
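In outline, the usual answer is yes: covariate rasters should share one projected CRS, one extent, and one cell size before regression kriging, and they can be used directly in R without exporting to text. A minimal sketch, assuming the raster package; the file names and the UTM EPSG code are illustrative, not taken from the post:

```r
library(raster)

# Illustrative file names
ndvi <- raster("ndvi.tif")
dem  <- raster("dem.tif")

# Reproject both rasters to one projected CRS (UTM zone 33N here, purely illustrative)
crs_target <- CRS("+init=epsg:32633")
ndvi_p <- projectRaster(ndvi, crs = crs_target)
dem_p  <- projectRaster(dem,  crs = crs_target)

# Resample the DEM onto the NDVI grid so cell size and alignment match
dem_r <- resample(dem_p, ndvi_p, method = "bilinear")

# Stack the aligned covariates for use in regression kriging (e.g. with gstat)
covars <- stack(ndvi_p, dem_r)
names(covars) <- c("NDVI", "DEM")
```

With the covariates on a common grid, a trend formula such as `z ~ NDVI + DEM` can be passed to a kriging routine.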


### Re: How to find all first order neighbors of a collection of points

On Fri, 13 Jul 2018, Benjamin Lieberman wrote:

> Roger-

>

> Thank you so much for the help. In our case, first order neighbors are

> all neighbors who are adjacent to a voter. Second order neighbors are

> then all neighbors who are adjacent to the first order neighbors. Hope

> that this could clarify what I have been referencing this time.

So you need to define what you mean by adjacent for the purposes of your

study. This depends on knowing the underlying behavioural patterns

affecting interaction.

Roger
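Several operational definitions of "adjacent" are implemented in spdep; a sketch comparing three of them, assuming `coords` is a two-column matrix of planar (projected) point coordinates and the 500-unit distance band is illustrative:

```r
library(spdep)

# k = 1 nearest neighbour (asymmetric by construction)
nb_knn <- knn2nb(knearneigh(coords, k = 1))

# Distance-band adjacency: all points within 500 map units
nb_dist <- dnearneigh(coords, d1 = 0, d2 = 500)

# Delaunay triangulation neighbours (planar coordinates only)
nb_tri <- tri2nb(coords)
```

Which definition is appropriate depends on the behavioural interaction being modelled, as noted above.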

>

> I will try the method you suggested, thank you.

>

> Best,

> Ben

> --

> Benjamin Lieberman

> Muhlenberg College 2019

> Mobile: 301.299.8928

>

>> On Jul 13, 2018, at 7:30 AM, Roger Bivand <[hidden email]> wrote:

>>

>> On Fri, 13 Jul 2018, Benjamin Lieberman wrote:

>>

>>> All-

>>>

>>> I would like to note that as the data is proprietary, and for obvious privacy concerns, the lat/long pairs were randomly generated, and were not taken directly from the data.

>>

>> Thanks for the clarification. Note that if the data are a sample, that is, not a complete listing for one or more study areas, you don't know who the first order neighbour (the most proximate other voter) is, because that individual may not be in the sample. Your fallback then is to treat the data as aggregates, unless you rule out local sampling variability.

>>

>> Roger

>>

>>>

>>>

>>> --

>>> Benjamin Lieberman

>>> Muhlenberg College 2019

>>> Mobile: 301.299.8928

>>>

>>>> On Jul 13, 2018, at 6:58 AM, Benjamin Lieberman <[hidden email]> wrote:

>>>>

>>>> Roger and Facu,

>>>>

>>>> Thank you very much for the help. In terms of the data, I only provided the ID and Lat/Long pairs because they were the only covariates which were necessary. The data set we are using was purchased and contains voter registration information, voter history, and census tract information, after some geocoding took place. The locations are the residents' houses, in this instance.

>>>>

>>>> I have rerun the knn with longlat = T, but I am still hung up on the idea of the first order neighbors. I have reread the vignette and section 5 discusses High-Order Neighbors, but there isn’t any mention of first or second order neighbors, as you mentioned above (“first order neighbors are not defined”). One of the pieces of literature I found said that polygons are problematic to work with, as are tessellations, for precisely the reason you mentioned, non-planarity. For this reason, I am hung up on how to find all first order neighbors for a point, especially as the number of first order neighbors varies from point to point, and as such knearneigh would not be appropriate here.

>>>>

>>>> If this is something that does not seem feasible, maybe another tactic is necessary.

>>>>

>>>> Again, thank you all for the help.

>>>>

>>>> Warmest

>>>> --

>>>> Benjamin Lieberman

>>>> Muhlenberg College 2019

>>>> Mobile: 301.299.8928

>>>>

>>>>> On Jul 13, 2018, at 6:11 AM, Roger Bivand <[hidden email]> wrote:

>>>>>

>>>>> On Fri, 13 Jul 2018, Facundo Muñoz wrote:

>>>>>

>>>>>> Dear Benjamin,

>>>>>>

>>>>>> I'm not sure how you define "first order neighbors" for a point. The

>>>>>> first thing that comes to my mind is to use their corresponding voronoi

>>>>>> polygons and define neighborhood from there. Following your code:

>>>>>

>>>>> Thanks, the main source of confusion is that "first order neighbors" are not defined. A k=1 neighbour could be (as below), as could k=6, or voronoi neighbours, or sphere of influence etc. So reading vignette("nb") would be a starting point.

>>>>>

>>>>> Also note that voronoi and other graph-based neighbours should only use planar coordinates - including dismo::voronoi, which uses deldir::deldir() - just like spdep::tri2nb(). Triangulation can lead to spurious neighbours on the convex hull.
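Following the planar-coordinates caveat above, one way to proceed is to project the points before building graph-based neighbours. A sketch assuming sp (with rgdal available for the transformation) and spdep; the UTM EPSG code is illustrative:

```r
library(sp)
library(spdep)

# voter.subset is assumed to be a SpatialPoints object in geographical
# coordinates (longitude/latitude, WGS84)
proj4string(voter.subset) <- CRS("+init=epsg:4326")

# Project to a planar CRS (UTM zone 18N here, purely illustrative)
voter_utm  <- spTransform(voter.subset, CRS("+init=epsg:32618"))
coords_utm <- coordinates(voter_utm)

# Triangulation-based neighbours are now defensible on planar coordinates
nb_tri <- tri2nb(coords_utm)
```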

>>>>>

>>>>>>

>>>>>> v <- dismo::voronoi(coords)

>>>>>> par(mfrow = c(1, 2), xaxt = "n", yaxt = "n", mgp = c(0, 0, 0))

>>>>>> plot(coords, type = "n", xlab = NA, ylab = NA)

>>>>>> plot(v, add = TRUE)

>>>>>> text(x = coords[, 1], y = coords[, 2], labels = voter.subset$Voter.ID)

>>>>>> plot(coords, type = "n", xlab = NA, ylab = NA)

>>>>>> plot(poly2nb(v), coords, add = TRUE, col = "gray")

>>>>>>

>>>>>> ƒacu.-

>>>>>>

>>>>>>

>>>>>> On 07/12/2018 09:00 PM, Benjamin Lieberman wrote:

>>>>>>> Hi all,

>>>>>>>

>>>>>>> Currently, I am working with U.S. voter data. Below, I included a brief example of the structure of the data with some reproducible code. My data set consists of roughly 233,000 (233k) entries, each specifying a voter and their particular latitude/longitude pair.

>>>>>

>>>>> Using individual voter data is highly dangerous, and must in every case be subject to the strictest privacy rules. Voter data does not in essence have position - the only valid voting data that has position is of the voting station/precinct, and those data are aggregated to preserve anonymity.

>>>>>

>>>>> Why do voter data not have position? Which location should you use - residence, workplace, what? What are these locations proxying? Nothing valid can be drawn from "just voter data" - you can get conclusions from carefully constructed stratified exit polls, but there the key gender/age/ethnicity/social class/etc. confounders are handled by design. Why should voting decisions be influenced by proximity (they are not)? The missing element here is looking carefully at relevant covariates at more aggregated levels (in the US typically zoning controlling social class positional segregation, etc.).

>>>>>

>>>>>>> I have been using the spdep package with the hope of creating a CAR model. To begin the analysis, we need to find all first order neighbors of every point in the data.

>>>>>>>

>>>>>>> While spdep has fantastic commands for finding k nearest neighbors (knearneigh), and a useful command for finding lag of order 3 or more (nblag), I have yet to find a method which is suitable for our purposes (lag = 1, or lag =2). Additionally, I looked into altering the nblag command to accommodate maxlag = 1 or maxlag = 2, but the command relies on an nb format, which is problematic as we are looking for the underlying neighborhood structure.

>>>>>>>

>>>>>>> There has been numerous work done with polygons, or data which already is in “nb” format, but after reading the literature, it seems that polygons are not appropriate, nor are distance based neighbor techniques, due to density fluctuations over the area of interest.

>>>>>>>

>>>>>>> Below is some reproducible code I wrote. I would like to note that I am currently working in R 1.1.453 on a MacBook.

>>>>>

>>>>> You mean RStudio; there is no such version of R.

>>>>>

>>>>>>>

>>>>>>> # Create a data frame of 10 voters, picked at random

>>>>>>> voter.1 = c(1, -75.52187, 40.62320)

>>>>>>> voter.2 = c(2,-75.56373, 40.55216)

>>>>>>> voter.3 = c(3,-75.39587, 40.55416)

>>>>>>> voter.4 = c(4,-75.42248, 40.64326)

>>>>>>> voter.5 = c(5,-75.56654, 40.54948)

>>>>>>> voter.6 = c(6,-75.56257, 40.67375)

>>>>>>> voter.7 = c(7, -75.51888, 40.59715)

>>>>>>> voter.8 = c(8, -75.59879, 40.60014)

>>>>>>> voter.9 = c(9, -75.59879, 40.60014)

>>>>>>> voter.10 = c(10, -75.50877, 40.53129)

>>>>>>>

>>>>>

>>>>> These are in geographical coordinates.

>>>>>

>>>>>>> # Bind the vectors together

>>>>>>> voter.subset = rbind(voter.1, voter.2, voter.3, voter.4, voter.5, voter.6, voter.7, voter.8, voter.9, voter.10)

>>>>>>>

>>>>>>> # Rename the columns

>>>>>>> colnames(voter.subset) = c("Voter.ID", "Longitude", "Latitude")

>>>>>>>

>>>>>>> # Change the class from a matrix to a data frame

>>>>>>> voter.subset = as.data.frame(voter.subset)

>>>>>>>

>>>>>>> # Load in the required packages

>>>>>>> library(spdep)

>>>>>>> library(sp)

>>>>>>>

>>>>>>> # Set the coordinates

>>>>>>> coordinates(voter.subset) = c("Longitude", "Latitude")

>>>>>>> coords = coordinates(voter.subset)

>>>>>>>

>>>>>>> # Jitter to ensure no duplicate points

>>>>>>> coords = jitter(coords, factor = 1)

>>>>>>>

>>>>>

>>>>> jitter does not respect geographical coordinates (decimal degree metric).

>>>>>

>>>>>>> # Find the first nearest neighbor of each point

>>>>>>> one.nn = knearneigh(coords, k=1)

>>>>>

>>>>> See the help page (hint: longlat=TRUE to use Great Circle distances, much slower than planar).
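With the geographical coordinates kept as-is, the hint above amounts to the following sketch (`coords` as in the posted code):

```r
library(spdep)

# Great Circle distances for longitude/latitude input; slower than planar
one.nn <- knearneigh(coords, k = 1, longlat = TRUE)

# Convert to "nb" format without forcing symmetry
one.nn_nb <- knn2nb(one.nn, sym = FALSE)
```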

>>>>>

>>>>>>>

>>>>>>> # Convert the first nearest neighbor to format "nb"

>>>>>>> one.nn_nb = knn2nb(one.nn, sym = F)

>>>>>>>

>>>>>>> Thank you in advance for any help you may offer, and for taking the time to read this. I have consulted Applied Spatial Data Analysis with R (Bivand, Pebesma, Gomez-Rubio), as well as other Sig-Geo threads, the spdep documentation, and the nb vignette (Bivand, April 3, 2018) from earlier this year.

>>>>>>>

>>>>>>> Warmest,

>>>>>>> Ben

>>>>>>> --

>>>>>>> Benjamin Lieberman

>>>>>>> Muhlenberg College 2019

>>>>>>> Mobile: 301.299.8928

>>>>>>>

>>>>>>>

>>>>>>>

>>>>>>> [[alternative HTML version deleted]]

>>>>>

>>>>> Plain text only, please.

>>>>>

>>>>>>>

>>>>>>> _______________________________________________

>>>>>>> R-sig-Geo mailing list

>>>>>>> [hidden email]

>>>>>>> https://stat.ethz.ch/mailman/listinfo/r-sig-geo

>>>>>>

>>>>>>

>>>>>> [[alternative HTML version deleted]]

>>>>>>

>>>>>> _______________________________________________

>>>>>> R-sig-Geo mailing list

>>>>>> [hidden email]

>>>>>> https://stat.ethz.ch/mailman/listinfo/r-sig-geo

>>>>>>

>>>>>

>>>>> --

>>>>> Roger Bivand

>>>>> Department of Economics, Norwegian School of Economics,

>>>>> Helleveien 30, N-5045 Bergen, Norway.

>>>>> voice: +47 55 95 93 55; e-mail: [hidden email]

>>>>> http://orcid.org/0000-0003-2392-6140

>>>>> https://scholar.google.no/citations?user=AWeghB0AAAAJ&hl=en

>>>>> _______________________________________________

>>>>> R-sig-Geo mailing list

>>>>> [hidden email]

>>>>> https://stat.ethz.ch/mailman/listinfo/r-sig-geo

>>>

>>>

>>> [[alternative HTML version deleted]]

>>>

>>> _______________________________________________

>>> R-sig-Geo mailing list

>>> [hidden email]

>>> https://stat.ethz.ch/mailman/listinfo/r-sig-geo

>>>

>>

>> --

>> Roger Bivand

>> Department of Economics, Norwegian School of Economics,

>> Helleveien 30, N-5045 Bergen, Norway.

>> voice: +47 55 95 93 55; e-mail: [hidden email]

>> http://orcid.org/0000-0003-2392-6140

>> https://scholar.google.no/citations?user=AWeghB0AAAAJ&hl=en

>

> [[alternative HTML version deleted]]

>

> _______________________________________________

> R-sig-Geo mailing list

> [hidden email]

> https://stat.ethz.ch/mailman/listinfo/r-sig-geo

> --

Roger Bivand

Department of Economics, Norwegian School of Economics,

Helleveien 30, N-5045 Bergen, Norway.

voice: +47 55 95 93 55; e-mail: [hidden email]

http://orcid.org/0000-0003-2392-6140

https://scholar.google.no/citations?user=AWeghB0AAAAJ&hl=en

_______________________________________________

R-sig-Geo mailing list

[hidden email]

https://stat.ethz.ch/mailman/listinfo/r-sig-geo


### Re: How to find all first order neighbors of a collection of points

Roger-

Thank you so much for the help. In our case, first order neighbors are all neighbors who are adjacent to a voter. Second order neighbors are then all neighbors who are adjacent to the first order neighbors. Hope that this could clarify what I have been referencing this time.

I will try the method you suggested, thank you.

Best,

Ben

--

Benjamin Lieberman

Muhlenberg College 2019

Mobile: 301.299.8928

> On Jul 13, 2018, at 7:30 AM, Roger Bivand <[hidden email]> wrote:

>

> On Fri, 13 Jul 2018, Benjamin Lieberman wrote:

>

>> All-

>>

>> I would like to note that as the data is proprietary, and for obvious privacy concerns, the lat/long pairs were randomly generated, and were not taken directly from the data.

>

> Thanks for the clarification. Note that if the data are a sample, that is not a complete listing for one or more study areas, you don't know who the first order neighbour (the most proximate other voter) is, because that indidivual may not be in the sample. Your fallback then is to treat the data as aggregates, unless you rule out local sampling variability.

>

> Roger

>

>>

>>

>> --

>> Benjamin Lieberman

>> Muhlenberg College 2019

>> Mobile: 301.299.8928

>>

>>> On Jul 13, 2018, at 6:58 AM, Benjamin Lieberman <[hidden email]> wrote:

>>>

>>> Roger anf Facu,

>>>

>>> Thank you very much for the help. In terms of the data, I only provided the ID and Lat/Long pairs because they were the only covariates which were necessary. The data set we are using was purchased and contains voter registration information, voter history, and census tract information, after some geocoding took place. The locations are the residents houses, in this instance.

>>>

>>> I have rerun the knn with longlat = T, but I am still hung up on the idea of the first order neighbors. I have reread the vignette and section 5 discusses High-Order Neighbors, but there isn’t any mention of first or second order neighbors, as you mentioned above (“first order neighbors are not defined”). One of the pieces of literature I found said that polygons are problematic to work with, as are tesslations for precisely the reason you mentioned, non-planarity. For this reason, I am hung up on the idea of how to find all first order neighbors for a point, especially as the number of first order neighbors varies from point to point, and such knearneigh would not be appropriate here.

>>>

>>> If this is something that does not seem feasible, maybe another tactic is necessary.

>>>

>>> Again, thank you all for the help.

>>>

>>> Warmest

>>> --

>>> Benjamin Lieberman

>>> Muhlenberg College 2019

>>> Mobile: 301.299.8928

>>>

>>>> On Jul 13, 2018, at 6:11 AM, Roger Bivand <[hidden email] <mailto:[hidden email]> <mailto:[hidden email] <mailto:[hidden email]>>> wrote:

>>>>

>>>> On Fri, 13 Jul 2018, Facundo Muñoz wrote:

>>>>

>>>>> Dear Benjamin,

>>>>>

>>>>> I'm not sure how you define "first order neighbors" for a point. The

>>>>> first thing that comes to my mind is to use their corresponding voronoi

>>>>> polygons and define neighborhood from there. Following your code:

>>>>

>>>> Thanks, the main source of confusion is that "first order neighbors" are not defined. A k=1 neighbour could be (as below), as could k=6, or voronoi neighbours, or sphere of influence etc. So reading vignette("nb") would be a starting point.

>>>>

>>>> Also note that voronoi and other graph-based neighbours should only use planar coordinates - including dismo::voronoi, which uses deldir::deldir() - just like spdep::tri2nb(). Triangulation can lead to spurious neighbours on the convex hull.

>>>>

>>>>>

>>>>> v <- dismo::voronoi(coords)

>>>>> par(mfrow = c(1, 2), xaxt = "n", yaxt = "n", mgp = c(0, 0, 0))

>>>>> plot(coords, type = "n", xlab = NA, ylab = NA)

>>>>> plot(v, add = TRUE)

>>>>> text(x = coords[, 1], y = coords[, 2], labels = voter.subset$Voter.ID)

>>>>> plot(coords, type = "n", xlab = NA, ylab = NA)

>>>>> plot(poly2nb(v), coords, add = TRUE, col = "gray")

>>>>>

>>>>> ƒacu.-

>>>>>

>>>>>

>>>>> On 07/12/2018 09:00 PM, Benjamin Lieberman wrote:

>>>>>> Hi all,

>>>>>>

>>>>>> Currently, I am working with U.S. voter data. Below, I included a brief example of the structure of the data with some reproducible code. My data set consists of roughly 233,000 (233k) entries, each specifying a voter and their particular latitude/longitude pair.

>>>>

>>>> Using individual voter data is highly dangerous, and must in every case be subject to the strictest privacy rules. Voter data does not in essence have position - the only valid voting data that has position is of the voting station/precinct, and those data are aggregated to preserve anonymity.

>>>>

>>>> Why does position and voter data not have position? Which location should you use - residence, workplace, what? What are these locations proxying? Nothing valid can be drawn from "just voter data" - you can get conclusions from carefully constructed stratified exit polls, but there the key gender/age/ethnicity/social class/etc. confounders are handled by design. Why should voting decisions be influenced by proximity (they are not)? The missing element here is looking carefully at relevant covariates at more aggregated levels (in the US typically zoning controlling social class positional segregation, etc.).

>>>>

>>>>>> I have been using the spdep package with the hope of creating a CAR model. To begin the analysis, we need to find all first order neighbors of every point in the data.

>>>>>>

>>>>>> While spdep has fantastic commands for finding k nearest neighbors (knearneigh), and a useful command for finding lag of order 3 or more (nblag), I have yet to find a method which is suitable for our purposes (lag = 1, or lag =2). Additionally, I looked into altering the nblag command to accommodate maxlag = 1 or maxlag = 2, but the command relies on an nb format, which is problematic as we are looking for the underlying neighborhood structure.

>>>>>>

>>>>>> Much work has been done with polygons, or data which already is in “nb” format, but after reading the literature, it seems that polygons are not appropriate, nor are distance based neighbor techniques, due to density fluctuations over the area of interest.

>>>>>>

>>>>>> Below is some reproducible code I wrote. I would like to note that I am currently working in R 1.1.453 on a MacBook.

>>>>

>>>> You mean RStudio, there is no such version of R.

>>>>

>>>>>>

>>>>>> # Create a data frame of 10 voters, picked at random

>>>>>> voter.1 = c(1, -75.52187, 40.62320)

>>>>>> voter.2 = c(2,-75.56373, 40.55216)

>>>>>> voter.3 = c(3,-75.39587, 40.55416)

>>>>>> voter.4 = c(4,-75.42248, 40.64326)

>>>>>> voter.5 = c(5,-75.56654, 40.54948)

>>>>>> voter.6 = c(6,-75.56257, 40.67375)

>>>>>> voter.7 = c(7, -75.51888, 40.59715)

>>>>>> voter.8 = c(8, -75.59879, 40.60014)

>>>>>> voter.9 = c(9, -75.59879, 40.60014)

>>>>>> voter.10 = c(10, -75.50877, 40.53129)

>>>>>>

>>>>

>>>> These are in geographical coordinates.

>>>>

>>>>>> # Bind the vectors together

>>>>>> voter.subset = rbind(voter.1, voter.2, voter.3, voter.4, voter.5, voter.6, voter.7, voter.8, voter.9, voter.10)

>>>>>>

>>>>>> # Rename the columns

>>>>>> colnames(voter.subset) = c("Voter.ID", "Longitude", "Latitude")

>>>>>>

>>>>>> # Change the class from a matrix to a data frame

>>>>>> voter.subset = as.data.frame(voter.subset)

>>>>>>

>>>>>> # Load in the required packages

>>>>>> library(spdep)

>>>>>> library(sp)

>>>>>>

>>>>>> # Set the coordinates

>>>>>> coordinates(voter.subset) = c("Longitude", "Latitude")

>>>>>> coords = coordinates(voter.subset)

>>>>>>

>>>>>> # Jitter to ensure no duplicate points

>>>>>> coords = jitter(coords, factor = 1)

>>>>>>

>>>>

>>>> jitter does not respect geographical coordinates (decimal degree metric).
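A quick base-R illustration of that point (approximate figures on a spherical Earth of radius 6,371 km, not any spdep internal): the ground distance covered by one degree of longitude shrinks with latitude, so jittering raw lon/lat by a fixed numeric amount perturbs points by very different metric distances depending on where they are.

```r
# Metres of ground distance per degree of longitude at a given latitude,
# on a sphere of radius 6371 km (rough illustration only).
m_per_deg_lon <- function(lat_deg) {
  2 * pi * 6371000 * cos(lat_deg * pi / 180) / 360
}

m_per_deg_lon(0)     # equator: roughly 111 km per degree
m_per_deg_lon(40.6)  # around the example voters: roughly 84 km
m_per_deg_lon(80)    # far north: roughly 19 km
```

Jittering in a projected (planar, metric) CRS avoids this distortion.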

>>>>

>>>>>> # Find the first nearest neighbor of each point

>>>>>> one.nn = knearneigh(coords, k=1)

>>>>

>>>> See the help page (hint: longlat=TRUE to use Great Circle distances, much slower than planar).
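For readers wondering what longlat = TRUE buys: Great Circle distance on a sphere. A hedged base-R sketch using the haversine formula (spherical Earth, radius 6,371 km; spdep's own implementation may differ in detail), applied to two of the example voters:

```r
# Haversine Great Circle distance in km (illustration, not spdep's code).
gc_dist_km <- function(lon1, lat1, lon2, lat2, R = 6371) {
  to_rad <- pi / 180
  dlat <- (lat2 - lat1) * to_rad
  dlon <- (lon2 - lon1) * to_rad
  a <- sin(dlat / 2)^2 +
    cos(lat1 * to_rad) * cos(lat2 * to_rad) * sin(dlon / 2)^2
  2 * R * asin(sqrt(a))
}

# Distance between voter.1 and voter.2 from the example: roughly 8.7 km
gc_dist_km(-75.52187, 40.62320, -75.56373, 40.55216)
```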

>>>>

>>>>>>

>>>>>> # Convert the first nearest neighbor to format "nb"

>>>>>> one.nn_nb = knn2nb(one.nn, sym = F)

>>>>>>

>>>>>> Thank you in advance for any help you may offer, and for taking the time to read this. I have consulted Applied Spatial Data Analysis with R (Bivand, Pebesma, Gomez-Rubio), as well as other Sig-Geo threads, the spdep documentation, and the nb vignette (Bivand, April 3, 2018) from earlier this year.

>>>>>>

>>>>>> Warmest,

>>>>>> Ben

>>>>>> --

>>>>>> Benjamin Lieberman

>>>>>> Muhlenberg College 2019

>>>>>> Mobile: 301.299.8928

>>>>>>

>>>>>>

>>>>>>

>>>>>> [[alternative HTML version deleted]]

>>>>

>>>> Plain text only, please.

>>>>

>>>>>>

>>>>>> _______________________________________________

>>>>>> R-sig-Geo mailing list

>>>>>> [hidden email]

>>>>>> https://stat.ethz.ch/mailman/listinfo/r-sig-geo

>>>>>

>>>>>

>>>>> [[alternative HTML version deleted]]

>>>>>

>>>>> _______________________________________________

>>>>> R-sig-Geo mailing list

>>>>> [hidden email]

>>>>> https://stat.ethz.ch/mailman/listinfo/r-sig-geo

>>>>>

>>>>

>>>> --

>>>> Roger Bivand

>>>> Department of Economics, Norwegian School of Economics,

>>>> Helleveien 30, N-5045 Bergen, Norway.

>>>> voice: +47 55 95 93 55; e-mail: [hidden email]

>>>> http://orcid.org/0000-0003-2392-6140

>>>> https://scholar.google.no/citations?user=AWeghB0AAAAJ&hl=en

>>>> _______________________________________________

>>>> R-sig-Geo mailing list

>>>> [hidden email]

>>>> https://stat.ethz.ch/mailman/listinfo/r-sig-geo

>>

>>

>> [[alternative HTML version deleted]]

>>

>> _______________________________________________

>> R-sig-Geo mailing list

>> [hidden email]

>> https://stat.ethz.ch/mailman/listinfo/r-sig-geo

>>

>

> --

> Roger Bivand

> Department of Economics, Norwegian School of Economics,

> Helleveien 30, N-5045 Bergen, Norway.

> voice: +47 55 95 93 55; e-mail: [hidden email]

> http://orcid.org/0000-0003-2392-6140

> https://scholar.google.no/citations?user=AWeghB0AAAAJ&hl=en

[[alternative HTML version deleted]]

_______________________________________________

R-sig-Geo mailing list

[hidden email]

https://stat.ethz.ch/mailman/listinfo/r-sig-geo

Thank you so much for the help. In our case, first order neighbors are all neighbors who are adjacent to a voter. Second order neighbors are then all neighbors who are adjacent to the first order neighbors. Hope that this could clarify what I have been referencing this time.

I will try the method you suggested, thank you.

Best,

Ben

--

Benjamin Lieberman

Muhlenberg College 2019

Mobile: 301.299.8928


### Re: How to find all first order neighbors of a collection of points

On Fri, 13 Jul 2018, Benjamin Lieberman wrote:

> All-

>

> I would like to note that as the data is proprietary, and for obvious

> privacy concerns, the lat/long pairs were randomly generated, and were

> not taken directly from the data.

Thanks for the clarification. Note that if the data are a sample, that is

not a complete listing for one or more study areas, you don't know who the

first order neighbour (the most proximate other voter) is, because that

individual may not be in the sample. Your fallback then is to treat the

data as aggregates, unless you rule out local sampling variability.

Roger

>

>

> --

> Benjamin Lieberman

> Muhlenberg College 2019

> Mobile: 301.299.8928

>

>> On Jul 13, 2018, at 6:58 AM, Benjamin Lieberman <[hidden email]> wrote:

>>

>> Roger and Facu,

>>

>> Thank you very much for the help. In terms of the data, I only provided the ID and Lat/Long pairs because they were the only covariates which were necessary. The data set we are using was purchased and contains voter registration information, voter history, and census tract information, after some geocoding took place. The locations are the residents' houses, in this instance.

>>

>> I have rerun the knn with longlat = T, but I am still hung up on the idea of the first order neighbors. I have reread the vignette and section 5 discusses High-Order Neighbors, but there isn’t any mention of first or second order neighbors, as you mentioned above (“first order neighbors are not defined”). One of the pieces of literature I found said that polygons are problematic to work with, as are tessellations, for precisely the reason you mentioned, non-planarity. For this reason, I am hung up on the idea of how to find all first order neighbors for a point, especially as the number of first order neighbors varies from point to point, and as such knearneigh would not be appropriate here.

>>

>> If this is something that does not seem feasible, maybe another tactic is necessary.

>>

>> Again, thank you all for the help.

>>

>> Warmest

>> --

>> Benjamin Lieberman

>> Muhlenberg College 2019

>> Mobile: 301.299.8928


>

>

> [[alternative HTML version deleted]]

>

> _______________________________________________

> R-sig-Geo mailing list

> [hidden email]

> https://stat.ethz.ch/mailman/listinfo/r-sig-geo

--

Roger Bivand

Department of Economics, Norwegian School of Economics,

Helleveien 30, N-5045 Bergen, Norway.

voice: +47 55 95 93 55; e-mail: [hidden email]

http://orcid.org/0000-0003-2392-6140

https://scholar.google.no/citations?user=AWeghB0AAAAJ&hl=en

_______________________________________________

R-sig-Geo mailing list

[hidden email]

https://stat.ethz.ch/mailman/listinfo/r-sig-geo


> All-

>

> I would like to note that as the data is proprietary, and for obvious

> privacy concerns, the lat/long pairs were randomly generated, and were

> not taken directly from the data.

Thanks for the clarification. Note that if the data are a sample, that is

not a complete listing for one or more study areas, you don't know who the

first order neighbour (the most proximate other voter) is, because that

indidivual may not be in the sample. Your fallback then is to treat the

data as aggregates, unless you rule out local sampling variability.

Roger

>

>

> --

> Benjamin Lieberman

> Muhlenberg College 2019

> Mobile: 301.299.8928

>

>> On Jul 13, 2018, at 6:58 AM, Benjamin Lieberman <[hidden email]> wrote:

>>

>> Roger anf Facu,

>>

>> Thank you very much for the help. In terms of the data, I only provided the ID and Lat/Long pairs because they were the only covariates which were necessary. The data set we are using was purchased and contains voter registration information, voter history, and census tract information, after some geocoding took place. The locations are the residents houses, in this instance.

>>

>> I have rerun the knn with longlat = T, but I am still hung up on the idea of the first order neighbors. I have reread the vignette and section 5 discusses High-Order Neighbors, but there isn’t any mention of first or second order neighbors, as you mentioned above (“first order neighbors are not defined”). One of the pieces of literature I found said that polygons are problematic to work with, as are tesslations for precisely the reason you mentioned, non-planarity. For this reason, I am hung up on the idea of how to find all first order neighbors for a point, especially as the number of first order neighbors varies from point to point, and such knearneigh would not be appropriate here.

>>

>> If this is something that does not seem feasible, maybe another tactic is necessary.

>>

>> Again, thank you all for the help.

>>

>> Warmest

>> --

>> Benjamin Lieberman

>> Muhlenberg College 2019

>> Mobile: 301.299.8928

>>

>>> On Jul 13, 2018, at 6:11 AM, Roger Bivand <[hidden email] <mailto:[hidden email]>> wrote:

>>>

>>> On Fri, 13 Jul 2018, Facundo Muñoz wrote:

>>>

>>>> Dear Benjamin,

>>>>

>>>> I'm not sure how you define "first order neighbors" for a point. The

>>>> first thing that comes to my mind is to use their corresponding voronoi

>>>> polygons and define neighborhood from there. Following your code:

>>>

>>> Thanks, the main source of confusion is that "first order neighbors" are not defined. A k=1 neighbour could be (as below), as could k=6, or voronoi neighbours, or sphere of influence etc. So reading vignette("nb") would be a starting point.

>>>

>>> Also note that voronoi and other graph-based neighbours should only use planar coordinates - including dismo::voronoi, which uses deldir::deldir() - just like spdep::tri2nb(). Triangulation can lead to spurious neighbours on the convex hull.

>>>

>>>>

>>>> v <- dismo::voronoi(coords)

>>>> par(mfrow = c(1, 2), xaxt = "n", yaxt = "n", mgp = c(0, 0, 0))

>>>> plot(coords, type = "n", xlab = NA, ylab = NA)

>>>> plot(v, add = TRUE)

>>>> text(x = coords[, 1], y = coords[, 2], labels = voter.subset$Voter.ID)

>>>> plot(coords, type = "n", xlab = NA, ylab = NA)

>>>> plot(poly2nb(v), coords, add = TRUE, col = "gray")

>>>>

>>>> ƒacu.-

>>>>

>>>>

>>>> On 07/12/2018 09:00 PM, Benjamin Lieberman wrote:

>>>>> Hi all,

>>>>>

>>>>> Currently, I am working with U.S. voter data. Below, I included a brief example of the structure of the data with some reproducible code. My data set consists of roughly 233,000 (233k) entries, each specifying a voter and their particular latitude/longitude pair.

>>>

>>> Using individual voter data is highly dangerous, and must in every case be subject to the strictest privacy rules. Voter data does not in essence have position - the only valid voting data that has position is of the voting station/precinct, and those data are aggregated to preserve anonymity.

>>>

>>> Why does voter data not have position? Which location should you use - residence, workplace, what? What are these locations proxying? Nothing valid can be drawn from "just voter data" - you can get conclusions from carefully constructed stratified exit polls, but there the key gender/age/ethnicity/social class/etc. confounders are handled by design. Why should voting decisions be influenced by proximity (they are not)? The missing element here is looking carefully at relevant covariates at more aggregated levels (in the US typically zoning controlling social class positional segregation, etc.).

>>>

>>>>> I have been using the spdep package with the hope of creating a CAR model. To begin the analysis, we need to find all first order neighbors of every point in the data.

>>>>>

>>>>> While spdep has fantastic commands for finding k nearest neighbors (knearneigh), and a useful command for finding lag of order 3 or more (nblag), I have yet to find a method which is suitable for our purposes (lag = 1, or lag =2). Additionally, I looked into altering the nblag command to accommodate maxlag = 1 or maxlag = 2, but the command relies on an nb format, which is problematic as we are looking for the underlying neighborhood structure.

>>>>>

>>>>> There has been numerous work done with polygons, or data which already is in “nb” format, but after reading the literature, it seems that polygons are not appropriate, nor are distance based neighbor techniques, due to density fluctuations over the area of interest.

>>>>>

>>>>> Below is some reproducible code I wrote. I would like to note that I am currently working in R 1.1.453 on a MacBook.

>>>

>>> You mean RStudio, there is no such version of R.

>>>

>>>>>

>>>>> # Create a data frame of 10 voters, picked at random

>>>>> voter.1 = c(1, -75.52187, 40.62320)

>>>>> voter.2 = c(2,-75.56373, 40.55216)

>>>>> voter.3 = c(3,-75.39587, 40.55416)

>>>>> voter.4 = c(4,-75.42248, 40.64326)

>>>>> voter.5 = c(5,-75.56654, 40.54948)

>>>>> voter.6 = c(6,-75.56257, 40.67375)

>>>>> voter.7 = c(7, -75.51888, 40.59715)

>>>>> voter.8 = c(8, -75.59879, 40.60014)

>>>>> voter.9 = c(9, -75.59879, 40.60014)

>>>>> voter.10 = c(10, -75.50877, 40.53129)

>>>>>

>>>

>>> These are in geographical coordinates.

>>>

>>>>> # Bind the vectors together

>>>>> voter.subset = rbind(voter.1, voter.2, voter.3, voter.4, voter.5, voter.6, voter.7, voter.8, voter.9, voter.10)

>>>>>

>>>>> # Rename the columns

>>>>> colnames(voter.subset) = c("Voter.ID", "Longitude", "Latitude")

>>>>>

>>>>> # Change the class from a matrix to a data frame

>>>>> voter.subset = as.data.frame(voter.subset)

>>>>>

>>>>> # Load in the required packages

>>>>> library(spdep)

>>>>> library(sp)

>>>>>

>>>>> # Set the coordinates

>>>>> coordinates(voter.subset) = c("Longitude", "Latitude")

>>>>> coords = coordinates(voter.subset)

>>>>>

>>>>> # Jitter to ensure no duplicate points

>>>>> coords = jitter(coords, factor = 1)

>>>>>

>>>

>>> jitter does not respect geographical coordinates (decimal degree metric).

>>>

>>>>> # Find the first nearest neighbor of each point

>>>>> one.nn = knearneigh(coords, k=1)

>>>

>>> See the help page (hint: longlat=TRUE to use Great Circle distances, much slower than planar).
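As an aside, the `longlat` hint might look like this in practice (an editor's sketch, not from the thread; it assumes spdep is installed and `coords` is a two-column matrix of longitude/latitude values):

```r
# Sketch: k = 1 nearest neighbours computed with Great Circle
# distances, as suggested for unprojected long/lat coordinates.
library(spdep)

one.nn <- knearneigh(coords, k = 1, longlat = TRUE)

# Convert to an "nb" neighbour list; sym = FALSE keeps the
# relation asymmetric (A's nearest neighbour need not be B's)
one.nn_nb <- knn2nb(one.nn, sym = FALSE)
```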

>>>

>>>>>

>>>>> # Convert the first nearest neighbor to format "nb"

>>>>> one.nn_nb = knn2nb(one.nn, sym = F)

>>>>>

>>>>> Thank you in advance for any help you may offer, and for taking the time to read this. I have consulted Applied Spatial Data Analysis with R (Bivand, Pebesma, Gomez-Rubio), as well as other Sig-Geo threads, the spdep documentation, and the nb vignette (Bivand, April 3, 2018) from earlier this year.

>>>>>

>>>>> Warmest,

>>>>> Ben

>>>>> --

>>>>> Benjamin Lieberman

>>>>> Muhlenberg College 2019

>>>>> Mobile: 301.299.8928

>>>>>

>>>>>

>>>>>

>>>>> [[alternative HTML version deleted]]

>>>

>>> Plain text only, please.

>>>

>>>>>

>>>>> _______________________________________________

>>>>> R-sig-Geo mailing list

>>>>> [hidden email] <mailto:[hidden email]>

>>>>> https://stat.ethz.ch/mailman/listinfo/r-sig-geo <https://stat.ethz.ch/mailman/listinfo/r-sig-geo>

>>>>

>>>>

>>>> [[alternative HTML version deleted]]

>>>>

>>>> _______________________________________________

>>>> R-sig-Geo mailing list

>>>> [hidden email] <mailto:[hidden email]>

>>>> https://stat.ethz.ch/mailman/listinfo/r-sig-geo <https://stat.ethz.ch/mailman/listinfo/r-sig-geo>

>>>>

>>>

>>> --

>>> Roger Bivand

>>> Department of Economics, Norwegian School of Economics,

>>> Helleveien 30, N-5045 Bergen, Norway.

>>> voice: +47 55 95 93 55; e-mail: [hidden email] <mailto:[hidden email]>

>>> http://orcid.org/0000-0003-2392-6140 <http://orcid.org/0000-0003-2392-6140>

>>> https://scholar.google.no/citations?user=AWeghB0AAAAJ&hl=en

>>> _______________________________________________

>>> R-sig-Geo mailing list

>>> [hidden email] <mailto:[hidden email]>

>>> https://stat.ethz.ch/mailman/listinfo/r-sig-geo <https://stat.ethz.ch/mailman/listinfo/r-sig-geo>

>

>

> [[alternative HTML version deleted]]

>

> _______________________________________________

> R-sig-Geo mailing list

> [hidden email]

> https://stat.ethz.ch/mailman/listinfo/r-sig-geo

> --

Roger Bivand

Department of Economics, Norwegian School of Economics,

Helleveien 30, N-5045 Bergen, Norway.

voice: +47 55 95 93 55; e-mail: [hidden email]

http://orcid.org/0000-0003-2392-6140

https://scholar.google.no/citations?user=AWeghB0AAAAJ&hl=en

_______________________________________________

R-sig-Geo mailing list

[hidden email]

https://stat.ethz.ch/mailman/listinfo/r-sig-geo


### Re: How to find all first order neighbors of a collection of points

On Fri, 13 Jul 2018, Benjamin Lieberman wrote:

> Roger and Facu,

>

> Thank you very much for the help. In terms of the data, I only provided

> the ID and Lat/Long pairs because they were the only covariates which

> were necessary. The data set we are using was purchased and contains

> voter registration information, voter history, and census tract

> information, after some geocoding took place. The locations are the

> residents' houses, in this instance.

>

> I have rerun the knn with longlat = T, but I am still hung up on the

> idea of the first order neighbors. I have reread the vignette and

> section 5 discusses High-Order Neighbors, but there isn’t any mention of

> first or second order neighbors, as you mentioned above (“first order

> neighbors are not defined”). One of the pieces of literature I found

> said that polygons are problematic to work with, as are tessellations for

> precisely the reason you mentioned, non-planarity. For this reason, I am

> hung up on the idea of how to find all first order neighbors for a

> point, especially as the number of first order neighbors varies from

> point to point, and as such knearneigh would not be appropriate here.

So project them, and use Euclidean distances in distance or graph-based

methods (or knn). You still have not defined "first order neighbors". That

is your call alone. If you believe that voter behaviour is like a

contagious disease, define contagion, and from that "first order

neighbours". If you are simply accounting for missing background

covariates that have a larger spatial footprint rather than voter-voter

interaction, it probably doesn't matter much. What is the implied model

here - that voters behave by observing the behaviour of their proximate

neighbours (giving similar behaviour for near neighbours) or that voters

are patched/segregated by residence, and near neighbours behave similarly

not because of information spillovers between voters, but because the

voters are subject to aggregate social/economic conditions?

Roger
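Roger's advice above - project, then use planar distance or graph-based methods - can be sketched as follows. This is an editor's illustration, not code from the thread; it assumes sp, spdep, and rgdal are available, and that UTM zone 18N (EPSG:32618) is a suitable planar CRS for these eastern-Pennsylvania points:

```r
# Sketch: project long/lat points to a planar CRS, then build
# graph- and distance-based neighbour sets with spdep.
library(sp)
library(spdep)

coords_ll <- cbind(Longitude = c(-75.52187, -75.56373, -75.39587, -75.42248),
                   Latitude  = c( 40.62320,  40.55216,  40.55416,  40.64326))
pts <- SpatialPoints(coords_ll,
                     proj4string = CRS("+proj=longlat +datum=WGS84"))

# Project to UTM zone 18N (units: metres), so Euclidean
# distances between points are meaningful (requires rgdal)
pts_utm <- spTransform(pts, CRS("+init=epsg:32618"))
xy <- coordinates(pts_utm)

# Graph-based neighbours via Delaunay triangulation, on planar
# coordinates only (beware spurious links on the convex hull)
nb_tri <- tri2nb(xy)

# Distance-based neighbours within, say, 10 km
nb_d <- dnearneigh(xy, d1 = 0, d2 = 10000)
```

Which of these (if either) matches the intended "first order neighbours" is exactly the modelling decision Roger says is the analyst's alone.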

>

> If this is something that does not seem feasible, maybe another tactic

> is necessary.

>

> Again, thank you all for the help.

>

> Warmest

> --

> Benjamin Lieberman

> Muhlenberg College 2019

> Mobile: 301.299.8928

>



### Re: How to find all first order neighbors of a collection of points

All-

I would like to note that as the data is proprietary, and for obvious privacy concerns, the lat/long pairs were randomly generated, and were not taken directly from the data.

--

Benjamin Lieberman

Muhlenberg College 2019

Mobile: 301.299.8928


>> Helleveien 30, N-5045 Bergen, Norway.

>> voice: +47 55 95 93 55; e-mail: [hidden email] <mailto:[hidden email]>

>> http://orcid.org/0000-0003-2392-6140 <http://orcid.org/0000-0003-2392-6140>


[[alternative HTML version deleted]]

_______________________________________________

R-sig-Geo mailing list

[hidden email]

https://stat.ethz.ch/mailman/listinfo/r-sig-geo

I would like to note that as the data is proprietary, and for obvious privacy concerns, the lat/long pairs were randomly generated, and were not taken directly from the data.

--

Benjamin Lieberman

Muhlenberg College 2019

Mobile: 301.299.8928

> On Jul 13, 2018, at 6:58 AM, Benjamin Lieberman <[hidden email]> wrote:


### Re: How to find all first order neighbors of a collection of points

Roger and Facu,

Thank you very much for the help. In terms of the data, I only provided the ID and lat/long pairs because they were the only covariates that were necessary. The data set we are using was purchased and contains voter registration information, voter history, and census tract information, after some geocoding took place. The locations are, in this instance, the residents' houses.

I have rerun the knn with longlat = T, but I am still hung up on the idea of the first order neighbors. I have reread the vignette; section 5 discusses High-Order Neighbors, but there isn't any mention of first or second order neighbors, as you mentioned above ("first order neighbors are not defined"). One of the pieces of literature I found said that polygons are problematic to work with, as are tessellations, for precisely the reason you mentioned, non-planarity. For this reason, I am hung up on how to find all first order neighbors for a point, especially as the number of first order neighbors varies from point to point, and as such knearneigh would not be appropriate here.

If this is something that does not seem feasible, maybe another tactic is necessary.
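Since the number of neighbours must be allowed to vary from point to point, graph-based neighbour definitions in spdep (Gabriel graph, relative neighbours, sphere of influence) are one possible tactic. A sketch, not from the original messages, assuming planar (projected) coordinates as these triangulation-based methods require:

```r
# Graph-based neighbours give each point a data-driven neighbour count,
# unlike knearneigh's fixed k. Planar (projected) coordinates assumed.
library(spdep)
set.seed(42)
coords <- cbind(runif(10, 0, 1000), runif(10, 0, 1000))  # toy planar points
gab <- gabrielneigh(coords)          # Gabriel graph of the point set
gab.nb <- graph2nb(gab, sym = TRUE)  # convert to class "nb", force symmetry
card(gab.nb)                         # neighbour counts differ across points
```

card() shows how many neighbours each point receives, which addresses the varying-count concern directly.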

Again, thank you all for the help.

Warmest

--

Benjamin Lieberman

Muhlenberg College 2019

Mobile: 301.299.8928

> On Jul 13, 2018, at 6:11 AM, Roger Bivand <[hidden email]> wrote:



### Re: How to find all first order neighbors of a collection of points

On Fri, 13 Jul 2018, Facundo Muñoz wrote:

> Dear Benjamin,

>

> I'm not sure how you define "first order neighbors" for a point. The

> first thing that comes to my mind is to use their corresponding voronoi

> polygons and define neighborhood from there. Following your code:

Thanks, the main source of confusion is that "first order neighbors" are

not defined. A k=1 neighbour could be (as below), as could k=6, or voronoi

neighbours, or sphere of influence etc. So reading vignette("nb") would be

a starting point.

Also note that voronoi and other graph-based neighbours should only use

planar coordinates - including dismo::voronoi, which uses deldir::deldir()

- just like spdep::tri2nb(). Triangulation can lead to spurious neighbours

on the convex hull.
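One way to respect this caveat is to project to a planar CRS before building the tessellation. A sketch, not part of the original message; the UTM zone 18N CRS is an assumption that suits the example's Pennsylvania coordinates, and rgdal is assumed installed to supply the spTransform method:

```r
# Project long/lat points to a planar CRS before Voronoi/triangulation.
library(sp)
library(rgdal)  # provides spTransform for sp objects
pts <- SpatialPoints(cbind(c(-75.52187, -75.56373), c(40.62320, 40.55216)),
                     proj4string = CRS("+proj=longlat +datum=WGS84"))
pts.utm <- spTransform(pts, CRS("+proj=utm +zone=18 +datum=WGS84 +units=m"))
coordinates(pts.utm)  # now in metres; safe input for deldir/voronoi/tri2nb
```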

>

> v <- dismo::voronoi(coords)

> par(mfrow = c(1, 2), xaxt = "n", yaxt = "n", mgp = c(0, 0, 0))

> plot(coords, type = "n", xlab = NA, ylab = NA)

> plot(v, add = TRUE)

> text(x = coords[, 1], y = coords[, 2], labels = voter.subset$Voter.ID)

> plot(coords, type = "n", xlab = NA, ylab = NA)

> plot(poly2nb(v), coords, add = TRUE, col = "gray")

>

> ƒacu.-

>

>

> On 07/12/2018 09:00 PM, Benjamin Lieberman wrote:

>> Hi all,

>>

>> Currently, I am working with U.S. voter data. Below, I included a brief

>> example of the structure of the data with some reproducible code. My

>> data set consists of roughly 233,000 (233k) entries, each specifying a

>> voter and their particular latitude/longitude pair.

Using individual voter data is highly dangerous, and must in every case be

subject to the strictest privacy rules. Voter data does not in essence

have position - the only valid voting data that has position is of the

voting station/precinct, and those data are aggregated to preserve

anonymity.

Why does position and voter data not have position? Which location should

you use - residence, workplace, what? What are these locations proxying?

Nothing valid can be drawn from "just voter data" - you can get

conclusions from carefully constructed stratified exit polls, but there

the key gender/age/ethnicity/social class/etc. confounders are handled by

design. Why should voting decisions be influenced by proximity (they are

not)? The missing element here is looking carefully at relevant covariates

at more aggregated levels (in the US typically zoning controlling social

class positional segregation, etc.).

>> I have been using the spdep package with the hope of creating a CAR

>> model. To begin the analysis, we need to find all first order neighbors

>> of every point in the data.

>>

>> While spdep has fantastic commands for finding k nearest neighbors

>> (knearneigh), and a useful command for finding lag of order 3 or more

>> (nblag), I have yet to find a method which is suitable for our purposes

>> (lag = 1, or lag =2). Additionally, I looked into altering the nblag

>> command to accommodate maxlag = 1 or maxlag = 2, but the command relies

>> on an nb format, which is problematic as we are looking for the

>> underlying neighborhood structure.

>>

>> There has been much work done with polygons, or data which already

>> is in “nb” format, but after reading the literature, it seems that

>> polygons are not appropriate, nor are distance based neighbor

>> techniques, due to density fluctuations over the area of interest.

>>

>> Below is some reproducible code I wrote. I would like to note that I am

>> currently working in R 1.1.453 on a MacBook.

You mean RStudio, there is no such version of R.

>>

>> # Create a data frame of 10 voters, picked at random

>> voter.1 = c(1, -75.52187, 40.62320)

>> voter.2 = c(2,-75.56373, 40.55216)

>> voter.3 = c(3,-75.39587, 40.55416)

>> voter.4 = c(4,-75.42248, 40.64326)

>> voter.5 = c(5,-75.56654, 40.54948)

>> voter.6 = c(6,-75.56257, 40.67375)

>> voter.7 = c(7, -75.51888, 40.59715)

>> voter.8 = c(8, -75.59879, 40.60014)

>> voter.9 = c(9, -75.59879, 40.60014)

>> voter.10 = c(10, -75.50877, 40.53129)

These are in geographical coordinates.

>> # Bind the vectors together

>> voter.subset = rbind(voter.1, voter.2, voter.3, voter.4, voter.5, voter.6, voter.7, voter.8, voter.9, voter.10)

>>

>> # Rename the columns

>> colnames(voter.subset) = c("Voter.ID", "Longitude", "Latitude")

>>

>> # Change the class from a matrix to a data frame

>> voter.subset = as.data.frame(voter.subset)

>>

>> # Load in the required packages

>> library(spdep)

>> library(sp)

>>

>> # Set the coordinates

>> coordinates(voter.subset) = c("Longitude", "Latitude")

>> coords = coordinates(voter.subset)

>>

>> # Jitter to ensure no duplicate points

>> coords = jitter(coords, factor = 1)

jitter() does not respect geographical coordinates (the decimal degree metric).

>> # Find the first nearest neighbor of each point

>> one.nn = knearneigh(coords, k=1)

See the help page (hint: longlat=TRUE to use Great Circle distances, much

slower than planar).
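That hint, as a minimal sketch (toy coordinates, assuming a current spdep; `longlat = TRUE` is the documented `knearneigh` argument for Great Circle distances):

```r
# Sketch: k = 1 neighbours using Great Circle distances on geographical
# coordinates, then symmetrised for use in a CAR-type model.
library(spdep)

coords <- cbind(c(-75.52187, -75.56373, -75.39587, -75.42248),
                c( 40.62320,  40.55216,  40.55416,  40.64326))
one.nn    <- knearneigh(coords, k = 1, longlat = TRUE)
one.nn_nb <- knn2nb(one.nn, sym = TRUE)  # sym = TRUE forces a symmetric graph
```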

>>

>> # Convert the first nearest neighbor to format "nb"

>> one.nn_nb = knn2nb(one.nn, sym = F)

>>

>> Thank you in advance for any help you may offer, and for taking the

>> time to read this. I have consulted Applied Spatial Data Analysis with

>> R (Bivand, Pebesma, Gomez-Rubio), as well as other Sig-Geo threads, the

>> spdep documentation, and the nb vignette (Bivand, April 3, 2018) from

>> earlier this year.

>>

>> Warmest,

>> Ben

>> --

>> Benjamin Lieberman

>> Muhlenberg College 2019

>> Mobile: 301.299.8928

>>

>>

>>

>> [[alternative HTML version deleted]]

Plain text only, please.

>>

>> _______________________________________________

>> R-sig-Geo mailing list

>> [hidden email]

>> https://stat.ethz.ch/mailman/listinfo/r-sig-geo

>

>

> [[alternative HTML version deleted]]

>

> _______________________________________________

> R-sig-Geo mailing list

> [hidden email]

> https://stat.ethz.ch/mailman/listinfo/r-sig-geo

> --

Roger Bivand

Department of Economics, Norwegian School of Economics,

Helleveien 30, N-5045 Bergen, Norway.

voice: +47 55 95 93 55; e-mail: [hidden email]

http://orcid.org/0000-0003-2392-6140

https://scholar.google.no/citations?user=AWeghB0AAAAJ&hl=en

_______________________________________________

R-sig-Geo mailing list

[hidden email]

https://stat.ethz.ch/mailman/listinfo/r-sig-geo


> Dear Benjamin,

>

> I'm not sure how you define "first order neighbors" for a point. The

> first thing that comes to my mind is to use their corresponding voronoi

> polygons and define neighborhood from there. Following your code:

Thanks, the main source of confusion is that "first order neighbors" are

not defined. A k=1 neighbour could be (as below), as could k=6, or voronoi

neighbours, or sphere of influence etc. So reading vignette("nb") would be

a starting point.

Also note that voronoi and other graph-based neighbours should only use

planar coordinates - including dismo::voronoi, which uses deldir::deldir()

- just like spdep::tri2nb(). Triangulation can lead to spurious neighbours

on the convex hull.
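A sketch of that planar-coordinates point (assuming sp, rgdal and spdep are installed; EPSG:32618 is my choice of UTM zone for these eastern-Pennsylvania points, not something stated in the thread):

```r
# Sketch: project geographical coordinates to a planar CRS before building
# triangulation- or graph-based neighbours.
library(sp)
library(spdep)

coords <- cbind(c(-75.52187, -75.56373, -75.39587, -75.42248),
                c( 40.62320,  40.55216,  40.55416,  40.64326))
pts <- SpatialPoints(coords, proj4string = CRS("+proj=longlat +datum=WGS84"))
pts_utm <- spTransform(pts, CRS("+init=epsg:32618"))  # UTM 18N (assumed zone)
xy <- coordinates(pts_utm)

nb_tri <- tri2nb(xy)                       # Delaunay: inspect convex-hull edges
nb_soi <- graph2nb(soi.graph(nb_tri, xy))  # sphere-of-influence pruning
```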

>

> v <- dismo::voronoi(coords)

> par(mfrow = c(1, 2), xaxt = "n", yaxt = "n", mgp = c(0, 0, 0))

> plot(coords, type = "n", xlab = NA, ylab = NA)

> plot(v, add = TRUE)

> text(x = coords[, 1], y = coords[, 2], labels = voter.subset$Voter.ID)

> plot(coords, type = "n", xlab = NA, ylab = NA)

> plot(poly2nb(v), coords, add = TRUE, col = "gray")

>

> ƒacu.-


Roger Bivand

Department of Economics

Norwegian School of Economics

Helleveien 30

N-5045 Bergen, Norway

### Re: How to find all first order neighbors of a collection of points

Dear Benjamin,

I'm not sure how you define "first order neighbors" for a point. The

first thing that comes to my mind is to use their corresponding voronoi

polygons and define neighborhood from there. Following your code:

v <- dismo::voronoi(coords)

par(mfrow = c(1, 2), xaxt = "n", yaxt = "n", mgp = c(0, 0, 0))

plot(coords, type = "n", xlab = NA, ylab = NA)

plot(v, add = TRUE)

text(x = coords[, 1], y = coords[, 2], labels = voter.subset$Voter.ID)

plot(coords, type = "n", xlab = NA, ylab = NA)

plot(poly2nb(v), coords, add = TRUE, col = "gray")

ƒacu.-

On 07/12/2018 09:00 PM, Benjamin Lieberman wrote:

> Hi all,

>

> Currently, I am working with U.S. voter data. Below, I included a brief example of the structure of the data with some reproducible code. My data set consists of roughly 233,000 (233k) entries, each specifying a voter and their particular latitude/longitude pair. I have been using the spdep package with the hope of creating a CAR model. To begin the analysis, we need to find all first order neighbors of every point in the data.

>

> While spdep has fantastic commands for finding k nearest neighbors (knearneigh), and a useful command for finding lag of order 3 or more (nblag), I have yet to find a method which is suitable for our purposes (lag = 1, or lag =2). Additionally, I looked into altering the nblag command to accommodate maxlag = 1 or maxlag = 2, but the command relies on an nb format, which is problematic as we are looking for the underlying neighborhood structure.

>

> There has been numerous work done with polygons, or data which already is in “nb” format, but after reading the literature, it seems that polygons are not appropriate, nor are distance based neighbor techniques, due to density fluctuations over the area of interest.

>

> Below is some reproducible code I wrote. I would like to note that I am currently working in R 1.1.453 on a MacBook.

>

> # Create a data frame of 10 voters, picked at random

> voter.1 = c(1, -75.52187, 40.62320)

> voter.2 = c(2,-75.56373, 40.55216)

> voter.3 = c(3,-75.39587, 40.55416)

> voter.4 = c(4,-75.42248, 40.64326)

> voter.5 = c(5,-75.56654, 40.54948)

> voter.6 = c(6,-75.56257, 40.67375)

> voter.7 = c(7, -75.51888, 40.59715)

> voter.8 = c(8, -75.59879, 40.60014)

> voter.9 = c(9, -75.59879, 40.60014)

> voter.10 = c(10, -75.50877, 40.53129)

>

> # Bind the vectors together

> voter.subset = rbind(voter.1, voter.2, voter.3, voter.4, voter.5, voter.6, voter.7, voter.8, voter.9, voter.10)

>

> # Rename the columns

> colnames(voter.subset) = c("Voter.ID", "Longitude", "Latitude")

>

> # Change the class from a matrix to a data frame

> voter.subset = as.data.frame(voter.subset)

>

> # Load in the required packages

> library(spdep)

> library(sp)

>

> # Set the coordinates

> coordinates(voter.subset) = c("Longitude", "Latitude")

> coords = coordinates(voter.subset)

>

> # Jitter to ensure no duplicate points

> coords = jitter(coords, factor = 1)

>

> # Find the first nearest neighbor of each point

> one.nn = knearneigh(coords, k=1)

>

> # Convert the first nearest neighbor to format "nb"

> one.nn_nb = knn2nb(one.nn, sym = F)

>

> Thank you in advance for any help you may offer, and for taking the time to read this. I have consulted Applied Spatial Data Analysis with R (Bivand, Pebesma, Gomez-Rubio), as well as other Sig-Geo threads, the spdep documentation, and the nb vignette (Bivand, April 3, 2018) from earlier this year.

>

> Warmest,

> Ben

> --

> Benjamin Lieberman

> Muhlenberg College 2019

> Mobile: 301.299.8928

>

>

>

> [[alternative HTML version deleted]]

>

> _______________________________________________

> R-sig-Geo mailing list

> [hidden email]

> https://stat.ethz.ch/mailman/listinfo/r-sig-geo

[[alternative HTML version deleted]]

_______________________________________________

R-sig-Geo mailing list

[hidden email]

https://stat.ethz.ch/mailman/listinfo/r-sig-geo


### How to find all first order neighbors of a collection of points

Hi all,

Currently, I am working with U.S. voter data. Below, I included a brief example of the structure of the data with some reproducible code. My data set consists of roughly 233,000 (233k) entries, each specifying a voter and their particular latitude/longitude pair. I have been using the spdep package with the hope of creating a CAR model. To begin the analysis, we need to find all first order neighbors of every point in the data.

While spdep has fantastic commands for finding k nearest neighbors (knearneigh), and a useful command for finding lag of order 3 or more (nblag), I have yet to find a method which is suitable for our purposes (lag = 1, or lag =2). Additionally, I looked into altering the nblag command to accommodate maxlag = 1 or maxlag = 2, but the command relies on an nb format, which is problematic as we are looking for the underlying neighborhood structure.

Much work has been done with polygons, or with data already in “nb” format, but after reading the literature it seems that polygons are not appropriate, nor are distance-based neighbor techniques, due to density fluctuations over the area of interest.

Below is some reproducible code I wrote. I would like to note that I am currently working in R 1.1.453 on a MacBook.

# Create a data frame of 10 voters, picked at random

voter.1 = c(1, -75.52187, 40.62320)

voter.2 = c(2,-75.56373, 40.55216)

voter.3 = c(3,-75.39587, 40.55416)

voter.4 = c(4,-75.42248, 40.64326)

voter.5 = c(5,-75.56654, 40.54948)

voter.6 = c(6,-75.56257, 40.67375)

voter.7 = c(7, -75.51888, 40.59715)

voter.8 = c(8, -75.59879, 40.60014)

voter.9 = c(9, -75.59879, 40.60014)

voter.10 = c(10, -75.50877, 40.53129)

# Bind the vectors together

voter.subset = rbind(voter.1, voter.2, voter.3, voter.4, voter.5, voter.6, voter.7, voter.8, voter.9, voter.10)

# Rename the columns

colnames(voter.subset) = c("Voter.ID", "Longitude", "Latitude")

# Change the class from a matrix to a data frame

voter.subset = as.data.frame(voter.subset)

# Load in the required packages

library(spdep)

library(sp)

# Set the coordinates

coordinates(voter.subset) = c("Longitude", "Latitude")

coords = coordinates(voter.subset)

# Jitter to ensure no duplicate points

coords = jitter(coords, factor = 1)

# Find the first nearest neighbor of each point

one.nn = knearneigh(coords, k=1)

# Convert the first nearest neighbor to format "nb"

one.nn_nb = knn2nb(one.nn, sym = F)

Thank you in advance for any help you may offer, and for taking the time to read this. I have consulted Applied Spatial Data Analysis with R (Bivand, Pebesma, Gomez-Rubio), as well as other Sig-Geo threads, the spdep documentation, and the nb vignette (Bivand, April 3, 2018) from earlier this year.

Warmest,

Ben

--

Benjamin Lieberman

Muhlenberg College 2019

Mobile: 301.299.8928

[[alternative HTML version deleted]]

_______________________________________________

R-sig-Geo mailing list

[hidden email]

https://stat.ethz.ch/mailman/listinfo/r-sig-geo
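On the `nblag` point raised above: in the spdep versions I know, `nblag()` also accepts `maxlag = 2`, so one route (a sketch, with toy coordinates) is to symmetrise a k-nearest-neighbour object first and then take lags:

```r
# Sketch: first- and second-order neighbour lists from a symmetrised knn graph.
library(spdep)

coords <- cbind(c(-75.52187, -75.56373, -75.39587, -75.42248, -75.56654),
                c( 40.62320,  40.55216,  40.55416,  40.64326,  40.54948))
nb1  <- knn2nb(knearneigh(coords, k = 2, longlat = TRUE), sym = TRUE)
lags <- nblag(nb1, maxlag = 2)
lag1 <- lags[[1]]  # first-order neighbours
lag2 <- lags[[2]]  # second-order only (neighbours of neighbours)
```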


### Re: Distance from not NA cells in a raster

Hi,

You may have solved this already, but I get tripped up on "the distance from all not NA cells in a raster". Is it the distance each NA cell is from each non-NA cell? Also, I'm wondering why you want to know the distance to ALL non-NA cells - what is your big-picture purpose for wanting these distances?

Cheers,

Ben

> On Jul 6, 2018, at 12:22 PM, Gregovich, Dave P (DFG) <[hidden email]> wrote:

>

> Hi,

> I would like to obtain the distance from all not NA cells in a raster. This works for smaller rasters, but seems difficult for the size of rasters (~ 8000 pixel square) I am working with.

> Below is what I've tried. I would be OK calling other software from R, or using some parallelization, if it might help.

> Thanks so much for your help! If I could just calculate this distance in two hours or less (or so) I would be satisfied.

> Dave.

>

> rm(list=ls())

> library(raster)

>

> #make raster

> rast <- raster(nrow = 8000, ncol = 8000, ext = extent(0,1,0,1))

>

> #generate cells to calculate distance from.

> rast[sample(8000^2, 10000)] <- 1

>

> #try two different methods...

> dist1 <- gridDistance(rast, origin = 1)#throws an error after x minutes

> #'Error: cannot allocate vector of size 3.8 Gb'

> dist2 <- distance(rast)#ran all night, R was hung up in the morning and had to force shutdown.

>

> ___________________________________________

> Dave Gregovich

> Research Analyst

> Alaska Department of Fish and Game

> Division of Wildlife Conservation

> 802 3rd Street

> Douglas, AK 99824

> 907-465-4291

> ___________________________________________

>

>

> [[alternative HTML version deleted]]

>

> _______________________________________________

> R-sig-Geo mailing list

> [hidden email]

> https://stat.ethz.ch/mailman/listinfo/r-sig-geo

>

Ben Tupper

Bigelow Laboratory for Ocean Sciences

60 Bigelow Drive, P.O. Box 380

East Boothbay, Maine 04544

http://www.bigelow.org

Ecological Forecasting: https://eco.bigelow.org/

[[alternative HTML version deleted]]

_______________________________________________

R-sig-Geo mailing list

[hidden email]

https://stat.ethz.ch/mailman/listinfo/r-sig-geo
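An alternative not tried in the thread (a sketch; it assumes the RANN package and a raster small enough to hold all cell coordinates in memory - for an 8000 x 8000 raster the coordinate matrix alone is about 1 GB): a kd-tree nearest-neighbour search gives each cell its distance to the nearest non-NA cell without forming the full distance matrix.

```r
# Sketch: distance from every cell to the nearest non-NA cell via a kd-tree.
library(raster)
library(RANN)

rast <- raster(nrow = 800, ncol = 800, ext = extent(0, 1, 0, 1))
rast[sample(800^2, 1000)] <- 1

xy_all <- coordinates(rast)                            # centre of every cell
xy_src <- xy_all[!is.na(values(rast)), , drop = FALSE] # the non-NA cells
nn <- nn2(xy_src, xy_all, k = 1)                       # kd-tree search (RANN)
dist_rast <- setValues(rast, nn$nn.dists[, 1])         # planar distances
```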


### Re: Spatial autocorrelation help

Hi Orcun

I am not quite sure if I am doing this correctly, but I understand that I first need to check whether spatial autocorrelation occurs in my data. So I did the steps below, and after that checked it again in the best model's residuals:

# Another approach to find SAC by creating neighbors first, then get distances between each point and neighbors, then

# invert the distance and then check the SAC using Moran's I

coord <- cbind(data$long, data$lat)

coords <- coordinates(coord)

# creates a matrix of nn indexes - knearneigh to get nearest neighbors

nn5 <- knearneigh(coords, k=5)

mi5.nlist <- knn2nb(nn5, row.names = NULL, sym=FALSE)

# creates a sp weights matrix

mi5.sw <- nb2listw(mi5.nlist)

# calculate Moran's I using distance as weights

# calculates the distance

mi5.dist <- nbdists(mi5.nlist, coords)

# now invert the distance to determine weights (closer = higher)

mi5.dist1 <- lapply(mi5.dist, function(x){ifelse(is.finite(1/x), (1/x), (1/0.001))})

mi5.dist2 <- lapply(mi5.dist, function(x){ifelse(is.finite(1/x^2), (1/x^2), (1/0.001^2))})

# check the distance between the distribution

summary(unlist(mi5.dist1))

# now create sp weights matrix weighted on distance

mi5.d1sw <- nb2listw(mi5.nlist, glist=mi5.dist1)

mi5.d2sw <- nb2listw(mi5.nlist, glist=mi5.dist2)

# morans test

moran.test(as.numeric(data$response), mi5.d1sw)

moran.test(as.numeric(data$response), mi5.d2sw)

The first Moran's test gives:

Moran I statistic standard deviate = 2.0328, p-value = 0.02104

alternative hypothesis: greater

sample estimates:

Moran I statistic Expectation Variance

0.105850408 -0.004608295 0.002952729

The second Moran's test gives:

Moran I statistic standard deviate = 2.3848, p-value = 0.008545

alternative hypothesis: greater

sample estimates:

Moran I statistic Expectation Variance

0.154097396 -0.004608295 0.004428848

Both indicate the presence of spatial autocorrelation in the raw data.

Should I account for this in all models, or is it fine if I perform a logistic mixed model? Help is much appreciated - it is difficult to understand what the problem is and how to solve it.

> On 11 Jul 2018, at 7:01 PM, Orcun Morali <[hidden email]> wrote:

>

> Hi Dechen,

>

I am not quite sure if I am doing this correctly, but I understand that I first need to check whether spatial autocorrelation occurs in my data. So I carried out the steps below, and afterwards checked it again in the best model's residuals.

# Another approach to find SAC: create neighbours first, then get the distances between each point and its neighbours,

# then invert the distances and check the SAC using Moran's I

coord <- cbind(data$long, data$lat)

coords <- coordinates(coord)

# creates a matrix of nn indexes - knearneigh to get nearest neighbors

nn5 <- knearneigh(coords, k=5)

mi5.nlist <- knn2nb(nn5, row.names = NULL, sym=FALSE)

# creates a sp weights matrix

mi5.sw <- nb2listw(mi5.nlist)

# calculate Moran's I using distance-based weights

# calculates the distance

mi5.dist <- nbdists(mi5.nlist, coords)

# now invert the distance to determine weights (closer = higher)

mi5.dist1 <- lapply(mi5.dist, function(x){ifelse(is.finite(1/x), (1/x), (1/0.001))})

mi5.dist2 <- lapply(mi5.dist, function(x){ifelse(is.finite(1/x^2), (1/x^2), (1/0.001^2))})

# check the distribution of the inverted distances

summary(unlist(mi5.dist1))

# now create sp weights matrix weighted on distance

mi5.d1sw <- nb2listw(mi5.nlist, glist=mi5.dist1)

mi5.d2sw <- nb2listw(mi5.nlist, glist=mi5.dist2)

# morans test

moran.test(as.numeric(data$response), mi5.d1sw)

moran.test(as.numeric(data$response), mi5.d2sw)

The first Moran's test gives:

Moran I statistic standard deviate = 2.0328, p-value = 0.02104

alternative hypothesis: greater

sample estimates:

Moran I statistic Expectation Variance

0.105850408 -0.004608295 0.002952729

The second Moran's test gives:

Moran I statistic standard deviate = 2.3848, p-value = 0.008545

alternative hypothesis: greater

sample estimates:

Moran I statistic Expectation Variance

0.154097396 -0.004608295 0.004428848

Both indicate the presence of spatial autocorrelation in the raw data.

Should I account for this in all models, or is it fine if I fit a logistic mixed model instead? Help is much appreciated; it is difficult to understand what the problem is and how to solve it.
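[As an aside on the testing step above: a permutation-based check is a common, assumption-light alternative to the analytical test. This is only an illustrative sketch that reuses the `data$response` and `mi5.d1sw` objects defined in the code above; it is not from the original post.]

```r
library(spdep)

# Monte-Carlo (permutation) version of Moran's I: the observed statistic
# is compared against nsim random permutations of the values, which
# sidesteps the analytical randomisation/normality assumptions.
set.seed(1)
moran.mc(as.numeric(data$response), mi5.d1sw, nsim = 999)
```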

> On 11 Jul 2018, at 7:01 PM, Orcun Morali <[hidden email]> wrote:

>

> Hi Dechen,

>

> As for measuring spatial autocorrelation, one thing I noticed about your output is that you are using the randomization assumption in spdep::moran.test. Randomization assumption is not appropriate for Moran's I of regression residuals and spdep::lm.morantest is the function to correctly calculate moments of the measure for regression residuals anyway. Before using lm.morantest though, if I were you, I would check whether its inference applies to logistic regression residuals as well, since the theory was initially based on the classical regression.

>

> As for fitting a spatial logistic model if you need it, McSpatial package in R might help you.

>

> Best Regards,

>

> Orcun

>

> On 10/07/18 20:46, Dechen Lham wrote:

>> Hello all,

>>

>> I would like some help in my problem below:

>>

>> I am running a logistic regression and my best model residuals has spatial autocorrelation (SAC) when checked as below and also on the raw data of the response type. My response is binary 0 and 1 (type of prey and to be predicted by several predictors). These type of prey are obtained from a total of 200 locations (where the faecal samples are collected from). In order to account for this SAC , I used the auto_covdist function from spdep package. But when i use this as a new predictor in my model, and then check for spatial autocorrelation in the residues of the model, there is still spatial autocorrelation,…..could u see if i am doing something wrong please?

>>

>> #account for SAC in the model using weights

>> # auto_covariate is a distance weighted covariate

>> data$response <- as.numeric(data$response)

>> auto_weight <- autocov_dist(data$prey.type, xy=coords, nbs=1, type="inverse", zero.policy = TRUE,style="W", longlat = TRUE)

>>

>> m5_auto <- glm(response ~ predictor1 + predictor2 + predictor3 + predictor4 + predictor1:predictor4, weight=auto_weight, family=quasibinomial("logit"), data=data)

>>

>> # check spatial autocorrelation - first convert data to spatial points dataframe

>> dat <- SpatialPointsDataFrame(cbind(data$long, data$lat), data)

>> lstw <- nb2listw(knn2nb(knearneigh(dat, k = 2)))

>>

>> # check SAC in model residuals

>> moran.test(residuals.glm(m5_auto), lstw) # and gives the below:

>>

>> Moran I test under randomisation

>>

>> data: residuals.glm(m5)

>> weights: lstw

>>

>> Moran I statistic standard deviate = 1.9194, p-value = 0.02747

>> alternative hypothesis: greater

>> sample estimates:

>> Moran I statistic Expectation Variance

>> 0.160824328 -0.004608295 0.007428642

>>

>> -Someone said its stupid to account for spatial autocorrelation in a logistic regression when you have a significant SAC using moran’s I. So i am now wondering how this can be solved? or does a SAC in a logistic regression be just ignored?

>>

>> I am new to spatial statistics and now idea how to move with such. I only know that my data has spatial

>> autocorrelation (which i hope to have checked correctly using morans I as above) and now need to account for this in my analysis. Some advice would be greatly appreciated by people who have used to account for SAC in their logistic models. Is a logistic mixed models an option to consider?especially if your covariates are spatial in nature,…i read somewhere that if you cant account for SAC in glm then you can move to mixed models esp if your covariates are spatial which is expected to digest the SAC.

>>

>> Help and advice would be greatly appreciated.

>>

>> _______________________________________________

>> R-sig-Geo mailing list

>> [hidden email]

>> https://stat.ethz.ch/mailman/listinfo/r-sig-geo

>

> _______________________________________________

> R-sig-Geo mailing list

> [hidden email]

> https://stat.ethz.ch/mailman/listinfo/r-sig-geo

_______________________________________________

R-sig-Geo mailing list

[hidden email]

https://stat.ethz.ch/mailman/listinfo/r-sig-geo

### Re: Spatial autocorrelation help

Hi Dechen,

As for measuring spatial autocorrelation, one thing I noticed about your

output is that you are using the randomization assumption in

spdep::moran.test. The randomization assumption is not appropriate for

Moran's I of regression residuals; spdep::lm.morantest is the function

that correctly calculates the moments of the measure for regression

residuals. Before using lm.morantest, though, if I were you, I would

check whether its inference also applies to logistic regression

residuals, since the theory was originally developed for classical

linear regression.
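[A minimal sketch of the suggested check. Note that lm.morantest() expects a fitted lm object, not a glm, which is why Orcun hedges about logistic residuals; a linear fit is shown here purely to illustrate the call. The data frame and column names are placeholders taken from the thread's code, not real objects.]

```r
library(sp)
library(spdep)

# Build the same k-nearest-neighbour weights used elsewhere in the thread
dat <- SpatialPointsDataFrame(cbind(data$long, data$lat), data)
lstw <- nb2listw(knn2nb(knearneigh(dat, k = 2)))

# Fit a linear model (illustrative only); lm.morantest() uses the hat
# matrix of the lm fit to compute the correct null moments for
# Moran's I of the residuals.
fit_lm <- lm(response ~ predictor1 + predictor2, data = data)
lm.morantest(fit_lm, lstw, alternative = "greater")
```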

As for fitting a spatial logistic model, should you need one, the

McSpatial package in R might help you.

Best Regards,

Orcun

On 10/07/18 20:46, Dechen Lham wrote:

> Hello all,

>

> I would like some help in my problem below:

>

> I am running a logistic regression and my best model residuals has spatial autocorrelation (SAC) when checked as below and also on the raw data of the response type. My response is binary 0 and 1 (type of prey and to be predicted by several predictors). These type of prey are obtained from a total of 200 locations (where the faecal samples are collected from). In order to account for this SAC , I used the auto_covdist function from spdep package. But when i use this as a new predictor in my model, and then check for spatial autocorrelation in the residues of the model, there is still spatial autocorrelation,…..could u see if i am doing something wrong please?

>

> #account for SAC in the model using weights

> # auto_covariate is a distance weighted covariate

> data$response <- as.numeric(data$response)

> auto_weight <- autocov_dist(data$prey.type, xy=coords, nbs=1, type="inverse", zero.policy = TRUE,style="W", longlat = TRUE)

>

> m5_auto <- glm(response ~ predictor1 + predictor2 + predictor3 + predictor4 + predictor1:predictor4, weight=auto_weight, family=quasibinomial("logit"), data=data)

>

> # check spatial autocorrelation - first convert data to spatial points dataframe

> dat <- SpatialPointsDataFrame(cbind(data$long, data$lat), data)

> lstw <- nb2listw(knn2nb(knearneigh(dat, k = 2)))

>

> # check SAC in model residuals

> moran.test(residuals.glm(m5_auto), lstw) # and gives the below:

>

> Moran I test under randomisation

>

> data: residuals.glm(m5)

> weights: lstw

>

> Moran I statistic standard deviate = 1.9194, p-value = 0.02747

> alternative hypothesis: greater

> sample estimates:

> Moran I statistic Expectation Variance

> 0.160824328 -0.004608295 0.007428642

>

> -Someone said its stupid to account for spatial autocorrelation in a logistic regression when you have a significant SAC using moran’s I. So i am now wondering how this can be solved? or does a SAC in a logistic regression be just ignored?

>

> I am new to spatial statistics and now idea how to move with such. I only know that my data has spatial

> autocorrelation (which i hope to have checked correctly using morans I as above) and now need to account for this in my analysis. Some advice would be greatly appreciated by people who have used to account for SAC in their logistic models. Is a logistic mixed models an option to consider?especially if your covariates are spatial in nature,…i read somewhere that if you cant account for SAC in glm then you can move to mixed models esp if your covariates are spatial which is expected to digest the SAC.

>

> Help and advice would be greatly appreciated.

>

> _______________________________________________

> R-sig-Geo mailing list

> [hidden email]

> https://stat.ethz.ch/mailman/listinfo/r-sig-geo

_______________________________________________

R-sig-Geo mailing list

[hidden email]

https://stat.ethz.ch/mailman/listinfo/r-sig-geo

### Re: Spatial autocorrelation help

Hi Patrick

Thank you for your quick response; I went through your thesis and it contains very useful information. One thing I was wondering: rather than using a GAMM, could you also use quadratic terms for predictors that may have a non-linear relationship with the response variable?

Besides, I still need to figure out how to check the SAC in my data correctly, since there is both a global Moran's I and a local Moran's I, right? I also need to figure out how to plot them correctly to see the patterns. I made a correlogram from the raw data and from the residuals of the best model, but both looked very similar; even after attempting to account for SAC, Moran's I was still significant, so the SAC was not accounted for. It would be great if you could see whether I am doing something wrong while accounting for the SAC below, please.

regards

> On 10 Jul 2018, at 8:38 PM, Patrick Schratz <[hidden email]> wrote:

>

> Hi Dechen,

>

> it is very important to account for SAC in any model. This can be done in various ways. In log.reg it is common to include spatial autocorrelation structures that describe the underlying SAC. To do so, you can use mixed models, e.g. MASS::glmmPQL().

>

> Also have a look at Wood (2017) Generalized Additive Models in R.

>

> I did account for it in my master thesis.Even though the code is not attached, it may help you: https://zenodo.org/record/814262 <https://zenodo.org/record/814262>

> Cheers, Patrick

> On Jul 10 2018, at 7:46 pm, Dechen Lham <[hidden email]> wrote:

>

> Hello all,

>

> I would like some help in my problem below:

>

> I am running a logistic regression and my best model residuals has spatial autocorrelation (SAC) when checked as below and also on the raw data of the response type. My response is binary 0 and 1 (type of prey and to be predicted by several predictors). These type of prey are obtained from a total of 200 locations (where the faecal samples are collected from). In order to account for this SAC , I used the auto_covdist function from spdep package. But when i use this as a new predictor in my model, and then check for spatial autocorrelation in the residues of the model, there is still spatial autocorrelation,…..could u see if i am doing something wrong please?

>

> #account for SAC in the model using weights

> # auto_covariate is a distance weighted covariate

> data$response <- as.numeric(data$response)

> auto_weight <- autocov_dist(data$prey.type, xy=coords, nbs=1, type="inverse", zero.policy = TRUE,style="W", longlat = TRUE)

>

> m5_auto <- glm(response ~ predictor1 + predictor2 + predictor3 + predictor4 + predictor1:predictor4, weight=auto_weight, family=quasibinomial("logit"), data=data)

>

> # check spatial autocorrelation - first convert data to spatial points dataframe

> dat <- SpatialPointsDataFrame(cbind(data$long, data$lat), data)

> lstw <- nb2listw(knn2nb(knearneigh(dat, k = 2)))

>

> # check SAC in model residuals

> moran.test(residuals.glm(m5_auto), lstw) # and gives the below:

>

> Moran I test under randomisation

>

> data: residuals.glm(m5)

> weights: lstw

>

> Moran I statistic standard deviate = 1.9194, p-value = 0.02747

> alternative hypothesis: greater

> sample estimates:

> Moran I statistic Expectation Variance

> 0.160824328 -0.004608295 0.007428642

>

> -Someone said its stupid to account for spatial autocorrelation in a logistic regression when you have a significant SAC using moran’s I. So i am now wondering how this can be solved? or does a SAC in a logistic regression be just ignored?

>

> I am new to spatial statistics and now idea how to move with such. I only know that my data has spatial

> autocorrelation (which i hope to have checked correctly using morans I as above) and now need to account for this in my analysis. Some advice would be greatly appreciated by people who have used to account for SAC in their logistic models. Is a logistic mixed models an option to consider?especially if your covariates are spatial in nature,…i read somewhere that if you cant account for SAC in glm then you can move to mixed models esp if your covariates are spatial which is expected to digest the SAC.

>

> Help and advice would be greatly appreciated.

>

> _______________________________________________

> R-sig-Geo mailing list

> [hidden email]

> https://stat.ethz.ch/mailman/listinfo/r-sig-geo

[[alternative HTML version deleted]]

_______________________________________________

R-sig-Geo mailing list

[hidden email]

https://stat.ethz.ch/mailman/listinfo/r-sig-geo

### Re: Spatial autocorrelation help

Hi Dechen,

it is very important to account for SAC in any model. This can be done in various ways. In logistic regression it is common to include spatial correlation structures that describe the underlying SAC. To do so, you can use mixed models, e.g. MASS::glmmPQL().
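[A minimal, hypothetical sketch of the glmmPQL() approach Patrick mentions, assuming a data frame `data` with columns response, predictor1, long and lat; none of these names come from his post, and the choice of corExp() is just one possible correlation structure.]

```r
library(MASS)   # glmmPQL()
library(nlme)   # corExp() spatial correlation structure

# glmmPQL() requires a random-effects term, so a constant dummy
# grouping factor is added; the corExp() structure then models the
# residual SAC as a function of the coordinates.
data$dummy <- factor(1)
m_sac <- glmmPQL(response ~ predictor1,
                 random = ~ 1 | dummy,
                 correlation = corExp(form = ~ long + lat),
                 family = binomial("logit"),
                 data = data)
summary(m_sac)
```

For unprojected long/lat coordinates the distances implied by corExp() are only approximate, so projecting first (or using great-circle distances) would be more defensible.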

Also have a look at Wood (2017) Generalized Additive Models in R.

I did account for it in my master's thesis. Even though the code is not attached, it may help you: https://zenodo.org/record/814262

Cheers, Patrick

On Jul 10 2018, at 7:46 pm, Dechen Lham <[hidden email]> wrote:

>

> Hello all,

> I would like some help in my problem below:

> I am running a logistic regression and my best model residuals has spatial autocorrelation (SAC) when checked as below and also on the raw data of the response type. My response is binary 0 and 1 (type of prey and to be predicted by several predictors). These type of prey are obtained from a total of 200 locations (where the faecal samples are collected from). In order to account for this SAC , I used the auto_covdist function from spdep package. But when i use this as a new predictor in my model, and then check for spatial autocorrelation in the residues of the model, there is still spatial autocorrelation,…..could u see if i am doing something wrong please?

> #account for SAC in the model using weights

> # auto_covariate is a distance weighted covariate

> data$response <- as.numeric(data$response)

> auto_weight <- autocov_dist(data$prey.type, xy=coords, nbs=1, type="inverse", zero.policy = TRUE,style="W", longlat = TRUE)

>

> m5_auto <- glm(response ~ predictor1 + predictor2 + predictor3 + predictor4 + predictor1:predictor4, weight=auto_weight, family=quasibinomial("logit"), data=data)

> # check spatial autocorrelation - first convert data to spatial points dataframe

> dat <- SpatialPointsDataFrame(cbind(data$long, data$lat), data)

> lstw <- nb2listw(knn2nb(knearneigh(dat, k = 2)))

>

> # check SAC in model residuals

> moran.test(residuals.glm(m5_auto), lstw) # and gives the below:

>

> Moran I test under randomisation

> data: residuals.glm(m5)

> weights: lstw

>

> Moran I statistic standard deviate = 1.9194, p-value = 0.02747

> alternative hypothesis: greater

> sample estimates:

> Moran I statistic Expectation Variance

> 0.160824328 -0.004608295 0.007428642

>

> -Someone said its stupid to account for spatial autocorrelation in a logistic regression when you have a significant SAC using moran’s I. So i am now wondering how this can be solved? or does a SAC in a logistic regression be just ignored?

> I am new to spatial statistics and now idea how to move with such. I only know that my data has spatial

> autocorrelation (which i hope to have checked correctly using morans I as above) and now need to account for this in my analysis. Some advice would be greatly appreciated by people who have used to account for SAC in their logistic models. Is a logistic mixed models an option to consider?especially if your covariates are spatial in nature,…i read somewhere that if you cant account for SAC in glm then you can move to mixed models esp if your covariates are spatial which is expected to digest the SAC.

>

> Help and advice would be greatly appreciated.

> _______________________________________________

> R-sig-Geo mailing list

> [hidden email]

> https://stat.ethz.ch/mailman/listinfo/r-sig-geo

>

[[alternative HTML version deleted]]

_______________________________________________

R-sig-Geo mailing list

[hidden email]

https://stat.ethz.ch/mailman/listinfo/r-sig-geo

### Spatial autocorrelation help

Hello all,

I would like some help in my problem below:

I am running a logistic regression, and my best model's residuals show spatial autocorrelation (SAC) when checked as below, as does the raw response data. My response is binary, 0 and 1 (type of prey, to be predicted by several predictors). These prey types were obtained from a total of 200 locations (where the faecal samples were collected). To account for this SAC, I used the autocov_dist function from the spdep package, but when I use its output as a new predictor in my model and then check the model residuals for spatial autocorrelation, it is still there. Could you see whether I am doing something wrong, please?

# account for SAC in the model by adding an autocovariate term

# auto_weight is a distance-weighted autocovariate

data$response <- as.numeric(data$response)

auto_weight <- autocov_dist(data$prey.type, xy=coords, nbs=1, type="inverse", zero.policy = TRUE,style="W", longlat = TRUE)

m5_auto <- glm(response ~ predictor1 + predictor2 + predictor3 + predictor4 + predictor1:predictor4 + auto_weight, family = quasibinomial("logit"), data = data)

# check spatial autocorrelation - first convert data to spatial points dataframe

dat <- SpatialPointsDataFrame(cbind(data$long, data$lat), data)

lstw <- nb2listw(knn2nb(knearneigh(dat, k = 2)))

# check SAC in model residuals

moran.test(residuals.glm(m5_auto), lstw) # and gives the below:

Moran I test under randomisation

data: residuals.glm(m5_auto)

weights: lstw

Moran I statistic standard deviate = 1.9194, p-value = 0.02747

alternative hypothesis: greater

sample estimates:

Moran I statistic Expectation Variance

0.160824328 -0.004608295 0.007428642

Someone said it is unwise to run a plain logistic regression when Moran's I shows significant SAC. So I am now wondering how this can be solved, or can SAC in a logistic regression simply be ignored?

I am new to spatial statistics and have no idea how to proceed. I only know that my data has spatial

autocorrelation (which I hope to have checked correctly using Moran's I, as above) and now need to account for this in my analysis. Advice from people who have accounted for SAC in their logistic models would be greatly appreciated. Is a logistic mixed model an option to consider, especially if the covariates are spatial in nature? I read somewhere that if you cannot account for SAC in a GLM you can move to mixed models, especially with spatial covariates, which are expected to absorb the SAC.

Help and advice would be greatly appreciated.
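[Editor's note] For reference, a self-contained sketch of this workflow on simulated data, with the output of spdep::autocov_dist entered as an additional model term rather than through glm's weights argument. The coordinates, neighbourhood distance (nbs) and predictors are all illustrative, not the poster's data:

```r
library(spdep)

## Simulated stand-in for the real data: 200 sampling locations
set.seed(1)
n <- 200
coords <- cbind(long = runif(n, 30, 31), lat = runif(n, -2, -1))
dat <- data.frame(response = rbinom(n, 1, 0.5),
                  predictor1 = rnorm(n), predictor2 = rnorm(n))

## Distance-weighted autocovariate; nbs is in km because longlat = TRUE
ac <- autocov_dist(dat$response, xy = coords, nbs = 50, type = "inverse",
                   zero.policy = TRUE, style = "W", longlat = TRUE)

## The autocovariate enters as a predictor, not as case weights
m <- glm(response ~ predictor1 + predictor2 + ac,
         family = binomial("logit"), data = dat)

## Re-test the model residuals for remaining spatial autocorrelation
lstw <- nb2listw(knn2nb(knearneigh(coords, k = 2)))
moran.test(residuals(m), lstw)
```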

_______________________________________________

R-sig-Geo mailing list

[hidden email]

https://stat.ethz.ch/mailman/listinfo/r-sig-geo


### Re: converting from MSL EGM2008 to RH2000

On Tue, 10 Jul 2018, Harp, Nathen (DOT) via R-sig-Geo wrote:

> I don't believe R has geodetic packages, but that is what you need. See NGS for examples:

> https://www.ngs.noaa.gov/PC_PROD/pc_prod.shtml

While this is currently the case, we hope that PROJ >=5 will make this

possible - for now please also consider using PROJ command line tools with

geodetic pipelines, maybe ask on the proj list, and keep everyone

informed!

Roger
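[Editor's note] As a concrete starting point, a vertical-datum conversion of this kind might be expressed as a PROJ pipeline run through the cct command-line tool. This is only a sketch: the grid file names below are placeholders, and the step directions should be checked against the actual EGM2008 geoid grid and the Swedish geoid model for RH2000 (SWEN17_RH2000) before any real use:

```sh
# Hypothetical pipeline: EGM2008 orthometric height -> ellipsoidal height
# -> RH2000 height. Grid names are placeholders, not verified files.
echo "18.06 59.33 12.345" | cct +proj=pipeline \
  +step +inv +proj=vgridshift +grids=egm2008.gtx \
  +step +proj=vgridshift +grids=swen17_rh2000.gtx
```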

> ________________________________

> From: R-sig-Geo <[hidden email]> on behalf of Francis Freire <[hidden email]>

> Sent: Tuesday, July 10, 2018 7:20:59 AM

> To: '[hidden email]'

> Subject: [R-sig-Geo] converting from MSL EGM2008 to RH2000

>

> ATTENTION: This email came from an external source. Do not open attachments or click on links from unknown senders or unexpected emails.

>

>

> Hi,

>

> I am quite new to R and would like to ask a question. I have been looking all over the net for a way to convert the vertical reference system of the z values in our xyz text file from MSL EGM2008 to RH2000 using R. Has anyone done this before, or can someone point me in the right direction?

>

> Best,

>

> Francis

> _______________________________________________

> R-sig-Geo mailing list

> [hidden email]

> https://stat.ethz.ch/mailman/listinfo/r-sig-geo

>

> [[alternative HTML version deleted]]

>

> _______________________________________________

> R-sig-Geo mailing list

> [hidden email]

> https://stat.ethz.ch/mailman/listinfo/r-sig-geo

>

--

Roger Bivand

Department of Economics, Norwegian School of Economics,

Helleveien 30, N-5045 Bergen, Norway.

voice: +47 55 95 93 55; e-mail: [hidden email]

http://orcid.org/0000-0003-2392-6140

https://scholar.google.no/citations?user=AWeghB0AAAAJ&hl=en

_______________________________________________

R-sig-Geo mailing list

[hidden email]

https://stat.ethz.ch/mailman/listinfo/r-sig-geo


### Re: converting from MSL EGM2008 to RH2000

I don't believe R has geodetic packages, but that is what you need. See NGS for examples:

https://www.ngs.noaa.gov/PC_PROD/pc_prod.shtml

________________________________

From: R-sig-Geo <[hidden email]> on behalf of Francis Freire <[hidden email]>

Sent: Tuesday, July 10, 2018 7:20:59 AM

To: '[hidden email]'

Subject: [R-sig-Geo] converting from MSL EGM2008 to RH2000

ATTENTION: This email came from an external source. Do not open attachments or click on links from unknown senders or unexpected emails.

Hi,

I am quite new to R and would like to ask a question. I have been looking all over the net for a way to convert the vertical reference system of the z values in our xyz text file from MSL EGM2008 to RH2000 using R. Has anyone done this before, or can someone point me in the right direction?

Best,

Francis

_______________________________________________

R-sig-Geo mailing list

[hidden email]

https://stat.ethz.ch/mailman/listinfo/r-sig-geo

[[alternative HTML version deleted]]

_______________________________________________

R-sig-Geo mailing list

[hidden email]

https://stat.ethz.ch/mailman/listinfo/r-sig-geo


### converting from MSL EGM2008 to RH2000

Hi,

I am quite new to R and would like to ask a question. I have been looking all over the net for a way to convert the vertical reference system of the z values in our xyz text file from MSL EGM2008 to RH2000 using R. Has anyone done this before, or can someone point me in the right direction?

Best,

Francis

_______________________________________________

R-sig-Geo mailing list

[hidden email]

https://stat.ethz.ch/mailman/listinfo/r-sig-geo

### Memory issues using raster::subs

Hi all,

I am having issues with the subs function in the raster package. In the past, I have successfully used the function to reclassify a raster, but now when I try to use it, I receive the error "Error: cannot allocate vector of size 2.0 Gb". The code is the same as what I had used before with a larger raster and data.frame.

For example, this code works:

segments = raster("D:/path/To/InputRaster.tif") ### objects raster

obj_predicted = data.frame(zone,predicted)

filename="D:/path/To/Raster.tif"

subs(segments,obj_predicted,by=1,which=2,filename=filename,progress="text")

The segments raster is 71026 by 78701. obj_predicted is a 1,693,839 x 2 data frame, with each value in its first column corresponding to a pixel value in the segments raster.

However, when I replace the segments raster with another raster that is 14157 by 11923, with an obj_predicted data frame of 6588 x 2, I receive the error message. The CRS of both rasters is the same, both data frames are structured the same way, etc.

Sorry that I can't really provide the data to attempt reproduction. Any help would be appreciated. I am going to attempt to process by block, but this seems

Wade
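[Editor's note] Following up on the block-by-block idea, here is a sketch of how the substitution could be done manually with raster's block I/O, which keeps only one block of cell values in memory at a time. File paths and object names follow the post and are illustrative:

```r
library(raster)

## Named lookup vector: names are segment IDs, values are predictions
lookup <- setNames(obj_predicted[[2]], as.character(obj_predicted[[1]]))

out <- raster(segments)          # empty raster with the same geometry
bs  <- blockSize(segments)       # suggested row blocks for segments
out <- writeStart(out, filename = "D:/path/To/Raster_blockwise.tif")
for (i in seq_len(bs$n)) {
  v   <- getValues(segments, row = bs$row[i], nrows = bs$nrows[i])
  out <- writeValues(out, unname(lookup[as.character(v)]), bs$row[i])
}
out <- writeStop(out)
```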

_______________________________________________

R-sig-Geo mailing list

[hidden email]

https://stat.ethz.ch/mailman/listinfo/r-sig-geo
