This is an **archive** for **R-sig-geo**, not a forum. It is not possible to post through Nabble: you may not start a new thread nor follow up an existing thread. If you wish to post but are not subscribed to the list, subscribe first **through the list homepage, not via Nabble**, and post from your subscribed email address. Until 2015-06-20 subscribers could post through Nabble, but the policy was changed because too many non-subscribers misunderstood the interface.

### Re: testing nlme models for spatial correlation

Dear Javier,

Your problem is hard to understand without a reproducible example. You give only the code, not the data or the error message.

Best regards,

ir. Thierry Onkelinx

Instituut voor natuur- en bosonderzoek / Research Institute for Nature and

Forest

team Biometrie & Kwaliteitszorg / team Biometrics & Quality Assurance

Kliniekstraat 25

1070 Anderlecht

Belgium

To call in the statistician after the experiment is done may be no more

than asking him to perform a post-mortem examination: he may be able to say

what the experiment died of. ~ Sir Ronald Aylmer Fisher

The plural of anecdote is not data. ~ Roger Brinner

The combination of some data and an aching desire for an answer does not

ensure that a reasonable answer can be extracted from a given body of data.

~ John Tukey

2017-08-01 16:36 GMT+02:00 Javier Moreira <[hidden email]>:

> Thanks Thierry,
> I will check that info.
> Any ideas why, if I choose the finest level of random effects as
> /SUBMUESTREO and run model 4 (with correlation), it won't let me?
> If I understand you well, you address more the second question I asked,
> am I right?
> Thanks!



_______________________________________________

R-sig-Geo mailing list

[hidden email]

https://stat.ethz.ch/mailman/listinfo/r-sig-geo


### Re: testing nlme models for spatial correlation

Thanks Thierry,

I will check that info.

Any ideas why, if I choose the finest level of random effects as /SUBMUESTREO and run model 4 (with correlation), it won't let me?

If I understand you well, you address more the second question I asked, am I right?

Thanks!




### Re: testing nlme models for spatial correlation

Dear Javier,

The correlation structure in nlme only works on the residuals within the finest level of random effects; observations in different random-effect groups are independent.

Have a look at the INLA package (http://www.r-inla.org). INLA allows for correlated random effects, so you get spatial correlation among the random effects instead of among the residuals. INLA has options for correlation along a 2D regular grid, a neighbourhood graph, or a mesh. If you want a book on this, I recommend Zuur et al. (2017), Beginner's Guide to Spatial, Temporal and Spatial-Temporal Ecological Data Analysis with R-INLA.
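This restriction can be illustrated with a minimal sketch (the data and names below are simulated for illustration, not from this thread): `corExp()` is estimated from residual distances within each group at the finest random-effect level, and groups are treated as mutually independent.

```r
# Sketch on simulated data: corExp() in lme() models correlation among
# residuals *within* each random-effect group; observations in different
# groups are treated as independent.
library(nlme)

set.seed(1)
d <- data.frame(
  group = factor(rep(1:4, each = 25)),  # finest (and only) grouping level
  x = runif(100),
  y = runif(100)
)
d$resp <- as.numeric(d$group) + rnorm(100)

fit <- lme(resp ~ 1,
           random = ~ 1 | group,
           correlation = corExp(form = ~ x + y),
           data = d)

# Estimated range of the exponential spatial correlation, natural scale:
coef(fit$modelStruct$corStruct, unconstrained = FALSE)
```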

Best regards,

ir. Thierry Onkelinx

Instituut voor natuur- en bosonderzoek / Research Institute for Nature and

Forest

team Biometrie & Kwaliteitszorg / team Biometrics & Quality Assurance

Kliniekstraat 25

1070 Anderlecht

Belgium

To call in the statistician after the experiment is done may be no more

than asking him to perform a post-mortem examination: he may be able to say

what the experiment died of. ~ Sir Ronald Aylmer Fisher

The plural of anecdote is not data. ~ Roger Brinner

The combination of some data and an aching desire for an answer does not

ensure that a reasonable answer can be extracted from a given body of data.

~ John Tukey



### testing nlme models for spatial correlation

Hi,

I'm trying to generate different models to account for spatial correlation. I'm using the nlme package and mixed models, in order to compare two models: one that does not include the spatial correlation and one that does.

It's a nested design with 4 levels:

BLOQUE / AMBIENTE / TRATAMIENTO / SUBMUESTREO

It's a harvest data set with multiple data points per treatment, so the last level accounts for an additional error term for the sub-sampling ("sub-muestreo").

My first problem is that when I try to add the correlation term to the model, I can't if the random effects go down to the /SUBMUESTREO level; I have to leave them at the TRATAMIENTO level. When I do that, there are two differences between the models: the term accounting for sub-muestreo, and the spatial correlation.

```r
## MODELO 2 ##
library(nlme)

str(base_modelo3)
data1 <- base_modelo3  # attach() is not needed: data = data1 is passed below
data1 <- groupedData(REND. ~ 1 | BLOQUE/AMBIENTE/TRATAMIENTO/SUBMUESTREO,
                     data = data1, units = list(y = "(ton/ha)"))
data1$TRATAMIENTO <- factor(data1$TRATAMIENTO)
data1$BLOQUE <- factor(data1$BLOQUE)
data1$AMBIENTE <- factor(data1$AMBIENTE)

modelo2_MM <- lme(REND. ~ 1 + TRATAMIENTO * AMBIENTE,
                  random = ~ 1 | BLOQUE/AMBIENTE/TRATAMIENTO/SUBMUESTREO,
                  weights = varComb(varIdent(form = ~ 1 | TRATAMIENTO)),
                  data = data1,
                  control = lmeControl(niterEM = 150, msMaxIter = 200))
summary(modelo2_MM)
anova(modelo2_MM)
```

##MODELO 4##

modelo4_MM<-lme(REND.~1+TRATAMIENTO*AMBIENTE,

random=~1|BLOQUE/AMBIENTE/TRATAMIENTO,

weights=varComb(varIdent(form=~1|TRATAMIENTO)),

correlation=corExp(form=~X+Y,nugget=T),

data=data1,

control=lmeControl(niterEM=150,msMaxIter=200))

summary(modelo4_MM)

anova(modelo4_MM)

My second problem is that I need to get the specific parameter for the error term that belongs to the spatial correlation, in order to map it. From what I see, lme folds it into the general error, so what I could do is take the difference between the residuals of these two models. So it is essential to fit the exact same model, except for the correlation structure.

If anybody knows how to extract the specific error term accounting for the correlation, it would be wonderful.

library(sp)  # for coordinates() and bubble()

E24 = residuals(modelo2_MM, type = "response")  # model without spatial correlation
E40 = residuals(modelo4_MM, type = "response")  # model with corExp correlation
EE = E24 - E40                                  # spatial structure removed by modelo 4

post = data.frame(E24, E40, EE, data1$X, data1$Y)
coordinates(post) <- c("data1.X", "data1.Y")
coor_post <- coordinates(post)

bubble(post, "E24", main = "residuos modelo 2")
bubble(post, "E40", main = "residuos modelo 4")
bubble(post, "EE", main = "Est. espacial removida por Modelo 4")
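If the goal is the fitted spatial parameters themselves rather than residual differences, nlme exposes them through the correlation structure of the fit; a minimal sketch, assuming modelo4_MM has been fitted as above:

```r
# Range and nugget of the fitted corExp structure, on the natural scale
coef(modelo4_MM$modelStruct$corStruct, unconstrained = FALSE)

# Semivariogram of the normalized residuals, to see what correlation remains
plot(Variogram(modelo4_MM, form = ~ X + Y, resType = "normalized"))
```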

thanks!

--

Javier Moreira de Souza

Ingeniero Agrónomo

099 406 006



### Re: Test for spatial stationarity

On Tue, 1 Aug 2017, sara osama wrote:

> Dear all

> Hope this email finds you well. I want to ask if there is a formal test

> for spatial stationarity like the one for temporal stationarity. I have

> searched a lot, but I could not find a formal test by which we can

> conclude about stationarity in spatial data.

Sara:

Could you please clarify what you mean by "spatial data"? Do you mean

point observations of a continuous spatial process - say to make

predictions for unobserved locations, as in kriging? In this case, there

is a formal literature discussing stationarity, although I'm not aware of

formal tests (practically one might compare predictions made using models

that assume stationarity and those that do not, to see whether the

assumption changes the output; if it does, probably stationarity is not

present as assumed).
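That comparison can be run in practice; a sketch using gstat's built-in meuse data rather than Sara's own (ordinary kriging assumes a stationary mean, while adding a trend relaxes that assumption):

```r
library(sp)
library(gstat)

data(meuse)        # point observations
coordinates(meuse) <- ~ x + y
data(meuse.grid)   # prediction locations
coordinates(meuse.grid) <- ~ x + y
gridded(meuse.grid) <- TRUE

# Ordinary kriging: constant (stationary) mean
v_ok <- fit.variogram(variogram(log(zinc) ~ 1, meuse), vgm("Sph"))
ok <- krige(log(zinc) ~ 1, meuse, meuse.grid, model = v_ok)

# Universal kriging: mean varies with distance to the river
v_uk <- fit.variogram(variogram(log(zinc) ~ sqrt(dist), meuse), vgm("Sph"))
uk <- krige(log(zinc) ~ sqrt(dist), meuse, meuse.grid, model = v_uk)

# Large differences suggest the stationarity assumption matters here
summary(ok$var1.pred - uk$var1.pred)
```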

In the point process literature (point patterns), there are also results,

but they are often handled by relaxing assumptions about the homogeneity

of the underlying surface (introducing covariates influencing the

process).

For lattice (aggregated, polygon) data, there are tests based on geographically weighted regression, but these lack power if the bandwidth is inappropriate and/or the underlying model is mis-specified. They are not univariate tests, and may best be seen as exploratory. There may be other literatures; look, for example, for work by Werner Mueller and co-authors.

Hope this helps,

Roger

> I would be grateful if anyone can help me with this.

> Best

> Sara

>

--

Roger Bivand

Department of Economics, Norwegian School of Economics,

Helleveien 30, N-5045 Bergen, Norway.

voice: +47 55 95 93 55; e-mail: [hidden email]

Editor-in-Chief of The R Journal, https://journal.r-project.org/index.html

http://orcid.org/0000-0003-2392-6140

https://scholar.google.no/citations?user=AWeghB0AAAAJ&hl=en



### Test for spatial stationarity

Dear all

Hope this email finds you well. I want to ask if there is a formal test for

spatial stationarity like the one for temporal stationarity. I have

searched a lot, but I could not find a formal test by which we can conclude

about stationarity in spatial data.

I would be grateful if anyone can help me with this.

Best

Sara



### Correlation between Raster Files

Hi Everyone,

I am a newbie in spatial statistics (but I know some R) and I apologize if my question has been posted before.

I live in Kenya and I have a set of raster files (matrices) containing some spatial data (for instance, yearly precipitation per square degree), and this data changes every year.

So I would like to find the correlation between these (raster file) matrices. These matrices are rather small (200x400) and each element.

Thank you.

Any suggestion is welcome!

David
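One common approach to the question above (a sketch; the file names are hypothetical placeholders) is to stack the yearly rasters and compute pairwise correlations between layers:

```r
library(raster)

# Read the yearly rasters into one stack (file names are placeholders)
s <- stack(c("precip_2015.tif", "precip_2016.tif", "precip_2017.tif"))

# Pairwise Pearson correlation between layers, ignoring NA cells
layerStats(s, "pearson", na.rm = TRUE)$`pearson correlation coefficient`
```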



### Re: bivariate spatial correlation in R

Rafael,

I'm sorry, but there is no way you can logically "analyze who benefits from the recent changes in the transport system in terms of access to jobs" from the data you have.

Even if you had aggregate household income data for 2014 and 2017 (not for 2010 only), you would not know whether wealthier families had displaced poorer families as accessibility improved. You need individual data, either survey or register, preferably panel, to show that changes in accessibility change the economic welfare of households, controlling for movement of households. The timestamps on the data make any attempt to do this very risky; the real findings from a hypothetical survey-based panel might be completely different, especially if poorer households were displaced (also indirectly, through rising house prices driven by improved accessibility). Gauging the welfare effects of transport investments is very hard to instrument.

The closest I could get was to map deciles of the change in access (more

negatives than positives) and compare the aspatial income distributions:

library(spdep)
library(rgdal)
library(classInt)
library(RColorBrewer)
library(tmap)  # for qtm()

map <- readOGR(dsn = ".", layer = "test_map")

cI <- classIntervals(map$diffaccess, n = 10, style = "quantile")
ybrpal <- brewer.pal(6, "YlOrBr")
fC <- findColours(cI, ybrpal)
qtm(map, fill = "diffaccess", fill.breaks = cI$brks, format = "Europe2")
map$faccess <- factor(findInterval(map$diffaccess, cI$brks,
                                   all.inside = TRUE),
                      labels = names(attr(fC, "table")))
qtm(map, fill = "diffaccess", fill.breaks = cI$brks, format = "Europe2")

acc_income <- split(map$income, map$faccess)
do.call("rbind", lapply(acc_income, summary))
dens <- lapply(acc_income, density)
plot(1, ylab = "", xlab = "", type = "n", xlim = c(-2000, 11000),
     ylim = c(0, 0.002))
for (i in seq(along = dens)) lines(dens[[i]], col = i)
legend("topright", legend = names(dens), col = 1:length(dens), lty = 1,
       bty = "n")

These density curves really do not suggest any clear relationship, other

than that some areas with increased accessibility had higher incomes in

2010.

You can examine the reverse relationship - were aggregate areas that were wealthier in 2010 able to attract more changes to accessibility? The answer seems to be yes, they were:

nb <- poly2nb(map)
lw <- nb2listw(nb, style = "W", zero.policy = TRUE)
lm.morantest(lm(diffaccess ~ I(income/1000), map), lw)

# SLX model
summary(lmSLX(diffaccess ~ I(income/1000), map, lw))
lm.morantest(lmSLX(diffaccess ~ I(income/1000), map, lw), lw)

# Spatial Durbin error model - SDEM
obj <- errorsarlm(diffaccess ~ I(income/1000), map, lw, etype = "emixed")
summary(impacts(obj))
summary(impacts(lmSLX(diffaccess ~ I(income/1000), map, lw)))
LR.sarlm(lmSLX(diffaccess ~ I(income/1000), map, lw), obj)

It would be possible to run lm.morantest.sad() on the output of the SDEM

model taking global spatial autocorrelation into account. If you need

that, follow up in this thread.
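For reference, the saddlepoint approximation test itself is available in spdep; a sketch assuming the map and lw objects from the code above (shown on a plain lm fit, since lm.morantest.sad() expects an lm-type model):

```r
library(spdep)

# Saddlepoint approximation of Moran's I for regression residuals
fit <- lm(diffaccess ~ I(income/1000), map)
lm.morantest.sad(fit, lw)
```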

Bivariate Moran's I should not be used in this case, but could be used in

other cases - use in change over time is troubling because randomisation

will not be a good guide as t=1 and t=2 are subject to temporal as well as

spatial autocorrelation, so you cannot use permutation bootstrap to find

a usable measure of significance.

Hope this clarifies, and thanks for the code.

Roger

On Sun, 30 Jul 2017, Rafael Pereira wrote:

> Roger,

>

> Population and income data are single point in time and come from the 2010

> Census.

>

> Accessibility variables in 2014 and 2017 show the proportion of jobs

> accessible by public transport under 60 minutes. The variable diffaccess

> shows the difference between these two. It's in percentage points

> (access2017 - access2014)

>

> best,

>

> Rafael H M Pereira

> urbandemographics.blogspot.com

>

> On Sun, Jul 30, 2017 at 7:41 AM, Roger Bivand <[hidden email]> wrote:

>

>> Thanks, I'll get back when able, offline now. What are the units of

>> observation, and are aggregate household incomes observed only once?

>>

>> Roger

>>

>> Roger Bivand

>> Norwegian School of Economics

>> Bergen, Norway

>>

>>

>>

>> Fra: Rafael Pereira

>> Sendt: søndag 30. juli, 00.39

>> Emne: Re: [R-sig-Geo] bivariate spatial correlation in R

>> Kopi: Rogério Barbosa, [hidden email]

>>

>>

>> Hi all, here is a reproducible example to calculate in R bivariate
>> Moran's I and LISA clusters. This example is based on an answer provided
>> on SO* and it uses a toy model of my data. The R script and the shape
>> file with the data are available at this link:
>> https://gist.github.com/rafapereirabr/5348193abf779625f5e8c5090776a228
>>
>> What this example does is estimate the spatial association between
>> household income per capita and the gains in accessibility to jobs. The
>> aim is to analyze who benefits from the recent changes in the transport
>> system in terms of access to jobs. So the idea is not to find causal
>> relationships, but spatial association between areas of high/low income
>> which had high/low gains in accessibility. The variables in the data
>> show the proportion of jobs accessible in both years 2014 and 2017
>> (access2014, access2017) and the difference between the two years in
>> percentage points (diffaccess).
>>
>> Roger, I know you have shown to be a bit sceptical about this
>> application of bivariate Moran's I. Do you still think a spatial
>> regression would be more appropriate? Also, I would be glad to hear if
>> others have comments on the code. This function is not implemented in
>> any package so it would be great to have some feedback.
>>
>> Rafael H M Pereira
>> urbandemographics.blogspot.com
>>
>> * https://stackoverflow.com/questions/45177590/map-of-bivariate-spatial-correlation-in-r-bivariate-lisa
>>
>> On Wed, Jul 26, 2017 at 11:07 AM, Roger Bivand wrote:
>> > On Wed, 26 Jul 2017, Rafael Pereira wrote:

>>>> Roger, >> >> This example was provided only for the sake or making the

>> code easily >> reproducible for others and I'm more interested in how the

>> bi-variate >> Moran >> could be implemented in R, but your comments are

>> very much welcomed and >> I've made changes to the question. >> >> My

>> actual case study looks at bi-variate spatial correlation between (a) >>

>> average household income per capita and (b) proportion of jobs in the city

>>>> that are accessible under 60 minutes by transit. I don't think I could

>> use >> rates in this case but I will normalize the variables using >>

>> scale(data$variable). >> > > Please provide a reproducible example, either

>> with a link to a data > subset, or using a builtin data set. My guess is

>> that you do not need > bi-variate spatial correlation at all, but rather a

>> spatial regression. > > The "causal" variable would then the the proportion

>> of jobs accessible > within 60 minutes by transit, though this is extremely

>> blunt, and lots of > other covariates (demography, etc.) impact average

>> household income per > capita (per block/tract?). Since there are many

>> missing variables in your > specification, any spatial correlation would be

>> most closely associated > with them (demography, housing costs, education,

>> etc.), and the choice of > units of measurement would dominate the outcome.

>>>> This is also why bi-variate spatial correlation is seldom a good idea,

>> I > believe. It can be done, but most likely shouldn't, unless it can be >

>> motivated properly. > > By the way, the weighted and FDR-corrected SAD

>> local Moran's I p-values of > the black/white ratio for Oregon (your toy

>> example) did deliver the goods - > if you zoom in in mapview::mapview, you

>> can see that it detects a rate > hotspot between the rivers. > > Roger > >

>>>>> best, >> >> Rafael H M Pereira >> >> On Mon, Jul 24, 2017 at 7:56 PM,

>> Roger Bivand >> wrote: >> >> On Mon, 24 Jul 2017, Rafael Pereira wrote: >>>

>>>>> Hi all, >>> >>>> >>>> I would like to ask whether some you conducted

>> bi-variate spatial >>>> correlation in R. >>>> >>>> I know the bi-variate

>> Moran's I is not implemented in the spdep library. >>>> I left a question

>> on SO but also wanted to hear if anyone if the >>>> mainlist >>>> have come

>> across this. >>>> https://stackoverflow.com/questions/45177590/map-of-

>> bivariat >>>> e-spatial-correlation-in-r-bivariate-lisa >>>> >>>> I also

>> know Roger Bivand has implemented the L index proposed by Lee >>>> (2001)

>>>>>> in spdep, but I'm not I'm not sure whether the L local correlation

>>>>>> coefficients can be interpreted the same way as the local Moran's I

>>>>>> coefficients. I couldn't find any reference commenting on this issue.

>> I >>>> would very much appreciate your thoughts this. >>>> >>>> >>> In the

>> SO question, and in the follow-up, your presumably throw-away >>> example

>> makes fundamental mistakes. The code in spdep by Virgilio >>> Gómez-Rubio

>> is for uni- and bivariate L, and produces point values of >>> local >>> L.

>> This isn't the main problem, which is rather that you are not taking >>>

>> account of the underlying population counts, nor shrinking any estimates

>>>>> of >>> significance to accommodate population sizes. Population sizes

>> vary from >>> 0 >>> to 11858, with the lower quartile at 3164 and upper

>> 5698: >>> plot(ecdf(oregon.tract$pop2000)). Should you be comparing rates

>> in >>> stead? >>> These are also compositional variables (sum to pop2000,

>> or 1 if rates) >>> with >>> the other missing components. You would

>> probably be better served by >>> tools >>> examining spatial segregation,

>> such as for example the seg package. >>> >>> The 0 count populations cause

>> problems for an unofficial alternative, the >>> black/white ratio: >>> >>>

>> oregon.tract1 0,] >>> oregon.tract1$rat >> nb >> lw >> >>> which should

>> still be adjusted by weighting: >>> >>> lm0 >> >>> I'm not advising this,

>> but running localmoran.sad on this model output >>> yields SAD p-values <

>> 0.05 after FDR correction only in contiguous tracts >>> on the Washington

>> state line in Portland between the Columbia and >>> Williamette rivers. So

>> do look at the variables you are using before >>> rushing into things. >>>

>>>>> Hope this clarifies, >>> >>> Roger >>> >>> >>> best, >>>> >>>> Rafael

>> HM Pereira >>>> http://urbandemographics.blogspot.com >>>> >>>>

>> [[alternative HTML version deleted]] >>>> >>>>

>> _______________________________________________ >>>> R-sig-Geo mailing

>> list >>>> [hidden email] >>>> https://stat.ethz.ch/mailman/

>> listinfo/r-sig-geo >>>> >>>> >>>> -- >>> Roger Bivand >>> Department of

>> Economics, Norwegian School of Economics, >>> Helleveien 30, N-5045 Bergen,

>> Norway. >>> voice: +47 55 95 93 55 <+47%2055%2095%2093%2055>; e-mail:

>> [hidden email] >>> Editor-in-Chief of The R Journal,

>> https://journal.r-project.org/ >>> index.html >>>

>> http://orcid.org/0000-0003-2392-6140 >>> https://scholar.google.no/

>> citations?user=AWeghB0AAAAJ&hl=en >>> >> >> [[alternative HTML version

>> deleted]] >> >> _______________________________________________ >>

>> R-sig-Geo mailing list >> [hidden email] >>

>> https://stat.ethz.ch/mailman/listinfo/r-sig-geo >> > > -- > Roger Bivand

>>> Department of Economics, Norwegian School of Economics, > Helleveien 30,

>> N-5045 Bergen, Norway. > voice: +47 55 95 93 55 <+47%2055%2095%2093%2055>;

>> e-mail: [hidden email] > Editor-in-Chief of The R Journal,

>> https://journal.r-project.org/index.html > http://orcid.org/0000-0003-

>> 2392-6140 > https://scholar.google.no/citations?user=AWeghB0AAAAJ&hl=en >

>> [[alternative HTML version deleted]] _______________________________________________

>> R-sig-Geo mailing list [hidden email]

>> https://stat.ethz.ch/mailman/listinfo/r-sig-geo

>>

>>

--

Roger Bivand

Department of Economics, Norwegian School of Economics,

Helleveien 30, N-5045 Bergen, Norway.

voice: +47 55 95 93 55; e-mail: [hidden email]

Editor-in-Chief of The R Journal, https://journal.r-project.org/index.html

http://orcid.org/0000-0003-2392-6140

https://scholar.google.no/citations?user=AWeghB0AAAAJ&hl=en



>>>>> Hope this clarifies, >>> >>> Roger >>> >>> >>> best, >>>> >>>> Rafael

>> HM Pereira >>>> http://urbandemographics.blogspot.com >>>> >>>>

>> [[alternative HTML version deleted]] >>>> >>>>

>> _______________________________________________ >>>> R-sig-Geo mailing

>> list >>>> [hidden email] >>>> https://stat.ethz.ch/mailman/

>> listinfo/r-sig-geo >>>> >>>> >>>> -- >>> Roger Bivand >>> Department of

>> Economics, Norwegian School of Economics, >>> Helleveien 30, N-5045 Bergen,

>> Norway. >>> voice: +47 55 95 93 55 <+47%2055%2095%2093%2055>; e-mail:

>> [hidden email] >>> Editor-in-Chief of The R Journal,

>> https://journal.r-project.org/ >>> index.html >>>

>> http://orcid.org/0000-0003-2392-6140 >>> https://scholar.google.no/

>> citations?user=AWeghB0AAAAJ&hl=en >>> >> >> [[alternative HTML version

>> deleted]] >> >> _______________________________________________ >>

>> R-sig-Geo mailing list >> [hidden email] >>

>> https://stat.ethz.ch/mailman/listinfo/r-sig-geo >> > > -- > Roger Bivand

>>> Department of Economics, Norwegian School of Economics, > Helleveien 30,

>> N-5045 Bergen, Norway. > voice: +47 55 95 93 55 <+47%2055%2095%2093%2055>;

>> e-mail: [hidden email] > Editor-in-Chief of The R Journal,

>> https://journal.r-project.org/index.html > http://orcid.org/0000-0003-

>> 2392-6140 > https://scholar.google.no/citations?user=AWeghB0AAAAJ&hl=en >

>> [[alternative HTML version deleted]] _______________________________________________

>> R-sig-Geo mailing list [hidden email]

>> https://stat.ethz.ch/mailman/listinfo/r-sig-geo

>>

>>

> --

Roger Bivand

Department of Economics, Norwegian School of Economics,

Helleveien 30, N-5045 Bergen, Norway.

voice: +47 55 95 93 55; e-mail: [hidden email]

Editor-in-Chief of The R Journal, https://journal.r-project.org/index.html

http://orcid.org/0000-0003-2392-6140

https://scholar.google.no/citations?user=AWeghB0AAAAJ&hl=en

_______________________________________________

R-sig-Geo mailing list

[hidden email]

https://stat.ethz.ch/mailman/listinfo/r-sig-geo

### Re: Differences between moran.test and lm.morantest

Thanks a lot for such a detailed response, Roger.

Best

Javi

-----Mensaje original-----

De: Roger Bivand [mailto:[hidden email]]

Enviado el: domingo, 30 de julio de 2017 18:42

Para: Javier García

CC: [hidden email]

Asunto: Re: [R-sig-Geo] Differences between moran.test and lm.morantest

On Sat, 29 Jul 2017, Javier García wrote:

> Hello everybody:

>

>

>

> Currently I am working on a paper in which we need to analyze the

> presence of possible spatial correlation in the data. With this aim I

> am running some tests in R. I am a little bit confused about the

> differences between moran.test and lm.morantest R functions. The

> problem I face is that when I run moran.test on my regression

> residuals the result is totally different from the one I obtain when I

> use lm.morantest with the lm regression object (please, see below the

> different outputs I get and after it a reproducible example). In

> particular, whereas the observed Moran I is the same, the expectation and
> variance differ dramatically, leading to opposite conclusions.

> I would appreciate very much if someone could clarify for me which is

> the cause behind this. By the way, I also ran LM tests (LMerr, RLMerr,

> LMlag and

> RLMlag) not rejecting the null hypothesis in any of them (all p-values

> are higher than 0.7), which is in clear contradiction with the

> lm.morantest result. How is this possible?

>

moran.test() is for "primary" variables only - read the reference in

?moran.test. The mean model applied to this variable is the intercept,

that is the mean only:

> moran.test(COL.OLD$CRIME, nb2listw(COL.nb, style="W"),

+ randomisation=FALSE, alternative="two.sided")

Moran I test under normality

data: COL.OLD$CRIME

weights: nb2listw(COL.nb, style = "W")

Moran I statistic standard deviate = 5.6754, p-value = 1.384e-08

alternative hypothesis: two.sided

sample estimates:

Moran I statistic Expectation Variance

0.510951264 -0.020833333 0.008779831

under the Normal assumption.

lm.morantest() can reproduce the same results for the same model:

> lm.morantest(lm(CRIME ~ 1, data=COL.OLD), nb2listw(COL.nb, style="W"),

+ alternative="two.sided")

Global Moran I for regression residuals

data:

model: lm(formula = CRIME ~ 1, data = COL.OLD)

weights: nb2listw(COL.nb, style = "W")

Moran I statistic standard deviate = 5.6754, p-value = 1.384e-08

alternative hypothesis: two.sided

sample estimates:

Observed Moran I Expectation Variance

0.510951264 -0.020833333 0.008779831

Note that the Normal VI for moran.test() is:

VI <- (wc$nn * wc$S1 - wc$n * wc$S2 + 3 * S02)/(S02 *

(wc$nn - 1))

and for lm.morantest():

XtXinv <- chol2inv(model$qr$qr[p1, p1, drop = FALSE])

X <- model.matrix(terms(model), model.frame(model))

...

Z <- lag.listw(listw.U, X, zero.policy = zero.policy)

C1 <- t(X) %*% Z

trA <- (sum(diag(XtXinv %*% C1)))

EI <- -((N * trA)/((N - p) * S0))

C2 <- t(Z) %*% Z

C3 <- XtXinv %*% C1

trA2 <- sum(diag(C3 %*% C3))

trB <- sum(diag(4 * (XtXinv %*% C2)))

VI <- (((N * N)/((S0 * S0) * (N - p) * (N - p + 2))) * (S1 +

2 * trA2 - trB - ((2 * (trA^2))/(N - p))))

where you can easily see that the issue of dependent residuals (only n-p are

independent even in the aspatial case, so you see n-p instead of n) is

handled, as are products of X, WX, and (X'X)^{-1}.

Reading the references is essential and should not be neglected. Reading

the code may help, but doesn't explain the reasoning behind the code and

the output. If you need to test model residuals, use the appropriate test.

Using moran.test() on extracted residuals if covariates are included in

the model is never justified.
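As an aside, both the residual Moran test and the LM tests mentioned above can be run directly on the fitted lm object with spdep's lm.morantest() and lm.LMtests(). A minimal sketch using the built-in oldcol data; the shortened formula is hypothetical, kept small for illustration:

```r
library(spdep)
data(oldcol)

# Build the listw object once and fit an aspatial OLS model
lw <- nb2listw(COL.nb, style = "W")
fit <- lm(CRIME ~ HOVAL + INC, data = COL.OLD)

# Residual Moran's I, correctly accounting for the fitted covariates
lm.morantest(fit, lw, alternative = "two.sided")

# Lagrange multiplier tests (plain and robust forms) on the same model
lm.LMtests(fit, lw, test = c("LMerr", "RLMerr", "LMlag", "RLMlag"))
```

If these point in different directions, the flowchart linked above is the place to start.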

Different outcomes from Moran's I and LM suggest severe mis-specification

in your model, see for example:

https://groups.google.com/forum/#!msg/openspace-list/k4F4jI9cU1I/s5bj8r4nwn4J

for a simplified flowchart for choosing models.

Hope this clarifies,

Roger

>

>

>

>

> MY PARTICULAR CASE

>

>

>

> reg.OLS <- lm(y~z1+z2+z3+z4+z5+z6+z7+z8+z9+z10, data=datos)

>

>

>

> moran.test(resid(reg.OLS),alternative="two.sided", W_n)

>

>

>

> Moran I test under randomisation

>

>

>

> data: resid(reg.OLS)

>

> weights: W_n

>

>

>

> Moran I statistic standard deviate = 0.4434, p-value = 0.6575

>

> alternative hypothesis: two.sided

>

> sample estimates:

>

> Moran I statistic Expectation Variance

>

> 1.596378e-05 -3.595829e-04 7.173448e-07

>

>

>

>

>

>

>

> moran.lm <-lm.morantest(reg.OLS, W_n, alternative="two.sided")

>

> print(moran.lm)

>

>

>

> Global Moran I for regression residuals

>

>

>

> data:

>

> model: lm(formula = y ~ z1 + z2 + z3 + z4 + z5 + z6 + z7 + z8 + z9 + z10

>

> , data = datos)

>

> weights: W_n

>

>

>

> Moran I statistic standard deviate = 11.649, p-value < 2.2e-16

>

> alternative hypothesis: two.sided

>

> sample estimates:

>

> Observed Moran I Expectation Variance

>

> 1.596378e-05 -1.913005e-03 2.741816e-08

>

>

>

>

>

> A REPRODUCIBLE EXAMPLE

>

>

>

>

>

> library(spdep)

>

> data(oldcol)

>

> oldcrime.lm <- lm(CRIME ~ HOVAL + INC + OPEN + PLUMB + DISCBD + PERIM, data

> = COL.OLD)

>

>

>

> moran.test(resid(oldcrime.lm), nb2listw(COL.nb, style="W"))

>

>

>

> Moran I test under randomisation

>

>

>

> data: resid(oldcrime.lm)

>

> weights: nb2listw(COL.nb, style = "W")

>

>

>

> Moran I statistic standard deviate = 1.2733, p-value = 0.1015

>

> alternative hypothesis: greater

>

> sample estimates:

>

> Moran I statistic Expectation Variance

>

> 0.096711162 -0.020833333 0.008521765

>

>

>

>

>

> lm.morantest(oldcrime.lm, nb2listw(COL.nb, style="W"))

>

>

>

> Global Moran I for regression residuals

>

>

>

> data:

>

> model: lm(formula = CRIME ~ HOVAL + INC + OPEN + PLUMB + DISCBD +

>

> PERIM, data = COL.OLD)

>

> weights: nb2listw(COL.nb, style = "W")

>

>

>

> Moran I statistic standard deviate = 1.6668, p-value = 0.04777

>

> alternative hypothesis: greater

>

> sample estimates:

>

> Observed Moran I Expectation Variance

>

> 0.096711162 -0.052848581 0.008050938

>

>

>

> Thanks a lot in advance and sorry for the inconvenience.

>

>

>

> Javi

>

>

>

>

>

>

>

>

> JAVIER GARCÍA

>

>

>

> Departamento de Economía Aplicada III (Econometría y Estadística)

>

> Facultad de Economía y Empresa (Sección Sarriko)

> Avda. Lehendakari Aguirre 83

>

> 48015 BILBAO

> T.: +34 601 7126 F.: +34 601 3754

> <http://www.ehu.es/> www.ehu.es

>

> http://www.unibertsitate-hedakuntza.ehu.es/p268-content/es/contenidos/informacion/manual_id_corp/es_manual/images/firma_email_upv_euskampus_bilingue.gif

>

>

>

>

>

>

--

Roger Bivand

Department of Economics, Norwegian School of Economics,

Helleveien 30, N-5045 Bergen, Norway.

voice: +47 55 95 93 55; e-mail: [hidden email]

Editor-in-Chief of The R Journal, https://journal.r-project.org/index.html

http://orcid.org/0000-0003-2392-6140

https://scholar.google.no/citations?user=AWeghB0AAAAJ&hl=en

_______________________________________________

R-sig-Geo mailing list

[hidden email]

https://stat.ethz.ch/mailman/listinfo/r-sig-geo


### Re: Problem in extracting data from GRIB files

The print-out of rdata is good, but I can't really address the NA issue you
describe without knowing what is in "pt_dams". For that we need to

know what kind of thing it is:

class(pt_dams)

what projection it's in

projection(pt_dams)

and its extent, and unfortunately for that you need to know what kind of

thing it is so either

extent(pt_dams) ## assuming it's a classed spatial-kind-of-thing

or this will vaguely give the idea

head(pt_dams) ## assuming it's a matrix or data frame

You've shown a lot of code but I think we only need to concentrate on

"rdata", as a representative of the raster data all your files will have,

and pt_dams and this line:

data_dams_size<-extract(rdata,pts_dams)

(It's also possible to make the entire task much faster by building an

index at that first step, possibly with cellnumbers = TRUE

But, you'll have to make sure that extract line is doing what you need it

to first. It's very hard to do this by email, and you really haven't honed

into the key details yet. If the coordinates in pt_dams don't fall within

the extent of rdata, then you're in trouble (or just in the wrong

projection, probably). raster makes life easy if you know the coordinates

are in the same space as the raster, but none of this metadata stuff is in

any way consistent or complete so as ever "it depends".
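To make those checks concrete, a minimal sketch, assuming pts_dams is a SpatialPoints object and rdata the RasterStack printed below (cellFromXY() is just one way to test containment):

```r
library(raster)
library(sp)

# What kind of object are the points, and in which CRS?
class(pts_dams)
projection(pts_dams)   # compare with projection(rdata)

# Do the point coordinates fall inside the raster's extent?
extent(rdata)
cells <- cellFromXY(rdata, coordinates(pts_dams))
any(is.na(cells))      # TRUE would mean some points miss the grid entirely
```

If the CRS strings differ, reproject the points with spTransform() before calling extract().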

Cheers, Mike.

On Mon, 31 Jul 2017 at 20:13 Miluji Sb <[hidden email]> wrote:

> Dear Mike,

>

> Thank you very much for your suggestion, apologies for the delayed reply -

> I did not have access to internet over the weekend.

>

> Here is the print-out for rdata:

>

> class : RasterStack

> dimensions : 150, 360, 54000, 8 (nrow, ncol, ncell, nlayers)

> resolution : 1, 1 (x, y)

> extent : -180, 180, -60, 90 (xmin, xmax, ymin, ymax)

> coord. ref. : +proj=longlat +a=6367470 +b=6367470 +no_defs

> names : GLDAS_NOA//215507.020, GLDAS_NOA//215507.020,

> GLDAS_NOA//215507.020, GLDAS_NOA//215507.020, GLDAS_NOA//215507.020,

> GLDAS_NOA//215517.020, GLDAS_NOA//215517.020, GLDAS_NOA//215517.020

>

> I also made this change [rdata <- raster::stack(lapply(filelist,

> function(x) raster(x,band=12)))] but still getting NAs. The full (ugly)

> code is below

>

> I would be grateful for any help. Thank you again!

>

> Sincerely,

>

> Milu

>

> ####

> for (y in year) {

> setwd (y)

>

> day <- dir(path=getwd(),full.names=TRUE, recursive=FALSE,pattern =

> "^[0-9]" )

>

> n=0

>

> for(d in day) {

> setwd (d)

>

> n=n+1

>

>

> filelist<-list.files(pattern=".grb$")

>

> rdata <- raster::stack(lapply(filelist, function(x)

> raster(x,band=12)))

>

> data_dams_size<-extract(rdata,pts_dams)

> DF_data_dams_size<- as.data.frame(data_dams_size)

>

> #Join ISO3, lon, lat, dams data, climate data

> joined_dams <- cbind(foo_dams, DF_data_dams_size)

>

> #Keep only ISO3 LON LAT id

> joined_dams_reduced <- joined_dams

> dams_codes <- joined_dams[,c(1:3)]

>

> names(joined_dams_reduced) <- gsub("GLDAS_NOAH10_3H.A", "",

> names(joined_dams_reduced))

> names(joined_dams_reduced) <- gsub(".020", "",

> names(joined_dams_reduced))

>

> #From mm/s to mm_day

> joined_dams_reduced[joined_dams_reduced==9999] <- NA

>

> ##for prec, snow and runoff use:

> #joined_dams_reduced$mm_day<-rowSums(joined_dams_reduced[,4:11])*10800

>

> ##for temperature use:

> joined_dams_reduced$mm_day<-rowMeans(joined_dams_reduced[,4:11])-273.15

>

> joined_mm_day<-joined_dams_reduced[,12]

>

> #Names

> yr_name <- substr(filelist,18,21)

> csvname <- paste(paste(yr_name, sep="_"), ".csv", sep="")

> csvname <- csvname[1]

>

> #Stack days to get full year

> if (n==1) {joined_mm_day_year <- joined_mm_day} else

> {joined_mm_day_year <- cbind(joined_mm_day_year, joined_mm_day)}

>

> }

>

> #Join mm_day with dams codes

> joined_mm_day_year_names=cbind(dams_codes,joined_mm_day_year)

>

> newnames <-t(as.data.frame(c(paste ("day",

> seq(1,(ncol(joined_mm_day_year_names)-3),1), sep="_"))))

> names(joined_mm_day_year_names)[4:ncol(joined_mm_day_year_names)]=

> newnames

>

>

> write.csv(joined_mm_day_year_names,

> file=file.path("C:/Users/Desktop/temp/",csvname))

>

> }

>

>

> On Sat, Jul 29, 2017 at 12:36 AM, Michael Sumner <[hidden email]>

> wrote:

>

>>

>>

>> On Sat, 29 Jul 2017 at 02:21 Miluji Sb <[hidden email]> wrote:

>>

>>> Dear all,

>>>

>>> I have a set of coordinates for cities:

>>>

>>> structure(list(lon = c(3.7174243, 3.2246995, 2.928656, 33.3822764,

>>> 10.40237, 24.7535746, 7.2619532, -0.370679, -4.486076, -4.097899

>>> ), lat = c(51.0543422, 51.209348, 51.21543, 35.1855659, 55.403756,

>>> 59.4369608, 43.7101728, 49.182863, 48.390394, 47.997542)), .Names =

>>> c("lon",

>>> "lat"), na.action = structure(c(5L, 6L, 31L, 55L, 59L, 61L, 67L,

>>> 68L), .Names = c("5", "6", "31", "55", "59", "61", "67", "68"

>>> ), class = "omit"), row.names = c(1L, 2L, 3L, 4L, 7L, 8L, 9L,

>>> 10L, 11L, 12L), class = "data.frame")

>>>

>>> I am trying to extract climatic data from GLDAS climatic data (1 degree

>>> x 1

>>> degree) GRIB files with the following code:

>>>

>>> # Convert to spatial points

>>> pts_dams <- SpatialPoints(lonlat,proj4string=CRS("+proj=longlat"))

>>>

>>> # Extract

>>> filelist<-list.files(pattern=".grb$")

>>>

>>>

>> This part is a bit wrong:

>>

>>

>>> data<-stack(filelist[1])

>>> data@layers<-sapply(filelist, function(x) raster(x,band=12))

>>>

>>

>> Better would be

>>

>> rdata <- raster::stack(lapply(filelist, function(x) raster(x,band=12)))

>>

>> For the rest we will need to see details about the data, so please do

>>

>> print(rdata)

>>

>> and send the resulting print-out.

>>

>> A minor thing: "data" is already a function, so it's advisable not to use

>> it as a name - if only to reduce possible confusion.

>>

>> Also, it's not obvious that the process by which raster(file) goes through

>> (it passes down to rgdal::readGDAL) will result in a correct interpretation

>> as the data source was intended, since GRIB is a domain-specific format for

>> time-series and volume-slice series data and the required translation to

>> the GDAL and raster data models is not always straight-forward.

>>

>> It totally can work though, I do it routinely with many sources, but they

>> all need some oversight to ensure that upfront translation is complete and

>> appropriate.

>>

>> It may just require setting raster::extent and/or raster::projection, but

>> there's no way to know without exploration.

>>

>> Cheers, Mike.

>>

>>

>>

>>> data_dams_size<-extract(data,pts_dams)

>>> DF_data_dams_size<- as.data.frame(data_dams_size)

>>>

>>> Unfortunately, I get mostly NAs for the data, it seems that there's an

>>> issue with the CRS projections for the city coordinates. Is there a

>>> specific projection for city level coordinates? Or am I doing something

>>> completely wrong? Thank you!

>>>

>>> Sincerely,

>>>

>>> Milu

>>>

>>> [[alternative HTML version deleted]]

>>>

>>> _______________________________________________

>>> R-sig-Geo mailing list

>>> [hidden email]

>>> https://stat.ethz.ch/mailman/listinfo/r-sig-geo

>>>

>> --

>> Dr. Michael Sumner

>> Software and Database Engineer

>> Australian Antarctic Division

>> 203 Channel Highway

>> Kingston Tasmania 7050 Australia

>>

>>

> -- Dr. Michael Sumner

Software and Database Engineer

Australian Antarctic Division

203 Channel Highway

Kingston Tasmania 7050 Australia

[[alternative HTML version deleted]]

_______________________________________________

R-sig-Geo mailing list

[hidden email]

https://stat.ethz.ch/mailman/listinfo/r-sig-geo

say about NAs without knowing what is in "pt_dams". For that we need to

know what kind of thing it is:

class(pt_dams)

what projection it's in

projection(pt_dams)

and it's extent, and unfortunately for that you need to know what kind of

thing it is so either

extent(pt_dams) ## assuming it's a classed spatial-kind-of-thing

or this will vaguely give the idea

head(pt_dams) ## assuming it's a matrix or data frame

You've shown a lot of code but I think we only need to concentrated on

"rdata", as a representative of the raster data all your files will have,

and pt_dams and this line:

data_dams_size<-extract(rdata,pts_dams)

(It's also possible to make the entire task much faster by building an

index at that first step, possibly with cellnumbers = TRUE

But, you'll have to make sure that extract line is doing what you need it

to first. It's very hard to do this by email, and you really haven't honed

into the key details yet. If the coordinates in pt_dams don't fall within

the extent of rdata, then you're in trouble (or just in the wrong

projection, probably). raster makes life easy if you know the coordinates

are in the same space as the raster, but none of this metadata stuff is in

any way consistent or complete so as ever "it depends".

Cheers, Mike.

On Mon, 31 Jul 2017 at 20:13 Miluji Sb <[hidden email]> wrote:

> Dear Mike,

>

> Thank you very much for your suggestion, apologies for the delayed reply -

> I did not have access to internet over the weekend.

>

> Here is the print-out for rdata:

>

> class : RasterStack

> dimensions : 150, 360, 54000, 8 (nrow, ncol, ncell, nlayers)

> resolution : 1, 1 (x, y)

> extent : -180, 180, -60, 90 (xmin, xmax, ymin, ymax)

> coord. ref. : +proj=longlat +a=6367470 +b=6367470 +no_defs

> names : GLDAS_NOA//215507.020, GLDAS_NOA//215507.020,

> GLDAS_NOA//215507.020, GLDAS_NOA//215507.020, GLDAS_NOA//215507.020,

> GLDAS_NOA//215517.020, GLDAS_NOA//215517.020, GLDAS_NOA//215517.020

>

> I also made this change [rdata <- raster::stack(lapply(filelist,

> function(x) raster(x,band=12)))] but still getting NAs. The full (ugly)

> code is below

>

> I would be grateful for any help. Thank you again!

>

> Sincerely,

>

> Milu

>

> ####

> for (y in year) {

> setwd (y)

>

> day <- dir(path=getwd(),full.names=TRUE, recursive=FALSE,pattern =

> "^[0-9]" )

>

> n=0

>

> for(d in day) {

> setwd (d)

>

> n=n+1

>

>

> filelist<-list.files(pattern=".grb$")

>

> rdata <- raster::stack(lapply(filelist, function(x)

> raster(x,band=12)))

>

> data_dams_size<-extract(rdata,pts_dams)

> DF_data_dams_size<- as.data.frame(data_dams_size)

>

> #Join ISO3, lon, lat, dams data, climate data

> joined_dams <- cbind(foo_dams, DF_data_dams_size)

>

> #Keep only ISO3 LON LAT id

> joined_dams_reduced <- joined_dams

> dams_codes <- joined_dams[,c(1:3)]

>

> names(joined_dams_reduced) <- gsub("GLDAS_NOAH10_3H.A", "",

> names(joined_dams_reduced))

> names(joined_dams_reduced) <- gsub(".020", "",

> names(joined_dams_reduced))

>

> #From mm/s to mm_day

> joined_dams_reduced[joined_dams_reduced==9999] <- NA

>

> ##for prec, snow and runoff use:

> #joined_dams_reduced$mm_day<-rowSums(joined_dams_reduced[,4:11])*10800

>

> ##for temperature use:

> joined_dams_reduced$mm_day<-rowMeans(joined_dams_reduced[,4:11])-273.15

>

> joined_mm_day<-joined_dams_reduced[,12]

>

> #Names

> yr_name <- substr(filelist,18,21)

> csvname <- paste(paste(yr_name, sep="_"), ".csv", sep="")

> csvname <- csvname[1]

>

> #Stack days to get full year

> if (n==1) {joined_mm_day_year <- joined_mm_day} else

> {joined_mm_day_year <- cbind(joined_mm_day_year, joined_mm_day)}

>

> }

>

> #Join mm_day with dams codes

> joined_mm_day_year_names=cbind(dams_codes,joined_mm_day_year)

>

> newnames <-t(as.data.frame(c(paste ("day",

> seq(1,(ncol(joined_mm_day_year_names)-3),1), sep="_"))))

> names(joined_mm_day_year_names)[4:ncol(joined_mm_day_year_names)]=

> newnames

>

>

> write.csv(joined_mm_day_year_names,

> file=file.path("C:/Users/Desktop/temp/",csvname))

>

> }

>

>

> On Sat, Jul 29, 2017 at 12:36 AM, Michael Sumner <[hidden email]>

> wrote:

>

>>

>>

>> On Sat, 29 Jul 2017 at 02:21 Miluji Sb <[hidden email]> wrote:

>>

>>> Dear all,

>>>

>>> I have a set of coordinates for cities:

>>>

>>> structure(list(lon = c(3.7174243, 3.2246995, 2.928656, 33.3822764,

>>> 10.40237, 24.7535746, 7.2619532, -0.370679, -4.486076, -4.097899

>>> ), lat = c(51.0543422, 51.209348, 51.21543, 35.1855659, 55.403756,

>>> 59.4369608, 43.7101728, 49.182863, 48.390394, 47.997542)), .Names =

>>> c("lon",

>>> "lat"), na.action = structure(c(5L, 6L, 31L, 55L, 59L, 61L, 67L,

>>> 68L), .Names = c("5", "6", "31", "55", "59", "61", "67", "68"

>>> ), class = "omit"), row.names = c(1L, 2L, 3L, 4L, 7L, 8L, 9L,

>>> 10L, 11L, 12L), class = "data.frame")

>>>

>>> I am trying to extract climatic data from GLDAS climatic data (1 degree

>>> x 1

>>> degree) GRIB files with the following code:

>>>

>>> # Convert to spatial points

>>> pts_dams <- SpatialPoints(lonlat,proj4string=CRS("+proj=longlat"))

>>>

>>> # Extract

>>> filelist<-list.files(pattern=".grb$")

>>>

>>>

>> This part is a bit wrong:

>>

>>

>>> data<-stack(filelist[1])

>>> data@layers<-sapply(filelist, function(x) raster(x,band=12))

>>>

>>

>> Better would be

>>

>> rdata <- raster::stack(lapply(filelist, function(x) raster(x,band=12)))

>>

>> For the rest we will need to see details about the data, so please do

>>

>> print(rdata)

>>

>> and send the resulting print-out.

>>

>> A minor thing "data" is already a function, so it's advisable not to use

>> it as a name - if only to reduce possible confusion.

>>

>> Also, it's not obvious that the process by which raster(file) goes through

>> (it passes down to rgdal::readGDAL) will result in a correct interpretation

>> of the data source as intended, since GRIB is a domain-specific format for

>> time-series and volume-slice series data and the required translation to

>> the GDAL and raster data models is not always straightforward.

>>

>> It totally can work though, I do it routinely with many sources but they

>> all need some oversight to ensure that upfront translation is complete and

>> appropriate.

>>

>> It may just require setting raster::extent and/or raster::projection, but

>> there's no way to know without exploration.

>>

>> Cheers, Mike.

>>

>>

>>

>>> data_dams_size<-extract(data,pts_dams)

>>> DF_data_dams_size<- as.data.frame(data_dams_size)

>>>

>>> Unfortunately, I get mostly NAs for the data, it seems that there's an

>>> issue with the CRS projections for the city coordinates. Is there a

>>> specific projection for city level coordinates? Or am I doing something

>>> completely wrong? Thank you!

>>>

>>> Sincerely,

>>>

>>> Milu

>>>

>>> [[alternative HTML version deleted]]

>>>

>>> _______________________________________________

>>> R-sig-Geo mailing list

>>> [hidden email]

>>> https://stat.ethz.ch/mailman/listinfo/r-sig-geo

>>>

>> --

>> Dr. Michael Sumner

>> Software and Database Engineer

>> Australian Antarctic Division

>> 203 Channel Highway

>> Kingston Tasmania 7050 Australia

>>

>>

> -- Dr. Michael Sumner

Software and Database Engineer

Australian Antarctic Division

203 Channel Highway

Kingston Tasmania 7050 Australia

[[alternative HTML version deleted]]

_______________________________________________

R-sig-Geo mailing list

[hidden email]

https://stat.ethz.ch/mailman/listinfo/r-sig-geo

### Re: Error in basename(x) : path too long

Hi John,

Using a raster::stack() should be the perfect solution for you; I work with very large raster stacks all of the time. From the introductory vignette https://cran.r-project.org/web/packages/raster/vignettes/Raster.pdf:

"A notable feature of the raster package is that it can work with raster

datasets that are stored on disk and are too large to be loaded into memory

(RAM). The package can work with large files because the objects it creates

from these files only contain information about the structure of the data, such

as the number of rows and columns, the spatial extent, and the filename, but it

does not attempt to read all the cell values in memory."

Have you tried something like...

library(raster)

S <- stack(vector_of_filenames)

Ben
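
John's annual sums and means (quoted below) can then be computed straight from such a disk-backed stack, without manual loops. A sketch, assuming `vector_of_filenames` (a hypothetical name) holds the 345 file paths in date order, 15 years of 23 layers each:

```r
library(raster)

S <- stack(vector_of_filenames)   # disk-backed stack, low memory use
years <- rep(1:15, each = 23)     # maps each of the 345 layers to a year

annual_sum  <- stackApply(S, indices = years, fun = sum)
annual_mean <- stackApply(S, indices = years, fun = mean)

# one GTiff per year
writeRaster(annual_sum, filename = "annual_sum.tif", format = "GTiff",
            bylayer = TRUE, suffix = "numbers", overwrite = TRUE)
```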

> On Jul 31, 2017, at 5:19 AM, John Wasige <[hidden email]> wrote:

>

> Thank you Ben,

> I would like to generate annual sums & means for each year from a list of rasters (not a stack or brick, because of memory issues with R). For 15 years I have 345 individual rasters, at a frequency of 23 per year.

>

> I will be very glad for an idea on how to do that.

>

> Thanks

>

> John

>

>

>

>

>

> On Sat, Jul 29, 2017 at 5:37 PM, Ben Tupper <[hidden email]> wrote:

> Hi again,

>

> A late thought - I'm still on the first cup of coffee.

>

> It looks to me like you are iterating over a stack to select certain layers to sum. You could achieve the same outcome with possibly much less work. The following example will create a sum of 24-layer blocks along a stack of rasters.

>

> # from https://gist.github.com/btupper/20879e0b46e5ed63d402d7cff424dbb7

> #' Split a vector into groups of MAX (or possibly fewer)

> #'

> #' @param v vector or list to split

> #' @param MAX numeric the maximum size per group

> #' @return a list of the vector split into groups

> split_vector <- function(v, MAX = 200){

> nv <- length(v)

> if (nv <= MAX) return(list('1' = v))

> split(v, findInterval(1:nv, seq(from = 1, to = nv, by = MAX)))

> }

>

>

> library(raster)

>

> N <- 345

> n <- 24

> nc <- 4

> nr <- 3

>

> R <- raster(matrix(runif(nc*nr), ncol = nc, nrow = nr))

>

> RR <- stack(lapply(seq_len(N), function(i) R))

>

> ix <- split_vector(seq_len(N), MAX = n)

>

> SS <- lapply(ix, function(index) sum(RR[[index]]))

>

> So, SS[[1]], which looks like this...

>

> $`1`

> class : RasterLayer

> dimensions : 3, 4, 12 (nrow, ncol, ncell)

> resolution : 0.25, 0.3333333 (x, y)

> extent : 0, 1, 0, 1 (xmin, xmax, ymin, ymax)

> coord. ref. : NA

> data source : in memory

> names : layer

> values : 0.9451534, 20.0503 (min, max)

>

> ... is the sum of the first 24 layers of RR. SS[[2]] will be the sum of the next 24, and so on.

>

> Is that what you are trying to do?

>

> Cheers,

> Ben

>

>

>

>

> > On Jul 29, 2017, at 7:56 AM, John Wasige <[hidden email]> wrote:

> >

> > Dear all,

> >

> > I am running the script below & I get the following error:

> > Error in basename(x) : path too long

> >

> > What could be the problem?

> > Thanks for your help

> > John

> >

> > ### Script

> >

> > setwd("I:/Mauritius2001_2015") # directory of data

> > newlist= read.csv('I:/Mauritius2001_2015/Mauritius.csv',header=F)

> >

> > refr <- raster(paste("I:/Mauritius2001_2015/",newlist[i,1],sep = ""))

> > refr[!is.na(refr)] <- 0

> > for(i in seq(1,345,by=23)){

> > rsum <- refr

> > for(p in 0:22){

> > r <- raster(paste("I:/Mauritius2001_2015/",newlist[i+p],sep = ""))

> > rsum <- rsum + r

> > }

> > # rsum <- rsum

> > writeRaster(rsum,

> > filename=paste("D:/Mauritius2001_2015/Annual/",substr(newlist[i],1,6),".tif",sep=''),

> > format="GTiff", overwrite=TRUE)

> > }

> >

> > [[alternative HTML version deleted]]

> >

> > _______________________________________________

> > R-sig-Geo mailing list

> > [hidden email]

> > https://stat.ethz.ch/mailman/listinfo/r-sig-geo

>

> Ben Tupper

> Bigelow Laboratory for Ocean Sciences

> 60 Bigelow Drive, P.O. Box 380

> East Boothbay, Maine 04544

> http://www.bigelow.org

>

> Ecocast Reports: http://seascapemodeling.org/ecocast.html

> Tick Reports: https://report.bigelow.org/tick/

> Jellyfish Reports: https://jellyfish.bigelow.org/jellyfish/

>

>

>

>

>

>

> --

> John Wasige

> "There are no REGRETS in LIFE, just lessons" (Jennifer Aniston)

Ben Tupper

Bigelow Laboratory for Ocean Sciences

60 Bigelow Drive, P.O. Box 380

East Boothbay, Maine 04544

http://www.bigelow.org

Ecocast Reports: http://seascapemodeling.org/ecocast.html

Tick Reports: https://report.bigelow.org/tick/

Jellyfish Reports: https://jellyfish.bigelow.org/jellyfish/

_______________________________________________

R-sig-Geo mailing list

[hidden email]

https://stat.ethz.ch/mailman/listinfo/r-sig-geo


### Re: Problem in extracting data from GRIB files

Dear Mike,

Thank you very much for your suggestion, apologies for the delayed reply -

I did not have access to internet over the weekend.

Here is the print-out for rdata:

class : RasterStack

dimensions : 150, 360, 54000, 8 (nrow, ncol, ncell, nlayers)

resolution : 1, 1 (x, y)

extent : -180, 180, -60, 90 (xmin, xmax, ymin, ymax)

coord. ref. : +proj=longlat +a=6367470 +b=6367470 +no_defs

names : GLDAS_NOA//215507.020, GLDAS_NOA//215507.020,

GLDAS_NOA//215507.020, GLDAS_NOA//215507.020, GLDAS_NOA//215507.020,

GLDAS_NOA//215517.020, GLDAS_NOA//215517.020, GLDAS_NOA//215517.020

I also made this change [rdata <- raster::stack(lapply(filelist,

function(x) raster(x,band=12)))] but still getting NAs. The full (ugly)

code is below.

I would be grateful for any help. Thank you again!

Sincerely,

Milu

####

for (y in year) {

setwd (y)

day <- dir(path=getwd(),full.names=TRUE, recursive=FALSE,pattern =

"^[0-9]" )

n=0

for(d in day) {

setwd (d)

n=n+1

filelist<-list.files(pattern=".grb$")

rdata <- raster::stack(lapply(filelist, function(x) raster(x,band=12)))

data_dams_size<-extract(rdata,pts_dams)

DF_data_dams_size<- as.data.frame(data_dams_size)

#Join ISO3, lon, lat, dams data, climate data

joined_dams <- cbind(foo_dams, DF_data_dams_size)

#Keep only ISO3 LON LAT id

joined_dams_reduced <- joined_dams

dams_codes <- joined_dams[,c(1:3)]

names(joined_dams_reduced) <- gsub("GLDAS_NOAH10_3H.A", "",

names(joined_dams_reduced))

names(joined_dams_reduced) <- gsub(".020", "",

names(joined_dams_reduced))

#From mm/s to mm_day

joined_dams_reduced[joined_dams_reduced==9999] <- NA

##for prec, snow and runoff use:

#joined_dams_reduced$mm_day<-rowSums(joined_dams_reduced[,4:11])*10800

##for temperature use:

joined_dams_reduced$mm_day<-rowMeans(joined_dams_reduced[,4:11])-273.15

joined_mm_day<-joined_dams_reduced[,12]

#Names

yr_name <- substr(filelist,18,21)

csvname <- paste(paste(yr_name, sep="_"), ".csv", sep="")

csvname <- csvname[1]

#Stack days to get full year

if (n==1) {joined_mm_day_year <- joined_mm_day} else

{joined_mm_day_year <- cbind(joined_mm_day_year, joined_mm_day)}

}

#Join mm_day with dams codes

joined_mm_day_year_names=cbind(dams_codes,joined_mm_day_year)

newnames <-t(as.data.frame(c(paste ("day",

seq(1,(ncol(joined_mm_day_year_names)-3),1), sep="_"))))

names(joined_mm_day_year_names)[4:ncol(joined_mm_day_year_names)]=

newnames

write.csv(joined_mm_day_year_names,

file=file.path("C:/Users/Desktop/temp/",csvname))

}
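
One way to narrow down where those NAs come from, given that the extent and longlat CRS in the print-out look sane: check whether the city points land on non-NA cells at all. A sketch, assuming `rdata` and `pts_dams` from the code above are in the workspace:

```r
library(raster)

# which grid cells do the points land in?
cells <- cellFromXY(rdata, pts_dams)
print(cells)              # NA here = point is outside the grid extent

# what does the first layer hold in those cells?
print(rdata[[1]][cells])  # NA here = the cell itself has no data
```

One common cause with GLDAS: it is a land-surface model, so ocean cells are NoData, and at 1-degree resolution coastal city coordinates can easily fall into an ocean cell, which `extract()` then reports as NA.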

On Sat, Jul 29, 2017 at 12:36 AM, Michael Sumner <[hidden email]> wrote:

>

>

> On Sat, 29 Jul 2017 at 02:21 Miluji Sb <[hidden email]> wrote:

>

>> Dear all,

>>

>> I have a set of coordinates for cities:

>>

>> structure(list(lon = c(3.7174243, 3.2246995, 2.928656, 33.3822764,

>> 10.40237, 24.7535746, 7.2619532, -0.370679, -4.486076, -4.097899

>> ), lat = c(51.0543422, 51.209348, 51.21543, 35.1855659, 55.403756,

>> 59.4369608, 43.7101728, 49.182863, 48.390394, 47.997542)), .Names =

>> c("lon",

>> "lat"), na.action = structure(c(5L, 6L, 31L, 55L, 59L, 61L, 67L,

>> 68L), .Names = c("5", "6", "31", "55", "59", "61", "67", "68"

>> ), class = "omit"), row.names = c(1L, 2L, 3L, 4L, 7L, 8L, 9L,

>> 10L, 11L, 12L), class = "data.frame")

>>

>> I am trying to extract climatic data from GLDAS climatic data (1 degree x

>> 1

>> degree) GRIB files with the following code:

>>

>> # Convert to spatial points

>> pts_dams <- SpatialPoints(lonlat,proj4string=CRS("+proj=longlat"))

>>

>> # Extract

>> filelist<-list.files(pattern=".grb$")

>>

>>

> This part is a bit wrong:

>

>

>> data<-stack(filelist[1])

>> data@layers<-sapply(filelist, function(x) raster(x,band=12))

>>

>

> Better would be

>

> rdata <- raster::stack(lapply(filelist, function(x) raster(x,band=12)))

>

> For the rest we will need to see details about the data, so please do

>

> print(rdata)

>

> and send the resulting print-out.

>

> A minor thing "data" is already a function, so it's advisable not to use

> it as a name - if only to reduce possible confusion.

>

> Also, it's not obvious that the process by which raster(file) goes through

> (it passes down to rgdal::readGDAL) will result in a correct interpretation

> of the data source as intended, since GRIB is a domain-specific format for

> time-series and volume-slice series data and the required translation to

> the GDAL and raster data models is not always straightforward.

>

> It totally can work though, I do it routinely with many sources but they all

> need some oversight to ensure that upfront translation is complete and

> appropriate.

>

> It may just require setting raster::extent and/or raster::projection, but

> there's no way to know without exploration.

>

> Cheers, Mike.

>

>

>

>> data_dams_size<-extract(data,pts_dams)

>> DF_data_dams_size<- as.data.frame(data_dams_size)

>>

>> Unfortunately, I get mostly NAs for the data, it seems that there's an

>> issue with the CRS projections for the city coordinates. Is there a

>> specific projection for city level coordinates? Or am I doing something

>> completely wrong? Thank you!

>>

>> Sincerely,

>>

>> Milu

>>

>> [[alternative HTML version deleted]]

>>

>> _______________________________________________

>> R-sig-Geo mailing list

>> [hidden email]

>> https://stat.ethz.ch/mailman/listinfo/r-sig-geo

>>

> --

> Dr. Michael Sumner

> Software and Database Engineer

> Australian Antarctic Division

> 203 Channel Highway

> Kingston Tasmania 7050 Australia

>

>

[[alternative HTML version deleted]]

_______________________________________________

R-sig-Geo mailing list

[hidden email]

https://stat.ethz.ch/mailman/listinfo/r-sig-geo


### Re: RQGIS : looking for a geoalgorithm (Tristan Bourgeois)

RQGIS makes use of the processing toolbox and related geoalgorithms. The Atlas tool is not part of the processing toolbox, which is why find_algorithms() does not return anything. However, automatic map making is possible in R using, e.g., the packages sp, sf and/or raster (tmap has already been mentioned in a previous reply). If you want to stick with the QGIS map canvas, maybe rqgisapi might let you do that (http://r-sig-geo.2731867.n2.nabble.com/Introducing-an-R-package-for-calling-functions-within-a-running-instance-of-QGIS-ALPHA-tt7591302.html).

HTH,

Jannes
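
As a concrete illustration of the "automatic map making in R" route: a loop over a map series gets close to what the QGIS Atlas produces. A sketch using tmap's bundled `World` data (the data and column names are tmap's; the output file name is hypothetical):

```r
library(tmap)
data(World)

# one map per continent, written as one page each, atlas-style
pdf("atlas.pdf")
for (cc in unique(World$continent)) {
  m <- tm_shape(World[World$continent == cc, ]) +
    tm_polygons("pop_est") +
    tm_layout(title = as.character(cc))
  print(m)
}
dev.off()
```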

> Subject: RQGIS : looking for a geoalgorithm

>

> Dear all,

>

> Currently working on a new project, I would like to create a dynamic report

> with R (probably using the XML package for writing)

>

> In this report I'll plot some graphs (I'm ok with ggplot2) but also maps.

>

>

> I just discovered the RQGIS package and I think I understood its functions.

> My question is the following one :

>

> Are there any geoalgorithms which allow creating an atlas?

>

> I searched via the "find_algorithms()" but found nothing.

>

> Thanks in advance !

>

> Tristan.

_______________________________________________

R-sig-Geo mailing list

[hidden email]

https://stat.ethz.ch/mailman/listinfo/r-sig-geo


### Re: Error in basename(x) : path too long

Thank you Ben,

I would like to generate annual sums & means for each year from a list

of rasters (not a stack or brick, because of memory issues with R). For 15 years

I have 345 individual rasters, at a frequency of 23 per year.

I will be very glad for an idea on how to do that.

Thanks

John

On Sat, Jul 29, 2017 at 5:37 PM, Ben Tupper <[hidden email]> wrote:

> Hi again,

>

> A late thought - I'm still on the first cup of coffee.

>

> It looks to me like you are iterating over a stack to select certain

> layers to sum. You could achieve the same outcome with possibly much less

> work. The following example will create a sum of 24-layer blocks along a

> stack of rasters.

>

> # from https://gist.github.com/btupper/20879e0b46e5ed63d402d7cff424dbb7

> #' Split a vector into groups of MAX (or possibly fewer)

> #'

> #' @param v vector or list to split

> #' @param MAX numeric the maximum size per group

> #' @return a list of the vector split into groups

> split_vector <- function(v, MAX = 200){

> nv <- length(v)

> if (nv <= MAX) return(list('1' = v))

> split(v, findInterval(1:nv, seq(from = 1, to = nv, by = MAX)))

> }

>

>

> library(raster)

>

> N <- 345

> n <- 24

> nc <- 4

> nr <- 3

>

> R <- raster(matrix(runif(nc*nr), ncol = nc, nrow = nr))

>

> RR <- stack(lapply(seq_len(N), function(i) R))

>

> ix <- split_vector(seq_len(N), MAX = n)

>

> SS <- lapply(ix, function(index) sum(RR[[index]]))

>

> So, SS[[1]], which looks like this...

>

> $`1`

> class : RasterLayer

> dimensions : 3, 4, 12 (nrow, ncol, ncell)

> resolution : 0.25, 0.3333333 (x, y)

> extent : 0, 1, 0, 1 (xmin, xmax, ymin, ymax)

> coord. ref. : NA

> data source : in memory

> names : layer

> values : 0.9451534, 20.0503 (min, max)

>

> ... is the sum of the first 24 layers of RR. SS[[2]] will be the sum of

> the next 24, and so on.

>

> Is that what you are trying to do?

>

> Cheers,

> Ben

>

>

>

>

> > On Jul 29, 2017, at 7:56 AM, John Wasige <[hidden email]> wrote:

> >

> > Dear all,

> >

> > I am running the script below & I get the following error:

> > Error in basename(x) : path too long

> >

> > What could be the problem?

> > Thanks for your help

> > John

> >

> > ### Script

> >

> > setwd("I:/Mauritius2001_2015") # directory of data

> > newlist= read.csv('I:/Mauritius2001_2015/Mauritius.csv',header=F)

> >

> > refr <- raster(paste("I:/Mauritius2001_2015/",newlist[i,1],sep = ""))

> > refr[!is.na(refr)] <- 0

> > for(i in seq(1,345,by=23)){

> > rsum <- refr

> > for(p in 0:22){

> > r <- raster(paste("I:/Mauritius2001_2015/",newlist[i+p],sep = ""))

> > rsum <- rsum + r

> > }

> > # rsum <- rsum

> > writeRaster(rsum,

> > filename=paste("D:/Mauritius2001_2015/Annual/",

> substr(newlist[i],1,6),".tif",sep=''),

> > format="GTiff", overwrite=TRUE)

> > }

> >

> > [[alternative HTML version deleted]]

> >

> > _______________________________________________

> > R-sig-Geo mailing list

> > [hidden email]

> > https://stat.ethz.ch/mailman/listinfo/r-sig-geo

>

> Ben Tupper

> Bigelow Laboratory for Ocean Sciences

> 60 Bigelow Drive, P.O. Box 380

> East Boothbay, Maine 04544

> http://www.bigelow.org

>

> Ecocast Reports: http://seascapemodeling.org/ecocast.html

> Tick Reports: https://report.bigelow.org/tick/

> Jellyfish Reports: https://jellyfish.bigelow.org/jellyfish/

>

>

>

>

--

John Wasige

"There are no REGRETS in LIFE, just lessons" (Jennifer Aniston)

[[alternative HTML version deleted]]

_______________________________________________

R-sig-Geo mailing list

[hidden email]

https://stat.ethz.ch/mailman/listinfo/r-sig-geo
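
For the record, the `Error in basename(x) : path too long` in the quoted script most likely comes from `newlist[i+p]`: single-bracket indexing of a data frame selects a whole column, so on the first iteration `paste()` glues all 345 file names into one enormous "path" that `raster()` then chokes on. (Note also that `refr` is built from `newlist[i,1]` before `i` is defined.) A sketch of a fix, assuming the file names sit in the first column:

```r
# read names as character and index rows explicitly: newlist[i + p, 1]
newlist <- read.csv("I:/Mauritius2001_2015/Mauritius.csv",
                    header = FALSE, stringsAsFactors = FALSE)

r <- raster(file.path("I:/Mauritius2001_2015", newlist[i + p, 1]))
```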


### Re: Differences between moran.test and lm.morantest

On Sat, 29 Jul 2017, Javier García wrote:

> Hello everybody:

>

>

>

> Currently I am working on a paper in which we need to analyze the presence

> of possible spatial correlation in the data. With this aim I am running some

> tests in R. I am a little bit confused about the differences between

> moran.test and lm.morantest R functions. The problem I face is that when

> I run moran.test on my regression residuals the result is totally different

> from the one I obtain when I use lm.morantest with the lm regression object

> (please, see below the different outputs I get and after it a reproducible

> example). In particular, whereas the observed Moran I is the same, the

> expectation and variance differ dramatically, leading to opposite conclusions.

> I would appreciate very much if someone could clarify for me which is the

> cause behind this. By the way, I also run LM tests (LMerr, RLMerr, LMlag and

> RLMlag) not rejecting the null hypothesis in any of them (all p-values are

> higher than 0.7), which is in clear contradiction with lm.morantest. How

> is this possible?

> moran.test() is for "primary" variables only - read the reference in

?moran.test. The mean model applied to this variable is the intercept,

that is the mean only:

> moran.test(COL.OLD$CRIME, nb2listw(COL.nb, style="W"),

+ randomisation=FALSE, alternative="two.sided")

Moran I test under normality

data: COL.OLD$CRIME

weights: nb2listw(COL.nb, style = "W")

Moran I statistic standard deviate = 5.6754, p-value = 1.384e-08

alternative hypothesis: two.sided

sample estimates:

Moran I statistic Expectation Variance

0.510951264 -0.020833333 0.008779831

under the Normal assumption.

lm.morantest() can reproduce the same results for the same model:

> lm.morantest(lm(CRIME ~ 1, data=COL.OLD), nb2listw(COL.nb, style="W"),

+ alternative="two.sided")

Global Moran I for regression residuals

data:

model: lm(formula = CRIME ~ 1, data = COL.OLD)

weights: nb2listw(COL.nb, style = "W")

Moran I statistic standard deviate = 5.6754, p-value = 1.384e-08

alternative hypothesis: two.sided

sample estimates:

Observed Moran I Expectation Variance

0.510951264 -0.020833333 0.008779831

Note that the Normal VI for moran.test() is:

VI <- (wc$nn * wc$S1 - wc$n * wc$S2 + 3 * S02)/(S02 *

(wc$nn - 1))

and for lm.morantest():

XtXinv <- chol2inv(model$qr$qr[p1, p1, drop = FALSE])

X <- model.matrix(terms(model), model.frame(model))

...

Z <- lag.listw(listw.U, X, zero.policy = zero.policy)

C1 <- t(X) %*% Z

trA <- (sum(diag(XtXinv %*% C1)))

EI <- -((N * trA)/((N - p) * S0))

C2 <- t(Z) %*% Z

C3 <- XtXinv %*% C1

trA2 <- sum(diag(C3 %*% C3))

trB <- sum(diag(4 * (XtXinv %*% C2)))

VI <- (((N * N)/((S0 * S0) * (N - p) * (N - p + 2))) * (S1 +

2 * trA2 - trB - ((2 * (trA^2))/(N - p))))

where you can easily see that the dependence of the residuals (only n-p are

independent even in the aspatial case, so n-p appears instead of n) is

handled, as are products of X, WX, and (X'X)^{-1}.

Reading the references is essential and should not be neglected. Reading

the code may help, but doesn't explain the reasoning behind the code and

the output. If you need to test model residuals, use the appropriate test.

Using moran.test() on extracted residuals if covariates are included in

the model is never justified.

Different outcomes from Moran's I and LM suggest severe mis-specification

in your model, see for example:

https://groups.google.com/forum/#!msg/openspace-list/k4F4jI9cU1I/s5bj8r4nwn4J

for a simplified flowchart for choosing models.

Hope this clarifies,

Roger
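The different expectations can be reproduced in a few lines of base R. The sketch below (a toy 5-unit ring neighbourhood and made-up covariate values, not the Columbus data) implements the residual expectation EI = -n * tr((X'X)^{-1} X'WX) / ((n - p) * S0) from the lm.morantest() source quoted above; for an intercept-only model it collapses to the -1/(n-1) used by moran.test():

```r
# Sketch (base R, toy data): expectation of Moran's I for OLS residuals,
#   EI = -n * tr((X'X)^{-1} X'WX) / ((n - p) * S0),
# matching the lm.morantest() source quoted above. The 5-unit ring
# weights and the covariate values are invented for illustration.
n <- 5
W <- matrix(0, n, n)
for (i in 1:n) W[i, c((i %% n) + 1, ((i - 2) %% n) + 1)] <- 0.5  # row-standardised ring

moran_EI <- function(X, W) {
  n <- nrow(X); p <- ncol(X); S0 <- sum(W)
  trA <- sum(diag(solve(crossprod(X)) %*% t(X) %*% W %*% X))
  -(n * trA) / ((n - p) * S0)
}

X1 <- matrix(1, n, 1)               # intercept-only "mean model"
EI1 <- moran_EI(X1, W)              # collapses to -1/(n-1), as in moran.test()
X2 <- cbind(1, c(2, 4, 1, 5, 3))    # add one covariate: the expectation shifts
EI2 <- moran_EI(X2, W)              # no longer -1/(n-1)
```

This makes concrete why moran.test() on extracted residuals and lm.morantest() disagree: the null expectation (and variance) change as soon as covariates enter the mean model.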

>

>

>

>

> MY PARTICULAR CASE

>

>

>

> reg.OLS <- lm(y~z1+z2+z3+z4+z5+z6+z7+z8+z9+z10, data=datos)

>

>

>

> moran.test(resid(reg.OLS),alternative="two.sided", W_n)

>

>

>

> Moran I test under randomisation

>

>

>

> data: resid(reg.OLS)

>

> weights: W_n

>

>

>

> Moran I statistic standard deviate = 0.4434, p-value = 0.6575

>

> alternative hypothesis: two.sided

>

> sample estimates:

>

> Moran I statistic Expectation Variance

>

> 1.596378e-05 -3.595829e-04 7.173448e-07

>

>

>

>

>

>

>

> moran.lm <-lm.morantest(reg.OLS, W_n, alternative="two.sided")

>

> print(moran.lm)

>

>

>

> Global Moran I for regression residuals

>

>

>

> data:

>

> model: lm(formula = y ~ z1 + z2 + z3 + z4 + z5 + z6 + z7 + z8 + z9 + z10

>

> , data = datos)

>

> weights: W_n

>

>

>

> Moran I statistic standard deviate = 11.649, p-value < 2.2e-16

>

> alternative hypothesis: two.sided

>

> sample estimates:

>

> Observed Moran I Expectation Variance

>

> 1.596378e-05 -1.913005e-03 2.741816e-08

>

>

>

>

>

> A REPRODUCIBLE EXAMPLE

>

>

>

>

>

> library(spdep)

>

> data(oldcol)

>

> oldcrime.lm <- lm(CRIME ~ HOVAL + INC + OPEN + PLUMB + DISCBD + PERIM, data

> = COL.OLD)

>

>

>

> moran.test(resid(oldcrime.lm), nb2listw(COL.nb, style="W"))

>

>

>

> Moran I test under randomisation

>

>

>

> data: resid(oldcrime.lm)

>

> weights: nb2listw(COL.nb, style = "W")

>

>

>

> Moran I statistic standard deviate = 1.2733, p-value = 0.1015

>

> alternative hypothesis: greater

>

> sample estimates:

>

> Moran I statistic Expectation Variance

>

> 0.096711162 -0.020833333 0.008521765

>

>

>

>

>

> lm.morantest(oldcrime.lm, nb2listw(COL.nb, style="W"))

>

>

>

> Global Moran I for regression residuals

>

>

>

> data:

>

> model: lm(formula = CRIME ~ HOVAL + INC + OPEN + PLUMB + DISCBD +

>

> PERIM, data = COL.OLD)

>

> weights: nb2listw(COL.nb, style = "W")

>

>

>

> Moran I statistic standard deviate = 1.6668, p-value = 0.04777

>

> alternative hypothesis: greater

>

> sample estimates:

>

> Observed Moran I Expectation Variance

>

> 0.096711162 -0.052848581 0.008050938

>

>

>

> Thanks a lot in advance and sorry for the inconvenience.

>

>

>

> Javi

>

>

>

>

>

>

>

>

> JAVIER GARCÍA

>

>

>

> Departamento de Economía Aplicada III (Econometría y Estadística)

>

> Facultad de Economía y Empresa (Sección Sarriko)

> Avda. Lehendakari Aguirre 83

>

> 48015 BILBAO

> T.: +34 601 7126 F.: +34 601 3754

> <http://www.ehu.es/> www.ehu.es

>

> http://www.unibertsitate-hedakuntza.ehu.es/p268-content/es/contenidos/inform

> acion/manual_id_corp/es_manual/images/firma_email_upv_euskampus_bilingue.gif

>

>

>

>

>

> --

Roger Bivand

Department of Economics, Norwegian School of Economics,

Helleveien 30, N-5045 Bergen, Norway.

voice: +47 55 95 93 55; e-mail: [hidden email]

Editor-in-Chief of The R Journal, https://journal.r-project.org/index.html

http://orcid.org/0000-0003-2392-6140

https://scholar.google.no/citations?user=AWeghB0AAAAJ&hl=en

_______________________________________________

R-sig-Geo mailing list

[hidden email]

https://stat.ethz.ch/mailman/listinfo/r-sig-geo

Roger Bivand

Department of Economics

Norwegian School of Economics

Helleveien 30

N-5045 Bergen, Norway


### Re: bivariate spatial correlation in R

Roger,

Population and income data are single point in time and come from the 2010

Census.

Accessibility variables in 2014 and 2017 show the proportion of jobs

accessible by public transport under 60 minutes. The variable diffaccess

shows the difference between these two. It's in percentage points

(access2017 - access2014)

best,

Rafael H M Pereira

urbandemographics.blogspot.com

On Sun, Jul 30, 2017 at 7:41 AM, Roger Bivand <[hidden email]> wrote:

> Thanks, I'll get back when able, offline now. What are the units of

> observation, and are aggregate household incomes observed only once?

>

> Roger

>

> Roger Bivand

> Norwegian School of Economics

> Bergen, Norway

>

>

>

> Fra: Rafael Pereira

> Sendt: søndag 30. juli, 00.39

> Emne: Re: [R-sig-Geo] bivariate spatial correlation in R

> Kopi: Rogério Barbosa, [hidden email]

>

>

> Hi all, here is a reproducible example to calculate in R bivariate Moran's

> I and LISA clusters. This example is based on an answer provided on SO*

> and it uses a toy model of my data. The R script and the shape file with

> the data are available on this link. https://gist.github.com/

> rafapereirabr/5348193abf779625f5e8c5090776a228 What this example does is

> to estimate the spatial association between household income per capita and

> the gains in accessibility to jobs. The aim is to analyze who benefits the

> recent changes in the transport system in terms of access to jobs. So the

> idea is not to find causal relationships, but spatial association between

> areas of high/low income who had high/low gains in accessibility. The

> variables in the data show info on the proportion of jobs accessible in

> both years 2014 and 2017 (access2014, access2017) and the difference

> between the two years in percentage points (diffaccess). Roger, I know you

> have been a bit sceptical about this application of bivariate

> Moran's I. Do you still think a spatial regression would be more

> appropriate? Also, I would be glad to hear if others have comments on the

> code. This function is not implemented in any package so it would be great

> to have some feedback. Rafael H M Pereira urbandemographics.blogspot.com

> * https://stackoverflow.com/questions/45177590/map-of-

> bivariate-spatial-correlation-in-r-bivariate-lisa On Wed, Jul 26, 2017 at

> 11:07 AM, Roger Bivand wrote: > On Wed, 26 Jul 2017, Rafael Pereira wrote:

> > > Roger, >> >> This example was provided only for the sake or making the

> code easily >> reproducible for others and I'm more interested in how the

> bi-variate >> Moran >> could be implemented in R, but your comments are

> very much welcomed and >> I've made changes to the question. >> >> My

> actual case study looks at bi-variate spatial correlation between (a) >>

> average household income per capita and (b) proportion of jobs in the city

> >> that are accessible under 60 minutes by transit. I don't think I could

> use >> rates in this case but I will normalize the variables using >>

> scale(data$variable). >> > > Please provide a reproducible example, either

> with a link to a data > subset, or using a builtin data set. My guess is

> that you do not need > bi-variate spatial correlation at all, but rather a

> spatial regression. > > The "causal" variable would then the the proportion

> of jobs accessible > within 60 minutes by transit, though this is extremely

> blunt, and lots of > other covariates (demography, etc.) impact average

> household income per > capita (per block/tract?). Since there are many

> missing variables in your > specification, any spatial correlation would be

> most closely associated > with them (demography, housing costs, education,

> etc.), and the choice of > units of measurement would dominate the outcome.

> > > This is also why bi-variate spatial correlation is seldom a good idea,

> I > believe. It can be done, but most likely shouldn't, unless it can be >

> motivated properly. > > By the way, the weighted and FDR-corrected SAD

> local Moran's I p-values of > the black/white ratio for Oregon (your toy

> example) did deliver the goods - > if you zoom in in mapview::mapview, you

> can see that it detects a rate > hotspot between the rivers. > > Roger > >

> > >> best, >> >> Rafael H M Pereira >> >> On Mon, Jul 24, 2017 at 7:56 PM,

> Roger Bivand >> wrote: >> >> On Mon, 24 Jul 2017, Rafael Pereira wrote: >>>

> >>> Hi all, >>> >>>> >>>> I would like to ask whether some you conducted

> bi-variate spatial >>>> correlation in R. >>>> >>>> I know the bi-variate

> Moran's I is not implemented in the spdep library. >>>> I left a question

> on SO but also wanted to hear if anyone if the >>>> mainlist >>>> have come

> across this. >>>> https://stackoverflow.com/questions/45177590/map-of-

> bivariat >>>> e-spatial-correlation-in-r-bivariate-lisa >>>> >>>> I also

> know Roger Bivand has implemented the L index proposed by Lee >>>> (2001)

> >>>> in spdep, but I'm not I'm not sure whether the L local correlation

> >>>> coefficients can be interpreted the same way as the local Moran's I

> >>>> coefficients. I couldn't find any reference commenting on this issue.

> I >>>> would very much appreciate your thoughts this. >>>> >>>> >>> In the

> SO question, and in the follow-up, your presumably throw-away >>> example

> makes fundamental mistakes. The code in spdep by Virgilio >>> Gómez-Rubio

> is for uni- and bivariate L, and produces point values of >>> local >>> L.

> This isn't the main problem, which is rather that you are not taking >>>

> account of the underlying population counts, nor shrinking any estimates

> >>> of >>> significance to accommodate population sizes. Population sizes

> vary from >>> 0 >>> to 11858, with the lower quartile at 3164 and upper

> 5698: >>> plot(ecdf(oregon.tract$pop2000)). Should you be comparing rates

> in >>> stead? >>> These are also compositional variables (sum to pop2000,

> or 1 if rates) >>> with >>> the other missing components. You would

> probably be better served by >>> tools >>> examining spatial segregation,

> such as for example the seg package. >>> >>> The 0 count populations cause

> problems for an unofficial alternative, the >>> black/white ratio: >>> >>>

> oregon.tract1 0,] >>> oregon.tract1$rat >> nb >> lw >> >>> which should

> still be adjusted by weighting: >>> >>> lm0 >> >>> I'm not advising this,

> but running localmoran.sad on this model output >>> yields SAD p-values <

> 0.05 after FDR correction only in contiguous tracts >>> on the Washington

> state line in Portland between the Columbia and >>> Williamette rivers. So

> do look at the variables you are using before >>> rushing into things. >>>

> >>> Hope this clarifies, >>> >>> Roger >>> >>> >>> best, >>>> >>>> Rafael

> HM Pereira >>>> http://urbandemographics.blogspot.com >>>> >>>>

> [[alternative HTML version deleted]] >>>> >>>>

> _______________________________________________ >>>> R-sig-Geo mailing

> list >>>> [hidden email] >>>> https://stat.ethz.ch/mailman/

> listinfo/r-sig-geo >>>> >>>> >>>> -- >>> Roger Bivand >>> Department of

> Economics, Norwegian School of Economics, >>> Helleveien 30, N-5045 Bergen,

> Norway. >>> voice: +47 55 95 93 55 <+47%2055%2095%2093%2055>; e-mail:

> [hidden email] >>> Editor-in-Chief of The R Journal,

> https://journal.r-project.org/ >>> index.html >>>

> http://orcid.org/0000-0003-2392-6140 >>> https://scholar.google.no/

> citations?user=AWeghB0AAAAJ&hl=en >>> >> >> [[alternative HTML version

> deleted]] >> >> _______________________________________________ >>

> R-sig-Geo mailing list >> [hidden email] >>

> https://stat.ethz.ch/mailman/listinfo/r-sig-geo >> > > -- > Roger Bivand

> > Department of Economics, Norwegian School of Economics, > Helleveien 30,

> N-5045 Bergen, Norway. > voice: +47 55 95 93 55 <+47%2055%2095%2093%2055>;

> e-mail: [hidden email] > Editor-in-Chief of The R Journal,

> https://journal.r-project.org/index.html > http://orcid.org/0000-0003-

> 2392-6140 > https://scholar.google.no/citations?user=AWeghB0AAAAJ&hl=en >

> [[alternative HTML version deleted]] _______________________________________________

> R-sig-Geo mailing list [hidden email]

> https://stat.ethz.ch/mailman/listinfo/r-sig-geo

>

>

[[alternative HTML version deleted]]

_______________________________________________

R-sig-Geo mailing list

[hidden email]

https://stat.ethz.ch/mailman/listinfo/r-sig-geo
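Since a bivariate Moran's I is not implemented in spdep, a minimal base-R sketch of one common (Wartenberg/Anselin-style) formulation may help frame the discussion. The ring weights and the income/accessibility values below are invented, and the denominator convention varies between authors:

```r
# Hedged sketch of a bivariate Moran's I, I_xy = z_x' W z_y / (n - 1),
# with z-scored variables and a row-standardised W. Toy data; this is
# one common formulation, not an spdep API.
biv_moran <- function(x, y, W) {
  zx <- scale(x)[, 1]
  zy <- scale(y)[, 1]
  sum(zx * (W %*% zy)) / (length(x) - 1)
}

# Toy row-standardised weights for 5 units on a ring
n <- 5
W <- matrix(0, n, n)
for (i in 1:n) W[i, c((i %% n) + 1, ((i - 2) %% n) + 1)] <- 0.5

income <- c(10, 12, 9, 20, 15)   # made-up values
access <- c(40, 55, 35, 70, 50)
I_xy <- biv_moran(income, access, W)
```

A handy sanity check on the convention: with a row-standardised W, `biv_moran(x, x, W)` reduces to the usual univariate Moran's I of x.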


### Re: bivariate spatial correlation in R

Thanks, I'll get back when able, offline now. What are the units of observation, and are aggregate household incomes observed only once?

Roger

Roger Bivand

Norwegian School of Economics

Bergen, Norway

Fra: Rafael Pereira

Sendt: søndag 30. juli, 00.39

Emne: Re: [R-sig-Geo] bivariate spatial correlation in R

Kopi: Rogério Barbosa, [hidden email]


From: Rafael Pereira

Sent: Sunday 30 July, 00.39

Subject: Re: [R-sig-Geo] bivariate spatial correlation in R

Cc: Rogério Barbosa, [hidden email]


### Re: bivariate spatial correlation in R

Hi all,

here is a reproducible example that calculates bivariate Moran's I and

LISA clusters in R. This example is based on this answer on SO* and

it uses a toy model of my data. The R script and the shape file with the

data are available at this link:

https://gist.github.com/rafapereirabr/5348193abf779625f5e8c5090776a228

What this example does is to estimate the spatial association between

household income per capita and the gains in accessibility to jobs. The aim

is to analyze who benefits from the recent changes in the transport system in

terms of access to jobs. So the idea is not to find causal relationships,

but spatial association between areas of high/low income that had high/low

gains in accessibility.

The variables in the data give the proportion of jobs accessible in

both years 2014 and 2017 (access2014, access2017) and the difference

between the two years in percentage points (diffaccess).

Roger, I know you have been a bit sceptical about this application

of bivariate Moran's I. Do you still think a spatial regression would be

more appropriate?

Also, I would be glad to hear if others have comments on the code. This

function is not implemented in any package so it would be great to have

some feedback.

Rafael H M Pereira

urbandemographics.blogspot.com

*

https://stackoverflow.com/questions/45177590/map-of-bivariate-spatial-correlation-in-r-bivariate-lisa
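Since the question is how a bivariate Moran's I could be implemented, here is a minimal editorial sketch of the quantity under discussion, assuming a spatial data frame `shp` with illustrative column names `income` and `diffaccess`, and a row-standardized neighbour weights object `lw` built with `nb2listw()` (all object names here are assumptions, not from the thread):

```r
library(spdep)

# Sketch: standardize both variables, then correlate one with the
# spatial lag of the other under row-standardized weights.
x <- scale(shp$income)[, 1]       # assumed column name
y <- scale(shp$diffaccess)[, 1]   # assumed column name
lag_y <- lag.listw(lw, y)         # spatial lag of y

global_I_xy <- mean(x * lag_y)    # global bivariate Moran's I
local_I_xy  <- x * lag_y          # local (LISA-style) bivariate values
```

Significance for such a statistic would normally be assessed by permutation, e.g. repeatedly shuffling `y` relative to `x` and recomputing the statistic.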

On Wed, Jul 26, 2017 at 11:07 AM, Roger Bivand <[hidden email]> wrote:

> On Wed, 26 Jul 2017, Rafael Pereira wrote:

>

> Roger,

>>

>> This example was provided only for the sake of making the code easily

>> reproducible for others and I'm more interested in how the bi-variate

>> Moran

>> could be implemented in R, but your comments are very much welcomed and

>> I've made changes to the question.

>>

>> My actual case study looks at bi-variate spatial correlation between (a)

>> average household income per capita and (b) proportion of jobs in the city

>> that are accessible under 60 minutes by transit. I don't think I could use

>> rates in this case but I will normalize the variables using

>> scale(data$variable).

>>

>

> Please provide a reproducible example, either with a link to a data

> subset, or using a builtin data set. My guess is that you do not need

> bi-variate spatial correlation at all, but rather a spatial regression.

>

> The "causal" variable would then be the proportion of jobs accessible

> within 60 minutes by transit, though this is extremely blunt, and lots of

> other covariates (demography, etc.) impact average household income per

> capita (per block/tract?). Since there are many missing variables in your

> specification, any spatial correlation would be most closely associated

> with them (demography, housing costs, education, etc.), and the choice of

> units of measurement would dominate the outcome.

>

> This is also why bi-variate spatial correlation is seldom a good idea, I

> believe. It can be done, but most likely shouldn't, unless it can be

> motivated properly.

>

> By the way, the weighted and FDR-corrected SAD local Moran's I p-values of

> the black/white ratio for Oregon (your toy example) did deliver the goods -

> if you zoom in in mapview::mapview, you can see that it detects a rate

> hotspot between the rivers.

>

> Roger

>

>

>

>> best,

>>

>> Rafael H M Pereira

>>

>> On Mon, Jul 24, 2017 at 7:56 PM, Roger Bivand <[hidden email]>

>> wrote:

>>

>> On Mon, 24 Jul 2017, Rafael Pereira wrote:

>>>

>>> Hi all,

>>>

>>>>

>>>> I would like to ask whether some of you have conducted bi-variate spatial

>>>> correlation in R.

>>>>

>>>> I know the bi-variate Moran's I is not implemented in the spdep library.

>>>> I left a question on SO but also wanted to hear if anyone on the

>>>> mailing list

>>>> has come across this.

>>>> https://stackoverflow.com/questions/45177590/map-of-bivariat

>>>> e-spatial-correlation-in-r-bivariate-lisa

>>>>

>>>> I also know Roger Bivand has implemented the L index proposed by Lee

>>>> (2001)

>>>> in spdep, but I'm not sure whether the L local correlation

>>>> coefficients can be interpreted the same way as the local Moran's I

>>>> coefficients. I couldn't find any reference commenting on this issue. I

>>>> would very much appreciate your thoughts on this.

>>>>

>>>>

>>> In the SO question, and in the follow-up, your presumably throw-away

>>> example makes fundamental mistakes. The code in spdep by Virgilio

>>> Gómez-Rubio is for uni- and bivariate L, and produces point values of

>>> local

>>> L. This isn't the main problem, which is rather that you are not taking

>>> account of the underlying population counts, nor shrinking any estimates

>>> of

>>> significance to accommodate population sizes. Population sizes vary from

>>> 0

>>> to 11858, with the lower quartile at 3164 and upper 5698:

>>> plot(ecdf(oregon.tract$pop2000)). Should you be comparing rates

>>> instead?

>>> These are also compositional variables (sum to pop2000, or 1 if rates)

>>> with

>>> the other missing components. You would probably be better served by

>>> tools

>>> examining spatial segregation, such as for example the seg package.

>>>

>>> The 0 count populations cause problems for an unofficial alternative, the

>>> black/white ratio:

>>>

>>> oregon.tract1 <- oregon.tract[oregon.tract$white > 0,]

>>> oregon.tract1$rat <- oregon.tract1$black/oregon.tract1$white

>>> nb <- poly2nb(oregon.tract1)

>>> lw <- nb2listw(nb)

>>>

>>> which should still be adjusted by weighting:

>>>

>>> lm0 <- lm(rat ~ 1, weights=pop2000, data=oregon.tract1)

>>>

>>> I'm not advising this, but running localmoran.sad on this model output

>>> yields SAD p-values < 0.05 after FDR correction only in contiguous tracts

>>> on the Washington state line in Portland between the Columbia and

>>> Willamette rivers. So do look at the variables you are using before

>>> rushing into things.
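As a rough sketch of the weighting-plus-FDR workflow described above, reusing `nb` and `oregon.tract1` from the quoted code (the exact `localmoran.sad()` arguments and the p-value column name are sketched from memory and should be checked against `?localmoran.sad`):

```r
# weighted intercept-only model, as in the quoted code
lm0 <- lm(rat ~ 1, weights = pop2000, data = oregon.tract1)

# saddlepoint approximation of local Moran's I on the model residuals;
# argument names approximate -- consult ?localmoran.sad
sad <- summary(localmoran.sad(lm0, select = seq_len(nrow(oregon.tract1)), nb = nb))

# FDR correction of the saddlepoint p-values
# (the column name may differ by spdep version)
p_fdr <- p.adjust(sad[, "Pr. (Sad)"], method = "fdr")
which(p_fdr < 0.05)
```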

>>>

>>> Hope this clarifies,

>>>

>>> Roger

>>>

>>>

>>> best,

>>>>

>>>> Rafael HM Pereira

>>>> http://urbandemographics.blogspot.com

>>>>

>>>> [[alternative HTML version deleted]]

>>>>

>>>> _______________________________________________

>>>> R-sig-Geo mailing list

>>>> [hidden email]

>>>> https://stat.ethz.ch/mailman/listinfo/r-sig-geo

>>>>

>>>>

>>>> --

>>> Roger Bivand

>>> Department of Economics, Norwegian School of Economics,

>>> Helleveien 30, N-5045 Bergen, Norway.

>>> voice: +47 55 95 93 55; e-mail: [hidden email]

>>> Editor-in-Chief of The R Journal, https://journal.r-project.org/

>>> index.html

>>> http://orcid.org/0000-0003-2392-6140

>>> https://scholar.google.no/citations?user=AWeghB0AAAAJ&hl=en

>>>

>>

>> [[alternative HTML version deleted]]

>>

>> _______________________________________________

>> R-sig-Geo mailing list

>> [hidden email]

>> https://stat.ethz.ch/mailman/listinfo/r-sig-geo

>>

>

> --

> Roger Bivand

> Department of Economics, Norwegian School of Economics,

> Helleveien 30, N-5045 Bergen, Norway.

> voice: +47 55 95 93 55; e-mail: [hidden email]

> Editor-in-Chief of The R Journal, https://journal.r-project.org/index.html

> http://orcid.org/0000-0003-2392-6140

> https://scholar.google.no/citations?user=AWeghB0AAAAJ&hl=en

>

[[alternative HTML version deleted]]

_______________________________________________

R-sig-Geo mailing list

[hidden email]

https://stat.ethz.ch/mailman/listinfo/r-sig-geo
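On the L index raised earlier in the thread: spdep exposes Lee's (2001) statistic as `lee()` and `lee.mc()`. A hedged sketch, with `x`, `y` and `lw` standing in for two numeric vectors and a `listw` object (names assumed):

```r
library(spdep)

# Sketch only: x and y are numeric vectors of equal length,
# lw a listw object from nb2listw().
res <- lee(x, y, lw, n = length(x))
res$L              # global bivariate L
head(res$localL)   # local L values -- point estimates without p-values

# permutation test for the global statistic
lee.mc(x, y, lw, nsim = 999)
```

Note that, as discussed in the thread, the local L values are point estimates, so they cannot be read off like local Moran's I p-values without a separate inference step.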


### Re: Error in basename(x) : path too long

Hi again,

A late thought - I'm still on the first cup of coffee.

It looks to me like you are iterating over a stack to select certain layers to sum. You could achieve the same outcome with possibly much less work. The following example creates a sum for each 24-layer block along a stack of rasters.

# from https://gist.github.com/btupper/20879e0b46e5ed63d402d7cff424dbb7

#' Split a vector into groups of MAX (or possibly fewer)

#'

#' @param v vector or list to split

#' @param MAX numeric the maximum size per group

#' @return a list of the vector split into groups

split_vector <- function(v, MAX = 200){

nv <- length(v)

if (nv <= MAX) return(list('1' = v))

split(v, findInterval(1:nv, seq(from = 1, to = nv, by = MAX)))

}

library(raster)

N <- 345

n <- 24

nc <- 4

nr <- 3

R <- raster(matrix(runif(nc*nr), ncol = nc, nrow = nr))

RR <- stack(lapply(seq_len(N), function(i) R))

ix <- split_vector(seq_len(N), MAX = n)

SS <- lapply(ix, function(index) sum(RR[[index]]))

So, SS[[1]], which looks like this...

$`1`

class : RasterLayer

dimensions : 3, 4, 12 (nrow, ncol, ncell)

resolution : 0.25, 0.3333333 (x, y)

extent : 0, 1, 0, 1 (xmin, xmax, ymin, ymax)

coord. ref. : NA

data source : in memory

names : layer

values : 0.9451534, 20.0503 (min, max)

... is the sum of the first 24 layers of RR. SS[[2]] will be the sum of the next 24, and so on.
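The same block-wise grouping can, as far as I can tell, also be written with `raster::stackApply`, reusing `RR`, `N` and `n` from the example above:

```r
# assign each of the N layers to a block of n = 24, then sum within blocks
idx <- rep(seq_len(ceiling(N / n)), each = n, length.out = N)
SS2 <- stackApply(RR, indices = idx, fun = sum)
```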

Is that what you are trying to do?

Cheers,

Ben

> On Jul 29, 2017, at 7:56 AM, John Wasige <[hidden email]> wrote:

>

> Dear all,

>

> I am running the script below & I get the following error:

> Error in basename(x) : path too long

>

> What could be the problem?

> Thanks for your help

> John

>

> ### Script

>

> setwd("I:/Mauritius2001_2015") # directory of data

> newlist= read.csv('I:/Mauritius2001_2015/Mauritius.csv',header=F)

>

> refr <- raster(paste("I:/Mauritius2001_2015/",newlist[i,1],sep = ""))

> refr[!is.na(refr)] <- 0

> for(i in seq(1,345,by=23)){

> rsum <- refr

> for(p in 0:22){

> r <- raster(paste("I:/Mauritius2001_2015/",newlist[i+p],sep = ""))

> rsum <- rsum + r

> }

> # rsum <- rsum

> writeRaster(rsum,

> filename=paste("D:/Mauritius2001_2015/Annual/",substr(newlist[i],1,6),".tif",sep=''),

> format="GTiff", overwrite=TRUE)

> }

>

> [[alternative HTML version deleted]]

>

> _______________________________________________

> R-sig-Geo mailing list

> [hidden email]

> https://stat.ethz.ch/mailman/listinfo/r-sig-geo

Ben Tupper

Bigelow Laboratory for Ocean Sciences

60 Bigelow Drive, P.O. Box 380

East Boothbay, Maine 04544

http://www.bigelow.org

Ecocast Reports: http://seascapemodeling.org/ecocast.html

Tick Reports: https://report.bigelow.org/tick/

Jellyfish Reports: https://jellyfish.bigelow.org/jellyfish/

_______________________________________________

R-sig-Geo mailing list

[hidden email]

https://stat.ethz.ch/mailman/listinfo/r-sig-geo

A late thought - I'm still on the first cups of coffee.

It looks to me like you are iterating over a stack to select certain layers to sum. You could achieve the same outcome with possibly much less work. The following example will create a sum of 24-layer blocks along a stack of rasters.

# from https://gist.github.com/btupper/20879e0b46e5ed63d402d7cff424dbb7

#' Split a vector into groups of MAX (or possibly fewer)

#'

#' @param v vector or list to split

#' @param MAX numeric the maximum size per group

#' @return a list of the vector split into groups

split_vector <- function(v, MAX = 200){

nv <- length(v)

if (nv <= MAX) return(list('1' = v))

split(v, findInterval(1:nv, seq(from = 1, to = nv, by = MAX)))

}

library(raster)

N <- 345

n <- 24

nc <- 4

nr <- 3

R <- raster(matrix(runif(nc*nr), ncol = nc, nrow = nr))

RR <- stack(lapply(seq_len(N), function(i) R))

ix <- split_vector(seq_len(N), MAX = n)

SS <- lapply(ix, function(index) sum(RR[[index]]))

So, S[[1]], which looks like this...

$`1`

class : RasterLayer

dimensions : 3, 4, 12 (nrow, ncol, ncell)

resolution : 0.25, 0.3333333 (x, y)

extent : 0, 1, 0, 1 (xmin, xmax, ymin, ymax)

coord. ref. : NA

data source : in memory

names : layer

values : 0.9451534, 20.0503 (min, max)

... is the sum of the first 24 layers of RR. SS[[2]] will be the sum of the next 24, and so on.

Is that what you are trying to do?
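As a quick base-R sanity check (no raster objects needed; the 345 layers and 24-layer blocks are the counts from the example above), the same grouping logic can be run on bare indices:

```r
# findInterval() assigns each of the 345 indices to the block whose
# start position it falls on or after, exactly as split_vector() does.
starts <- seq(from = 1, to = 345, by = 24)
ix <- split(seq_len(345), findInterval(seq_len(345), starts))
lengths(ix)  # fourteen blocks of 24 and a final block of 9 (14 * 24 + 9 = 345)
```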

Cheers,

Ben

> On Jul 29, 2017, at 7:56 AM, John Wasige <[hidden email]> wrote:

>

> Dear all,

>

> I am running the script below & I get the following error:

> Error in basename(x) : path too long

>

> What could be the problem?

> Thanks for your help

> John

>

> ### Script

>

> setwd("I:/Mauritius2001_2015") # directory of data

> newlist= read.csv('I:/Mauritius2001_2015/Mauritius.csv',header=F)

>

> refr <- raster(paste("I:/Mauritius2001_2015/",newlist[i,1],sep = ""))

> refr[!is.na(refr)] <- 0

> for(i in seq(1,345,by=23)){

> rsum <- refr

> for(p in 0:22){

> r <- raster(paste("I:/Mauritius2001_2015/",newlist[i+p],sep = ""))

> rsum <- rsum + r

> }

> # rsum <- rsum

> writeRaster(rsum,

> filename=paste("D:/Mauritius2001_2015/Annual/",substr(newlist[i],1,6),".tif",sep=''),

> format="GTiff", overwrite=TRUE)

> }

>

> [[alternative HTML version deleted]]

>

> _______________________________________________

> R-sig-Geo mailing list

> [hidden email]

> https://stat.ethz.ch/mailman/listinfo/r-sig-geo


### Re: Error in basename(x) : path too long

Hi,

It's not possible to know but I have a suspicion and a suggestion.

suspicion: You are accessing the elements of the data frame "newlist" in different ways ...

> refr <- raster(paste("I:/Mauritius2001_2015/",newlist[i,1],sep = ""))

and

> r <- raster(paste("I:/Mauritius2001_2015/",newlist[i+p],sep = ""))

and

> filename=paste("D:/Mauritius2001_2015/Annual/",substr(newlist[i],1,6),".tif",sep='')

The first likely works. The latter two return the (i+p)'th and i'th columns, each as a new data frame, so the filenames may indeed be quite long. So, it looks like you need to work on your data frame indexing.
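To see the difference concretely, here is a sketch with a small made-up data frame standing in for "newlist" (names and values invented for illustration):

```r
nl <- data.frame(V1 = c("a.tif", "b.tif", "c.tif"), V2 = 1:3)

nl[2, 1]  # row 2, column 1: the single string "b.tif"
good <- paste("I:/dir/", nl[2, 1], sep = "")  # "I:/dir/b.tif"

# nl[1] is the whole first COLUMN as a one-column data frame; paste()
# deparses it into one long string -- hence the very long path.
bad <- paste("I:/dir/", nl[1], sep = "")
nchar(bad) > nchar(good)  # TRUE
```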

suggestion: Instead of using paste() to make file paths I suggest that you give file.path() a try. It works across all platforms and is easier than using paste.
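For instance (filename invented for illustration; file.path() supplies the separators, so there is no sep = '' bookkeeping):

```r
p1 <- file.path("I:/Mauritius2001_2015", "Annual", "200101.tif")
p1  # "I:/Mauritius2001_2015/Annual/200101.tif"

# the equivalent paste() call, for comparison
p2 <- paste("I:/Mauritius2001_2015/", "Annual/", "200101.tif", sep = "")
identical(p1, p2)  # TRUE
```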

Cheers,

Ben

> On Jul 29, 2017, at 7:56 AM, John Wasige <[hidden email]> wrote:

>

> Dear all,

>

> I am running the script below & I get the following error:

> Error in basename(x) : path too long

>

> What could be the problem?

> Thanks for your help

> John

>

> ### Script

>

> setwd("I:/Mauritius2001_2015") # directory of data

> newlist= read.csv('I:/Mauritius2001_2015/Mauritius.csv',header=F)

>

> refr <- raster(paste("I:/Mauritius2001_2015/",newlist[i,1],sep = ""))

> refr[!is.na(refr)] <- 0

> for(i in seq(1,345,by=23)){

> rsum <- refr

> for(p in 0:22){

> r <- raster(paste("I:/Mauritius2001_2015/",newlist[i+p],sep = ""))

> rsum <- rsum + r

> }

> # rsum <- rsum

> writeRaster(rsum,

> filename=paste("D:/Mauritius2001_2015/Annual/",substr(newlist[i],1,6),".tif",sep=''),

> format="GTiff", overwrite=TRUE)

> }

>

> [[alternative HTML version deleted]]

>

> _______________________________________________

> R-sig-Geo mailing list

> [hidden email]

> https://stat.ethz.ch/mailman/listinfo/r-sig-geo

