
Re: Spatial Autocorrelation Estimation Method

Sat, 11/16/2019 - 17:31
Dear Roger,

Thank you for your message and sorry for my late answer.

Regarding the number of listings (lettings) in my data set (2,216,642 observations), each listing carries an individual id:

unique ids: 180,004
time periods: 54 (2015-01 to 2019-09)
ids appearing only once: 28,486 of 180,004 (15.8%)
ids appearing 2-10 times: 82,641 of 180,004 (45.9%)
ids appearing 11-30 times: 46,465 of 180,004 (25.8%)
ids appearing 31-54 times: 22,412 of 180,004 (12.5%)

Importantly, hosts can change the room category (entire home/apt, private room, shared room) while keeping the same listing id. In my data, 7,204 unique ids changed room_type at some point.

--

For the OLS model I used a pooled specification with time fixed effects, where each of the 54 time periods (date_compiled) enters as a time dummy:

plm::plm(formula = model, data = listings, model = "pooling", index = c("id", "date_compiled"))
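As an aside, the same 54 time fixed effects can be obtained by letting plm absorb the periods instead of writing the dummies into the formula; a minimal sketch, assuming listings holds log_price, guests_included and minimum_nights (the variables used further down) plus id and date_compiled:

library(plm)

# Time fixed effects ("within" transformation over date_compiled); the slope
# estimates should match pooled OLS with 54 explicit time dummies
fe_time <- plm(log_price ~ guests_included + minimum_nights,
               data = listings, model = "within", effect = "time",
               index = c("id", "date_compiled"))
summary(fe_time)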


--
Osland et al. (2016) (https://doi.org/10.1111/jors.12281) use a spatial fixed effects (SFE) hedonic model, where each defined neighborhood zone in the study area is represented by dummy variables.

Dong et al. (2015) (https://doi.org/10.1111/gean.12049) outline four model specifications to accommodate geographically hierarchical data structures: (1) groupwise W and fixed regional effects; (2) groupwise W and random regional effects; (3) proximity-based W and fixed regional effects; and (4) proximity-based W and random regional effects.
--

I created a new column/variable containing the borough where the zipcode is found (Manhattan, Brooklyn, Queens, Bronx, Staten Island).

If I understood it right, the (two-level) Hierarchical Spatial Simultaneous Autoregressive Model (HSAR) allows for spatial dependence both at the (lower) individual level (geographical coordinates - in my case, the listing locations) and at the (higher) group level (territorial units - in my case, zipcodes).

According to Bivand et al. (2017): "(...) W is a spatial weights matrix. The HSAR model may also be estimated without this component." So in this case I estimate the Hierarchical Spatial Simultaneous Autoregressive Model (HSAR) on a "one-level" basis only, i.e. at the higher level.

HSAR::hsar(model, data = listings, W = NULL, M = M, Delta = Delta, burnin = 5000, Nsim = 10000, thinning = 1, parameters.start = pars)

(Where the "model" formula contains the 54 time dummy variables)

Do you think I can proceed with this model? I was able to calculate it.

If I remove all observations/rows with NAs in any of the chosen variables, 884,183 observations remain. If I were to create a W matrix for HSAR::hsar, I would end up with a gigantic 884,183 by 884,183 matrix. This is why I set W = NULL.
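Should a lower-level W be needed later, it would not have to be stored densely; a sketch (assuming listings is an sf point object with one row per observation, and that k = 10 is just a placeholder) of a sparse k-nearest-neighbour weights matrix:

library(sf)
library(spdep)

# k nearest neighbours on the listing coordinates; the result is kept as a
# sparse matrix, so even ~884,000 rows do not require an n x n dense matrix
coords  <- st_coordinates(listings)              # lon/lat of each listing
knn_nb  <- knn2nb(knearneigh(coords, k = 10, longlat = TRUE))
W_lower <- as(nb2listw(knn_nb, style = "W"), "CsparseMatrix")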


Thank you and best regards

________________________________________
From: Roger Bivand <[hidden email]>
Sent: Monday, November 11, 2019 11:31
To: Robert R
Cc: [hidden email]
Subject: Re: [R-sig-Geo] Spatial Autocorrelation Estimation Method

On Sun, 10 Nov 2019, Robert R wrote:

> Dear Roger,
>
> Again, thank you for your answer. I read the material provided and
> decided that Hierarchical Spatial Autoregressive (HSAR) could be the
> right model for me.
>
> I indeed have the precise latitude and longitude information for all my
> listings for NYC.
>
> I created a stratified sample (group = zipcode) with 22172 (1%) of my
> observations called listings_sample and tried to replicate the hsar
> model, please see below.
>
> For now W = NULL, because otherwise I would have a 22172 x 22172 matrix.
Unless you know definitely that you want to relate the response to its
lagged value, you do not need this. Do note that the matrix is very
sparse, so could be fitted without difficulty with ML in a cross-sectional
model.

>
> You recommended then to introduce a Markov random field (MRF) random
> effect (RE) at the zipcode level, but I did not understand it so well.
> Could you elaborate a little more?
>

Did you read the development in
https://doi.org/10.1016/j.spasta.2017.01.002? It is explained there, and
includes code for fitting the Beijing housing parcels data set from HSAR
with many other packages (MCMC, INLA, hglm, etc.). I guess that you should
try to create a model that works on a single borough, using the zipcodes
in that borough as a proxy for unobserved neighbourhood effects. Try for
example using lme4::lmer() with only a zipcode IID random effect, see if
the hedonic estimates are similar to lm(), and leave adding an MRF RE
(with for example mgcv::gam() or hglm::hglm()) until you have a working
testbed. Then advance step-by-step from there.
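A minimal sketch of that step-by-step route, assuming a single-borough subset listings_bk with zipcode as a factor and nb_bk a neighbour list named by those zipcodes:

library(lme4)
library(mgcv)

# 1) IID zipcode random effect only; compare the hedonic estimates with lm()
m_iid <- lmer(log_price ~ guests_included + minimum_nights + (1 | zipcode),
              data = listings_bk)

# 2) Add an MRF random effect at the zipcode level once (1) behaves sensibly;
#    a copy of the factor lets mgcv keep the MRF and IID terms distinct
listings_bk$zipcode_iid <- listings_bk$zipcode
m_mrf <- gam(log_price ~ guests_included + minimum_nights +
               s(zipcode, bs = "mrf", xt = list(nb = nb_bk)) +
               s(zipcode_iid, bs = "re"),
             data = listings_bk, method = "REML")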

You still have not said how many repeat lettings you see - it will affect
the way you specify your model.

Roger

> ##############
> library(spdep)
> library(HSAR)
> library(dplyr)
> library(splitstackshape)
>
>
> # Stratified sample per zipcode (size = 1%) listings_sample <-
> splitstackshape::stratified(indt = listings, group = "zipcode", size =
> 0.01)
>
> # Removing zipcodes from polygon_nyc which are not observable in
> listings_sample polygon_nyc_listings <- polygon_nyc %>% filter(zipcode
> %in% c(unique(as.character(listings_sample$zipcode))))
>
>
> ## Random effect matrix (N by J)
>
> # N: 22172
> # J: 154
>
> # Arrange listings_sample by zipcode (ascending)
> listings_sample <- listings_sample %>% arrange(zipcode)
>
> # Count number of listings per zipcode
> MM <- listings_sample %>% st_drop_geometry() %>% group_by(zipcode) %>% summarise(count = n()) %>% as.data.frame()
> # sum(MM$count)
>
> # N by J nulled matrix creation
> Delta <- matrix(data = 0, nrow = nrow(listings_sample), ncol = dim(MM)[1])
>
> # Neighbourhood (group) index for each observation
> Uid <- rep(c(1:dim(MM)[1]), MM[,2])
>
> for(i in 1:dim(MM)[1]) {
>  Delta[Uid==i,i] <- 1
> }
> rm(i)
>
> Delta <- as(Delta,"dgCMatrix")
>
>
> ## Higher-level spatial weights matrix or neighbourhood matrix (J by J)
>
> # Neighboring polygons: list of neighbors for each polygon (queen contiguity neighbors)
> polygon_nyc_nb <- poly2nb(polygon_nyc_listings, row.names = polygon_nyc$zipcode, queen = TRUE)
>
> # Include neighbour itself as a neighbour
> polygon_nyc_nb <- include.self(polygon_nyc_nb)
>
> # Spatial weights matrix for nb
> polygon_nyc_nb_matrix <- nb2mat(neighbours = polygon_nyc_nb, style = "W", zero.policy = NULL)
> M <- as(polygon_nyc_nb_matrix,"dgCMatrix")
>
>
> ## Fit HSAR SAR upper level random effect
> model <- as.formula(log_price ~ guests_included + minimum_nights)
>
> betas = coef(lm(formula = model, data = listings_sample))
> pars = list(rho = 0.5, lambda = 0.5, sigma2e = 2.0, sigma2u = 2.0, betas = betas)
>
> m_hsar <- hsar(model, data = listings_sample, W = NULL, M = M, Delta = Delta, burnin = 5000, Nsim = 10000, thinning = 1, parameters.start = pars)
>
> ##############
>
> Thank you and best regards
> Robert
>
> ________________________________________
> From: Roger Bivand <[hidden email]>
> Sent: Friday, November 8, 2019 13:29
> To: Robert R
> Cc: [hidden email]
> Subject: Re: [R-sig-Geo] Spatial Autocorrelation Estimation Method
>
> On Fri, 8 Nov 2019, Robert R wrote:
>
>> Dear Roger,
>>
>> Thank you for your answer.
>>
>> I successfully used the function nb2blocknb() for a smaller dataset.
>>
>> But for a dataset of over 2 million observations, I get the following
>> error: "Error: cannot allocate vector of size 840 Kb".
>
> I don't think the observations are helpful. If you have repeat lets in the
> same property in a given month, you need to handle that anyway. I'd go for
> making the modelling exercise work (we agree that this is not panel data,
> right?) on a small subset first. I would further argue that you need a
> multi-level approach rather than spdep::nb2blocknb(), with a zipcode IID
> RE. You could very well take (stratified) samples per zipcode to represent
> your data. Once that works, introduce an MRF RE at the zipcode level,
> where you do know relative position. Using SARAR is going to be a waste of
> time unless you can geocode the letting addresses. A multi-level approach
> will work. Having big data in your case with no useful location
> information per observation is just adding noise and over-smoothing, I'm
> afraid. The approach used in https://doi.org/10.1016/j.spasta.2017.01.002
> will work, also when you sample the within zipcode lets, given a split
> into training and test sets, and making CV possible.
>
> Roger
>
>>
>> I am expecting that at least 500,000 observations will be dropped due to
>> the lack of values for the chosen variables in the regression model, so
>> I will probably filter and remove the observations/rows that will not be
>> used anyway - do you know if there is any package that does this
>> automatically, given the variables/columns chosen by me?
>>
>> Or would you recommend me another approach to avoid the above mentioned
>> error?
>>
>> Thank you and best regards,
>> Robert
>>
>> ________________________________________
>> From: Roger Bivand <[hidden email]>
>> Sent: Thursday, November 7, 2019 10:13
>> To: Robert R
>> Cc: [hidden email]
>> Subject: Re: [R-sig-Geo] Spatial Autocorrelation Estimation Method
>>
>> On Thu, 7 Nov 2019, Robert R wrote:
>>
>>> Dear Roger,
>>>
>>> Many thanks for your help.
>>>
>>> I have an additional question:
>>>
>>> Is it possible to create a "separate" lw (nb2listw) (with different
>>> rownumbers) from my data set? For now, I am taking my data set and
>>> merging with the sf object polygon_nyc with the function
>>> "merge(polygon_nyc, listings, by=c("zipcode" = "zipcode"))", so I create
>>> a huge n x n matrix (depending of the size of my data set).
>>>
>>> Taking the polygon_nyc alone and turning it to a lw (weights list)
>>> object has only n = 177.
>>>
>>> Of course running
>>>
>>> spatialreg::lagsarlm(formula=model, data = listings_sample,
>>> spatialreg::polygon_nyc_lw, tol.solve=1.0e-10)
>>>
>>> does not work ("Input data and weights have different dimensions").
>>>
>>> The only option is to take my data set, merge it to my polygon_nyc (by
>>> zipcode) and then create the weights list lw? Or there another option?
>>
>> I think we are getting more clarity. You do not know the location of the
>> lettings beyond their zipcode. You do know the boundaries of the zipcode
>> areas, and can create a neighbour object from these boundaries. You then
>> want to treat all the lettings in a zipcode area i as neighbours, and
>> additionally lettings in zipcode areas neighbouring i as neighbours of
>> lettings in i. This is the data structure that motivated the
>> spdep::nb2blocknb() function:
>>
>> https://r-spatial.github.io/spdep/reference/nb2blocknb.html
>>
>> Try running the examples to get a feel for what is going on.
>>
>> I feel that most of the variability will vanish in the very large numbers
>> of neighbours, over-smoothing the outcomes. If you do not have locations
>> for the lettings themselves, I don't think you can make much progress.
>>
>> You could try a linear mixed model (or gam with a spatially structured
>> random effect) with a temporal and a spatial random effect. See the HSAR
>> package, articles by Dong et al., and maybe
>> https://doi.org/10.1016/j.spasta.2017.01.002 for another survey. Neither
>> this nor Dong et al. handle spatio-temporal settings. MRF spatial random
>> effects at the zipcode level might be a way forward, together with an IID
>> random effect at the same level (equivalent to self-neighbours).
>>
>> Hope this helps,
>>
>> Roger
>>
>>>
>>> Best regards,
>>> Robert
>>>
>>> ________________________________________
>>> From: Roger Bivand <[hidden email]>
>>> Sent: Wednesday, November 6, 2019 15:07
>>> To: Robert R
>>> Cc: [hidden email]
>>> Subject: Re: [R-sig-Geo] Spatial Autocorrelation Estimation Method
>>>
>>> On Tue, 5 Nov 2019, Robert R wrote:
>>>
>>>> Dear Roger,
>>>>
>>>> Thank you for your reply. I disabled HTML; my e-mails should be now in
>>>> plain text.
>>>>
>>>> I will give a better context for my desired outcome.
>>>>
>>>> I am taking Airbnb's listings information for New York City available
>>>> on: http://insideairbnb.com/get-the-data.html
>>>>
>>>> I save every listings.csv.gz file available for NYC (2015-01 to 2019-09)
>>>> - in total, 54 files/time periods - as a YYYY-MM-DD.csv file into a
>>>> Listings/ folder. When importing all these 54 files into one single data
>>>> set, I create a new "date_compiled" variable/column.
>>>>
>>>> In total, after the data cleansing process, I have a little more 2
>>>> million observations.
>>>
>>> You have repeat lettings for some, but not all properties. So this is at
>>> best a very unbalanced panel. For those properties with repeats, you may
>>> see temporal movement (trend/seasonal).
>>>
>>> I suggest (strongly) taking a single borough or even zipcode with some
>>> hundreds of properties, and working from there. Do not include the
>>> observation as its own neighbour, perhaps identify repeats and handle them
>>> specially (create or use a property ID). Unbalanced panels may also create
>>> a selection bias issue (why are some properties only listed sometimes?).
>>>
>>> So this although promising isn't simple, and getting to a hedonic model
>>> may be hard, but not (just) because of spatial autocorrelation. I wouldn't
>>> necessarily trust OLS output either, partly because of the repeat property
>>> issue.
>>>
>>> Roger
>>>
>>>>
>>>> I created 54 timedummy variables for each time period available.
>>>>
>>>> I want to estimate using a hedonic spatial timedummy model the impact of
>>>> a variety of characteristics which potentially determine the daily rate
>>>> on Airbnb listings through time in New York City (e.g. characteristics
>>>> of the listing as number of bedrooms, whether the host is professional,
>>>> proximity to downtown (New York City Hall) and nearest subway station
>>>> from the listing, income per capita, etc.).
>>>>
>>>> My dependent variable is price (log price, common in the related
>>>> literature for hedonic prices).
>>>>
>>>> The OLS model is done.
>>>>
>>>> For the spatial model, I am assuming that hosts, when deciding the
>>>> pricing of their listings, take not only into account its structural and
>>>> location characteristics, but also the prices charged by near listings
>>>> with similar characteristics - spatial autocorrelation is then present,
>>>> at least spatial dependence is present in the dependent variable.
>>>>
>>>> As I wrote in my previous post, I was willing to consider the neighbor
>>>> itself as a neighbor.
>>>>
>>>> Parts of my code can be found below:
>>>>
>>>> ########
>>>>
>>>> ## packages
>>>>
>>>> packages_install <- function(packages){
>>>> new.packages <- packages[!(packages %in% installed.packages()[, "Package"])]
>>>> if (length(new.packages))
>>>> install.packages(new.packages, dependencies = TRUE)
>>>> sapply(packages, require, character.only = TRUE)
>>>> }
>>>>
>>>> packages_required <- c("bookdown", "cowplot", "data.table", "dplyr", "e1071", "fastDummies", "ggplot2", "ggrepel", "janitor", "kableExtra", "knitr", "lubridate", "nngeo", "plm", "RColorBrewer", "readxl", "scales", "sf", "spdep", "stargazer", "tidyverse")
>>>> packages_install(packages_required)
>>>>
>>>> # Working directory
>>>> setwd("C:/Users/User/R")
>>>>
>>>>
>>>>
>>>> ## shapefile_us
>>>>
>>>> # Shapefile zips import and Coordinate Reference System (CRS) transformation
>>>> # Shapefile download: https://www2.census.gov/geo/tiger/GENZ2018/shp/cb_2018_us_zcta510_500k.zip
>>>> shapefile_us <- sf::st_read(dsn = "Shapefile", layer = "cb_2018_us_zcta510_500k")
>>>>
>>>> # Columns removal
>>>> shapefile_us <- shapefile_us %>% select(-c(AFFGEOID10, GEOID10, ALAND10, AWATER10))
>>>>
>>>> # Column rename: ZCTA5CE10
>>>> setnames(shapefile_us, old=c("ZCTA5CE10"), new=c("zipcode"))
>>>>
>>>> # Column class change: zipcode
>>>> shapefile_us$zipcode <- as.character(shapefile_us$zipcode)
>>>>
>>>>
>>>>
>>>> ## polygon_nyc
>>>>
>>>> # Zip code not available in shapefile: 11695
>>>> polygon_nyc <- shapefile_us %>% filter(zipcode %in% zips_nyc)
>>>>
>>>>
>>>>
>>>> ## weight_matrix
>>>>
>>>> # Neighboring polygons: list of neighbors for each polygon (queen contiguity neighbors)
>>>> polygon_nyc_nb <- poly2nb((polygon_nyc %>% select(-borough)), queen=TRUE)
>>>>
>>>> # Include neighbour itself as a neighbour
>>>> # for(i in 1:length(polygon_nyc_nb)){polygon_nyc_nb[[i]]=as.integer(c(i,polygon_nyc_nb[[i]]))}
>>>> polygon_nyc_nb <- include.self(polygon_nyc_nb)
>>>>
>>>> # Weights to each neighboring polygon
>>>> lw <- nb2listw(neighbours = polygon_nyc_nb, style="W", zero.policy=TRUE)
>>>>
>>>>
>>>>
>>>> ## listings
>>>>
>>>> # Data import
>>>> files <- list.files(path="Listings/", pattern=".csv", full.names=TRUE)
>>>> listings <- setNames(lapply(files, function(x) read.csv(x, stringsAsFactors = FALSE, encoding="UTF-8")), files)
>>>> listings <- mapply(cbind, listings, date_compiled = names(listings))
>>>> listings <- listings %>% bind_rows
>>>>
>>>> # Characters removal
>>>> listings$date_compiled <- gsub("Listings/", "", listings$date_compiled)
>>>> listings$date_compiled <- gsub(".csv", "", listings$date_compiled)
>>>> listings$price <- gsub("\\$", "", listings$price)
>>>> listings$price <- gsub(",", "", listings$price)
>>>>
>>>>
>>>>
>>>> ## timedummy
>>>>
>>>> timedummy <- sapply("date_compiled_", paste, unique(listings$date_compiled), sep="")
>>>> timedummy <- paste(timedummy, sep = "", collapse = " + ")
>>>> timedummy <- gsub("-", "_", timedummy)
>>>>
>>>>
>>>>
>>>> ## OLS regression
>>>>
>>>> # Pooled cross-section data - Randomly sampled cross sections of Airbnb listings price at different points in time
>>>> regression <- plm(formula=as.formula(paste("log_price ~ #some variables", timedummy, sep = "", collapse = " + ")), data=listings, model="pooling", index="id")
>>>>
>>>> ########
>>>>
>>>> Some of my id's repeat in multiple time periods.
>>>>
>>>> I use NYC's zip codes to left join my data with the neighborhood zip code specific characteristics, such as income per capita to that specific zip code, etc.
>>>>
>>>> Now I want to apply the hedonic model with the timedummy variables.
>>>>
>>>> Do you know how to proceed? 1) Which package to use (spdep/splm)?; 2) Do I have to join the polygon_nyc (by zip code) to my listings data set, and then calculate the weight matrix "lw"?
>>>>
>>>> Again, thank you very much for the help provided until now.
>>>>
>>>> Best regards,
>>>> Robert
>>>>
>>>> ________________________________________
>>>> From: Roger Bivand <[hidden email]>
>>>> Sent: Tuesday, November 5, 2019 15:30
>>>> To: Robert R
>>>> Cc: [hidden email]
>>>> Subject: Re: [R-sig-Geo] Spatial Autocorrelation Estimation Method
>>>>
>>>> On Tue, 5 Nov 2019, Robert R wrote:
>>>>
>>>>> I have a large pooled cross-section data set. I would like to
>>>>> estimate/regress using spatial autocorrelation methods. I am assuming
>>>>> for now that spatial dependence is present in both the dependent
>>>>> variable and the error term. My data set is over a period of 4 years,
>>>>> monthly data (54 periods). To this end, I've created a time dummy
>>>>> variable for each time period. I also created a weight matrix using the
>>>>> functions "poly2nb" and "nb2listw". Now I am trying to figure out a way
>>>>> to estimate my model, which contains a really big data set. Basically, my
>>>>> model is as follows: y = delta*D + rho*W1*y + X*beta + lambda*W2*u + epsilon.
>>>>> My questions are: 1) My spatial weight matrix for the whole data set
>>>>> will probably be an enormous matrix with submatrices for each time
>>>>> period itself. I don't think it would be possible to calculate this.
>>>>> What I would like to know is a way to estimate each time dummy/period
>>>>> separately (to compare different periods alone). How to do it?
>>>>> 2) Which package to use: spdep or splm? Thank you and best regards, Robert
>>>>
>>>> Please do not post HTML, only plain text. Almost certainly your model
>>>> specification is wrong (SARAR/SAC is always a bad idea if alternatives are
>>>> untried). What is your cross-sectional size? Using sparse kronecker
>>>> products, the "enormous" matrix may not be very big. Does it make any
>>>> sense using time dummies (54 x N x T will be mostly zero anyway)? Are most
>>>> of the covariates time-varying? Please provide motivation and use area
>>>> (preferably with affiliation; your email and user name are not
>>>> informative) - this feels like a real estate problem, probably wrongly
>>>> specified. You should use splm if time makes sense in your case, but if it
>>>> really doesn't, simplify your approach, as much of the data will be
>>>> subject to very large temporal autocorrelation.
>>>>
>>>> If this is a continuation of your previous question about using
>>>> self-neighbours, be aware that you should not use self-neighbours in
>>>> modelling, they are only useful for the Getis-Ord local G_i^* measure.
>>>>
>>>> Roger
>>>>
>>>>>
>>>>>
>>>>> _______________________________________________
>>>>> R-sig-Geo mailing list
>>>>> [hidden email]
>>>>> https://stat.ethz.ch/mailman/listinfo/r-sig-geo
>>>>
>>>> --
>>>> Roger Bivand
>>>> Department of Economics, Norwegian School of Economics,
>>>> Helleveien 30, N-5045 Bergen, Norway.
>>>> voice: +47 55 95 93 55; e-mail: [hidden email]
>>>> https://orcid.org/0000-0003-2392-6140
>>>> https://scholar.google.no/citations?user=AWeghB0AAAAJ&hl=en
>>>>
>>>
>>> --
>>> Roger Bivand
>>> Department of Economics, Norwegian School of Economics,
>>> Helleveien 30, N-5045 Bergen, Norway.
>>> voice: +47 55 95 93 55; e-mail: [hidden email]
>>> https://orcid.org/0000-0003-2392-6140
>>> https://scholar.google.no/citations?user=AWeghB0AAAAJ&hl=en
>>>
>>
>> --
>> Roger Bivand
>> Department of Economics, Norwegian School of Economics,
>> Helleveien 30, N-5045 Bergen, Norway.
>> voice: +47 55 95 93 55; e-mail: [hidden email]
>> https://orcid.org/0000-0003-2392-6140
>> https://scholar.google.no/citations?user=AWeghB0AAAAJ&hl=en
>>
>
> --
> Roger Bivand
> Department of Economics, Norwegian School of Economics,
> Helleveien 30, N-5045 Bergen, Norway.
> voice: +47 55 95 93 55; e-mail: [hidden email]
> https://orcid.org/0000-0003-2392-6140
> https://scholar.google.no/citations?user=AWeghB0AAAAJ&hl=en
>
--
Roger Bivand
Department of Economics, Norwegian School of Economics,
Helleveien 30, N-5045 Bergen, Norway.
voice: +47 55 95 93 55; e-mail: [hidden email]
https://orcid.org/0000-0003-2392-6140
https://scholar.google.no/citations?user=AWeghB0AAAAJ&hl=en

_______________________________________________
R-sig-Geo mailing list
[hidden email]
https://stat.ethz.ch/mailman/listinfo/r-sig-geo

Re: sp/rgdal workflows with PROJ >= 6 and GDAL >= 3

Fri, 11/15/2019 - 14:59
The development version of rgdal on R-Forge is now at rev 894, and is now
ready for trying out with PROJ6/GDAL3 workflows, and workflows that may
migrate within 6 months to modern CRS representations. The motivating RFC
is also updated to cover coordinate operations, the use of prepared
(pre-searched) coordinate operations, and should be read carefully by
anyone using rgdal::spTransform(). Note further that rgdal::project() will
not be adapted for PROJ6, and is effectively deprecated.
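For anyone wanting to see where their own setup stands, the linked versions can be inspected from R; a small sketch (rgdal_extSoftVersion() reports the GDAL/PROJ versions rgdal was built against, and with the development sp/rgdal described here the WKT representation travels as a comment on CRS objects):

library(sp)
library(rgdal)

# Which GDAL and PROJ is this rgdal built against?
rgdal_extSoftVersion()

# With the development sp/rgdal, the WKT2 string is attached as a comment to
# the CRS object (NULL with released versions)
crs <- CRS("+proj=longlat +datum=WGS84")
cat(comment(crs))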

I'll be running reverse dependency checks, and may be bugging package
maintainers. I would really prefer that maintainers of packages using
spTransform() checked themselves and joined this thread or the associated
twitter thread: https://twitter.com/RogerBivand/status/1194586193108914177

Be ready for modern PROJ and GDAL, they are already being deployed across
open source geospatial software, like GRASS, QGIS, pyproj, spatialite etc.

Waiting, hopefully not in vain, for contributions.

Roger

On Wed, 13 Nov 2019, Roger Bivand wrote:

> And this link explains the CDN proposal for grid distribution:
>
> https://www.spatialys.com/en/crowdfunding/
>
> Roger
>
> On Wed, 13 Nov 2019, Roger Bivand wrote:
>
>>  Because PROJ >= 6 and GDAL >= 3 change the way that PROJ strings
>>  (representations of coordinate reference systems) are handled, steps are
>>  being taken to find ways to adapt sp/rgdal workflows. A current proposal
>>  is to store the WKT2_2018 string as a comment to CRS objects as defined in
>>  the sp package.
>>
>>  A draft development-in-progress version of rgdal is available at
>>  https://r-forge.r-project.org/R/?group_id=884, and for sp at
>>  https://github.com/rsbivand/sp (this version of sp requires rgdal >=
>>  1.5-1). This adds the WKT comments to CRS objects on reading vector and
>>  raster data sources, and uses WKT comments if found when writing vector
>>  and raster objects (or at least does as far as I've checked, possibly
>>  fragile).
>>
>>  An RFC with tersely worked cases for using CRS object comments to carry
>>  WKT strings but maintaining full backward compatibility is online at
>>  http://rgdal.r-forge.r-project.org/articles/PROJ6_GDAL3.html.
>>
>>  If you have other ideas or concerns about trying to use this mechanism for
>>  sp CRS objects, please contribute at your earliest convenience.
>>
>>  http://rgdal.r-forge.r-project.org/reference/list_coordOps.html shows the
>>  beginning of the next step, to query transformation operations to find
>>  viable coordinate operation pipelines.
>>
>>  I'm assuming that the previous behaviour (transform without considering
>>  accuracy with whatever is to hand) is not viable going forward, and that
>>  we will need two steps: list coordinate operations between source and
>>  target CRS (using the WKT comments as better specifications than the PROJ
>>  strings), possibly intervene manually to install missing grids, then
>>  undertake the coordinate operation.
>>
>>  The fallback may be simply to choose the least inaccurate available
>>  coordinate operation, but this should be a fallback. This means that all
>>  uses of spTransform() will require intervention.
>>
>>  Is this OK (it is tiresome but modernises workflows once), or is it not OK
>>  (no user intervention is crucial)?
>>
>>  These behaviours may be set in an option, so that package maintainers and
>>  users may delay modernisation, but all are undoubtedly served by rapid
>>  adaptation (GRASS 7.8.1 released yesterday, libspatialite, pyproj, QGIS
>>  development versions all state that they list candidate coordinate
>>  operations).
>>
>>  We cannot ship all the grids, they are very bulky, and probably nobody
>>  needs sub-metre accuracy world-wide. Work in PROJ is starting to create a
>>  content delivery network for trusted download and mechanisms for
>>  registering downloaded grids on user platforms. We would for example not
>>  want Windows users of rgdal and sf to have to download the same grid
>>  twice.
>>
>>  Comments welcome here and at
>>  https://github.com/r-spatial/discuss/issues/28 or
>>  https://github.com/r-spatial/sf/issues/1187
>>
>>  Roger
>>
>>
>
>
--
Roger Bivand
Department of Economics, Norwegian School of Economics,
Helleveien 30, N-5045 Bergen, Norway.
voice: +47 55 95 93 55; e-mail: [hidden email]
https://orcid.org/0000-0003-2392-6140
https://scholar.google.no/citations?user=AWeghB0AAAAJ&hl=en

_______________________________________________
R-sig-Geo mailing list
[hidden email]
https://stat.ethz.ch/mailman/listinfo/r-sig-geo

Calculating weighted values of precipitation annually

Fri, 11/15/2019 - 10:13
Hi there,

I am currently trying to calculate "weighted" spatial annual global values for precipitation using the object "RCP1pctCO2Mean", which is a raster brick. I need to do this for each of the 138 years (i.e. 138 layers) of global precipitation data that I have. The idea is to weight each grid cell value in each year by the cosine of its latitude, so that grid cells at the equator get a weight of 1 (the cosine of 0 degrees is 1) and cells towards the poles get weights approaching 0 (the cosine of 90 degrees is 0). I would then make a time series of these values, from Year 1 to Year 138, after the weighted grid cell values are averaged for each year (giving 138 weighted averages). The object "RCP1pctCO2Mean" looks like this:
class       : RasterBrick
dimensions  : 64, 128, 8192, 140  (nrow, ncol, ncell, nlayers)
resolution  : 2.8125, 2.789327  (x, y)
extent      : -181.4062, 178.5938, -89.25846, 89.25846  (xmin, xmax, ymin, ymax)
coord. ref. : +proj=longlat +datum=WGS84 +ellps=WGS84 +towgs84=0,0,0
data source : C:/Users/Travis/Documents/Other documents/All netCDF files/netcdffiles/MaxPrecIPSLIPSL-CM5B-LR1pctCO2.nc
names       : X1, X2, X3, X4, X5, X6, X7, X8, X9, X10, X11, X12, X13, X14, X15, ...
z-value     : 1, 140 (min, max)
varname     : onedaymax

Here is what I have done so far:

newraster <- rasterToPoints(RCP1pctCO2Mean) # Not sure if this is necessary?

I then proceeded to assign weights like this:

weight <- cos(newraster*(pi/180))

However, this yields strange but nearly identical values (all between 0.97 and 0.99) across each grid cell for each layer. I am not sure what I am doing incorrectly (if anything) - could it be that the "pi/180" is not necessary? Also, once this is done, how do I revert to a raster stack with the new values?

I also saw another function, called "getWeights", but I am not sure how relevant it is or which package it comes from. I was thinking about using it like this:

weight <- getWeights(newraster, f = function(newraster) cos(newraster))

Would this approach apply the weights appropriately?

I would greatly appreciate any help with this! Thanks,
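For what it's worth, the cosine-latitude weighting can be done without leaving raster objects; a sketch, assuming the raster package and that RCP1pctCO2Mean is in the unprojected longlat CRS shown above:

library(raster)

# Latitude of every cell centre, then cosine weights (1 at the equator, ~0 at the poles)
lat <- raster(RCP1pctCO2Mean)
lat[] <- yFromCell(lat, 1:ncell(lat))
w <- cos(lat * pi / 180)

# One weighted global mean per layer/year (a vector for the time series);
# mask the brick to land first if only land cells should enter the average
annual_weighted <- cellStats(RCP1pctCO2Mean * w, sum) / cellStats(w, sum)
plot(annual_weighted, type = "l", xlab = "Year", ylab = "Weighted mean precipitation")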

_______________________________________________
R-sig-Geo mailing list
[hidden email]
https://stat.ethz.ch/mailman/listinfo/r-sig-geo

Averaging values across grid cells for each layer of a raster brick object

Thu, 11/14/2019 - 20:31
Hi there,
I am trying to average precipitation values across grid cells of a raster (which is masked to only account for land areas). This raster brick object, called "overlap.sub" has 138 layers (years). It was created as follows:

World_land <- readOGR("ne_110m_land.shp")
newprojection1 <- spTransform(World_land, CRS("+proj=longlat +datum=WGS84 +ellps=WGS84 +towgs84=0,0,0"))
overlap <- crop(RCP1pctCO2Median, extent(newprojection1))
overlap.sub <- mask(overlap, newprojection1) #This isolates grid cells on all land areas
The object "overlap.sub" has the following attributes:
class       : RasterBrick
dimensions  : 62, 127, 7874, 138  (nrow, ncol, ncell, nlayers)
resolution  : 2.8125, 2.789327  (x, y)
extent      : -178.5938, 178.5938, -89.25846, 83.67981  (xmin, xmax, ymin, ymax)
coord. ref. : +proj=longlat +datum=WGS84 +ellps=WGS84 +towgs84=0,0,0
data source : C:/Users/Travis/AppData/Local/Temp/Rtmp0GALJn/raster/r_tmp_2019-11-14_201408_9732_60407.grd
names       :    layer.1,    layer.2,    layer.3,    layer.4,    layer.5,    layer.6,    layer.7,    layer.8,    layer.9,   layer.10,   layer.11,   layer.12,   layer.13,   layer.14,   layer.15, ...
min values  : 0.42964514, 0.43375653, 0.51749371, 0.50838983, 0.45366730, 0.53099146, 0.49757186, 0.45697752, 0.41382199, 0.46082401, 0.45516687, 0.51857087, 0.41005131, 0.45956529, 0.47497867, ...
max values  :   87.85833,   86.73710,   88.92577,   76.44161,   82.43909,   85.03188,   77.36673,   75.86527,   94.32226,  101.58247,   86.67574,   90.96802,   99.59785,   76.78394,   88.31423, ...
Now, to obtain the annual averages across the masked grid cells, so that I may plot a time series of the average precipitation across grid cells for Year 1, then for Year 2, then Year 3....all the way to Year 138 (essentially 138 averages):
Averageprec <- colMeans(overlap.sub)
However, this yields this error:
Error in colMeans(overlap.sub) :
  'x' must be an array of at least two dimensions
I do believe that colMeans is the appropriate function for this, but why am I receiving this error? Unless there is another way to do this? Also, this would only average those grid cells that I masked (i.e. only the land areas), correct? Thanks, and any help/feedback would be greatly appreciated!!
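As an aside, colMeans() wants a matrix or data frame; for a Raster* object the per-layer mean over non-NA cells can come from cellStats(), or from colMeans() applied to values(). A sketch with the objects above:

library(raster)

# One mean per layer/year, ignoring the NA (ocean) cells left by mask()
annual_land_mean <- cellStats(overlap.sub, stat = "mean", na.rm = TRUE)

# Equivalent, at the cost of pulling every cell value into memory:
# annual_land_mean <- colMeans(values(overlap.sub), na.rm = TRUE)

plot(annual_land_mean, type = "l", xlab = "Year", ylab = "Mean precipitation over land")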

_______________________________________________
R-sig-Geo mailing list
[hidden email]
https://stat.ethz.ch/mailman/listinfo/r-sig-geo

Re: Isolating only land areas on a global map for computing averages

Thu, 11/14/2019 - 12:41
Hi Tom and others,

I was able to use rasterToPolygons(RCP1pctCO2Mean) to convert my raster into polygons, which is now an object called "trans". However, does this command retain the original 138 layers/years of data, but just now in polygon form? 
Also, I am not too clear as to how to generate id values for polygons as row and column indices from this new object. I was looking online for a procedure, but nothing too specific shows how to do this directly. Would it be possible to provide an example in this context? 
Finally, I am unsure of what was meant by "get land polygon SpatialPolygons object "landPoly" with polygons of land only, not ocean." Do you mean the SpatialPolygonsDataframe that was created from rasterToPolygons?
Thank you, and I appreciate your help!
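On the id question, one option (a sketch with the objects from this thread; the new column names are only illustrative) is to recover each polygon's originating raster cell from its centre coordinates; the 138 layers are carried along as 138 attribute columns of "trans":

library(raster)

# Each polygon from rasterToPolygons() is a grid cell; its label point is the
# cell centre, so the cell number and row/column indices can be looked up
cells      <- cellFromXY(RCP1pctCO2Mean, coordinates(trans))
trans$cell <- cells
trans$row  <- rowFromCell(RCP1pctCO2Mean, cells)
trans$col  <- colFromCell(RCP1pctCO2Mean, cells)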
-----Original Message-----
From: Tom Philippi <[hidden email]>
To: rain1290 <[hidden email]>; tephilippi <[hidden email]>
Cc: r-sig-geo <[hidden email]>
Sent: Fri, Nov 8, 2019 11:04 pm
Subject: Re: [R-sig-Geo] Isolating only land areas on a global map for computing averages

Rain--Yes, it is possible to do it with your extant raster stack. In fact, pretty much all reasonable approaches will do that. Anything you do will create a raster layer with values at every one of your 8192 raster cells: some values will be 0 or NA.

The trivial answer, if you only had a small range of latitude and small raster cell sizes relative to your polygon, would be the raster::rasterize() function to generate a raster mask.

But, with a global raster from +90 to -90, any interpretable answer averaging over area needs to take into account the different area of a grid cell at 70 deg latitude vs 0 deg. See: https://gis.stackexchange.com/questions/29734/how-to-calculate-area-of-1-x-1-degree-cells-in-a-raster
At that point, you should go ahead and account for fractions of grid cells over land so your averaging would be over land area.

I'm reticent to give you a complete code solution because without a name, you may well be a student with this as an assignment. But, my approach would be:

- create a SpatialPolygonsDataFrame object "gridPoly" from any of your raster layers via raster::rasterToPolygons()
- generate an id value for each of those polygons as row & column indices from the raster, or as the cell number
- get a land polygon SpatialPolygons object "landPoly" with polygons of land only, not ocean
- create a new SpatialPolygons object via rgeos::gIntersection(gridPoly, landPoly, byid = c(TRUE, FALSE))
- calculate the area of each polygon in that object via geosphere::areaPolygon() {rgeos::gArea() only works for projected CRS}
- create your mask/weights raster layer with either all NA or all 0
- either: parse the id values to row & column values, and use the triplets of row, column, and area to replace the corresponding NA or 0 values in that mask
- or: if you used cell numbers, just use the cell numbers and area values in replacement in that mask
- create a second weight raster via raster::area() on one of your raster layers
- raster-multiply your polygon-area and your raster::area values to give the actual weights to use

This still is an approximation, but likely +/- 1-2%.

If this is still complete gibberish to you, either I need more coffee or you need to consult a good reference on working with spatial data in general.
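A shorter route to the same weights outlined above (fraction of each grid cell that is land, times its physical cell area) can stay within raster; a sketch, with landPoly standing in for whichever land-only SpatialPolygons layer is chosen:

library(raster)

# Fraction of each cell covered by the land polygons (getCover has reported
# cover as 0-1 or 0-100 depending on the raster version; the scale cancels
# out in the weighted mean below)
cover <- rasterize(landPoly, RCP1pctCO2Mean[[1]], getCover = TRUE)

# Cell areas in km^2; area() shrinks towards the poles, as noted above
wts <- cover * area(RCP1pctCO2Mean[[1]])

# Land-area-weighted mean per layer/year
annual_land <- cellStats(RCP1pctCO2Mean * wts, sum) / cellStats(wts, sum)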
On Thu, Nov 7, 2019 at 8:25 AM <[hidden email]> wrote:

Hi Tom and others,

Thank you for your response and suggestions! 
Yes, I loaded and used the maptools package previously to create continents on my world map, among other things. I do think that the easiest approach would be to create a raster layer for land, and then water, with the values that I have. However, my precipitation values are globally distributed - every grid cell has a precipitation value for each year (i.e. each grid cell has 138 values/layers/years). So, if I were to create a raster of only land areas, how would I have my grid cells coincide with the land areas only on that raster?
Also, would it be possible to accomplish this with the raster stack that I already have? If so, is there a way to separate all land/water areas this way using the maptools package?
Thanks, again, and I really appreciate your help!
-----Original Message-----
From: Tom Philippi <[hidden email]>
To: rain1290 <[hidden email]>
Cc: r-sig-geo <[hidden email]>
Sent: Thu, Nov 7, 2019 12:44 am
Subject: Re: [R-sig-Geo] Isolating only land areas on a global map for computing averages

The easiest approach would be to create a separate aligned raster layer for land vs water.  There are plenty of coastline polygons available out there (e.g., maptools, rworldmap, rnaturalearth packages): choose one in your raster CRS (or choose one and spTransform() it).  Then, use a grid version of your raster to extract values from that land/water SpatialPolygons object.
1: Your idea of extracting the land/water value at each grid cell centroid gives one estimate. Instead of TRUE/FALSE, think of the numeric equivalents 1, 0, then use those as weights for averaging across your grid cells.
2: A "better" estimate would be to compute the fraction of each grid cell that is land, then use those fractional [0, 1] values as weights to compute a weighted average of precipitation over land. At 2.8 deg grid cells, a lot of heavy-rainfall coastal areas would have the grid cell centroid offshore and be omitted by approach #1.
3: I recommend that you think hard about averaging across cells in Lat Lon to estimate average precipitation over land. The actual area of a ~2.8 by 2.8 deg grid cell at the equator is much larger than the same at 70 deg N. I would spend the extra hour computing the actual area (in km^2) of land in each of your 8192 grid cells, then using those in a raster as weights for whatever calculations you do on the raster stack. [Or you can cheat, as the area of a grid cell in degrees is a function of only the latitudes, and your required weights are multiplicative.]
Your mileage may vary...
Tom
On Wed, Nov 6, 2019 at 6:18 PM rain1290--- via R-sig-Geo <[hidden email]> wrote:

Hi there,
I am interested in calculating precipitation medians globally. However, I only want to isolate land areas to compute the median. I already first created a raster stack, called "RCP1pctCO2median", which contains the median values. There are 138 layers, with each layer representing one year.  This raster stack has the following attributes:
class       : RasterStack
dimensions  : 64, 128, 8192, 138  (nrow, ncol, ncell, nlayers)
resolution  : 2.8125, 2.789327  (x, y)
extent      : -181.4062, 178.5938, -89.25846, 89.25846  (xmin, xmax, ymin, ymax)
coord. ref. : +proj=longlat +datum=WGS84 +ellps=WGS84 +towgs84=0,0,0
names       :    layer.1,    layer.2,    layer.3,    layer.4,    layer.5,    layer.6,    layer.7,    layer.8,    layer.9,   layer.10,   layer.11,   layer.12,   layer.13,   layer.14,   layer.15, ...
min values  : 0.42964514, 0.43375653, 0.51749371, 0.50838983, 0.45366730, 0.53099146, 0.49757186, 0.45697752, 0.41382199, 0.46082401, 0.45516687, 0.51857087, 0.41005131, 0.45956529, 0.47497867, ...
max values  :   96.30350,  104.08584,   88.92751,   97.49373,   89.57201,   90.58570,   86.67651,   88.33519,   96.94720,  101.58247,   96.07792,   93.21948,   99.59785,   94.26218,   90.62138, ...  

Previously, I was isolating a specific region by specifying a range of longitudes and latitudes to obtain the medians for that region, like this:
expansion1<-expand.grid(103:120, 3:15)
lonlataaa<-extract(RCP1pctCO2Median, expansion1)
Columnaaa<-colMeans(lonlataaa)

However, with this approach, too much water can mix with land areas, and if I narrow the latitude/longitude range on land, I might miss too much land to compute the median meaningfully.
Therefore, with this data, would it be possible to use an IF/ELSE statement to tell R that if the "center point" of each grid cell happens to fall on land, then it would be considered as land (i.e. that would be TRUE - if not, then FALSE)? Even if a grid cell happens to have water mixed with land, but the center point of the grid is on land, that would be considered land. But can R even tell what is land or water in this case?
Thank you, and I would greatly appreciate any assistance!


_______________________________________________
R-sig-Geo mailing list
[hidden email]
https://stat.ethz.ch/mailman/listinfo/r-sig-geo




_______________________________________________
R-sig-Geo mailing list
[hidden email]
https://stat.ethz.ch/mailman/listinfo/r-sig-geo

Computing the means of gridded data over land only using Raster Stack

Wed, 11/13/2019 - 14:33
Greetings,
I am interested in calculating precipitation averages globally. However, I only want to isolate land and/or oceanic areas to compute the mean of those separately. What I would like to do is somehow isolate those grid cells whose centers overlap with either land or ocean and then compute the annual mean. I already first created a raster stack, called "RCP1pctCO2Mean", which contains the mean values of interest. There are 138 layers, with each layer representing one year.  This raster stack has the following attributes:
    class       : RasterStack
    dimensions  : 64, 128, 8192, 138  (nrow, ncol, ncell, nlayers)
    resolution  : 2.8125, 2.789327  (x, y)
    extent      : -181.4062, 178.5938, -89.25846, 89.25846  (xmin, xmax, ymin, 
    ymax)
    coord. ref. : +proj=longlat +datum=WGS84 +ellps=WGS84 +towgs84=0,0,0
    names       :    layer.1,    layer.2,    layer.3,    layer.4,    layer.5,      
    layer.6,    layer.7,    layer.8,    layer.9,   layer.10,   layer.11,  
    layer.12,   layer.13,   layer.14,   layer.15, ...
    min values  : 0.42964514, 0.43375653, 0.51749371, 0.50838983, 0.45366730,   
    0.53099146, 0.49757186, 0.45697752, 0.41382199, 0.46082401, 0.45516687,
    0.51857087, 0.41005131, 0.45956529, 0.47497867, ...
    max values  :   96.30350,  104.08584,   88.92751,   97.49373,   89.57201,  
    90.58570,   86.67651,   88.33519,   96.94720,  101.58247,   96.07792,  
    93.21948,   99.59785,   94.26218,   90.62138, ...  
Previously, I tried isolating a specific region by specifying a range of longitudes and latitudes to obtain the means and medians for that region, just like this:
    >expansion1<-expand.grid(103:120, 3:15) #This is a range of longitudes and then latitudes
    >lonlataaa<-extract(RCP1pctCO2Mean, expansion1)
    >Columnaaa<-colMeans(lonlataaa)
    #Packages loaded
    library(raster)
    library(maps)
    library(maptools)
    library(rasterVis)

However, with this approach, too much water can mix with land areas, and if I narrow the latitude/longitude range on land, I might miss too much land to compute the mean meaningfully.
Therefore, with this RasterStack, would it be possible to create a condition that tells R that if the "center point" or centroid of each grid cell (with each grid cell center representing a specific latitude/longitude coordinate) happens to fall on land, then it would be considered as land (i.e. that would be TRUE - if not, then FALSE, or maybe somehow use 0s or 1s)? Even if a grid cell happens to have water mixed with land, but the center point/centroid of the grid is on land, that would be considered as land. I would like to do this for specific countries, too.
I want the individual 138 layers/years to be retained, so that all the Year 1s can be averaged across all relevant grid cells, then all Year 2s, then all Year 3s, then all Year 4s, etc. (to create a time series later). I'm not sure if this is the correct way to do this, but what I did first was take the "RCP1pctCO2Mean" RasterStack and created a SpatialPolygonsDataframe using:
    >trans <- raster::rasterToPolygons(RCP1pctCO2Mean)
    >trans

class       : SpatialPolygonsDataFrame
features    : 8192
extent      : -181.4062, 178.5938, -89.25846, 89.25846  (xmin, xmax, ymin, ymax)
coord. ref. : +proj=longlat +datum=WGS84 +ellps=WGS84 +towgs84=0,0,0
variables   : 138
names       :           layer.1,           layer.2,           layer.3,           layer.4,           layer.5,           layer.6,          layer.7,           layer.8,           layer.9,          layer.10,         layer.11,          layer.12,          layer.13,          layer.14,          layer.15, ...
min values  : 0.429645141288708, 0.433756527757047, 0.517493712584313, 0.508389826053265, 0.453667300300907, 0.530991463885754,  0.4975718597839, 0.456977516231847, 0.413821991744321, 0.460824014230889, 0.45516687008843, 0.518570869929649, 0.410051312472821, 0.459565291388527, 0.474978673081429, ...
max values  :  96.3034965348338,  104.085840813594,  88.9275127089197,  97.4937326695693,  89.5720053000712,  90.5857030396531, 86.6765123781949,  88.3351859796546,   96.947199473011,  101.582468961459, 96.0779212204739,  93.2194802269814,  99.5978503841538,  94.2621847475029,  90.6213755054263, ...

Could generating an id value for each of those land (or water) polygons, so that only grid cells whose centres (i.e. latitude/longitude coordinates) fall on land are counted, be a logical next step? Is it possible to isolate an all-land polygon, specify which cells are considered land, and then compute averages for each year across those grid cells? If so, there are a lot of cells to assign weights to individually, so I am not sure whether there is a way to do this quickly.
Thank you, and I would greatly appreciate any assistance! I look forward to your response!
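One way to express the centre-point-on-land rule directly (a sketch; wrld_simpl from maptools is just one candidate land layer, and the object names are illustrative):

library(raster)
library(maptools)

data(wrld_simpl)

# rasterize() assigns a cell to a polygon when the cell centre falls inside it,
# which is exactly the centre-point-on-land rule; ocean cells come back NA
land_mask <- rasterize(wrld_simpl, RCP1pctCO2Mean[[1]], field = 1)
# For a single country, subset first, e.g. wrld_simpl[wrld_simpl$NAME == "Brazil", ]

# Keep only land cells in every layer, then one mean per layer/year
land_only    <- mask(RCP1pctCO2Mean, land_mask)
annual_means <- cellStats(land_only, stat = "mean", na.rm = TRUE)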

_______________________________________________
R-sig-Geo mailing list
[hidden email]
https://stat.ethz.ch/mailman/listinfo/r-sig-geo

Re: sp/rgdal workflows with PROJ >= 6 and GDAL >= 3

Wed, 11/13/2019 - 13:22
And this link explains the CDN proposal for grid distribution:

https://www.spatialys.com/en/crowdfunding/

Roger

On Wed, 13 Nov 2019, Roger Bivand wrote:

> Because PROJ >= 6 and GDAL >= 3 change the way that PROJ strings
> (representations of coordinate reference systems) are handled, steps are
> being taken to find ways to adapt sp/rgdal workflows. A current proposal is
> to store the WKT2_2018 string as a comment to CRS objects as defined in the
> sp package.
>
> A draft development-in-progress version of rgdal is available at
> https://r-forge.r-project.org/R/?group_id=884, and for sp at
> https://github.com/rsbivand/sp (this version of sp requires rgdal >= 1.5-1).
> This adds the WKT comments to CRS objects on reading vector and raster data
> sources, and uses WKT comments if found when writing vector and raster
> objects (or at least does as far as I've checked, possibly fragile).
>
> An RFC with tersely worked cases for using CRS object comments to carry WKT
> strings but maintaining full backward compatibility is online at
> http://rgdal.r-forge.r-project.org/articles/PROJ6_GDAL3.html.
>
> If you have other ideas or concerns about trying to use this mechanism for sp
> CRS objects, please contribute at your earliest convenience.
>
> http://rgdal.r-forge.r-project.org/reference/list_coordOps.html shows the
> beginning of the next step, to query transformation operations to find viable
> coordinate operation pipelines.
>
> I'm assuming that the previous behaviour (transform without considering
> accuracy with whatever is to hand) is not viable going forward, and that we
> will need two steps: list coordinate operations between source and target CRS
> (using the WKT comments as better specifications than the PROJ strings),
> possibly intervene manually to install missing grids, then undertake the
> coordinate operation.
>
> The fallback may be simply to choose the least inaccurate available
> coordinate operation, but this should be a fallback. This means that all uses
> of spTransform() will require intervention.
>
> Is this OK (it is tiresome but modernises workflows once), or is it not OK
> (no user intervention is crucial)?
>
> These behaviours may be set in an option, so that package maintainers and
> users may delay modernisation, but all are undoubtedly served by rapid
> adaptation (GRASS 7.8.1 released yesterday, libspatialite, pyproj, QGIS
> development versions all state that they list candidate coordinate
> operations).
>
> We cannot ship all the grids, they are very bulky, and probably nobody needs
> sub-metre accuracy world-wide. Work in PROJ is starting to create a content
> delivery network for trusted download and mechanisms for registering
> downloaded grids on user platforms. We would for example not want Windows
> users of rgdal and sf to have to download the same grid twice.
>
> Comments welcome here and at https://github.com/r-spatial/discuss/issues/28 
> or https://github.com/r-spatial/sf/issues/1187
>
> Roger
>
>
--
Roger Bivand
Department of Economics, Norwegian School of Economics,
Helleveien 30, N-5045 Bergen, Norway.
voice: +47 55 95 93 55; e-mail: [hidden email]
https://orcid.org/0000-0003-2392-6140
https://scholar.google.no/citations?user=AWeghB0AAAAJ&hl=en

_______________________________________________
R-sig-Geo mailing list
[hidden email]
https://stat.ethz.ch/mailman/listinfo/r-sig-geo

sp/rgdal workflows with PROJ >= 6 and GDAL >= 3

Wed, 11/13/2019 - 05:45
Because PROJ >= 6 and GDAL >= 3 change the way that PROJ strings
(representations of coordinate reference systems) are handled, steps are
being taken to find ways to adapt sp/rgdal workflows. A current proposal
is to store the WKT2_2018 string as a comment to CRS objects as defined in
the sp package.

A draft development-in-progress version of rgdal is available at
https://r-forge.r-project.org/R/?group_id=884, and for sp at
https://github.com/rsbivand/sp (this version of sp requires rgdal >=
1.5-1). This adds the WKT comments to CRS objects on reading vector and
raster data sources, and uses WKT comments if found when writing vector
and raster objects (or at least does as far as I've checked, possibly
fragile).

An RFC with tersely worked cases for using CRS object comments to carry
WKT strings but maintaining full backward compatibility is online at
http://rgdal.r-forge.r-project.org/articles/PROJ6_GDAL3.html.

If you have other ideas or concerns about trying to use this mechanism for
sp CRS objects, please contribute at your earliest convenience.

http://rgdal.r-forge.r-project.org/reference/list_coordOps.html shows the
beginning of the next step, to query transformation operations to find
viable coordinate operation pipelines.

I'm assuming that the previous behaviour (transform without considering
accuracy with whatever is to hand) is not viable going forward, and that
we will need two steps: list coordinate operations between source and
target CRS (using the WKT comments as better specifications than the PROJ
strings), possibly intervene manually to install missing grids, then
undertake the coordinate operation.

The fallback may be simply to choose the least inaccurate available
coordinate operation, but this should be a fallback. This means that all
uses of spTransform() will require intervention.

Is this OK (it is tiresome but modernises workflows once), or is it not OK
(no user intervention is crucial)?

These behaviours may be set in an option, so that package maintainers and
users may delay modernisation, but all are undoubtedly served by rapid
adaptation (GRASS 7.8.1 released yesterday, libspatialite, pyproj, QGIS
development versions all state that they list candidate coordinate
operations).

We cannot ship all the grids, they are very bulky, and probably nobody
needs sub-metre accuracy world-wide. Work in PROJ is starting to create a
content delivery network for trusted download and mechanisms for
registering downloaded grids on user platforms. We would for example not
want Windows users of rgdal and sf to have to download the same grid
twice.

Comments welcome here and at
https://github.com/r-spatial/discuss/issues/28 or
https://github.com/r-spatial/sf/issues/1187

Roger

--
Roger Bivand
Department of Economics, Norwegian School of Economics,
Helleveien 30, N-5045 Bergen, Norway.
voice: +47 55 95 93 55; e-mail: [hidden email]
https://orcid.org/0000-0003-2392-6140
https://scholar.google.no/citations?user=AWeghB0AAAAJ&hl=en

_______________________________________________
R-sig-Geo mailing list
[hidden email]
https://stat.ethz.ch/mailman/listinfo/r-sig-geo

Re: Error in impacts() after lagsarlm (spdep package)

Wed, 11/13/2019 - 03:54
On Wed, 13 Nov 2019, Denys Dukhovnov via R-sig-Geo wrote:

> I posted this question on StackOverflow a while back but received no answer. 

I maintain the spatialreg package to which I think you are referring. I
never visit SO, both to save time and because the signal/noise ratio there
is not encouraging (with exceptions where reproducible examples are used).

While your post was plain text (thanks), what you copied from SO got rather
mangled. The spdep package does warn you that the functions you are using
are deprecated and are actually to be called from spatialreg.

>
> I have 9,150 polygons in my data set. I was trying to run a spatial
> autoregressive model (SAR) in spdep to test spatial dependence of my
> outcome variable. After running the model, I wanted to examine the
> direct/indirect impacts, but encountered an error that seems to have
> something to do with the length of neighbors in the weights matrix not
> being equal to n. I tried running the very same equation as SLX model
> (Spatial Lag X), and impacts() worked fine, even though there were some
> polygons in my set that had no neighbors.
>
>> # Defining queen contiguity neighbors for polyset and storing the
>> matrix as list
>> q.nbrs <- poly2nb(polyset)
>> listweights <- nb2listw(q.nbrs, zero.policy = TRUE)
>
>> # Defining the model
>> model.equation <- TIME ~ A + B + C
>
>> # Run SAR model
>> reg <- lagsarlm(model.equation, data = polyset, listw = listweights,
>> zero.policy = TRUE)

You are completely unnecessarily using the "eigen" method. If your weights
are symmetric, use Cholesky decomposition ("Matrix"), much faster, same
output.

>
>> # Run impacts() to show direct/indirect impacts
>> impacts(reg, listw = listweights, zero.policy = TRUE)
>
>> Error in intImpacts(rho = rho, beta = beta, P = P, n = n, mu = mu, Sigma = Sigma,  :
>     length(listweights$neighbours) == n is not TRUE
>

Never, ever, do this. Did you read LeSage & Pace? Use traces, not the
weights themselves. With the listw object, you need to invert an nxn
matrix once in this case, 1+R times if you run Monte Carlo simulations.
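A sketch of the traces route (object names taken from the question; trW(), lagsarlm() and impacts() are nowadays called from spatialreg):

library(spdep)
library(spatialreg)

# Sparse weights once, then traces of powers of W for the impact measures
W  <- as(listweights, "CsparseMatrix")
tr <- trW(W, type = "mult")

# Cholesky-based fit, as suggested above, instead of the default "eigen"
reg <- lagsarlm(model.equation, data = polyset, listw = listweights,
                zero.policy = TRUE, method = "Matrix")

# Impacts computed from the traces, with Monte Carlo inference
impacts(reg, tr = tr, R = 1000)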

If you provide a reproducible example using built-in data, I can try to
provide a more informative error message.

> What am I doing wrong? I am running Windows 10 machine with R 3.5.3 with
> the most up-to-date spdep package, if it helps.
>

R is at 3.6.1.

Hope this clarifies,

Roger

> Thank you very much.
>
> Regards,
> Denys Dukhovnov
>
> _______________________________________________
> R-sig-Geo mailing list
> [hidden email]
> https://stat.ethz.ch/mailman/listinfo/r-sig-geo
> --
Roger Bivand
Department of Economics, Norwegian School of Economics,
Helleveien 30, N-5045 Bergen, Norway.
voice: +47 55 95 93 55; e-mail: [hidden email]
https://orcid.org/0000-0003-2392-6140
https://scholar.google.no/citations?user=AWeghB0AAAAJ&hl=en
_______________________________________________
R-sig-Geo mailing list
[hidden email]
https://stat.ethz.ch/mailman/listinfo/r-sig-geo

Error in impacts() after lagsarlm (spdep package)

Wed, 11/13/2019 - 03:42
I posted this question on StackOverflow a while back but received no answer. 

I have 9,150 polygons in my data set. I was trying to run a spatial autoregressive model (SAR) in spdep to test spatial dependence of my outcome variable. After running the model, I wanted to examine the direct/indirect impacts, but encountered an error that seems to have something to do with the length of neighbors in the weights matrix not being equal to n. I tried running the very same equation as SLX model (Spatial Lag X), and impacts() worked fine, even though there were some polygons in my set that had no neighbors. 

> # Defining queen contiguity neighbors for polyset and storing the matrix as list
> q.nbrs <- poly2nb(polyset)
> listweights <- nb2listw(q.nbrs, zero.policy = TRUE)

> # Defining the model
> model.equation <- TIME ~ A + B + C

> # Run SAR model
> reg <- lagsarlm(model.equation, data = polyset, listw = listweights, zero.policy = TRUE)

> # Run impacts() to show direct/indirect impacts
> impacts(reg, listw = listweights, zero.policy = TRUE)

> Error in intImpacts(rho = rho, beta = beta, P = P, n = n, mu = mu, Sigma = Sigma,  :
    length(listweights$neighbours) == n is not TRUE

What am I doing wrong? I am running Windows 10 machine with R 3.5.3 with the most up-to-date spdep package, if it helps.

Thank you very much.

Regards,
Denys Dukhovnov

_______________________________________________
R-sig-Geo mailing list
[hidden email]
https://stat.ethz.ch/mailman/listinfo/r-sig-geo

Re: A surface of GWR predicted values

Wed, 11/13/2019 - 03:15
Dear Binbin,

Yes, it helped a lot. Thanks for the clarification.

Best,
Fred.

Sent from Mail for Windows 10

From: [hidden email]
Sent: Tuesday, November 12, 2019 4:19 PM
To: [hidden email]; fred ramos
Cc: [hidden email]
Subject: Re: Re: [R-sig-Geo] A surface of GWR predicted values

Dear Fred,
It seems that you were using the gwr.predict from the GWmodel package.

Note that the condition for outputting predictions at specific locations is that observations of the corresponding explanatory variables are available there.
The predictions are output when you use the following routine:
gwr_test <- gwr.predict(job_density_log~distances+pop_density_log,data=zonas_OD_P_sp,kernel="gaussian",bw=21720)
because you were predicting at the observation locations.
When you specified a point grid, NAs were returned. I assumed that you didn't have any observed explanatory variables at those points, and that's why. Note that GWR is not an interpolation technique.
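
As a quick check along these lines (a sketch; object and variable names taken from the calls quoted further down in this thread), you can confirm that the prediction grid carries the model's explanatory variables before calling gwr.predict:

# the point grid passed as predictdata must contain the explanatory variables
all(c("distances", "pop_density_log") %in% names(grade_g_DF))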

Hope it helps.
Cheers,
Binbin



> -----原始邮件-----
> 发件人: "Roger Bivand" <[hidden email]>
> 发送时间: 2019-11-12 20:59:17 (星期二)
> 收件人: "Fred Ramos" <[hidden email]>
> 抄送: "[hidden email]" <[hidden email]>
> 主题: Re: [R-sig-Geo] A surface of GWR predicted values

> Please do not post HTML, only plain text.

> You do need to say which package gwr.predict() comes from. Better, provide 
> a reproducible example with a built-in data set showing your problem.

> Roger

> On Tue, 12 Nov 2019, Fred Ramos wrote:

> > Dear all,
> >
> > I`m trying to build a surface with predicted values using gwr.predict.
> >
> > When I run the gwr.predict without giving the fitting point it runs without problems. But when I enter with SpatialPointsDataFrame (a point grid) as fitting points the prediction values returns NA. Is there anything that I could do to get these results in the fitting points?
> >
> >
> >> gwr_test <- gwr.predict(job_density_log~distances+pop_density_log,data=zonas_OD_P_sp,kernel="gaussian",bw=21720)
> >
> >> gwr_test$SDF
> > class       : SpatialPointsDataFrame
> > features    : 423
> > extent      : 292782, 414055.3, 7349266, 7424404  (xmin, xmax, ymin, ymax)
> > crs         : +proj=utm +zone=23 +south +ellps=intl +units=m +no_defs
> > variables   : 5
> > names       :     Intercept_coef,        distances_coef, pop_density_log_coef,        prediction,    prediction_var
> > min values  : -0.958964210431336, -0.000128871984509238,    0.287006700717343, -3.26862056578432,  0.48618073025782
> > max values  :   4.17712723602899,  -1.8499577687439e-06,    0.979959974621219,  5.64311734285255, 0.693583031013022
> >
> >> gwr_out_grid_test <- gwr.predict(job_density_log~distances+pop_density_log,data=zonas_OD_P_sp,predictdata=grade_g_DF,kernel="gaussian",bw=21720)
> >
> >
> >> gwr_out_grid_test$SDF
> > class       : SpatialPointsDataFrame
> > features    : 9075
> > extent      : 293282, 413282, 7349904, 7423904  (xmin, xmax, ymin, ymax)
> > crs         : +proj=utm +zone=23 +south +ellps=intl +units=m +no_defs
> > variables   : 5
> > names       :    Intercept_coef,        distances_coef, pop_density_log_coef, prediction, prediction_var
> > min values  : -1.18701861474003, -0.000127242449368378,    0.310710138723114,         NA,             NA
> > max values  :  4.04479455484814,  2.02812418949068e-06,    0.995539141570355,         NA,             NA
> >
> > Many thanks,
> > Fred.
> >
> >
> >
> >
> > Sent from Mail for Windows 10
> >
> >
> >  [[alternative HTML version deleted]]
> >
> > _______________________________________________
> > R-sig-Geo mailing list
> > [hidden email]
> > https://stat.ethz.ch/mailman/listinfo/r-sig-geo
> >

> -- 
> Roger Bivand
> Department of Economics, Norwegian School of Economics,
> Helleveien 30, N-5045 Bergen, Norway.
> voice: +47 55 95 93 55; e-mail: [hidden email]
https://orcid.org/0000-0003-2392-6140
https://scholar.google.no/citations?user=AWeghB0AAAAJ&hl=en

> _______________________________________________
> R-sig-Geo mailing list
[hidden email]
https://stat.ethz.ch/mailman/listinfo/r-sig-geo


        [[alternative HTML version deleted]]

_______________________________________________
R-sig-Geo mailing list
[hidden email]
https://stat.ethz.ch/mailman/listinfo/r-sig-geo

Re: A surface of GWR predicted values

Tue, 11/12/2019 - 09:18
Dear Fred,

It seems that you were using the gwr.predict from the GWmodel package.




Note that the condition for outputting predictions at specific locations is that observations of the corresponding explanatory variables are available there.

The predictions are output when you use the following routine:

gwr_test <- gwr.predict(job_density_log~distances+pop_density_log,data=zonas_OD_P_sp,kernel="gaussian",bw=21720)

because you were predicting at the observation locations.

When you specified a point grid, NAs were returned. I assumed that you didn't have any observed explanatory variables at those points, and that's why. Note that GWR is not an interpolation technique.




Hope it helps.

Cheers,

Binbin








> -----原始邮件-----
> 发件人: "Roger Bivand" <[hidden email]>
> 发送时间: 2019-11-12 20:59:17 (星期二)
> 收件人: "Fred Ramos" <[hidden email]>
> 抄送: "[hidden email]" <[hidden email]>
> 主题: Re: [R-sig-Geo] A surface of GWR predicted values
>
> Please do not post HTML, only plain text.
>
> You do need to say which package gwr.predict() comes from. Better, provide
> a reproducible example with a built-in data set showing your problem.
>
> Roger
>
> On Tue, 12 Nov 2019, Fred Ramos wrote:
>
> > Dear all,
> >
> > I`m trying to build a surface with predicted values using gwr.predict.
> >
> > When I run the gwr.predict without giving the fitting point it runs without problems. But when I enter with SpatialPointsDataFrame (a point grid) as fitting points the prediction values returns NA. Is there anything that I could do to get these results in the fitting points?
> >
> >
> >> gwr_test <- gwr.predict(job_density_log~distances+pop_density_log,data=zonas_OD_P_sp,kernel="gaussian",bw=21720)
> >
> >> gwr_test$SDF
> > class       : SpatialPointsDataFrame
> > features    : 423
> > extent      : 292782, 414055.3, 7349266, 7424404  (xmin, xmax, ymin, ymax)
> > crs         : +proj=utm +zone=23 +south +ellps=intl +units=m +no_defs
> > variables   : 5
> > names       :     Intercept_coef,        distances_coef, pop_density_log_coef,        prediction,    prediction_var
> > min values  : -0.958964210431336, -0.000128871984509238,    0.287006700717343, -3.26862056578432,  0.48618073025782
> > max values  :   4.17712723602899,  -1.8499577687439e-06,    0.979959974621219,  5.64311734285255, 0.693583031013022
> >
> >> gwr_out_grid_test <- gwr.predict(job_density_log~distances+pop_density_log,data=zonas_OD_P_sp,predictdata=grade_g_DF,kernel="gaussian",bw=21720)
> >
> >
> >> gwr_out_grid_test$SDF
> > class       : SpatialPointsDataFrame
> > features    : 9075
> > extent      : 293282, 413282, 7349904, 7423904  (xmin, xmax, ymin, ymax)
> > crs         : +proj=utm +zone=23 +south +ellps=intl +units=m +no_defs
> > variables   : 5
> > names       :    Intercept_coef,        distances_coef, pop_density_log_coef, prediction, prediction_var
> > min values  : -1.18701861474003, -0.000127242449368378,    0.310710138723114,         NA,             NA
> > max values  :  4.04479455484814,  2.02812418949068e-06,    0.995539141570355,         NA,             NA
> >
> > Many thanks,
> > Fred.
> >
> >
> >
> >
> > Sent from Mail for Windows 10
> >
> >
> >  [[alternative HTML version deleted]]
> >
> > _______________________________________________
> > R-sig-Geo mailing list
> > [hidden email]
> > https://stat.ethz.ch/mailman/listinfo/r-sig-geo
> >
>
> --
> Roger Bivand
> Department of Economics, Norwegian School of Economics,
> Helleveien 30, N-5045 Bergen, Norway.
> voice: +47 55 95 93 55; e-mail: [hidden email]
> https://orcid.org/0000-0003-2392-6140
> https://scholar.google.no/citations?user=AWeghB0AAAAJ&hl=en
>
> _______________________________________________
> R-sig-Geo mailing list
> [hidden email]
> https://stat.ethz.ch/mailman/listinfo/r-sig-geo


        [[alternative HTML version deleted]]

_______________________________________________
R-sig-Geo mailing list
[hidden email]
https://stat.ethz.ch/mailman/listinfo/r-sig-geo

Re: A surface of GWR predicted values

Tue, 11/12/2019 - 06:59
Please do not post HTML, only plain text.

You do need to say which package gwr.predict() comes from. Better, provide
a reproducible example with a built-in data set showing your problem.

Roger

On Tue, 12 Nov 2019, Fred Ramos wrote:

> Dear all,
>
> I`m trying to build a surface with predicted values using gwr.predict.
>
> When I run the gwr.predict without giving the fitting point it runs without problems. But when I enter with SpatialPointsDataFrame (a point grid) as fitting points the prediction values returns NA. Is there anything that I could do to get these results in the fitting points?
>
>
>> gwr_test <- gwr.predict(job_density_log~distances+pop_density_log,data=zonas_OD_P_sp,kernel="gaussian",bw=21720)
>
>> gwr_test$SDF
> class       : SpatialPointsDataFrame
> features    : 423
> extent      : 292782, 414055.3, 7349266, 7424404  (xmin, xmax, ymin, ymax)
> crs         : +proj=utm +zone=23 +south +ellps=intl +units=m +no_defs
> variables   : 5
> names       :     Intercept_coef,        distances_coef, pop_density_log_coef,        prediction,    prediction_var
> min values  : -0.958964210431336, -0.000128871984509238,    0.287006700717343, -3.26862056578432,  0.48618073025782
> max values  :   4.17712723602899,  -1.8499577687439e-06,    0.979959974621219,  5.64311734285255, 0.693583031013022
>
>> gwr_out_grid_test <- gwr.predict(job_density_log~distances+pop_density_log,data=zonas_OD_P_sp,predictdata=grade_g_DF,kernel="gaussian",bw=21720)
>
>
>> gwr_out_grid_test$SDF
> class       : SpatialPointsDataFrame
> features    : 9075
> extent      : 293282, 413282, 7349904, 7423904  (xmin, xmax, ymin, ymax)
> crs         : +proj=utm +zone=23 +south +ellps=intl +units=m +no_defs
> variables   : 5
> names       :    Intercept_coef,        distances_coef, pop_density_log_coef, prediction, prediction_var
> min values  : -1.18701861474003, -0.000127242449368378,    0.310710138723114,         NA,             NA
> max values  :  4.04479455484814,  2.02812418949068e-06,    0.995539141570355,         NA,             NA
>
> Many thanks,
> Fred.
>
>
>
>
> Sent from Mail for Windows 10
>
>
> [[alternative HTML version deleted]]
>
> _______________________________________________
> R-sig-Geo mailing list
> [hidden email]
> https://stat.ethz.ch/mailman/listinfo/r-sig-geo
>
--
Roger Bivand
Department of Economics, Norwegian School of Economics,
Helleveien 30, N-5045 Bergen, Norway.
voice: +47 55 95 93 55; e-mail: [hidden email]
https://orcid.org/0000-0003-2392-6140
https://scholar.google.no/citations?user=AWeghB0AAAAJ&hl=en

_______________________________________________
R-sig-Geo mailing list
[hidden email]
https://stat.ethz.ch/mailman/listinfo/r-sig-geo
Roger Bivand
Department of Economics
Norwegian School of Economics
Helleveien 30
N-5045 Bergen, Norway

Error in mixed model spatial prediction

Tue, 11/12/2019 - 06:51
Hi all,

I am trying to do a mixed logistic model prediction on a RasterStack (3 layers) as below, and it gives me the error shown:

# model response is an integer (0,1)
model1 <- glmer(loss ~ Location + Presence_Absence_SPP + density_am +
                                       (1|grid_id), family = binomial("logit"), data=DATA)

# rs is a RasterStack composed of 2 factor layers and 1 numeric layer
mod_pred <- predict(rs, model1, type="response", index=1:3)

# error
Error in `contrasts<-`(`*tmp*`, value = contrasts.arg[[nn]]) :
  contrasts apply only to factors

# I tried re.form=~0, based on some suggestions on the net, but I think that is not correct, because then the result is a GLM prediction and not a mixed-model prediction.

With or without re.form=~0, I get the same error. Could anyone suggest how one can make such predictions using a RasterStack in R? I checked the structure of my raster rs and the variables have the same classes as those in model1, so I am unable to understand what I am doing wrong. Any help is highly appreciated.

Best
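
One possible direction (a sketch, not a confirmed fix): raster::predict() accepts a factors argument giving the levels of factor predictors, and re.form = ~0 can be passed through to predict.merMod; the layer names in rs are assumed here to match the model's variable names.

library(raster)
library(lme4)

# assumed: layer names in rs match the model terms
names(rs) <- c("Location", "Presence_Absence_SPP", "density_am")

# supply the factor levels used when fitting model1 (assumes these columns
# are factors in DATA)
f <- list(Location = levels(DATA$Location),
          Presence_Absence_SPP = levels(DATA$Presence_Absence_SPP))

# fixed-effects-only predictions on the response scale
mod_pred <- predict(rs, model1, type = "response",
                    re.form = ~0, factors = f)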

_______________________________________________
R-sig-Geo mailing list
[hidden email]
https://stat.ethz.ch/mailman/listinfo/r-sig-geo

A surface of GWR predicted values

Tue, 11/12/2019 - 06:33
Dear all,

I'm trying to build a surface of predicted values using gwr.predict.

When I run gwr.predict without specifying fitting points, it runs without problems. But when I pass a SpatialPointsDataFrame (a point grid) as fitting points, the prediction values return NA. Is there anything I could do to get these results at the fitting points?


> gwr_test <- gwr.predict(job_density_log~distances+pop_density_log,data=zonas_OD_P_sp,kernel="gaussian",bw=21720)

> gwr_test$SDF
class       : SpatialPointsDataFrame
features    : 423
extent      : 292782, 414055.3, 7349266, 7424404  (xmin, xmax, ymin, ymax)
crs         : +proj=utm +zone=23 +south +ellps=intl +units=m +no_defs
variables   : 5
names       :     Intercept_coef,        distances_coef, pop_density_log_coef,        prediction,    prediction_var
min values  : -0.958964210431336, -0.000128871984509238,    0.287006700717343, -3.26862056578432,  0.48618073025782
max values  :   4.17712723602899,  -1.8499577687439e-06,    0.979959974621219,  5.64311734285255, 0.693583031013022

> gwr_out_grid_test <- gwr.predict(job_density_log~distances+pop_density_log,data=zonas_OD_P_sp,predictdata=grade_g_DF,kernel="gaussian",bw=21720)


> gwr_out_grid_test$SDF
class       : SpatialPointsDataFrame
features    : 9075
extent      : 293282, 413282, 7349904, 7423904  (xmin, xmax, ymin, ymax)
crs         : +proj=utm +zone=23 +south +ellps=intl +units=m +no_defs
variables   : 5
names       :    Intercept_coef,        distances_coef, pop_density_log_coef, prediction, prediction_var
min values  : -1.18701861474003, -0.000127242449368378,    0.310710138723114,         NA,             NA
max values  :  4.04479455484814,  2.02812418949068e-06,    0.995539141570355,         NA,             NA

Many thanks,
Fred.




Sent from Mail for Windows 10


        [[alternative HTML version deleted]]

_______________________________________________
R-sig-Geo mailing list
[hidden email]
https://stat.ethz.ch/mailman/listinfo/r-sig-geo

Calculating weighted values across grid cells annually

Mon, 11/11/2019 - 09:37
Hi there,

I am currently trying to calculate "weighted" spatial annual "global" values for precipitation using the object "Prec20". I need to do this for each of the 140 years (i.e. 140 layers) of global precipitation data that I have. The idea is to weight each grid cell value for each year by the cosine of its latitude, so that grid cells at the equator get a weight of 1 (the cosine of 0 degrees is 1) and cells at the poles get a weight of 0 (the cosine of 90 degrees is 0). The idea is then to make a time series of these values, from Year 1 to Year 140, after the weighted grid cell values are averaged for each year, giving 140 weighted averages. The object "Prec20" looks like this:

class       : RasterBrick
dimensions  : 64, 128, 8192, 140  (nrow, ncol, ncell, nlayers)
resolution  : 2.8125, 2.789327  (x, y)
extent      : -181.4062, 178.5938, -89.25846, 89.25846  (xmin, xmax, ymin, ymax)
coord. ref. : +proj=longlat +datum=WGS84 +ellps=WGS84 +towgs84=0,0,0
data source : C:/Users/Travis/Documents/Other documents/All netCDF files/netcdffiles/MaxPrecIPSLIPSL-CM5B-LR1pctCO2.nc
names       : X1, X2, X3, X4, X5, X6, X7, X8, X9, X10, X11, X12, X13, X14, X15, ...
z-value     : 1, 140 (min, max)
varname     : onedaymax
Here is what I have done so far:

ncfname20 <- "MaxPrecIPSLIPSL-CM5B-LR1pctCO2.nc"
Prec20 <- brick(ncfname20, var="onedaymax")
w20 <- cos(Prec20*(pi/180))
Would this approach apply the weights appropriately? 
I would greatly appreciate any help with this! Thanks,
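
One way to compute this with the raster package (a minimal sketch; note that the weights should come from each cell's latitude rather than from the precipitation values, so cos(Prec20*(pi/180)) above would not do what is intended):

library(raster)

ncfname20 <- "MaxPrecIPSLIPSL-CM5B-LR1pctCO2.nc"
Prec20    <- brick(ncfname20, var = "onedaymax")

# cosine-of-latitude weights, built from the cell latitudes
lat <- init(Prec20[[1]], "y")
w   <- cos(lat * pi / 180)

# weighted global mean for each of the 140 layers, then a simple time series
annual_global <- cellStats(Prec20 * w, sum) / cellStats(w, sum)
plot(seq_len(nlayers(Prec20)), annual_global, type = "l",
     xlab = "Year", ylab = "Weighted global mean")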
        [[alternative HTML version deleted]]

_______________________________________________
R-sig-Geo mailing list
[hidden email]
https://stat.ethz.ch/mailman/listinfo/r-sig-geo

Re: Spatial Autocorrelation Estimation Method

Mon, 11/11/2019 - 04:31
On Sun, 10 Nov 2019, Robert R wrote:

> Dear Roger,
>
> Again, thank you for your answer. I read the material provided and
> decided that Hierarchical Spatial Autoregressive (HSAR) could be the
> right model for me.
>
> I indeed have the precise latitude and longitude information for all my
> listings for NYC.
>
> I created a stratified sample (group = zipcode) with 22172 (1%) of my
> observations called listings_sample and tried to replicate the hsar
> model, please see below.
>
> For now W = NULL, because otherwise I would have a 22172 x 22172 matrix.

Unless you know definitely that you want to relate the response to its
lagged value, you do not need this. Do note that the matrix is very
sparse, so could be fitted without difficulty with ML in a cross-sectional
model.

>
> You recommended then to introduce a Markov random field (MRF) random
> effect (RE) at the zipcode level, but I did not understand it so well.
> Could you develop a litte more?
>

Did you read the development in
https://doi.org/10.1016/j.spasta.2017.01.002? It is explained there, and
includes code for fitting the Beijing housing parcels data set from HSAR
with many other packages (MCMC, INLA, hglm, etc.). I guess that you should
try to create a model that works on a single borough, using the zipcodes
in that borough as a proxy for unobserved neighbourhood effects. Try for
example using lme4::lmer() with only a zipcode IID random effect, see if
the hedonic estimates are similar to lm(), and leave adding an MRF RE
(with for example mgcv::gam() or hglm::hglm()) until you have a working
testbed. Then advance step-by-step from there.
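
A minimal sketch of that step-by-step route (variable and object names taken from the code quoted below; the exact neighbour-list format for the mrf basis should be checked against ?mgcv::smooth.construct.mrf.smooth.spec):

library(lme4)
library(mgcv)

# step 1: zipcode as an IID random effect only (zipcode coded as a factor)
listings_sample$zipcode <- factor(listings_sample$zipcode)
m_lm  <- lm(log_price ~ guests_included + minimum_nights,
            data = listings_sample)
m_iid <- lmer(log_price ~ guests_included + minimum_nights + (1 | zipcode),
              data = listings_sample)

# step 2 (once step 1 behaves): add an MRF random effect at the zipcode level;
# nb_list is a named list of neighbour indices built from the zipcode
# contiguity object, with names matching levels(listings_sample$zipcode)
nb_list <- polygon_nyc_nb
names(nb_list) <- attr(polygon_nyc_nb, "region.id")
m_mrf <- gam(log_price ~ guests_included + minimum_nights +
               s(zipcode, bs = "mrf", xt = list(nb = nb_list)),
             data = listings_sample, method = "REML")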

You still have not said how many repeat lettings you see - it will affect
the way you specify your model.

Roger

> ##############
> library(spdep)
> library(HSAR)
> library(dplyr)
> library(splitstackshape)
>
>
> # Stratified sample per zipcode (size = 1%)
> listings_sample <- splitstackshape::stratified(indt = listings, group = "zipcode", size = 0.01)
>
> # Removing zipcodes from polygon_nyc which are not observable in listings_sample
> polygon_nyc_listings <- polygon_nyc %>% filter(zipcode %in% c(unique(as.character(listings_sample$zipcode))))
>
>
> ## Random effect matrix (N by J)
>
> # N: 22172
> # J: 154
>
> # Arrange listings_sample by zipcode (ascending)
> listings_sample <- listings_sample %>% arrange(zipcode)
>
> # Count number of listings per zipcode
> MM <- listings_sample %>% st_drop_geometry() %>% group_by(zipcode) %>% summarise(count = n()) %>% as.data.frame()
> # sum(MM$count)
>
> # N by J nulled matrix creation
> Delta <- matrix(data = 0, nrow = nrow(listings_sample), ncol = dim(MM)[1])
>
> # The total number of neighbourhood
> Uid <- rep(c(1:dim(MM)[1]), MM[,2])
>
> for(i in 1:dim(MM)[1]) {
>  Delta[Uid==i,i] <- 1
> }
> rm(i)
>
> Delta <- as(Delta,"dgCMatrix")
>
>
> ## Higher-level spatial weights matrix or neighbourhood matrix (J by J)
>
> # Neighboring polygons: list of neighbors for each polygon (queen contiguity neighbors)
> polygon_nyc_nb <- poly2nb(polygon_nyc_listings, row.names = polygon_nyc$zipcode, queen = TRUE)
>
> # Include neighbour itself as a neighbour
> polygon_nyc_nb <- include.self(polygon_nyc_nb)
>
> # Spatial weights matrix for nb
> polygon_nyc_nb_matrix <- nb2mat(neighbours = polygon_nyc_nb, style = "W", zero.policy = NULL)
> M <- as(polygon_nyc_nb_matrix,"dgCMatrix")
>
>
> ## Fit HSAR SAR upper level random effect
> model <- as.formula(log_price ~ guests_included + minimum_nights)
>
> betas = coef(lm(formula = model, data = listings_sample))
> pars = list(rho = 0.5, lambda = 0.5, sigma2e = 2.0, sigma2u = 2.0, betas = betas)
>
> m_hsar <- hsar(model, data = listings_sample, W = NULL, M = M, Delta = Delta, burnin = 5000, Nsim = 10000, thinning = 1, parameters.start = pars)
>
> ##############
>
> Thank you and best regards
> Robert
>
> ________________________________________
> From: Roger Bivand <[hidden email]>
> Sent: Friday, November 8, 2019 13:29
> To: Robert R
> Cc: [hidden email]
> Subject: Re: [R-sig-Geo] Spatial Autocorrelation Estimation Method
>
> On Fri, 8 Nov 2019, Robert R wrote:
>
>> Dear Roger,
>>
>> Thank you for your answer.
>>
>> I successfully used the function nb2blocknb() for a smaller dataset.
>>
>> But for a dataset of over 2 million observations, I get the following
>> error: "Error: cannot allocate vector of size 840 Kb".
>
> I don't think the observations are helpful. If you have repeat lets in the
> same property in a given month, you need to handle that anyway. I'd go for
> making the modelling exercise work (we agree that this is not panel data,
> right?) on a small subset first. I would further argue that you need a
> multi-level approach rather than spdep::nb2blocknb(), with a zipcode IID
> RE. You could very well take (stratified) samples per zipcode to represent
> your data. Once that works, introduce an MRF RE at the zipcode level,
> where you do know relative position. Using SARAR is going to be a waste of
> time unless you can geocode the letting addresses. A multi-level approach
> will work. Having big data in your case with no useful location
> information per observation is just adding noise and over-smoothing, I'm
> afraid. The approach used in https://doi.org/10.1016/j.spasta.2017.01.002
> will work, also when you sample the within zipcode lets, given a split
> into training and test sets, and making CV possible.
>
> Roger
>
>>
>> I am expecting that at least 500.000 observations will be dropped due
>> the lack of values for the chosen variables for the regression model, so
>> probably I will filter and remove the observations/rows that will not be
>> used anyway - do you know if there is any package that does this
>> automatically, given the variables/columns chosed by me?
>>
>> Or would you recommend me another approach to avoid the above mentioned
>> error?
>>
>> Thank you and best regards,
>> Robert
>>
>> ________________________________________
>> From: Roger Bivand <[hidden email]>
>> Sent: Thursday, November 7, 2019 10:13
>> To: Robert R
>> Cc: [hidden email]
>> Subject: Re: [R-sig-Geo] Spatial Autocorrelation Estimation Method
>>
>> On Thu, 7 Nov 2019, Robert R wrote:
>>
>>> Dear Roger,
>>>
>>> Many thanks for your help.
>>>
>>> I have an additional question:
>>>
>>> Is it possible to create a "separate" lw (nb2listw) (with different
>>> rownumbers) from my data set? For now, I am taking my data set and
>>> merging with the sf object polygon_nyc with the function
>>> "merge(polygon_nyc, listings, by=c("zipcode" = "zipcode"))", so I create
>>> a huge n x n matrix (depending of the size of my data set).
>>>
>>> Taking the polygon_nyc alone and turning it to a lw (weights list)
>>> object has only n = 177.
>>>
>>> Of course running
>>>
>>> spatialreg::lagsarlm(formula=model, data = listings_sample,
>>> spatialreg::polygon_nyc_lw, tol.solve=1.0e-10)
>>>
>>> does not work ("Input data and weights have different dimensions").
>>>
>>> The only option is to take my data set, merge it to my polygon_nyc (by
>>> zipcode) and then create the weights list lw? Or there another option?
>>
>> I think we are getting more clarity. You do not know the location of the
>> lettings beyond their zipcode. You do know the boundaries of the zipcode
>> areas, and can create a neighbour object from these boundaries. You then
>> want to treat all the lettings in a zipcode area i as neighbours, and
>> additionally lettings in zipcode areas neighbouring i as neighbours of
>> lettings in i. This is the data structure that motivated the
>> spdep::nb2blocknb() function:
>>
>> https://r-spatial.github.io/spdep/reference/nb2blocknb.html
>>
>> Try running the examples to get a feel for what is going on.
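
A minimal sketch of that construction (names follow the code elsewhere in this thread; listings$zipcode is assumed to take values among the region IDs of the zipcode neighbour object):

library(spdep)

# contiguity between zipcode polygons, labelled by zipcode
zip_nb <- poly2nb(polygon_nyc, row.names = as.character(polygon_nyc$zipcode))

# every letting in zipcode i becomes a neighbour of all lettings in i
# and of all lettings in zipcodes contiguous to i
listing_nb <- nb2blocknb(zip_nb, ID = as.character(listings$zipcode))
lw_block   <- nb2listw(listing_nb, style = "W", zero.policy = TRUE)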
>>
>> I feel that most of the variability will vanish in the very large numbers
>> of neighbours, over-smoothing the outcomes. If you do not have locations
>> for the lettings themselves, I don't think you can make much progress.
>>
>> You could try a linear mixed model (or gam with a spatially structured
>> random effect) with a temporal and a spatial random effect. See the HSAR
>> package, articles by Dong et al., and maybe
>> https://doi.org/10.1016/j.spasta.2017.01.002 for another survey. Neither
>> this nor Dong et al. handle spatio-temporal settings. MRF spatial random
>> effects at the zipcode level might be a way forward, together with an IID
>> random effect at the same level (equivalent to sef-neighbours).
>>
>> Hope this helps,
>>
>> Roger
>>
>>>
>>> Best regards,
>>> Robert
>>>
>>> ________________________________________
>>> From: Roger Bivand <[hidden email]>
>>> Sent: Wednesday, November 6, 2019 15:07
>>> To: Robert R
>>> Cc: [hidden email]
>>> Subject: Re: [R-sig-Geo] Spatial Autocorrelation Estimation Method
>>>
>>> On Tue, 5 Nov 2019, Robert R wrote:
>>>
>>>> Dear Roger,
>>>>
>>>> Thank you for your reply. I disabled HTML; my e-mails should be now in
>>>> plain text.
>>>>
>>>> I will give a better context for my desired outcome.
>>>>
>>>> I am taking Airbnb's listings information for New York City available
>>>> on: http://insideairbnb.com/get-the-data.html
>>>>
>>>> I save every listings.csv.gz file available for NYC (2015-01 to 2019-09)
>>>> - in total, 54 files/time periods - as a YYYY-MM-DD.csv file into a
>>>> Listings/ folder. When importing all these 54 files into one single data
>>>> set, I create a new "date_compiled" variable/column.
>>>>
>>>> In total, after the data cleansing process, I have a little more 2
>>>> million observations.
>>>
>>> You have repeat lettings for some, but not all properties. So this is at
>>> best a very unbalanced panel. For those properties with repeats, you may
>>> see temporal movement (trend/seasonal).
>>>
>>> I suggest (strongly) taking a single borough or even zipcode with some
>>> hundreds of properties, and working from there. Do not include the
>>> observation as its own neighbour, perhaps identify repeats and handle them
>>> specially (create or use a property ID). Unbalanced panels may also create
>>> a selection bias issue (why are some properties only listed sometimes?).
>>>
>>> So this although promising isn't simple, and getting to a hedonic model
>>> may be hard, but not (just) because of spatial autocorrelation. I wouldn't
>>> necessarily trust OLS output either, partly because of the repeat property
>>> issue.
>>>
>>> Roger
>>>
>>>>
>>>> I created 54 timedummy variables for each time period available.
>>>>
>>>> I want to estimate using a hedonic spatial timedummy model the impact of
>>>> a variety of characteristics which potentially determine the daily rate
>>>> on Airbnb listings through time in New York City (e.g. characteristics
>>>> of the listing as number of bedrooms, if the host if professional,
>>>> proximity to downtown (New York City Hall) and nearest subway station
>>>> from the listing, income per capita, etc.).
>>>>
>>>> My dependent variable is price (log price, common in the related
>>>> literature for hedonic prices).
>>>>
>>>> The OLS model is done.
>>>>
>>>> For the spatial model, I am assuming that hosts, when deciding the
>>>> pricing of their listings, take not only into account its structural and
>>>> location characteristics, but also the prices charged by near listings
>>>> with similar characteristics - spatial autocorrelation is then present,
>>>> at least spatial dependence is present in the dependent variable.
>>>>
>>>> As I wrote in my previous post, I was willing to consider the neighbor
>>>> itself as a neighbor.
>>>>
>>>> Parts of my code can be found below:
>>>>
>>>> ########
>>>>
>>>> ## packages
>>>>
>>>> packages_install <- function(packages){
>>>> new.packages <- packages[!(packages %in% installed.packages()[, "Package"])]
>>>> if (length(new.packages))
>>>> install.packages(new.packages, dependencies = TRUE)
>>>> sapply(packages, require, character.only = TRUE)
>>>> }
>>>>
>>>> packages_required <- c("bookdown", "cowplot", "data.table", "dplyr", "e1071", "fastDummies", "ggplot2", "ggrepel", "janitor", "kableExtra", "knitr", "lubridate", "nngeo", "plm", "RColorBrewer", "readxl", "scales", "sf", "spdep", "stargazer", "tidyverse")
>>>> packages_install(packages_required)
>>>>
>>>> # Working directory
>>>> setwd("C:/Users/User/R")
>>>>
>>>>
>>>>
>>>> ## shapefile_us
>>>>
>>>> # Shapefile zips import and Coordinate Reference System (CRS) transformation
>>>> # Shapefile download: https://www2.census.gov/geo/tiger/GENZ2018/shp/cb_2018_us_zcta510_500k.zip
>>>> shapefile_us <- sf::st_read(dsn = "Shapefile", layer = "cb_2018_us_zcta510_500k")
>>>>
>>>> # Columns removal
>>>> shapefile_us <- shapefile_us %>% select(-c(AFFGEOID10, GEOID10, ALAND10, AWATER10))
>>>>
>>>> # Column rename: ZCTA5CE10
>>>> setnames(shapefile_us, old=c("ZCTA5CE10"), new=c("zipcode"))
>>>>
>>>> # Column class change: zipcode
>>>> shapefile_us$zipcode <- as.character(shapefile_us$zipcode)
>>>>
>>>>
>>>>
>>>> ## polygon_nyc
>>>>
>>>> # Zip code not available in shapefile: 11695
>>>> polygon_nyc <- shapefile_us %>% filter(zipcode %in% zips_nyc)
>>>>
>>>>
>>>>
>>>> ## weight_matrix
>>>>
>>>> # Neighboring polygons: list of neighbors for each polygon (queen contiguity neighbors)
>>>> polygon_nyc_nb <- poly2nb((polygon_nyc %>% select(-borough)), queen=TRUE)
>>>>
>>>> # Include neighbour itself as a neighbour
>>>> # for(i in 1:length(polygon_nyc_nb)){polygon_nyc_nb[[i]]=as.integer(c(i,polygon_nyc_nb[[i]]))}
>>>> polygon_nyc_nb <- include.self(polygon_nyc_nb)
>>>>
>>>> # Weights to each neighboring polygon
>>>> lw <- nb2listw(neighbours = polygon_nyc_nb, style="W", zero.policy=TRUE)
>>>>
>>>>
>>>>
>>>> ## listings
>>>>
>>>> # Data import
>>>> files <- list.files(path="Listings/", pattern=".csv", full.names=TRUE)
>>>> listings <- setNames(lapply(files, function(x) read.csv(x, stringsAsFactors = FALSE, encoding="UTF-8")), files)
>>>> listings <- mapply(cbind, listings, date_compiled = names(listings))
>>>> listings <- listings %>% bind_rows
>>>>
>>>> # Characters removal
>>>> listings$date_compiled <- gsub("Listings/", "", listings$date_compiled)
>>>> listings$date_compiled <- gsub(".csv", "", listings$date_compiled)
>>>> listings$price <- gsub("\\$", "", listings$price)
>>>> listings$price <- gsub(",", "", listings$price)
>>>>
>>>>
>>>>
>>>> ## timedummy
>>>>
>>>> timedummy <- sapply("date_compiled_", paste, unique(listings$date_compiled), sep="")
>>>> timedummy <- paste(timedummy, sep = "", collapse = " + ")
>>>> timedummy <- gsub("-", "_", timedummy)
>>>>
>>>>
>>>>
>>>> ## OLS regression
>>>>
>>>> # Pooled cross-section data - Randomly sampled cross sections of Airbnb listings price at different points in time
>>>> regression <- plm(formula=as.formula(paste("log_price ~ #some variables", timedummy, sep = "", collapse = " + ")), data=listings, model="pooling", index="id")
>>>>
>>>> ########
>>>>
>>>> Some of my id's repeat in multiple time periods.
>>>>
>>>> I use NYC's zip codes to left join my data with the neighborhood zip code specific characteristics, such as income per capita to that specific zip code, etc.
>>>>
>>>> Now I want to apply the hedonic model with the timedummy variables.
>>>>
>>>> Do you know how to proceed? 1) Which package to use (spdep/splm)?; 2) Do I have to join the polygon_nyc (by zip code) to my listings data set, and then calculate the weight matrix "lw"?
>>>>
>>>> Again, thank you very much for the help provided until now.
>>>>
>>>> Best regards,
>>>> Robert
>>>>
>>>> ________________________________________
>>>> From: Roger Bivand <[hidden email]>
>>>> Sent: Tuesday, November 5, 2019 15:30
>>>> To: Robert R
>>>> Cc: [hidden email]
>>>> Subject: Re: [R-sig-Geo] Spatial Autocorrelation Estimation Method
>>>>
>>>> On Tue, 5 Nov 2019, Robert R wrote:
>>>>
>>>>> I have a large pooled cross-section data set. I would like to
>>>>> estimate/regress using spatial autocorrelation methods. I am assuming
>>>>> for now that spatial dependence is present in both the dependent
>>>>> variable and the error term. My data set is over a period of 4 years,
>>>>> monthly data (54 periods). To this end, I've created a time dummy
>>>>> variable for each time period. I also created a weight matrix using the
>>>>> functions "poly2nb" and "nb2listw". Now I am trying to figure out a way
>>>>> to estimate my model, which contains a really big data set. Basically, my
>>>>> model is as follows: y = γD + ρW1y + Xβ + λW2u + ε. My questions are: 1)
>>>>> My spatial weight matrix for the whole data set will probably be an
>>>>> enormous matrix with submatrices for each time period. I don't
>>>>> think it would be possible to calculate this. What I would like to know
>>>>> is a way to estimate each time dummy/period separately (to compare
>>>>> different periods alone). How would I do it? 2) Which package to use: spdep
>>>>> or splm? Thank you and best regards, Robert
>>>>
>>>> Please do not post HTML, only plain text. Almost certainly your model
>>>> specification is wrong (SARAR/SAC is always a bad idea if alternatives are
>>>> untried). What is your cross-sectional size? Using sparse kronecker
>>>> products, the "enormous" matrix may not be very big. Does it make any
>>>> sense using time dummies (54 x N x T will be mostly zero anyway)? Are most
>>>> of the covariates time-varying? Please provide motivation and use area
>>>> (preferably with affiliation; your email and user name are not
>>>> informative) - this feels like a real estate problem, probably wrongly
>>>> specified. You should use splm if time makes sense in your case, but if it
>>>> really doesn't, simplify your approach, as much of the data will be
>>>> subject to very large temporal autocorrelation.
>>>>
>>>> If this is a continuation of your previous question about using
>>>> self-neighbours, be aware that you should not use self-neighbours in
>>>> modelling, they are only useful for the Getis-Ord local G_i^* measure.
>>>>
>>>> Roger
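
On the sparse Kronecker point above, a minimal sketch (assuming the Matrix package and a cross-sectional listw object lw as built elsewhere in this thread):

library(Matrix)
W    <- as(lw, "CsparseMatrix")     # sparse N x N cross-sectional weights
W_NT <- kronecker(Diagonal(54), W)  # block-diagonal NT x NT weights, still sparse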
>>>>
>>>>>
>>>>>       [[alternative HTML version deleted]]
>>>>>
>>>>> _______________________________________________
>>>>> R-sig-Geo mailing list
>>>>> [hidden email]
>>>>> https://stat.ethz.ch/mailman/listinfo/r-sig-geo
>>>>
>>>> --
>>>> Roger Bivand
>>>> Department of Economics, Norwegian School of Economics,
>>>> Helleveien 30, N-5045 Bergen, Norway.
>>>> voice: +47 55 95 93 55; e-mail: [hidden email]
>>>> https://orcid.org/0000-0003-2392-6140
>>>> https://scholar.google.no/citations?user=AWeghB0AAAAJ&hl=en
>>>>
>>>
>>> --
>>> Roger Bivand
>>> Department of Economics, Norwegian School of Economics,
>>> Helleveien 30, N-5045 Bergen, Norway.
>>> voice: +47 55 95 93 55; e-mail: [hidden email]
>>> https://orcid.org/0000-0003-2392-6140
>>> https://scholar.google.no/citations?user=AWeghB0AAAAJ&hl=en
>>>
>>
>> --
>> Roger Bivand
>> Department of Economics, Norwegian School of Economics,
>> Helleveien 30, N-5045 Bergen, Norway.
>> voice: +47 55 95 93 55; e-mail: [hidden email]
>> https://orcid.org/0000-0003-2392-6140
>> https://scholar.google.no/citations?user=AWeghB0AAAAJ&hl=en
>>
>
> --
> Roger Bivand
> Department of Economics, Norwegian School of Economics,
> Helleveien 30, N-5045 Bergen, Norway.
> voice: +47 55 95 93 55; e-mail: [hidden email]
> https://orcid.org/0000-0003-2392-6140
> https://scholar.google.no/citations?user=AWeghB0AAAAJ&hl=en
> --
Roger Bivand
Department of Economics, Norwegian School of Economics,
Helleveien 30, N-5045 Bergen, Norway.
voice: +47 55 95 93 55; e-mail: [hidden email]
https://orcid.org/0000-0003-2392-6140
https://scholar.google.no/citations?user=AWeghB0AAAAJ&hl=en
_______________________________________________
R-sig-Geo mailing list
[hidden email]
https://stat.ethz.ch/mailman/listinfo/r-sig-geo
Roger Bivand
Department of Economics
Norwegian School of Economics
Helleveien 30
N-5045 Bergen, Norway

Re: It's possible delimit a shapefile very extent using R? (Roger Bivand)

Mon, 11/11/2019 - 04:21
Hi Caridad,

You may be right, that Geomar actually meant the bounding box or
coordinates of each polygon - thanks for trying to clarify. I think we'll
have to ask Geomar to explain what is needed.

Roger

On Sun, 10 Nov 2019, Caridad Serrano Ortega wrote:

> Hi Rober,
>
> My R scripting is at an intermediate level and my English is basic. Does
> this answer your question?
>
> pas_file<- readOGR("D:/DATOS", "CoveredArea")
>
> pas_file@bbox # Extend all
> pas_file@polygons[[1]]@Polygons[[1]]@coords # Coordinates from the first polygon
> pas_file@polygons[[1]]@Polygons[[1]]@coords[,1] # All x from the first polygon
> pas_file@polygons[[1]]@Polygons[[1]]@coords[,2] # All y from the first polygon
> max(pas_file@polygons[[1]]@Polygons[[1]]@coords[,1])
> min(pas_file@polygons[[1]]@Polygons[[1]]@coords[,1])
> max(pas_file@polygons[[1]]@Polygons[[1]]@coords[,2])
> min(pas_file@polygons[[1]]@Polygons[[1]]@coords[,2])
>
> The shape file is a spatialPolygonsDataFrame so look at this link:
> http://strimas.com/r/tidy-sf/
>
--
Roger Bivand
Department of Economics, Norwegian School of Economics,
Helleveien 30, N-5045 Bergen, Norway.
voice: +47 55 95 93 55; e-mail: [hidden email]
https://orcid.org/0000-0003-2392-6140
https://scholar.google.no/citations?user=AWeghB0AAAAJ&hl=en

_______________________________________________
R-sig-Geo mailing list
[hidden email]
https://stat.ethz.ch/mailman/listinfo/r-sig-geo
Roger Bivand
Department of Economics
Norwegian School of Economics
Helleveien 30
N-5045 Bergen, Norway

Re: It's possible delimit a shapefile very extent using R? (Roger Bivand)

Sun, 11/10/2019 - 12:55
Hi Rober,

My R scripting is at an intermediate level and my English is basic. Does
this answer your question?

pas_file<- readOGR("D:/DATOS", "CoveredArea")

pas_file@bbox # Extend all
pas_file@polygons[[1]]@Polygons[[1]]@coords # Coordinates from the first polygon
pas_file@polygons[[1]]@Polygons[[1]]@coords[,1] # All x from the first polygon
pas_file@polygons[[1]]@Polygons[[1]]@coords[,2] # All y from the first polygon
max(pas_file@polygons[[1]]@Polygons[[1]]@coords[,1])
min(pas_file@polygons[[1]]@Polygons[[1]]@coords[,1])
max(pas_file@polygons[[1]]@Polygons[[1]]@coords[,2])
min(pas_file@polygons[[1]]@Polygons[[1]]@coords[,2])

The shape file is a spatialPolygonsDataFrame so look at this link:
http://strimas.com/r/tidy-sf/
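
For comparison with the sf workflow in the linked post, a rough equivalent of the slot-based code above (a sketch, assuming the same shapefile):

library(sf)

pas_sf <- st_read("D:/DATOS", "CoveredArea")
st_bbox(pas_sf)                       # extent of the whole layer
lapply(st_geometry(pas_sf), st_bbox)  # bounding box of each polygon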

        [[alternative HTML version deleted]]

_______________________________________________
R-sig-Geo mailing list
[hidden email]
https://stat.ethz.ch/mailman/listinfo/r-sig-geo

Re: issue with readOGR

Sun, 11/10/2019 - 10:28
Thanks Roger, that solved the issue.

Best regards,
as

On Sun, Nov 10, 2019 at 1:34 PM Roger Bivand <[hidden email]> wrote:

> On Sat, 9 Nov 2019, Abdoulaye Sarr wrote:
>
> > Platform is osx.
> >
> > Thanks
> >
> > On Sat, Nov 9, 2019 at 6:11 PM Abdoulaye Sarr <[hidden email]>
> > wrote:
> >
> >> output from list.files:
> >>
> >>      "dhaka_div.shx"        "dhaka_gazipur.cpg"
> >>  "dhaka_gazipur.dbf"    "dhaka_gazipur.prj"    "dhaka_gazipur.qpj"
> >>  "dhaka_gazipur.shp"
> >>  "dhaka_gazipur.shx"    "dhaka.dbf"            "dhaka.prj"
> >>  "dhaka.shp"
> >> "dhaka.shx"            "indicator.csv"        "r_val.csv"
> >>
>
> From this we see that "indicator.csv" is not (immediately) a file that has
> an appropriate driver (it needs a *.vrt to say which columns are which,
> see https://gdal.org/drivers/vector/csv.html). Valid layers should be
> "dhaka" and "dhaka_gazipur", but not "dhaka_div", for which  only the
> *.shx is shown.
>
> Try ogrListLayers(path.expand("/Volumes/DS2S/R_QGIS_codes//Data")) to try
> to detect valid layers. If you really meant "indicators", use
> paste0("CSV:", file.path(path.expand("/Volumes/DS2S/R_QGIS_codes//Data"),
> "indicators.csv")) as per the GDAL vector driver help page.
>
> Roger
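
A short runnable version of that suggestion (a sketch; layer names taken from the list.files() output quoted below):

library(rgdal)

dsn <- path.expand("/Volumes/DS2S/R_QGIS_codes//Data")
ogrListLayers(dsn)                          # should report "dhaka" and "dhaka_gazipur"
map <- readOGR(dsn = dsn, layer = "dhaka")  # read a complete shapefile layer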
>
> >> On Sat, Nov 9, 2019 at 5:07 PM Roger Bivand <[hidden email]>
> wrote:
> >>
> >>> On Fri, 8 Nov 2019, Abdoulaye Sarr wrote:
> >>>
> >>>> I am trying to read a shapefile using readOGR but keep getting error
> >>>> messages:
> >>>> map =
> >>>>
> readOGR(dsn=path.expand("/Volumes/DS2S/R_QGIS_codes//Data"),"indicator")
> >>>> Error in ogrInfo(dsn = dsn, layer = layer, encoding = encoding,
> >>> use_iconv =
> >>>> use_iconv,  :
> >>>>  Cannot open layer
> >>>
> >>> What is "/Volumes/DS2S/R_QGIS_codes//Data"? If a network drive, some
> OGR
> >>> drivers are known not to work with them (or that used to be the case).
> >>> Which OS/Platform is this? Is the repeated "//" correct? Can you see
> the
> >>> file running list.files("/Volumes/DS2S/R_QGIS_codes//Data") ?
> >>>
> >>> Roger
> >>>
> >>>>
> >>>> What could be causing the problem? sp 1.3-2
> >>>>
> >>>> Thanks
> >>>> as
> >>>>
> >>>>       [[alternative HTML version deleted]]
> >>>>
> >>>> _______________________________________________
> >>>> R-sig-Geo mailing list
> >>>> [hidden email]
> >>>> https://stat.ethz.ch/mailman/listinfo/r-sig-geo
> >>>>
> >>>
> >>> --
> >>> Roger Bivand
> >>> Department of Economics, Norwegian School of Economics,
> >>> Helleveien 30, N-5045 Bergen, Norway.
> >>> voice: +47 55 95 93 55; e-mail: [hidden email]
> >>> https://orcid.org/0000-0003-2392-6140
> >>> https://scholar.google.no/citations?user=AWeghB0AAAAJ&hl=en
> >>>
> >>
> >
> >       [[alternative HTML version deleted]]
> >
> > _______________________________________________
> > R-sig-Geo mailing list
> > [hidden email]
> > https://stat.ethz.ch/mailman/listinfo/r-sig-geo
> >
>
> --
> Roger Bivand
> Department of Economics, Norwegian School of Economics,
> Helleveien 30, N-5045 Bergen, Norway.
> voice: +47 55 95 93 55; e-mail: [hidden email]
> https://orcid.org/0000-0003-2392-6140
> https://scholar.google.no/citations?user=AWeghB0AAAAJ&hl=en
>
        [[alternative HTML version deleted]]

_______________________________________________
R-sig-Geo mailing list
[hidden email]
https://stat.ethz.ch/mailman/listinfo/r-sig-geo

Pages