R bloggers
R news and tutorials contributed by hundreds of R bloggers

Visualizing classifier thresholds

Mon, 11/13/2017 - 17:38

(This article was first published on Rstats – bayesianbiologist, and kindly contributed to R-bloggers)

Lately I’ve been thinking a lot about the connection between prediction models and the decisions that they influence. There is a lot of theory around this, but communicating how the various pieces all fit together with the folks who will use and be impacted by these decisions can be challenging.

One of the important conceptual pieces is the link between the decision threshold (how high does the score need to be to predict positive?) and the resulting distribution of outcomes (true positives, false positives, true negatives and false negatives). As a starting point, I’ve built an interactive tool for exploring this link.

The idea is to take a validation sample of predictions from a model and experiment with the consequences of varying the decision threshold. The hope is that the user will be able to develop an intuition around the tradeoffs involved by seeing the link to the individual data points involved.
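
The link can be made concrete in a few lines of R. This is a minimal sketch, not the code behind the interactive tool: the scores and labels below are simulated, and outcomes_at is an invented helper name.

```r
# Minimal sketch (simulated data; not the interactive tool itself):
# count the four outcome types as the decision threshold varies.
set.seed(42)
scores <- runif(100)              # validation-sample scores in [0, 1]
labels <- rbinom(100, 1, scores)  # true outcomes, correlated with the scores

outcomes_at <- function(threshold) {
  pred <- as.integer(scores >= threshold)
  c(tp = sum(pred == 1 & labels == 1),
    fp = sum(pred == 1 & labels == 0),
    tn = sum(pred == 0 & labels == 0),
    fn = sum(pred == 0 & labels == 1))
}

# Raising the threshold trades false positives for false negatives.
sapply(c(0.25, 0.5, 0.75), outcomes_at)
```

At every threshold the four counts sum to the sample size; only the split among them changes, which is exactly the tradeoff the tool visualizes.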

Code for this experiment is available here. I hope to continue to build on this with other interactive, visual tools aimed at demystifying the concepts at the interface between predictions and decisions.


To leave a comment for the author, please follow the link and comment on their blog: Rstats – bayesianbiologist. R-bloggers.com offers daily e-mail updates about R news and tutorials on topics such as: Data science, Big Data, R jobs, visualization (ggplot2, Boxplots, maps, animation), programming (RStudio, Sweave, LaTeX, SQL, Eclipse, git, hadoop, Web Scraping) statistics (regression, PCA, time series, trading) and more...

New Project: Data Science Instruction at the US Census Bureau!

Mon, 11/13/2017 - 17:00

(This article was first published on R – AriLamstein.com, and kindly contributed to R-bloggers)

Today I am delighted to announce an exciting new collaboration. I will be working with the US Census Bureau as a Data Science Instructor!

Over the next six months I will be helping Census develop courses on using R to work with Census Data. These courses will be free and open to the public. People familiar with my open source work will realize that this project is right up my alley!

As a start to this project I am trying to gather two pieces of information:

  1. Which packages do R programmers typically use when working with Census data?
  2. What types of analyses do R programmers typically do with Census data?

If you use R to work with Census data, please leave an answer below!

The post New Project: Data Science Instruction at the US Census Bureau! appeared first on AriLamstein.com.


To leave a comment for the author, please follow the link and comment on their blog: R – AriLamstein.com.

Make memorable plots with memery. v0.3.0 now on CRAN.

Mon, 11/13/2017 - 15:33

Make memorable plots with memery. memery is an R package that generates internet memes including superimposed inset graphs and other atypical features, combining the visual impact of an attention-grabbing meme with graphic results of data analysis. Version 0.3.0 of memery is now on CRAN. The latest development version and a package vignette are available on GitHub.


Below is an example interleaving a semi-transparent ggplot2 graph between a meme image backdrop and overlying meme text labels. The meme function will produce basic memes without needing to specify a number of additional arguments, but this is not the main purpose of the package. Adding a plot is then as simple as passing the plot to inset.

memery offers sensible defaults as well as a variety of basic templates for controlling how the meme and graph are spliced together. The example here shows how additional arguments can be specified to further control the content and layout. See the package vignette for a more complete set of examples and description of available features and graph templates.

Please do share your data analyst meme creations. Enjoy!

library(memery)

# Make a graph of some data
library(ggplot2)
x <- seq(0, 2*pi, length.out = 50)
panels <- rep(c("Plot A", "Plot B"), each = 50)
d <- data.frame(x = x, y = sin(x), grp = panels)
txt <- c("Philosoraptor's plots", "I like to make plots",
         "Figure 1. (A) shows a plot and (B) shows another plot.")
p <- ggplot(d, aes(x, y)) +
  geom_line(colour = "cornflowerblue", size = 2) +
  geom_point(colour = "orange", size = 4) +
  facet_wrap(~grp) +
  labs(title = txt[1], subtitle = txt[2], caption = txt[3])

# Meme settings
img <- system.file("philosoraptor.jpg", package = "memery")  # image
lab <- c("What to call my R package?", "Hmm... What? raptr is taken!?",
         "Noooooo!!!!")  # labels
size <- c(1.8, 1.5, 2.2)  # label sizes
# positions, font families and colors
pos <- list(w = rep(0.9, 3), h = rep(0.3, 3),
            x = c(0.45, 0.6, 0.5), y = c(0.95, 0.85, 0.3))
fam <- c("Impact", "serif", "Impact")
col <- list(c("black", "orange", "white"), c("white", "black", "black"))
gbg <- list(fill = "#FF00FF50", col = "#FFFFFF75")  # graph background

# Save meme
meme(img, lab, "meme.jpg", size = size, family = fam, col = col[[1]],
     shadow = col[[2]], label_pos = pos, inset = p, inset_bg = gbg, mult = 2)


Update on coordinatized or fluid data

Mon, 11/13/2017 - 01:56

(This article was first published on R – Win-Vector Blog, and kindly contributed to R-bloggers)

We have just released a major update of the cdata R package to CRAN.

If you work with R and data, now is the time to check out the cdata package.

Among the changes in the 0.5.* version of the cdata package:

  • All coordinatized data or fluid data operations are now in the cdata package (no longer split between the cdata and replyr packages).
  • The transforms are now centered on the more general table driven moveValuesToRowsN() and moveValuesToColumnsN() operators (though pivot and un-pivot are now made available as convenient special cases).
  • All the transforms are now implemented in SQL through DBI (no longer using tidyr or dplyr, though we do include examples of using cdata with dplyr).
  • This is (unfortunately) a user-visible API change; however, adapting to the changed API is deliberately straightforward.

cdata now supplies very general data transforms on both in-memory data.frames and remote or large data systems (PostgreSQL, Spark/Hive, and so on). These transforms include operators such as pivot/un-pivot that were previously not conveniently available for these data sources (for example tidyr does not operate on such data, despite dplyr doing so).
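
For readers new to the terminology, the shape of an un-pivot (values moved to rows) can be sketched in a few lines of base R. This illustrates the transform itself, not the cdata API; the table and column names are invented for illustration:

```r
# A wide table: one row per id, one column per measurement.
wide <- data.frame(id = c(1, 2), x = c(10, 20), y = c(30, 40))

# Un-pivot by hand: each measurement column becomes a (key, value) pair,
# so values move to rows and the table gets taller.
tall <- data.frame(
  id    = rep(wide$id, times = 2),
  key   = rep(c("x", "y"), each = nrow(wide)),
  value = c(wide$x, wide$y)
)
tall  # 4 rows: one per (id, measurement) combination
```

cdata's moveValuesToRowsN() performs this kind of transform through DBI, which is why it also works on remote tables such as PostgreSQL or Spark/Hive.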

To help with the transition we have updated the existing documentation.

The fluid data document is a bit long, as it covers a lot of concepts quickly. We hope to develop more targeted training material going forward.

In summary: the cdata theory and package now allow very concise and powerful transformations of big data using R.


To leave a comment for the author, please follow the link and comment on their blog: R – Win-Vector Blog.

ShinyProxy 1.0.2

Sun, 11/12/2017 - 20:26

(This article was first published on Open Analytics, and kindly contributed to R-bloggers)

ShinyProxy is a novel, open source platform to deploy Shiny apps for the enterprise
or larger organizations. Since our last blog post, ten new
releases of ShinyProxy have seen the light of day, and with the 1.0.2 release it is time
to provide an overview of the lines of development and advances made.

Scalability

ShinyProxy now allows running thousands of Shiny apps concurrently on a Docker Swarm cluster.
Moreover, ShinyProxy will automatically detect whether the Docker API URL points to a
Docker Engine API or a Swarm cluster API. In other words, changing the back-end from
a single Docker host to a Docker Swarm is plug and play.

Single Sign-On

Complex deployments called for advanced identity and access management (IAM) functionality.
To tackle this we introduced a new authentication mechanism, authentication: keycloak,
which integrates ShinyProxy with Keycloak, RedHat’s open source IAM solution. Features like single sign-on, identity brokering, and user federation are now available for ShinyProxy
deployments.

Larger Applications and Networks

Oftentimes Shiny applications will be offered as part of larger applications that are
written in languages other than R. To enable this type of integration, we have introduced
functionality to entirely hide the ShinyProxy user interface elements, allowing seamless embedding
as views in bigger user interfaces.

Next to integration within other user interfaces, the underlying Shiny code may need to interact
with applications that live in specific networks. To make sure the Shiny app containers
have network interfaces configured for the right networks, a new docker-network configuration
parameter has been added to the app-specific configurations. Together with Docker volume mounting
for persistence, and the possibility to pass environment variables to Docker containers,
this gives Shiny developers lots of freedom to develop serious applications. An example configuration is given below: a Shiny app communicates over a dedicated Docker network db-net with a database back-end, and configuration information is made available to the Shiny app via environment variables that are
read from a configuration file db.env:

- name: db-enabled-app
  display-name: Shiny App with a Database Persistence Layer
  description: Shiny App connecting with a Database for Persistence
  docker-image: registry.openanalytics.eu/public/db-enabled-app:latest
  docker-network-connections: [ "db-net" ]
  docker-env-file: db.env
  groups: [db-app-users]

Usage Statistics

Gathering usage statistics has been part of ShinyProxy since version 0.6.0, but was so far limited
to an InfluxDB back-end. Customers asked us to integrate Shiny applications
with MonetDB (and did not want a separate database to store usage statistics), so we developed a MonetDB adapter for version 0.8.4. Configuration has been streamlined with a usage-stats-url, and support for DB credentials is now offered through usage-stats-username and usage-stats-password.
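
A configuration sketch for a MonetDB usage-stats back-end might look as follows. The JDBC URL and credentials are placeholders, and the exact nesting of the keys is an assumption here; check the documentation on shinyproxy.io for the authoritative layout:

```yaml
shiny:
  proxy:
    # assumed nesting; see shinyproxy.io for the authoritative layout
    usage-stats-url: jdbc:monetdb://localhost:50000/shinyproxy-usagestats
    usage-stats-username: stats-user
    usage-stats-password: stats-password
```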

Security

Proper security for ShinyProxy setups of all sizes is highly important, and a number
of improvements have been implemented. The ShinyProxy security page
has been extended, and extra content has been added on dealing
with sensitive configuration.
On the authentication side, LDAPS support has been around for a long time, but since release 1.0.0
we also offer LDAP+StartTLS support out of the box.

Deployment

Following production deployments for customers, we now also offer RPM files for deployment
on CentOS 7 and RHEL 7, in addition to the .deb packages for Ubuntu and the platform-independent
JAR files.

Further Information

For all these new features, detailed documentation is provided on http://shinyproxy.io and, as always, community support on this new release is available at

https://support.openanalytics.eu

Don’t hesitate to send in questions or suggestions and have fun with ShinyProxy!


To leave a comment for the author, please follow the link and comment on their blog: Open Analytics.

Creating integer64 and nanotime vectors in C++

Sat, 11/11/2017 - 01:00

(This article was first published on Rcpp Gallery, and kindly contributed to R-bloggers)

Motivation: More Precise Timestamps

R has excellent facilities for dealing with both dates and datetime objects.
For datetime objects, the POSIXt time type can be mapped to POSIXct and
its representation of fractional seconds since the January 1, 1970 “epoch” as
well as to the broken-out list representation in POSIXlt. Many add-on
packages use these facilities.

POSIXct uses a double to provide 53 bits of resolution. That is generally
good enough for timestamps down to just above a microsecond, and has served
the R community rather well.
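
The 53-bit limit is easy to make concrete. A double represents integers exactly only up to 2^53; at nanosecond resolution that covers only about 104 days, which is why a 64-bit integer is needed. A quick check in R (the numbers below are back-of-the-envelope, not from the original post):

```r
# Integers are exact in a double only up to 2^53.
max_exact <- 2^53

# As seconds since the epoch (POSIXct), that is ample...
max_exact / (365.25 * 86400)   # ~285 million years' worth of seconds

# ...but as nanoseconds it covers only about 104 days:
max_exact / 1e9 / 86400

# Past that point, adjacent nanosecond timestamps collide:
(2^53 + 1) == 2^53             # TRUE in double arithmetic
```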

But increasingly, time increments are measured in nanoseconds. Other languages use a (signed)
64-bit integer to represent (integer) nanoseconds since the epoch. A bit over a year ago I realized
that we can have this in R too—by combining the integer64 type in the
bit64 package by Jens Oehlschlaegel with the
CCTZ-based parser and formatter in my
RcppCCTZ package. And thus the
nanotime package was created.

Leonardo Silvestri then significantly enhanced
nanotime by redoing it as an S4 class.

A simple example:

library(nanotime)
n <- nanotime(42)
n
[1] "1970-01-01T00:00:00.000000042+00:00"

Here we used a single element with value 42, and created a nanotime vector from it—which is
taken to mean 42 nanoseconds since the epoch, or basically almost exactly January 1, 1970.

Step 1: Large Integer Types

So more recently I had a need to efficiently generate such an integer64 vector from int64_t data.
Both Leonardo and Dan helped with
initial discussion and tests. One can either use a reinterpret_cast<> or a straight memcpy as
the key trick in bit64 is to use the underlying 64-bit
double. So we have the space, we just need to ensure we copy the bits rather than their values.
This leads to the following function to create an integer64 vector for use in R at the C++ level:

#include <Rcpp.h>
#include <cstring>   // for std::memcpy

Rcpp::NumericVector makeInt64(std::vector<int64_t> v) {
    size_t len = v.size();
    Rcpp::NumericVector n(len);    // storage vehicle we return them in

    // transfers values 'keeping bits' but changing type
    // using reinterpret_cast would get us a warning
    std::memcpy(&(n[0]), &(v[0]), len * sizeof(double));
    n.attr("class") = "integer64";
    return n;
}

This uses the standard trick of setting a class attribute to set an S3 class. Now the values in
v will be returned to R (exactly how is treated below), and R will treat the vector as an integer64
object (provided the bit64 package has been loaded).

Step 2: Nanotime

A nanotime vector is created using an internal integer64 vector. So the previous function
almost gets us there. But we also need to set the S4 type correctly, which needed some extra work.
The following function does it:

#include <Rcpp.h>
#include <cstring>   // for std::memcpy

Rcpp::S4 makeNanotime(std::vector<int64_t> v) {
    size_t len = v.size();
    Rcpp::NumericVector n(len);    // storage vehicle we return them in

    // transfers values 'keeping bits' but changing type
    // using reinterpret_cast would get us a warning
    std::memcpy(&(n[0]), &(v[0]), len * sizeof(double));

    // do what needs to be done for the S4-ness: class, and .S3Class
    // this was based on careful reading of .Internal(inspect(nanotime(c(0,1))))
    Rcpp::CharacterVector cl = Rcpp::CharacterVector::create("nanotime");
    cl.attr("package") = "nanotime";
    n.attr(".S3Class") = "integer64";
    n.attr("class") = cl;
    SET_S4_OBJECT(n);
    return Rcpp::S4(n);
}

This creates a nanotime vector as a proper S4 object.

Step 3: Returning them to R via data.table

The astute reader will have noticed that neither function had an Rcpp::export tag. This is
because of the function argument: int64_t is not representable natively by R, which is why we
need a workaround. Matt Dowle has been very helpful in providing
excellent support for nanotime in data.table
(even after we, ahem, borked it by switching from S3 to S4). This support was of course relatively
straightforward because data.table already had
support for the underlying integer64, and we had the additional formatters etc.

#include <Rcpp.h>
#include <cstring>   // for std::memcpy

// Enable C++11 via this plugin (Rcpp 0.10.3 or later)
// [[Rcpp::plugins("cpp11")]]

Rcpp::NumericVector makeInt64(std::vector<int64_t> v) {
    size_t len = v.size();
    Rcpp::NumericVector n(len);    // storage vehicle we return them in

    // transfers values 'keeping bits' but changing type
    // using reinterpret_cast would get us a warning
    std::memcpy(&(n[0]), &(v[0]), len * sizeof(double));
    n.attr("class") = "integer64";
    return n;
}

Rcpp::S4 makeNanotime(std::vector<int64_t> v) {
    size_t len = v.size();
    Rcpp::NumericVector n(len);    // storage vehicle we return them in

    // transfers values 'keeping bits' but changing type
    // using reinterpret_cast would get us a warning
    std::memcpy(&(n[0]), &(v[0]), len * sizeof(double));

    // do what needs to be done for the S4-ness: class, and .S3Class
    // this was based on careful reading of .Internal(inspect(nanotime(c(0,1))))
    Rcpp::CharacterVector cl = Rcpp::CharacterVector::create("nanotime");
    cl.attr("package") = "nanotime";
    n.attr(".S3Class") = "integer64";
    n.attr("class") = cl;
    SET_S4_OBJECT(n);
    return Rcpp::S4(n);
}

// [[Rcpp::export]]
Rcpp::DataFrame getDT() {
    std::vector<int64_t> d  = { 1L, 1000L, 1000000L, 1000000000L };
    std::vector<int64_t> ns = { 1510442294123456789L, 1510442295123456789L,
                                1510442296123456789L, 1510442297123456789L };
    Rcpp::DataFrame df =
        Rcpp::DataFrame::create(Rcpp::Named("int64s") = makeInt64(d),
                                Rcpp::Named("nanos")  = makeNanotime(ns));
    df.attr("class") = Rcpp::CharacterVector::create("data.table", "data.frame");
    return df;
}

Example

The following example shows the output from the preceding function:

suppressMessages(library("data.table"))
dt <- getDT()
print(dt)
       int64s                               nanos
1:          1 2017-11-11T23:18:14.123456789+00:00
2:       1000 2017-11-11T23:18:15.123456789+00:00
3:    1000000 2017-11-11T23:18:16.123456789+00:00
4: 1000000000 2017-11-11T23:18:17.123456789+00:00
dt[[1]]
integer64
[1] 1          1000       1000000    1000000000
dt[[2]]
[1] "2017-11-11T23:18:14.123456789+00:00"
[2] "2017-11-11T23:18:15.123456789+00:00"
[3] "2017-11-11T23:18:16.123456789+00:00"
[4] "2017-11-11T23:18:17.123456789+00:00"
diff(dt[[2]])  # here 1e9 nanoseconds between them
integer64
[1] 1000000000 1000000000 1000000000


To leave a comment for the author, please follow the link and comment on their blog: Rcpp Gallery.

Stan Roundup, 10 November 2017

Fri, 11/10/2017 - 21:00

(This article was first published on R – Statistical Modeling, Causal Inference, and Social Science, and kindly contributed to R-bloggers)

We’re in the heart of the academic season and there’s a lot going on.

  • James Ramsey reported a critical performance regression bug in Stan 2.17 (this affects the latest CmdStan and PyStan, not the latest RStan). Sean Talts and Daniel Lee diagnosed the underlying problem as being with the change from char* to std::string arguments—you can’t pass char* and rely on the implicit std::string constructor without the penalty of memory allocation and copying. The reversion goes back to how things were before with const char* arguments. Ben Goodrich is working with Sean Talts to cherry-pick the performance regression fix to Stan that led to a very slow 2.17 release for the other interfaces. RStan 2.17 should be out soon, and it will be the last pre-C++11 release. We’ve already opened the C++11 floodgates on our development branches (yoo-hoo!).

  • Quentin F. Gronau, Henrik Singmann, E. J. Wagenmakers released the bridgesampling package in R. Check out the arXiv paper. It runs with output from Stan and JAGS.

  • Andrew Gelman and Bob Carpenter‘s proposal was approved by Coursera for a four-course introductory concentration on Bayesian statistics with Stan: 1. Bayesian Data Analysis (Andrew), 2. Markov Chain Monte Carlo (Bob), 3. Stan (Bob), 4. Multilevel Regression (Andrew). The plan is to finish the first two by late spring and the second two by the end of the summer in time for Fall 2018. Advait Rajagopal, an economics Ph.D. student at the New School is going to be leading the exercise writing, managing the Coursera platform, and will also TA the first few iterations. We’ve left open the option for us or others to add a prequel and sequel, 0. Probability Theory, and 5. Advanced Modeling in Stan.

  • Dan Simpson is in town and dropped a casual hint that order statistics would clean up the discretization and binning issues that Sean Talts and crew were having with the simulation-based algorithm testing framework (aka the Cook-Gelman-Rubin diagnostics). Lo-and-behold, it works. Michael Betancourt worked through all the math on our (chalk!) board and I think they are now ready to proceed with the paper and recommendations for coding in Stan. As I’ve commented before, one of my favorite parts of working on Stan is watching the progress on this kind of thing from the next desk.

  • Michael Betancourt tweeted about using Andrei Kascha‘s javascript-based vector field visualization tool for visualizing Hamiltonian trajectories and with multiple trajectories, the Hamiltonian flow. Richard McElreath provides a link to visualizations of the fields for light, normal, and heavy-tailed distributions. The Cauchy’s particularly hypnotic, especially with many fewer particles and velocity highlighting.

  • Krzysztof Sakrejda finished the fixes for standalone function generation in C++. This lets you generate a double- and int-only version of a Stan function for inclusion in R (or elsewhere). This will go into RStan 2.18.

  • Sebastian Weber reports that the Annals of Applied Statistics paper, Bayesian aggregation of average data: An application in drug development, was finally formally accepted after two years in process. I think Michael Betancourt, Aki Vehtari, Daniel Lee, and Andrew Gelman are co-authors.

  • Aki Vehtari posted a case study for review on extreme-value analysis and user-defined functions in Stan [forum link — please comment there].

  • Aki Vehtari, Andrew Gelman and Jonah Gabry have made a major revision of the Pareto smoothed importance sampling paper, with an improved algorithm, new Monte Carlo error and convergence rate results, and new experiments with varying sample sizes and different functions. The next loo package release will use the new version.

  • Bob Carpenter (it’s weird writing about myself in the third person) posted a case study for review on Lotka-Volterra predator-prey population dynamics [forum link — please comment there].

  • Sebastian and Sean Talts led us through the MPI design decisions about whether to go with our own MPI map-reduce abstraction or just build the parallel map function we’re going to implement in the Stan language. Pending further review from someone with more MPI experience, the plan’s to implement the function directly, then worry about generalizing when we have more than one function to implement.

  • Matt Hoffman (inventor of the original NUTS algorithm and co-founder of Stan) dropped in on the Stan meeting this week and let us know he’s got an upcoming paper generalizing Hamiltonian Monte Carlo sampling and that his team at Google’s working on probabilistic modeling for Tensorflow.

  • Mitzi Morris, Ben Goodrich, Sean Talts and I sat down and hammered out the services spec for running the generated quantities block of a Stan program over the draws from a previous sample. This will decouple the model fitting process from the posterior predictive inference process (because the generated quantities block generates a ỹ according to p(ỹ | θ), where ỹ is a vector of predictive quantities and θ is the vector of model parameters). Mitzi then finished the coding and testing and it should be merged soon. She and Ben Bales are working on getting it into CmdStan and Ben Goodrich doesn’t think it’ll be hard to add to RStan.

  • Mitzi Morris extended the spatial case study with leave-one-out cross-validation and WAIC comparisons of the simple Poisson model, a heterogeneous random effects model, a spatial random effects model, and a combined heterogeneous and spatial model with two different prior configurations. I’m not sure if she posted the updated version yet (no, because Aki is also in town and suggested checking Pareto khats, which said no).

  • Sean Talts split out some of the longer tests for less frequent application to get distribution testing time down to 1.5 hours to improve flow of pull requests.

  • Sean Talts is taking another one for the team by leading the charge to auto-format the C++ code base and then proceed with pre-commit autoformat hooks. I think we’re almost there after a spirited discussion of readability and our ability to assess it.

  • Sean Talts also added precompiled headers to our unit and integration tests. This is a worthwhile speedup when running lots of tests and part of the order of magnitude speedup Sean’s eked out.

ps. some edits made by Aki

The post Stan Roundup, 10 November 2017 appeared first on Statistical Modeling, Causal Inference, and Social Science.


To leave a comment for the author, please follow the link and comment on their blog: R – Statistical Modeling, Causal Inference, and Social Science.

.rprofile: Mara Averick

Fri, 11/10/2017 - 01:00

(This article was first published on rOpenSci - open tools for open science, and kindly contributed to R-bloggers)


Mara Averick is a non-profit data nerd, NBA stats junkie, and most recently, tidyverse developer advocate at RStudio. She is the voice behind two very popular Twitter accounts, @dataandme and @batpigandme. Mara and I discussed sports analytics, how attending a cool conference can change the approach to your career, and how she uses Twitter as a mechanism for self-imposed forced learning.

KO: What is your name, job title, and how long have you been using R? [Note: This interview took place in May 2017. Mara joined RStudio as their tidyverse developer advocate in November 2017.]

MA: My name is Mara Averick, I do consulting, data science, I just say “data nerd at large” because I’ve seen those Venn diagrams and I’m definitely not a data scientist. I used R in high school for fantasy basketball. I graduated from high school in 2003, and then in college used SPSS, and I didn’t use R for a long time. And then I was working with a company that does grant proposals for non-profits, doing all of the demand- and outcome-analysis and it all was in Excel and I thought, we could do better – R might also be helpful for this. It turns out there’s a package for American Community Survey data in R (acs), so that was how I got back into R.

KO: How did you find out about R when you first started using it in high school?

MA: I honestly don’t remember. I didn’t even use RStudio until two years ago. I think it was probably from other fantasy nerds?

KO: Is there an underground R fantasy basketball culture?

MA: Well R for fantasy football is legit. Fantasy Football Analytics is all R modeling.

KO: That’s awesome – so now, do you work with sports analytics? Or is that your personal project/passion?

MA: A little bit of both, I worked for this startup called Stattleship (@stattleship). Because I’ll get involved with anything if there’s a good pun involved… and so we were doing sports analytics work that kind of ended up shifting more in a marketing direction. I still do consulting with the head data scientist [Tanya Cashorali] for that [at TCB Analytics]. Some of the analysis/consulting will be with companies who are doing either consumer products for sports or data journalism stuff around sports analytics.

KO: How often do you use R now?

MA: Oh, I use R like every day. I use it… I don’t use Word any more. [Laughter] Yeah so one of the things about basketball is that there are times of the year where there are games every day. So that’s been my morning workflow for a while – scraping basketball data.

KO: So you get up every morning and scrape what’s new in Basketball?

MA: Yeah! So I end up in RStudio bright and early (often late, as well).

KO: So is that literally what the first half hour of your day looks like?

MA: No, so incidentally that’s kind of how this Twitter thing got started. My dog has long preceded me on Twitter and the internet at large, he’s kind of an internet famous dog @batpigandme. There’s an application called Buffer which allows you to schedule tweets and facebook page posts, which was most of Batpig’s traffic – facebook page visits from Japan. And so I had this morning routine (started in the winter when I had one of those light things you sit in front of for a certain number of minutes) where I would wake up and schedule batpig posts while I’m sitting there and read emails. And that ended up being a nice morning workflow thing.

I went to a Do Good Data conference, which is a Data Analysts for Social Good (@DA4SG) event, just over two years ago, and everyone there was giving out their twitter handles, and I was like, oh – maybe people who aren’t dogs also use Twitter? [Laughter] So that was how I ended up creating my own account @dataandme independent from Batpig.

KO: What happened after you went to this conference? Was it awesome, did it inspire you?

MA: Yeah so, I was the stats person at the company I was working at. And I didn’t realize there was all this really awesome work being done with really rigorous evaluation that wasn’t necessarily federal grant proposal stuff. So I was really inspired by that and started learning more about what other people were doing, some of it in R, some of it not. I kept in touch with some of the people from that conference. And then NBA Twitter is also a thing it turns out, and NBA, R/Statistics is also a really big thing so that was kind of what pulled me in. And it was really fun. A lot of interesting projects and people that I work with were all through that [Twitter] which still surprises me – that I can read a book and tell the author something and they care? It’s weird.

I like to make arbitrary rules for myself, one of the things is I don’t tweet stuff that I haven’t read.

KO: Everyone loves your twitter account. How do you find and curate the things you end up posting about?

MA: I like to make arbitrary rules for myself, one of the things is I don’t tweet stuff that I haven’t read. I like to learn new things and/or I have to learn new things every day so I basically started scheduling [tweets] as a way to make myself read the things that I want to read and get back to.

KO: Wait, so you schedule a tweet and then you’re like, okay well this is my deadline to read this thing – or I’ll be a liar.

MA: Totally.

KO: Whoa that’s awesome.

MA: I’ve also never not finished a book in my life. It’s one of my rules, I’m really strict about it.

KO: That’s a lot of pressure!

MA: So that was kind of how it started out – especially because I didn’t even know all the stuff I didn’t know. Then, as I’ve used R more and more, there’s stuff that I’ve just happened to read because I don’t know what I’m doing.

KO: The more you learn the more you can learn.

MA: Yeah so now a lot of the stuff [tweets] is stuff I end up reading over the course of the day and then add it [to the queue]. Or it’s just stuff I’ve already read when I feel like being lazy.

KO: Do you have side projects other than the basketball/sports stuff?

MA: I actually majored in science and technology studies, which means I was randomly trained in ethical/legal/social implications of science. So I’m working on some data ethics projects which unfortunately I can’t talk about. And then my big side project for total amusement was this D3.js in Action analysis of Archer which is a cartoon that I watch. But that’s also how I learned really how to use tidytext. So then I ended up doing a technical review for David [Robinson] and Julia’s [Silge] book Text Mining with R: A Tidy Approach. It was super fun. So yeah, I always have a bunch of random side projects going on.

KO: How is your work-life balance?

MA: It’s funny because I like what I do. So I don’t always know where that starts and ends. And I’m really bad at capitalism. It never occurs to me that I should be paid for doing some things. Especially if it involves open data and open source – surely you can’t charge for that? But I read a lot of stuff that’s not R too. I think I’m getting sort of a balance, but I’m not sure.

KO: Switching back to your job-job now. Are you on a team, are you remote, are you in an office, what are the logistics like?

MA: Kind of all of the above. In my old job I was on a team but I was the only person doing anything data related. And I developed some really lazy habits from that – really ugly code and committing terrible stuff to git. But with this NBA project I end up working with a lot of different people (who are also basketball-stat nerds).

KO: Do you work with people who are employed by the actual NBA teams, or just people who are really interested in the subject?

MA: No, so there is an unfortunate attrition of people whom I work with when they get hired by teams – which is not unfortunate, it’s awesome, but then they can no longer do anything with us. So that’s collaborative work but I don’t work on a team anymore.

KO: So you don’t have daily stand-ups or anything.

MA: No, no. I could probably benefit from that, but my goal is never to be 100% remote. After I went to that first data conference, I felt like being around all these people who are so much smarter than I am, and know so much more than I do is intimidating, but I also learned so much. And I learned so many things I was doing, not wrong, but inefficiently. I still learn about 80 things I’m doing inefficiently every day.

My goal right now – stop holding on to all of my projects that are not as done as I want them to be, and will never be done.

KO: Do you have set beginnings and endings to projects? How many projects are you juggling at a given time?

MA: After doing federal grant proposals, it doesn’t feel like anything is a deadline compared to that. They don’t care if your house burned down if it’s not in at the right time. So nothing feels as hard and fast as that. There are certain things like the NBA that —

KO: There are timely things.

MA: Yeah, and then sometimes we’ll just set arbitrary deadlines, just to kind of get out of a cycle of trying to perfect it, which I fall deeply into. Yeah so that’s kind of a little bit of my goal right now – stop holding on to all of my projects that are not as done as I want them to be, and will never be done. With the first iteration of this Archer thing I literally spent three days trying to get this faceted bar chart thing to sort in multiple ways and was super frustrated and then I tweeted something about it and immediately David Robinson responded with precisely what I needed and would have never figured out. So I’m working on doing that more. And also because it’s so helpful to me when other people do that.

KO: How did you get hooked up with Julia and David, just through Twitter?

MA: Yeah! So Julia I’d met at Open Vis Conf, David I’d read his blog about a million lines of bad code – it was open on my iPad for like years because I loved it so much, and still do. And yeah so again as this super random twitter-human that I feel like I am, I do end up meeting and doing things with cool people who are super smart and do really cool things.

KO: It’s impressive how much you post and not just that, but it’s really evident that you care. People can tell that this isn’t just someone who reposts a million things a day.

MA: I mean it’s totally selfish, don’t get me wrong. But I’m super glad that it’s helpful to other people too. It gives me so much anxiety to think that people might think I know how to do all the things that I post, which I don’t, that’s why I had to read them – but even when I read them, sometimes I don’t know. The R community is pretty awesome, at least the parts of it that I know; which is not universally true of any community of any group of scientists. R Twitter is super-super helpful. And that was evident really quickly, at least to me.

My plea to everyone who has a blog is to put their Twitter handle somewhere on it.

KO: What are some of your favorite things on the internet? Blogs, Twitter Accounts, Podcasts…

MA: I have never skipped one of Julia Silge’s blog posts. Her posts are always something that I know I should learn how to do. Both she and D-Rob [David Robinson] know their stuff and they write really well. So those are two blogs and follows that I love. Bob Rudis – almost daily, I can’t believe how quickly he churns stuff out. R-Bloggers is a great way to discover new stuff. Dr. Simon J [Simon Jackson] – I literally think of people by their twitter handles [@drsimonj], and there are so many others.

Every day I’m amazed by all the stuff I didn’t know existed. And also there’s stuff that people wrote three or four years ago. A lot of the data vis stuff I end up finding from weird angles. So those are some of my favorites – I’m sure there are more. Oh! Thomas Lin Pedersen, Data Imaginist is his blog. There are so many good blogs. My plea to everyone who has a blog is to put their Twitter handle somewhere on it. I actually try really hard to find attribution stuff. Every now and then I get it really wrong and it’ll be someone who has nothing to do with it but who has the same name. There’s a bikini model who has the same name as someone who I said wrote a thing – which I vetted, too! I was like, well she’s multi-faceted, good for her! And then somebody was like, I don’t think that’s the right one. Oops! I have to say that that’s the one thing that Medium nailed – when you click share it gives you their Twitter handle. If you have a blog, put your Twitter handle there so I don’t end up attributing it to a bikini model.


To leave a comment for the author, please follow the link and comment on their blog: rOpenSci - open tools for open science.

Gold-Mining – Week 10 (2017)

Fri, 11/10/2017 - 00:01

(This article was first published on R – Fantasy Football Analytics, and kindly contributed to R-bloggers)

Week 10 Gold Mining and Fantasy Football Projection Roundup now available. Go get that free agent gold!

The post Gold-Mining – Week 10 (2017) appeared first on Fantasy Football Analytics.


To leave a comment for the author, please follow the link and comment on their blog: R – Fantasy Football Analytics.

Recap: EARL Boston 2017

Thu, 11/09/2017 - 23:30

(This article was first published on Revolutions, and kindly contributed to R-bloggers)

By Emmanuel Awa, Francesca Lazzeri and Jaya Mathew, data scientists at Microsoft

A few of us got to attend the EARL conference in Boston last week, which brought together a group of talented R users from academia and industry. The conference highlighted various Enterprise Applications of R. Despite being a small conference, the quality of the talks was great, and they showcased various innovative ways of using some of the newer packages available in the R language. Some of the attendees were veteran R users while others were newcomers to the language, so there was a mix of proficiency levels.

R currently has a vibrant community of users, and there are over 11,000 open source packages. The conference also encouraged women to join their local R-Ladies chapter, with the aim of increasing the participation of women at R conferences and the number of women who contribute R packages to the open source community.

The team from Microsoft got to showcase some of our tools, namely Microsoft ML Server, and our commitment to supporting the open language R. Some of the sessions presented by Microsoft were:

  1. Deep Learning with R – Francesca Lazzeri
  2. Enriching your Customer profile at Scale using R Server – Jaya Mathew, Emmanuel Awa & Robert Alexander
  3. Developing Deep Learning Applications with CNTK – Ali Zaidi

Microsoft was a sponsor at the event and had a booth at the conference, where there was a live demo using the Cognitive Services APIs — namely the Face API — to detect age, gender, and facial expression.

In addition, some of the other interesting talks were:

  1. When and Why to Use Shiny for Commercial Applications – Tanya Cashorali
  2. HR Analytics: Using Machine Learning to Predict Employee Turnover – Matt Dancho
  3. Using R to Automate the Classification of E-commerce Products – Aidan Boland
  4. Leveraging More Data using Data Fusion in R – Michael Conklin

All the slides from the conference will be available at the conference website shortly. For photos from the conference, visit EARL’s twitter page.


To leave a comment for the author, please follow the link and comment on their blog: Revolutions.

Announcing “Introduction to the Tidyverse”, my new DataCamp course

Thu, 11/09/2017 - 19:00

(This article was first published on Variance Explained, and kindly contributed to R-bloggers)

For the last few years I’ve been encouraging a particular approach to R education, particularly teaching the dplyr and ggplot2 packages first and introducing real datasets early on. This week I’m excited to announce the next step: the release of Introduction to the Tidyverse, my new interactive course on the DataCamp platform.

The course is an introduction to the dplyr and ggplot2 packages through an analysis of the Gapminder dataset, enabling students to explore and visualize country statistics over time. It’s designed so that people can take it even if they have no previous experience in R, or if they’ve learned some (like in DataCamp’s free introduction) but aren’t familiar with dplyr, ggplot2, or how they fit together.

I’ve published two DataCamp courses before, Exploratory Data Analysis: Case Study (which makes a great followup to this new one) and Foundations of Probability. But I’m particularly excited about this one because the topic is so important to me. Here I’ll share a bit of my thinking behind the course and why we made the decisions we did.

How “Intro to the Tidyverse” started

In early July I was at the useR 2017 conference in Brussels (where I gave a talk on R’s growth as seen in Stack Overflow data). A lot of the attendees were experienced teachers, and a common theme in my conversations was about whether it made sense to teach tidyverse packages like dplyr and ggplot2 before teaching base R syntax.

.@minebocek agrees: teach tidyverse to beginners first #UseR2017 pic.twitter.com/vxjCjNrDz0

— David Robinson (@drob) July 5, 2017

These conversations encouraged me to publish Teach the tidyverse to beginners that week. But the most notable conversations I had were with Chester Ismay, who had recently joined DataCamp as a Curriculum Lead, and with the rest of their content team (like Nick Carchedi and Richie Cotton). Chester and I have a lot of alignment in our teaching philosophies, and we realized the DataCamp platform offers a great opportunity to try a tidyverse-first course at a large scale.

The months since have been an exciting process of planning, writing, and executing the course. I enjoyed building my first two DataCamp courses, but this was a particularly thrilling experience, because I grew to realize I’d been planning this course for a while, almost subconsciously. I filmed the videos in NYC in early October, and the course was released almost four months to the day after Chester and I first had the idea.

The curriculum

I realized while I was writing the “teach tidyverse first” post that while I had taught R to beginners with dplyr/ggplot2 about a dozen times in my career (a mix of graduate courses, seminars, and workshops), I hadn’t shared my curriculum in any standardized way.1 This means the conversation has always been a bit abstract. What exactly do I mean by teaching dplyr first, and when do other programming concepts get introduced along the way?

We put a lot of thought into the ordering of topics. DataCamp courses are divided into four chapters, each containing several videos and about 10-15 exercises.

  1. Data Wrangling. Learn to do three things with a table: filter for particular observations, arrange the observations in a desired order, and mutate to add or change a column. You’ll see how each of these steps lets you answer questions about your data.

  2. Data Visualization. Learn the essential skill of data visualization, using the ggplot2 package. Visualization and manipulation are often intertwined, so you’ll see how the dplyr and ggplot2 packages work closely together to create informative graphs.

  3. Grouping and summarizing. We may be interested in aggregations of the data, such as the average life expectancy of all countries within each year. Here you’ll learn to use the group_by and summarize verbs, which collapse large datasets into manageable summaries.

  4. Types of visualizations. Learn to create line plots, bar plots, histograms, and boxplots. You’ll see how each plot needs different kinds of data manipulation to prepare for it, and understand the different roles of each of these plot types in data analysis.
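Chapters 1 and 3 center on the core dplyr verbs; as a rough sketch of what that workflow looks like on the gapminder data (my own illustration, not an exercise from the course; it assumes the dplyr and gapminder packages are installed):

```r
library(dplyr)
library(gapminder)

# filter / arrange (chapter 1) and group_by / summarize (chapter 3)
by_continent <- gapminder %>%
  filter(year == 2007) %>%                      # keep one year of observations
  group_by(continent) %>%                       # one group per continent
  summarize(avg_life_exp = mean(lifeExp)) %>%   # collapse each group to a summary
  arrange(desc(avg_life_exp))                   # order by the new column

by_continent
```

Each verb takes a data frame and returns a data frame, which is what makes the pipeline above read as a sequence of steps.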

This ordering is certainly not the only way to teach R. But I like how it achieves a particular set of goals.

  • It not only introduces dplyr and ggplot2, but shows how they work together. This is the reason we alternated chapters in a dplyr-ggplot2-dplyr-ggplot2 order: to show how filtering, grouping, and summarizing data can feed directly into visualizations. This is one distinction between this course and the existing (excellent) dplyr and ggplot2 courses on DataCamp.
  • Get students doing powerful things quickly. This is a major theme of my tidyverse-first post and a sort of obsession of mine. The first exercise in the course introduces the gapminder dataset, discussing the data before writing a single line of code. And the last chapter in particular teaches students to create four different types of graphs, and shows how once you understand the grammar of graphics you can make a variety of visualizations.
  • Teach an approach that scales to real projects. There are hundreds of important topics students don’t learn in the course, ranging from matrices to lists to loops. But the particular skills they do learn aren’t toy examples or bad habits that need to be unlearned. I do use the functions and graphs taught in the course every day, and Julia Silge and I wrote a book using very similar principles.
  • Beginners don’t need any previous experience in R, or even in programming. We don’t assume familiarity even with fundamentals such as variable assignment (assignment is introduced at the start of chapter 2; until then, exploration is done interactively). It doesn’t hurt to have a course like Introduction to R under one’s belt first, but it’s not mandatory.

Incidentally, the course derives a lot of inspiration from the excellent book R for Data Science (R4DS), by Hadley Wickham and Garrett Grolemund. Most notably, R4DS also uses the gapminder dataset to teach dplyr (thanks to Jenny Bryan’s R package it’s a bit of a modern classic).2 I think the two resources complement each other: some people prefer learning from videos and interactive exercises rather than from books, and vice versa. Books have the advantage of space to go deeper (for instance, we don’t teach select, grouped mutates, or statistical transformations), while courses are useful for having a built-in self-evaluation mechanism. Be sure to check out this page for more resources on learning tidyverse tools.

What’s next

I’m excited about developing my fourth DataCamp course with Chester (continuing my probability curriculum). And I’m particularly interested in seeing how the course is received, and whether people who complete this course continue to succeed in their data science journey.

I have a lot of opinions about R education, but not a lot of data about it, and I’m considering this an experiment to see how the tidyverse-first approach works in a large-scale interactive course. I’m looking forward both to the explicit data that DataCamp can collect, and to hear feedback from students and other instructors. So I hope to hear what you think!

  1. The last online course I recorded for beginners, back in 2014, takes a very different philosophy than the one I use now, especially in the first chapter. 

  2. One of the differences is that we introduce the first dplyr operations before introducing ggplot2 (because it’s difficult to visualize gapminder data without filtering it first, while R4DS uses a different dataset to teach ggplot2). 


To leave a comment for the author, please follow the link and comment on their blog: Variance Explained.

R live class | R with Database and Big Data | Nov 21-22 Milan

Thu, 11/09/2017 - 16:43

(This article was first published on R blog | Quantide - R training & consulting, and kindly contributed to R-bloggers)

 

R with Database and Big Data is our fifth course of the autumn term. It takes place on November 21-22 in a location close to Milano Lima.
During this course you will see how to connect to databases from R and how to use dplyr with databases. You will then become familiar with the basic IT infrastructures behind big data, the R toolbox for accessing and manipulating big data structures, the SparkML libraries for out-of-memory data modeling, and ad hoc techniques for big data visualization. The course presents the latest techniques for working with big data within the R environment: manipulating, analyzing, and visualizing data structures that exceed a single computer's capacity, in a true R style.
No previous knowledge of big data technology is required, while a basic knowledge of R is necessary.
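As a flavor of the dplyr-with-databases material, a minimal self-contained sketch (my own illustration, not official course material; it assumes the DBI, RSQLite, dplyr, and dbplyr packages are installed):

```r
library(DBI)
library(dplyr)

# An in-memory SQLite database stands in for a real database server
con <- dbConnect(RSQLite::SQLite(), ":memory:")
dbWriteTable(con, "mtcars", mtcars)

# dplyr verbs on a tbl() are translated to SQL and run inside the database;
# collect() pulls the (small) result back into R
result <- tbl(con, "mtcars") %>%
  filter(cyl == 4) %>%
  summarise(avg_mpg = mean(mpg, na.rm = TRUE)) %>%
  collect()

dbDisconnect(con)
result
```

The appeal of this pattern is that the same dplyr code works whether the table lives in memory or in a remote database, with only the connection changing.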

R with Database and Big Data: Outlines

– Introduction to databases
– Connecting databases through R: ODBC and RSQLite
– Data manipulation with dplyr
– Using dplyr with databases
– Introduction to distributed infrastructure
– Spark and Hadoop
– Sparklyr
– Distributed data manipulation with dplyr
– SparkML

R with Database and Big Data is organized by the R training and consulting company Quantide and is taught in Italian, while all the course materials are in English.

This course is for max 6 attendees.

Location

The course location is 550 m (a 7-minute walk) from Milano central station and just 77 m (a 1-minute walk) from the Lima subway station.

Registration

If you want to reserve a seat, go to: FAQ, detailed program and tickets.

Other R courses | Autumn term

You can find an overview of all our courses here. Next dates will be:

  • November 29-30 – Professional R Programming. Organise, document and test your code: write efficient functions, improve code reproducibility and build R packages. Reserve now!

In case you are a group of people interested in more than one class, write us at training[at]quantide[dot]com! We can arrange together a tailor-made course, picking all the topics that are interesting for your organization and dropping the rest.

The post R live class | R with Database and Big Data | Nov 21-22 Milan appeared first on Quantide – R training & consulting.


To leave a comment for the author, please follow the link and comment on their blog: R blog | Quantide - R training & consulting.

How Happy is Your Country? — Happy Planet Index Visualized

Thu, 11/09/2017 - 15:00

The Happy Planet Index (HPI) is an index of human well-being and environmental impact that was introduced by NEF, a UK-based economic think tank promoting social, economic and environmental justice. It ranks 140 countries according to “what matters most — sustainable wellbeing for all”.

This is how HPI is calculated:

It tells us “how well nations are doing at achieving long, happy, sustainable lives”. The index is weighted to give progressively higher scores to nations with lower ecological footprints.

I downloaded the 2016 dataset from the HPI website. Inspired by “Web Scraping and Applied Clustering Global Happiness and Social Progress Index” by Dr. Mesfin Gebeyaw, I am interested in finding correlations among happiness, wealth, life expectancy, footprint and so on, and then putting these 140 countries into different clusters according to the above measures. I wonder whether the findings will surprise me.

Note: for those who want to see the results right now, I have created a Tableau story, which can be accessed here.

Load the packages

library(dplyr)
library(plotly)
library(stringr)
library(cluster)
library(FactoMineR)
library(factoextra)
library(ggplot2)
library(reshape2)
library(ggthemes)
library(NbClust)

Data Preprocessing

library(xlsx)
hpi <- read.xlsx('hpi-data-2016.xlsx', sheetIndex = 5, header = TRUE)
# Remove the unnecessary columns
hpi <- hpi[c(3:14)]
# Remove footer rows
hpi <- hpi[-c(141:158), ]
# Reorder and rename columns
hpi <- hpi[, c(grep('Country', colnames(hpi)),
               grep('Region', colnames(hpi)),
               grep('Happy.Planet.Index', colnames(hpi)),
               grep('Average.Life..Expectancy', colnames(hpi)),
               grep('Happy.Life.Years', colnames(hpi)),
               grep('Footprint..gha.capita.', colnames(hpi)),
               grep('GDP.capita...PPP.', colnames(hpi)),
               grep('Inequality.of.Outcomes', colnames(hpi)),
               grep('Average.Wellbeing..0.10.', colnames(hpi)),
               grep('Inequality.adjusted.Life.Expectancy', colnames(hpi)),
               grep('Inequality.adjusted.Wellbeing', colnames(hpi)),
               grep('Population', colnames(hpi)))]
names(hpi) <- c('country', 'region', 'hpi_index', 'life_expectancy', 'happy_years',
                'footprint', 'gdp', 'inequality_outcomes', 'wellbeing',
                'adj_life_expectancy', 'adj_wellbeing', 'population')
# Change data types
hpi$country <- as.character(hpi$country)
hpi$region <- as.character(hpi$region)

The structure of the data

str(hpi)

'data.frame': 140 obs. of 12 variables:
 $ country            : chr "Afghanistan" "Albania" "Algeria" "Argentina" ...
 $ region             : chr "Middle East and North Africa" "Post-communist" "Middle East and North Africa" "Americas" ...
 $ hpi_index          : num 20.2 36.8 33.3 35.2 25.7 ...
 $ life_expectancy    : num 59.7 77.3 74.3 75.9 74.4 ...
 $ happy_years        : num 12.4 34.4 30.5 40.2 24 ...
 $ footprint          : num 0.79 2.21 2.12 3.14 2.23 9.31 6.06 0.72 5.09 7.44 ...
 $ gdp                : num 691 4247 5584 14357 3566 ...
 $ inequality_outcomes: num 0.427 0.165 0.245 0.164 0.217 ...
 $ wellbeing          : num 3.8 5.5 5.6 6.5 4.3 7.2 7.4 4.7 5.7 6.9 ...
 $ adj_life_expectancy: num 38.3 69.7 60.5 68.3 66.9 ...
 $ adj_wellbeing      : num 3.39 5.1 5.2 6.03 3.75 ...
 $ population         : num 29726803 2900489 37439427 42095224 2978339 ...

The summary

summary(hpi[, 3:12])

   hpi_index     life_expectancy  happy_years      footprint
 Min.   :12.78   Min.   :48.91   Min.   : 8.97   Min.   : 0.610
 1st Qu.:21.21   1st Qu.:65.04   1st Qu.:18.69   1st Qu.: 1.425
 Median :26.29   Median :73.50   Median :29.40   Median : 2.680
 Mean   :26.41   Mean   :70.93   Mean   :30.25   Mean   : 3.258
 3rd Qu.:31.54   3rd Qu.:77.02   3rd Qu.:39.71   3rd Qu.: 4.482
 Max.   :44.71   Max.   :83.57   Max.   :59.32   Max.   :15.820

      gdp           inequality_outcomes   wellbeing     adj_life_expectancy
 Min.   :   244.2   Min.   :0.04322     Min.   :2.867   Min.   :27.32
 1st Qu.:  1628.1   1st Qu.:0.13353     1st Qu.:4.575   1st Qu.:48.21
 Median :  5691.1   Median :0.21174     Median :5.250   Median :63.41
 Mean   : 13911.1   Mean   :0.23291     Mean   :5.408   Mean   :60.34
 3rd Qu.: 15159.1   3rd Qu.:0.32932     3rd Qu.:6.225   3rd Qu.:72.57
 Max.   :105447.1   Max.   :0.50734     Max.   :7.800   Max.   :81.26

 adj_wellbeing     population
 Min.   :2.421   Min.   :2.475e+05
 1st Qu.:4.047   1st Qu.:4.248e+06
 Median :4.816   Median :1.065e+07
 Mean   :4.973   Mean   :4.801e+07
 3rd Qu.:5.704   3rd Qu.:3.343e+07
 Max.   :7.625   Max.   :1.351e+09

ggplot(hpi, aes(x = gdp, y = life_expectancy)) +
  geom_point(aes(size = population, color = region)) +
  coord_trans(x = 'log10') +
  geom_smooth(method = 'loess') +
  ggtitle('Life Expectancy and GDP per Capita in USD log10') +
  theme_classic()

Gives this plot:

After the log transformation, the relationship between GDP per capita and life expectancy is clearer and looks relatively strong. These two variables are correlated: the Pearson correlation between them is reasonably high, at approximately 0.62.

cor.test(hpi$gdp, hpi$life_expectancy)

	Pearson's product-moment correlation

data:  hpi$gdp and hpi$life_expectancy
t = 9.3042, df = 138, p-value = 2.766e-16
alternative hypothesis: true correlation is not equal to 0
95 percent confidence interval:
 0.5072215 0.7133067
sample estimates:
      cor
0.6208781

ggplot(hpi, aes(x = life_expectancy, y = hpi_index)) +
  geom_point(aes(size = population, color = region)) +
  geom_smooth(method = 'loess') +
  ggtitle('Life Expectancy and Happy Planet Index Score') +
  theme_classic()

Gives this plot:

Many countries in Europe and the Americas end up with middle-to-low HPI scores, probably because of their big carbon footprints, despite their long life expectancies.

ggplot(hpi, aes(x = gdp, y = hpi_index)) +
  geom_point(aes(size = population, color = region)) +
  geom_smooth(method = 'loess') +
  ggtitle('GDP per Capita(log10) and Happy Planet Index Score') +
  coord_trans(x = 'log10')

Gives this plot:

Money can’t buy happiness. The correlation between GDP per capita and the Happy Planet Index score is indeed very low, at about 0.11.

cor.test(hpi$gdp, hpi$hpi_index)

	Pearson's product-moment correlation

data:  hpi$gdp and hpi$hpi_index
t = 1.3507, df = 138, p-value = 0.179
alternative hypothesis: true correlation is not equal to 0
95 percent confidence interval:
 -0.05267424  0.27492060
sample estimates:
      cor
0.1142272

Scale the data

An important step for meaningful clustering is transforming the variables so that they have mean zero and standard deviation one.

hpi[, 3:12] <- scale(hpi[, 3:12])
summary(hpi[, 3:12])

   hpi_index        life_expectancy    happy_years        footprint
 Min.   :-1.86308   Min.   :-2.5153   Min.   :-1.60493   Min.   :-1.1493
 1st Qu.:-0.71120   1st Qu.:-0.6729   1st Qu.:-0.87191   1st Qu.:-0.7955
 Median :-0.01653   Median : 0.2939   Median :-0.06378   Median :-0.2507
 Mean   : 0.00000   Mean   : 0.0000   Mean   : 0.00000   Mean   : 0.0000
 3rd Qu.: 0.70106   3rd Qu.: 0.6968   3rd Qu.: 0.71388   3rd Qu.: 0.5317
 Max.   : 2.50110   Max.   : 1.4449   Max.   : 2.19247   Max.   : 5.4532

      gdp          inequality_outcomes   wellbeing       adj_life_expectancy
 Min.   :-0.6921   Min.   :-1.5692     Min.   :-2.2128   Min.   :-2.2192
 1st Qu.:-0.6220   1st Qu.:-0.8222     1st Qu.:-0.7252   1st Qu.:-0.8152
 Median :-0.4163   Median :-0.1751     Median :-0.1374   Median : 0.2060
 Mean   : 0.0000   Mean   : 0.0000     Mean   : 0.0000   Mean   : 0.0000
 3rd Qu.: 0.0632   3rd Qu.: 0.7976     3rd Qu.: 0.7116   3rd Qu.: 0.8221
 Max.   : 4.6356   Max.   : 2.2702     Max.   : 2.0831   Max.   : 1.4059

 adj_wellbeing       population
 Min.   :-2.1491   Min.   :-0.2990
 1st Qu.:-0.7795   1st Qu.:-0.2740
 Median :-0.1317   Median :-0.2339
 Mean   : 0.0000   Mean   : 0.0000
 3rd Qu.: 0.6162   3rd Qu.:-0.0913
 Max.   : 2.2339   Max.   : 8.1562

A simple correlation heatmap

qplot(x = Var1, y = Var2, data = melt(cor(hpi[, 3:12], use = "p")),
      fill = value, geom = "tile") +
  scale_fill_gradient2(limits = c(-1, 1)) +
  theme(axis.text.x = element_text(angle = 90, hjust = 1)) +
  labs(title = "Heatmap of Correlation Matrix", x = NULL, y = NULL)

Gives this plot:

Principal Component Analysis (PCA)

PCA is a procedure for identifying a smaller number of uncorrelated variables, called “principal components”, from a large set of data. The goal of principal component analysis is to explain the maximum amount of variance with the minimum number of principal components.

hpi.pca <- PCA(hpi[, 3:12], graph=FALSE)
print(hpi.pca)
**Results for the Principal Component Analysis (PCA)**
The analysis was performed on 140 individuals, described by 10 variables
*The results are available in the following objects:

   name               description                          
1  "$eig"             "eigenvalues"                        
2  "$var"             "results for the variables"          
3  "$var$coord"       "coord. for the variables"           
4  "$var$cor"         "correlations variables - dimensions"
5  "$var$cos2"        "cos2 for the variables"             
6  "$var$contrib"     "contributions of the variables"     
7  "$ind"             "results for the individuals"        
8  "$ind$coord"       "coord. for the individuals"         
9  "$ind$cos2"        "cos2 for the individuals"           
10 "$ind$contrib"     "contributions of the individuals"   
11 "$call"            "summary statistics"                 
12 "$call$centre"     "mean of the variables"              
13 "$call$ecart.type" "standard error of the variables"    
14 "$call$row.w"      "weights for the individuals"        
15 "$call$col.w"      "weights for the variables"          

eigenvalues <- hpi.pca$eig
head(eigenvalues)
       eigenvalue percentage of variance cumulative percentage of variance
comp 1 6.66741533             66.6741533                          66.67415
comp 2 1.31161290             13.1161290                          79.79028
comp 3 0.97036077              9.7036077                          89.49389
comp 4 0.70128270              7.0128270                          96.50672
comp 5 0.24150648              2.4150648                          98.92178
comp 6 0.05229306              0.5229306                          99.44471

Interpretation:
* The proportion of variation retained by the principal components was extracted above.
* An eigenvalue is the amount of variation retained by each PC; the first PC corresponds to the maximum amount of variation in the data set. In this case, the first two principal components are worth keeping under the eigenvalues-greater-than-one rule proposed by Kaiser (1960), a commonly used criterion for choosing the number of components.
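Kaiser's rule can be applied mechanically; a minimal sketch using the eigenvalues printed above:

```r
# Kaiser (1960): retain components whose eigenvalue exceeds 1
eig <- c(6.66741533, 1.31161290, 0.97036077, 0.70128270, 0.24150648, 0.05229306)
which(eig > 1)  # 1 2
```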

fviz_screeplot(hpi.pca, addlabels = TRUE, ylim = c(0, 65))

Gives this plot:

The scree plot shows which components explain most of the variability in the data. In this case, almost 80% of the variance in the data is retained by the first two principal components.
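The cumulative percentages can be recomputed from the eigenvalues: with ten standardized variables the total variance is 10, so each PC's share is its eigenvalue divided by 10. A quick check using the values printed above:

```r
# First two eigenvalues from hpi.pca$eig above; total variance is 10
eig <- c(6.66741533, 1.31161290)
round(cumsum(100 * eig / 10), 2)  # 66.67 79.79
```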

head(hpi.pca$var$contrib)
                        Dim.1       Dim.2        Dim.3      Dim.4       Dim.5
hpi_index            3.571216 50.96354921  5.368971166  2.1864830  5.28431372
life_expectancy     12.275001  2.29815687  0.002516184 18.4965447  0.31797242
happy_years         14.793710  0.01288175  0.027105103  0.7180341  0.03254368
footprint            9.021277 24.71161977  2.982449522  0.4891428  7.62967135
gdp                  9.688265 11.57381062  1.003632002  2.3980025 72.49799232
inequality_outcomes 13.363651  0.30494623  0.010038818  9.7957329  2.97699333

* Variables that are correlated with PC1 and PC2 are the most important in explaining the variability in the data set.
* The contribution of variables was extracted above: The larger the value of the contribution, the more the variable contributes to the component.
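For intuition, a variable's contribution to a component can be computed as its squared coordinate as a share of the sum of squared coordinates on that component, in percent, so each column sums to 100 (this is, to my understanding, how FactoMineR defines it). A sketch with hypothetical coordinates, not the HPI values:

```r
# Hypothetical variable coordinates (loadings) on one principal component
coord <- c(0.9, 0.8, 0.3)
contrib <- 100 * coord^2 / sum(coord^2)
round(contrib, 1)                               # 52.6 41.6  5.8
stopifnot(isTRUE(all.equal(sum(contrib), 100))) # contributions sum to 100
```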

fviz_pca_var(hpi.pca, col.var="contrib", gradient.cols = c("#00AFBB", "#E7B800", "#FC4E07"), repel = TRUE )

Gives this plot:

This highlights the most important variables in explaining the variations retained by the principal components.

Group countries by wealth, development, carbon emissions, and happiness

When using clustering algorithms, the number of clusters k must be specified in advance. I use the following method to help find the best k.

number <- NbClust(hpi[, 3:12], distance="euclidean", min.nc=2, max.nc=15,
                  method='ward.D', index='all', alphaBeale = 0.1)
*** : The Hubert index is a graphical method of determining the number of clusters.
      In the plot of Hubert index, we seek a significant knee that corresponds to a
      significant increase of the value of the measure i.e the significant peak in
      Hubert index second differences plot.
*** : The D index is a graphical method of determining the number of clusters.
      In the plot of D index, we seek a significant knee (the significant peak in
      Dindex second differences plot) that corresponds to a significant increase
      of the value of the measure.
*******************************************************************
* Among all indices:
* 4 proposed 2 as the best number of clusters
* 7 proposed 3 as the best number of clusters
* 1 proposed 5 as the best number of clusters
* 5 proposed 6 as the best number of clusters
* 3 proposed 10 as the best number of clusters
* 3 proposed 15 as the best number of clusters

***** Conclusion *****
* According to the majority rule, the best number of clusters is 3
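The "majority rule" conclusion is just a vote count over the indices; a minimal sketch of the tally, with the counts copied from the output above:

```r
# How many NbClust indices proposed each candidate k
votes <- c(`2` = 4, `3` = 7, `5` = 1, `6` = 5, `10` = 3, `15` = 3)
as.integer(names(which.max(votes)))  # 3
```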

I will apply K=3 in the following steps:

set.seed(2017)
pam <- pam(hpi[, 3:12], diss=FALSE, 3, keep.data=TRUE)
fviz_silhouette(pam)
  cluster size ave.sil.width
1       1   43          0.46
2       2   66          0.32
3       3   31          0.37

The table above shows the number of countries assigned to each cluster, along with the average silhouette width.

hpi$country[pam$id.med]
[1] "Liberia" "Romania" "Ireland"

This prints out the medoids: one typical country representing each cluster.

fviz_cluster(pam, stand = FALSE, geom = "point", ellipse.type = "norm")

Gives this plot:

It is always a good idea to visualize the cluster results and see how these three clusters were assigned.

A World map of three clusters

hpi['cluster'] <- as.factor(pam$clustering)

map <- map_data("world")
map <- left_join(map, hpi, by = c('region' = 'country'))

ggplot() +
  geom_polygon(data = map, aes(x = long, y = lat, group = group, fill=cluster, color=cluster)) +
  labs(title = "Clustering Happy Planet Index",
       subtitle = "Based on data from: http://happyplanetindex.org/",
       x=NULL, y=NULL) +
  theme_minimal()

Gives this plot:

Summary

The Happy Planet Index has been criticized for weighting the ecological footprint too heavily, and the ecological footprint itself is a controversial concept. In addition, the Happy Planet Index has been misunderstood as a measure of personal “happiness”, when in fact it is a measure of the “happiness” of the planet.

Nevertheless, the Happy Planet Index has been a consideration in the political arena. For us, it is useful because it combines well-being and environmental aspects, and it is simple and understandable. Also, the data are available online, so we can create a story out of them.

Source code that created this post can be found here. I am happy to hear any feedback or questions.

    Related Post

    1. Exploring, Clustering, and Mapping Toronto’s Crimes
    2. Spring Budget 2017: Circle Visualisation
    3. Qualitative Research in R
    4. Multi-Dimensional Reduction and Visualisation with t-SNE
    5. Comparing Trump and Clinton’s Facebook pages during the US presidential election, 2016

    Formal ways to compare forecasting models: Rolling windows

    Thu, 11/09/2017 - 01:49

    (This article was first published on R – insightR, and kindly contributed to R-bloggers)

    By Gabriel Vasconcelos

    Overview

    When working with time-series forecasting, we often have to choose between a few potential models, and the best way is to test each model in pseudo-out-of-sample estimations. In other words, we simulate a forecasting situation where we drop some data from the estimation sample to see how each model performs.

    Naturally, if you run only one (or just a few) forecasting tests, your results will have no robustness, and in the next forecast the results may change drastically. Another possibility is to estimate the model on, say, half of the sample and use the estimated model to forecast the other half. This is better than a single forecast, but it does not account for possible changes in the structure of the data over time, because you have only one estimation of the model. The most accurate way to compare models is using rolling windows. Suppose you have, for example, 200 observations of a time series. First you estimate the model with the first 100 observations to forecast observation 101. Then you include observation 101 in the estimation sample and estimate the model again to forecast observation 102. The process is repeated until you have a forecast for all 100 out-of-sample observations. This procedure is also called an expanding window. If instead you drop the first observation in each iteration, keeping the window size always the same, you have a fixed rolling window estimation. In the end you will have 100 forecasts for each model, and you can calculate the RMSE, the MAE, and formal tests such as Diebold & Mariano.
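The window indexing described above can be sketched directly; a toy illustration with 200 observations and an estimation window of 100, matching the example (index vectors only, no model fitting):

```r
n <- 200; w <- 100
# Expanding window: the estimation sample always starts at observation 1 and grows
expanding <- lapply(1:(n - w), function(i) 1:(w + i - 1))
# Fixed rolling window: the sample slides forward, keeping its size constant
fixed <- lapply(1:(n - w), function(i) i:(w + i - 1))
length(fixed)                     # 100 out-of-sample forecasts either way
unique(sapply(fixed, length))     # 100: the window size never changes
range(sapply(expanding, length))  # 100 199: the sample grows each step
```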

    In general, the fixed rolling window is better than the expanding window, as the following example illustrates. Suppose we have two models:


    Let’s assume that the true value of the extra parameter is zero. If we use expanding windows, asymptotic theory tells us that its estimate will go to zero and both models will become the same. If that is the case, we may be unable to distinguish which model forecasts more accurately. However, the first model is better than the second in small samples and just as good in large samples, and we should be able to identify this feature. Fixed rolling windows keep the sample size constant, so they are free from this problem conditional on the sample size. In this case, the Diebold & Mariano test becomes the Giacomini & White test.

    Application

    In this example we are going to use some inflation data from the AER package. First let’s have a look at the function embed. This function is very useful in this rolling window framework because we often include lags of variables in the models and the function embed creates all lags for us in a single line of code. Here is an example:

    library(AER)
    library(xts)
    library(foreach)
    library(reshape2)
    library(ggplot2)
    library(forecast)

    ## = embed = ##
    x1 = c(1, 2, 3, 4, 5)
    x2 = c(11, 12, 13, 14, 15)
    x = cbind(x1, x2)
    (x_embed = embed(x, 2))
    ##      [,1] [,2] [,3] [,4]
    ## [1,]    2   12    1   11
    ## [2,]    3   13    2   12
    ## [3,]    4   14    3   13
    ## [4,]    5   15    4   14

    As you can see, the first two columns show the variables x1 and x2 at lag 0, and the last two columns show the same variables with one lag. We lost one observation because of the lag operation.

    Now to the real example! We are going to estimate a model to forecast US inflation using four autoregressive terms (four lags of inflation), four lags of industrial production, and dummy variables for the months. The second model will be a simple random walk. I took the first log-difference of both variables (CPI and the industrial production index). The code below loads and prepares the data with the embed function.

    ## = Load Data = ##
    data("USMacroSWM")
    data = as.xts(USMacroSWM)[ , c("cpi", "production"), ]
    data = cbind(diff(log(data[ ,"cpi"])), diff(log(data[ ,"production"])))[-1, ]

    ## = Prep data with embed = ##
    lag = 4
    X = embed(data, lag + 1)
    X = as.data.frame(X)
    colnames(X) = paste(rep(c("inf", "prod"), lag + 1),
                        sort(rep(paste("l", 0:lag, sep = ""), 2)), sep = "")
    X$month = months(tail(index(data), nrow(X)))
    head(X)
    ##         infl0       prodl0        infl1       prodl1        infl2
    ## 1 0.005905082  0.000000000 -0.002275314  0.004085211  0.000000000
    ## 2 0.006770507 -0.005841138  0.005905082  0.000000000 -0.002275314
    ## 3 0.007618231  0.005841138  0.006770507 -0.005841138  0.005905082
    ## 4 0.019452426  0.007542827  0.007618231  0.005841138  0.006770507
    ## 5 0.003060112  0.009778623  0.019452426  0.007542827  0.007618231
    ## 6 0.006526018  0.013644328  0.003060112  0.009778623  0.019452426
    ##         prodl2        infl3       prodl3        infl4       prodl4
    ## 1 -0.008153802  0.017423641  0.005817352  0.006496543  0.005851392
    ## 2  0.004085211  0.000000000 -0.008153802  0.017423641  0.005817352
    ## 3  0.000000000 -0.002275314  0.004085211  0.000000000 -0.008153802
    ## 4 -0.005841138  0.005905082  0.000000000 -0.002275314  0.004085211
    ## 5  0.005841138  0.006770507 -0.005841138  0.005905082  0.000000000
    ## 6  0.007542827  0.007618231  0.005841138  0.006770507 -0.005841138
    ##       month
    ## 1      June
    ## 2      July
    ## 3    August
    ## 4 September
    ## 5   October
    ## 6  November

    The following code estimates 391 fixed rolling windows with a sample size of 300 in each window:

    # = Number of windows and window size = #
    w_size = 300
    n_windows = nrow(X) - 300

    # = Rolling Window Loop = #
    forecasts = foreach(i = 1:n_windows, .combine = rbind) %do% {

      # = Select data for the window (in and out-of-sample) = #
      X_in = X[i:(w_size + i - 1), ]  # = change to X[1:(w_size + i - 1), ] for expanding window
      X_out = X[w_size + i, ]

      # = Regression Model = #
      m1 = lm(infl0 ~ . - prodl0, data = X_in)
      f1 = predict(m1, X_out)

      # = Random Walk = #
      f2 = tail(X_in$infl0, 1)

      return(c(f1, f2))
    }

    Finally, the remaining code calculates the forecasting errors, the RMSE across the rolling windows, and the Giacomini & White test. As you can see, the test rejects the null hypothesis that both models are equally accurate, and the RMSE is smaller for the model with lags, production, and dummies.

    # = Calculate and plot errors = #
    e1 = tail(X[ ,"infl0"], nrow(forecasts)) - forecasts[ ,1]
    e2 = tail(X[ ,"infl0"], nrow(forecasts)) - forecasts[ ,2]
    df = data.frame("date" = tail(as.Date(index(data)), n_windows),
                    "Regression" = e1, "RandomWalk" = e2)
    mdf = melt(df, id.vars = "date")
    ggplot(data = mdf) +
      geom_line(aes(x = date, y = value, linetype = variable, color = variable))

    # = RMSE = #
    (rmse1 = 1000 * sqrt(mean(e1 ^ 2)))
    ## [1] 2.400037
    (rmse2 = 1000 * sqrt(mean(e2 ^ 2)))
    ## [1] 2.62445

    # = DM test = #
    (dm = dm.test(e1, e2))
    ##
    ##  Diebold-Mariano Test
    ##
    ## data:  e1e2
    ## DM = -1.977, Forecast horizon = 1, Loss function power = 2,
    ## p-value = 0.04874
    ## alternative hypothesis: two.sided

    References

    Diebold, Francis X., and Robert S. Mariano. “Comparing predictive accuracy.” Journal of Business & Economic Statistics 20.1 (2002): 134-144.

    Giacomini, Raffaella, and Halbert White. “Tests of conditional predictive ability.” Econometrica 74.6 (2006): 1545-1578.


    To leave a comment for the author, please follow the link and comment on their blog: R – insightR.

    Introduction to Visualizing Asset Returns

    Thu, 11/09/2017 - 01:00

    (This article was first published on R Views, and kindly contributed to R-bloggers)

    In a previous post, we reviewed how to import daily prices, build a portfolio, and calculate portfolio returns. Today, we will visualize the returns of our individual assets that ultimately get mashed into a portfolio. The motivation here is to make sure we have scrutinized our assets before they get into our portfolio, because once the portfolio has been constructed, it is tempting to keep the analysis at the portfolio level.

    By way of a quick reminder, our ultimate portfolio consists of the following.

    + SPY (S&P 500 fund) weighted 25%
    + EFA (a non-US equities fund) weighted 25%
    + IJS (a small-cap value fund) weighted 20%
    + EEM (an emerging-mkts fund) weighted 20%
    + AGG (a bond fund) weighted 10%
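As a quick sanity check, the weights above should sum to one; a minimal sketch:

```r
# Portfolio weights from the list above
w <- c(SPY = 0.25, EFA = 0.25, IJS = 0.20, EEM = 0.20, AGG = 0.10)
sum(w)  # prints 1
```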

    Let’s load up our packages.

    library(tidyverse)
    library(tidyquant)
    library(timetk)
    library(tibbletime)
    library(highcharter)

    To get our objects into the global environment, we use the next code chunk, which should look familiar from the previous post: we will create one xts object and one tibble, in long/tidy format, of monthly log returns.

    # The symbols vector holds our tickers.
    symbols <- c("SPY", "EFA", "IJS", "EEM", "AGG")

    prices <-
      getSymbols(symbols, src = 'yahoo', from = "2005-01-01",
                 auto.assign = TRUE, warnings = FALSE) %>%
      map(~Ad(get(.))) %>%
      reduce(merge) %>%
      `colnames<-`(symbols)

    # XTS method
    prices_monthly <- to.monthly(prices, indexAt = "last", OHLC = FALSE)
    asset_returns_xts <- na.omit(Return.calculate(prices_monthly, method = "log"))

    # Tidyverse method, to long, tidy format
    asset_returns_long <-
      prices %>%
      to.monthly(indexAt = "last", OHLC = FALSE) %>%
      tk_tbl(preserve_index = TRUE, rename_index = "date") %>%
      gather(asset, returns, -date) %>%
      group_by(asset) %>%
      mutate(returns = (log(returns) - log(lag(returns))))

    We now have two objects holding monthly log returns, asset_returns_xts and asset_returns_long. First, let’s use highcharter to visualize the xts formatted returns.

    Highcharter is fantastic for visualizing a time series or many time series. First, we set highchart(type = "stock") to get a nice time series line. Then we add each of our series to the highcharter code flow. In this case, we’ll add our columns from the xts object.

    highchart(type = "stock") %>%
      hc_title(text = "Monthly Log Returns") %>%
      hc_add_series(asset_returns_xts$SPY, name = names(asset_returns_xts$SPY)) %>%
      hc_add_series(asset_returns_xts$EFA, name = names(asset_returns_xts$EFA)) %>%
      hc_add_series(asset_returns_xts$IJS, name = names(asset_returns_xts$IJS)) %>%
      hc_add_theme(hc_theme_flat()) %>%
      hc_navigator(enabled = FALSE) %>%
      hc_scrollbar(enabled = FALSE)


    (Interactive highcharter output: a stock-style time-series chart titled "Monthly Log Returns" showing the SPY, EFA, and IJS series, rendered with the flat theme and with the navigator and scrollbar disabled.)

    Take a look at the chart. It has a line for the monthly log returns of three of our ETFs (and in my opinion, it’s already starting to get crowded). We might be able to pull some useful intuition from this chart. Perhaps one of our ETFs remained stable during the 2008 financial crisis, or had an era of consistently negative/positive returns. Highcharter is great for plotting time series line charts.

    Highcharter does have the capacity for histogram making. One method is to first call the base function hist on the data with the arguments breaks and plot = FALSE, then call hchart on the resulting object.


    hc_spy <- hist(asset_returns_xts$SPY, breaks = 50, plot = FALSE)

    hchart(hc_spy) %>%
      hc_title(text = "SPY Log Returns Distribution")
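As an aside, it can help to see what hist(..., plot = FALSE) actually hands to hchart: a list of class "histogram" holding the break points, bin counts, and bin midpoints. A minimal sketch with simulated returns (asset_returns_xts$SPY was built earlier in this series, so a made-up stand-in is used here):

```r
# Stand-in for a monthly log-return series (assumption: we do not
# recreate asset_returns_xts from the earlier posts in this series)
set.seed(1)
fake_returns <- rnorm(150, mean = 0.005, sd = 0.04)

h <- hist(fake_returns, breaks = 50, plot = FALSE)

class(h)  # "histogram" -- this is what hchart() dispatches on
str(h[c("breaks", "counts", "mids")])

# Every observation falls in exactly one bin, and there is one
# count per pair of adjacent breaks
sum(h$counts)                              # 150
length(h$breaks) == length(h$counts) + 1   # TRUE
```

Because hchart has a method for objects of class "histogram", it can translate these breaks and counts directly into a column series.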

    [Interactive highcharter histogram (htmlwidget data omitted): SPY Log Returns Distribution]

    Nothing is wrong with that chart: it shows us the distribution of SPY returns. However, highcharter lacks an easy way to chart multiple histograms together, or to overlay density lines on them. The functionality is fine for one set of returns, but here we want to see the distributions of all of our returns series together.

    For that, we will head to the tidyverse and use ggplot2 on our tidy tibble called asset_returns_long. Because it is in long, tidy format, grouped by the asset column, we can chart the asset histograms collectively on one plot.
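For reference, the long, tidy shape assumed here looks like the following (a base-R sketch with made-up numbers; the real asset_returns_long was built from the xts returns object earlier in this series). "Long" means one row per asset-date observation, with an asset column we can map to fill/colour and facet on, rather than one column per asset:

```r
# Wide format: one column per asset (made-up returns)
wide <- data.frame(
  date = as.Date(c("2017-01-31", "2017-02-28")),
  SPY  = c(0.017, 0.037),
  EFA  = c(0.032, 0.012),
  IJS  = c(-0.012, 0.013)
)

# Long, tidy format: stack the three asset columns into a single
# 'returns' column, labelled by a new 'asset' column
long <- reshape(wide, direction = "long",
                varying = c("SPY", "EFA", "IJS"), v.names = "returns",
                times   = c("SPY", "EFA", "IJS"), timevar = "asset",
                idvar   = "date")
long <- long[order(long$asset, long$date), c("date", "asset", "returns")]

nrow(long)  # 6: 2 dates x 3 assets
```

(In the original series this reshaping is done with tidyr, but the resulting shape is the same.)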

    # Center all plot titles in the upcoming ggplots
    theme_update(plot.title = element_text(hjust = 0.5))

    asset_returns_long %>%
      ggplot(aes(x = returns, fill = asset)) +
      geom_histogram(alpha = 0.25, binwidth = .01)

    Let’s use facet_wrap(~asset) to break these out by asset. We can add a title with ggtitle.

    asset_returns_long %>%
      ggplot(aes(x = returns, fill = asset)) +
      geom_histogram(alpha = 0.25, binwidth = .01) +
      facet_wrap(~asset) +
      ggtitle("Monthly Returns Since 2005")

    Maybe we don’t want to use a histogram, but instead want to use a density line to visualize the various distributions. We can use the stat_density(geom = "line", alpha = 1) function to do this. The alpha argument sets the transparency of the line (1 is fully opaque), not its thickness. Let’s also add labels to the x and y axes with the xlab and ylab functions.

    asset_returns_long %>%
      ggplot(aes(x = returns, colour = asset, fill = asset)) +
      stat_density(geom = "line", alpha = 1) +
      ggtitle("Monthly Returns Since 2005") +
      xlab("monthly returns") +
      ylab("distribution")

    That chart is quite digestible, but we can also facet_wrap(~asset) to break the densities out into individual charts.

    asset_returns_long %>%
      ggplot(aes(x = returns, colour = asset, fill = asset)) +
      stat_density(geom = "line", alpha = 1) +
      facet_wrap(~asset) +
      ggtitle("Monthly Returns Since 2005") +
      xlab("monthly returns") +
      ylab("distribution")

    Now we can combine all of our ggplots into one nice, faceted plot.

    At the same time, to add to the aesthetic toolkit a bit, we will do some editing to the label colors. First off, let’s choose a color besides black for the theme. I will go with cornflower blue, because it’s a nice shade and I don’t see it used very frequently elsewhere. Once we have a color, we can choose the different elements of the chart to change in the theme function. I make a lot of changes here by way of example, but feel free to comment out a few of those lines and see the different options.

    asset_returns_long %>%
      ggplot(aes(x = returns, colour = asset, fill = asset)) +
      stat_density(geom = "line", alpha = 1) +
      geom_histogram(alpha = 0.25, binwidth = .01) +
      facet_wrap(~asset) +
      ggtitle("Monthly Returns Since 2005") +
      xlab("monthly returns") +
      ylab("distribution") +
      # Lots of elements can be customized in the theme() function
      theme(plot.title = element_text(colour = "cornflowerblue"),
            strip.text.x = element_text(size = 8, colour = "white"),
            strip.background = element_rect(colour = "white", fill = "cornflowerblue"),
            axis.text.x = element_text(colour = "cornflowerblue"),
            axis.text = element_text(colour = "cornflowerblue"),
            axis.ticks.x = element_line(colour = "cornflowerblue"),
            axis.text.y = element_text(colour = "cornflowerblue"),
            axis.ticks.y = element_line(colour = "cornflowerblue"),
            axis.title = element_text(colour = "cornflowerblue"),
            legend.title = element_text(colour = "cornflowerblue"),
            legend.text = element_text(colour = "cornflowerblue"))

    We now have one chart, with histograms and line densities broken out for each of our assets. This would scale nicely if we had more assets and wanted to peek at more distributions of returns.

    We have not done any substantive work today, but the chart of monthly returns is a tool to quickly glance at the data and see if anything unusual jumps out, or some sort of hypothesis comes to mind. We are going to be combining these assets into a portfolio and, once that occurs, we will rarely view the assets in isolation again. Before that leap to portfolio building, it’s a good idea to glance at the portfolio component distributions.

    That’s all for today. Thanks for reading!


    To leave a comment for the author, please follow the link and comment on their blog: R Views.

    bridgesampling [R package]

    Thu, 11/09/2017 - 00:17

    (This article was first published on R – Xi'an's Og, and kindly contributed to R-bloggers)

    Quentin F. Gronau, Henrik Singmann and Eric-Jan Wagenmakers have arXived detailed documentation for their bridgesampling R package. (No wonder that researchers from Amsterdam favour bridge sampling!) [The package relates to a [52-page] tutorial on bridge sampling by Gronau et al. that I will hopefully comment on soon.] The bridge sampling methodology for marginal likelihood approximation requires two Monte Carlo samples for a ratio of two integrals. A nice twist in this approach is to use a dummy integral that is already available, with respect to a probability density that approximates the exact posterior. This avoids the usual difficulty with bridge sampling of bridging two different parameter spaces, possibly of different dimensions, with potentially very little overlap between the posterior distributions. The substitute probability density is chosen as a Normal or a warped Normal, rather than a t, which would in my opinion provide more stability. The bridgesampling package also provides an error evaluation for the approximation, although based on spectral estimates derived from the coda package. The remainder of the document shows how the package can be used in conjunction with either JAGS or Stan, and concludes with the following words of caution:
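    The package's generic interface makes the "dummy integral" idea concrete: you supply posterior draws, an unnormalized log posterior, and parameter bounds, and the package builds the Normal (or warped Normal) proposal itself. A minimal sketch on a toy conjugate model (the data values and the model are invented for illustration; the `bridge_sampler()` signature follows the package documentation):

    ```r
    library(bridgesampling)

    set.seed(1)
    # Toy model: y_i ~ N(theta, 1), theta ~ N(0, 1); the posterior is conjugate,
    # so we can draw from it directly instead of running JAGS or Stan.
    y <- c(0.2, -0.5, 1.1, 0.4)
    post_var  <- 1 / (length(y) + 1)
    post_mean <- sum(y) * post_var
    samples <- matrix(rnorm(5000, post_mean, sqrt(post_var)),
                      ncol = 1, dimnames = list(NULL, "theta"))

    # Unnormalized log posterior: log likelihood plus log prior
    log_posterior <- function(pars, data) {
      sum(dnorm(data$y, pars["theta"], 1, log = TRUE)) +
        dnorm(pars["theta"], 0, 1, log = TRUE)
    }

    bridge <- bridge_sampler(samples = samples, log_posterior = log_posterior,
                             data = list(y = y),
                             lb = c(theta = -Inf), ub = c(theta = Inf))
    bridge$logml            # estimated log marginal likelihood
    error_measures(bridge)  # the coda-based error evaluation mentioned above
    ```

    For this toy model the marginal likelihood is also available in closed form (y is jointly Normal), which gives a convenient check on the estimate.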

    “It should also be kept in mind that there may be cases in which the bridge sampling procedure may not be the ideal choice for conducting Bayesian model comparisons. For instance, when the models are nested it might be faster and easier to use the Savage-Dickey density ratio (Dickey and Lientz 1970; Wagenmakers et al. 2010). Another example is when the comparison of interest concerns a very large model space, and a separate bridge sampling based computation of marginal likelihoods may take too much time. In this scenario, Reversible Jump MCMC (Green 1995) may be more appropriate.”

    Filed under: pictures, R, Statistics, University life Tagged: Amsterdam, bridge, bridge sampling, bridgesampling, JAGS, R, R package, STAN, University of Amsterdam, warped bridge sampling


    To leave a comment for the author, please follow the link and comment on their blog: R – Xi'an's Og. R-bloggers.com offers daily e-mail updates about R news and tutorials on topics such as: Data science, Big Data, R jobs, visualization (ggplot2, Boxplots, maps, animation), programming (RStudio, Sweave, LaTeX, SQL, Eclipse, git, hadoop, Web Scraping) statistics (regression, PCA, time series, trading) and more...

    Calculating the house edge of a slot machine, with R

    Wed, 11/08/2017 - 23:32

    (This article was first published on Revolutions, and kindly contributed to R-bloggers)

    Modern slot machines (fruit machines, pokies, or whatever those electronic gambling devices are called in your part of the world) are designed to be addictive. They're also usually quite complicated, with a bunch of features that affect the payout of a spin: multiple symbols with different pay scales, wildcards, scatter symbols, free spins, jackpots … the list goes on. Many machines also let you play multiple combinations at the same time (20 lines, or 80, or even more with just one spin). All of this complexity is designed to make it hard for you, the player, to judge the real odds of success. But rest assured: in the long run, you always lose.

    All slot machines are designed to have a "house edge" — the percentage of player bets retained by the machine in the long run — greater than zero. Some may take 1% of each bet (over a long-run average); some may take as much as 15%. But every slot machine takes something.

    That being said, with all those complex rules and features, calculating the house edge, even when you know all of the underlying probabilities and frequencies, is no easy task. Giora Simchoni demonstrates the problem with an R script that calculates the house edge of an "open source" slot machine, Atkins Diet. Click the image below to try it out.

    This virtual machine is at a typical level of complexity of modern slot machines. Even though we know the pay table (which is always public) and the relative frequency of the symbols on the reels (which usually isn't), calculating the house edge for this machine requires several pages of code. You could calculate the expected return analytically, of course, but it turns out to be a somewhat error-prone combinatorial problem. The simplest approach is to simulate playing the machine 100,000 times or so. Then we can have a look at the distribution of the payouts over all of these spins:
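    The simulation idea itself can be sketched on a much simpler toy machine. To be clear, this is not the Atkins Diet pay table: the symbols, reel frequencies and payouts below are invented purely to show the mechanics of estimating the house edge by simulation.

    ```r
    set.seed(42)
    # Toy single-line machine: three reels, three symbols, made-up frequencies
    symbols <- c("cherry", "bar", "seven")
    probs   <- c(0.70, 0.25, 0.05)                 # assumed reel frequencies
    payout  <- c(cherry = 2, bar = 5, seven = 50)  # payout if all three match

    spin <- function() {
      reels <- sample(symbols, 3, replace = TRUE, prob = probs)
      if (length(unique(reels)) == 1) payout[[reels[1]]] else 0
    }

    n <- 100000
    wins <- replicate(n, spin())        # simulated payouts, bet = 1 per spin
    house_edge <- 1 - mean(wins)        # fraction of each bet kept by the house
    house_edge
    ```

    For this toy machine the analytic answer is easy to check: the expected payout is 2(0.7)^3 + 5(0.25)^3 + 50(0.05)^3 ≈ 0.77, so the house edge is about 23%. The real machine requires the same logic applied to a far larger set of winning combinations, which is why simulation is the simplest route.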

    The x axis here is log(Total Wins + 1), in log-dollars, from a single spin. It's interesting to see the impact of the bet size (which increases variance but doesn't change the distribution), and the number of lines played. Playing one 20-line game isn't the same as playing 20 1-line games, because the re-use of the symbols means multi-line wins are not independent: a high-value symbol (like a wild) may contribute to wins on multiple lines. Conversely, losing combinations have a tendency to cluster together, too. It all balances in the end, but the possibility of more frequent wins (coupled with higher-value losses) is apparently appealing to players, since many machines encourage multi-line play.

    Nonetheless, however you play, the house edge is always positive. For Atkins Diet, it's about 4% for single-line play, and about 3.2% for multi-line play. You can see the details of the calculation, and the complete R code behind it, at the link below.

    Giora Simchoni: Don't Drink and Gamble (via the author)


    To leave a comment for the author, please follow the link and comment on their blog: Revolutions. R-bloggers.com offers daily e-mail updates about R news and tutorials on topics such as: Data science, Big Data, R jobs, visualization (ggplot2, Boxplots, maps, animation), programming (RStudio, Sweave, LaTeX, SQL, Eclipse, git, hadoop, Web Scraping) statistics (regression, PCA, time series, trading) and more...

    Creating Reporting Template with Glue in R

    Wed, 11/08/2017 - 15:00

    Report generation is a very important part of any organization's Business Intelligence and Analytics division. The ability to create automated reports from the given data is one of the most desirable things that any innovative team would strive for. And that is one area where SAS is considered more mature than R – not because R does not have those features, but primarily because R practitioners are not familiar with them. That's the feeling I had today when I stumbled upon the package glue in R, which is a very good and competitive alternative to reporting-template packages like whisker and brew.

    The package can be installed directly from CRAN.

    install.packages('glue')

    Let us try to put together a very minimal reporting template to output basic information about the given Dataset.

    library(glue)

    df <- mtcars
    msg <- 'Dataframe Info: \n\n This dataset has {nrow(df)} rows and {ncol(df)} columns. \n There {ifelse(sum(is.na(df))==1,"is","are")} {sum(is.na(df))} Missing Values.'
    glue(msg)

    Dataframe Info: 

     This dataset has 32 rows and 11 columns. 
     There are 0 Missing Values.

    As the above code shows, glue() is the primary function: it takes a string containing R expressions enclosed in curly braces {}, and the resulting values are interpolated into the string. Creating the templatised string is what we have done with msg. This whole exercise wouldn't make much sense if it were needed for only one dataset; it serves its purpose when the same code is reused for different datasets with no code change. So let us try running this on a different dataset – R's inbuilt iris dataset. Also, since we are outputting the count of missing values, let's manually assign NA to two cells and run the code.
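    One way to make that reuse explicit is to wrap the template in a helper function (the name df_info below is hypothetical, not part of glue), so the same report runs on any data frame with one call; this version also handles the singular/plural agreement for exactly one missing value:

    ```r
    library(glue)

    # Hypothetical helper: summarise any data frame with the same template
    df_info <- function(df) {
      n_miss <- sum(is.na(df))
      glue("Dataframe Info:\n\n",
           "This dataset has {nrow(df)} rows and {ncol(df)} columns.\n",
           "There {ifelse(n_miss == 1, 'is', 'are')} {n_miss} ",
           "missing value{ifelse(n_miss == 1, '', 's')}.")
    }

    df_info(mtcars)   # 32 rows, 11 columns, 0 missing values
    df_info(airquality)
    ```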

    df <- iris
    df[2,3] <- NA
    df[4,2] <- NA
    msg <- 'Dataframe Info: \n\n This dataset has {nrow(df)} rows and {ncol(df)} columns. \n There {ifelse(sum(is.na(df))==1,"is","are")} {sum(is.na(df))} Missing Values.'
    glue(msg)

    Dataframe Info: 

     This dataset has 150 rows and 5 columns. 
     There are 2 Missing Values.

    That looks fine. But what if we want to report the contents of the dataframe. That’s where coupling glue's glue_data() function with magrittr's %>% operator helps.

    library(magrittr)

    head(mtcars) %>%
      glue_data("* {rownames(.)} has {cyl} cylinders and {hp} hp")

    * Mazda RX4 has 6 cylinders and 110 hp
    * Mazda RX4 Wag has 6 cylinders and 110 hp
    * Datsun 710 has 4 cylinders and 93 hp
    * Hornet 4 Drive has 6 cylinders and 110 hp
    * Hornet Sportabout has 8 cylinders and 175 hp
    * Valiant has 6 cylinders and 105 hp

    This is just to introduce glue and its possibilities. It could potentially help in automating a lot of reports, and also in starting exception-based reporting. The code used in the article can be found on my github.

      Related Post

      1. Predict Employee Turnover With Python
      2. Making a Shiny dashboard using ‘highcharter’ – Analyzing Inflation Rates
      3. Time Series Analysis in R Part 2: Time Series Transformations
      4. Time Series Analysis in R Part 1: The Time Series Object
      5. Parsing Text for Emotion Terms: Analysis & Visualization Using R

      R / Finance 2018 Call for Papers

      Wed, 11/08/2017 - 13:50

      (This article was first published on Thinking inside the box , and kindly contributed to R-bloggers)

      The tenth (!!) annual R/Finance conference will take place in Chicago on the UIC campus on June 1 and 2, 2018. Please see the call for papers below (or at the website) and consider submitting a paper.

      We are once again very excited about our conference, thrilled about who we hope may agree to be our anniversary keynotes, and hope that many R/Finance users will not only join us in Chicago in June but also submit an exciting proposal.

      So read on below, and see you in Chicago in June!

      Call for Papers

      R/Finance 2018: Applied Finance with R
      June 1 and 2, 2018
      University of Illinois at Chicago, IL, USA

      The tenth annual R/Finance conference for applied finance using R will be held June 1 and 2, 2018 in Chicago, IL, USA at the University of Illinois at Chicago. The conference will cover topics including portfolio management, time series analysis, advanced risk tools, high-performance computing, market microstructure, and econometrics. All will be discussed within the context of using R as a primary tool for financial risk management, portfolio construction, and trading.

      Over the past nine years, R/Finance has included attendees from around the world. It has featured presentations from prominent academics and practitioners, and we anticipate another exciting line-up for 2018.

      We invite you to submit complete papers in pdf format for consideration. We will also consider one-page abstracts (in txt or pdf format) although more complete papers are preferred. We welcome submissions for both full talks and abbreviated "lightning talks." Both academic and practitioner proposals related to R are encouraged.

      All slides will be made publicly available at conference time. Presenters are strongly encouraged to provide working R code to accompany the slides. Data sets should also be made public for the purposes of reproducibility (though we realize this may be limited due to contracts with data vendors). Preference may be given to presenters who have released R packages.

      Please submit proposals online at http://go.uic.edu/rfinsubmit. Submissions will be reviewed and accepted on a rolling basis with a final submission deadline of February 2, 2018. Submitters will be notified via email by March 2, 2018 of acceptance, presentation length, and financial assistance (if requested).

      Financial assistance for travel and accommodation may be available to presenters. Requests for financial assistance do not affect acceptance decisions. Requests should be made at the time of submission. Requests made after submission are much less likely to be fulfilled. Assistance will be granted at the discretion of the conference committee.

      Additional details will be announced via the conference website at http://www.RinFinance.com/ as they become available. Information on previous years’ presenters and their presentations is also available at the conference website. We will make a separate announcement when registration opens.

      For the program committee:

      Gib Bassett, Peter Carl, Dirk Eddelbuettel, Brian Peterson,
      Dale Rosenthal, Jeffrey Ryan, Joshua Ulrich


      To leave a comment for the author, please follow the link and comment on their blog: Thinking inside the box . R-bloggers.com offers daily e-mail updates about R news and tutorials on topics such as: Data science, Big Data, R jobs, visualization (ggplot2, Boxplots, maps, animation), programming (RStudio, Sweave, LaTeX, SQL, Eclipse, git, hadoop, Web Scraping) statistics (regression, PCA, time series, trading) and more...

      RQuantLib 0.4.4: Several smaller updates

      Wed, 11/08/2017 - 12:45

      (This article was first published on Thinking inside the box , and kindly contributed to R-bloggers)

      A shiny new (mostly-but-not-completely maintenance) release of RQuantLib, now at version 0.4.4, arrived on CRAN overnight, and will get to Debian shortly. This is the first release in over a year, and it contains a small number of fixes throughout. It also includes the update to the new DateVector and DatetimeVector classes which become the default with the upcoming Rcpp 0.12.14 release (just like this week’s RcppQuantuccia release). One piece of new code is due to François Cocquemas, who added support for discrete dividends to both European and American options. See below for the complete set of changes reported in the NEWS file.

      As with release 0.4.3 a little over a year ago, we will not have new Windows binaries from CRAN as I apparently have insufficient powers of persuasion to get CRAN to update their QuantLib libraries. So we need a volunteer. If someone could please build a binary package for Windows from the 0.4.4 sources, I would be happy to once again host it on the GHRR drat repo. Please contact me directly if you can help.

      Changes are listed below:

      Changes in RQuantLib version 0.4.4 (2017-11-07)
      • Changes in RQuantLib code:

        • Equity options can now be analyzed via discrete dividends through two vectors of dividend dates and values (Francois Cocquemas in #73 fixing #72)

        • Some package and dependency information was updated in files DESCRIPTION and NAMESPACE.

        • The new Date(time)Vector classes introduced with Rcpp 0.12.8 are now used when available.

        • Minor corrections were applied to BKTree, to vanilla options for the case of intraday time stamps, to the SabrSwaption documentation, and to bond utilities for the most recent QuantLib release.

      Courtesy of CRANberries, there is also a diffstat report for this release. As always, more detailed information is on the RQuantLib page. Questions, comments etc. should go to the rquantlib-devel mailing list off the R-Forge page. Issue tickets can be filed at the GitHub repo.

      This post by Dirk Eddelbuettel originated on his Thinking inside the box blog. Please report excessive re-aggregation in third-party for-profit settings.


      To leave a comment for the author, please follow the link and comment on their blog: Thinking inside the box . R-bloggers.com offers daily e-mail updates about R news and tutorials on topics such as: Data science, Big Data, R jobs, visualization (ggplot2, Boxplots, maps, animation), programming (RStudio, Sweave, LaTeX, SQL, Eclipse, git, hadoop, Web Scraping) statistics (regression, PCA, time series, trading) and more...
