R news and tutorials contributed by hundreds of R bloggers
Updated: 14 hours 19 min ago

### Detect When the Random Number Generator Was Used

Mon, 09/21/2020 - 21:45

This article was first published on JottR on R, and kindly contributed to R-bloggers.

If you ever need to figure out if a function call in R generated a random number or not, here is a simple trick that you can use in an interactive R session. Add the following to your ~/.Rprofile(*):

```r
if (interactive()) {
  invisible(addTaskCallback(local({
    last <- .GlobalEnv$.Random.seed
    function(...) {
      curr <- .GlobalEnv$.Random.seed
      if (!identical(curr, last)) {
        msg <- "NOTE: .Random.seed changed"
        if (requireNamespace("crayon", quietly = TRUE))
          msg <- crayon::blurred(msg)
        message(msg)
        last <<- curr
      }
      TRUE
    }
  }), name = "RNG tracker"))
}
```

It works by checking whether or not the state of the random number generator (RNG), that is, .Random.seed in the global environment, was changed. If it has, a note is produced. For example,

```r
> sum(1:100)
[1] 5050
> runif(1)
[1] 0.280737
NOTE: .Random.seed changed
>
```

It is not always obvious that a function generates random numbers internally. For instance, the rank() function may or may not update the RNG state, depending on its ties.method argument, as illustrated in the following example:

```r
> x <- c(1, 2, 2, 3)
> rank(x)
[1] 1.0 2.5 2.5 4.0
> rank(x, ties.method = "random")
[1] 1 3 2 4
NOTE: .Random.seed changed
>
```

For some functions, whether or not random numbers are generated may even depend on the input data, e.g.

```r
> y <- matrixStats::rowRanks(matrix(c(1, 2, 2), nrow = 2, ncol = 3),
                             ties.method = "random")
NOTE: .Random.seed changed
> y <- matrixStats::rowRanks(matrix(c(1, 2, 3), nrow = 2, ncol = 3),
                             ties.method = "random")
>
```

I have this RNG tracker enabled all the time to learn about functions that unexpectedly draw random numbers internally, which can be important to know when you run statistical analysis in parallel.
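To see why a stray draw matters for parallel or reproducible work, here is a minimal base-R illustration (my sketch, independent of the tracker): two runs seeded identically diverge as soon as one of them triggers a hidden draw.

```r
set.seed(42)
a <- runif(1)

set.seed(42)
# Hidden draw: random tie-breaking consumes numbers from the RNG stream
invisible(rank(c(1, 2, 2), ties.method = "random"))
b <- runif(1)

identical(a, b)  # FALSE: the tie-break shifted the random stream
```

Any function that draws internally has the same effect, which is why noticing such draws early saves debugging time later.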

As a bonus, if you have the crayon package installed, the RNG tracker will output the note with a style that is less intrusive.

(*) If you use the startup package, you can add it to a new file ~/.Rprofile.d/interactive=TRUE/rng_tracker.R. To learn more about the startup package, have a look at the blog posts on startup.


The post Detect When the Random Number Generator Was Used first appeared on R-bloggers.

### August 2020: “Top 40” New CRAN Packages

Mon, 09/21/2020 - 20:00

This article was first published on R Views, and kindly contributed to R-bloggers.

One hundred forty-six new packages stuck to CRAN in August. Below are my “Top 40” picks in eleven categories: Computational Methods, Data, Genomics, Insurance, Machine Learning, Mathematics, Medicine, Statistics, Time Series, Utilities and Visualization.

Computational Methods

dpseg v0.1.1: Implements piecewise linear segmentation of ordered data via a dynamic programming algorithm. See the vignette.

qpmadr v0.1.0: Implements the method outlined in Goldfarb & Idnani (1983) for solving quadratic problems with linear inequality, equality, and box constraints.

WoodburyMatrix v0.0.1: Implements a hierarchy of classes and methods for manipulating matrices formed implicitly from the sums of the inverses of other matrices, a situation commonly encountered in spatial statistics and related fields. See the vignette for details.

Data

neonstore v0.2.2: Provides access to numerous National Ecological Observatory Network (NEON) data sets through its API.

pdxTrees v0.4.0: A collection of datasets from Portland Parks and Recreation which inventoried every tree in over one hundred seventy parks and along the streets in ninety-six neighborhoods. See the vignette.

Genomics

hiphop v0.0.1: Implements a method to compare the genotypes of offspring with any combination of potential parents, and scores the number of mismatches of these individuals at bi-allelic genetic markers that can be used for paternity and maternity assignment. See Huisman (2017) for background, and the vignette for an introduction.

RapidoPGS v1.0.2: Provides functions to quickly compute polygenic scores from GWAS summary statistics of either case-control or quantitative traits without LD matrix computation or parameter tuning. See Reales et al. (2020) for details and the vignette for examples.

Insurance

SynthETIC v0.1.0: Implements an individual claims simulator which generates synthetic data that emulates various features of non-life insurance claims. Refer to Avanzi et al. (2020) for background and see the vignette for examples.

Machine Learning

sparklyr.flint v0.1.1: Extends sparklyr to include Flint time series functionality. Vignettes include Importing Data and Time Series RDD.

torch v0.0.3: Provides functionality to define and train neural networks similar to PyTorch by Paszke et al (2019) but written entirely in R. There are vignettes on Extending Autograd, Indexing tensors, Loading data, Creating tensors, and Using autograd.

Mathematics

gasper v1.0.1: Provides standard operations for signal processing on graphs, including the graph Fourier transform, the spectral graph wavelet transform, and visualization tools. See De Loynes et al. (2019) for background and the package vignette.

GeodRegr v0.1.0: Provides a gradient descent algorithm to find a geodesic relationship between real-valued independent variables and a manifold-valued dependent variable (i.e. geodesic regression). Available manifolds are Euclidean space, the sphere, and Kendall’s 2-dimensional shape space. See Shin & Oh (2020), Fletcher (2013), and Kim et al. (2014) for background.

geos v0.0.1: Provides an R API to the Open Source Geometry Engine GEOS library and a vector format with which to efficiently store GEOS geometries. See README for an example.

pcSteiner v1.0.0: Provides functions for obtaining an approximate solution to the prize-collecting Steiner tree problem, which seeks a subgraph connecting a given set of vertices while collecting the most valuable nodes over the least expensive edges. This implementation uses a loopy belief propagation algorithm. There is a Tutorial.

TCIU v1.1.0: Provides the core functionality to transform longitudinal data to complex-time (kime) data using analytic and numerical techniques, visualize the original time-series and the reconstructed kime-surfaces, and perform model-based (e.g., tensor-linear regression) and model-free classification and clustering. See Dinov & Velev (2021) for background. There are vignettes on Laplace Transform and Kime Surface Transforms and Workflows of TCIU Analytics.

Medicine

epigraphdb v0.2.1: Provides access to the EpiGraphDB platform. There is an overview, vignettes on the API, Platform Functionality, Meta Functions and three case studies on SNP protein associations, Drug Targets and Causal Evidence.

raveio v0.0.3: Implements an interface to the RAVE (R Analysis and Visualization of human intracranial Electroencephalography data) project, which aims at analyzing brain recordings from patients with electrodes placed on the cortical surface or inserted into the brain. See Magnotti et al. (2020) for background.

tboot v0.2.0: Provides functions to simulate clinical trial data with realistic correlation structures and assumed efficacy levels by using a tilted bootstrap resampling approach. There is a tutorial on The Tilted Bootstrap and another on Bayesian Marginal Reconstruction.

Statistics

BayesMRA v1.0.0: Fits sparse Bayesian multi-resolution spatial models using Markov Chain Monte Carlo. See the vignette.

bsem v1.0.0: Implements structural equation modeling for particular cases using rstan, including Bayesian semi-confirmatory factor analysis, confirmatory factor analysis, and structural equation models. See Mayrink (2013) for background and the vignettes: Get Started and Exploring bsem class.

cyclomort v1.0.2: Provides functions to do survival modeling with a periodic hazard function. See Gurarie et al. (2020) and the vignette for details.

ebmstate v0.1.1: Implements an empirical Bayes, multi-state Cox model for survival analysis. See Schall (1991) for details.

fairmodels v0.2.2: Provides functions to measure fairness across multiple models, including measuring a model’s bias with respect to race, sex, nationality, etc. There are Basic and Advanced tutorials.

MGMM v0.3.1: Implements clustering of multivariate normal random vectors with missing elements. Clustering is achieved by fitting a Gaussian Mixture Model (GMM). See McCaw et al. (2019) for details, and the vignette for examples.

rmsb v0.0.1: Is a Bayesian companion to the rms package which provides Bayesian model fitting, post-fit estimation, and graphics, and implements Bayesian regression models whose fit objects can be processed by rms functions. Look here for more information.

RoBMA v1.0.4: Implements a framework for estimating ensembles of meta-analytic models (assuming either presence or absence of the effect, heterogeneity, and publication bias) and uses Bayesian model averaging to combine them. See Maier et al. (2020) for background and the vignettes: Fitting custom meta-analytic ensembles, Reproducing BMA, and Common warnings and errors.

tTOlr v0.2: Implements likelihood ratio statistics for one and two sample t-tests. There are two vignettes: Likelihood Ratio and False Positive Risk and P-values – Uses, abuses, and alternatives.

Time Series

fable.prophet v0.1.0: Enables prophet models to be used in tidy workflows created with fabletools. See the vignette for an introduction.

garma v0.9.3: Provides methods for estimating long memory-seasonal/cyclical Gegenbauer univariate time series processes. See Dissanayake et al. (2018) for background and the vignette for the details of model fitting.

gratis v0.2.0: Generates time series based on mixture autoregressive models. See Kang et al. (2020) for background and the vignette for an introduction to the package.

rhosa v0.1.0: Implements higher-order spectral (polyspectral) analysis for time series. See Brillinger & Irizarry (1998) and Lii & Helland (1981) for background and the vignette for examples.

Utilities

DataEditR v0.0.5: Implements an interactive editor for viewing, entering, and editing data in R. See the vignette for details.

equatiomatic v0.1.0: Simplifies writing LaTeX formulas by providing a function that takes a fitted model object as its input and returns the corresponding LaTeX code for the model. There is an Introduction and a vignette on Tests and Coverage.
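A quick sketch of the idea (assuming the package’s documented entry point extract_eq(), applied to a base-R model):

```r
library(equatiomatic)

# Fit an ordinary linear model, then render it as LaTeX
fit <- lm(mpg ~ cyl + disp, data = mtcars)
extract_eq(fit)
# Returns LaTeX along the lines of:
# mpg = alpha + beta_1(cyl) + beta_2(disp) + epsilon
```

The rendered equation can be dropped straight into an R Markdown document.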

starschemar v1.1.0: Provides functions to obtain star schemas from flat tables. The vignette shows multiple examples.

Visualization

glow v0.10.1: Provides a framework for creating plots with glowing points. See the vignette for examples.

graph3d v0.1.0: Implements a wrapper for the JavaScript library vis-graph3d that enables users to create three-dimensional interactive visualizations. Look here for an example.

jsTreeR v0.1.0: Provides functions to implement interactive trees for representing hierarchical data that can be included in Shiny apps and R markdown documents. Look here for examples.

KMunicate v0.1.0: Provides functions to produce Kaplan–Meier plots in the style recommended following the KMunicate study by Morris et al. (2019). See the vignette for examples.

rAmCharts4 v0.1.0: Provides functions to create JavaScript charts that can be included in Shiny apps and R Markdown documents, or viewed from the R console and RStudio viewer. Look here for examples.

tabularmaps v0.1.0: Provides functions for creating tabular maps, a visualization method for efficiently displaying data consisting of multiple elements by tiling them. When dealing with geospatial data, this corrects for differences in visibility between areas. Look here and at the vignette for examples.



The post August 2020: "Top 40" New CRAN Packages first appeared on R-bloggers.

### Exploring 30 years of local CT weather history with R

Mon, 09/21/2020 - 20:00

This article was first published on R on Redwall Analytics, and kindly contributed to R-bloggers.

### Learn to Code Free — Our Interactive Courses Are ALL Free This Week!

FAQ & Details

What do I have to do to access the free courses? Simply sign up for a free Dataquest account (or log in to your existing account) and you will have access to every course on our platform. You can start at the beginning of one of our learning paths, or dive in at any point you’d like. All courses will be free from Sept. 21 at 0:00 UTC to Sept. 28 at 23:59 UTC.

Do I need a credit card to sign up? No! You can create a Dataquest account for free, and no credit card is required. Even after the free week has ended, you’ll be able to use your free account to access any of our dozens of free missions.

Are there any limits to how much I can do during this week? Nope! Our platform is self-serve and allows you to work at your own pace. You can complete as much of the path as you’d like during the free week, and you will be issued certificates for any courses you complete. After the free week is over, users who do not have a paid subscription will no longer have access to paywalled parts of the platform, but your progress from the free week will be saved, and you will still have access to a free mission in each course.

What’s in the data science career resources pack? The data science career resources pack is a downloadable ZIP file that contains PDF resources relating to getting your first job in data science, including a 30,000+ word ebook on the data science job application process, as well as additional career resources.

How do I get the data science career resources pack? To unlock the pack, you’ll need to complete at least one mission during the free week. A “mission” is a lesson on Dataquest; each course is made up of several missions. You can complete any mission you’d like. You can confirm a mission is complete if you see a green checkmark next to its name on the dashboard screen. Inside the mission itself, you can also click the menu icon in the bottom left to view your progress; if every screen in the mission has a green checkmark, the mission is complete. This guide to using Dataquest is a good resource for answering questions about the platform and understanding the structure of screens, missions, and courses.

When do I get the data science career resources pack? Completing a mission will not instantly unlock the pack! After free week is over, we’ll analyze our platform data. We will send a link to the email account associated with your Dataquest account that will contain a downloadable ZIP file sometime on or before October 9.

How can I access the platform once Free Week is over? Once the free week is over, a Dataquest subscription will be required to access most of the content on our platform, although there are 60+ missions available for free. Based on a 2020 survey of more than 600 Dataquest learners, 91% of students rated a Dataquest subscription a good or great investment, and 97% said they recommended Dataquest for career advancement.

Charlie is a student of data science, and also a content marketer at Dataquest. In his free time, he’s learning to mountain bike and making videos about it.

The post Learn to Code Free — Our Interactive Courses Are ALL Free This Week! appeared first on Dataquest.

### Double dispatch in R: S4-vs-vctrs

Sun, 09/20/2020 - 20:00

This article was first published on krzjoa, and kindly contributed to R-bloggers.

Why might we need double dispatch? In most cases, when writing R scripts or even creating R packages, it is enough to use standard functions or S3 methods. However, there is one important field that forces us to consider the double dispatch question: arithmetic operators.

Suppose we’d like to create a class which fits the problem we’re currently working on. Let’s name this class beer.

```r
beer <- function(type){
  structure(list(type = type), class = "beer")
}

opener <- function(){
  structure(list(), class = "opener")
}

pilsner <- beer("pilsner")
my_opener <- opener()
```

Then, we create an operator which defines some non-standard behaviour:

- if we add an opener to the beer, we get an opened_beer;
- adding a numeric x, we get a case_of_beers (which may even contain a negative number of beers, i.e. our debt…);
- if the second argument is neither an opener nor a numeric, we get… an untouched beer.

Let’s demonstrate how it works:

```r
`+.beer` <- function(a, b){
  if (inherits(b, "opener")) {
    return(structure(list(
      name = paste("opened", a$name)
    ), class = "opened_beer"))
  } else if (inherits(b, "numeric")) {
    print("It's magic! You've got a case of beers!")
    return(structure(list(
      n_beers = 1 + b
    ), class = "case_of_beers"))
  } else {
    return(a)
  }
}
```

```r
pilsner + my_opener
## $name
## [1] "opened "
##
## attr(,"class")
## [1] "opened_beer"

pilsner + -0.1
## [1] "It's magic! You've got a case of beers!"
## $n_beers
## [1] 0.9
##
## attr(,"class")
## [1] "case_of_beers"
```

Don’t you think such operations should be commutative?

```r
my_opener + pilsner
## list()
## attr(,"class")
## [1] "opener"
```

What happened here? This is an example of how the R interpreter handles arithmetic operators; it is described in detail on Hiroaki Yutani’s blog. Briefly speaking, in this particular case the R engine matched the method to the second argument (not to the first one), because there is no +.opener S3 method. What about such a trick:

```r
`+.opener` <- function(a, b) b + a
```

After that, the result is different:

```r
my_opener + pilsner
## Warning: Incompatible methods ("+.opener", "+.beer") for "+"
## Error in my_opener + pilsner: non-numeric argument to binary operator
```

We crashed our function call. When both objects have a + method defined and these methods are not the same, R tries to resolve the conflict by applying the internal +, which obviously cannot work. This case could easily be solved with more ‘ifs’ in the body of +.beer. But let’s face a different situation.

```r
-0.1 + pilsner
## [1] -0.1
```

What a mess! Simple S3 methods are definitely not the best solution when we need double dispatch.
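For completeness, the ‘more ifs’ workaround mentioned earlier can be sketched like this (my sketch, not code from the original post): define only +.beer, but make it inspect both operands, so that dispatch from either side lands in the same code path and no conflicting +.opener method is needed.

```r
`+.beer` <- function(a, b) {
  # Figure out which operand is the beer; the other one decides the result.
  beer  <- if (inherits(a, "beer")) a else b
  other <- if (inherits(a, "beer")) b else a
  if (inherits(other, "opener")) {
    structure(list(type = paste("opened", beer$type)), class = "opened_beer")
  } else if (is.numeric(other)) {
    structure(list(n_beers = 1 + other), class = "case_of_beers")
  } else {
    beer
  }
}
# Now beer + opener, opener + beer, and numeric + beer all work, because
# S3 operator dispatch also matches the method via the second operand.
```

This works, but every new class combination means another branch, which is exactly the clutter that S4 and vctrs are designed to avoid.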

S4 class: a classic approach

To civilize such code, we can use the classic R approach: S4 methods. We’ll start with the S4 class declarations.

```r
.S4_beer          <- setClass("S4_beer", representation(type = "character"))
.S4_opened_beer   <- setClass("S4_opened_beer", representation(type = "character"))
.S4_opener        <- setClass("S4_opener", representation(ID = "numeric"))
.S4_case_of_beers <- setClass("S4_case_of_beers", representation(n_beers = "numeric"))
```

Then, we have two options for handling the + operator. I didn’t mention it in the previous example, but in both S3 and S4 the arithmetic operators belong to group generics (S3, S4).

We can set an S4 method for a single operator, which looks as follows:

```r
setMethod("+", c(e1 = "S4_beer", e2 = "S4_opener"), function(e1, e2){
  if (inherits(e2, "S4_opener")) {
    return(.S4_opened_beer(type = paste("opened", e1@type)))
  } else if (inherits(e2, "numeric")) {
    print("It's magic! You've got a case of beers!")
    return(.S4_case_of_beers(n_beers = 1 + e2))
  } else {
    return(e1)
  }
})

setMethod("+", c(e1 = "S4_opener", e2 = "S4_beer"), function(e1, e2) e2 + e1)
```

Alternatively, we can define a method for the Arith generic and check
which operator is actually being called. I decided to use the second
approach, because it is more similar to the way double dispatch is
implemented in the vctrs library.

<span class="n">.S4_fun</span><span class="w"> </span><span class="o"><-</span><span class="w"> </span><span class="k">function</span><span class="p">(</span><span class="n">e1</span><span class="p">,</span><span class="w"> </span><span class="n">e2</span><span class="p">){</span><span class="w"> </span><span class="k">if</span><span class="w"> </span><span class="p">(</span><span class="n">inherits</span><span class="p">(</span><span class="n">e2</span><span class="p">,</span><span class="w"> </span><span class="s2">"S4_opener"</span><span class="p">))</span><span class="w"> </span><span class="p">{</span><span class="w"> </span><span class="nf">return</span><span class="p">(</span><span class="n">.S4_opened_beer</span><span class="p">(</span><span class="n">type</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="n">paste</span><span class="p">(</span><span class="s2">"opened"</span><span class="p">,</span><span class="w"> </span><span class="n">e1</span><span class="o">@</span><span class="n">type</span><span class="p">)))</span><span class="w"> </span><span class="p">}</span><span class="w"> </span><span class="k">else</span><span class="w"> </span><span class="k">if</span><span class="w"> </span><span class="p">(</span><span class="n">inherits</span><span class="p">(</span><span class="n">e2</span><span class="p">,</span><span class="w"> </span><span class="s2">"numeric"</span><span class="p">))</span><span class="w"> </span><span class="p">{</span><span class="w"> </span><span class="n">print</span><span class="p">(</span><span class="s2">"It's magic! 
You've got a case of beers!"</span><span class="p">)</span><span class="w"> </span><span class="nf">return</span><span class="p">(</span><span class="n">.S4_case_of_beers</span><span class="p">(</span><span class="n">n_beers</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="m">1</span><span class="w"> </span><span class="o">+</span><span class="w"> </span><span class="n">e2</span><span class="p">))</span><span class="w"> </span><span class="p">}</span><span class="w"> </span><span class="k">else</span><span class="w"> </span><span class="p">{</span><span class="w"> </span><span class="nf">return</span><span class="p">(</span><span class="n">e1</span><span class="p">)</span><span class="w"> </span><span class="p">}</span><span class="w"> </span><span class="p">}</span><span class="w"> </span><span class="n">setMethod</span><span class="p">(</span><span class="s2">"Arith"</span><span class="p">,</span><span class="w"> </span><span class="nf">c</span><span class="p">(</span><span class="n">e1</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="s2">"S4_beer"</span><span class="p">,</span><span class="w"> </span><span class="n">e2</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="s2">"S4_opener"</span><span class="p">),</span><span class="w"> </span><span class="k">function</span><span class="p">(</span><span class="n">e1</span><span class="p">,</span><span class="w"> </span><span class="n">e2</span><span class="p">)</span><span class="w"> </span><span class="p">{</span><span class="w"> </span><span class="n">op</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="n">.Generic</span><span class="p">[[</span><span class="m">1</span><span class="p">]]</span><span class="w"> </span><span class="nf">switch</span><span class="p">(</span><span class="n">op</span><span class="p">,</span><span class="w"> </span><span 
class="n">+</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="n">.S4_fun</span><span class="p">(</span><span class="n">e1</span><span class="p">,</span><span class="w"> </span><span class="n">e2</span><span class="p">),</span><span class="w"> </span><span class="n">stop</span><span class="p">(</span><span class="s2">"undefined operation"</span><span class="p">)</span><span class="w"> </span><span class="p">)</span><span class="w"> </span><span class="p">})</span><span class="w"> </span><span class="n">setMethod</span><span class="p">(</span><span class="s2">"Arith"</span><span class="p">,</span><span class="w"> </span><span class="nf">c</span><span class="p">(</span><span class="n">e1</span><span class="o">=</span><span class="s2">"S4_opener"</span><span class="p">,</span><span class="w"> </span><span class="n">e2</span><span class="o">=</span><span class="s2">"S4_beer"</span><span class="p">),</span><span class="w"> </span><span class="k">function</span><span class="p">(</span><span class="n">e1</span><span class="p">,</span><span class="w"> </span><span class="n">e2</span><span class="p">)</span><span class="w"> </span><span class="p">{</span><span class="w"> </span><span class="n">op</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="n">.Generic</span><span class="p">[[</span><span class="m">1</span><span class="p">]]</span><span class="w"> </span><span class="nf">switch</span><span class="p">(</span><span class="n">op</span><span class="p">,</span><span class="w"> </span><span class="n">+</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="n">e2</span><span class="w"> </span><span class="o">+</span><span class="w"> </span><span class="n">e1</span><span class="p">,</span><span class="w"> </span><span class="n">stop</span><span class="p">(</span><span class="s2">"undefined operation"</span><span class="p">)</span><span class="w"> </span><span 
class="p">)</span><span class="w"> </span><span class="p">})</span><span class="w"> </span>

Let’s create our class instances and do a bit of arithmetic.

<span class="n">S4_pilsner</span><span class="w"> </span><span class="o"><-</span><span class="w"> </span><span class="n">.S4_beer</span><span class="p">(</span><span class="n">type</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="s2">"Pilsner"</span><span class="p">)</span><span class="w"> </span><span class="n">S4_opener</span><span class="w"> </span><span class="o"><-</span><span class="w"> </span><span class="n">.S4_opener</span><span class="p">(</span><span class="n">ID</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="m">1</span><span class="p">)</span><span class="w"> </span> <span class="n">S4_pilsner</span><span class="w"> </span><span class="o">+</span><span class="w"> </span><span class="n">S4_opener</span><span class="w"> </span> ## An object of class "S4_opened_beer" ## Slot "type": ## [1] "opened Pilsner" <span class="n">S4_opener</span><span class="w"> </span><span class="o">+</span><span class="w"> </span><span class="n">S4_pilsner</span><span class="w"> </span> ## An object of class "S4_opened_beer" ## Slot "type": ## [1] "opened Pilsner"

The declared methods are clear and, most importantly, they work
correctly.
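For readers coming from other languages: the same commutative double-dispatch pattern can be sketched in Python, where `x + y` first tries `type(x).__add__` and, if that is missing or returns `NotImplemented`, falls back to `type(y).__radd__`. This is an illustrative analogy only, not part of the R code; the `Beer`/`Opener` names simply mirror the article's example.

```python
class OpenedBeer:
    """Result type, analogous to the S4_opened_beer class."""
    def __init__(self, type_):
        self.type = type_

class Opener:
    """Plain operand that defines no __add__ of its own."""
    pass

class Beer:
    def __init__(self, type_):
        self.type = type_

    def __add__(self, other):
        # beer + opener -> opened beer
        if isinstance(other, Opener):
            return OpenedBeer("opened " + self.type)
        return NotImplemented

    def __radd__(self, other):
        # opener + beer: Opener has no __add__, so Python
        # dispatches to the right operand's __radd__ instead
        if isinstance(other, Opener):
            return OpenedBeer("opened " + self.type)
        return NotImplemented

print((Beer("Pilsner") + Opener()).type)  # opened Pilsner
print((Opener() + Beer("Pilsner")).type)  # opened Pilsner
```

Both orders work because the dispatch consults the types of both operands, which is exactly what the pair of `setMethod("Arith", ...)` calls achieves in S4.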

vctrs library: a tidyverse approach

vctrs is an interesting library,
intended as a remedy for a couple of R's shortcomings. Among other
things, it delivers a custom double-dispatch system based on the
well-known S3 mechanism.

As a first step, we declare the class ‘constructors’.

<span class="n">library</span><span class="p">(</span><span class="n">vctrs</span><span class="p">)</span><span class="w"> </span><span class="n">.vec_beer</span><span class="w"> </span><span class="o"><-</span><span class="w"> </span><span class="k">function</span><span class="p">(</span><span class="n">type</span><span class="p">){</span><span class="w"> </span><span class="n">new_vctr</span><span class="p">(</span><span class="n">.data</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="nf">list</span><span class="p">(</span><span class="n">type</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="n">type</span><span class="p">),</span><span class="w"> </span><span class="n">class</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="s2">"vec_beer"</span><span class="p">)</span><span class="w"> </span><span class="p">}</span><span class="w"> </span><span class="n">.vec_opened_beer</span><span class="w"> </span><span class="o"><-</span><span class="w"> </span><span class="k">function</span><span class="p">(</span><span class="n">type</span><span class="p">){</span><span class="w"> </span><span class="n">new_vctr</span><span class="p">(</span><span class="n">.data</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="nf">list</span><span class="p">(</span><span class="n">type</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="n">type</span><span class="p">),</span><span class="w"> </span><span class="n">class</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="s2">"vec_opened_beer"</span><span class="p">)</span><span class="w"> </span><span class="p">}</span><span class="w"> </span><span class="n">.vec_case_of_beers</span><span class="w"> </span><span class="o"><-</span><span class="w"> </span><span class="k">function</span><span 
class="p">(</span><span class="n">n_beers</span><span class="p">){</span><span class="w"> </span><span class="n">new_vctr</span><span class="p">(</span><span class="n">.data</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="nf">list</span><span class="p">(</span><span class="n">n_beers</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="n">n_beers</span><span class="p">),</span><span class="w"> </span><span class="n">class</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="s2">"vec_case_of_beers"</span><span class="p">)</span><span class="w"> </span><span class="p">}</span><span class="w"> </span><span class="n">.vec_opener</span><span class="w"> </span><span class="o"><-</span><span class="w"> </span><span class="k">function</span><span class="p">(){</span><span class="w"> </span><span class="n">new_vctr</span><span class="p">(</span><span class="n">.data</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="nf">list</span><span class="p">(),</span><span class="w"> </span><span class="n">class</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="s2">"vec_opener"</span><span class="p">)</span><span class="w"> </span><span class="p">}</span><span class="w"> </span>

Then, we create class instances.

<span class="n">vec_pilsner</span><span class="w"> </span><span class="o"><-</span><span class="w"> </span><span class="n">.vec_beer</span><span class="p">(</span><span class="s2">"pilnser"</span><span class="p">)</span><span class="w"> </span><span class="n">vec_opener</span><span class="w"> </span><span class="o"><-</span><span class="w"> </span><span class="n">.vec_opener</span><span class="p">()</span><span class="w"> </span><span class="n">print</span><span class="p">(</span><span class="nf">class</span><span class="p">(</span><span class="n">vec_pilsner</span><span class="p">))</span><span class="w"> </span> ## [1] "vec_beer" "vctrs_vctr" <span class="n">print</span><span class="p">(</span><span class="nf">class</span><span class="p">(</span><span class="n">vec_opener</span><span class="p">))</span><span class="w"> </span> ## [1] "vec_opener" "vctrs_vctr"

Finally, we write the double-dispatched methods in vctrs style.

<span class="n">.fun</span><span class="w"> </span><span class="o"><-</span><span class="w"> </span><span class="k">function</span><span class="p">(</span><span class="n">a</span><span class="p">,</span><span class="w"> </span><span class="n">b</span><span class="p">){</span><span class="w"> </span><span class="k">if</span><span class="w"> </span><span class="p">(</span><span class="n">inherits</span><span class="p">(</span><span class="n">b</span><span class="p">,</span><span class="w"> </span><span class="s2">"vec_opener"</span><span class="p">))</span><span class="w"> </span><span class="p">{</span><span class="w"> </span><span class="nf">return</span><span class="p">(</span><span class="n">.vec_opened_beer</span><span class="p">(</span><span class="n">type</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="n">paste</span><span class="p">(</span><span class="s2">"opened"</span><span class="p">,</span><span class="w"> </span><span class="n">a</span><span class="o">$</span><span class="n">type</span><span class="p">)))</span><span class="w"> </span><span class="p">}</span><span class="w"> </span><span class="k">else</span><span class="w"> </span><span class="k">if</span><span class="w"> </span><span class="p">(</span><span class="n">inherits</span><span class="p">(</span><span class="n">b</span><span class="p">,</span><span class="w"> </span><span class="s2">"numeric"</span><span class="p">))</span><span class="w"> </span><span class="p">{</span><span class="w"> </span><span class="n">print</span><span class="p">(</span><span class="s2">"It's magic! 
You've got a case of beers!"</span><span class="p">)</span><span class="w"> </span><span class="nf">return</span><span class="p">(</span><span class="n">.vec_case_of_beers</span><span class="p">(</span><span class="n">n_beers</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="m">1</span><span class="w"> </span><span class="o">+</span><span class="w"> </span><span class="n">b</span><span class="p">))</span><span class="w"> </span><span class="p">}</span><span class="w"> </span><span class="k">else</span><span class="w"> </span><span class="p">{</span><span class="w"> </span><span class="nf">return</span><span class="p">(</span><span class="n">a</span><span class="p">)</span><span class="w"> </span><span class="p">}</span><span class="w"> </span><span class="p">}</span><span class="w"> </span><span class="n">vec_arith.vec_beer</span><span class="w"> </span><span class="o"><-</span><span class="w"> </span><span class="k">function</span><span class="p">(</span><span class="n">op</span><span class="p">,</span><span class="w"> </span><span class="n">x</span><span class="p">,</span><span class="w"> </span><span class="n">y</span><span class="p">,</span><span class="w"> </span><span class="n">...</span><span class="p">)</span><span class="w"> </span><span class="p">{</span><span class="w"> </span><span class="nf">UseMethod</span><span class="p">(</span><span class="s2">"vec_arith.vec_beer"</span><span class="p">,</span><span class="w"> </span><span class="n">y</span><span class="p">)</span><span class="w"> </span><span class="p">}</span><span class="w"> </span><span class="n">vec_arith.vec_opener</span><span class="w"> </span><span class="o"><-</span><span class="w"> </span><span class="k">function</span><span class="p">(</span><span class="n">op</span><span class="p">,</span><span class="w"> </span><span class="n">x</span><span class="p">,</span><span class="w"> </span><span class="n">y</span><span class="p">,</span><span class="w"> 
</span><span class="n">...</span><span class="p">)</span><span class="w"> </span><span class="p">{</span><span class="w"> </span><span class="nf">UseMethod</span><span class="p">(</span><span class="s2">"vec_arith.vec_opener"</span><span class="p">,</span><span class="w"> </span><span class="n">y</span><span class="p">)</span><span class="w"> </span><span class="p">}</span><span class="w"> </span><span class="n">vec_arith.vec_beer.vec_opener</span><span class="w"> </span><span class="o"><-</span><span class="w"> </span><span class="k">function</span><span class="p">(</span><span class="n">op</span><span class="p">,</span><span class="w"> </span><span class="n">x</span><span class="p">,</span><span class="w"> </span><span class="n">y</span><span class="p">,</span><span class="w"> </span><span class="n">...</span><span class="p">){</span><span class="w"> </span><span class="nf">switch</span><span class="p">(</span><span class="n">op</span><span class="p">,</span><span class="w"> </span><span class="n">+</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="n">.fun</span><span class="p">(</span><span class="n">x</span><span class="p">,</span><span class="w"> </span><span class="n">y</span><span class="p">),</span><span class="w"> </span><span class="n">stop_incompatible_op</span><span class="p">(</span><span class="n">op</span><span class="p">,</span><span class="w"> </span><span class="n">x</span><span class="p">,</span><span class="w"> </span><span class="n">y</span><span class="p">)</span><span class="w"> </span><span class="p">)</span><span class="w"> </span><span class="p">}</span><span class="w"> </span><span class="n">vec_arith.vec_opener.vec_beer</span><span class="w"> </span><span class="o"><-</span><span class="w"> </span><span class="k">function</span><span class="p">(</span><span class="n">op</span><span class="p">,</span><span class="w"> </span><span class="n">x</span><span class="p">,</span><span class="w"> 
</span><span class="n">y</span><span class="p">,</span><span class="w"> </span><span class="n">...</span><span class="p">){</span><span class="w"> </span><span class="n">y</span><span class="w"> </span><span class="o">+</span><span class="w"> </span><span class="n">x</span><span class="w"> </span><span class="p">}</span><span class="w"> </span><span class="n">vec_pilsner</span><span class="w"> </span><span class="o">+</span><span class="w"> </span><span class="n">vec_opener</span><span class="w"> </span> ## <vec_opened_beer[1]> ## type ## opened pilnser <span class="n">vec_opener</span><span class="w"> </span><span class="o">+</span><span class="w"> </span><span class="n">vec_pilsner</span><span class="w"> </span> ## <vec_opened_beer[1]> ## type ## opened pilnser It works properly, too. Benchmark I’ve created all the classes and methods above not only to demonstate, how to implement double dispatch in R. My main goal is to benchmark both approaches and check, which one has smaller overhead. 
The hardware I used for the test looks as follows:

## $vendor_id
## [1] "GenuineIntel"
##
## $model_name
## [1] "Intel(R) Core(TM) i3 CPU M 350 @ 2.27GHz"
##
## $no_of_cores
## [1] 4
##
## 8.19 GB

sessionInfo()
## R version 3.6.1 (2019-07-05)
## Platform: x86_64-pc-linux-gnu (64-bit)
## Running under: Ubuntu 18.04.2 LTS
##
## Matrix products: default
## BLAS:   /usr/local/lib/R/lib/libRblas.so
## LAPACK: /usr/local/lib/R/lib/libRlapack.so
##
## locale:
##  [1] LC_CTYPE=pl_PL.UTF-8       LC_NUMERIC=C
##  [3] LC_TIME=pl_PL.UTF-8        LC_COLLATE=pl_PL.UTF-8
##  [5] LC_MONETARY=pl_PL.UTF-8    LC_MESSAGES=en_US.utf8
##  [7] LC_PAPER=pl_PL.UTF-8       LC_NAME=C
##  [9] LC_ADDRESS=C               LC_TELEPHONE=C
## [11] LC_MEASUREMENT=pl_PL.UTF-8 LC_IDENTIFICATION=C
##
## attached base packages:
## [1] stats     graphics  grDevices utils     datasets  methods   base
##
## other attached packages:
## [1] vctrs_0.2.3
##
## loaded via a namespace (and not attached):
##  [1] Rcpp_1.0.3            benchmarkmeData_1.0.3 knitr_1.23
##  [4] magrittr_1.5          tidyselect_0.2.5      doParallel_1.0.15
##  [7] lattice_0.20-38       R6_2.4.0              rlang_0.4.2
## [10] foreach_1.4.7         httr_1.4.1            stringr_1.4.0
## [13] dplyr_0.8.3           tools_3.6.1           parallel_3.6.1
## [16] grid_3.6.1            xfun_0.9              htmltools_0.3.6
## [19] iterators_1.0.12      yaml_2.2.0            digest_0.6.25
## [22] assertthat_0.2.1      tibble_2.1.3          benchmarkme_1.0.3
## [25] crayon_1.3.4          Matrix_1.2-17         purrr_0.3.3
## [28] codetools_0.2-16      glue_1.3.1            evaluate_0.14
## [31] rmarkdown_1.14        stringi_1.4.3         pillar_1.4.2
## [34] compiler_3.6.1        pkgconfig_2.0.2

It’s my good old notebook, which is not a beast.

<span class="n">library</span><span class="p">(</span><span class="n">microbenchmark</span><span class="p">)</span><span class="w"> </span><span class="n">library</span><span class="p">(</span><span class="n">ggplot2</span><span class="p">)</span><span class="w"> </span> Beer + opener <span class="n">bm1</span><span class="w"> </span><span class="o"><-</span><span class="w"> </span><span class="n">microbenchmark</span><span class="p">(</span><span class="w"> </span><span class="n">s4</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="n">S4_pilsner</span><span class="w"> </span><span class="o">+</span><span class="w"> </span><span class="n">S4_opener</span><span class="p">,</span><span class="w"> </span><span class="n">s3_vec</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="n">vec_pilsner</span><span class="w"> </span><span class="o">+</span><span class="w"> </span><span class="n">vec_opener</span><span class="p">,</span><span class="w"> </span><span class="n">times</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="m">1000</span><span class="w"> </span><span class="p">)</span><span class="w"> </span> ## Unit: microseconds ## expr min lq mean median uq max neval ## s4 153.292 158.2120 178.40541 161.4225 165.6375 5506.681 1000 ## s3_vec 56.686 60.1265 69.52364 68.9240 70.8830 163.278 1000

Opener + beer

bm2 <- microbenchmark(
  s4 = S4_opener + S4_pilsner,
  s3_vec = vec_opener + vec_pilsner,
  times = 1000
)
## Unit: microseconds
##    expr     min       lq      mean   median       uq      max neval
##      s4 159.512 164.6735 191.74781 168.9655 176.3165 6068.477  1000
##  s3_vec  71.110  78.5835  96.22535  86.6720  89.4015 4796.377  1000

Bonus: opener + beer vs addition of numerics

bm3 <- microbenchmark(
  simple_R = 1 + 2,
  s4 = S4_opener + S4_pilsner,
  s3_vec = vec_opener + vec_pilsner,
  times = 1000
)
## Unit: nanoseconds
##      expr    min       lq      mean   median     uq    max neval
##  simple_R    130    344.0    697.49    744.5    857   2862  1000
##        s4 158769 164522.5 189297.35 169270.5 198120 375648  1000
##    s3_vec  74775  78395.5  94786.28  87192.5  94085 258129  1000

Conclusions

It seems that the vctrs-based approach performs better than traditional
S4 methods. Obviously, I checked only one operation, and some edge cases
may exist. However, I think it gives us a rough idea of what execution
times to expect.

Further sources

If you are interested in how to implement double-dispatched operators in
S4, I encourage you to get familiar with the code of the following R
libraries:

If you are looking for examples of vctrs usage, I recommend studying the
source code of:


The post Double dispatch in R: S4-vs-vctrs first appeared on R-bloggers.

### Running an R Script on a Schedule: Heroku

Sun, 09/20/2020 - 20:00

[social4i size="small" align="align-left"] --> [This article was first published on Category R on Roel's R-tefacts, and kindly contributed to R-bloggers]. (You can report issue about the content on this page here) Want to share your content on R-bloggers? click here if you have a blog, or here if you don't.

In this tutorial I have an R script that runs every day on Heroku. It creates a curve in ggplot2 and posts that picture to Twitter.

The use case is this: You have a script and it needs to run on a schedule (for instance every day).

In 2018 I wrote a small post about how to run an R script on Heroku.
The amazing thing is that the bot I created back then is still running! But I recently got a question about the scheduling, and because I did not document it that well, I will do a small update here.

Other ways to schedule a script

I will cover many of the other ways to run an R script on a schedule in a new post. In this case I will run the script on Heroku. Heroku is useful if you have one script you want to run, and not too often (every hour / every day). If you have many scripts, long-running scripts, or you want more precise time control, Heroku is not the best solution. Also, there are quite a few manual steps, so this is not really suited for a completely automatic setup.

Heroku details

Heroku does not have dedicated R runners, but you can install an R runtime created by other people. On Heroku these are called buildpacks. I’m using this one: https://github.com/virtualstaticvoid/heroku-buildpack-r

On a high level this is what is going to happen:

(We want the code to run on a computer in the cloud.)
• You save your script locally in a git repository
• You push everything to heroku (a cloud provider; think of a laptop in the sky)
• installation: heroku installs R, the relevant packages, and the script
• heroku saves this state and stops
• running: you can give heroku a command and it will start up and run the script
• this starting up can be done on a timer
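The local half of the flow above can be sketched on the command line; the folder name, commit message, and one-line script body here are made up for illustration, and script.R stands in for your own script:

```shell
# create a project folder with the script under git version control
mkdir tweetbot && cd tweetbot
printf 'cat("hello")\n' > script.R

git init -q
git add script.R
# identity flags only so the sketch also runs on a machine without git config
git -c user.name="demo" -c user.email="demo@example.com" \
    commit -q -m "add bot script"

git rev-list --count HEAD   # one commit, ready to push to a heroku remote
```

From here, `heroku create` and `git push heroku master`, shown in the steps below, take over.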

I first explain what you need, what my R script does, and how to deal with credentials. If you are not interested, go immediately to the steps.

What you need:
• no fear of using the command line (knowing where it is and how to navigate is enough)
• a Heroku account (you will need a credit card, but running once a day is probably free)
• the heroku CLI (command line interface) installed
• a folder with a script that does what you want to do
• (not necessary, but very useful) renv set up for this project
Example of a script

I have an R script that:

• creates a u-shape curve dataset
• adds random names to the x and y axes
• creates ggplot2 image
• posts the tweet as a twitter account

With this as result:

Of course you could create something that is actually useful, like downloading data, cleaning it and pushing it into a database. But this example is relatively small and you can actually see the results online.

Small diversion: credentials / secrets

For many applications you need credentials, and you don't want to put
them in the script: if you share the script with someone, they also have
the credentials. If you put it on GitHub, the world has your secrets (I
just did this).

So how can you do it? R can read environment variables,
and in Heroku you can input the environment variables that will
be passed to the runner when it runs (there are better, more professional tools to do the same thing, but this is good enough for me). So you create an environment variable called apikey with a value like aVerY5eCretKEy. In your script you use Sys.getenv("apikey") and the script will retrieve the apikey aVerY5eCretKEy and use that.

• Create a .Renviron file in your local project; it is a simple text file where you add ‘secrets’ like apikey="aVerY5eCretKEy", one per line.
• Add .Renviron to your .gitignore. Now this file with secrets will be ignored by git and you can never accidentally add it to a repo.

How do you add them to Heroku?
I went to the Heroku website for my project and manually
set the config vars (Heroku's name for environment variables), but it is also possible to set them with heroku config:set GITHUB_USERNAME=joesmith in your project folder.

Check that the env vars are correctly set by running heroku config.

Steps

So what do you need to make this work?

Steps in order

• Check if your script runs on your computer
• (Set up renv)
• On the command line, set up a heroku project
• Add the buildpack
• git commit all the files you need
• Push to heroku
• Test-run the script on heroku
• Add a scheduler

Steps with explanation
• run your R script locally to make sure it works: source("script.R")
• (optional, but recommended) check if you have set up renv for this project. renv::status(). When you are satisfied with the script, use renv::snapshot() to fix the versions of your required packages. This creates an ‘renv.lock’ file that contains the package versions you used.
• go to the project folder on your command line
• Setup a heroku project: Do either:
heroku create --buildpack https://github.com/virtualstaticvoid/heroku-buildpack-r.git in the folder (this creates a new project and adds the buildpack)

or do heroku create first and add the buildpack with:
heroku buildpacks:set https://github.com/virtualstaticvoid/heroku-buildpack-r.git

In the previous step you got a name for your project, for instance powerful-dusk-49558.

• make sure you connect your repo to heroku (this didn’t happen automatically for me)
heroku git:remote -a powerful-dusk-49558

you now have a remote called ‘heroku’ (git remote -v shows this)

• commit all the necessary things:

renv/activate.R
renv.lock
script.R

You need the renv/activate.R script from renv so that the buildpack recognizes this as an renv-equipped R project. The buildpack also works with an init.R file if you don't want to use renv and prefer to write out manually which packages to install.

• push to heroku git push heroku master

The terminal shows all the installs of the buildpack:

remote: -----> R (renv) app detected
remote: -----> Installing R
remote:        Version 4.0.0 will be installed.
remote: -----> Downloading buildpack archives from AWS S3
..etc...

This will take a while because it needs to install R and all the packages. Subsequent pushes are faster because of caching but will still take several minutes.

• After this is finished, test if it works on heroku with heroku run Rscript script.R

If it was successful, you can add a scheduler.

Setting up this scheduler is still manual work.

Go to your heroku project in the browser to set the scheduler or use
heroku addons:open scheduler to let your browser move to the correct window.

I first set the frequency to hourly to see if the tweet bot works.

I set the job to Rscript script.R

There are no logs, so you had better make sure it works when you run heroku run Rscript script.R

If it works via run, it should also work via the scheduler.

I set the schedule to once a day.

Conclusion

And now it runs every day. However, the free plan offers no logs and no fine-grained control, so if it fails, you won’t know.

Reproducibility

At the moment of creation (when I knitted this document) this was the state of my machine: sessioninfo::session_info()

─ Session info ───────────────────────────────────────────────────────────────
 setting  value
 version  R version 4.0.2 (2020-06-22)
 os       macOS Catalina 10.15.6
 system   x86_64, darwin17.0
 ui       X11
 language (EN)
 collate  en_US.UTF-8
 ctype    en_US.UTF-8
 tz       Europe/Amsterdam
 date     2020-09-21

─ Packages ───────────────────────────────────────────────────────────────────
 package     * version date       lib source
 assertthat    0.2.1   2019-03-21 [1] CRAN (R 4.0.0)
 cli           2.0.2   2020-02-28 [1] CRAN (R 4.0.0)
 crayon        1.3.4   2017-09-16 [1] CRAN (R 4.0.0)
 digest        0.6.25  2020-02-23 [1] CRAN (R 4.0.0)
 evaluate      0.14    2019-05-28 [1] CRAN (R 4.0.0)
 fansi         0.4.1   2020-01-08 [1] CRAN (R 4.0.0)
 glue          1.4.1   2020-05-13 [1] CRAN (R 4.0.1)
 htmltools     0.5.0   2020-06-16 [1] CRAN (R 4.0.1)
 knitr         1.29    2020-06-23 [1] CRAN (R 4.0.1)
 magrittr      1.5     2014-11-22 [1] CRAN (R 4.0.0)
 rlang         0.4.7   2020-07-09 [1] CRAN (R 4.0.2)
 rmarkdown     2.3     2020-06-18 [1] CRAN (R 4.0.1)
 sessioninfo   1.1.1   2018-11-05 [1] CRAN (R 4.0.1)
 stringi       1.4.6   2020-02-17 [1] CRAN (R 4.0.0)
 stringr       1.4.0   2019-02-10 [1] CRAN (R 4.0.0)
 withr         2.2.0   2020-04-20 [1] CRAN (R 4.0.2)
 xfun          0.15    2020-06-21 [1] CRAN (R 4.0.2)
 yaml          2.2.1   2020-02-01 [1] CRAN (R 4.0.0)

[1] /Library/Frameworks/R.framework/Versions/4.0/Resources/library

The post Running an R Script on a Schedule: Heroku first appeared on R-bloggers.

### Multi-Armed Bandit with Thompson Sampling

Sun, 09/20/2020 - 14:03

[This article was first published on R – Predictive Hacks, and kindly contributed to R-bloggers].

Thompson Sampling is an algorithm for decision problems where actions are taken in sequence, balancing between exploitation, which maximizes immediate performance, and exploration, which accumulates new information that may improve future performance. There is always a trade-off between exploration and exploitation in all multi-armed bandit problems.

Thompson Sampling has grown in popularity since it is widely used in online campaigns, such as Facebook, YouTube, and web campaigns, where many variants are served at the beginning and, as time passes, the algorithm gives more weight to the strong variants.

Conjugate Prior

In Bayesian probability theory, if the posterior distributions p(θ | x) are in the same probability distribution family as the prior probability distribution p(θ), the prior and posterior are then called conjugate distributions, and the prior is called a conjugate prior for the likelihood function p(x | θ).

Let us focus on the binomial case, since in digital marketing we care mostly about CTR and conversion rates, which follow a binomial distribution. According to Bayesian theory, the conjugate prior distribution of a binomial distribution is the Beta distribution, with posterior hyperparameters α and β, which are the successes and the failures respectively.

https://en.wikipedia.org/wiki/Conjugate_prior

Notice that the exact interpretation of the parameters of a beta distribution in terms of the number of successes and failures depends on what function is used to extract a point estimate from the distribution. The mean of a beta distribution is $$\frac{\alpha}{\alpha+\beta}$$, which corresponds to $$\alpha$$ successes and $$\beta$$ failures, while the mode is $$\frac{\alpha-1}{\alpha+\beta-2}$$, which corresponds to $$\alpha-1$$ successes and $$\beta-1$$ failures. Bayesians generally prefer to use the posterior mean rather than the posterior mode as a point estimate, justified by a quadratic loss function, and the use of $$\alpha$$ and $$\beta$$ is more convenient mathematically, while the use of $$\alpha-1$$ and $$\beta-1$$ has the advantage that a uniform $$\mathrm{Beta}(1,1)$$ prior corresponds to 0 successes and 0 failures. The same issues apply to the Dirichlet distribution.
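As a quick numerical illustration of the mean/mode distinction (the counts here are made up for this sketch):

```r
# posterior Beta(a, b) after 100 successes and 900 failures,
# starting from a uniform Beta(1, 1) prior
a <- 100 + 1
b <- 900 + 1

post_mean <- a / (a + b)            # corresponds to a successes, b failures
post_mode <- (a - 1) / (a + b - 2)  # corresponds to a - 1 successes, b - 1 failures

round(c(mean = post_mean, mode = post_mode), 4)
# the mean is ~0.1008, while the mode is exactly 0.1
```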

Thompson Sampling Algorithm

In practice, we want to maximize the expected number of rewards (i.e. clicks, conversions) given the actions (i.e. variants) and the current context. Conceptually, this means that at the beginning we serve the variants randomly and then we re-adjust our strategy (i.e. the distribution of the variants) based on the results.

The question is how we get the weights for every state, and the answer is via Monte Carlo simulation. We assume the variants follow the binomial distribution and, according to Bayesian theory, their conjugate prior distribution follows the Beta distribution with hyperparameters α = responses + 1 and β = non-responses + 1.

Let’s give an example of three variants:

• Variant 1: N=1000, responses=100 (i.e RR=10%). The conjugate prior will be Beta(a=101, b=901)
• Variant 2: N=1000, responses=110 (i.e RR=11%). The conjugate prior will be Beta(a=111, b=891)
• Variant 3: N=100, responses=10 (i.e RR=10%). The conjugate prior will be Beta(a=11, b=91)

Each time, we should serve the variant with the highest response rate (RR). This is done by Monte Carlo simulation: in the example above, we draw 5000 samples from each beta distribution and, for each draw, pick the variant with the highest value. The weights we got were:

• 14% Variant 1
• 45% Variant 2
• 41% Variant 3.

Once we adjust the weights and get new results, we follow the same approach again.

Thompson Sampling in R

It is quite easy to apply Thompson Sampling in R. We will work with the three variants of our example above. Notice that variant 3 has fewer impressions than the other two, which is why its distribution is less peaked.

# vector of impressions per variant
b_Sent <- c(1000, 1000, 100)
# vector of responses per variant
b_Reward <- c(100, 110, 10)
msgs <- length(b_Sent)
# number of simulations
N <- 5000
# simulation of Beta distributions (successes + 1, failures + 1)
B <- matrix(rbeta(N * msgs, b_Reward + 1, (b_Sent - b_Reward) + 1), N, byrow = TRUE)
# take the percentage of draws in which each variant
# had the highest rate
P <- table(factor(max.col(B), levels = 1:ncol(B))) / dim(B)[1]
P

and we get:

> P

     1      2      3
0.1368 0.4496 0.4136

Notice that since we are dealing with a Monte Carlo Simulation, if you do not set a random seed you will get slightly different results each time.
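For instance, re-running the simulation above with a fixed seed (the seed value is arbitrary) gives identical weights on every run:

```r
set.seed(42)  # any fixed value makes the draws reproducible

b_Sent <- c(1000, 1000, 100)  # impressions per variant
b_Reward <- c(100, 110, 10)   # responses per variant
N <- 5000

# same Beta(successes + 1, failures + 1) sampling as above
B <- matrix(rbeta(N * length(b_Sent), b_Reward + 1, (b_Sent - b_Reward) + 1),
            nrow = N, byrow = TRUE)
P <- table(factor(max.col(B), levels = 1:ncol(B))) / nrow(B)
P  # the same three weights every time the script is run
```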

Thompson Sampling in Action

Above, we showed the suggested weights based on the observed results of the three variants. Let’s now provide an example with 4 variants whose ground-truth probabilities we know, and see how Thompson Sampling adjusts the weights at every step.

Let’s assume that the ground truth conversion rates of the 4 variants are:

• Variant 1: 10%
• Variant 2: 11%
• Variant 3: 12%
• Variant 4: 13%

Our strategy is to update the weights in every 1000 impressions and let’s assume that the total sample size is 10000 (i.e. we will update the weights ten times).

output <- {}
b_Probs <- c(0.10, 0.11, 0.12, 0.13)
b_Sent <- rep(0, length(b_Probs))
b_Reward <- rep(0, length(b_Probs))
batch_size <- 1000
N <- 10000
steps <- floor(N / batch_size)
msgs <- length(b_Probs)

for (i in 1:steps) {
  B <- matrix(rbeta(1000 * msgs, b_Reward + 1, (b_Sent - b_Reward) + 1), 1000, byrow = TRUE)
  P <- table(factor(max.col(B), levels = 1:ncol(B))) / dim(B)[1]
  # tmp are the weights for each time step
  tmp <- round(P * batch_size, 0)
  # update the rewards
  b_Reward <- b_Reward + rbinom(rep(1, msgs), size = tmp, prob = b_Probs)
  # update the sent
  b_Sent <- b_Sent + tmp
  # print(P)
  output <- rbind(output, t(matrix(P)))
}

# the weights of every step
output

And we get:

> output
        [,1]  [,2]  [,3]  [,4]
 [1,]  0.239 0.259 0.245 0.257
 [2,]  0.002 0.349 0.609 0.040
 [3,]  0.003 0.329 0.533 0.135
 [4,]  0.001 0.078 0.386 0.535
 [5,]  0.001 0.016 0.065 0.918
 [6,]  0.002 0.016 0.054 0.928
 [7,]  0.002 0.095 0.154 0.749
 [8,]  0.003 0.087 0.067 0.843
 [9,]  0.001 0.059 0.069 0.871
[10,]  0.006 0.058 0.116 0.820

As we can see, from the 5th step the algorithm started to assign most of the weight to variant 4 and almost none to variants 1 and 2.

Discussion

There exist other multi-armed bandit algorithms, like ε-greedy, greedy, UCB, etc. There are also contextual multi-armed bandits. In practice, there are some issues with multi-armed bandits. Let’s mention some:

• The CTR/CR can change across days, as can the preference for the variants (seasonality effect, day-of-week effect, fatigue of the campaign, etc.)
• Changing the weights of the variants itself affects the CTR/CR, mainly because:
• due to cookies, it affects the distribution of new and repeated users
• the results are skewed due to late conversions

The post Multi-Armed Bandit with Thompson Sampling first appeared on R-bloggers.

### The Raspberry Pi 4B as a shiny server

Sat, 09/19/2020 - 20:00

[This article was first published on Econometrics and Free Software, and kindly contributed to R-bloggers].

This blog post will not have any code, but will document how I went from hosting apps
on shinyapps.io to hosting shiny apps on my own server, which is a Raspberry Pi 4B with 8 gigs of ram.
First of all, why host apps on a Raspberry Pi? And why not continue on shinyapps.io?
Or why not get one of those nifty droplets on DigitalOcean? Well, for two reasons: one is that I wanted
to have full control of the server, and learn some basic web dev/web engineering skills that I lacked. These services
simplify the process of deploying and hosting a lot, which of course is a good thing if your only
goal is to deploy apps. But I wanted to learn how to do it myself from scratch for some time.
True, with a DigitalOcean droplet, I could have learned quite a lot about the whole process as well,
but there’s a second problem; the minimum amount of processing power that the droplet needed to run
shiny came at 10€ a month. Not a fortune, but already quite expensive for me, since I just wanted
to learn some stuff on my free time. Which is why I got a Raspberry Pi 4B with 8 gigs of ram. It’s less
than 100€, and now that I have it, I can do whatever I want whenever I want to. If I don’t touch it
for several months, no harm done. And if I get tired of it, I’ll make a retro console out of it and
play some old school games. It's a win-win situation if you ask me.

So first, you should get a Raspberry Pi. Those are quite easy to find online, and there are many
tutorials available on how to install Ubuntu (or any other Linux distro) on it, so I won't bother
with that. I also won't explain how to ssh into your Raspberry Pi; again, there are many tutorials
online. More important is how to get Shiny on it. There are two solutions: you either install
it from source, or you use Docker. I chose to use Docker, but maybe not in the way you'd expect;
there’s a lot of talk online about dockerizing apps, complete with all their dependencies and
environment. The advantage is that you're guaranteed that deployment will be very smooth. But the
big disadvantage is that these dockerized apps are huge, around 1GB, or sometimes more. It is true that disk space is
quite cheap nowadays, but still… so I prefer to run a Shiny server from Docker, and then run the
apps out of this server. My apps are thus very small, and it’s only the Shiny server that is huge.
I found a Github repository from user havlev that explains how to do it here.
I have followed this guide and created my own Docker container, which is based on havlev's. I added some dependencies (to the included base Debian distro, as well as some more R packages).

If you’re in a hurry, and want to use my Docker image, you can simply type the following on your
Raspberry pi:

mkdir shiny-server
cd shiny-server
mkdir apps
mkdir conf
mkdir logs
docker run -d -p 3838:3838 -v shiny-apps:/srv/shiny-server/ -v shiny-logs:/var/log/ -v shiny-conf:/etc/shiny-server/ --name rpi-shiny-server brodriguesco/shiny_1_5:firstcommit

The first 5 commands will create some folders that we’ll need later on, while the last one will pull
my Docker container (based on havlev's), launch the server, and start listening on port 3838.

I made an app (another blog post, focusing on this app, will follow soon), hosted on my Raspberry Pi
that you can find here. I’ll also give you
some pointers on how you can achieve that.

But let’s start from the beginning.

Adding dependencies to a Docker container

So let’s suppose that you’re me a few weeks ago, and that you find and follow havlev’s guide here.
Getting the container running is quite easy: you just need to set up Docker and then find the line in the
tutorial that starts with docker run…. You’ll get Shiny running with its hello world app. Now,
how can you add more packages, either to the base Debian image, or R packages? For this part, I
followed this guide.
The idea is to “log in” to the console of the base Debian distro that is running from the container.
First, find the ID of the container by typing the following command in the terminal:

docker ps

You should see something like this:

ubuntu@ubuntu:~$ docker ps
CONTAINER ID   IMAGE                                COMMAND                  CREATED              STATUS              PORTS                    NAMES
69420blazeit   brodriguesco/shiny_1_5:firstcommit   "/etc/shiny-server/i…"   About a minute ago   Up About a minute   0.0.0.0:3838->3838/tcp   rpi-shiny-server

Now, with the ID in hand, you can start any command line program from your Docker container, for instance bash:

docker exec -it 69420blazeit bash

You'll be "logged in" as root:

root@69420blazeit:/#

and from there, you can install Debian packages. The following two packages are necessary to install many R packages from source, so I recommend you install them:

root@69420blazeit:/# apt-get install libssl-dev libxml2-dev

Once these Debian packages are installed, you can start R by simply typing R in the same console, and install whatever packages your Shiny apps will need. In my case, I installed {golem} and several others, but this will be the subject of another blog post.

We're almost done: we now need to save the changes, because if you restart the container, you'll lose them all. To save them, run the following command in a new terminal on your Raspberry Pi (in the "local" Ubuntu, not the Debian running in the container):

ubuntu@ubuntu:~$ docker commit -m "added some dependencies" 69420blazeit shiny_with_deps

So now you could run this container with the command from above, by replacing the relevant parts:

docker run -d -p 3838:3838 -v shiny-apps:/srv/shiny-server/ -v shiny-logs:/var/log/ -v shiny-conf:/etc/shiny-server/ --name rpi-shiny-server shiny_with_deps

OK, so now that the server is running, how can you deploy apps on it? Remember the folders that we
created at the beginning of the blog post (or that you created if you followed havlev’s guide)?
This is where you’ll drop your apps, the usual way. You create a folder there, and simply put the
ui.R and server.R files in here, and that it. These folders can be found in your HOME directory, and they are accessible to your docker container as well. Once you dropped one or two apps, you’ll be able to access them on a link similar as this one: http://192.168.178.55:3838/hello/ where 192.168.178.55 is the local IP address of the Raspberry Pi, 3838 is the port the server is listening to, and /hello/ is the name of the subfolder contained in the ~/shiny-server/apps folder that you created before. What is left doing is making your Raspberry Pi a proper server that can be accessed from the internet. For this, you’ll need to ask your ISP for a dynamic IP address. Generally, you’ll have to pay some money for it; in my case, I’m paying 2€ a month. This address can then be used to access your Raspberry Pi from the internet. The problem, is that being dynamic, the address changes every time you restart your server. To solve this issue, you can use a free dynamic DNS. I use duckdns. This will allow you to have domain that you can share with the world. What’s nice is that if you follow their guide the redirection to the dynamic IP address will happen seamlessly every time it changes, so no need to think about it and do it manually. Finally, you’ll also have to open up port 3838 on your router. The procedure changes from router to router, but you should be able to find the instructions for your router quite easily. If not, you should also be able to get help from your ISP. The end result is that you’ll have your own Shiny server running off a Raspberry Pi, and accessible over the internet! You’ll be able to deploy as many apps as you want, but of course, don’t forget that you’re running all this on a Raspberry Pi. While these machines have become quite powerful over the years, they won’t be powerful enough if you’re running some heavy duty apps with hundreds of concurrent users. 
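As a concrete recap of the deployment step, here is a sketch of dropping a minimal "hello" app into the apps folder (the app name and greeting text are my own examples):

```shell
# each app is a subfolder of the apps directory created earlier,
# which the container serves from /srv/shiny-server
APPS="$HOME/shiny-server/apps"
mkdir -p "$APPS/hello"

# a minimal ui.R / server.R pair
cat > "$APPS/hello/ui.R" <<'EOF'
library(shiny)
fluidPage("Hello from the Raspberry Pi!")
EOF

cat > "$APPS/hello/server.R" <<'EOF'
function(input, output, session) {}
EOF

# the app is then reachable at http://<pi-ip>:3838/hello/
```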
In my next blog post, I'll walk you through the development of a Shiny app using the {golem} package, which you can find here. Hope you enjoyed! If you found this blog post useful, you might want to follow me on twitter for blog post updates and buy me an espresso or paypal.me, or buy my ebook on Leanpub.

To leave a comment for the author, please
follow the link and comment on their blog: Econometrics and Free Software. R-bloggers.com offers daily e-mail updates about R news and tutorials about learning R and many other topics.

The post The Raspberry Pi 4B as a shiny server first appeared on R-bloggers.

### Moving on as Head of Solutions and AI at Draper and Dash

Fri, 09/18/2020 - 09:52

[This article was first published on R Blogs – Hutsons-hacks, and kindly contributed to R-bloggers].

Well, what can I say? It has been a blast over the past year and a half. In my time at the company I have managed to work on a number of fascinating projects; here is a list of just some of them:

1. Data Science Platform – this has been the culmination of lots of machine learning projects, undertaken separately in R and Python, and has allowed our Healthcare Data Science Platform to be developed. This, alongside skilled data scientists, data engineers and front-end developers, means we think we have developed something really special.
2. Clinical Outcome Review System (CORS) – we have built a lot into this product and it has allowed the web tool to be co-created and aligned with our core customers. The CORS tool is a web framework that enables the centralisation of ‘learning from deaths’ in one easy-to-use web tool.
3. Command Centre – led the design and development of the underpinning machine learning and forecasting models embedded into the solution. Furthermore, worked alongside key trusts to deploy the solution and tailor it to the trusts’ needs. For information about this product see: https://draperanddash.com/solutions-2/command-centre/.
4.
Readmission avoidance tool, stranded patient predictor, ED Length of Stay estimator, waiting list cohorter, building an effective forecasting solution, alongside many other projects – primarily developing the forecasting and predictive functionality, alongside our key data scientist Alfonso. These have now been created as Qlik and other BI dashboards which smoothly integrate and augment the machine learning with exploratory analysis functionality in the various business intelligence solutions.
5. Data Assurance Programme – working with a number of trusts to establish our data assurance framework and build out the discovery process that heightens and improves the visibility of data issues across the organisational sphere.
6. AI and ML innovation lead – established the wider AI/ML meetup group and managed the preexisting trust-wide AI and ML group. This involved sending regular updates to the group and making sure they were sighted on coming D&D products, tools and algorithms.
7. Machine Learning and Data Science Training – led on the rollout of upskilling analysts in data science and machine learning techniques, mainly using the R programming language as a vehicle for demonstrating the ‘art of the possible’.
8. Cancer analytics – led the statistical analysis of a number of key cancer measures to establish whether the change in the group was statistically noteworthy.
9. COVID-19 – SIR and SEIR model development. These led to the correct estimation of a number of epidemic curves and peaks. Additionally, I created a Shiny web tool to allow for the estimation of peak demand. This was a very fun project to be working on, in a distressing time.

These are just a few of the projects I have been involved with and I am hugely proud of my time at the company. I look forward to my next challenge back in the NHS as Head of Advanced Analytics for a CSU I won’t name…

What will I miss about the role?
I will miss a number of things in the role, but if I had to rank order them (as a statistician, you know how much we like to quantify), they would be:

1. The people – the team at D&D are great and I have made some really good friends.
2. The challenge – every day is different: from coding a Shiny app, to working in R on a machine learning problem, to working in Python on NLP and deep learning, to working as a consultant, to working with clients, to upskilling NHS analysts, to working in a number of different domains.
3. The learning curve – I like to continuously develop, and D&D affords that opportunity. Every day is a school day, as my professor used to say.
4. The team spirit – we are a team and that is how we deliver all our projects.
5. The coding – I love coding and I have time to do this on a daily basis – from R and Python for data science to SQL for extracting information, among others.

The only thing I won’t miss is the travel, but you can’t have it all…

Does this sound like your bag? If you want to become the new me, then you are in luck, as D&D are looking for a replacement. Click on the image above to be routed to the Draper and Dash main website.

A special thanks to all the guys at D&D: I have really enjoyed my time with the team and I will miss each and every one of them. Team D&D, you guys have been great and it has been emotional.
The post Moving on as Head of Solutions and AI at Draper and Dash first appeared on R-bloggers.

### Permutations in R

Fri, 09/18/2020 - 07:58

[This article was first published on R – Predictive Hacks, and kindly contributed to R-bloggers].

During the interview process for Data Science positions, you are likely to be asked to calculate combinations or permutations. Today we will provide an example of how we can solve permutation problems numerically in R.

Find all Permutations of the word baboon

Mathematically, we can approach this question as follows:

$$P=\frac{n!}{n_1! n_2! n_3! \cdots n_k!}$$

Where:

• $$n$$ is the total number of objects, i.e. letters in our case, which is 6
• $$n_1$$ is the number of objects of type 1, for example, the number of b, which is 2
• $$n_2$$ is the number of objects of type 2, for example, the number of a, which is 1
• $$n_3$$ is the number of objects of type 3, for example, the number of o, which is 2
• $$n_4$$ is the number of objects of type 4, for example, the number of n, which is 1

Hence the number of permutations is $$P=\frac{6!}{2!
\times 2!} = 180$$

Return all Permutations in R

Let’s return all permutations in R, working with the combinat package:

library(combinat)
# get the list of all permutations
my_list <- combinat::permn(c("b", "a", "b", "o", "o", "n"))
# convert the list to a matrix
my_matrix <- do.call(rbind, my_list)
# take the unique rows
my_matrix <- unique(my_matrix)
head(my_matrix)

     [,1] [,2] [,3] [,4] [,5] [,6]
[1,] "b"  "a"  "b"  "o"  "o"  "n"
[2,] "b"  "a"  "b"  "o"  "n"  "o"
[3,] "b"  "a"  "b"  "n"  "o"  "o"
[4,] "b"  "a"  "n"  "b"  "o"  "o"
[5,] "b"  "n"  "a"  "b"  "o"  "o"
[6,] "n"  "b"  "a"  "b"  "o"  "o"

If we want to get the number of rows of the table, which are actually our permutations:

dim(my_matrix)
# [1] 180   6

As expected, we got 180 rows (the permutations) and 6 columns (the number of letters).

Comments

When we are in a position to get all the possible permutations, we will be able to calculate the permutations of more complicated problems. Let’s say that the question was to calculate all the possible permutations of the word baboon, but by picking 4 letters each time (instead of 6).

The post Permutations in R first appeared on R-bloggers.
### Sequential satisficing

Thu, 09/17/2020 - 20:00

[This article was first published on R on OSM, and kindly contributed to R-bloggers].

In our last post, we ran simulations on our 1,000 randomly generated return scenarios to compare the average and risk-adjusted return for satisfactory, naive, and mean-variance optimized (MVO) maximum return and maximum Sharpe ratio portfolios.1 We found that you can shoot for high returns or high risk-adjusted returns, but rarely both. Assuming no major change in the underlying average returns and risk, choosing the efficient high return or high risk-adjusted return portfolio generally leads to similar performance a majority of the time in out-of-sample simulations. What was interesting about the results was that even though we weren’t arguing that one should favor satisficing over optimizing, we found that the satisfactory portfolio generally performed well.

One area we didn’t test was the effect of sequence on mean-variance optimization. Recall we simulated 1,000 different sixty month (five year) return profiles for four potential assets—stocks, bonds, commodities (gold) and real estate. The weights we used to calculate the different portfolio returns and risk were based on data from 1987-1991 and were kept constant for all the simulations. What if the weighting system could “learn” from previous scenarios or adjust to recent information? In this post, we’ll look at the impact of incorporating sequential information on portfolio results and compare that to the non-sequential results.

To orient folks, we’ll show the initial portfolio weight simulation graph with the different portfolios and efficient frontier based on the historical data.
The satisfactory, naive, MVO-derived maximum Sharpe ratio, and MVO-derived maximum return portfolios are the colored blue, black, red, and purple points, respectively.

Now we’ll take the weighting schemes and apply them to the return simulations. However, we’ll run through the returns sequentially and calculate the returns and risk-adjusted returns for the weighting scheme using the prior simulation as the basis for the allocation on the next simulation. For example, we’ll calculate the efficient frontier and run the portfolio weight algorithm (see Weighting on a friend for more details) on simulation #75. From those calculations, we’ll assign the weights for the satisfactory, maximum Sharpe ratio, and maximum efficient return portfolios to compute the returns and risk-adjusted returns on simulation #76. For the record, simulation #1 uses the same weights as those that produced the dots in the graph above.

Running the simulations, here’s the average return by weighting algorithm. And here’s the original.

For the naive and maximum return portfolios, the average return isn’t much different on the sequential calculations than on the original constant weight computations. The sequential Sharpe allocations are about 1 percentage point lower than the stable ones. However, the sequential satisfactory allocations are almost 2 percentage points lower than the constant allocation. In about 5% of the cases, the return simulation was not capable of achieving a satisfactory portfolio with the original constraints of not less than 7% and not more than 10% annualized return and risk. In such a case, we reverted to the original weighting. This adjustment did not affect the average return even when we excluded the “unsatisfactory” portfolios. We assume that the performance drag is due to the specificity of the constraints amid random outcomes.

The other portfolios (except for the naive one) optimize for the previous scenario and apply those optimal weights on the next portfolio in the sequence.
The satisfactory portfolio chooses average weights to achieve a particular constraint and then applies those weights in sequence. Given the randomness of the return scenarios, a weighting scheme that works for one scenario may well produce poor results on another. Of course, the satisfactory portfolios still achieved their return constraint on average. This doesn't entirely explain why the MVO portfolios didn't suffer a performance drag. Our intuition is that the MVO portfolios are border cases. Hence, there probably won't be much dispersion in the results on average, since the return simulations draw from the same return, risk, and error terms throughout. The satisfactory portfolio, however, lies within the main portion of the territory, so the range of outcomes could be much larger. Clearly, this requires further investigation, but we'll have to shelve that for now. In the next set of graphs, we check how each portfolio performs relative to the others. First, the sequential simulation. And the original: There are some modest changes in the satisfactory and naive vs. MVO-derived portfolios. But the biggest change occurs in the satisfactory vs. naive portfolio—almost a 22 percentage point decline! This appears to support our conjecture above that the order of the sequence could be affecting the satisfactory portfolio's performance. Nonetheless, sequencing does not appear to alter the performance of the naive and satisfactory portfolios relative to the MVO-derived ones in a material way. Let's randomize the sequence to see if results change. For the random draw case, we randomly draw one return simulation on which to calculate our portfolio weights and then apply those weights to another randomly drawn portfolio.
The draws are without replacement to ensure we don't use the same simulation both to calculate weights and returns, as well as to ensure that we use all simulations.2 The average return graph isn't much different for the random sampling than for the sequential case, so we'll save some space and not print it. Here is the relative performance graph. And here is the original relative performance for ease of comparison. With random ordering, the naive and satisfactory portfolios experience a modest performance improvement relative to the maximum return portfolio. Relative to the maximum Sharpe portfolio, performance is essentially unchanged. The satisfactory portfolio again suffers a drag relative to the naive, though somewhat less severe than in the sequential case. What are the takeaways? In both the sequential and random sampling cases, the performance of the naive and satisfactory portfolios compared to the MVO portfolios was relatively stable. That supports our prior observation that you can shoot for high returns or high risk-adjusted returns, but usually not both. It's also noteworthy that the maximum Sharpe ratio portfolio consistently underperforms the naive and satisfactory portfolios in all of the simulations. The team at Alpha Architect noted similar results in a much broader research project. There, part of the study found that the maximum Sharpe portfolio suffered the worst performance relative to many others, including an equal-weighted (i.e., naive) portfolio. One issue with our random sampling test is that it used only one random sequence. To be truly robust, we'd want to run the sampling simulation more than once. But we'll have to save that for when we rent time on a cloud GPU, because we don't think our Flintstone-aged laptop will take kindly to running 1,000 simulations of 1,000 random sequences of 1,000 simulated portfolios with 1,000 optimizations and 3,000 random portfolio weightings! Perceptive readers might quibble with how we constructed our "learning" simulations.
Indeed, in both the sequential and random sample cases we calculated our allocation weights using only the prior simulation in the queue (e.g., weights derived from return simulation #595 are deployed on simulation #596, or, say, #793 in the random case). But that "learning" could be considered relatively short-term. A more realistic scenario would allow for cumulative learning by incorporating all or most of the past returns in the sequence. We could get even more complicated by throwing in a weighting scheme that emphasizes nearer-term results. Finally, we could calculate cumulative returns over one scenario or multiple scenarios. We suspect relative performance would be similar, but you never know! Until we experiment with these other ideas, the R and Python code are below. While we generally don't discuss the code in detail, we have been getting more questions and constructive feedback on it of late. Thank you for that! One thing to note: the R code that produces the simulations runs much quicker than the Python code. Part of that is likely due to how we coded the efficient frontier and maximum Sharpe and return portfolios in the different languages. But the differences are close to an order of magnitude. If anyone notices anything we could do to improve performance, please drop us an email at nbw dot osm at gmail dot com. Thanks!
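One hedged sketch of what that "cumulative learning" might look like in Python: estimate mean returns from every prior simulation, geometrically down-weighting older ones. The function name and decay value are our own invention for illustration, not code from the post.

```python
import numpy as np

def recency_weighted_means(past_sims, decay=0.9):
    """Blend mean returns across all past simulations, giving
    geometrically more weight to the most recent ones."""
    means = np.array([s.mean(axis=0) for s in past_sims])
    n = len(past_sims)
    w = decay ** np.arange(n - 1, -1, -1)  # oldest gets the smallest weight
    w = w / w.sum()                        # normalize to sum to one
    return w @ means

rng = np.random.default_rng(0)
sims = [rng.normal(0.005, 0.03, size=(60, 4)) for _ in range(5)]
est = recency_weighted_means(sims)
print(est.shape)  # one blended mean return per asset
```

With `decay=1` this reduces to a plain expanding-window average; smaller values emphasize nearer-term results, which is exactly the extension floated above.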
R code

# Built using R 3.6.2

### Load packages
suppressPackageStartupMessages({
  library(tidyquant)
  library(tidyverse)
})

### Load data
# See prior posts for how we built these data frames
df <- readRDS("port_const.rds")
dat <- readRDS("port_const_long.rds")
sym_names <- c("stock", "bond", "gold", "realt", "rfr")
sim1 <- readRDS("hist_sim_port16.rds")

### Call functions
# See prior posts for how we built these functions
source("Portfolio_simulation_functions.R")
source("Efficient_frontier.R")

## Function for calculating satisfactory weighting
port_sim_wts <- function(df, sims, cols, return_min, risk_max){
  if(ncol(df) != cols){
    stop("Columns don't match")
  }
  # Create weight matrix
  wts <- matrix(nrow = (cols-1)*sims, ncol = cols)
  count <- 1
  for(i in 1:(cols-1)){
    for(j in 1:sims){
      a <- runif((cols-i+1), 0, 1)
      b <- a/sum(a)
      c <- sample(c(b, rep(0, i-1)))
      wts[count,] <- c
      count <- count + 1
    }
  }
  # Find returns
  mean_ret <- colMeans(df)
  # Calculate covariance matrix
  cov_mat <- cov(df)
  # Calculate random portfolios
  port <- matrix(nrow = (cols-1)*sims, ncol = 2)
  for(i in 1:nrow(port)){
    port[i,1] <- as.numeric(sum(wts[i,] * mean_ret))
    port[i,2] <- as.numeric(sqrt(t(wts[i,]) %*% cov_mat %*% wts[i,]))
  }
  port <- as.data.frame(port) %>%
    `colnames<-`(c("returns", "risk"))
  port_select <- cbind(port, wts)
  port_wts <- port_select %>%
    mutate(returns = returns*12,
           risk = risk*sqrt(12)) %>%
    filter(returns >= return_min,
           risk <= risk_max) %>%
    summarise_at(vars(3:6), mean)
  port_wts
}

## Portfolio func
port_func <- function(df, wts){
  mean_ret = colMeans(df)
  returns = sum(mean_ret*wts)
  risk = sqrt(t(wts) %*% cov(df) %*% wts)
  c(returns, risk)
}

## Portfolio graph
pf_graf <- function(df, nudge, multiplier, rnd, y_lab, text){
  df %>%
    gather(key, value) %>%
    ggplot(aes(reorder(key, value), value*multiplier*100)) +
    geom_bar(stat = 'identity', fill = 'darkblue') +
    geom_text(aes(label = format(round(value*multiplier, rnd)*100, nsmall = 1)),
              nudge_y = nudge) +
    labs(x = "",
         y = paste(y_lab, "(%)", sep = " "),
         title = paste(text, "by portfolio", sep = " "))
}

## Create outperformance graph
perf_graf <- function(df, rnd, nudge, text){
  df %>%
    rename("Satisfactory vs. Naive" = ovs,
           "Satisfactory vs. Max" = ovr,
           "Naive vs. Max" = rve,
           "Satisfactory vs. Sharpe" = ove,
           "Naive vs. Sharpe" = sve) %>%
    gather(key, value) %>%
    ggplot(aes(reorder(key, value), value*100)) +
    geom_bar(stat = 'identity', fill = 'darkblue') +
    geom_text(aes(label = format(round(value, rnd)*100, nsmall = 1)),
              nudge_y = nudge) +
    labs(x = "",
         y = "Percent (%)",
         title = paste("Frequency of outperformance:", text, sep = " "))
}

### Create weight schemes
satis_wts <- c(0.32, 0.4, 0.08, 0.2) # Calculated in previous post using port_select_func
simple_wts <- rep(0.25, 4)
eff_port <- eff_frontier_long(df[2:61, 2:5], risk_increment = 0.01)
eff_sharp_wts <- eff_port[which.max(eff_port$sharpe), 1:4] %>% as.numeric()
eff_max_wts <- eff_port[which.max(eff_port$exp_ret), 1:4] %>% as.numeric()

## Run port sim on 1987-1991 data
port_sim_1 <- port_sim_lv(df[2:61, 2:5], 1000, 4)

# Run function on the four weighting schemes and one simulation
weight_list <- list(satis = satis_wts, naive = simple_wts,
                    sharp = eff_sharp_wts, max = eff_max_wts)
wts_df <- data.frame(wts = c("satis", "naive", "sharp", "max"),
                     returns = 1:4, risk = 5:8,
                     stringsAsFactors = FALSE)
for(i in 1:4){
  wts_df[i, 2:3] <- port_func(df[2:61, 2:5], weight_list[[i]])
}
wts_df$sharpe = wts_df$returns/wts_df$risk

# Graph portfolio simulation with the four portfolios
port_sim_1$graph +
  geom_point(data = wts_df,
             aes(x = risk*sqrt(12)*100, y = returns*1200),
             color = c("darkblue", "black", "red", "purple"), size = 4) +
  geom_line(data = eff_port,
            aes(stdev*sqrt(12)*100, exp_ret*1200),
            color = 'blue', size = 1.5) +
  theme(legend.position = c(0.05, 0.8),
        legend.key.size = unit(.5, "cm"),
        legend.background = element_rect(fill = NA))

## Calculate performance with rolling frontier on max sharpe
set.seed(123)
eff_sharp_roll <- list()
eff_max_roll <- list()
satis_roll <- list()
for(i in
1:1000){
  if(i == 1){
    sharp_wts <- eff_sharp_wts
    max_wts <- eff_max_wts
    sat_wts <- satis_wts
  } else {
    eff_calc <- eff_frontier_long(sim1[[i-1]]$df, risk_increment = 0.01)
    sharp_wts <- eff_calc[which.max(eff_calc$sharpe), 1:4]
    max_wts <- eff_calc[which.max(eff_calc$exp_ret), 1:4]
    sat_wts <- port_sim_wts(sim1[[i-1]]$df, 1000, 4, 0.07, 0.1)
  }
  eff_sharp_roll[[i]] <- port_func(sim1[[i]]$df, as.numeric(sharp_wts))
  eff_max_roll[[i]] <- port_func(sim1[[i]]$df, as.numeric(max_wts))
  satis_roll[[i]] <- port_func(sim1[[i]]$df, as.numeric(sat_wts))
}

list_to_df <- function(list_ob){
  x <- do.call('rbind', list_ob) %>%
    as.data.frame() %>%
    `colnames<-`(c("returns", "risk")) %>%
    mutate(sharpe = returns/risk)
  x
}

roll_lists <- c("eff_sharp_roll", "eff_max_roll", "satis_roll")
for(obj in roll_lists){
  x <- list_to_df(get(obj))
  assign(obj, x)
}

# Return
roll_mean_pf <- data.frame(Satisfactory = mean(satis_roll[,1], na.rm = TRUE),
                           Naive = mean(simple_df[,1]),
                           Sharpe = mean(eff_sharp_roll[,1]),
                           Max = mean(eff_max_roll[,1]))

# Graph mean returns
pf_graf(roll_mean_pf, 1, 12, 2, "Return (%)", "Rolling simulation return")
pf_graf(mean_pf, 1, 12, 2, "Return (%)", "Original simulation return")

# Create relative performance df
roll_ret_pf <- data.frame(ovs = mean(satis_roll[,1] > simple_df[,1]),
                          ovr = mean(satis_roll[,1] > eff_max_roll[,1]),
                          rve = mean(simple_df[,1] > eff_max_roll[,1]),
                          ove = mean(satis_roll[,1] > eff_sharp_roll[,1]),
                          sve = mean(simple_df[,1] > eff_sharp_roll[,1]))

# Graph outperformance
perf_graf(roll_ret_pf, 2, 4, "Rolling simulation returns")
perf_graf(ret_pf, 2, 4, "Original simulation returns")

## Sampling portfolios
set.seed(123)
eff_sharp_samp <- list()
eff_max_samp <- list()
satis_samp <- list()
for(i in 1:1000){
  if(i == 1){
    sharp_wts <- eff_sharp_wts
    max_wts <- eff_max_wts
    sat_wts <- satis_wts
  } else {
    samp_1 <- sample(1000, 1) # Sample a return simulation for weight scheme
    eff_calc <- eff_frontier_long(sim1[[samp_1]]$df, risk_increment = 0.01)
    sharp_wts <-
      eff_calc[which.max(eff_calc$sharpe), 1:4]
    max_wts <- eff_calc[which.max(eff_calc$exp_ret), 1:4]
    sat_wts <- port_sim_wts(sim1[[samp_1]]$df, 1000, 4, 0.07, 0.1)
  }
  samp_2 <- sample(1000, 1) # Sample a return simulation to analyze performance
  eff_sharp_samp[[i]] <- port_func(sim1[[samp_2]]$df, as.numeric(sharp_wts))
  eff_max_samp[[i]] <- port_func(sim1[[samp_2]]$df, as.numeric(max_wts))
  satis_samp[[i]] <- port_func(sim1[[samp_2]]$df, as.numeric(sat_wts))
}

samp_lists <- c("eff_sharp_samp", "eff_max_samp", "satis_samp")
for(obj in samp_lists){
  x <- list_to_df(get(obj))
  assign(obj, x)
}

# Return
samp_mean_pf <- data.frame(Satisfactory = mean(satis_samp[,1], na.rm = TRUE),
                           Naive = mean(simple_df[,1]),
                           Sharpe = mean(eff_sharp_samp[,1]),
                           Max = mean(eff_max_samp[,1]))

# Graph mean returns: NOT SHOWN
pf_graf(samp_mean_pf, 1, 12, 2, "Return (%)", "Random sampling return")
# pf_graf(mean_pf, 1, 12, 2, "Return (%)", "Original return")

# Create relative performance df
samp_ret_pf <- data.frame(ovs = mean(satis_samp[,1] > simple_df[,1]),
                          ovr = mean(satis_samp[,1] > eff_max_samp[,1]),
                          rve = mean(simple_df[,1] > eff_max_samp[,1]),
                          ove = mean(satis_samp[,1] > eff_sharp_samp[,1]),
                          sve = mean(simple_df[,1] > eff_sharp_samp[,1]))

# Graph outperformance
perf_graf(samp_ret_pf, 2, 4, "Random sample returns")
perf_graf(ret_pf, 2, 4, "Original returns")

Python code

# Built using Python 3.7.4

# Load libraries
import pandas as pd
import pandas_datareader.data as web
import numpy as np
import matplotlib.pyplot as plt
%matplotlib inline
plt.style.use('ggplot')

## Load data
# See prior posts for how we built these data frames
df = pd.read_pickle('port_const.pkl')
dat = pd.read_pickle('data_port_const.pkl')
port_names = ['Original', 'Naive', 'Sharpe', 'Max']
sim1 = pd.read_pickle('hist_sim_port16.pkl')

## Load functions part 1
# Portfolio simulation functions
# Note the Port_sim class is slightly different than in previous posts,
# so we're reproducing it here.
# We were getting a lot of "numpy.float64 is not callable" errors
# due to overlapping names on variables and functions, so we needed
# to fix the code. If it still throws errors, let us know.

## Simulation function
class Port_sim:
    def calc_sim(df, sims, cols):
        wts = np.zeros((sims, cols))
        for i in range(sims):
            a = np.random.uniform(0, 1, cols)
            b = a/np.sum(a)
            wts[i,] = b
        mean_ret = df.mean()
        port_cov = df.cov()
        port = np.zeros((sims, 2))
        for i in range(sims):
            port[i,0] = np.sum(wts[i,]*mean_ret)
            port[i,1] = np.sqrt(np.dot(np.dot(wts[i,].T, port_cov), wts[i,]))
        sharpe = port[:,0]/port[:,1]*np.sqrt(12)
        return port, wts, sharpe

    def calc_sim_lv(df, sims, cols):
        wts = np.zeros(((cols-1)*sims, cols))
        count = 0
        for i in range(1, cols):
            for j in range(sims):
                a = np.random.uniform(0, 1, (cols-i+1))
                b = a/np.sum(a)
                c = np.random.choice(np.concatenate((b, np.zeros(i))), cols, replace=False)
                wts[count,] = c
                count += 1
        mean_ret = df.mean()
        port_cov = df.cov()
        port = np.zeros(((cols-1)*sims, 2))
        for i in range((cols-1)*sims):  # iterate over every random portfolio
            port[i,0] = np.sum(wts[i,]*mean_ret)
            port[i,1] = np.sqrt(np.dot(np.dot(wts[i,].T, port_cov), wts[i,]))
        sharpe = port[:,0]/port[:,1]*np.sqrt(12)
        return port, wts, sharpe

    def graph_sim(port, sharpe):
        plt.figure(figsize=(14,6))
        plt.scatter(port[:,1]*np.sqrt(12)*100, port[:,0]*1200,
                    marker='.', c=sharpe, cmap='Blues')
        plt.colorbar(label='Sharpe ratio', orientation='vertical', shrink=0.25)
        plt.title('Simulated portfolios', fontsize=20)
        plt.xlabel('Risk (%)')
        plt.ylabel('Return (%)')
        plt.show()

# Constraint function
def port_select_func(port, wts, return_min, risk_max):
    port_select = pd.DataFrame(np.concatenate((port, wts), axis=1))
    port_select.columns = ['returns', 'risk', 1, 2, 3, 4]
    port_wts = port_select[(port_select['returns']*12 >= return_min) &
                           (port_select['risk']*np.sqrt(12) <= risk_max)]
    port_wts = port_wts.iloc[:, 2:6]
    port_wts = port_wts.mean(axis=0)
    return port_wts

def port_select_graph(port_wts):
    plt.figure(figsize=(12,6))
    key_names = {1:"Stocks",
                 2:"Bonds", 3:"Gold", 4:"Real estate"}
    lab_names = []
    graf_wts = port_wts.sort_values()*100
    for i in range(len(graf_wts)):
        name = key_names[graf_wts.index[i]]
        lab_names.append(name)
    plt.bar(lab_names, graf_wts, color='blue')
    plt.ylabel("Weight (%)")
    plt.title("Average weights for risk-return constraint", fontsize=15)
    for i in range(len(graf_wts)):
        plt.annotate(str(round(graf_wts.values[i])),
                     xy=(lab_names[i], graf_wts.values[i]+0.5))
    plt.show()

## Load functions part 2
# We should have wrapped the three different efficient frontier functions
# into one class or function but ran out of time. This is probably what slows
# down the simulations below.

# Create efficient frontier function
from scipy.optimize import minimize

def eff_frontier(df_returns, min_ret, max_ret):
    n = len(df_returns.columns)

    def get_data(weights):
        weights = np.array(weights)
        returns = np.sum(df_returns.mean() * weights)
        risk = np.sqrt(np.dot(weights.T, np.dot(df_returns.cov(), weights)))
        sharpe = returns/risk
        return np.array([returns, risk, sharpe])

    # Constraints
    def check_sum(weights):
        return np.sum(weights) - 1

    # Range of returns
    mus = np.linspace(min_ret, max_ret, 21)

    # Function to minimize
    def minimize_volatility(weights):
        return get_data(weights)[1]

    # Inputs
    init_guess = np.repeat(1/n, n)
    bounds = ((0.0, 1.0),) * n

    eff_risk = []
    port_weights = []

    for mu in mus:
        # Constraints: weights sum to one and hit the target return
        cons = ({'type':'eq', 'fun': check_sum},
                {'type':'eq', 'fun': lambda w: get_data(w)[0] - mu})
        result = minimize(minimize_volatility, init_guess, method='SLSQP',
                          bounds=bounds, constraints=cons)
        eff_risk.append(result['fun'])
        port_weights.append(result.x)

    eff_risk = np.array(eff_risk)

    return mus, eff_risk, port_weights

# Create max sharpe function
from scipy.optimize import minimize

def max_sharpe(df_returns):
    n = len(df_returns.columns)

    def get_data(weights):
        weights = np.array(weights)
        returns = np.sum(df_returns.mean() * weights)
        risk = np.sqrt(np.dot(weights.T, np.dot(df_returns.cov(), weights)))
        sharpe = returns/risk
        return np.array([returns, risk, sharpe])

    # Function to minimize
    def neg_sharpe(weights):
        return -get_data(weights)[2]

    # Inputs
    init_guess = np.repeat(1/n, n)
    bounds = ((0.0, 1.0),) * n

    # Weights must sum to one
    constraint = {'type':'eq', 'fun': lambda x: np.sum(x) - 1}

    result = minimize(neg_sharpe, init_guess, method='SLSQP',
                      bounds=bounds, constraints=constraint)

    return -result['fun'], result['x']

# Create max return function
from scipy.optimize import minimize

def max_ret(df_returns):
    n = len(df_returns.columns)

    def get_data(weights):
        weights = np.array(weights)
        returns = np.sum(df_returns.mean() * weights)
        risk = np.sqrt(np.dot(weights.T, np.dot(df_returns.cov(), weights)))
        sharpe = returns/risk
        return np.array([returns, risk, sharpe])

    # Function to minimize
    def port_ret(weights):
        return -get_data(weights)[0]

    # Inputs
    init_guess = np.repeat(1/n, n)
    bounds = ((0.0, 1.0),) * n

    # Weights must sum to one
    constraint = {'type':'eq', 'fun': lambda x: np.sum(x) - 1}

    result = minimize(port_ret, init_guess, method='SLSQP',
                      bounds=bounds, constraints=constraint)

    return -result['fun'], result['x']

## Load functions part 3
## Portfolio return
def port_func(df, wts):
    mean_ret = df.mean()
    returns = np.sum(mean_ret * wts)
    risk = np.sqrt(np.dot(wts, np.dot(df.cov(), wts)))
    return returns, risk

## Return and relative performance graph
def pf_graf(names, values, rnd, nudge, ylabs, graf_title):
    df = pd.DataFrame(zip(names, values), columns=['key', 'value'])
    # Reset the index after sorting so positional lookups match the bars
    sorted_df = df.sort_values(by='value').reset_index(drop=True)
    plt.figure(figsize=(12,6))
    plt.bar('key', 'value', data=sorted_df, color='darkblue')
    for i in range(len(names)):
        plt.annotate(str(round(sorted_df['value'][i], rnd)),
                     xy=(sorted_df['key'][i], sorted_df['value'][i]+nudge))
    plt.ylabel(ylabs)
    plt.title('{} performance by portfolio'.format(graf_title))
    plt.show()

## Create portfolio
np.random.seed(123)
port_sim_1, wts_1, sharpe_1 = Port_sim.calc_sim(df.iloc[1:61, 0:4], 1000, 4)

# Create returns and min/max ranges
df_returns = df.iloc[1:61, 0:4]
min_ret =
min(port_sim_1[:,0])
max_return = max(port_sim_1[:,0])  # renamed so we don't shadow the max_ret() function

# Find efficient portfolio
eff_ret, eff_risk, eff_weights = eff_frontier(df_returns, min_ret, max_return)
eff_sharpe = eff_ret/eff_risk

## Create weight schemes
satis_wts = np.array([0.32, 0.4, 0.08, 0.2]) # Calculated in previous post using port_select_func
simple_wts = np.repeat(0.25, 4)
eff_sharp_wts = eff_weights[np.argmax(eff_sharpe)]
eff_max_wts = eff_weights[np.argmax(eff_ret)]

wt_list = [satis_wts, simple_wts, eff_sharp_wts, eff_max_wts]
wts_df = np.zeros([4,3])
for i in range(4):
    wts_df[i,:2] = port_func(df.iloc[1:61, 0:4], wt_list[i])
wts_df[:,2] = wts_df[:,0]/wts_df[:,1]

## Graph portfolios
plt.figure(figsize=(12,6))
plt.scatter(port_sim_1[:,1]*np.sqrt(12)*100, port_sim_1[:,0]*1200,
            marker='.', c=sharpe_1, cmap='Blues')
plt.plot(eff_risk*np.sqrt(12)*100, eff_ret*1200, 'b--', linewidth=2)
col_code = ['blue', 'black', 'red', 'purple']
for i in range(4):
    plt.scatter(wts_df[i,1]*np.sqrt(12)*100, wts_df[i,0]*1200,
                c=col_code[i], s=50)
plt.colorbar(label='Sharpe ratio', orientation='vertical', shrink=0.25)
plt.title('Simulated portfolios', fontsize=20)
plt.xlabel('Risk (%)')
plt.ylabel('Return (%)')
plt.show()

## Create simplified satisfactory portfolio finder function
def port_sim_wts(df1, sims1, cols1, ret1, risk1):
    pf, wt, _ = Port_sim.calc_sim(df1, sims1, cols1)
    port_wts = port_select_func(pf, wt, ret1, risk1)
    return port_wts

## Run sequential simulation
np.random.seed(123)
eff_sharp_roll = np.zeros([1000,3])
eff_max_roll = np.zeros([1000,3])
satis_roll = np.zeros([1000,3])

for i in range(1000):
    if i == 0:
        sharp_weights = eff_sharp_wts
        max_weights = eff_max_wts
        sat_weights = satis_wts
    else:
        # Running the optimization twice is probably slowing down the simulation
        _, sharp_weights = max_sharpe(sim1[i-1][0])
        _, max_weights = max_ret(sim1[i-1][0])
        test = port_sim_wts(sim1[i-1][0], 1000, 4, 0.07, 0.1)
        if np.isnan(test).any():
            sat_weights = satis_wts
        else:
            sat_weights = test
    eff_sharp_roll[i,:2] = port_func(sim1[i][0], sharp_weights)
    eff_max_roll[i,:2] = port_func(sim1[i][0], max_weights)
    satis_roll[i,:2] = port_func(sim1[i][0], sat_weights)

eff_sharp_roll[:,2] = eff_sharp_roll[:,0]/eff_sharp_roll[:,1]
eff_max_roll[:,2] = eff_max_roll[:,0]/eff_max_roll[:,1]
satis_roll[:,2] = satis_roll[:,0]/satis_roll[:,1]

# Calculate simple returns
simple_df = np.zeros([1000,3])
for i in range(1000):
    simple_df[i,:2] = port_func(sim1[i][0], simple_wts)
simple_df[:,2] = simple_df[:,0]/simple_df[:,1]

## Add simulations to list and graph
roll_sim = [satis_roll, simple_df, eff_sharp_roll, eff_max_roll]
port_means = []
for df in roll_sim:
    port_means.append(np.mean(df[:,0])*1200)

port_names = ['Satisfactory', 'Naive', 'Sharpe', 'Max']

# Sequential simulation
pf_graf(port_names, port_means, 1, 0.5, 'Returns (%)', 'Rolling simulation return')

# Original simulation
port_means1 = []
for df in list_df:
    port_means1.append(np.mean(df[:,0])*1200)

pf_graf(port_names, port_means1, 1, 0.5, 'Returns (%)', 'Original simulation return')

## Comparison charts
# Build names for comparison chart
comp_names = []
for i in range(4):
    for j in range(i+1, 4):
        comp_names.append('{} vs. {}'.format(port_names[i], port_names[j]))

# Calculate comparison values
comp_values = []
for i in range(4):
    for j in range(i+1, 4):
        comps = np.mean(roll_sim[i][:,0] > roll_sim[j][:,0])
        comp_values.append(comps)

# Sequential comparisons
pf_graf(comp_names[:-1], comp_values[:-1], 2, 0.025, 'Frequency (%)', 'Rolling simulation frequency of')

# Original comparisons
comp_values1 = []
for i in range(4):
    for j in range(i+1, 4):
        comps1 = np.mean(list_df[i][:,0] > list_df[j][:,0])
        comp_values1.append(comps1)

pf_graf(comp_names[:-1], comp_values1[:-1], 2, 0.025, 'Frequency (%)', 'Original simulation frequency of')

## Sample simulation
from datetime import datetime
start_time = datetime.now()

np.random.seed(123)
eff_sharp_samp = np.zeros([1000,3])
eff_max_samp = np.zeros([1000,3])
satis_samp = np.zeros([1000,3])
naive_samp = np.zeros([1000,3])

for i in range(1000):
    if i == 0:
        sharp_weights = eff_sharp_wts
        max_weights = eff_max_wts
        sat_weights = satis_wts
        nav_weights = simple_wts
    else:
        samp1 = int(np.random.choice(1000, 1))
        _, sharp_weights = max_sharpe(sim1[samp1][0])
        _, max_weights = max_ret(sim1[samp1][0])
        test = port_sim_wts(sim1[samp1][0], 1000, 4, 0.07, 0.1)
        if np.isnan(test).any():
            sat_weights = satis_wts
        else:
            sat_weights = test
    samp2 = int(np.random.choice(1000, 1))
    eff_sharp_samp[i,:2] = port_func(sim1[samp2][0], sharp_weights)
    eff_max_samp[i,:2] = port_func(sim1[samp2][0], max_weights)
    satis_samp[i,:2] = port_func(sim1[samp2][0], sat_weights)
    naive_samp[i,:2] = port_func(sim1[samp2][0], nav_weights)

eff_sharp_samp[:,2] = eff_sharp_samp[:,0]/eff_sharp_samp[:,1]
eff_max_samp[:,2] = eff_max_samp[:,0]/eff_max_samp[:,1]
satis_samp[:,2] = satis_samp[:,0]/satis_samp[:,1]
naive_samp[:,2] = naive_samp[:,0]/naive_samp[:,1]

end_time = datetime.now()
print('Duration: {}'.format(end_time - start_time)) #
Duration: 0:07:19.733893

# Create sample list and graph
samp_list = [satis_samp, naive_samp, eff_sharp_samp, eff_max_samp]  # match port_names order
port_means_samp = []
for df in samp_list:
    port_means_samp.append(np.mean(df[:,0])*1200)

# Sample graph
pf_graf(port_names, port_means_samp, 1, 0.5, 'Returns (%)', 'Random sample simulation return')

# Original graph
pf_graf(port_names, port_means1, 1, 0.5, 'Returns (%)', 'Original simulation return')

# Calculate comparison values for sample simulation
comp_values_samp = []
for i in range(4):
    for j in range(i+1, 4):
        comps_samp = np.mean(samp_list[i][:,0] > samp_list[j][:,0])
        comp_values_samp.append(comps_samp)

# Sample graph
pf_graf(comp_names[:-1], comp_values_samp[:-1], 2, 0.025, 'Frequency (%)', 'Random sample simulation frequency of')

# Original graph
pf_graf(comp_names[:-1], comp_values1[:-1], 2, 0.025, 'Frequency (%)', 'Original simulation frequency of')

1. Whew, what a mouthful!

2. Ensuring our scenario draws don’t match is almost overkill, since the likelihood of drawing two of the same samples in a row is about one in a million.

To leave a comment for the author, please follow the link and comment on their blog: R on OSM.

### A Frosty Deal? 
Thu, 09/17/2020 - 20:00

[This article was first published on R | Quantum Jitter, and kindly contributed to R-bloggers].

Reading news articles on the will-they-won’t-they post-Brexit trade negotiations with the EU sees days of optimism jarred by days of gloom. Do negative news articles, when one wants a positive outcome, leave a deeper impression? I wondered if I could get a more objective view from quantitative analysis of textual data. To do this, I’m going to look at hundreds of articles published in the Guardian newspaper over the course of the year to see how trade-talk sentiment has changed week to week.

library(tidyverse)
library(rebus)
library(wesanderson)
library(kableExtra)
library(lubridate)
library(GuardianR)
library(quanteda)
library(scales)
library(tictoc)
library(patchwork)
library(text2vec)
library(topicmodels)

theme_set(theme_bw())

cols <- wes_palette(name = "Chevalier1")

The Withdrawal Agreement between the UK and the European Union was signed on the 24th of January 2020. I’ll import Brexit-related newspaper articles from that date. The Guardian newspaper asks for requests to span no more than 1 month at a time, so I’ll first create a set of monthly date ranges.

dates_df <- tibble(start_date = seq(ymd("2020-01-24"), today(), by = "1 month")) %>%
  mutate(end_date = start_date + months(1) - 1)

dates_df %>% kable()

start_date   end_date
2020-01-24   2020-02-23
2020-02-24   2020-03-23
2020-03-24   2020-04-23
2020-04-24   2020-05-23
2020-05-24   2020-06-23
2020-06-24   2020-07-23
2020-07-24   2020-08-23
2020-08-24   2020-09-23

I’ll import the newspaper articles in monthly chunks. Note, access to the Guardian’s API requires a key which may be requested here.
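For readers who prefer Python, the monthly chunking that dates_df performs can be sketched with the standard library alone (a hypothetical helper, not code from the post; day-of-month 24 never needs end-of-month clamping, so the month-adding helper stays simple):

```python
from datetime import date, timedelta

def add_month(d):
    """Return the same day-of-month one month later."""
    y, m = (d.year + 1, 1) if d.month == 12 else (d.year, d.month + 1)
    return date(y, m, d.day)

def month_ranges(start, end):
    """Split [start, end] into month-long (from, to) chunks, like dates_df."""
    ranges = []
    while start <= end:
        nxt = add_month(start)
        ranges.append((start, nxt - timedelta(days=1)))
        start = nxt
    return ranges

rs = month_ranges(date(2020, 1, 24), date(2020, 9, 21))
print(rs[0])  # first chunk: 2020-01-24 to 2020-02-23
```

Each chunk ends the day before the next begins, so the monthly requests cover the period without overlap, matching the table above.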
tic() article_df <- dates_df %>% pmap_dfr(., function(start_date, end_date) { Sys.sleep(1) get_guardian( "brexit", from.date = start_date, to.date = end_date, api.key = key ) }) toc() The data need a little cleaning, for example, to remove multi-topic articles, html tags and non-breaking spaces. trade_df <- article_df %>% filter(!str_detect(id, "/live/"), sectionId %in% c("world", "politics", "business")) %>% mutate( body = str_remove_all(body, "<.*?>") %>% str_to_lower(), body = str_remove_all(body, "[^a-z0-9 .-]"), body = str_remove_all(body, "nbsp") ) A corpus then gives me a collection of texts whereby each document is a newspaper article. trade_corp <- trade_df %>% corpus(docid_field = "shortUrl", text_field = "body") Although I’ve only imported articles mentioning Brexit since the Withdrawal Agreement was signed, some of these articles will not be related to trade negotiations with the EU. For example, there are on-going negotiations with many countries around the world. So, I’m going to use word embeddings to help narrow the focus to the specific context of the UK-EU trade deal. The chief negotiator for the EU is Michel Barnier, so I’ll quantitatively identify words in close proximity to “Barnier” in the context of these Brexit news articles. 
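The cleaning step above can be mirrored in Python with `re` (an approximation of the three `str_remove_all()` calls, not the post's code; note that, as in the R version, deleting `&` and `;` before removing "nbsp" can fuse adjacent words):

```python
import re

def clean_body(text):
    """Mimic the post's cleaning: drop HTML tags, lowercase,
    keep only [a-z0-9 .-], then drop stray 'nbsp' strings."""
    text = re.sub(r"<.*?>", "", text)          # strip HTML tags
    text = text.lower()
    text = re.sub(r"[^a-z0-9 .-]", "", text)   # keep a small character set
    return text.replace("nbsp", "")

cleaned = clean_body("<p>Trade&nbsp;talks resume!</p>")
print(cleaned)  # 'tradetalks resume' -- note the fused words
```

The word-fusing quirk is harmless for bag-of-words sentiment counting but would matter for phrase-level analysis.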
window <- 5

trade_fcm <- trade_corp %>%
  fcm(context = "window", window = window,
      count = "weighted", weights = window:1)

glove <- GlobalVectors$new(rank = 60, x_max = 10)

set.seed(42)
wv_main <- glove$fit_transform(trade_fcm, n_iter = 10)

## INFO [10:06:33.114] epoch 1, loss 0.3817
## INFO [10:06:34.959] epoch 2, loss 0.2510
## INFO [10:06:36.759] epoch 3, loss 0.2225
## INFO [10:06:38.577] epoch 4, loss 0.2021
## INFO [10:06:40.438] epoch 5, loss 0.1847
## INFO [10:06:42.303] epoch 6, loss 0.1710
## INFO [10:06:44.124] epoch 7, loss 0.1605
## INFO [10:06:45.936] epoch 8, loss 0.1524
## INFO [10:06:47.754] epoch 9, loss 0.1457
## INFO [10:06:49.594] epoch 10, loss 0.1403

wv_context <- glove$components
word_vectors <- wv_main + t(wv_context)

search_coord <- word_vectors["barnier", , drop = FALSE]

word_vectors %>%
  sim2(search_coord, method = "cosine") %>%
  as_tibble(rownames = NA) %>%
  rownames_to_column("term") %>%
  rename(similarity = 2) %>%
  arrange(desc(similarity)) %>%
  slice(1:10) %>%
  kable()

term          similarity
barnier        1.0000000
negotiator     0.7966461
michel         0.7587372
frost          0.7093119
eus            0.6728152
chief          0.6365480
brussels       0.5856139
negotiators    0.5598537
team           0.5488111
accused        0.5301669

Word embedding is a learned modelling technique placing words into a multi-dimensional vector space such that contextually-similar words may be found close by. Not surprisingly, the closest word contextually is “Michel”. And as he is the chief negotiator for the EU, we find “eu’s”, “chief”, and “negotiator” also in the top most contextually-similar words.
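The `sim2(..., method = "cosine")` ranking boils down to cosine similarity between word vectors. A tiny Python sketch with made-up 3-d vectors (the real ones here are 60-d GloVe vectors):

```python
import numpy as np

def cosine_sim(a, b):
    """Cosine similarity, as used to rank terms near 'barnier'."""
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))

# Toy 'embeddings'; purely illustrative values
barnier = np.array([0.9, 0.1, 0.2])
michel = np.array([0.8, 0.2, 0.1])

print(round(cosine_sim(barnier, barnier), 4))  # a vector matched against itself gives 1.0
```

This is why "barnier" tops its own similarity table at exactly 1.0000000, with contextually close terms trailing just below.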

The word embeddings algorithm, through word co-occurrence, has identified the name of Michel Barnier’s UK counterpart David Frost. So filtering articles for “Barnier”, “Frost” and “UK-EU” should help narrow the focus.

context_df <- trade_df %>% filter(str_detect(body, "barnier|frost|uk-eu")) context_corp <- context_df %>% corpus(docid_field = "shortUrl", text_field = "body")

I can then use quanteda’s kwic function to review the key phrases in context to ensure I’m homing in on the texts I want. Short URLs are included below so I can click on any to read the actual article as presented by The Guardian.

set.seed(123) context_corp %>% tokens( remove_punct = TRUE, remove_symbols = TRUE, remove_numbers = TRUE ) %>% kwic(pattern = phrase(c("trade negotiation", "trade deal", "trade talks")), valuetype = "regex", window = 7) %>% as_tibble() %>% left_join(article_df, by = c("docname" = "shortUrl")) %>% slice_sample(n = 10) %>% select(docname, pre, keyword, post, headline) %>% kable()
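For intuition, a keyword-in-context search like quanteda's `kwic()` can be approximated in a few lines of Python (a toy single-word version; the real call above matches multi-word regex phrases over tokenized documents):

```python
import re

def kwic(text, pattern, window=7):
    """Tiny keyword-in-context: return (pre, match, post) word windows."""
    words = text.split()
    hits = []
    for i, w in enumerate(words):
        if re.search(pattern, w):
            pre = " ".join(words[max(0, i - window):i])
            post = " ".join(words[i + 1:i + 1 + window])
            hits.append((pre, w, post))
    return hits

hits = kwic("the uk and eu resumed trade talks in brussels", r"talks")
print(hits)
```

Each hit keeps a window of words on either side of the match, which is what makes it quick to eyeball whether the matched articles really concern the UK-EU trade talks.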

Quanteda provides a sentiment dictionary which, in addition to identifying positive and negative words, also finds negative-negatives and negative-positives such as, for example, “not effective”. For each week’s worth of articles, I’ll calculate the proportion of positive sentiments.
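A toy Python version of that dictionary logic helps show where the `neg_positive` and `neg_negative` categories come from. The word lists are invented for illustration; the real LSD2015 dictionary has thousands of entries and more sophisticated negation patterns.

```python
NEGATORS = {"not", "no", "never"}
POSITIVE = {"effective", "progress", "agreement"}
NEGATIVE = {"failure", "deadlock", "collapse"}

def lsd_style_counts(tokens):
    """Count positive/negative hits, reclassifying a hit that
    directly follows a negator (e.g. 'not effective')."""
    counts = {"positive": 0, "negative": 0,
              "neg_positive": 0, "neg_negative": 0}
    for i, tok in enumerate(tokens):
        negated = i > 0 and tokens[i - 1] in NEGATORS
        if tok in POSITIVE:
            counts["neg_positive" if negated else "positive"] += 1
        elif tok in NEGATIVE:
            counts["neg_negative" if negated else "negative"] += 1
    return counts

counts_example = lsd_style_counts(
    "the deal is not effective but progress continues".split())
print(counts_example)
```

Here "not effective" lands in `neg_positive` rather than inflating the positive count, which is the whole point of the negation-aware dictionary.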

tic() sent_df <- context_corp %>% dfm(dictionary = data_dictionary_LSD2015) %>% as_tibble() %>% left_join(context_df, by = c("doc_id" = "shortUrl")) %>% mutate( date = ceiling_date(as_date(webPublicationDate), "week"), pct_pos = (positive + neg_negative) / (positive + neg_negative + negative + neg_positive) ) sent_df %>% select(doc_id, starts_with("pos"), starts_with("neg")) %>% slice(1:10) %>% kable()

doc_id                  positive  negative  neg_positive  neg_negative
https://gu.com/p/d6qhb        40        22             0             0
https://gu.com/p/d9e9j        27        15             0             0
https://gu.com/p/d6kzd        51        27             0             1
https://gu.com/p/d6bt2        37         7             0             0
https://gu.com/p/d9vjq        13        23             0             0
https://gu.com/p/d7n8b        57        34             1             0
https://gu.com/p/d79cn        56        48             3             1
https://gu.com/p/d6t3c        28        26             0             0
https://gu.com/p/d9xtf        33        13             1             0
https://gu.com/p/d696t        15        21             1             0
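As a sanity check on the pct_pos formula, here is the arithmetic worked through for the first document in the table above, using the counts from the post’s own output. Note that a negated negative (“not bad”) counts towards positive sentiment, while a negated positive (“not effective”) counts towards negative sentiment.

```r
# Counts for https://gu.com/p/d6qhb, taken from the first row above.
positive     <- 40
negative     <- 22
neg_positive <- 0
neg_negative <- 0

# Proportion of positive sentiment: positives (including negated
# negatives) over all sentiment-bearing matches.
pct_pos <- (positive + neg_negative) /
  (positive + neg_negative + negative + neg_positive)

round(pct_pos, 3)
#> 0.645
```

So roughly 65% of that article’s sentiment-bearing words lean positive.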

summary_df <- sent_df %>%
  group_by(date) %>%
  summarise(pct_pos = mean(pct_pos), n = n())

toc()
## 0.708 sec elapsed

Plotting the changing proportion of positive sentiment over time did surprise me a little. The outcome was more balanced than I expected, which perhaps says more about the deeper impression the negative articles left on me.

The upper violin plot shows the distribution of sentiment across the articles for each week, with the dashed line tracking the weekly average. Individually, the articles range from 20% to 80% positive, with discernible periods of relatively negative and relatively positive sentiment.

The lower plot shows the volume of articles. As we draw closer to the crunch point, the volume appears to be picking up.

p1 <- sent_df %>%
  ggplot(aes(date, pct_pos)) +
  geom_violin(aes(group = date), alpha = 0.5, fill = cols[1]) +
  geom_line(data = summary_df, aes(date, pct_pos),
            colour = cols[1], linetype = "dashed") +
  geom_hline(yintercept = 0.5, linetype = "dotted", colour = cols[4]) +
  scale_y_continuous(labels = percent_format(), limits = c(0.2, 0.8)) +
  labs(title = "Changing Sentiment Towards a UK-EU Trade Deal",
       subtitle = "Week-to-week Since the Withdrawal Agreement",
       x = NULL, y = "Positive Sentiment")

p2 <- summary_df %>%
  ggplot(aes(date, n)) +
  geom_line(colour = cols[1]) +
  labs(x = "Weeks", y = "Article Count",
       caption = "Source: Guardian Newspaper")

p1 / p2 + plot_layout(heights = c(2, 1))

Some writers exhibit more sentiment variation than others.

byline_df <- sent_df %>%
  mutate(byline = word(byline, 1, 2) %>% str_remove_all(PUNCT)) %>%
  group_by(byline, date) %>%
  summarise(pct_pos = mean(pct_pos), n = n())

top_3 <- byline_df %>%
  count(byline, sort = TRUE) %>%
  ungroup() %>%
  filter(!is.na(byline)) %>%
  slice(c(3, 2)) %>%
  pull(byline)

byline_df %>%
  filter(byline %in% top_3) %>%
  ggplot(aes(date, pct_pos, colour = byline)) +
  geom_line() +
  geom_hline(yintercept = 0.5, linetype = "dotted", colour = cols[2]) +
  scale_y_continuous(labels = percent_format(), limits = c(0.2, 0.8)) +
  scale_colour_manual(values = cols[c(1, 4)]) +
  labs(title = "Changing Sentiment Towards a UK-EU Trade Deal",
       subtitle = "Week-to-week Since the Withdrawal Agreement",
       x = "Weeks", y = "Positive Sentiment", colour = "Byline",
       caption = "Source: Guardian Newspaper")

R Toolbox

Summarising the packages and functions used in this post, as below, lets me separately create a toolbox visualisation of package and function usage across all my posts.

Package: Functions (with usage counts)

base: library[12]; c[8]; function[2]; mean[2]; set.seed[2]; conflicts[1]; cumsum[1]; is.na[1]; months[1]; search[1]; seq[1]; sum[1]; Sys.sleep[1]
dplyr: filter[8]; mutate[8]; as_tibble[4]; group_by[3]; if_else[3]; n[3]; select[3]; slice[3]; summarise[3]; tibble[3]; arrange[2]; desc[2]; left_join[2]; starts_with[2]; count[1]; pull[1]; rename[1]; slice_sample[1]; ungroup[1]
ggplot2: aes[5]; geom_line[3]; ggplot[3]; labs[3]; geom_hline[2]; scale_y_continuous[2]; geom_violin[1]; scale_colour_manual[1]; theme_bw[1]; theme_set[1]
GuardianR: get_guardian[1]
kableExtra: kable[5]
lubridate: date[3]; as_date[1]; ceiling_date[1]; today[1]; ymd[1]
patchwork: plot_layout[1]
purrr: map[1]; map2_dfr[1]; pmap_dfr[1]; possibly[1]; set_names[1]
quanteda: corpus[2]; data_dictionary_LSD2015[1]; dfm[1]; fcm[1]; kwic[1]; phrase[1]; t[1]; tokens[1]
readr: read_lines[1]
rebus: literal[4]; lookahead[3]; whole_word[2]; ALPHA[1]; lookbehind[1]; one_or_more[1]; or[1]; PUNCT[1]
scales: percent_format[2]
stringr: str_detect[5]; str_remove_all[5]; str_c[2]; str_remove[2]; str_count[1]; str_to_lower[1]; word[1]
text2vec: sim2[1]
tibble: enframe[1]; rownames_to_column[1]
tictoc: tic[2]; toc[2]
tidyr: unnest[1]
wesanderson: wes_palette[1]

The post A Frosty Deal? first appeared on R-bloggers.

### R – Risk and Compliance Survey: we need your help!

Thu, 09/17/2020 - 11:32

[This article was first published on RBlog – Mango Solutions, and kindly contributed to R-bloggers]. (You can report issues about the content on this page here.)

We would like to evaluate how business restrictions impact the commercial usage of R. To do this, we would love to hear your thoughts. We have created a short survey, which should take one minute of your time.

Complete the survey.

Thank you!

The post R – Risk and Compliance Survey: we need your help! appeared first on Mango Solutions.



### VirtuEARL: Speaker interview

Thu, 09/17/2020 - 11:07

[This article was first published on RBlog – Mango Solutions, and kindly contributed to R-bloggers]. (You can report issues about the content on this page here.)

We caught up with one of our VirtuEARL speakers and EARL regulars Jeremy Horne on what we can expect from his VirtuEARL talk and how he thinks the R community has adapted since lockdown.

You’ve presented a few times now at EARL, what makes you come back each time?

​I’ve said it for several years now: EARL has become the “must go” event of the R calendar. Not only have I presented a few times, but I’ve been fortunate enough to attend every EARL since it started in 2014, and on each occasion I’ve taken something new away to try with my clients. Indeed, the quality of the talks and use cases gets better each year, so I’m looking forward to some more excellent talks this year too.

How has the R community adapted since lockdown?

​Like every community, it was tough in the first few months. Events were cancelled (including my very own BrightonR user group) and interaction with other R users was limited to channels like Slack. But with the realisation that we’re a long way off face-to-face meetups again, virtual user groups have been set up and have done quite well: there were some fantastic presentations at LondonR last month, and we’ve finally got BrightonR off the ground again too!

Why are events like EARL and R meetups important, do you think?

​The networking: being able to talk to other like-minded R users and staying on the pulse of what they’re up to. This is the bit I miss terribly at the moment, as it always gets me thinking about new things I can do with R.

​I’m hoping you’ll take away a few tips on how you can improve your email marketing using R, and specifically text analytics. Equally, I’d love to hear how you get on afterwards if you implement any of the techniques.

If you’d like to hear Jeremy’s talk and some other inspirational R-based talks, then join us at VirtuEARL, the online Enterprise Applications of the R Language conference, taking place this October.

While you’re here, please complete this short poll to help us evaluate how business restrictions impact the commercial usage of R.

The post VirtuEARL: Speaker interview appeared first on Mango Solutions.



### D&D’s Data Science Platform (DSP) – making healthcare analytics easier

Thu, 09/17/2020 - 06:44

[This article was first published on R Blogs – Hutsons-hacks, and kindly contributed to R-bloggers]. (You can report issues about the content on this page here.)

My current company, Draper and Dash, has been working on the Healthcare DSP for a number of months, and we are now ready to launch. Find out what the HTN had to say: https://www.thehtn.co.uk/2020/09/15/new-health-data-science-platform-launches/

A word from D&D’s Chief Technology Officer

Data Science Resilience & Collaboration in times of unprecedented disruption https://t.co/pPYTw55w4g – all powered by R and Python under the hood. It has been great to work on this project.

— Gary Hutson (@StatsGary) September 17, 2020

What does it do?

The DSP uses web technologies alongside healthcare models built in R and Python, combining analytics with predictive machine learning, forecasting and natural language processing, among other techniques.

Find out more