Under the hood - Software configuration of datapleth.io

By Datapleth.io | October 8, 2015

This is a short article to briefly describe the set of tools, software which are used by dataapleth.io for data processing, statistical computing and publishing on this blog.

In a nutshell :

  • linux as an operating system
  • R for statistical computing, data processing and visualization
  • Rstudio as integrated development environment
  • Git as version control
  • Github for sharing the code
  • Tracis as CI/CD (test and deploy)
  • Rmarkdown for writing articles (weaving analysis text, code and output)
  • Hugo as static website enging
  • blogdown package to intrate with R

Operating System

dataleth.io use gnu/linux, for several reason., but the main one is the ease of use and integration of UTF-8 characters such as chinese characters. There are a lot of issues in Microsoft environment, thus even in Windows operating system, it is better to have a virtual machine running a gnu/linux machine.

Ubuntu is our choice, but there are a lot of other decent alternatives. Bellow are information about kernel version and distribution version.

system("uname -r", intern = TRUE)
## [1] "4.15.0-1028-gcp"
system("cat /etc/lsb-release", intern = TRUE)
## [1] "DISTRIB_ID=Ubuntu"                         
## [2] "DISTRIB_RELEASE=16.04"                     
## [3] "DISTRIB_CODENAME=xenial"                   
## [4] "DISTRIB_DESCRIPTION=\"Ubuntu 16.04.6 LTS\""

R - Cran

Chinapleth uses R for statistical computing and visualizations. The standard package provided with Ubuntu is fine as well as all r-cran* packages.

From https://cran.r-project.org/

R is ‘GNU S’, a freely available language and environment for statistical computing and graphics which provides a wide variety of statistical and graphical techniques: linear and nonlinear modelling, statistical tests, time series analysis, classification, clustering, etc.

version
##                _                           
## platform       x86_64-pc-linux-gnu         
## arch           x86_64                      
## os             linux-gnu                   
## system         x86_64, linux-gnu           
## status                                     
## major          3                           
## minor          6.1                         
## year           2017                        
## month          01                          
## day            27                          
## svn rev        76783                       
## language       R                           
## version.string R version 3.6.1 (2017-01-27)
## nickname       Action of the Toes

Rstudio

For edition, publishing and many other actions, Chinapleth is using Rstudio.

From https://www.rstudio.com/ :

RStudio is an integrated development environment (IDE) for R. It includes a console, syntax-highlighting editor that supports direct code execution, as well as tools for plotting, history, debugging and workspace management.

Version control : Git & Github

Git

For version control of scripts and files, Chinapleth uses Git. This amazingly powerful even as a single user or with a small team.

From https://git-scm.com/ :

Git is a free and open source distributed version control system designed to handle everything from small to very large projects with speed and efficiency.

system("apt list --installed | grep git/", intern = TRUE)
## [1] "git/now 1:2.21.0-0ppa1~ubuntu16.04.1 amd64 [installed,local]"

Github

From https://en.wikipedia.org/wiki/GitHub

GitHub is a Web-based Git repository hosting service. It offers all of the distributed revision control and source code management (SCM) functionality of Git as well as adding its own features.

All the script distributed by Chinapleth are open source (see Licence details) unless specified otherwise Feel free to clone and use chinaPleth code : https://github.com/longwei66/chinaPleth

There are some small issues to push smoothly code to github with latest versions of Rstudio, the best is to follow this guide to switch to ssh authentication. http://www.r-bloggers.com/rstudio-pushing-to-github-with-ssh-authentication/

Publishing : blogdown & hugo

This blog is powered by blogdown R package which is using hugo as a static site generator.

More information in blogdown documentation

Rmarkdown & reproducible research

This is where interesting things start. R, key packages like sweave or Knit are perfect for reproducible research process. This article, as all chinaPleth posts is in fact an Rmd document (R Markdown) processed by Knit package as an html document.

From https://cran.r-project.org/web/views/ReproducibleResearch.html

The goal of reproducible research is to tie specific instructions to data analysis and experimental data so that scholarship can be recreated, better understood and verified.

R largely facilitates reproducible research using literate programming; a document that is a combination of content and data analysis code. The Sweave function (in the base R utils package) and the knitr package can be used to blend the subject matter and R code so that a single document defines the content and the algorithms.

With such approach it is very easy to write reports, presentations which contains both text, code and result of this code once processed in one document. The advantages are :

  1. provide to the reader all the element to reproduce the output
  2. quickly update reports when data source are evolving, easy to build templates, etc…

Continuous intregration

We use travis for continuous intregration as described here

Code information

Source code

The source code of this post is available on github

Session information

sessionInfo()
## R version 3.6.1 (2017-01-27)
## Platform: x86_64-pc-linux-gnu (64-bit)
## Running under: Ubuntu 16.04.6 LTS
## 
## Matrix products: default
## BLAS:   /home/travis/R-bin/lib/R/lib/libRblas.so
## LAPACK: /home/travis/R-bin/lib/R/lib/libRlapack.so
## 
## locale:
##  [1] LC_CTYPE=en_US.UTF-8       LC_NUMERIC=C              
##  [3] LC_TIME=en_US.UTF-8        LC_COLLATE=en_US.UTF-8    
##  [5] LC_MONETARY=en_US.UTF-8    LC_MESSAGES=en_US.UTF-8   
##  [7] LC_PAPER=en_US.UTF-8       LC_NAME=C                 
##  [9] LC_ADDRESS=C               LC_TELEPHONE=C            
## [11] LC_MEASUREMENT=en_US.UTF-8 LC_IDENTIFICATION=C       
## 
## attached base packages:
## [1] stats     graphics  grDevices utils     datasets  methods   base     
## 
## loaded via a namespace (and not attached):
##  [1] compiler_3.6.1  magrittr_1.5    bookdown_0.16   tools_3.6.1    
##  [5] htmltools_0.4.0 yaml_2.2.0      Rcpp_1.0.3      stringi_1.4.3  
##  [9] rmarkdown_1.18  blogdown_0.17.1 knitr_1.26      stringr_1.4.0  
## [13] digest_0.6.23   xfun_0.11       rlang_0.4.2     evaluate_0.14