By Datapleth.io | October 8, 2015
This short article describes the tools and software used by datapleth.io for data processing, statistical computing and publishing on this blog.
In a nutshell :
- Linux as the operating system
- R for statistical computing, data processing and visualization
- RStudio as the integrated development environment
- Git for version control
- GitHub for sharing the code
- Travis for CI/CD (test and deploy)
- R Markdown for writing articles (weaving analysis text, code and output)
- Hugo as the static website engine
- the blogdown package to integrate Hugo with R
Operating System
datapleth.io uses GNU/Linux for several reasons, the main one being how easily it handles UTF-8 text such as Chinese characters. Such text causes many issues in a Microsoft environment, so even on a Windows machine it is better to work in a virtual machine running GNU/Linux.
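As a small illustration of what handling UTF-8 text involves in R, the sketch below inspects a two-character Chinese string. The string and the variable names are chosen for illustration only and do not come from the blog's own code.

```r
# "China" written with two Chinese characters, using \u escapes so that
# this snippet itself stays pure ASCII.
china <- "\u4e2d\u56fd"

n_chars <- nchar(china, type = "chars")  # number of characters
n_bytes <- nchar(china, type = "bytes")  # number of bytes used to store them

Encoding(china)   # encoding R has recorded for the string
utf8ToInt(china)  # Unicode code points of the two characters
```

Each of these characters takes three bytes in UTF-8, which is why the character count and the byte count differ.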
Ubuntu is our choice, but there are many other decent alternatives. Below is information about the kernel version and the distribution version.
system("uname -r", intern = TRUE)
## [1] "4.15.0-1028-gcp"
system("cat /etc/lsb-release", intern = TRUE)
## [1] "DISTRIB_ID=Ubuntu"
## [2] "DISTRIB_RELEASE=16.04"
## [3] "DISTRIB_CODENAME=xenial"
## [4] "DISTRIB_DESCRIPTION=\"Ubuntu 16.04.6 LTS\""
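The output above is a set of KEY=value lines, so it can be turned into a named vector if the values are needed later in a script. This is only a sketch: `parse_key_value` is a hypothetical helper, and the input lines are copied from the output shown above rather than read from the system.

```r
# Lines as returned by system("cat /etc/lsb-release", intern = TRUE) above.
release_lines <- c(
  "DISTRIB_ID=Ubuntu",
  "DISTRIB_RELEASE=16.04",
  "DISTRIB_CODENAME=xenial",
  "DISTRIB_DESCRIPTION=\"Ubuntu 16.04.6 LTS\""
)

# Hypothetical helper: split each line at the first "=" and strip the
# optional surrounding double quotes from the value.
parse_key_value <- function(lines) {
  keys <- sub("=.*$", "", lines)
  values <- sub("^[^=]*=", "", lines)
  values <- gsub("^\"|\"$", "", values)
  stats::setNames(values, keys)
}

release <- parse_key_value(release_lines)
release[["DISTRIB_RELEASE"]]
```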
R - CRAN
Chinapleth uses R for statistical computing and visualizations. The standard R package provided with Ubuntu works fine, as do all the r-cran* packages.
From https://cran.r-project.org/
R is ‘GNU S’, a freely available language and environment for statistical computing and graphics which provides a wide variety of statistical and graphical techniques: linear and nonlinear modelling, statistical tests, time series analysis, classification, clustering, etc.
version
## _
## platform x86_64-pc-linux-gnu
## arch x86_64
## os linux-gnu
## system x86_64, linux-gnu
## status
## major 3
## minor 6.1
## year 2017
## month 01
## day 27
## svn rev 76783
## language R
## version.string R version 3.6.1 (2017-01-27)
## nickname Action of the Toes
RStudio
For editing, publishing and many other tasks, Chinapleth uses RStudio.
From https://www.rstudio.com/ :
RStudio is an integrated development environment (IDE) for R. It includes a console, syntax-highlighting editor that supports direct code execution, as well as tools for plotting, history, debugging and workspace management.
Version control : Git & GitHub
Git
For version control of scripts and files, Chinapleth uses Git. It is amazingly powerful, even for a single user or a small team.
From https://git-scm.com/ :
Git is a free and open source distributed version control system designed to handle everything from small to very large projects with speed and efficiency.
system("apt list --installed | grep git/", intern = TRUE)
## [1] "git/now 1:2.21.0-0ppa1~ubuntu16.04.1 amd64 [installed,local]"
GitHub
From https://en.wikipedia.org/wiki/GitHub
GitHub is a Web-based Git repository hosting service. It offers all of the distributed revision control and source code management (SCM) functionality of Git as well as adding its own features.
All the scripts distributed by Chinapleth are open source (see Licence details) unless specified otherwise. Feel free to clone and use the chinaPleth code : https://github.com/longwei66/chinaPleth
Pushing code to GitHub from recent versions of RStudio can run into small issues. The best fix is to follow this guide and switch to SSH authentication : http://www.r-bloggers.com/rstudio-pushing-to-github-with-ssh-authentication/
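Switching to SSH mostly comes down to pointing the repository's remote at the SSH form of its address. The sketch below only builds that address and the matching git command as strings, without running anything. `https_to_ssh` is a hypothetical helper, and the remote name `origin` is an assumption (it is Git's default name for a cloned remote).

```r
# Hypothetical helper: convert an HTTPS GitHub URL to its SSH form.
https_to_ssh <- function(url) {
  path <- sub("^https://github\\.com/", "", url)
  path <- sub("/$", "", path)       # drop a trailing slash, if any
  path <- sub("\\.git$", "", path)  # drop an existing ".git" suffix
  paste0("git@github.com:", path, ".git")
}

ssh_url <- https_to_ssh("https://github.com/longwei66/chinaPleth")

# Command that would update the remote; it is only built here, not executed.
# The remote name "origin" is an assumption.
set_url_cmd <- paste("git remote set-url origin", shQuote(ssh_url))
set_url_cmd
```

Run from inside the repository, the resulting command could be passed to `system()`, in the same way as the other shell calls in this article.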
Publishing : blogdown & hugo
This blog is powered by the blogdown R package, which uses Hugo as a static site generator.
More information is available in the blogdown documentation.
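As a rough sketch of how such a site can be generated from R, the function below wraps `blogdown::build_site()`. It assumes that the blogdown package and Hugo are both installed and that `site_dir` points at the root of a blogdown site. `build_blog` and `site_dir` are hypothetical names, not part of this blog's actual setup.

```r
# Hypothetical wrapper: build a blogdown site located in `site_dir`.
# Nothing is built until the function is called.
build_blog <- function(site_dir = ".") {
  if (!requireNamespace("blogdown", quietly = TRUE)) {
    stop("The blogdown package is required: install.packages('blogdown')")
  }
  # blogdown works on the site in the current working directory, so switch
  # to it temporarily and restore the previous directory on exit.
  old_wd <- setwd(site_dir)
  on.exit(setwd(old_wd), add = TRUE)
  # Ask blogdown to generate the static site with Hugo.
  blogdown::build_site()
}
```

While writing a post, `blogdown::serve_site()` is the usual alternative: it starts a local preview of the site instead of producing a one-off build.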
Rmarkdown & reproducible research
This is where things get interesting. R, together with tools such as Sweave and the knitr package, is well suited to a reproducible research process. This article, like all chinaPleth posts, is in fact an Rmd (R Markdown) document processed by knitr into an HTML document.
From https://cran.r-project.org/web/views/ReproducibleResearch.html
The goal of reproducible research is to tie specific instructions to data analysis and experimental data so that scholarship can be recreated, better understood and verified.
R largely facilitates reproducible research using literate programming; a document that is a combination of content and data analysis code. The Sweave function (in the base R utils package) and the knitr package can be used to blend the subject matter and R code so that a single document defines the content and the algorithms.
With such an approach it is very easy to write reports and presentations that combine text, code and the results of that code in a single document. The advantages are :
- the reader gets all the elements needed to reproduce the output
- reports can be updated quickly when data sources evolve, templates are easy to build, etc.
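The idea can be tried with a minimal document. The sketch below writes a tiny R Markdown file to a temporary location and, if the knitr package is installed, knits it to Markdown so that the text, the code and its result end up in one file. The document content and all variable names are illustrative assumptions, not this blog's actual sources.

```r
# Code fence for the Rmd chunk, built programmatically for readability.
fence <- strrep("`", 3)

# A minimal R Markdown document: YAML header, one sentence, one code chunk.
rmd_lines <- c(
  "---",
  "title: \"Minimal example\"",
  "---",
  "",
  "The mean of the first ten integers is computed below.",
  "",
  paste0(fence, "{r}"),
  "mean(1:10)",
  fence
)

rmd_path <- tempfile(fileext = ".Rmd")
writeLines(rmd_lines, rmd_path)

# Knit only if knitr is available; the result is a Markdown file in which
# the chunk is followed by its printed output.
if (requireNamespace("knitr", quietly = TRUE)) {
  md_path <- knitr::knit(rmd_path, output = tempfile(fileext = ".md"), quiet = TRUE)
  md_text <- readLines(md_path)
}
```

Turning the knitted Markdown into HTML is a separate step, which tools such as rmarkdown or blogdown take care of.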
Code information
Source code
The source code of this post is available on GitHub.
Session information
sessionInfo()
## R version 3.6.1 (2017-01-27)
## Platform: x86_64-pc-linux-gnu (64-bit)
## Running under: Ubuntu 16.04.6 LTS
##
## Matrix products: default
## BLAS: /home/travis/R-bin/lib/R/lib/libRblas.so
## LAPACK: /home/travis/R-bin/lib/R/lib/libRlapack.so
##
## locale:
## [1] LC_CTYPE=en_US.UTF-8 LC_NUMERIC=C
## [3] LC_TIME=en_US.UTF-8 LC_COLLATE=en_US.UTF-8
## [5] LC_MONETARY=en_US.UTF-8 LC_MESSAGES=en_US.UTF-8
## [7] LC_PAPER=en_US.UTF-8 LC_NAME=C
## [9] LC_ADDRESS=C LC_TELEPHONE=C
## [11] LC_MEASUREMENT=en_US.UTF-8 LC_IDENTIFICATION=C
##
## attached base packages:
## [1] stats graphics grDevices utils datasets methods base
##
## loaded via a namespace (and not attached):
## [1] compiler_3.6.1 magrittr_1.5 bookdown_0.16 tools_3.6.1
## [5] htmltools_0.4.0 yaml_2.2.0 Rcpp_1.0.3 stringi_1.4.3
## [9] rmarkdown_1.18 blogdown_0.17.1 knitr_1.26 stringr_1.4.0
## [13] digest_0.6.23 xfun_0.11 rlang_0.4.2 evaluate_0.14