GDP Growth and Chinese new year fireworks

By Datapleth.io | March 9, 2016

Winter 2016 - For the first time ever, the city of Shanghai (and probably of Nanjing) entered the year of the monkey with a very quiet night. For safety and environmental reasons, the usual fireworks and firecrackers hysteria was forbidden downtown by the local government. (read the full story

This is a very big cultural change which was implemented very strictly. Fireworks and other noisy traditional firecrackers are used for important events in China, such as weddings, opening of shops or companies and lunar new year celebrations. They are used to chase away evils and bring good luck and prosperity for the future.

Our objective is to find out if GPD growth is improved by the amount of fireworks used during Chinese new year or if at the contrary, less improvement in economy lead to less frantic nights and less fireworks.

Introduction

In this post we will try with a bit of data hack and visualization :

  • to confirm and illustrate the impact of fireworks on air quality during the new year’s eve
  • to quantify the amount of fireworks based on air quality change during that night
  • to confirm links between GDP growth (the main economical indicator of China economy) of the country and the different cities with amount of fireworks used during new year’s eve

Overall Process:

  1. Get and clean PM2.5 from main cities (see previous post here)
  2. Get and clean dates of past Chinese new year eve
  3. Get and clean GDP growth data for China and its main cities
  4. Prepare a clean data set with days before and after new year
  5. Exploratory analysis
  6. Evaluate impact of fireworks on PM2.5
  7. Estimate the amount of fireworks used in each cities
  8. build a model PM2.5 > amount of fireworks > GDP growth
  9. analysis and conclusions

In this first part we will cover steps 1 to 4.

Required libraries

We need several R packages for this study.

library(lubridate)
library(dplyr)
library(ggplot2)
library(ggthemes)
library(reshape2)
library(data.table)
library(rvest) # for webscrapping
library(plotly)
library(RColorBrewer)
# Set-up locale for date format processing
Sys.setlocale("LC_ALL","C")
## [1] "LC_CTYPE=C;LC_NUMERIC=C;LC_TIME=C;LC_COLLATE=C;LC_MONETARY=C;LC_MESSAGES=en_US.UTF-8;LC_PAPER=en_US.UTF-8;LC_NAME=C;LC_ADDRESS=C;LC_TELEPHONE=C;LC_MEASUREMENT=en_US.UTF-8;LC_IDENTIFICATION=C"

Getting and cleaning data :

We need PM2.5 readings, Chinese new year dates & GPD growth of main cities

Chinese new year dates

We will scrape wikipedia to extract and clean dates of Chinese New Year. The great rvest package will be our friends for that see example on r-blogger.

url <- "https://en.wikipedia.org/wiki/Chinese_New_Year"
table_xpath <- '//*[@id="mw-content-text"]/div/table[4]'
cny <- url %>%
  read_html() %>%
  html_nodes(xpath= table_xpath) %>%
  html_table(fill = TRUE)
## keep only what we want
cny <- cny[[1]]
cny <- cny[-1,1:4]
names(cny) <- c("year", "date","animal","weekday" )
cny$date <- paste(cny$date, cny$year)
cny$date <- as.character(strptime(cny$date, "%d %b %Y"))
head(cny)
##   year       date  animal   weekday
## 2 2001 2001-01-24   Snake Wednesday
## 3 2002 2002-02-12   Horse   Tuesday
## 4 2003 2003-02-01    Goat  Saturday
## 5 2004 2004-01-22  Monkey  Thursday
## 6 2005 2005-02-09 Rooster Wednesday
## 7 2006 2006-01-29     Dog    Sunday
## extract the New Year eve and night
cny$eve <- as.character(as.Date(cny$date)-1)

PM2.5

We will simply reuse directly the dataset generated in(our previous post)

load(file = "./data/aqi-1.Rda")
names(aqi)
##  [1] "country"               "city"                 
##  [3] "date.time"             "date"                 
##  [5] "year"                  "month"                
##  [7] "day"                   "hour"                 
##  [9] "measurement.available" "pm2.5"                
## [11] "aqi"
# Remove Paris
aqi <- aqi[city != "Paris"]

The codebook for this dataset will be soon available. We only keep here data for chinese new year date and eve.

aqicny <- aqi[date %in% cny$date | date %in% cny$eve]
aqicny[ , time := hour ] # keep only hours
aqicny[ date %in% cny$date, time := time + 24 ] # add 24 for NY day
aqicny[ , time := time - 24] # re-center on mid-night
aqicny[ , year := as.factor(year) ]

Let’s visualise the evolution of particles during this period. We clearly notice a recurring pattern with peak of pollution around midnight !

g <- ggplot(data = aqicny) +
  theme_tufte() + theme(legend.position = "none") +
  geom_point(aes( x = time,y = pm2.5, col = city), size = 0.5) +
  geom_vline(xintercept = 0, color = "grey") +
  facet_grid(facets = city ~ year) +
  ggtitle("Peak of pm2.5 during chinese new year eve (fireworks)") +
  xlab("Hours from new year's midnight") + 
  theme(axis.text.x = element_text(angle = 90)) +
  theme(axis.title.x = element_text(margin = margin(t = 20, r = 0, b = 0, l = 0))) +
  ylab("pm2.5 in micro g/m3") +
  scale_color_brewer(palette="Dark2") #Dark2 as alternative
p <- plotly::ggplotly(g)

## Fix plotly label spacing
## thanks : 
## https://stackoverflow.com/questions/42763280/r-ggplot-and-plotly-axis-margin-wont-change
p[['x']][['layout']][['annotations']][[1]][['y']] <- -0.06
p[['x']][['layout']][['annotations']][[2]][['x']] <- -0.06
p %>% config(displayModeBar = F) 
#%>% layout(margin = list(l = 75))

GDP Growth

The best source we found is the official China statistic bureau. You can follow the following this link to download the data (Choose National Account / Gross Regional Product Index) - you will need to register to download the file.

## data.stats.gov.cn files tend to be formatted with similar patterns
dataFile <- "https://data.datapleth.io/ext/GDP/GDP_Growth_AnnualbyProvince.csv"
dataFile <- "https://datapleth.sfo2.digitaloceanspaces.com/ext/GDP/GDP_Growth_AnnualbyProvince.csv"

## Not very elegant here but fread returns strange characters which breaks Knitr
gdpGrowth <- as.data.table(
  read.csv(
    dataFile, header = TRUE, stringsAsFactors = FALSE,skip = 3
  )
)
## Remove NA values
gdpGrowth <- gdpGrowth[complete.cases(gdpGrowth)]
## Fix dF names
setnames(gdpGrowth, c("Region", gsub("^X", "", names(gdpGrowth)[-1])))
## get the proper province names
gdpGrowth[ , Region := gsub("^Inner Mongolia$", "Nei Menggu", Region) ]
gdpGrowth[ , Region := gsub("^Tibet$", "Xizang", Region) ]

## Melt as long format
gdpGrowth <- melt(gdpGrowth, id.vars = "Region")
## Get the GDP Growth in %
gdpGrowth[ , value := value - 100 ]
setnames(gdpGrowth, c("Region", "Year", "provincial.gdp.Growth"))
## Get the year in numeric instead of factor
gdpGrowth[ , Year := as.numeric(as.character(Year)) ]


## Get the GDP in percentage of growth
#gdpGrowth[ , provincial.gdp.Growth := (provincial.gdp.Growth - 100)]

## Let's keep only the province for which we have PM2.5 data
listOfProvince <- c("Shanghai", "Beijing", "Sichuan", "Guangdong", "Liaoning")
gdpGrowth <- gdpGrowth[ Region %in% listOfProvince ]
## add the province capital
gdpGrowth[ , city := Region ]
test <- gdpGrowth$Region == "Liaoning"
gdpGrowth[test, city := "Shenyang"]
test <- gdpGrowth$Region == "Guangdong"
gdpGrowth[test, city := "Guangzhou" ]
test <- gdpGrowth$Region == "Sichuan"
gdpGrowth[test, city := "Chengdu"]
g <- ggplot(data = gdpGrowth, aes( x = Year,y = provincial.gdp.Growth), group = city) +
  theme_tufte() + theme(legend.position = "none") +
  geom_segment( aes(x=Year, xend=Year, y=0, yend=provincial.gdp.Growth, group = city), color = "grey") +
  geom_point(aes(col = city)) +
  scale_color_brewer(palette="Dark2") +
  geom_hline(yintercept = 0, color = "grey") +
  facet_wrap(facets = city ~ ., nrow = 1) +
  ggtitle("GDP Growth of associated province") +
  xlab("Year") + 
  theme(axis.text.x = element_text(angle = 90)) +
  ylab("Provincial GDP growth in %")
p <- plotly::ggplotly(g)
## plotly label spacing
p[['x']][['layout']][['annotations']][[1]][['y']] <- -0.15
p[['x']][['layout']][['annotations']][[2]][['x']] <- -0.04
p %>% config(displayModeBar = F)

Conclusions

We have now two clean datasets with PM2.5 readings for new year’s eve and new year’s day as well as GDP growth for the provinces concerned.

Let’s save the data for next post

save(gdpGrowth, aqicny, file = "./data/gqp-aqi-part1.Rda")