Exploration Data Analysis of factors affecting Greenhouse Gas Emission levels

Individual assignment for the visual analytics project.

Author

Affiliation

Wong Wei Sheng Dylan

 

Published

March 16, 2021

DOI

1.0 Purpose of Exploratory Data Analysis

Exploratory Data Analysis (EDA) is an approach that uses data visualisation to analyse data sets and to summarise their characteristics. It enables users to draw useful information and insights of the data sets with the use of graphical and non-graphical methods to reveal the distribution and statistics of selected variables. Many Data Science experts and literature state that EDA is a critical first step in data analysis as it helps to detect mistakes, check assumptions, determine relationships among the independent variables, and assess the direction and magnitude of relationships between the dependent and independent variablea (Seltman, 2018).

This assignment explores the appropriate tidyverse methods, as well as the ExPanDaR package for EDA.

2.0 Preparation of Data Visualisation

2.1 Installing and launching R Packages

Instead of the conventional way of installing the required packages right away, the code chunk below checks whether the required packages have already been installed and subsequently install those that are not before loading the packages.

packages = c('tidyverse', 'ExPanDaR', 'kableExtra', 'ggstatsplot', 'corrplot', 'DT')

for(p in packages){ 'exclude FOR loop for Shiny>' 
  if(!require(p, character.only = T)){
    install.packages(p)
  }
  library(p, character.only = T)
}

2.2 Data Source

The data sets for this Visual Analytics Project were sourced from Eurostat Climate Change Database. Through the preliminary literature review conducted, the data sets used were narrowed down, wrangled, and subsequently merged into a single panel or cross-sectional time-series data where behavior of entities, in this case countries, are observed and recorded across time.

The list of literature reviewed and details of the data preparation could be found here.

2.3 Loading the Dataset into R

Reading the csv file as GHG

GHG <- read_csv("data/GHGproj_data.csv")

-- Column specification ----------------------------------------------
cols(
  Country = col_character(),
  Year = col_double(),
  GHG_emissions = col_double(),
  CO2_emissions = col_double(),
  CH4_emissions = col_double(),
  NO2_emissions = col_double(),
  GDP = col_double(),
  Final_EI = col_double(),
  Fuel_mix = col_double(),
  Carbon_intensity = col_double(),
  Renewables_share = col_double(),
  Envt_taxes = col_double(),
  Liquid_biofuels = col_double(),
  Solar_thermal = col_double(),
  Heat_pumps = col_double()
)

Inspecting the structure of the dataset

str(GHG)
spec_tbl_df[,15] [324 x 15] (S3: spec_tbl_df/tbl_df/tbl/data.frame)
 $ Country         : chr [1:324] "European Union - 27 countries (from 2020)" "European Union - 27 countries (from 2020)" "European Union - 27 countries (from 2020)" "European Union - 27 countries (from 2020)" ...
 $ Year            : num [1:324] 2010 2011 2012 2013 2014 ...
 $ GHG_emissions   : num [1:324] 4188244 4075283 3996645 3912623 3776464 ...
 $ CO2_emissions   : num [1:324] 3443914 3338307 3262594 3181485 3046141 ...
 $ CH4_emissions   : num [1:324] 427118 419634 416827 411161 406015 ...
 $ NO2_emissions   : num [1:324] 219323 216187 213789 214272 216858 ...
 $ GDP             : num [1:324] 10977298 11321811 11388518 11517119 11781640 ...
 $ Final_EI        : num [1:324] 92.2 85.9 84.3 83.6 77.4 ...
 $ Fuel_mix        : num [1:324] 2.8 2.88 2.78 2.67 2.62 2.57 2.54 2.53 2.4 NA ...
 $ Carbon_intensity: num [1:324] 91.9 91.5 90.9 89.5 88.3 88.7 88 86.7 85.2 NA ...
 $ Renewables_share: num [1:324] 14.4 14.6 16 16.7 17.5 ...
 $ Envt_taxes      : num [1:324] 259023 271901 278204 284121 291030 ...
 $ Liquid_biofuels : num [1:324] 29689 28711 28422 29751 29152 ...
 $ Solar_thermal   : num [1:324] 34488 37949 40587 43246 45425 ...
 $ Heat_pumps      : num [1:324] 31200 43941 48603 53298 79990 ...
 - attr(*, "spec")=
  .. cols(
  ..   Country = col_character(),
  ..   Year = col_double(),
  ..   GHG_emissions = col_double(),
  ..   CO2_emissions = col_double(),
  ..   CH4_emissions = col_double(),
  ..   NO2_emissions = col_double(),
  ..   GDP = col_double(),
  ..   Final_EI = col_double(),
  ..   Fuel_mix = col_double(),
  ..   Carbon_intensity = col_double(),
  ..   Renewables_share = col_double(),
  ..   Envt_taxes = col_double(),
  ..   Liquid_biofuels = col_double(),
  ..   Solar_thermal = col_double(),
  ..   Heat_pumps = col_double()
  .. )

Removing and Recoding variable

From the inspection, it could be observed that there are several countries that have very long name. To find out the unique values in Country:

unique(GHG$Country)
 [1] "European Union - 27 countries (from 2020)"                                     
 [2] "European Union - 28 countries (2013-2020) and Iceland under the Kyoto Protocol"
 [3] "European Union - 28 countries (2013-2020)"                                     
 [4] "Belgium"                                                                       
 [5] "Bulgaria"                                                                      
 [6] "Czechia"                                                                       
 [7] "Denmark"                                                                       
 [8] "Germany (until 1990 former territory of the FRG)"                              
 [9] "Estonia"                                                                       
[10] "Ireland"                                                                       
[11] "Greece"                                                                        
[12] "Spain"                                                                         
[13] "France"                                                                        
[14] "Croatia"                                                                       
[15] "Italy"                                                                         
[16] "Cyprus"                                                                        
[17] "Latvia"                                                                        
[18] "Lithuania"                                                                     
[19] "Luxembourg"                                                                    
[20] "Hungary"                                                                       
[21] "Malta"                                                                         
[22] "Netherlands"                                                                   
[23] "Austria"                                                                       
[24] "Poland"                                                                        
[25] "Portugal"                                                                      
[26] "Romania"                                                                       
[27] "Slovenia"                                                                      
[28] "Slovakia"                                                                      
[29] "Finland"                                                                       
[30] "Sweden"                                                                        
[31] "Iceland"                                                                       
[32] "Liechtenstein"                                                                 
[33] "Norway"                                                                        
[34] "Switzerland"                                                                   
[35] "United Kingdom"                                                                
[36] "Turkey"                                                                        

From the above, it is noticed that the variable Country consist of groups of countries that are not needed in the analysis. Also, Germany has a rather long name.

G <- GHG %>%
  filter(!Country %in% c("European Union - 27 countries (from 2020)",
                         "European Union - 28 countries (2013-2020)", 
                         "European Union - 28 countries (2013-2020) and Iceland under the Kyoto Protocol")) %>%
  mutate(Country=recode(Country, "Germany (until 1990 former territory of the FRG)" = "Germany"))

2.4 Overview of Dataset

Datatable

In the propose Shiny application, the data table will be made available to users who not only want to have any overview of the data structure, but also the actual data set.

datatable(
  data = G, 
  extensions = 'FixedHeader',
  options =
    list(
      scrollX = TRUE,
      scrollY = 500,
      scrollCollapse = TRUE,
      paging = FALSE
    )
)

Descriptive Statistics

t <- prepare_descriptive_table(G)
t$kable_ret  %>%
  kable_styling("condensed", full_width = F, position = "center")
Table 1: Descriptive Statistics
N Mean Std. dev. Min. 25 % Median 75 % Max.
Year 297 2,014.000 2.586 2,010.000 2,012.000 2,014.000 2,016.000 2,018.00
GHG_emissions 297 152,056.937 207,515.414 181.080 20,923.490 59,569.730 135,117.650 942,338.00
CO2_emissions 297 123,775.451 175,144.754 143.750 17,684.980 46,231.650 114,561.030 832,669.74
CH4_emissions 297 16,075.657 19,472.298 17.990 3,332.560 7,238.570 18,438.370 63,767.39
NO2_emissions 297 8,369.881 10,847.787 8.970 1,877.410 4,463.950 7,852.580 41,871.46
GDP 294 487,470.776 725,371.404 4,812.400 43,223.100 188,265.850 469,861.475 3,356,410.00
Final_EI 279 91.874 36.440 42.970 73.575 85.580 96.825 274.92
Fuel_mix 279 2.757 3.966 0.000 0.825 1.400 3.095 20.17
Carbon_intensity 279 88.397 11.960 44.400 83.600 88.500 94.850 124.90
Renewables_share 270 22.261 17.012 0.979 10.600 16.558 28.668 76.69
Envt_taxes 297 11,718.864 16,783.704 33.860 1,125.280 4,977.240 10,712.560 63,868.08
Liquid_biofuels 278 996.403 1,618.882 0.000 25.000 295.500 1,144.306 8,907.00
Solar_thermal 279 2,076.456 4,416.113 0.000 71.924 401.498 1,458.500 20,200.00
Heat_pumps 279 4,273.957 13,332.143 0.000 0.000 0.000 1,371.537 126,842.03

Missing Values

To find out the missing values in the data:

missing.values <- G %>%
    gather(key = "key", value = "val") %>%
    mutate(is.missing = is.na(val)) %>%
    group_by(key, is.missing) %>%
    summarise(num.missing = n()) %>%
    filter(is.missing==T) %>%
    select(-is.missing) %>%
    arrange(desc(num.missing)) 
missing.values %>% kable()
key num.missing
Renewables_share 27
Liquid_biofuels 19
Carbon_intensity 18
Final_EI 18
Fuel_mix 18
Heat_pumps 18
Solar_thermal 18
GDP 3

Visualising the Missing Values:

prepare_missing_values_graph(G, ts_id = "Country")

3.0 Exploratory Data Analysis

This section explore different visualisation of EDA and application of interactivity.

3.1 Distribution of Variables

Displaying histogram of multiple variables

E <- select(G, -c(Country,Year))
F <- E %>%
  gather() %>%
  ggplot(aes(value)) +
  facet_wrap( ~key, ncol=4, scales="free") +
  geom_histogram()
F

Displaying histogram of individual variables

H <- select(G, GHG_emissions)
ggplot(H, aes(GHG_emissions))+
  geom_histogram()

3.2 Visualising Time Trend

Since the independent variables are of continuous data type, to review the trend overtime, the ggplot function of ggplot2 package is used along with geom_line to plot the line graph.

t <- ggplot(G, aes(x=Year, y=GHG_emissions)) +
  geom_line()+
  facet_wrap(~ Country)
t

Having all countries included in a single plot.

t <- ggplot(G, aes(x=Year, y=GHG_emissions, color=Country)) +
  geom_line()
t

From the graph above, even with colours assigned to each country, it is hard for users to differentiate the countries. Also, at value below 250000, there are many countries overlapping. The way to improve this visualisation is to transform the y-axis to log10 scale.

u <- t +
  scale_y_log10()
u

While the overlapping of lines has slightly improved with the log10 scale, users will still have difficulty matching the color to legend. Since the legend is not useful, it could be removed using the theme() function of ggplot2, which enables the customisation of non-data components of plots.

v <- u +
  geom_text(data=subset(G, Year==2018), aes(label=Country, color=Country)) +
  theme(legend.position = "none")
v

In the plot above, there are overlapping between the countries texts, and also the line. To fix this, the hjust parameter of the aesthetics setting of geom_text() could be adjusted. check_overlap could also be turned on. The coord_cartesian() allows clipping to be turned off so that text could extend beyond the plot and not be truncated.

v <- u +
  geom_text(data=subset(G, Year==2018), aes(label=Country, color=Country, hjust = -0.1), check_overlap=TRUE) +
  coord_cartesian(clip = 'off') +
  theme(legend.position = "none")
v

The plot above though better, if far from perfect. Due to the turning off of text over-lapping, many of the labels have been removed. To improve on the aesthetics and users’ experience:

Finally, a grouped time trend graph will be shown to the users.

3.3 Visualising Correlation

In this session, the ggcorrmat() function of the ggstatsplot package, multiple functions of corrplot package, and the prepare_correlation_graph() function of ExPanDaR package are explored to find the most appropriate visualisation.

From the visualisation of the missing values, the situation of the missing values in the dataset is not too bad, and it is very much limited to only Switzerland and Liechtenstein. However, coorplote supports only data without missing values. Hence, to create a data set to be used for the corrplot correlation plots, the missing values are removed.

M <- select(G, -c(Country, Year))
N <- M[complete.cases(M), ]

Ellipse corrplot() of corrplot package

O <- cor(N)
corrplot(O, method = "ellipse")

Number corrplot() of corrplot package

O <- cor(N)
corrplot(O, method = "number")

corrplot.mixed() of corrplot package

O <- cor(N)
corrplot.mixed(O)

ggcormat() of ggstateplot package

ggcorrmat(data = M)

prepare_correlation_graph() function of ExPanDaR package

$df_corr
                 GHG_emissions CO2_emissions CH4_emissions
GHG_emissions       1.00000000    0.99784541   0.930140816
CO2_emissions       0.99413874    1.00000000   0.905829005
CH4_emissions       0.97144524    0.95078334   1.000000000
NO2_emissions       0.94710402    0.92372630   0.959508233
GDP                 0.86271907    0.87141830   0.832277410
Final_EI           -0.37694381   -0.34604142  -0.467943876
Fuel_mix            0.18760737    0.17048453   0.228648231
Carbon_intensity    0.02357443    0.02642678   0.001257642
Renewables_share   -0.25004005   -0.23094422  -0.261702456
Envt_taxes          0.87809209    0.88611949   0.852467040
Liquid_biofuels     0.77081143    0.76697476   0.706991817
Solar_thermal       0.74724643    0.74321289   0.727001177
Heat_pumps          0.42831149    0.44521129   0.380857932
                 NO2_emissions         GDP   Final_EI     Fuel_mix
GHG_emissions       0.91822300  0.90916053 -0.2662011  0.273841014
CO2_emissions       0.89605922  0.90638984 -0.2541530  0.258173601
CH4_emissions       0.93835851  0.82096507 -0.3192199  0.394317966
NO2_emissions       1.00000000  0.82762517 -0.2778211  0.351788621
GDP                 0.84440412  1.00000000 -0.2653626 -0.044843685
Final_EI           -0.41022136 -0.43546422  1.0000000 -0.017495209
Fuel_mix            0.23057069  0.03065245  0.1367399  1.000000000
Carbon_intensity   -0.02506221 -0.10495727  0.0654970  0.219588269
Renewables_share   -0.22918867 -0.13486515  0.2779467 -0.019695971
Envt_taxes          0.86451982  0.98096907 -0.4442150  0.002396859
Liquid_biofuels     0.69792002  0.72783664 -0.2058724 -0.089409346
Solar_thermal       0.67147366  0.71421697 -0.4921314 -0.065272694
Heat_pumps          0.34748022  0.54374122 -0.2023139  0.048261441
                 Carbon_intensity Renewables_share  Envt_taxes
GHG_emissions          0.10939055      -0.29550085  0.89278054
CO2_emissions          0.11426435      -0.28842393  0.88690010
CH4_emissions          0.09580844      -0.32566188  0.83248701
NO2_emissions          0.07032943      -0.27349823  0.80918834
GDP                    0.02835471      -0.21459569  0.96482276
Final_EI              -0.42439312       0.57321622 -0.28184944
Fuel_mix               0.13743471      -0.11295737 -0.01662560
Carbon_intensity       1.00000000      -0.46972038  0.03546597
Renewables_share      -0.30397325       1.00000000 -0.22539529
Envt_taxes            -0.10770795      -0.13058739  1.00000000
Liquid_biofuels       -0.13226131      -0.19134578  0.71586859
Solar_thermal         -0.04256105      -0.28566499  0.74522454
Heat_pumps            -0.18348267       0.02471386  0.48184550
                 Liquid_biofuels Solar_thermal  Heat_pumps
GHG_emissions         0.80888395    0.71707654  0.37944853
CO2_emissions         0.81528203    0.72483913  0.35691727
CH4_emissions         0.67251692    0.60414885  0.43118429
NO2_emissions         0.71922124    0.71680647  0.42243572
GDP                   0.81285907    0.54850567  0.53355859
Final_EI             -0.19482993   -0.21674724 -0.16845508
Fuel_mix             -0.04718531    0.33365143 -0.12384495
Carbon_intensity      0.02811054    0.09999862 -0.08765757
Renewables_share     -0.23486629   -0.13953268 -0.04609222
Envt_taxes            0.72717110    0.52967393  0.55750720
Liquid_biofuels       1.00000000    0.51491939  0.34638084
Solar_thermal         0.65844575    1.00000000  0.09847440
Heat_pumps            0.44471160    0.36127767  1.00000000

$df_prob
                 GHG_emissions CO2_emissions CH4_emissions
GHG_emissions     0.000000e+00  0.000000e+00 2.214261e-130
CO2_emissions    4.589804e-287  0.000000e+00 4.725595e-112
CH4_emissions    2.357611e-186 3.808932e-152  0.000000e+00
NO2_emissions    1.201835e-147 5.784735e-125 2.305553e-164
GDP               1.841234e-88  2.542659e-92  8.610783e-77
Final_EI          7.533660e-11  2.875435e-09  1.377506e-16
Fuel_mix          1.646245e-03  4.292795e-03  1.165846e-04
Carbon_intensity  6.950160e-01  6.602926e-01  9.833155e-01
Renewables_share  3.242710e-05  1.286207e-04  1.322064e-05
Envt_taxes        1.930319e-96 1.554219e-100  4.362829e-85
Liquid_biofuels   5.591005e-56  4.117373e-55  2.029889e-43
Solar_thermal     4.418228e-51  2.889804e-50  3.896660e-47
Heat_pumps        7.102909e-14  5.467934e-15  4.617453e-11
                 NO2_emissions           GDP     Final_EI
GHG_emissions    1.107023e-120 4.178441e-113 6.541791e-06
CO2_emissions    4.710304e-106 2.724632e-111 1.732054e-05
CH4_emissions    3.952887e-138  4.861730e-73 4.998240e-08
NO2_emissions     0.000000e+00  3.243922e-75 2.443410e-06
GDP               3.876546e-81  0.000000e+00 7.011285e-06
Final_EI          9.476106e-13  2.441471e-14 0.000000e+00
Fuel_mix          1.017025e-04  6.101790e-01 2.234259e-02
Carbon_intensity  6.768192e-01  8.010140e-02 2.755920e-01
Renewables_share  1.451789e-04  2.669751e-02 3.524004e-06
Envt_taxes        3.887780e-90 6.644798e-210 6.385613e-15
Liquid_biofuels   6.623125e-42  3.976896e-47 5.519562e-04
Solar_thermal     6.246854e-38  8.019620e-45 2.005984e-18
Heat_pumps        2.447790e-09  7.237810e-23 6.754436e-04
                     Fuel_mix Carbon_intensity Renewables_share
GHG_emissions    3.441234e-06     6.808287e-02     7.657793e-07
CO2_emissions    1.258195e-05     5.661516e-02     1.434689e-06
CH4_emissions    8.154235e-12     1.103087e-01     4.341741e-08
NO2_emissions    1.504001e-09     2.416352e-01     5.104848e-06
GDP              4.556373e-01     6.372227e-01     3.833793e-04
Final_EI         7.710988e-01     1.261549e-13     5.538967e-25
Fuel_mix         0.000000e+00     2.166373e-02     6.382450e-02
Carbon_intensity 2.185072e-04     0.000000e+00     3.185907e-16
Renewables_share 7.473258e-01     3.531281e-07     0.000000e+00
Envt_taxes       9.682081e-01     7.245808e-02     3.195221e-02
Liquid_biofuels  1.370110e-01     2.745437e-02     1.616725e-03
Solar_thermal    2.772432e-01     4.789210e-01     1.824181e-06
Heat_pumps       4.219898e-01     2.089752e-03     6.860157e-01
                    Envt_taxes Liquid_biofuels Solar_thermal
GHG_emissions    3.574286e-104    1.300381e-65  2.499206e-45
CO2_emissions    5.984541e-101    1.912263e-67  9.797314e-47
CH4_emissions     1.234026e-77    5.925337e-38  3.811122e-29
NO2_emissions     4.225961e-70    1.488791e-45  2.791854e-45
GDP              1.835520e-171    9.634015e-67  2.569499e-23
Final_EI          1.718263e-06    1.093975e-03  2.647071e-04
Fuel_mix          7.821836e-01    4.332584e-01  1.112717e-08
Carbon_intensity  5.552413e-01    6.407311e-01  9.551676e-02
Renewables_share  1.880067e-04    1.007874e-04  2.182794e-02
Envt_taxes        0.000000e+00    5.287208e-47  1.401543e-21
Liquid_biofuels   5.878084e-45    0.000000e+00  3.171027e-20
Solar_thermal     1.137672e-50    6.262780e-36  0.000000e+00
Heat_pumps        1.262439e-17    6.610859e-15  5.011830e-10
                   Heat_pumps
GHG_emissions    5.511675e-11
CO2_emissions    8.341815e-10
CH4_emissions    4.639549e-14
NO2_emissions    1.676178e-13
GDP              6.268999e-22
Final_EI         4.782774e-03
Fuel_mix         3.870624e-02
Carbon_intensity 1.441776e-01
Renewables_share 4.506933e-01
Envt_taxes       3.469327e-24
Liquid_biofuels  2.956242e-09
Solar_thermal    1.007032e-01
Heat_pumps       0.000000e+00

$df_n
                 GHG_emissions CO2_emissions CH4_emissions
GHG_emissions              297           297           297
CO2_emissions              297           297           297
CH4_emissions              297           297           297
NO2_emissions              297           297           297
GDP                        294           294           294
Final_EI                   279           279           279
Fuel_mix                   279           279           279
Carbon_intensity           279           279           279
Renewables_share           270           270           270
Envt_taxes                 297           297           297
Liquid_biofuels            278           278           278
Solar_thermal              279           279           279
Heat_pumps                 279           279           279
                 NO2_emissions GDP Final_EI Fuel_mix Carbon_intensity
GHG_emissions              297 294      279      279              279
CO2_emissions              297 294      279      279              279
CH4_emissions              297 294      279      279              279
NO2_emissions              297 294      279      279              279
GDP                        294 294      279      279              279
Final_EI                   279 279      279      279              279
Fuel_mix                   279 279      279      279              279
Carbon_intensity           279 279      279      279              279
Renewables_share           270 270      270      270              270
Envt_taxes                 297 294      279      279              279
Liquid_biofuels            278 278      278      278              278
Solar_thermal              279 279      279      279              279
Heat_pumps                 279 279      279      279              279
                 Renewables_share Envt_taxes Liquid_biofuels
GHG_emissions                 270        297             278
CO2_emissions                 270        297             278
CH4_emissions                 270        297             278
NO2_emissions                 270        297             278
GDP                           270        294             278
Final_EI                      270        279             278
Fuel_mix                      270        279             278
Carbon_intensity              270        279             278
Renewables_share              270        270             269
Envt_taxes                    270        297             278
Liquid_biofuels               269        278             278
Solar_thermal                 270        279             278
Heat_pumps                    270        279             278
                 Solar_thermal Heat_pumps
GHG_emissions              279        279
CO2_emissions              279        279
CH4_emissions              279        279
NO2_emissions              279        279
GDP                        279        279
Final_EI                   279        279
Fuel_mix                   279        279
Carbon_intensity           279        279
Renewables_share           270        270
Envt_taxes                 279        279
Liquid_biofuels            278        278
Solar_thermal              279        279
Heat_pumps                 279        279

Comparing the correlation plots above, it is clear that ggcormat() of ggstateplot package and prepare_correlation_graph() function of ExPanDaR package are more robust as they are able to generate correlation plot of dataset with missing values, which is suitable for the panel data project as the dataset has a small number of missing values. A downside of the plot from prepare_correlation_graph() is that it returns the dataframe below the plot that cannot be disabled. Therefore, the ggcormat() plot is preferred as it has the correlation coefficients embedded within the visualisation, and those variables pairs that are statistically insignificant at ρ < 0.05 are also crossed out

Bivariate Analysis with Scatterplot

Using the prepare_scatter_plot() function of the ExPanDaR package, bivariate analysis between variables could be shown so as to allow users to determine the empirical relationship between variables.

prepare_scatter_plot(G, x="GHG_emissions", y="Final_EI", color="Country", loess = 1)

4.0 Final Visualisation & Interactivity

Each point below represents a unique tab within the EDA module of the Shiny application

  1. Data table, Descriptive Statistics, and Missing Values
  2. Histograms of multiple variables and individual variables
    • To incorporate variables selector
  3. Time Trend of variables of group and each country
    • To incorporate slider for years and countries selector
  4. Correlation Plot
    • To incorporate slider for year and countries selector
  5. Bivariate analysis with Scatterplot
    • To incorporate variables selector

5.0 Major Advantages of Incorporating Interactivity

  1. A static visualisation may contain too much additional information that would distract users from focusing on the information that they want. For example, for the case of the time trend graph here, there are trend lines of all countries in the same graph. If a countries selector is included, for the case of an interactive visualisation, users will be able to select only those of interest, and those that he wants to analyse. By limiting the selection to only a few, it will also ensure minimal trend lines overlapping visually.
  2. Interactive visualisation allows users to drill and narrow down into the specific variable or period of time to allow them to observe the changes to relationships between variables over time. For the case of the correlation plot, the way to compare the correlation between different variables and across different time periods on static plots are to generate correlation plots of every individual variables and years combination. With interactive visualisation, the number of such plots could be reduce to just one.
  3. There is no one-size-fits-all EDA visualisation. Rather than controlling what users to could see in static visualisation, interactive visualisation provides users the flexibility to customise the visualisation they would like to see.

Distill is a publication format for scientific and technical writing, native to the web.

Learn more about using Distill at https://rstudio.github.io/distill.