Validating FBS results

Measures to benchmark quality of results produced by statistical modules

Validate Trade Input Data

The validate R package developed at Statistics Netherlands offers a data validation infrastructure.

The validator function can read a set of rules from a text file

v <- validator(.file = "../extra/summary_complete_tf_cpc_esdata_tldata_rulesfile.yml")

The text file can be free-form or in YAML format:

# content of summary_complete_tf_cpc_esdata_tldata_rulesfile.yml
rules:
- 
  expr: weight > 0
  name: weight
  label: weight positivity
  description: |
    If the value is positive, weight should be positive as well.
-
  expr: "!is.na(qty)"
  name: qty
  label: qty exists
  description: |
    The qty should exist.

A data.frame containing the variables qty and weight can be confronted with a validator v:

cf_esdata <- esdata %>% as.data.frame %>% confront(x = v)
cf_tldata <- tldata %>% as.data.frame %>% confront(x = v)

Inspecting the result of the confrontation

Methods such as summary and barplot are available for the objects returned by the confront function.
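As a minimal sketch of this inspection step, the two rules from the rules file can be defined inline and confronted with a small data frame. The data frame `trade` and its values are hypothetical stand-ins for esdata and tldata, and the rules are left with their default names.

```r
library(validate)

# Hypothetical toy data standing in for esdata / tldata
trade <- data.frame(
  weight = c(10, 0, NA, 5),
  qty    = c(1, NA, 3, 4)
)

# The same two rules as in the rules file, defined inline
v <- validator(weight > 0, !is.na(qty))

# Confront the data with the rules
cf <- confront(trade, v)

# One row per rule: items checked, passes, fails and NA results
res <- summary(cf)
print(res[, c("items", "passes", "fails", "nNA")])

# Record-level outcomes (TRUE / FALSE / NA), one column per rule
print(values(cf))
```

A rule evaluates to NA when the variable it tests is missing, which is why a missing weight is counted under nNA rather than under fails.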

Summary EU trade since 1988 by CN8 from Eurostat Comext (only 2011)

complete_tf_cpc_esdata_tldata_sum$cf_esdata %>%
  summary
##     rule   items  passes   fails nNA error warning  expression
## 1 weight 8077901 6224972 1852929   0 FALSE   FALSE  weight > 0
## 2    qty 8077901 2384306 5693595   0 FALSE   FALSE !is.na(qty)
complete_tf_cpc_esdata_tldata_sum$cf_esdata %>%
  barplot(main = "EU trade since 1988 by CN8 from Eurostat Comext (only 2011)")

Summary Tariffline Data from UNSD Comtrade (only 2011)

complete_tf_cpc_esdata_tldata_sum$cf_tldata %>%
  summary
##     rule    items  passes  fails     nNA error warning  expression
## 1 weight 10850866 9769026      0 1081840 FALSE   FALSE  weight > 0
## 2    qty 10850866 9862034 988832       0 FALSE   FALSE !is.na(qty)
complete_tf_cpc_esdata_tldata_sum$cf_tldata %>%
  barplot(main = "Tariffline Data from UNSD Comtrade (only 2011)")

summary-complete-tf-cpc-esdata-tldata-2011-validate-barplot-esdata-1

summary-complete-tf-cpc-esdata-tldata-2011-validate-barplot-tldata-1

Visualization and Imputation of Missing Values

The quality of imputation can be visually explored using various univariate, bivariate, multiple and multivariate plot methods.

In addition to the scatterplot, boxplots for available and for imputed values, as well as univariate scatterplots for the imputed values, are given in the plot margins. Furthermore, the frequencies of imputed values are displayed for each variable.
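A plot of this kind can be sketched with the VIM package, assuming that is how the figures were produced (the source does not say). The data, the variable names `production` and `import_qty`, and the choice of k-nearest-neighbour imputation are all hypothetical.

```r
library(VIM)

set.seed(42)
n <- 60

# Hypothetical data: two positively related variables
dat <- data.frame(production = rlnorm(n, meanlog = 8, sdlog = 1))
dat$import_qty <- dat$production * runif(n, 0.1, 0.5)

# Introduce missing values in non-overlapping rows
dat$production[1:5]  <- NA
dat$import_qty[11:20] <- NA

# k-nearest-neighbour imputation; adds logical *_imp indicator columns
dat_imp <- kNN(dat, k = 5)

# Scatterplot with marginal boxplots; imputed values are highlighted
marginplot(
  dat_imp[, c("production", "import_qty", "production_imp", "import_qty_imp")],
  delimiter = "_imp"
)
```

The `_imp` columns flag which values were imputed, so the margins can compare the distribution of imputed values with that of the observed ones.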

Production and other variables

5510-5016

5510-5071

5510-5120

5510-5122

5510-5141

5510-5520

5510-5525

5510-5610

Import Quantity and other variables

5610-5016

5610-5071

5610-5120

5610-5141

5610-5520

5610-5525

Input Data Validation

input data validation flow diagram

Dietary Energy Supply DES

Shared working folder B, C

sua-fbs-iraq-des-comparison-2013.png

Comparison Dietary Energy Supply DES and Consumption DEC

Reconciliation between dietary energy consumption from FBS and NHS

report_des_dec_comparison_csv-highest-differences-1

report_des_dec_comparison_csv-sofidec-1

Differences Between FBS and NHS Shares of Food Consumption

Data table provided by Team D:

Data transformation steps:

The plots with n = 1…5 are available at:

report_fbs_hhs_des_results_adam_csv-highest-differences-1

FAO Methodology for the Measurement of Food Deprivation

report_des_dec_comparison_csv-skewed-1

The estimate of the proportion of the population below minimum level of dietary energy consumption has been defined within a probability distribution framework:

\[ P(U) = P(x < r_L) = \int_{x < r_L} f(x)\,dx = F_x(r_L) \]

log-normal-dietary-energy-consumption

In the graph, the curve f(x) depicts the proportion of the population corresponding to different per caput dietary energy consumption levels (x), shown on the horizontal axis. The area under the curve up to the cut-off point r_L, the minimum energy requirement, represents the proportion of the population that is undernourished, i.e. the prevalence of undernourishment.
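The area under the curve can be computed directly once f(x) is specified. The sketch below assumes a log-normal density, as the figure name suggests, parameterised by a mean and a coefficient of variation. The three input values are hypothetical and chosen only for illustration.

```r
# Hypothetical inputs, for illustration only
mean_dec <- 2400   # mean dietary energy consumption, kcal/person/day
cv       <- 0.28   # coefficient of variation of f(x)
r_L      <- 1800   # minimum dietary energy requirement, kcal/person/day

# Log-normal parameters implied by the mean and the CV
sigma <- sqrt(log(cv^2 + 1))
mu    <- log(mean_dec) - sigma^2 / 2

# Prevalence of undernourishment: F_x(r_L)
pou <- plnorm(r_L, meanlog = mu, sdlog = sigma)

# The same quantity as the integral of f(x) from 0 to r_L
pou_num <- integrate(dlnorm, lower = 0, upper = r_L,
                     meanlog = mu, sdlog = sigma)$value

print(c(cdf = pou, integral = pou_num))
```

Both expressions evaluate the same probability; the closed-form distribution function is the one to use in practice, and the numerical integral only mirrors the formula term by term.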

Estimation of the Mean and Coefficient of Variation of the Density Function f(x)

There are two options for estimating the mean: using Food Balance Sheet (FBS) data or Household Budget Survey (HBS) data. The first can be used to prepare annual estimates for monitoring progress in food security for the country as a whole. The second allows the derivation of sub-national estimates. These cannot be prepared on a yearly basis, as they depend on the survey frequency, which in general ranges from 5 to 10 years. The illustrative results are presented for both options, FBS and HBS.

Dietary Energy Consumption from the Food Balance Sheet (FBS)

The mean is represented by the Dietary Energy Supply per person (DES) which refers to the food available for human consumption during the course of the reference period, expressed in terms of energy (kcal/person/day). The estimate is derived from the Food Balance Sheets compiled on the basis of data on the production (PROD) and trade (IMPorts and EXPorts) of food commodities. Using these data and the available information on stock changes (STCH), losses between the levels at which production is recorded and the household (WASTE) and types of utilization (SEED, FEED, FOOD, inputs for PROCessing derived products and OTHER uses) a supply/utilization account is prepared for each commodity in weight terms. The food component, which is usually derived as a balancing item, refers to the total amount of the commodity available for human consumption during the year. The DES is obtained by aggregating the food component of all commodities after conversion into energy values. The table below presents the standard Food Balance Sheet for the hypothetical country in 1999-2001.
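The derivation of the DES can be sketched as follows. All figures are hypothetical: two commodities with quantities in thousand tonnes, assumed energy factors in kcal per 100 g, and an assumed population. The sign convention for STCH (positive means an addition to stocks) is also an assumption.

```r
# Hypothetical supply/utilization accounts, in thousand tonnes
sua <- data.frame(
  commodity = c("wheat", "rice"),
  PROD  = c(1000, 500),
  IMP   = c(200, 100),
  EXP   = c(50, 0),
  STCH  = c(10, -5),   # assumed: positive = addition to stocks
  SEED  = c(40, 20),
  FEED  = c(100, 0),
  WASTE = c(30, 15),
  PROC  = c(0, 0),
  OTHER = c(20, 5),
  kcal_per_100g = c(334, 360)   # assumed energy conversion factors
)
population <- 10e6   # hypothetical
days <- 365

# Food is derived as the balancing item of each account
sua$FOOD <- with(sua, PROD + IMP - EXP - STCH - SEED - FEED - WASTE - PROC - OTHER)

# Convert thousand tonnes to grams, then to kcal, and aggregate
grams <- sua$FOOD * 1000 * 1e6
kcal_total <- sum(grams * sua$kcal_per_100g / 100)

# Dietary Energy Supply in kcal/person/day
des <- kcal_total / population / days
print(round(des, 1))
```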

Dietary Energy Consumption from Household Budget Survey

This option requires the conversion of quantities of the different food items consumed by the household into energy values. These data are usually collected through budget surveys using large scale samples which may allow mean estimates not only at the national level but also at sub-national levels such as geographic areas and socio-economic population groups.
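A minimal sketch of this conversion, using a hypothetical survey extract with three households, two food items and assumed energy factors, could look like this. Household size is used to express the result per person.

```r
# Hypothetical household survey records (one row per household and item)
hbs <- data.frame(
  household     = c(1, 1, 2, 2, 3, 3),
  region        = c("north", "north", "north", "north", "south", "south"),
  hh_size       = c(4, 4, 2, 2, 5, 5),
  item          = c("rice", "oil", "rice", "oil", "rice", "oil"),
  grams_per_day = c(1600, 100, 700, 60, 2000, 150),
  kcal_per_100g = c(360, 884, 360, 884, 360, 884)   # assumed factors
)

# Convert quantities consumed into energy values
hbs$kcal <- hbs$grams_per_day * hbs$kcal_per_100g / 100

# Total energy per household
hh <- aggregate(kcal ~ household + region + hh_size, data = hbs, FUN = sum)

# Mean dietary energy consumption per person per day, by region and overall
dec_by_region <- sapply(split(hh, hh$region),
                        function(d) sum(d$kcal) / sum(d$hh_size))
dec_national <- sum(hh$kcal) / sum(hh$hh_size)

print(round(dec_by_region, 1))
print(round(dec_national, 1))
```

A real survey would also apply sampling weights; they are omitted here to keep the sketch short.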

Advantages of the use of food consumption estimates from Food Balance Sheets

Using the daily per person DES derived from the Food Balance Sheets has some advantages, as indicated below.
