Summary reports for input data sources used by the statistical modules
time period
number of observations by partner
## 'data.frame': 3819 obs. of 3 variables:
## $ year: int 2014 2014 2014 2014 2014 2014 2014 2014 2014 2014 ...
## $ prt : Factor w/ 263 levels "4","8","10","12",..: 3 31 32 33 34 35 4 36 37 38 ...
## $ cnt : int 366 122448 15465 2773 47081 34981 35575 23670 244169 8205 ...
## year prt cnt
## Min. :2000 4 : 15 Min. : 1
## 1st Qu.:2003 8 : 15 1st Qu.: 4957
## Median :2007 10 : 15 Median : 17560
## Mean :2007 12 : 15 Mean : 65528
## 3rd Qu.:2011 16 : 15 3rd Qu.: 62636
## Max. :2014 20 : 15 Max. :1897129
## (Other):3729
Values larger than 10^{6}
## year prt cnt
## 1 2014 156 1104898
## 2 2013 156 1117188
## 3 2013 276 1034622
## 4 2012 156 1831846
## 5 2012 840 1897129
## 6 2011 156 1333591
## 7 2011 840 1714170
## 8 2010 156 1462158
## 9 2010 840 1573837
## 10 2009 156 1066428
## 11 2009 840 1774167
## 12 2000 276 1169905
## 13 2000 56 1022478
## 14 2000 840 1509820
Counting the number of observations by year
and prt
between 2000 and 2014.
ct_tariffline_unlogged_[year]
spark.sql("SELECT year, prt, COUNT(*) AS cnt FROM parquetTable GROUP BY year, prt ORDER BY year, prt")
count_[prt]
is the rowcount with the respective prt, e.g. count_HS2
are the number of observations using commodity classification H2 (HS 2002).pct_max
represents the share of the HS group that has the largest share in the respective year (in percent)## # A tibble: 15 × 2
## period count
## <chr> <int>
## 1 200052 28
## 2 200152 28
## 3 200252 29
## 4 200352 29
## 5 200452 29
## 6 200552 29
## 7 200652 29
## 8 200752 29
## 9 200852 29
## 10 200952 29
## 11 201052 29
## 12 201152 29
## 13 201252 29
## 14 201352 29
## 15 201452 29
## codeUN cou
## 1 40 AUT
## 2 56 BEL
## 3 100 BGR
## 4 196 CYP
## 5 203 CZE
## 6 276 DEU
## 7 208 DNK
## 8 724 ESP
## 9 233 EST
## 10 246 FIN
## 11 250 FRA
## 12 826 GBR
## 13 300 GRC
## 14 191 HRV
## 15 348 HUN
## 16 372 IRL
## 17 380 ITA
## 18 440 LTU
## 19 442 LUX
## 20 428 LVA
## 21 470 MLT
## 22 528 NLD
## 23 616 POL
## 24 620 PRT
## 25 642 ROU
## 26 703 SVK
## 27 705 SVN
## 28 752 SWE
## # A tibble: 15 × 2
## period count
## <chr> <int>
## 1 200052 232
## 2 200152 243
## 3 200252 244
## 4 200352 244
## 5 200452 244
## 6 200552 245
## 7 200652 244
## 8 200752 244
## 9 200852 243
## 10 200952 243
## 11 201052 244
## 12 201152 243
## 13 201252 243
## 14 201352 249
## 15 201452 249
## Warning in FUN(.): These ES Geonom codes were not converted to FAO area
## list: 479, 475, 466, 477, 977, 229, 955, 225
## [1] "533 ABW" "4 AFG" "24 AGO" "8 ALB" "784 ARE" "32 ARG" "51 ARM"
## [8] "16 ASM" "28 ATG" "36 AUS" "36 AUS" "36 AUS" "36 AUS" "36 AUS"
## [15] "36 AUS" "40 AUT" "31 AZE" "108 BDI" "56 BEL" "204 BEN" "854 BFA"
## [22] "50 BGD" "100 BGR" "48 BHR" "44 BHS" "70 BIH" "112 BLR" "84 BLZ"
## [29] "60 BMU" "68 BOL" "76 BRA" "52 BRB" "96 BRN" "64 BTN" "72 BWA"
## [36] "140 CAF" "124 CAN" "756 CHE" "756 CHE" "152 CHL" "384 CIV" "120 CMR"
## [43] "180 COD" "178 COG" "184 COK" "170 COL" "174 COM" "132 CPV" "188 CRI"
## [50] "192 CUB" "136 CYM" "196 CYP" "203 CZE" "276 DEU" "262 DJI" "212 DMA"
## [57] "208 DNK" "214 DOM" "12 DZA" "218 ECU" "818 EGY" "232 ERI" "724 ESP"
## [64] "724 ESP" "724 ESP" "233 EST" "231 ETH" "231 ETH" "246 FIN" "242 FJI"
## [71] "238 FLK" "238 FLK" "250 FRA" "250 FRA" "234 FRO" "583 FSM" "266 GAB"
## [78] "826 GBR" "268 GEO" "288 GHA" "324 GIN" "270 GMB" "624 GNB" "226 GNQ"
## [85] "300 GRC" "308 GRD" "304 GRL" "320 GTM" "316 GUM" "316 GUM" "328 GUY"
## [92] "344 HKG" "340 HND" "191 HRV" "332 HTI" "348 HUN" "360 IDN" "356 IND"
## [99] "372 IRL" "364 IRN" "368 IRQ" "352 ISL" "376 ISR" "380 ITA" "380 ITA"
## [106] "380 ITA" "388 JAM" "400 JOR" "392 JPN" "398 KAZ" "404 KEN" "417 KGZ"
## [113] "116 KHM" "296 KIR" "659 KNA" "659 KNA" "410 KOR" "414 KWT" "418 LAO"
## [120] "422 LBN" "430 LBR" "434 LBY" "662 LCA" "144 LKA" "426 LSO" "440 LTU"
## [127] "442 LUX" "428 LVA" "446 MAC" "504 MAR" "498 MDA" "450 MDG" "462 MDV"
## [134] "484 MEX" "584 MHL" "807 MKD" "466 MLI" "470 MLT" "104 MMR" "499 MNE"
## [141] "496 MNG" "508 MOZ" "478 MRT" "500 MSR" "480 MUS" "454 MWI" "458 MYS"
## [148] "175 MYT" "516 NAM" "540 NCL" "562 NER" "566 NGA" "558 NIC" "570 NIU"
## [155] "528 NLD" "578 NOR" "524 NPL" "520 NRU" "554 NZL" "554 NZL" "554 NZL"
## [162] "512 OMN" "586 PAK" "591 PAN" "604 PER" "608 PHL" "585 PLW" "598 PNG"
## [169] "616 POL" "408 PRK" "620 PRT" "600 PRY" "275 PSE" "258 PYF" "634 QAT"
## [176] "642 ROU" "643 RUS" "646 RWA" "682 SAU" "686 SEN" "702 SGP" "654 SHN"
## [183] "90 SLB" "694 SLE" "222 SLV" "706 SOM" "666 SPM" "688 SRB" "688 SRB"
## [190] "688 SRB" "678 STP" "740 SUR" "703 SVK" "705 SVN" "752 SWE" "748 SWZ"
## [197] "690 SYC" "760 SYR" "796 TCA" "148 TCD" "768 TGO" "764 THA" "762 TJK"
## [204] "795 TKM" "776 TON" "780 TTO" "788 TUN" "792 TUR" "798 TUV" "834 TZA"
## [211] "800 UGA" "804 UKR" "858 URY" "840 USA" "840 USA" "840 USA" "860 UZB"
## [218] "670 VCT" "862 VEN" "92 VGB" "704 VNM" "548 VUT" "876 WLF" "882 WSM"
## [225] "887 YEM" "710 ZAF" "894 ZMB" "716 ZWE" "530 NA" "1248 NA" "158 NA"
## [232] "729 NA"
The workflow diagram informs about the various steps of the pre-processing
SWS Data
faosws
to retrieve datatables for each year from SWS (approx. 15-25 minutes per table, total around 30 tables)rds
to preserve data typesProcessing (Spark)
dryworkflow (R)
html_fragment
as export type in rmarkdown; this will create an HTML page without a header that can be used in JekyllWebsite (Jekyll)
html_fragment
is included to trigger jekyll updatejekyll serve
is running, the contents of _site
directory will be updated automaticallysws_r_share/fbsmodules
using rsync
Resources
The various source files for Jekyll and dryworkflow have been copied to a network location