The parlnet repository contains links to other repositories, which each track the code to build the networks of a specific country. To get all the code at once, clone this repository with Git and update its submodules:
git clone https://github.com/briatte/parlnet
cd parlnet
git submodule update --init
The parlnet repository is versioned. Cloning the repository as recommended above fetches the most recent release together with any changes applied to it since publication.
The main entry point of each repository is make.r. Each repository also contains the following scripts, which make.r runs in this order:
- load.r – package loading (see dependencies)
- parties.r – party codes, colors, names and scores
- functions.r – network and text utilities
- data.r – scrapers for bills and sponsors
- build.r – the network building routine
- comm.r – the committee co-membership routine
The contents of the functions.r script are identical across all repositories. Some of the other scripts are occasionally split into several files when the country has more than one parliamentary chamber.
Note: the scripts that handle the French upper chamber require a PostgreSQL installation; see the README file of the parlement repository for further details. Once PostgreSQL is installed, a shell script included in that repository handles all required operations. This is the only dependency external to R in the entire parlnet repository. The same repository also contains an additional set of functions to clean up French names and identify legislatures.
To build the networks of any given country, set your working directory to its repository and run the make.r script after checking the list of package dependencies below. The plot and gexf boolean parameters can be set to FALSE to skip network plots and GEXF exports, respectively (see below).
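The steps above can be sketched as a small helper. This is a minimal sketch rather than part of parlnet: the run_country function and the path passed to it are hypothetical, and the plot and gexf parameters still have to be edited inside make.r itself, since the sketch does not override them.

```r
# Hypothetical helper: run the make.r script of one country repository.
run_country <- function(repo) {
  script <- file.path(repo, "make.r")
  if (!file.exists(script)) {
    stop("no make.r script found in ", repo)
  }
  # make.r expects the working directory to be the country repository
  old_wd <- setwd(repo)
  on.exit(setwd(old_wd), add = TRUE)  # restore the previous directory on exit
  source("make.r")
  invisible(TRUE)
}

# Usage, assuming a country repository at this (hypothetical) path
repo <- "path/to/country-repository"
if (dir.exists(repo)) run_country(repo)
```

Restoring the working directory on exit keeps the session usable when several repositories are run one after the other.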
The code will create a series of folders to store the results:
- Raw data are stored in the raw folder. All raw data are scraped from official parliamentary websites, generally as JSON or HTML files. The code first looks for a zipped raw data folder named raw.zip, and does the same for photos or variations of it. This makes it easy to update the networks, which only requires removing the data folder and running the code.

  The raw data folders are often organised internally into several subfolders containing bill indexes (lists of bills), bill pages, sponsor indexes and sponsor pages. The folder hierarchy is created by make.r if it does not exist.

  The README contains a link to the raw data collected by the most recent release of parlnet, in order to make it easy to replicate all networks in a limited amount of time (less than half an hour).

- Derived datasets are stored in the data folder as CSV files, along with the final networks, which are stored as network objects in a .rda file. The objects include weighted, directed cosponsorship networks (prefix net_) and, when possible, weighted, undirected committee co-membership networks (prefix conet_). See the parlnet.csv dataset for measures derived from them.

  The network-level (chamber-level), vertex-level (sponsor-level) and edge-level (edge weights) attributes of the networks are documented in full in the appendix to the repository.

  The .rda file produced by each country will also contain the raw bills data (as bills_ objects) and the raw edge lists (as edges_ objects) used during network construction.

  The parlnet.rda file contains the net_ and conet_ network objects produced for each country, as obtained from a clean run of all make.r scripts contained in the repository.

- Network plots will appear in the plots folder as JPG and PDF files. The PDF files include a legend with the party abbreviations corresponding to each node color. Both plots size nodes proportionally to their degree quartile and place them by Fruchterman-Reingold placement. Last, edges are coloured by shared party affiliation when relevant. Plotting can be skipped by setting plot to FALSE in make.r.

  The placement method used in the plots can be changed to any method supported by the sna package (e.g. kamadakawai) by editing the value of the mode object in make.r.

- Sponsor photos are stored in the photos folder, or some variation of it if there is more than one parliamentary chamber in the country. Sponsor photos are only used in the interactive visualizations of the networks, which are based on GEXF exports of the networks. The exports will be saved to the root of the repository. Exporting can be skipped by setting gexf to FALSE in make.r.
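The folder setup described above can be approximated as follows. This is a minimal sketch, not the code found in make.r: the prepare_folders function and its arguments are hypothetical, and it assumes that each zipped archive contains a top-level folder of the same name.

```r
# Hypothetical helper mimicking the folder setup described above.
prepare_folders <- function(root = ".",
                            folders = c("raw", "data", "plots", "photos"),
                            zipped = c("raw", "photos")) {
  for (f in folders) {
    path <- file.path(root, f)
    archive <- file.path(root, paste0(f, ".zip"))
    # Start from a zipped copy when there is one (raw data and photos only)
    if (f %in% zipped && !dir.exists(path) && file.exists(archive)) {
      unzip(archive, exdir = root)
    }
    # Create the folder if it still does not exist
    if (!dir.exists(path)) {
      dir.create(path, recursive = TRUE)
    }
  }
  invisible(file.path(root, folders))
}
```

Calling prepare_folders() from a country repository leaves existing folders untouched, so it can be run repeatedly without losing data.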
The parlnet.log file gives the estimated runtime of each repository when the raw data and derived datasets already exist on disk, and when plotting and exporting the networks is skipped. On a standard laptop, replicating the parlnet.rda object by running all repositories takes approximately half an hour.
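To inspect what a .rda file such as parlnet.rda contains, the objects can be loaded into a separate environment and grouped by the prefixes mentioned above. The list_rda_objects function below is a hypothetical helper written for illustration; it only assumes that objects follow the net_, conet_, bills_ and edges_ naming scheme.

```r
# Hypothetical helper: list the objects of an .rda file by name prefix.
list_rda_objects <- function(path) {
  env <- new.env()
  # load() returns the names of the objects it restores
  objs <- load(path, envir = env)
  list(
    cosponsorship = grep("^net_", objs, value = TRUE),
    committees    = grep("^conet_", objs, value = TRUE),
    bills         = grep("^bills_", objs, value = TRUE),
    edges         = grep("^edges_", objs, value = TRUE)
  )
}

# Usage, assuming parlnet.rda sits in the working directory
if (file.exists("parlnet.rda")) {
  str(list_rda_objects("parlnet.rda"))
}
```

Loading into a fresh environment avoids overwriting objects of the same name in the current session.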
Each repository depends on:
- dplyr, ggplot2, httr and stringr, by Hadley Wickham
- grid, by Paul Murrell (distributed with R)
- network and sna, by Carter T. Butts and others
- rgexf, by George Vega Yon and others
- XML, by Duncan Temple Lang
The code also occasionally calls:
- the jsonlite package by Jeroen Ooms,
- the lubridate package by Vitalie Spinu and others,
- the rvest package by Hadley Wickham
All packages can be installed from CRAN.
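A quick way to check the dependencies before running make.r is sketched below. The package names come from the lists above; the variable names are illustrative, the installation call is left commented out on purpose, and the version check uses the R ≥ 3.1.2 requirement stated in this section.

```r
# Packages listed in this section; grid is distributed with R
required <- c("dplyr", "ggplot2", "httr", "stringr", "grid",
              "network", "sna", "rgexf", "XML")
occasional <- c("jsonlite", "lubridate", "rvest")

# Names of the listed packages that are not installed yet
to_install <- setdiff(c(required, occasional), rownames(installed.packages()))

if (length(to_install) > 0) {
  message("Missing packages: ", paste(to_install, collapse = ", "))
  # install.packages(to_install)  # uncomment to install them from CRAN
}

# Minimum R version stated in this section
r_ok <- getRversion() >= "3.1.2"
if (!r_ok) warning("R >= 3.1.2 is required")
```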
As of December 2015, the code requires R ≥ 3.1.2 in order to support dplyr. Example session information after loading all required packages is copied below.
R version 3.2.3 (2015-12-10)
Platform: x86_64-apple-darwin13.4.0 (64-bit)
Running under: OS X 10.9.5 (Mavericks)
locale:
[1] en_GB.UTF-8/en_GB.UTF-8/en_GB.UTF-8/C/en_GB.UTF-8/en_GB.UTF-8
attached base packages:
[1] grid stats graphics grDevices utils datasets methods
[8] base
other attached packages:
[1] lubridate_1.5.0 jsonlite_0.9.19 rvest_0.3.1 xml2_0.1.2
[5] ggplot2_2.0.0 sna_2.3-2 network_1.13.0 rgexf_0.15.3
[9] igraph_1.0.1 Rook_1.1-1 XML_3.98-1.3 stringr_1.0.0
[13] httr_1.0.0 dplyr_0.4.3
loaded via a namespace (and not attached):
[1] Rcpp_0.12.2 magrittr_1.5 munsell_0.4.2 colorspace_1.2-6
[5] R6_2.1.1 brew_1.0-6 plyr_1.8.3 tools_3.2.3
[9] parallel_3.2.3 gtable_0.1.2 DBI_0.3.1 assertthat_0.1
[13] stringi_1.0-1 scales_0.3.0
Notes:
- The save_plot function found in functions.r uses code from the ggnet function, by Moritz Marbach and myself. The complete function is published in the GGally package by Barret Schloerke. See the ggnet repository for the full function code with many examples.
- The str_clean and str_space text cleaning functions found in functions.r are lightweight remixes of the scrubber and Trim functions found in the very rich qdap package by Tyler Rinker.