This zsh script uses wget (required), git (optional) and pv (required) to synchronize one or more of the following open data .xml stocks from the legal datasets (données juridiques) published by France's DILA (Direction de l'Information Légale et Administrative).
It is mostly intended for private use on the annotated military pensions code, but it might be helpful to other people interested in syncing DILA datasets. As is, however, it requires a few specific dependencies (like BSD grep) to work properly. Feel free to improve it and create a pull request if you do.
GNU utils must be installed and available in PATH.
Here are the deps I'm currently running the script with:
- sed (GNU sed) 4.4
- GNU Wget 1.19.2
- pv 1.6.6
- tar (GNU tar) 1.30
- git version 2.15.1
- grep (BSD grep) 2.5.1-FreeBSD (default macOS grep)
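
Before a first run it can help to confirm that these tools are reachable from PATH. The snippet below is only a sketch of such a check, not something dila-sync.sh is known to do itself; the function name `missing_deps` is hypothetical.

```sh
# Print the name of every given command that cannot be found in PATH,
# one per line. Returns 1 if at least one command is missing.
missing_deps() {
    rc=0
    for cmd in "$@"; do
        if ! command -v "$cmd" >/dev/null 2>&1; then
            printf '%s\n' "$cmd"
            rc=1
        fi
    done
    return "$rc"
}

# Check the required tools; add git to the list if you plan to use -g.
if missing=$(missing_deps wget pv sed tar grep); then
    echo "all required tools found"
else
    echo "missing tools:" $missing >&2
fi
```

This only tests for presence. To compare versions with the list above, run each tool with `--version`.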
| Name | Description and url |
|---|---|
| LEGI | Codes, lois et règlements consolidés ftp://ftp2.journal-officiel.gouv.fr/LEGI/ |
| JORF | Textes publiés au Journal officiel de la République française ftp://ftp2.journal-officiel.gouv.fr/JORF/ |
| KALI | Conventions collectives nationales ftp://ftp2.journal-officiel.gouv.fr/KALI/ |
| CASS | Arrêts publiés de la Cour de cassation ftp://ftp2.journal-officiel.gouv.fr/CASS/ |
| INCA | Arrêts inédits de la Cour de cassation ftp://ftp2.journal-officiel.gouv.fr/INCA/ |
| CAPP | Décisions des cours d’appel et des juridictions judiciaires de premier degré ftp://ftp2.journal-officiel.gouv.fr/CAPP/ |
| CONSTIT | Décisions du Conseil constitutionnel ftp://ftp2.journal-officiel.gouv.fr/CONSTIT/ |
| JADE | Décisions des juridictions administratives ftp://ftp2.journal-officiel.gouv.fr/JADE/ |
| CNIL | Délibérations de la CNIL ftp://ftp2.journal-officiel.gouv.fr/CNIL/ |
| SARD | Référentiel thématique sur la majeure partie des textes législatifs et réglementaires en vigueur ftp://ftp2.journal-officiel.gouv.fr/SARD/ |
JORFSIMPLE is not supported yet (Version simplifiée du Journal officiel - ftp://ftp2.journal-officiel.gouv.fr/JORFSIMPLE/)
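
Every URL in the table follows the same pattern: the FTP host followed by the stock name in uppercase. As an illustration, that pattern can be written as a small helper. `stock_url` and `DILA_FTP_BASE` are hypothetical names, not identifiers taken from dila-sync.sh.

```sh
# FTP host as it appears in the table above.
DILA_FTP_BASE="ftp://ftp2.journal-officiel.gouv.fr"

# Print the FTP URL of a stock, whatever the case of the name given.
stock_url() {
    name=$(printf '%s' "$1" | tr '[:lower:]' '[:upper:]')
    printf '%s/%s/\n' "$DILA_FTP_BASE" "$name"
}

stock_url legi   # prints ftp://ftp2.journal-officiel.gouv.fr/LEGI/
```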
For a more detailed explanation, view Licences données juridiques (page in French) on DILA's Répertoire des Informations Publiques.
./dila-sync.sh [-hgv] [-l rate_limit] stock_name [stock_name...]

# Let's say you want to sync each and every stock and you did set up
# a .dila-sync-gitwatch file listing the directories to be versioned
./dila-sync.sh -g legi capp cass cnil constit inca jade kali sard

- `-h` print help
- `-g` use git for versioning. See below
- `-v` verbose
- `-l rate_limit` limit wget download rate to `rate_limit`.
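
For readers who want to adapt the script, here is a minimal sketch of how a synopsis like `[-hgv] [-l rate_limit] stock_name [stock_name...]` is commonly parsed with `getopts`. It is an assumption about the general technique, not a copy of the parsing code in dila-sync.sh, and the function and variable names are hypothetical.

```sh
# Parse the options, then keep the remaining arguments as stock names.
# Sets use_git, verbose, show_help, rate_limit and stocks.
# Returns 2 on an unknown option or when no stock name is given.
parse_args() {
    use_git=0; verbose=0; show_help=0; rate_limit=""; stocks=""
    OPTIND=1
    while getopts "hgvl:" opt; do
        case "$opt" in
            h) show_help=1 ;;
            g) use_git=1 ;;
            v) verbose=1 ;;
            l) rate_limit=$OPTARG ;;
            *) return 2 ;;
        esac
    done
    shift $((OPTIND - 1))
    # At least one stock name is required, unless help was requested.
    [ "$#" -ge 1 ] || [ "$show_help" -eq 1 ] || return 2
    stocks="$*"
}

# 200k is only a placeholder value, written in the style accepted by
# wget's --limit-rate option.
parse_args -g -l 200k legi cnil
echo "git=$use_git rate=$rate_limit stocks=$stocks"
```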
stock_name
You have to provide at least one stock_name for dila-sync to synchronize it, see the list above for the supported stocks. stock_name can be either uppercase, lowercase, or a mix of both if you're feeling funky.
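
Case-insensitive matching of this kind is usually done by lowercasing the argument before comparing it with the supported names. The sketch below illustrates the idea using the stocks from the table above. `normalize_stock` is a hypothetical helper, and the real script may proceed differently.

```sh
# Print the lowercase form of a stock name and return 0 if it is one of
# the supported stocks. Print nothing and return 1 otherwise.
normalize_stock() {
    name=$(printf '%s' "$1" | tr '[:upper:]' '[:lower:]')
    case "$name" in
        legi|jorf|kali|cass|inca|capp|constit|jade|cnil|sard)
            printf '%s\n' "$name" ;;
        *)
            return 1 ;;
    esac
}

for arg in LEGI Cnil jorfsimple; do
    if stock=$(normalize_stock "$arg"); then
        echo "$arg -> $stock"
    else
        echo "$arg is not a supported stock" >&2
    fi
done
```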
In order to use git to version some parts of the stock, you need to create a file called .dila-sync-gitwatch in the script folder before running ./dila-sync.sh -g.
This file should contain no more than one path to version per line, relative to the script directory (so each path must start with ./stock).
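
To check the file before syncing, a small reader like the one below can list the entries that actually exist. This is a sketch under the assumption that the file is plain text with one path per line, as described above. `list_watched` is a hypothetical name, and the snippet says nothing about what dila-sync.sh then does with each path.

```sh
# Print every entry of a gitwatch-style file that is an existing
# directory, one per line. Blank lines are skipped and other entries
# are reported on stderr. Run it from the script directory, since the
# paths are relative to it.
list_watched() {
    watch_file=$1
    # The "|| [ -n ... ]" part keeps a last line without trailing newline.
    while IFS= read -r entry || [ -n "$entry" ]; do
        [ -n "$entry" ] || continue
        if [ -d "$entry" ]; then
            printf '%s\n' "$entry"
        else
            printf 'skipping %s: not a directory\n' "$entry" >&2
        fi
    done < "$watch_file"
}

if [ -f .dila-sync-gitwatch ]; then
    list_watched .dila-sync-gitwatch
fi
```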
Example
# Create .dila-sync-gitwatch
touch .dila-sync-gitwatch
# Version the whole CNIL stock
echo "./stock/cnil" >> .dila-sync-gitwatch
# Version only LEGITEXT000006074068 in the LEGI stock
# (code des pensions militaires...)
echo "./stock/legi/global/code_et_TNC_en_vigueur/code_en_vigueur/LEGI/TEXT/00/00/06/07/40/LEGITEXT000006074068" >> .dila-sync-gitwatch
# Add as many directories as you need to version with git
# ...

We're not automatically cleaning empty directories after each delta is applied, so every once in a while it might be a good idea to run the following command in the stock directory:
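# Find and remove empty directories in current folder
find . -type d -empty -delete

# List the directory of each code in force in the LEGI stock
# (ls prints a "directory:" header per match; grep keeps those headers
# and sed strips the trailing colon)
codes="./stock/legi/global/code_et_TNC_en_vigueur/code_en_vigueur"
ls "$codes"/*/*/*/*/*/*/*/* | grep ".stock" | sed 's/:$//g'

# List the .xml files modified in the last 100 days
# (-printf is a GNU find extension)
find . -type f -mtime -100 -name "*.xml" -printf "%TD %TR %p\n"

To restrict the search to one directory, simply change the starting point of find (here we are using a variable for the path we're interested in):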
find_in="./legi/global/code_et_TNC_en_vigueur/code_en_vigueur/LEGI/TEXT/00/00/06/07/40/LEGITEXT000006074068"
find "$find_in" -type f -mtime -100 -name "*.xml" -printf "%TD %TR %p\n"

# Show the most recent commit that touched a given path
look_in="./legi/global/code_et_TNC_en_vigueur/code_en_vigueur/LEGI/TEXT/00/00/06/07/40/LEGITEXT000006074068"
git log -p -1 -s -- "$look_in"

If you need to quickly reset everything to a blank state, here are the steps:
# Remove .dila-sync folder
rm -rf ./.dila-sync;
# Remove the stock
rm -rf ./stock;
# Optionally #
# Remove the archive files if you want to re-download everything
# rm -rf ./.tmp;
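
The same steps can be wrapped in a function that refuses to run without an explicit base directory, which avoids an accidental `rm -rf` in the wrong place. This is only a sketch: `reset_dila_sync` and its `archives` argument are hypothetical and not part of dila-sync.sh.

```sh
# Remove the .dila-sync folder and the stock under the given directory.
# Pass "archives" as second argument to also remove the downloaded
# archives in .tmp. Returns 2 if the base directory is missing or invalid.
reset_dila_sync() {
    base=${1:-}
    if [ -z "$base" ] || [ ! -d "$base" ]; then
        echo "usage: reset_dila_sync base_dir [archives]" >&2
        return 2
    fi
    rm -rf -- "$base/.dila-sync" "$base/stock"
    if [ "${2:-}" = "archives" ]; then
        rm -rf -- "$base/.tmp"
    fi
}

# Typical calls, from the script directory:
#   reset_dila_sync .            # keep the archives
#   reset_dila_sync . archives   # re-download everything next time
```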