- Edit the configuration in `config/default.json` and the custom environment variable names in `config/custom-environment-variables.json`.
- Application constants can be configured in `./constants.js`.
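As an illustration of how this might be laid out (assuming the standard node-config conventions; all keys and values below are hypothetical, not the project's actual settings), `config/default.json` could look like:

```json
{
  "db": {
    "host": "localhost",
    "port": 5432,
    "database": "addresses",
    "user": "postgres",
    "password": ""
  },
  "downloadDir": "./downloads"
}
```

and `config/custom-environment-variables.json` maps configuration keys to the environment variables that may override them:

```json
{
  "db": {
    "host": "DB_HOST",
    "user": "DB_USER",
    "password": "DB_PASSWORD"
  }
}
```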
- Since the data we need to download and process is huge, it is better (and safer) to use two separate tools instead of one single script: if something goes wrong during processing, the damage is limited to that step.
- Run `npm run download-data` to download all available datasets (a sketch of this step follows this list).
- The datasets will be stored in the configured directory.
- Old data will be replaced.
- This operation does not affect the database.
- Run `npm run import-data` to import all data from the files downloaded in the previous step.
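For orientation, here is a minimal sketch of what the download step could look like, assuming Node's built-in `https` module and the `config` package; the dataset URLs, the `downloadDir` key, and the file layout are assumptions, not the project's actual values:

```js
const fs = require('fs');
const path = require('path');
const https = require('https');
const config = require('config');

// Hypothetical dataset URLs; the real project defines its own sources.
const DATASETS = [
  'https://example.org/data/addresses.csv',
  'https://example.org/data/foreign-addresses.csv',
];

// Stream one dataset into the configured directory, replacing any old copy.
function download(url) {
  const target = path.join(config.get('downloadDir'), path.basename(url));
  return new Promise((resolve, reject) => {
    const file = fs.createWriteStream(target); // overwrites an existing file
    https
      .get(url, (res) => {
        res.pipe(file);
        file.on('finish', () => file.close(resolve));
      })
      .on('error', reject);
  });
}

// Download the datasets one after another; the database is never touched here.
DATASETS.reduce((chain, url) => chain.then(() => download(url)), Promise.resolve())
  .then(() => console.log('All datasets downloaded'))
  .catch((err) => {
    console.error(err);
    process.exit(1);
  });
```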
Before starting the application, make sure that PostgreSQL is running and that you have configured everything correctly in `config/default.json`.
- Install dependencies: `npm i`
- Run the lint check: `npm run lint`
- Start the app: `npm start`. This will run all tools in the following sequence (see the hypothetical `package.json` sketch below):
`npm run download-data` => `npm run import-data`
The application will print progress information and the results in the terminal.
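A hypothetical `package.json` excerpt showing how these scripts could be wired together (the entry-point file names and the ESLint lint command are assumptions):

```json
{
  "scripts": {
    "lint": "eslint .",
    "download-data": "node src/download-data.js",
    "import-data": "node src/import-data.js",
    "start": "npm run download-data && npm run import-data"
  }
}
```

With a `start` script like this, a failure in the download step stops the chain before any import is attempted.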
- To verify that the data has been imported, you can use the pgAdmin tool and browse the database.
- The total size of all datasets is over 1.5 GB, so the operation will take quite some time to finish, depending on your internet connection.
- `max_old_space_size` has been set to 4096 MB so that such large data files can be parsed and processed without issues. The app releases the data from memory right after using it to prevent memory/heap leaks.
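One common way to raise the heap limit is Node's `--max-old-space-size` flag; whether the project passes it in the npm scripts or via `NODE_OPTIONS` is an assumption here, and the entry-point file name is hypothetical:

```bash
# Passed directly to the entry point:
node --max-old-space-size=4096 src/import-data.js

# Or set for any npm script without editing package.json:
NODE_OPTIONS=--max-old-space-size=4096 npm run import-data
```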
- The dataset for FOREIGN ADDRESSES does not have a header row in the CSV file and has a slightly different format (it contains an extra column). The app handles all datasets without any issues; a sketch of how such a file can be parsed is shown below.
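For illustration, a headerless CSV like this one can be parsed by supplying the column names explicitly. This sketch assumes the `csv-parse` package and makes up the file path and column names; it shows only the general technique, not the dataset's real schema:

```js
const fs = require('fs');
const { parse } = require('csv-parse');

// The FOREIGN ADDRESSES file has no header row, so column names are provided
// explicitly; these names are placeholders, not the real fields.
const foreignAddressColumns = ['id', 'street', 'city', 'postalCode', 'country', 'extra'];

fs.createReadStream('./downloads/foreign-addresses.csv')
  .pipe(parse({ columns: foreignAddressColumns, relax_column_count: true }))
  .on('data', (row) => {
    // `row` is a plain object keyed by the names above, so downstream code can
    // treat it the same way as rows from datasets that do include a header.
  })
  .on('end', () => console.log('done'))
  .on('error', (err) => console.error(err));
```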