For my PhD project I'm using primary care data from the Optimum Patient Care Research Database. In order to derive a variable for patient ethnicity, I wanted a comprehensive Read code list meeting the following criteria:
- Includes both version 2 and version 3 Read codes.
- Includes codes based on both the 2001 and 2011 census ethnicity categories.
- Is mapped to a reliable ethnicity categorisation.
There are several great Read code resources (e.g. OpenSAFELY, CALIBER, Phenotype Library, ClinicalCodes, Vision), but I couldn't find a single ethnicity list that satisfied all three of the above criteria.
I've therefore consolidated multiple lists into a single list that satisfies the first two criteria, and mapped the categorisation specified by the OpenSAFELY team to all Read codes in the list to satisfy the third criterion. This repository includes:
- The code I used to scrape the ethnicity Read code lists from the above websites (read_tables.R).
- The scraped and cleaned data (rc_ethnicity.csv).
- A markdown file (prepare_list.md) documenting the methods that I used to map the ethnicity codes in the OpenSAFELY list to the codes in the CALIBER, Phenotype Library, ClinicalCodes and Vision lists.
- The final code list (rc_ethnicity_final.csv).
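
As a rough illustration of how the final list might be used to derive an ethnicity variable, here is a minimal Python sketch (the repository's own code is in R). Everything in it is an assumption made for illustration: the column names `read_code`, `ethnicity_category`, `patient_id` and `event_date`, the placeholder codes and categories, and the rule of taking each patient's most recent matching record. Check the actual headers in rc_ethnicity_final.csv and your own data before adapting it.

```python
import csv
import io

# Hypothetical stand-in for rc_ethnicity_final.csv.
# Column names, codes and categories are placeholders, not the real contents.
CODE_LIST_CSV = """read_code,ethnicity_category
CODE_A,category_1
CODE_B,category_2
"""

# Hypothetical patient event records (dates in ISO format, YYYY-MM-DD).
EVENTS_CSV = """patient_id,read_code,event_date
p1,CODE_A,2010-01-01
p1,CODE_B,2015-06-30
p2,CODE_A,2012-03-15
p3,CODE_X,2011-01-01
"""


def load_code_map(csv_file):
    """Return {read_code: ethnicity_category} from a CSV file object."""
    return {
        row["read_code"]: row["ethnicity_category"]
        for row in csv.DictReader(csv_file)
    }


def derive_ethnicity(events_file, code_map):
    """Return {patient_id: category} from each patient's latest matching record.

    Records whose Read code is not in the code list are ignored.
    """
    latest = {}  # patient_id -> (event_date, category)
    for row in csv.DictReader(events_file):
        category = code_map.get(row["read_code"])
        if category is None:
            continue
        pid, date = row["patient_id"], row["event_date"]
        # ISO dates sort correctly as plain strings.
        if pid not in latest or date > latest[pid][0]:
            latest[pid] = (date, category)
    return {pid: category for pid, (_, category) in latest.items()}


code_map = load_code_map(io.StringIO(CODE_LIST_CSV))
ethnicity = derive_ethnicity(io.StringIO(EVENTS_CSV), code_map)
print(ethnicity)
```

With real data, the `io.StringIO(...)` objects would be replaced by opened files, e.g. `open("rc_ethnicity_final.csv", newline="")`. Taking the most recent record is only one possible rule; using the most frequently recorded category is another option.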
Please feel free to comment, suggest changes, or use the list in your own research!