Last updated on May 7, 2025.
Website: policychangeindex.org/projects/pci-tensions
Authors: Kaiwei Hsu and Weifeng Zhong
PCI-Tensions is an open-source method that uses large language models (LLMs) to analyze CCP propaganda and develop early warning signals for a Taiwan Strait crisis, which could be a prelude to an invasion. The methodology is as follows:
1. Collect the full text of the People's Daily from the 1994–1996 training period and the 2022–2024 test period, and label a set of essential metadata for each article, such as publication date, title, content, and page number.
2. Identify major events that occurred leading up to and during the 1995 Taiwan Strait Crisis (covered by the training period) and recent US-Taiwan diplomatic events in the test period.
3. For each newspaper article in the training period, build a set of quantitative indices to measure China-Taiwan relations by prompting an LLM with questions such as how the relationship between China and Taiwan is perceived and how the Chinese government views the Taiwanese government. Then, aggregate each article-level index to a weekly sum and calculate the four-week moving average of that sum.
4. Assess how well the time series of each index matches, or even predicts, the timing of the major events during the 1994–1996 period.
5. Fine-tune the algorithm by repeating steps 3 and 4. Specifically, improve the indices by revising the LLM prompts to better capture the topics most relevant to China-Taiwan tensions, such as military activities, US engagement, economic relations, cultural exchange, and China's emphasis on reunification and the One China principle. The goal of this step is to optimize the fit of the indices to the timing of the major events.
6. Assess the model's performance by applying it to People's Daily articles in the 2022–2024 period, which covers major political events across the Taiwan Strait, including Taiwan's diplomatic activities and China's military escalations.
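The aggregation described above (a weekly sum followed by a four-week moving average) can be illustrated with a minimal sketch. This is not the repository's code: the function names, the Monday-based week boundary, the trailing (rather than centered) window, and the sample scores are all assumptions made for illustration.

```python
from collections import defaultdict
from datetime import date, timedelta


def week_start(d: date) -> date:
    """Return the Monday of the week containing d (assumed week boundary)."""
    return d - timedelta(days=d.weekday())


def weekly_sums(scores):
    """Sum article-level scores by week; weeks with no articles get 0."""
    totals = defaultdict(float)
    for pub_date, score in scores:
        totals[week_start(pub_date)] += score
    if not totals:
        return []
    week, last = min(totals), max(totals)
    out = []
    while week <= last:
        out.append((week, totals.get(week, 0.0)))
        week += timedelta(weeks=1)
    return out


def moving_average(series, window=4):
    """Trailing mean over `window` weeks; weeks before the window fills are skipped."""
    out = []
    for i in range(window - 1, len(series)):
        values = [v for _, v in series[i - window + 1 : i + 1]]
        out.append((series[i][0], sum(values) / window))
    return out


# Hypothetical article-level scores: (publication date, index value).
articles = [
    (date(1995, 7, 3), 1), (date(1995, 7, 5), 0),
    (date(1995, 7, 11), 1), (date(1995, 7, 12), 1),
    (date(1995, 7, 18), 1),
    (date(1995, 7, 24), 1), (date(1995, 7, 27), 1), (date(1995, 7, 28), 1),
    (date(1995, 8, 1), 0),
]
weekly = weekly_sums(articles)
smoothed = moving_average(weekly)
```

With these made-up scores, `weekly` holds five weekly sums and `smoothed` holds two averaged points, one for each week in which a full four-week window is available.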
This repository provides the code to implement steps 2-6 of the PCI-Tensions workflow. Due to copyright considerations, we do not provide the training data. However, the same workflow can be applied to text classification tasks with binary labels and temporal information, such as publication dates. Interested researchers can use their own data for replication.
Please follow these steps to replicate this study:
- Prepare the People's Daily data by following step 1 in the Introduction. Save the data as `input.csv` (as referenced in `01_Data_processing.py`).
- Run `01_Data_processing.py` to process the data.
- Run `02_LLM.py` with an active OpenAI API key (not included) to execute the LLM-based algorithm.
- Run `03_Analysis.ipynb` to visualize the results of the model.
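Before running the scripts, it may help to check that your own data file carries the metadata listed in the methodology (publication date, title, content, and page number). The sketch below is only an illustration: the column names `date`, `title`, `content`, and `page`, the ISO date format, and the `load_articles` helper are assumptions, so check `01_Data_processing.py` for the headers it actually expects.

```python
import csv
import io
from datetime import datetime

# Assumed column names; the headers expected by 01_Data_processing.py may differ.
REQUIRED_COLUMNS = ("date", "title", "content", "page")


def load_articles(handle):
    """Read article rows, checking that required columns exist and values parse."""
    reader = csv.DictReader(handle)
    missing = [c for c in REQUIRED_COLUMNS if c not in (reader.fieldnames or [])]
    if missing:
        raise ValueError(f"missing columns: {missing}")
    rows = []
    for row in reader:
        # Assumed ISO date format (YYYY-MM-DD).
        row["date"] = datetime.strptime(row["date"], "%Y-%m-%d").date()
        row["page"] = int(row["page"])
        rows.append(row)
    return rows


# In-memory stand-in for input.csv with placeholder content.
sample = io.StringIO(
    "date,title,content,page\n"
    "1995-07-03,Placeholder title,Placeholder text,1\n"
)
articles = load_articles(sample)
```

To check a real file, pass an open handle instead, for example `open("input.csv", newline="", encoding="utf-8")`.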
When citing the latest PCI-Tensions, please reference the website: https://policychangeindex.org.
For academic work, please cite the following research paper:
- Kaiwei Hsu and Weifeng Zhong. 2025. "Predicting Taiwan Strait Crises Using Propaganda: A New Open-Source Method." Mercatus Policy Brief.
Please stay tuned for more research products based on this algorithm.