This repository contains scripts for the two Python-based modelling projects (Projects 1 and 2), in which semantic models (Word2Vec) and recurrent neural networks (simple recurrent networks (SRNs) and long short-term memory (LSTM) models) were trained to simulate and interpret human reading.
Impact: Modelling how humans process and interpret language provides valuable insights into neuro-cognitive behaviour. This could help neuropsychologists better understand language processing impairments, such as those in aphasia or autism spectrum disorders. In practical applications, understanding human reading performance via modelling can improve language learning technologies (e.g., Duolingo, Babbel) by enabling adaptive learning systems tailored to real cognitive patterns.
- Word2Vec models effectively capture human-like word semantics, especially with larger hidden layers and smaller context windows, mirroring cognitive constraints like limited working memory
- LSTMs outperform SRNs in modelling human sensitivity to syntactic ambiguity, despite similar performance in capturing general language statistics (perplexity)
- Lower perplexity ≠ better cognitive modelling: statistical accuracy alone is insufficient for predicting human-like language behaviour
- Together, these findings highlight the importance of aligning computational models not just with linguistic data, but also with psycholinguistic phenomena
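The semantic-similarity comparisons underlying these findings can be illustrated with a minimal cosine-similarity sketch. The vectors below are hypothetical toy examples, not values from the trained models:

```python
import math

def cosine_similarity(u, v):
    """Cosine similarity between two word vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)

# Hypothetical 4-dimensional embeddings (real models use hundreds of dimensions).
vec_cat = [0.8, 0.1, 0.3, 0.0]
vec_dog = [0.7, 0.2, 0.4, 0.1]
vec_car = [0.0, 0.9, 0.1, 0.8]

# Semantically related words should score higher than unrelated ones,
# which is the basis for predicting semantic-priming effects.
assert cosine_similarity(vec_cat, vec_dog) > cosine_similarity(vec_cat, vec_car)
```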
Project 1: Word2Vec in capturing human word semantics (categorisation and semantic priming)
- Training: CBOW Word2Vec models on a large English corpus (ENCOW) with 16 billion words; hidden layer size and context window varied
- Testing: The models' ability to predict humans' semantics-based word processing (categorisation and semantic priming)
- Findings:
- Distributional word vectors from Word2Vec generally capture human word semantics well.
- The larger the hidden layer, the better a Word2Vec model predicts human word processing.
- However, models with a smaller context window tend to predict human behaviour better, suggesting a limited working-memory capacity in human word processing.
Project 2: SRN vs LSTM in characterising the statistical structure of language and syntactic ambiguity (garden-path sentences)
- Training: SRN vs LSTM models on a large English corpus with 8.7 billion words; training data size varied
- Testing: The models' ability to characterise language statistics (perplexity) and predict human performance (sensitivity to syntactic ambiguity)
- Findings:
- With large training sets, SRNs and LSTMs achieve almost equally low perplexity, meaning that both capture the statistical structure of language well.
- However, LSTMs show higher sensitivity to syntactically ambiguous sentences than SRNs do.
- Characterising linguistic statistics well therefore does not necessarily indicate good prediction of human sentence processing.
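The two measures used in Project 2 can be sketched in a few lines: perplexity summarises how well a model fits language statistics, while per-word surprisal is a common proxy for human processing difficulty at the disambiguating word of a garden-path sentence. The probabilities below are hypothetical, not model outputs:

```python
import math

def perplexity(probs):
    """Perplexity of a sentence given per-word probabilities:
    exp of the average negative log-probability."""
    return math.exp(-sum(math.log(p) for p in probs) / len(probs))

def surprisal(p):
    """Surprisal (in bits) of a single word: -log2(p)."""
    return -math.log2(p)

# Hypothetical per-word probabilities for the classic garden-path
# sentence "the horse raced past the barn fell".
probs = [0.20, 0.05, 0.08, 0.10, 0.25, 0.12, 0.001]

print(round(perplexity(probs), 1))
# A human-like model should assign high surprisal at the disambiguating
# word "fell", mirroring readers' garden-path difficulty.
print(round(surprisal(probs[-1]), 1))  # -log2(0.001) ≈ 10.0
```

Note that two models can yield similar sentence-level perplexity while differing sharply in where they place surprisal, which is exactly the dissociation the findings above describe.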