PhD codebase: Robust Neural Machine Translation of User-Generated Content

Welcome to my PhD Projects Organization 👋

This organization hosts all repositories related to my PhD research in AI and NLP, including robust NMT, lexical normalization, sentence embeddings, and data augmentation projects.

⚠️ Work in progress: still migrating repositories from my lab's private GitLab.

🎓 Read the full thesis here: Robust Neural Machine Translation of User-Generated Content.

For an overview of my personal projects, contributions, and pinned repositories, visit my personal GitHub: github.com/lydianish

📌 Pinned Projects

🔹 robust-embeddings

This repository contains the research code and experiments from my PhD work on making sentence embeddings robust to user-generated content (UGC). It includes the complete training pipelines for RoLASER and RoSONAR, covering synthetic UGC generation, teacher–student training, and evaluation on both natural and artificial non-standard text. It is ideal for researchers interested in UGC robustness, sentence embeddings, and multilingual NLP.
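For readers new to the teacher–student setup, the snippet below is a minimal, generic sketch of the idea. It is not the repository's code. It assumes a mean-squared-error objective that pulls the student's embedding of a non-standard sentence toward the teacher's embedding of the standard sentence. `add_char_noise` and `toy_encoder` are hypothetical stand-ins: in RoLASER, the teacher is LASER and the students are trained neural encoders.

```python
import random
from typing import Callable, List, Tuple

Vector = List[float]
Encoder = Callable[[str], Vector]


def add_char_noise(sentence: str, p: float = 0.1, seed: int = 0) -> str:
    """Hypothetical UGC-style perturbation: randomly drop or duplicate letters."""
    rng = random.Random(seed)
    out = []
    for ch in sentence:
        r = rng.random()
        if ch.isalpha() and r < p / 2:
            continue  # drop this letter
        out.append(ch)
        if ch.isalpha() and p / 2 <= r < p:
            out.append(ch)  # duplicate this letter
    return "".join(out)


def toy_encoder(sentence: str) -> Vector:
    """Hypothetical stand-in encoder: 26-dim bag-of-letters counts."""
    counts = [0.0] * 26
    for ch in sentence.lower():
        if "a" <= ch <= "z":
            counts[ord(ch) - ord("a")] += 1.0
    return counts


def mse(a: Vector, b: Vector) -> float:
    """Mean squared error between two equal-length vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b)) / len(a)


def distillation_loss(
    teacher: Encoder, student: Encoder, pairs: List[Tuple[str, str]]
) -> float:
    """Average MSE between teacher(standard) and student(non-standard)."""
    losses = [mse(teacher(std), student(ugc)) for std, ugc in pairs]
    return sum(losses) / len(losses)


if __name__ == "__main__":
    clean = "see you tomorrow"
    noisy = add_char_noise(clean, p=0.3, seed=1)
    print(noisy, distillation_loss(toy_encoder, toy_encoder, [(clean, noisy)]))
```

In real training, only the student's parameters would be updated to minimise this loss while the teacher stays frozen, so the student learns to map noisy inputs close to where the teacher places their clean counterparts.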

🔹 RoLASER

A demo-focused version of the RoLASER model for quick exploration. It provides pre-trained models, example scripts, and visualisations to understand how token-level and character-level student encoders align standard and non-standard sentences in the LASER embedding space. Perfect for testing and educational purposes.
