@lydianish-phd

PhD codebase: Robust Neural Machine Translation of User-Generated Content

Code and experiments from Lydia Nishimwe's PhD research conducted at Inria Paris (2021-2025).

Welcome to my PhD Projects Organization 👋

This organization hosts all repositories related to my PhD research in AI and NLP, including robust NMT, lexical normalization, sentence embeddings, and data augmentation projects.

⚠️ Work in progress: still migrating repositories from my lab's private GitLab.

🎓 Read the full thesis here: Robust Neural Machine Translation of User-Generated Content.

For an overview of my personal projects, contributions, and pinned repositories, visit my personal GitHub: github.com/lydianish

📌 Pinned Projects

🔹 robust-embeddings

This repository contains the research code and experiments from my PhD work on making sentence embeddings robust to user-generated content (UGC). It includes the full training pipelines for RoLASER and RoSONAR, covering synthetic UGC generation, teacher–student training, and evaluation on both natural and artificial non-standard text. It is intended for researchers interested in UGC robustness, sentence embeddings, and multilingual NLP.

🔹 RoLASER

A demo-focused version of the RoLASER model for quick exploration. It provides pre-trained models, example scripts, and visualisations that show how token-level and character-level student encoders align standard and non-standard sentences in the LASER embedding space. It is well suited to testing and educational use.
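To give a rough idea of what aligning standard and non-standard sentences in a shared embedding space means, here is a minimal sketch of a teacher–student objective. It assumes a mean-squared-error loss that pulls the student's embeddings of both the standard sentence and its UGC variant towards the teacher's embedding of the standard sentence. The function names and the toy 3-dimensional vectors are hypothetical, made up for illustration; this is not the code from the repositories, which should be consulted for the actual training setup.

```python
import math

def mse(u, v):
    """Mean squared error between two equal-length vectors."""
    if len(u) != len(v):
        raise ValueError("vectors must have the same length")
    return sum((a - b) ** 2 for a, b in zip(u, v)) / len(u)

def distillation_loss(teacher_std, student_std, student_ugc):
    """Assumed objective: both student embeddings are pulled towards
    the teacher's embedding of the standard sentence."""
    return mse(student_std, teacher_std) + mse(student_ugc, teacher_std)

def cosine_distance(u, v):
    """1 - cosine similarity; 0 means the two vectors point the same way."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return 1.0 - dot / norm

# Toy embeddings (hypothetical values, not model outputs).
teacher_std = [1.0, 0.0, 0.0]  # teacher on the standard sentence
student_std = [1.0, 0.0, 0.0]  # student on the standard sentence
student_ugc = [0.0, 1.0, 0.0]  # student on the UGC variant, not yet aligned

loss = distillation_loss(teacher_std, student_std, student_ugc)
gap = cosine_distance(teacher_std, student_ugc)
print(f"loss={loss:.4f}, cosine distance={gap:.4f}")
```

In this toy case the student already matches the teacher on the standard sentence, so the whole loss comes from the UGC variant. Training would aim to shrink that term, and the cosine distance with it.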

Pinned

  1. RoLASER (Public)

    A Robust LASER sentence encoder for English User-Generated Content

    Python

  2. robust-embeddings (Public)

    Robust sentence embeddings for user-generated content (UGC) with training and evaluation pipelines.

    Jupyter Notebook

