# LeCoNav
LeCoNav (Legacy Code Navigator) is an AI-powered platform designed to help developers understand, document, and interact with large, complex, and legacy codebases. By leveraging a Retrieval-Augmented Generation (RAG) pipeline, LeCoNav allows you to "chat" with your source code, ask complex questions, and receive context-aware answers, drastically reducing the time spent on code archaeology.
## 🎯 The Core Challenge: Semantic Code Understanding
LLMs are powerful, but they are not inherently experts in software architecture. When analyzing source code, simply treating it as a continuous stream of text leads to catastrophic context fragmentation. A naive approach, like splitting a file every N characters, will inevitably break the semantic integrity of the code.
| Naive Chunking (`RecursiveCharacterTextSplitter`) | ✅ Intelligent Chunking (Our Approach) |
| :--- | :--- |
| ❌ Breaks functions and classes mid-definition. | ✅ Preserves the integrity of logical code units (functions, classes, methods). |
| ❌ Separates comments and docstrings from their code. | ✅ Associates documentation and comments with their corresponding code block. |
| ❌ Creates chunks with little to no semantic meaning. | ✅ Creates chunks that represent a single, complete semantic concept. |
| ❌ Lacks crucial context (e.g., imports, parent class). | ✅ Enriches chunks with vital metadata: language, unit type, name, and line numbers. |
| **Result: Low-quality retrieval, incorrect answers.** | **Result: High-quality, precise context retrieval for accurate LLM responses.** |
Our core mission is to replace this naive method with a **syntax-aware parsing strategy**, ensuring each piece of vectorized code represents a complete, logical unit.
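The difference is easy to see in code. The pipeline itself targets LangChain's language-aware splitters (see the roadmap below), but as a minimal, stdlib-only illustration, a syntax-aware chunker for Python can be sketched with the `ast` module; function and field names here are illustrative, not the project's API:

```python
import ast

def chunk_python_source(source: str, path: str = "<memory>") -> list[dict]:
    """Split Python source into one chunk per top-level function or class,
    attaching the metadata the intelligent chunker needs for retrieval."""
    tree = ast.parse(source)
    lines = source.splitlines()
    chunks = []
    for node in tree.body:
        if isinstance(node, (ast.FunctionDef, ast.AsyncFunctionDef, ast.ClassDef)):
            unit_type = "class" if isinstance(node, ast.ClassDef) else "function"
            chunks.append({
                "language": "python",
                "unit_type": unit_type,
                "unit_name": node.name,
                "start_line": node.lineno,
                "end_line": node.end_lineno,
                # The chunk body is the complete logical unit, never a mid-function cut.
                "content": "\n".join(lines[node.lineno - 1 : node.end_lineno]),
                "source_file": path,
            })
    return chunks
```

Each resulting chunk is a complete semantic unit with its location metadata, which is exactly what naive character splitting cannot guarantee.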
## 🏗️ Architecture
LeCoNav is built on a modern, scalable microservices architecture designed for asynchronous processing of large codebases.
```mermaid
graph TD
    subgraph User Interaction
        A[Developer] -->|1. Upload .zip| B(FastAPI Web API)
    end
    subgraph Backend Infrastructure
        B -->|2. Enqueue Task| C{Redis Broker}
        C -->|3. Fetch Task| D[Celery Worker]
    end
    subgraph AI & Data Processing
        D -->|4. Split & Parse Code| E{Intelligent Chunker}
        E -->|5. Generate Embeddings| F(Ollama LLM)
        E -->|6. Store Metadata| G[(MongoDB)]
        F -->|7. Store Chunks & Vectors| H[(Weaviate Vector DB)]
    end
    style A fill:#cde4ff
    style B fill:#90caf9
    style D fill:#fff59d
```
## ✨ Key Features
* **Asynchronous Processing:** Upload entire project zip files and let the Celery workers handle the heavy lifting in the background without blocking the API.
* **Intelligent, Syntax-Aware Chunking:** Goes beyond simple text splitting to parse code into meaningful semantic units like functions and classes.
* **Extensible Language Support:** Designed from the ground up to support multiple programming languages with a clear strategy for adding more.
* **Rich Metadata:** Each code chunk is enriched with valuable metadata (language, unit type, name, location) for precise context retrieval.
* **Graceful Fallback:** Non-parseable files (like `.md` or `.properties`) are still indexed using a standard text splitter, ensuring no information is lost.
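The fallback path for non-parseable files can be as simple as a fixed-window character splitter with overlap. This is an illustrative stdlib sketch of that idea, not the project's actual splitter (the roadmap uses LangChain's splitters for this):

```python
def fallback_split(text: str, chunk_size: int = 500, overlap: int = 50) -> list[str]:
    """Plain character splitter with overlap for files that have no parser
    (e.g. .md, .properties): every character is still indexed, just without
    syntax-aware boundaries."""
    if chunk_size <= overlap:
        raise ValueError("chunk_size must be larger than overlap")
    step = chunk_size - overlap
    # The overlap keeps a sentence that straddles a boundary retrievable
    # from at least one chunk.
    return [text[i : i + chunk_size] for i in range(0, max(len(text) - overlap, 1), step)]
```

A file shorter than `chunk_size` comes back as a single chunk, so nothing is ever dropped.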
## 🚀 Getting Started
These instructions will get you a copy of the project up and running on your local machine for development and testing purposes.

### Prerequisites
- Docker & Docker Compose

### Installation
1. **Generate the Project Structure:** If you are starting from scratch, use the setup script to generate all necessary files and directories.
   ```bash
   chmod +x setup_project.sh
   ./setup_project.sh
   ```
2. **Build and Run the Services:** Use the management script to build the Docker images and launch all services (FastAPI, Celery, Redis).
   ```bash
   chmod +x manage.sh
   ./manage.sh rebuild
   ```
   To simply start the services if they are already built, use `./manage.sh start`.
3. **Verify the Services:**
   - The API is available at `http://localhost:8000`.
   - Redis is exposed at `localhost:6379`.
## Workflow & Usage
### 1. Upload a Project
Package your source code into a `.zip` file and upload it to the processing endpoint.
```bash
curl -X POST -F "file=@/path/to/your/project.zip" http://localhost:8000/api/v1/upload-and-process/
```

The API will immediately respond with a `task_id`:

```json
{
  "task_id": "a1b2c3d4-e5f6-7890-g1h2-i3j4k5l6m7n8",
  "message": "File upload successful. Processing has started."
}
```

### 2. Check Processing Status
Use the `task_id` to poll the status endpoint and retrieve the result once processing is complete.

```bash
curl http://localhost:8000/api/v1/tasks/status/a1b2c3d4-e5f6-7890-g1h2-i3j4k5l6m7n8
```
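Instead of polling by hand, the two calls above can be wrapped in a small stdlib client. The `status` field name and the Celery-style state strings are assumptions about the response schema, not a documented contract:

```python
import json
import time
import urllib.request

API_BASE = "http://localhost:8000/api/v1"  # from the curl examples above

def status_url(task_id: str) -> str:
    return f"{API_BASE}/tasks/status/{task_id}"

def wait_for_task(task_id: str, poll_seconds: float = 2.0, timeout: float = 600.0) -> dict:
    """Poll the status endpoint until the task reaches a terminal state
    or the timeout expires."""
    deadline = time.monotonic() + timeout
    while time.monotonic() < deadline:
        with urllib.request.urlopen(status_url(task_id)) as resp:
            payload = json.loads(resp.read())
        # Celery-style states: PENDING / STARTED / SUCCESS / FAILURE (assumed)
        if payload.get("status") in ("SUCCESS", "FAILURE"):
            return payload
        time.sleep(poll_seconds)
    raise TimeoutError(f"task {task_id} did not finish within {timeout}s")
```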
## 🗺️ Roadmap & TODO
This project is under active development. Our roadmap is focused on building a robust, intelligent, and user-friendly code analysis platform.
### Phase 1: Intelligent Parsing Core
- [ ] **Implement Syntax-Aware Chunking:**
- [ ] Replace `RecursiveCharacterTextSplitter` in the Celery worker with a new strategy based on `langchain.text_splitter.Language`.
- [ ] Add initial support for **Python** and **Java**.
- [ ] Implement the fallback mechanism for unsupported file types (`.properties`, `.xml`, `.md`, etc.).
- [ ] Implement the critical metadata enrichment for each chunk (`language`, `unit_type`, `unit_name`, `start_line`, `end_line`).
- [ ] **Support Additional File Types:**
- [ ] Add parsers for shell scripts (`.sh`, `.bat`).
- [ ] Add specific handling for configuration files, especially for **Spring Boot** (`.properties`, `.yml`) and **Gradle** (`.gradle`, `.kts`).
- [ ] **Implement Advanced Context Strategy:**
- [ ] Develop a mechanism to prepend relevant context (e.g., file-level imports, class docstrings) to function-level chunks.
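The context-prepending item in Phase 1 can be sketched with the stdlib `ast` module: collect the file-level imports and glue them onto a function-level chunk before embedding. The chunk shape here is hypothetical, mirroring the metadata fields listed above:

```python
import ast

def prepend_file_context(source: str, chunk: dict) -> str:
    """Prepend file-level imports to a function-level chunk so the embedding
    captures the names the function depends on. Assumes chunk has a
    'content' key holding the unit's source text."""
    tree = ast.parse(source)
    lines = source.splitlines()
    imports = [
        "\n".join(lines[node.lineno - 1 : node.end_lineno])
        for node in tree.body
        if isinstance(node, (ast.Import, ast.ImportFrom))
    ]
    header = "\n".join(imports)
    return f"{header}\n\n{chunk['content']}" if header else chunk["content"]
```

The same idea extends to class docstrings and parent-class names; the trade-off is a larger chunk against a more self-explanatory one.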
### Phase 2: RAG Pipeline & UI
- [ ] **Integrate Vector & Document Databases:**
- [ ] Connect the Celery worker to **Weaviate** to store code chunks and their embeddings.
- [ ] Connect to **MongoDB** to store project metadata and file information.
- [ ] **Build the RAG Chain:**
- [ ] Implement the full retrieval and generation logic using LangChain, Ollama, and Weaviate.
- [ ] **Develop a User Interface:**
- [ ] Create a web UI for managing projects and uploaded files.
- [ ] Implement a "Code Documentation Generator" feature.
- [ ] Build the core "Chat with your Code" interface to interact with the RAG pipeline.
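Independent of Weaviate and Ollama, the retrieve-then-generate core of the RAG chain reduces to a similarity search plus prompt assembly. A toy, stdlib-only sketch under those assumptions (real embeddings and the vector search itself would come from the model and the database):

```python
import math

def cosine(a: list[float], b: list[float]) -> float:
    """Cosine similarity between two embedding vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(x * x for x in b))
    return dot / (na * nb) if na and nb else 0.0

def retrieve(query_vec: list[float], indexed_chunks: list[tuple], k: int = 2) -> list[str]:
    """Return the k chunk texts whose embeddings are closest to the query.
    indexed_chunks: (embedding, chunk_text) pairs, as the vector DB would return."""
    ranked = sorted(indexed_chunks, key=lambda pair: cosine(query_vec, pair[0]), reverse=True)
    return [text for _, text in ranked[:k]]

def build_prompt(question: str, context_chunks: list[str]) -> str:
    """Assemble the grounded prompt that would be sent to the LLM."""
    context = "\n---\n".join(context_chunks)
    return f"Answer using only this code context:\n{context}\n\nQuestion: {question}"
```

The LangChain implementation replaces `retrieve` with a Weaviate retriever and `build_prompt` with a prompt template, but the data flow is the same.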
## 🤝 Contributing
Contributions are what make the open-source community such an amazing place to learn, inspire, and create. Any contributions you make are greatly appreciated. Please fork the repo and create a pull request.
## 📜 License
Distributed under the MIT License. See `LICENSE` for more information.