
BSCode-Optimizer

BSCode-Optimizer is an advanced AI-driven tool designed for analyzing, optimizing, and fixing code inefficiencies and bugs. It leverages state-of-the-art generative models such as BSJCode-1-Stable, Code Llama, and Terjman-Ultra to provide robust support for code refinement, error detection, and multilingual explanations. This tool is ideal for developers, educators, and researchers looking to streamline their workflows and enhance code quality.


Features

  • Code Analysis: Identifies inefficiencies, bugs, and potential optimizations in code.
  • Code Optimization: Rewrites and optimizes code for better performance and readability.
  • Multilingual Support: Explains and translates code-related instructions in Moroccan Darija, English, and other languages.
  • Customizable Pipelines: Supports BSJCode-1-Stable for structured code optimization, Code Llama for code understanding, and Terjman-Ultra for multilingual translations.
  • Extensibility: Easily integrate into existing workflows via APIs or Hugging Face’s Transformers library.

Architecture Overview

The BSCode-Optimizer pipeline is built with multiple components to handle diverse requirements:

Global Architecture

This diagram illustrates the high-level workflow of BSCode-Optimizer, showing the interaction between components and the flow of data through the pipeline.

Model Architecture

This diagram provides a detailed view of the model layer, showing how BSJCode-1-Stable, Code Llama, and Terjman-Ultra interact within the pipeline.

Front-End Architecture

The front end of BSCode-Optimizer is designed to provide an intuitive user experience, enabling seamless interaction with the API and models.

BSCode Architecture

This diagram gives a technical overview of how the different modules of BSCode-Optimizer are connected and function together.


Installation

Prerequisites

  • Python 3.8+
  • CUDA-enabled GPU (optional for accelerated inference)
  • Hugging Face Transformers

Install Required Libraries

pip install transformers torch accelerate flask pyngrok

Clone the Repository

git clone https://github.com/your-repo/BSCode-Optimizer.git
cd BSCode-Optimizer

Usage

Using the Flask API

The BSCode-Optimizer includes a Flask API for seamless integration into workflows.

Start the Flask Server

python app.py
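
Expose the Server with ngrok (Optional)

pyngrok is included in the dependencies, which suggests the server can also be exposed through an ngrok tunnel when running in a hosted environment such as a notebook. A minimal sketch, assuming the server listens on Flask's default port 5000 (app.py may handle the tunnelling itself):

from pyngrok import ngrok

# Open an HTTP tunnel to the locally running Flask server.
# 5000 is Flask's default port; adjust if app.py binds elsewhere.
public_url = ngrok.connect(5000)
print(f"BSCode-Optimizer API exposed at: {public_url}")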

Example API Call

curl -X POST http://localhost:5000/predict \
-H "Content-Type: application/json" \
-d '{
    "input": "public int findMin(List<Integer> nums) { ... }",
    "service": "Pro"
}'

Response

{
    "output": "Optimized and fixed code output here."
}
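
Endpoint Sketch

The /predict endpoint receives the "input" code snippet and the "service" tier from the request body and returns the optimized code in the "output" field. app.py in the repository is the authoritative implementation; the following is only a minimal sketch of such an endpoint, built around a hypothetical generate_optimized_code helper that wraps the model pipeline shown under "Direct Model Interaction" below:

from flask import Flask, request, jsonify

app = Flask(__name__)

@app.route("/predict", methods=["POST"])
def predict():
    payload = request.get_json(force=True)
    code_snippet = payload.get("input", "")
    service = payload.get("service", "Pro")  # service tier from the request

    # generate_optimized_code is a hypothetical helper wrapping the
    # BSJCode-1-Stable generation code shown under "Direct Model Interaction".
    optimized = generate_optimized_code(code_snippet, service=service)
    return jsonify({"output": optimized})

if __name__ == "__main__":
    app.run(host="0.0.0.0", port=5000)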

Direct Model Interaction

You can also interact with the models directly using the following examples:

BSJCode-1-Stable Example

from transformers import AutoModelForCausalLM, AutoTokenizer

# Load the model and tokenizer
tokenizer = AutoTokenizer.from_pretrained("BSAtlas/BSJCode-1-Stable")
model = AutoModelForCausalLM.from_pretrained("BSAtlas/BSJCode-1-Stable")

# Input code snippet
code_snippet = "public int findMin(List<Integer> nums) { ... }"

# Tokenize input
tokens = tokenizer(code_snippet, return_tensors="pt")

# Generate optimized code (max_new_tokens caps the length of the rewritten snippet)
output = model.generate(**tokens, max_new_tokens=256)
optimized_code = tokenizer.decode(output[0], skip_special_tokens=True)
print(optimized_code)

Terjman-Ultra Translation

from transformers import AutoTokenizer, AutoModelForSeq2SeqLM

# Load the tokenizer and model
tokenizer = AutoTokenizer.from_pretrained("atlasia/Terjman-Ultra")
model = AutoModelForSeq2SeqLM.from_pretrained("atlasia/Terjman-Ultra")

# Translate code explanation to Darija
input_text = "Explain this code snippet in Moroccan Darija."
tokens = tokenizer(input_text, return_tensors="pt")
translation = model.generate(**tokens)
output_text = tokenizer.decode(translation[0], skip_special_tokens=True)
print(output_text)
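
Code Llama Example

Code Llama can be queried in the same way for code understanding and completion. A minimal sketch, assuming the public codellama/CodeLlama-7b-Instruct-hf checkpoint (the exact checkpoint used in the pipeline may differ):

from transformers import AutoModelForCausalLM, AutoTokenizer

# Load the tokenizer and model
tokenizer = AutoTokenizer.from_pretrained("codellama/CodeLlama-7b-Instruct-hf")
model = AutoModelForCausalLM.from_pretrained("codellama/CodeLlama-7b-Instruct-hf")

# Instruction-style prompt (Llama 2 [INST] format used by the Instruct variants)
prompt = "[INST] Explain what this Java method does: public int findMin(List<Integer> nums) { ... } [/INST]"

tokens = tokenizer(prompt, return_tensors="pt")
output = model.generate(**tokens, max_new_tokens=256)
print(tokenizer.decode(output[0], skip_special_tokens=True))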

Team and Contributors

The development of BSCode-Optimizer was a collaborative effort led by an expert team of AI researchers, software engineers, and linguists. Key contributors include:

Core Team

  • Marouane Asmih
  • Ahmed Elmessaoudi
  • Habbach Oualid
  • Lamrabet Chaymae

Technologies and Tools Used

  • Programming Languages: Python, Java
  • Frameworks: Flask, Hugging Face Transformers
  • Libraries: PyTorch, BitsAndBytes (see the quantized-loading sketch after this list)
  • Hardware: NVIDIA A100 GPUs for training and inference
  • Datasets: Custom-curated datasets for code and multilingual tasks
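
BitsAndBytes suggests the larger models can be loaded in quantized form to fit more comfortably in GPU memory. A minimal sketch of 4-bit loading with Transformers, assuming BSJCode-1-Stable as the target model (the repository may configure quantization differently):

import torch
from transformers import AutoModelForCausalLM, AutoTokenizer, BitsAndBytesConfig

# 4-bit quantization configuration (reduces GPU memory at a small accuracy cost)
quant_config = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_compute_dtype=torch.float16,
)

tokenizer = AutoTokenizer.from_pretrained("BSAtlas/BSJCode-1-Stable")
model = AutoModelForCausalLM.from_pretrained(
    "BSAtlas/BSJCode-1-Stable",
    quantization_config=quant_config,
    device_map="auto",  # requires the accelerate package installed above
)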

Model Details

BSJCode-1-Stable

  • Purpose: Java code optimization and error detection
  • Key Features:
    • Identifies bugs
    • Provides optimized solutions with Big-O complexity analysis
  • Training Dataset: Java-focused dataset with curated examples
  • Inference Time: <200ms per code snippet on A100 GPU

Code Llama (7B-Instruct)

  • Purpose: General-purpose code synthesis and understanding
  • Capabilities:
    • Code completion
    • Infilling
    • Instruction following
  • Training Period: January 2023 – July 2023

Terjman-Ultra

  • Purpose: Multilingual code explanations and translations
  • Capabilities:
    • Transliteration from Moroccan Darija (Arabizi)
    • Accurate Arabic-English translations
  • Performance:
    • Loss: 2.7070
    • BLEU: 4.6998
    • Gen Len: 35.6088 tokens

Training and Fine-Tuning

BSJCode-1-Stable

  • Dataset: Java-specific curated dataset
  • GPU: NVIDIA A100-40GB
  • Training Time: 32 hours

Terjman-Ultra

  • Base Model: facebook/nllb-200-1.3B
  • Dataset: darija_english dataset with curated corpora
  • Hyperparameters (see the training-arguments sketch after this list):
    • Learning Rate: 3e-05
    • Epochs: 25
    • Batch Size: 16
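
These hyperparameters map directly onto Hugging Face Seq2SeqTrainingArguments. A minimal sketch of the corresponding training configuration, assuming the standard Seq2SeqTrainer setup (the actual fine-tuning script may differ):

from transformers import Seq2SeqTrainingArguments

# Training configuration mirroring the hyperparameters listed above
training_args = Seq2SeqTrainingArguments(
    output_dir="terjman-ultra-finetuned",  # hypothetical output directory
    learning_rate=3e-5,
    num_train_epochs=25,
    per_device_train_batch_size=16,
    predict_with_generate=True,  # generate during evaluation to compute BLEU
)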

Limitations

  • BSJCode-1-Stable:
    • May not handle non-Java languages effectively.
  • Code Llama:
    • Accuracy can vary for large, complex codebases.
  • Terjman-Ultra:
    • Limited by the quality of the tokenizer for transliterations.

Ethical Considerations

  • Unpredictable Outputs: As with all LLMs, outputs may contain inaccuracies or undesirable content.
  • Responsible Use: Follow Meta’s Responsible Use Guide.
  • Privacy: Avoid using sensitive or proprietary code as input.

License

  • BSJCode-1-Stable: Custom commercial license
  • Code Llama: Meta’s custom commercial license
  • Terjman-Ultra: Available under Hugging Face model card terms

Feedback and Contributions

We welcome contributions, feedback, and suggestions to improve BSCode-Optimizer. Please open an issue or submit a pull request on the GitHub repository.


Acknowledgments

This project utilizes advanced language models and training techniques from the following:

  • Meta’s Code Llama
  • BSJCode-1-Stable by BSAtlas
  • Terjman-Ultra for multilingual capabilities

Special thanks to the global developer community and contributors who provided invaluable feedback and support during development.
