needllm

needllm assesses a repository and recommends the most suitable LLM based on context window sizes. It counts the tokens in the codebase and compares that count against each model's context window, so you can pick a model before pointing it at your code.

Features

Traverses the repository and counts tokens in relevant files.
Calculates the total token count for the entire repository or a specified path.
Compares the token count with different LLM context windows.
Provides recommendations based on the percentage of the repository that fits within each LLM's context window.
Supports caching of downloaded repositories for faster subsequent analyses.
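
The comparison step described above can be sketched as follows. This is a minimal illustration, not the code in llm_analyzer.py: the model names and window sizes in `CONTEXT_WINDOWS` are hypothetical placeholders, and the helper names `coverage_percent`, `recommend`, and `smallest_full_fit` are assumptions made for this example.

```python
from typing import Dict, Optional

# Hypothetical model names and context-window sizes (in tokens).
# These are placeholders for illustration, not real model limits.
CONTEXT_WINDOWS: Dict[str, int] = {
    "small-model": 8_000,
    "medium-model": 32_000,
    "large-model": 128_000,
}


def coverage_percent(total_tokens: int, window: int) -> float:
    """Percentage of the repository's tokens that fit in one context window."""
    if total_tokens <= 0:
        # An empty repository trivially fits in any window.
        return 100.0
    return min(window / total_tokens, 1.0) * 100.0


def recommend(total_tokens: int,
              windows: Dict[str, int] = CONTEXT_WINDOWS) -> Dict[str, float]:
    """Map each model name to the share of the repository it can hold."""
    return {
        name: round(coverage_percent(total_tokens, size), 1)
        for name, size in windows.items()
    }


def smallest_full_fit(total_tokens: int,
                      windows: Dict[str, int] = CONTEXT_WINDOWS) -> Optional[str]:
    """Name of the smallest window that holds the whole repository, if any."""
    fits = [(size, name) for name, size in windows.items() if size >= total_tokens]
    return min(fits)[1] if fits else None
```

With these placeholder sizes, a repository of 64,000 tokens is covered 12.5% by the smallest window, 50% by the middle one, and fully by the largest.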

Installation

Clone the repository:

git clone https://github.com/yourusername/needllm.git
cd needllm

Create and activate a virtual environment:

python3 -m venv venv
source venv/bin/activate  # On Windows use `venv\Scripts\activate`

Install the required dependencies:
```
pip install -r requirements.txt
```

Usage

Basic usage:

python llm_analyzer.py https://github.com/username/repo

Analyze a specific path within the repository:

python llm_analyzer.py https://github.com/username/repo --path src/

Include README and LICENSE files:

python llm_analyzer.py https://github.com/username/repo --include-readme-license

Include .git files:

python llm_analyzer.py https://github.com/username/repo --include-git-files

Use caching to speed up subsequent analyses:

python llm_analyzer.py https://github.com/username/repo --cache

Options

--path: Specify a path within the cloned directory to analyze. Only files within this path will be processed.
--include-readme-license: Include README and LICENSE files that have no file extension in the analysis.
--include-git-files: Include files under the .git folder in the analysis.
--cache: Use local cache for downloaded repositories to speed up subsequent analyses.
--cache-dir: Specify a custom directory to store cached repositories (default is ~/.llm_analyzer_cache).
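
For readers who want to see how these options fit together, here is a sketch of a command-line parser that accepts the same flags. It is an assumption about how such a parser could look, built with the standard `argparse` module; the function name `build_parser`, the positional name `repo_url`, and the help strings are invented for this example and may differ from what llm_analyzer.py actually does.

```python
import argparse
from pathlib import Path


def build_parser() -> argparse.ArgumentParser:
    """Build a parser that mirrors the options documented above (illustrative only)."""
    parser = argparse.ArgumentParser(
        description="Estimate which LLM context windows fit a repository."
    )
    # Positional argument name is an assumption for this sketch.
    parser.add_argument("repo_url", help="URL of the Git repository to analyze")
    parser.add_argument("--path", default=None,
                        help="Only analyze files under this path in the clone")
    parser.add_argument("--include-readme-license", action="store_true",
                        help="Also count README and LICENSE files without extensions")
    parser.add_argument("--include-git-files", action="store_true",
                        help="Also count files under the .git folder")
    parser.add_argument("--cache", action="store_true",
                        help="Reuse a locally cached copy of the repository")
    parser.add_argument("--cache-dir", type=Path,
                        default=Path.home() / ".llm_analyzer_cache",
                        help="Directory in which cached repositories are stored")
    return parser


if __name__ == "__main__":
    args = build_parser().parse_args()
    print(args)
```

Note that `argparse` exposes dashed flags with underscores, so `--include-git-files` is read as `args.include_git_files`.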

Considerations

File Types: The tool focuses on common code and documentation file types. You may need to adjust the file extensions based on your specific needs.
Tokenization: The tool uses the GPT-3.5 Turbo tokenizer. For more accurate results with specific models, you might want to use model-specific tokenizers.
Context Relevance: Remember that larger context windows don't always lead to better performance. The tool provides a simple coverage percentage, but you should also consider the nature of your tasks and the complexity of your codebase.
Performance: For very large repositories, you might want to implement multiprocessing to speed up the analysis.
Memory Usage: Be mindful of memory usage when processing large files or repositories.
Caching: The --cache option can significantly speed up repeated analyses of the same repository. However, it may use additional disk space to store the cached repositories.
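
The file-type and memory points above can be illustrated with a small traversal sketch. Everything here is an assumption made for the example: the extension set `CODE_EXTENSIONS`, the function names, and above all the whitespace-based `count_tokens_whitespace`, which is a stand-in and not the GPT-3.5 Turbo tokenizer the tool uses. A real tokenizer can be passed in through the `count_tokens` parameter.

```python
import os
from pathlib import Path
from typing import Callable, Iterator, Set

# Assumed set of extensions; adjust it to the file types you care about.
CODE_EXTENSIONS: Set[str] = {".py", ".js", ".ts", ".md", ".txt"}


def iter_relevant_files(root: Path,
                        extensions: Set[str] = CODE_EXTENSIONS,
                        include_git: bool = False) -> Iterator[Path]:
    """Yield files under root whose extension is in the allowed set."""
    for dirpath, dirnames, filenames in os.walk(root):
        if not include_git:
            # Prune .git in place so os.walk never descends into it.
            dirnames[:] = [d for d in dirnames if d != ".git"]
        for name in filenames:
            path = Path(dirpath) / name
            if path.suffix.lower() in extensions:
                yield path


def count_tokens_whitespace(text: str) -> int:
    """Stand-in tokenizer: counts whitespace-separated words, not model tokens."""
    return len(text.split())


def count_repo_tokens(root: Path,
                      count_tokens: Callable[[str], int] = count_tokens_whitespace) -> int:
    """Sum token counts over all relevant files, reading one file at a time."""
    total = 0
    for path in iter_relevant_files(root):
        # Only one file's text is held in memory at once.
        text = path.read_text(encoding="utf-8", errors="ignore")
        total += count_tokens(text)
    return total
```

Because each file is counted independently, the loop in `count_repo_tokens` is also the natural place to introduce multiprocessing for very large repositories.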

Example

python llm_analyzer.py https://github.com/bitcoin/bitcoin --path src/index --cache

This command analyzes only the src/index directory of the Bitcoin repository, uses caching for faster subsequent runs, and provides LLM recommendations based on the token count of files in that directory.

Deactivating the Virtual Environment

When you're done using the tool, you can deactivate the virtual environment:

deactivate

This will return you to your global Python environment.
