Local LLM Integration Example

This repository demonstrates how to integrate a Local Language Model into a Python application. While public LLM APIs (e.g., OpenAI, Cohere) are popular, they can be expensive or rate-limited. These limits are enough to reduce how much exploration a programmer does.

By experimenting with a locally hosted model, you can freely explore prompt engineering, data extraction, and more—without incurring API costs.

We're using a 2GB-sized model (phi-3.1-mini-128k-instruct) for simplicity, but you can also experiment with larger models (e.g., 20GB, like Cohere's Command R) for more advanced tasks. This local setup is enough to try out basic tasks such as structured data extraction from text.

Why a Local LLM?

Cost-Effective: No pay-per-call API fees.
Faster Iteration: Experiments run locally, no network latency.
Privacy: Your data never leaves your machine.

Sample Prompt

In this application, we will use the following sample prompt to extract email addresses from text:

You can find several other potential prompts in the SAMPLE-PROMPTS.md file.

Extracting Email Addresses

Prompt:

Extract all email addresses from the text below. Provide the emails in a JSON list.

Text:
"Hello David, please reach out to sarah.jones@example.com and support@mycompany.org. 
Also, don’t forget to CC marketing-team@website.co.uk."

Expected Output (Example):

[
  "sarah.jones@example.com",
  "support@mycompany.org",
  "marketing-team@website.co.uk"
]

Installation & Setup

1. Install LM Studio

Download LM Studio from lmstudio.ai.
Follow the installation instructions for your operating system.
Start LM Studio, and verify it's running on http://127.0.0.1:1234 (the default).

Tip: If LM Studio doesn't start on that port, check its Preferences > API settings.

2. Model Configuration

Search for a Model: In LM Studio, click Models → Add New Model (or something similar) to browse available models.
Install the phi-3.1-mini-128k-instruct Model: This is a ~2GB model that can handle simple tasks like data extraction. It's enough to demonstrate the flow without using too much GPU/CPU.
Load Your Model: In LM Studio, ensure the newly downloaded model is loaded and “running” (LM Studio usually shows a green check or “ready” status).
Check the API: Confirm that LM Studio's local server is running by visiting http://127.0.0.1:1234/v1/models. You should see a JSON list of models, including "phi-3.1-mini-128k-instruct".

3. Local Environment Setup

This repo uses pipenv for dependency management. Make sure you have Python 3.12 (or similar) installed.

Fork (or import/copy) this repository to your own GitHub account, keep the name local-llm-integration-example.
Clone this repository (e.g., via GitHub).
Install Dependencies:
```
pipenv install
```
This will install requests, openai, and loguru.
Run the Script:
```
pipenv run python main.py
```
- The script will:
  1. Check that LM Studio is up and that your requested model is available.
  2. Send a prompt to the model to extract data (e.g., email addresses).
  3. Print out the raw or parsed JSON response.

Suggestions for Experimentation

Change Log Level to DEBUG
In main.py, the line:
```
loguru.logger.add(sys.stderr, level="INFO")
```
sets the default log level. Change "INFO" to "DEBUG" if you'd like to see more detailed logs about request payloads and model responses.
Inspect the JSON Outputs
Each request's text output is attempted to be parsed as JSON via helper.extract_json(). Sometimes the model includes extra text or formatting around the JSON. If you want to pretty-print the final JSON, you can use tools like jsonformatter.org/json-pretty-print or Python's json.dumps(obj, indent=2) in your code.
Try Different Prompts
The included example asks for email extraction. You can experiment with other data-extraction tasks (like phone numbers, product details, or structured outputs). Adjust the prompt in main.py and see how the local model handles it.

Happy experimenting with your Local LLM! If you run into issues, make sure that:

LM Studio is running and the correct port (default: 1234) is used.
Your model is actually loaded in LM Studio and shows up in GET /v1/models.
You have enough system resources to run the model (2GB in memory usage or more, depending on your hardware).

Enjoy building your own offline GPT-like applications!

Name		Name	Last commit message	Last commit date
Latest commit History 5 Commits
.gitignore		.gitignore
LICENSE		LICENSE
Pipfile		Pipfile
Pipfile.lock		Pipfile.lock
README.md		README.md
SAMPLE-PROMPTS.md		SAMPLE-PROMPTS.md
helper.py		helper.py
main.py		main.py

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Repository files navigation

Local LLM Integration Example

Why a Local LLM?

Sample Prompt

Extracting Email Addresses

Installation & Setup

1. Install LM Studio

2. Model Configuration

3. Local Environment Setup

Suggestions for Experimentation

About

Uh oh!

Releases

Packages

Uh oh!

Contributors

Uh oh!

Languages

Folders and files

Latest commit

History

Repository files navigation

Local LLM Integration Example

Why a Local LLM?

Sample Prompt

Extracting Email Addresses

Installation & Setup

1. Install LM Studio

2. Model Configuration

3. Local Environment Setup

Suggestions for Experimentation

About

Resources

License

Uh oh!

Stars

Watchers

Forks

Releases

Packages 0

Uh oh!

Contributors

Uh oh!

Languages

Packages