A sample web application for extracting text from images using Google Genkit and vision models served by Ollama. This project demonstrates how to integrate these technologies to build AI applications with local language models, optimized for developer laptops.
- Multiple Input Methods: Upload images via drag-and-drop, file selection, URL, or paste from clipboard
- Multiple Vision Models: Support for LLaVA (7B, 13B, 34B) and Gemma 3 (27B) vision models through Ollama
- Modern UI: Compact layout with side-by-side input/output, dark mode support
- Real-time Streaming: See results as they're generated with live updates
- Customizable Prompts: Pre-built templates for common use cases and custom prompt support
- Multiple Output Formats: View results as text, JSON, or markdown
- Export Options: Download or copy extracted text with one click
- Compact Image Preview: Zoom and rotate images without excessive scrolling
- Sticky Results Panel: Results stay visible while you adjust settings
- Smart Error Handling: Clear messages when Ollama is not running or models are not installed
- Node.js 18.x or later
- Ollama installed and running
- At least one vision-capable model installed (e.g., `llava:7b`)
```bash
# Install recommended LLaVA vision model (default)
ollama pull llava:7b    # Fast, good quality, 4.7GB (recommended)

# Alternative models
ollama pull llava:13b   # Better accuracy, 7.3GB
ollama pull llava:34b   # Best accuracy, 20GB
ollama pull gemma3:27b  # Google's large model, 17GB
```

Note: This sample has been tested with LLaVA and Gemma models. Other vision-capable models available in Ollama should work as well. The app will automatically detect all installed models that support vision capabilities.
- Clone this repository:

```bash
git clone <your-repo-url>
cd genkit-vision-nextjs
```

- Install dependencies:

```bash
npm install
```

- Start the development server:

```bash
npm run dev
```

- Open http://localhost:3000 in your browser
```
genkit-vision-nextjs/
├── app/
│   ├── api/
│   │   ├── extract-text/         # Genkit flow API endpoint
│   │   └── models/               # Model availability endpoint
│   ├── components/               # React components
│   │   ├── Header.tsx            # App header with theme toggle
│   │   ├── ImageUpload.tsx       # Image input component
│   │   ├── ImagePreview.tsx      # Image viewer with controls
│   │   ├── ModelSelector.tsx     # Model selection dropdown
│   │   ├── PromptInput.tsx       # Prompt customization
│   │   └── ExtractionResults.tsx # Results display
│   ├── hooks/                    # Custom React hooks
│   └── page.tsx                  # Main application page
├── lib/
│   ├── genkit/
│   │   ├── config.ts             # Genkit and model configurations
│   │   └── flows.ts              # Text extraction flow
│   └── utils.ts                  # Utility functions
└── public/                       # Static assets
```
This app uses the `genkitx-ollama` plugin to integrate with local vision models. The core logic is defined in a Genkit flow:

```ts
// lib/genkit/flows.ts
import { ai } from './config';
import { ollama } from 'genkitx-ollama';
export const extractTextFromImage = ai.defineFlow(
  {
    name: 'extractTextFromImage',
    // ... schema definitions
  },
  async (input) => {
    const response = await ai.generate({
      model: ollama.model(input.model),
      prompt: [
        { text: input.prompt },
        { media: { contentType: 'image/jpeg', url: `data:image/jpeg;base64,${input.imageBase64}` } },
      ],
    });
    return { extractedText: response.text };
  }
);
```
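The schema definitions are elided above. As a rough sketch of what they might contain (illustrative only; the actual schemas in `lib/genkit/flows.ts` may differ), the flow's input and output could be described with Genkit's bundled zod export:

```ts
// Illustrative sketch: plausible input/output schemas for the flow above.
import { z } from 'genkit';

const ExtractTextInputSchema = z.object({
  model: z.string(),        // Ollama model name, e.g. 'llava:7b'
  imageBase64: z.string(),  // base64-encoded image data (without the data: prefix)
  prompt: z.string(),       // extraction instructions for the model
});

const ExtractTextOutputSchema = z.object({
  extractedText: z.string(),
});

// These would be passed to ai.defineFlow as
// { name: 'extractTextFromImage', inputSchema: ExtractTextInputSchema, outputSchema: ExtractTextOutputSchema }
```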
The Genkit flow is exposed as a Next.js API route using the `appRoute` helper:

```ts
// app/api/extract-text/route.ts
import { appRoute } from '@genkit-ai/next';
import { extractTextFromImage } from '@/lib/genkit/flows';
export const POST = appRoute(extractTextFromImage);
```

The React frontend uses Genkit's client SDK for type-safe API calls with streaming:

```ts
import { streamFlow } from '@genkit-ai/next/client';
import type { extractTextFromImage } from '@/lib/genkit/flows';
// Use streamFlow for streaming responses
const { stream, output } = streamFlow<typeof extractTextFromImage>({
  url: '/api/extract-text',
  input: {
    model: selectedModel,
    imageBase64: base64Image,
    prompt: extractionPrompt,
  },
});

// Process streaming chunks
for await (const chunk of stream) {
  console.log(chunk);
}

// Get final result
const result = await output;
```
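The `base64Image` value passed above has to be produced from the uploaded file on the client. A minimal sketch of that conversion using the browser's FileReader API (the actual `ImageUpload` component may handle this differently):

```ts
// Illustrative sketch: convert a File from an <input type="file"> or drop event
// into the raw base64 string the flow expects (data URL prefix stripped).
function fileToBase64(file: File): Promise<string> {
  return new Promise((resolve, reject) => {
    const reader = new FileReader();
    reader.onload = () => {
      // reader.result is a data URL like "data:image/png;base64,AAAA..."
      const dataUrl = reader.result as string;
      resolve(dataUrl.split(',')[1]);
    };
    reader.onerror = () => reject(reader.error);
    reader.readAsDataURL(file);
  });
}

// Usage: const base64Image = await fileToBase64(selectedFile);
```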
Create a `.env.local` file:

```bash
# Ollama API endpoint (optional, defaults to http://localhost:11434)
OLLAMA_API_URL=http://localhost:11434
```

The app uses the `genkitx-ollama` plugin with dynamic model discovery:

```ts
// lib/genkit/config.ts
import { genkit } from 'genkit';
import { ollama } from 'genkitx-ollama';
export const ai = genkit({
  plugins: [
    ollama({
      serverAddress: process.env.OLLAMA_SERVER_ADDRESS || 'http://127.0.0.1:11434',
    }),
  ],
});
```

The app automatically discovers all vision-capable models installed in Ollama. No hardcoded model list is required!
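As an illustration of how such discovery can work (not necessarily the exact code behind the `app/api/models` endpoint), a route handler can query Ollama's `/api/tags` endpoint and filter the installed models; the list of vision-capable model families below is an assumption made for this sketch:

```ts
// app/api/models/route.ts — illustrative sketch, not the project's actual implementation
import { NextResponse } from 'next/server';

// Assumed heuristic: treat these model families as vision-capable.
const VISION_FAMILIES = ['llava', 'gemma3', 'bakllava', 'moondream'];

export async function GET() {
  const base = process.env.OLLAMA_SERVER_ADDRESS || 'http://127.0.0.1:11434';
  // Ollama's /api/tags endpoint lists all locally installed models.
  const res = await fetch(`${base}/api/tags`);
  if (!res.ok) {
    return NextResponse.json({ error: 'Ollama is not reachable' }, { status: 502 });
  }
  const { models } = (await res.json()) as { models: { name: string }[] };
  const visionModels = models
    .map((m) => m.name)
    .filter((name) => VISION_FAMILIES.some((family) => name.startsWith(family)));
  return NextResponse.json({ models: visionModels });
}
```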
- Upload an Image: Drag and drop, select a file, paste from clipboard, or provide a URL
- Select a Model: Choose from available vision models (green checkmark indicates installed models)
- Customize the Prompt: Use preset prompts or create your own
- Extract Text: Click the button to start extraction
- View Results: See extracted text with streaming updates
- Export: Copy to clipboard or download as text/JSON
For the best development experience, we recommend running the Next.js frontend and the Genkit runtime in separate terminal sessions. This allows you to see logs from both processes independently and ensures the Genkit Developer UI functions correctly.
- Start the Genkit Runtime:

  Open a terminal and run the following command to start the Genkit runtime with hot-reloading. This will also launch the Genkit Developer UI.

  ```bash
  npm run genkit:watch
  ```

  The Genkit Developer UI will be available at http://localhost:4000.

- Start the Frontend Application:

  In a second terminal, run the following command to start the Next.js development server.

  ```bash
  npm run dev
  ```

  Your application will be available at http://localhost:9002.
Note: It is important to run the Genkit runtime (genkit start) separately from the frontend development server (npm run dev). Attempting to run them together with a command like genkit start -- npm run dev can lead to connection issues with the Genkit Developer UI, as the Next.js server runs in its own process and may not expose the necessary hooks for the UI to connect to the runtime.
To create a standalone build of the application, run the following commands:

```bash
npm run build
npm start
```

If the app can't connect to Ollama:
- Ensure Ollama is running: `ollama serve`
- Check the API endpoint in your environment variables
- Verify models are installed: `ollama list`
If a model shows as unavailable:
- Install it with Ollama: `ollama pull model-name`
- Refresh the page to update model availability
- Ensure images are under 10MB
- Supported formats: PNG, JPG, JPEG, GIF, WebP
- For URLs, ensure CORS is enabled on the image server
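A minimal client-side check that enforces these limits before upload might look like the following sketch (illustrative only; the actual `ImageUpload` component may implement its own validation):

```ts
// Illustrative sketch: reject files that exceed the documented limits.
const MAX_SIZE_BYTES = 10 * 1024 * 1024; // 10MB
const ALLOWED_TYPES = ['image/png', 'image/jpeg', 'image/gif', 'image/webp'];

// Returns an error message, or null if the file is acceptable.
function validateImageFile(file: File): string | null {
  if (!ALLOWED_TYPES.includes(file.type)) {
    return `Unsupported format: ${file.type || 'unknown'}. Use PNG, JPG, JPEG, GIF, or WebP.`;
  }
  if (file.size > MAX_SIZE_BYTES) {
    return 'Image is larger than 10MB.';
  }
  return null;
}
```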
This project is provided as-is, and I do not plan to accept pull requests. Please feel free to fork the repository and make any changes you'd like.
This project is licensed under the MIT License - see the LICENSE file for details.
This software is provided "as is", without warranty of any kind, express or implied, including but not limited to the warranties of merchantability, fitness for a particular purpose and noninfringement. In no event shall the authors or copyright holders be liable for any claim, damages or other liability, whether in an action of contract, tort or otherwise, arising from, out of or in connection with the software or the use or other dealings in the software.
- Google Genkit for AI orchestration
- Ollama for local model serving
- Next.js for the web framework
- Tailwind CSS for styling
