rao-search is a Python-based image gallery application that leverages Google Gemini AI to automatically generate captions, tags, and vector embeddings for images. It stores image metadata and embeddings in a PostgreSQL database with the pgvector extension, enabling semantic search capabilities on the image library.
- Automatically indexes images in a specified local directory.
- Automatically switches to fallback models if your default models aren't working.
- Uses Google Gemini AI for image analysis: captions, tags, and object detection.
- Stores image metadata, tags, and vector embeddings in PostgreSQL.
- Supports semantic search of images based on natural language queries.
- Handles GIFs by converting them into sprite sheets for analysis.
- Async and concurrent indexing with retry and exponential backoff.
- OAuth2 and API key authentication for Gemini API.
- Colorized logging to console and file.
- Exception handling for robust error management.
- Python 3.10+
- PostgreSQL with
pgvectorextension installed - Google Cloud credentials for Gemini API (OAuth client secrets or API key)
- Packages as specified in code imports (e.g.,
colorlog,pydantic,sqlalchemy[asyncio],google-auth,google-genai,pgvector,pillow,python-dotenv)
- Clone the repository.
- Create a Python virtual environment and activate it. Example: On Linux/MacOS you can type
python3 -m venv .venvthensource .venv/bin/activate - Install required packages, for example:
pip install -r requirements.txt- Create a
.envfile to configure the environment variables or modify the default values in theSettingsclass. - Place your Google OAuth client secret JSON in
.secrets/client_secret.jsonor set the API key in.env. You can change your secret and temp directories with cli arguments or your.env. - Ensure PostgreSQL is running and accessible with
pgvectorextension installed.
The application is configured via environment variables or CLI, with defaults defined in the Settings class:
PROJECT_ID: Google Cloud Project ID. (Mandatory if using OAUTH)LOCATION: Google Cloud location (defaultus-central1). (Mandatory if using OAUTH)CLIENT_SECRETS_FILE: Path to OAuth client secrets JSON. (Mandatory if using OAUTH). You can find instructions on how to make your client file here in the Rclone documentation.TOKEN_DIR: Directory for OAuth token storage.GOOGLE_API_KEY: API key for Gemini API (if using API auth).CRED_TYPE: Authentication type,"API"or"OAUTH".DB_URL: Async PostgreSQL URL withasyncpgdriver.AI_MODEL: Gemini AI model for image analysis.EMBEDDING_MODEL: Model for text embedding.VECTOR_DIMENSION: Dimensionality of embeddings (default 768).IMAGE_GALLERY: Directory containing images to index.TEMP_DIRECTORY: Temp directory for GIF conversions.LOG_DIR: Directory to save logs.SEMAPHORE_LIMIT: Number of concurrent indexing tasks.INDEX_TASK_DELAY: Delay in seconds between indexing tasks.MAX_RETRIES: Number of retries for Gemini API calls.IMAGE_EXTENSIONS: Allowed image file extensions. The default is what file types Gemini currently allows.NO_RETRY_DELAY: Set to True if you don't want tasks to wait before starting.
See the .env.sample for inspiration in your own config. The Settings class of the app.py will always have the most up to date selection of env and cli parementers.
Run the main application:
python rao_search.pyThe application will:
- Initialize logging and database.
- Authenticate with Google Gemini using OAuth or API key. API is default.
- Scan the image gallery directory for new or incomplete images.
- Index the images asynchronously, performing:
- Upload to Gemini API.
- AI analysis for captions, tags, objects.
- Generate embedding vectors.
- Store results in PostgreSQL.
- Perform a sample semantic search.
- Clean up temporary files.
Settingsclass manages env vars and defaults with Pydantic.- Supports CLI args and
.envfiles.
- Uses
colorlogfor color console logs. - Writes detailed debug logs to files with timestamped names.
- Custom exceptions for error handling:
RaoSearchError,PermanentAnalysisError,DatabaseError,ConfigurationError,FileProcessingError.
- SQLAlchemy declarative models:
User,Image,Tag,ImageTagXref,FeatureVector.
- Includes relations and constraints for robust data integrity.
ObjectDetectionandImageAnalysismodel Gemini AI response structure for validation.
- Retries async functions with exponential backoff on transient errors.
- Handles authentication (OAuth or API key).
- Wraps Gemini AI API calls for models, file upload, analysis, embedding, and file deletion.
- Error handling and model caching.
- Async PostgreSQL engine and session management.
- Methods for initialization, user creation, image fetching, vector search.
- Convert GIFs to sprite sheets for animation analysis.
- Identify MIME types.
- Cleanup temp directory.
- Process images: upload, analyze, embed, and store in DB.
- Concurrent task runner with semaphore and delay.
- Performs semantic search test and logs results.
- Graceful shutdown with error handling.
- The application assumes that the
pgvectorextension is enabled in PostgreSQL. Initialization script enables it if missing. - Handles OAuth authentication headlessly with
NO_BROWSERflag by prompting for a code. - Limits tags and objects per image for structured AI output.
- Designed for extensibility with retries and structured logging.
- Supports semantic similarity search using vector L2 distance.
- Free Tier, and therefore
OAUTHwill likely reach its rate limit and quota frequently. Consider a paid tier if needed. See Gemini documentation for latest rate limit information.
After indexing, a search for "Give me images that include cats." will display the closest matching images with captions and similarity scores.
(.venv) $ python app.py -q "Give me images that include cats."
2025-11-16 20:27:19 - INFO - --- Rao Search Application Starting ---
2025-11-16 20:27:19 - INFO - Connecting to database...
2025-11-16 20:27:20 - INFO - Database initialization complete.
2025-11-16 20:27:20 - INFO - Initializing Gemini client with API Key...
2025-11-16 20:27:20 - INFO - Temporary directory cleaned up.
2025-11-16 20:27:20 - INFO - Skipped 10 already indexed images.
2025-11-16 20:27:20 - INFO - Image gallery is fully indexed and up to date.
2025-11-16 20:27:20 - INFO -
==================================================
PERFORMING SEMANTIC SEARCH TEST
==================================================
2025-11-16 20:27:20 - INFO - Search Query: 'Give me images that include cats.'
2025-11-16 20:27:20 - INFO - Fetching available models from the API...
2025-11-16 20:27:20 - DEBUG - Found 40 AI models and 5 embedding models.
2025-11-16 20:27:20 - INFO - Generating embedding for 1 strings with model 'text-embedding-004'...
- File: pinterest_836051118369623033.jpg (ID: 3)
Caption: A white image displays five separate humorous short stories or jokes typed in black text.
Distance (L2): 0.9853
- File: pinterest_836051118369657507.jpg (ID: 1)
Caption: A classical painting of a woman sleeping in a chair, overlaid with a humorous text about naps and waking up twice in one day.
Distance (L2): 1.0093
- File: pinterest_836051118369605133.jpg (ID: 8)
Caption: A text-based meme expressing a feeling of being largely feral and unable to integrate back into society.
Distance (L2): 1.0106
2025-11-16 20:27:21 - INFO - --- Application Finished ---
- Investigate if I can somehow use OAUTH to allow the Gemini SDK to read directly from Google Drive/Photos
- Possibly adapt this script to allow generation from other models (maybe using OpenAI adapters)
- Maybe turn this into a proper cli package and then fork this to create a web frontend.
- Investigate making the cli prettier with Rich.
- Investigate if I need to normalize the Distance measurements. 😓