diff --git a/REPOSITORY_OVERVIEW.md b/REPOSITORY_OVERVIEW.md new file mode 100644 index 00000000..300e21b8 --- /dev/null +++ b/REPOSITORY_OVERVIEW.md @@ -0,0 +1,544 @@ +# iSamples in a Box - Repository Overview + +## Table of Contents +1. [Project Purpose](#project-purpose) +2. [Architecture Overview](#architecture-overview) +3. [Key Components](#key-components) +4. [Getting Started](#getting-started) +5. [Data Flow](#data-flow) +6. [Key Scripts and Entry Points](#key-scripts-and-entry-points) +7. [API Endpoints](#api-endpoints) +8. [Development Workflow](#development-workflow) +9. [Testing](#testing) +10. [Deployment](#deployment) + +## Project Purpose + +**iSamples in a Box** (ISB) is a comprehensive Python-based system for aggregating, managing, and providing access to geological and environmental sample metadata from multiple authoritative sources. The system enables researchers and institutions to: + +- **Harvest** sample data from multiple repositories (SESAR, GEOME, Smithsonian, OpenContext) +- **Store** sample records in a PostgreSQL database with full metadata +- **Index** relationships and searchable metadata in Apache Solr for fast querying +- **Expose** data through a REST API using FastAPI +- **Browse** samples through a web UI +- **Mint** identifiers (DataCite DOIs) with ORCID authentication +- **Search** geospatially using H3 hexagon-based heatmaps + +**Current Version:** 0.5.1 +**License:** Apache 2.0 +**Python Version:** 3.11+ + +## Architecture Overview + +``` +┌─────────────────────────────────────────────────────────────┐ +│ Data Sources │ +│ SESAR │ GEOME │ Smithsonian │ OpenContext │ +└────┬────────┬───────────┬──────────────┬─────────────────────┘ + │ │ │ │ + │ Source Adapters (isb_lib/*_adapter.py) + │ │ │ │ + ▼ ▼ ▼ ▼ +┌────────────────────────────────────────────────────────────┐ +│ Metadata Transformers │ +│ (isamples_metadata/*Transformer.py) │ +└────────────────────────┬───────────────────────────────────┘ + │ + ▼ + ┌────────────────────────────────┐ + │ PostgreSQL Database │ + │ (SQLModel ORM - Thing model) │ + └────────────────┬───────────────┘ + │ + ▼ + ┌────────────────────────────────┐ + │ Apache Solr Index │ + │ (isb_core_records collection)│ + └────────────────┬───────────────┘ + │ + ▼ + ┌────────────────────────────────┐ + │ FastAPI Web Service │ + │ (isb_web/main.py) │ + │ - REST API │ + │ - Web UI (Jinja2 templates) │ + │ - Export Service │ + └────────────────────────────────┘ +``` + +## Key Components + +### 1. 
Core Library (`isb_lib/`) + +The heart of the system, containing business logic and utilities: + +- **`core.py`** (803 lines) - Core utilities including date parsing, validation, vocabulary management +- **Source Adapters**: + - `sesar_adapter.py` - SESAR (System for Earth Sample Registration) + - `geome_adapter.py` - GEOME (Genomic Observatories Metadatabase) + - `smithsonian_adapter.py` - Smithsonian Institution collections + - `opencontext_adapter.py` - Open Context archaeological data +- **`models/`** - SQLModel ORM definitions: + - `thing.py` - Core `Thing` model representing a sample + - `isb_core_record.py` - Extended metadata model + - `export_job.py` - Export job tracking + - `namespace.py` - Identifier namespaces +- **`identifiers/`** - Identifier minting (DataCite DOIs, N2T ARKs) +- **`vocabulary/`** - Controlled vocabulary management +- **`utilities/`** - Helper utilities (H3 geospatial, Solr transformations) +- **`sitemaps/`** - Sitemap generation for search engines +- **`authorization/`** - User authentication and authorization + +### 2. Web Service (`isb_web/`) + +FastAPI-based REST API and web interface: + +- **`main.py`** (931 lines) - Main FastAPI application with all routes +- **`sqlmodel_database.py`** (630 lines) - Database access object (DAO) +- **`isb_solr_query.py`** - Solr query builder and executor +- **`export.py`** - Data export service (CSV, JSONL) +- **`manage.py`** - User and identifier management +- **`auth.py`** - ORCID OAuth authentication +- **`templates/`** - Jinja2 HTML templates for web UI +- **`static/`** - CSS, JavaScript, controlled vocabulary JSON files + +### 3. Metadata Transformation (`isamples_metadata/`) + +Transforms source data to standardized iSamples schema: + +- **Transformers** for each source (SESAR, GEOME, OpenContext, Smithsonian) +- **Controlled vocabularies** for consistent categorization +- **Taxonomy mappings** for biological classifications + +### 4. Scripts (`scripts/`) + +CLI tools for data management (22+ scripts): + +**Main Entry Points:** +- `sesar_things.py` - Load and index SESAR samples +- `geome_things.py` - Load and index GEOME samples +- `opencontext_things.py` - Load OpenContext samples +- `smithsonian_things.py` - Load Smithsonian samples +- `isb_things.py` - General ISB operations + +**Utility Scripts:** +- `dump_thing_json.py` - Export Thing records as JSON +- `create_sql_lite_dump.py` - Create SQLite database dumps +- `load_isamples_vocabularies.py` - Load controlled vocabularies +- `migrations/` - Database migration utilities + +## Getting Started + +### Prerequisites + +- Python 3.11+ +- PostgreSQL +- Apache Solr 8.8+ +- Poetry (Python dependency management) + +### Quick Setup + +1. **Clone and setup Python environment:** +```bash +git clone git@github.com:isamplesorg/isamples_inabox.git +cd isamples_inabox +poetry install +``` + +2. **Setup PostgreSQL:** +```bash +psql postgres +CREATE DATABASE isb_1; +CREATE USER isb_writer WITH ENCRYPTED PASSWORD 'your_password'; +GRANT ALL PRIVILEGES ON DATABASE isb_1 TO isb_writer; +``` + +3. **Setup Solr:** +```bash +solr create -c isb_core_records +python scripts/solr_schema_init/create_isb_core_schema.py +``` + +4. **Create configuration file (`isb.cfg`):** +```ini +db_url = "postgresql+psycopg2://isb_writer:your_password@localhost/isb_1" +solr_url = "http://localhost:8983/solr/isb_core_records/" +max_records = 1000 +verbosity = "INFO" +``` + +5. 
**Load sample data:** +```bash +poetry run sesar_things --config isb.cfg load -m 5000 +poetry run sesar_things --config isb.cfg relations +``` + +6. **Start web service:** +```bash +python isb_web/main.py +# Navigate to http://localhost:8000/ +``` + +## Data Flow + +### 1. Ingestion Flow + +``` +Source API → Adapter → Transformer → PostgreSQL Thing Table + ↓ + Solr Index (for search) +``` + +**Example: Loading SESAR data** +```bash +poetry run sesar_things --config isb.cfg load -m 5000 +poetry run sesar_things --config isb.cfg relations +``` + +### 2. Query Flow + +``` +User/API Request → FastAPI (isb_web/main.py) + ↓ + Solr Query (for search/filter) + ↓ + PostgreSQL (for full record details) + ↓ + JSON Response +``` + +### 3. Export Flow + +``` +User → Export API (/export/create?q=...&format=CSV) + ↓ + Export Job Created (UUID returned) + ↓ + Background Worker queries Solr + ↓ + Results transformed (SolrResultTransformer) + ↓ + File written (/tmp/{uuid}.csv or .jsonl) + ↓ + User downloads via /export/download?uuid=... +``` + +## Key Scripts and Entry Points + +### Web Service + +```bash +# Start FastAPI server (dev mode) +python isb_web/main.py + +# Production deployment uses uvicorn: +uvicorn isb_web.main:app --host 0.0.0.0 --port 8000 +``` + +### Data Loading (via Poetry) + +```bash +# SESAR samples +poetry run sesar_things --config isb.cfg load -m 5000 +poetry run sesar_things --config isb.cfg relations + +# GEOME samples +poetry run geome_things --config isb.cfg load -m 5000 +poetry run geome_things --config isb.cfg relations + +# OpenContext samples +poetry run opencontext_things --config isb.cfg load + +# Smithsonian samples +poetry run smithsonian_things --config isb.cfg load +``` + +### Utility Scripts + +```bash +# Dump Thing records as JSON +python scripts/dump_thing_json.py -d -a SMITHSONIAN -c 1000 -p /output/path + +# Create SQLite dump +python scripts/create_sql_lite_dump.py --config isb.cfg -q "*:*" + +# Load controlled vocabularies +python scripts/load_isamples_vocabularies.py --config isb.cfg +``` + +## API Endpoints + +The FastAPI service provides multiple API categories: + +### Things API (`/thing`) +- `GET /thing/{identifier}` - Get a specific Thing by identifier +- `GET /thing` - Search Things with filtering + +### Solr API (`/solr`) +- `GET /solr/search` - Direct Solr query interface +- `GET /solr/select` - Solr select handler +- `GET /solr/heatmap` - Get H3 hexagon heatmap data + +### Export API (`/export`) - **Requires ORCID authentication** +- `GET /export/create?q=...&export_format=CSV|JSONL` - Create export job +- `GET /export/status?uuid=...` - Check export job status +- `GET /export/download?uuid=...` - Download completed export + +### Vocabularies API (`/vocabularies`) +- `GET /vocabularies` - List all controlled vocabularies +- `GET /vocabularies/{vocab_name}` - Get specific vocabulary + +### Management API (`/manage`) - **Requires authentication** +- `GET /manage/login` - ORCID OAuth login +- `POST /manage/identifiers` - Mint new identifiers + +### Metrics API (`/metrics`) +- `GET /metrics` - Prometheus-compatible metrics + +## Development Workflow + +### Code Quality Tools + +The project enforces code quality through: + +1. **flake8** - Linting (max complexity 10) +```bash +flake8 isb_lib isb_web scripts tests +``` + +2. **mypy** - Type checking +```bash +mypy isb_lib isb_web scripts +``` + +3. **black** - Code formatting (recommended) +```bash +black isb_lib isb_web scripts tests +``` + +4. 
**pytest** - Unit testing (71% coverage minimum required) +```bash +pytest --cov --cov-fail-under=71 +``` + +### Git Workflow + +- Main branch: `main` (production) +- Development branch: `develop` +- Feature branches: Create from `develop` +- Pull requests must pass CI/CD checks (GitHub Actions) + +### CI/CD + +GitHub Actions workflows: +- `.github/workflows/python-app.yml` - Unit tests + linting on every PR +- `.github/workflows/python-integration-test.yaml` - Integration tests + +## Testing + +### Unit Tests + +```bash +# Run all tests with coverage +pytest --cov --cov-fail-under=71 + +# Run specific test file +pytest tests/test_core.py + +# Run with verbose output +pytest -v +``` + +### Integration Tests + +Integration tests verify end-to-end functionality: + +```bash +# Run integration tests (requires running Solr + PostgreSQL) +pytest integration_tests/ +``` + +See `docs/indexing_integration_test.md` for details. + +## Deployment + +### Docker Deployment + +The project includes Docker support for containerized deployment: + +```bash +# Build Docker image +docker build -t isamples_inabox . + +# Run with docker-compose (includes PostgreSQL + Solr) +docker-compose up +``` + +### Production Considerations + +1. **Database**: Use managed PostgreSQL service (AWS RDS, Google Cloud SQL) +2. **Solr**: Run in SolrCloud mode with ZooKeeper for high availability +3. **Web Service**: Deploy behind reverse proxy (Nginx) with HTTPS +4. **Secrets**: Use environment variables for sensitive configuration +5. **Monitoring**: Enable Prometheus metrics endpoint (`/metrics`) + +### Environment Variables + +Key environment variables for production: + +```bash +db_url=postgresql+psycopg2://user:pass@host:5432/dbname +solr_url=http://solr-host:8983/solr/isb_core_records/ +ORCID_CLIENT_ID=your_orcid_client_id +ORCID_CLIENT_SECRET=your_orcid_secret +ORCID_ISSUER=https://orcid.org +orcid_superusers=0000-0001-2345-6789,0000-0002-3456-7890 +``` + +## Data Model + +### Core Entity: Thing + +The `Thing` model (in `isb_lib/models/thing.py`) represents a sample: + +**Key Fields:** +- `id` - Globally unique identifier (format: `scheme:value`) +- `authority_id` - Source authority (SESAR, GEOME, etc.) 
+- `resolved_content` - Full JSON metadata from source +- `resolved_status` - HTTP status of last fetch +- `item_type` - Type of sample +- `tcreated` - Creation timestamp +- `tstamp` - Last update timestamp +- Plus 30+ additional metadata fields + +### ISBCoreRecord + +Extended metadata following iSamples Core schema: +- Sample identifiers and labels +- Geospatial information (lat/lon, elevation, H3 hexagons) +- Sampling context (site, purpose, method) +- Material and specimen classifications +- Curation information +- Related resources + +## Documentation + +Additional documentation in `docs/`: + +- `python_setup.md.html` - Python environment setup +- `authentication_and_identifiers.md` - ORCID OAuth and DOI minting +- `export_service.md` - Export API usage +- `SOLR_Performance_Testing.md` - Performance benchmarking +- `sitemaps_and_transport.md` - Sitemap generation +- `hypothesis_integration.md` - Web annotation integration +- `flat_file_import.md` - CSV import procedures + +## Support and Contributing + +- **Issues**: Report bugs at https://github.com/isamplesorg/isamples_inabox/issues +- **Contributing**: Submit pull requests to `develop` branch +- **License**: Apache 2.0 + +## Related Repositories + +### iSamples Export Client +- **Repository**: https://github.com/isamplesorg/export_client +- **Purpose**: CLI tool for exporting iSamples data with GeoParquet support +- **Key Features**: + - Export to GeoParquet, CSV, and JSONL formats + - STAC metadata generation + - Local web server for viewing exports + - ORCID authentication integration +- **Installation**: `pipx install "git+https://github.com/isamplesorg/export_client.git"` +- **Documentation**: [docs/geoparquet_export_code.md](docs/geoparquet_export_code.md) + +### Other iSamples Repositories +- **Metadata Schemas**: https://github.com/isamplesorg/metadata - Core metadata specifications +- **Vocabularies**: https://github.com/isamplesorg/vocabularies - Controlled vocabularies +- **PQG (Property Graph)**: https://github.com/isamplesorg/pqg - Property graph in DuckDB + +## Common Tasks + +### Add a new sample source + +1. Create adapter in `isb_lib/` (e.g., `newsource_adapter.py`) +2. Create transformer in `isamples_metadata/` (e.g., `NewSourceTransformer.py`) +3. Create CLI script in `scripts/` (e.g., `newsource_things.py`) +4. Add entry point to `pyproject.toml` +5. 
Update documentation + +### Export data + +#### Option 1: Using the Export Client (Recommended for GeoParquet) + +The **iSamples Export Client** (https://github.com/isamplesorg/export_client) provides a CLI tool with GeoParquet support: + +```bash +# Install export client +pipx install "git+https://github.com/isamplesorg/export_client.git" + +# Login to get JWT +isample login + +# Export to GeoParquet format +export TOKEN="your_jwt_token" +isample export -j $TOKEN -f geoparquet -d /output -q 'source:SESAR' + +# Also supports CSV and JSONL +isample export -j $TOKEN -f csv -d /output -q 'keywords:geology' +``` + +**Export Client Features:** +- **Formats**: JSONL, CSV, and **GeoParquet** (not available via server API) +- **STAC Metadata**: Automatically generates STAC catalog +- **Local Viewer**: Built-in web server to browse exports +- **See**: [docs/geoparquet_export_code.md](docs/geoparquet_export_code.md) for implementation details + +#### Option 2: Direct API Access (CSV/JSONL only) + +```bash +# Via API (requires ORCID authentication) +curl -H "Authorization: Bearer " \ + "https://central.isample.xyz/isamples_central/export/create?q=source:SESAR&export_format=jsonl" + +# Returns: {"status":"created","uuid":"..."} + +# Check status +curl -H "Authorization: Bearer " \ + "https://central.isample.xyz/isamples_central/export/status?uuid=..." + +# Download when complete +curl -H "Authorization: Bearer " \ + "https://central.isample.xyz/isamples_central/export/download?uuid=..." +``` + +### Query samples + +```bash +# Search via Solr API +curl "http://localhost:8000/solr/search?q=keywords:geology&rows=10" + +# Get specific Thing +curl "http://localhost:8000/thing/igsn:XXXXX" + +# Get geospatial heatmap +curl "http://localhost:8000/solr/heatmap?q=*:*&h3_resolution=4" +``` + +## Technology Stack Summary + +- **Language**: Python 3.11+ +- **Web Framework**: FastAPI 0.104.0 + Uvicorn +- **Database**: PostgreSQL (SQLAlchemy/SQLModel ORM) +- **Search**: Apache Solr 8.8+ +- **Authentication**: OAuth2 (ORCID), JWT +- **Geospatial**: Shapely, H3, GeoJSON +- **Data Processing**: PETL, Pandas +- **Testing**: pytest (71% coverage minimum) +- **Dependency Management**: Poetry +- **Code Quality**: flake8, mypy, black + +--- + +**Last Updated**: 2025-11-14 +**Project Repository**: https://github.com/isamplesorg/isamples_inabox diff --git a/docs/geoparquet_export_code.md b/docs/geoparquet_export_code.md new file mode 100644 index 00000000..936ce2a7 --- /dev/null +++ b/docs/geoparquet_export_code.md @@ -0,0 +1,381 @@ +# GeoParquet Export Code - Location and Implementation + +## Summary + +The code that generates the iSamples GeoParquet export file is located in a **separate repository**: + +**Repository**: https://github.com/rdhyee/export_client (also at https://github.com/isamplesorg/export_client) + +## Export Client Overview + +The `export_client` is a Python CLI tool (`isample`) that retrieves content from the iSamples Export Service and provides GeoParquet conversion capabilities. 
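+
+The end product is a standard GeoParquet file, so it can be opened with ordinary geospatial tooling. A minimal sketch for inspecting an export (using the Zenodo file referenced later in this document; assumes geopandas is installed):
+
+```python
+import geopandas as gpd
+
+# Read the GeoParquet export and confirm the spatial metadata
+gdf = gpd.read_parquet("isamples_export_2025_04_21_16_23_46_geo.parquet")
+print(len(gdf), "samples")
+print(gdf.crs)              # expected: EPSG:4326 (WGS84)
+print(gdf.geometry.head())  # Point geometries built from sample locations
+```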
+ +### Key Features + +- **CLI Tool**: `isample` command-line interface +- **Authentication**: ORCID OAuth with JWT tokens +- **Export Formats**: JSONL, CSV, and **GeoParquet** +- **STAC Support**: Generates STAC (SpatioTemporal Asset Catalog) metadata +- **Local Server**: Can run a web server to view exported data + +### Installation + +```bash +# Install with pipx +pipx install "git+https://github.com/isamplesorg/export_client.git" + +# Or with Poetry +git clone https://github.com/isamplesorg/export_client.git +cd export_client +poetry install +``` + +## GeoParquet Export Implementation + +### Architecture + +The GeoParquet export follows this workflow: + +``` +1. User runs: isample export -f geoparquet -q "source:SMITHSONIAN" -d /output + ↓ +2. Export Client requests JSONL format from iSamples server + ↓ +3. Server returns JSONL file (one JSON object per line) + ↓ +4. Export Client downloads JSONL file + ↓ +5. Export Client converts JSONL → GeoParquet + ↓ +6. Output: isamples_export_YYYY_MM_DD_HH_MM_SS_geo.parquet +``` + +### Core Code: `geoparquet_utilities.py` + +Location: `isamples_export_client/geoparquet_utilities.py` + +```python +import logging +import os.path + + +def write_geoparquet_from_json_lines(filename: str) -> str: + import pandas as pd + import geopandas as gpd + + logging.info(f"Transforming json lines file at {filename} to geoparquet") + filename_no_extension = os.path.splitext(filename)[0] + + # 1. Read JSONL file with pandas + with open(filename, "r") as json_file: + df = pd.read_json(json_file, lines=True) + + # 2. Extract longitude/latitude from nested "produced_by" field + normalized_produced_by = pd.json_normalize(df["produced_by"]) + df["sample_location_longitude"] = normalized_produced_by["sampling_site.sample_location.longitude"] + df["sample_location_latitude"] = normalized_produced_by["sampling_site.sample_location.latitude"] + + # 3. Create GeoDataFrame with Point geometries + gdf = gpd.GeoDataFrame( + df, + geometry=gpd.points_from_xy( + df.sample_location_longitude, + df.sample_location_latitude + ), + crs="EPSG:4326" # WGS84 coordinate reference system + ) + + # 4. Export to GeoParquet + dest_file = f"{filename_no_extension}_geo.parquet" + gdf.to_parquet(dest_file) + logging.info(f"Wrote geoparquet file to {dest_file}") + return dest_file +``` + +### Key Implementation Details + +1. **Data Source**: Reads from JSONL (JSON Lines) format + - Each line is a complete JSON object representing a sample + - Schema follows iSamples Core metadata specification + +2. **Coordinate Extraction**: + - Uses `pd.json_normalize()` to flatten nested `produced_by` structure + - Extracts: `produced_by.sampling_site.sample_location.longitude` + - Extracts: `produced_by.sampling_site.sample_location.latitude` + +3. **Geometry Creation**: + - Uses `gpd.points_from_xy()` to create Point geometries + - Stores as GeoDataFrame with proper geometry column + +4. **Coordinate Reference System**: + - **CRS**: EPSG:4326 (WGS84) + - Standard geographic coordinate system (latitude/longitude in degrees) + +5. **Output Format**: + - GeoParquet: Apache Parquet with GeoParquet spatial extension + - Filename pattern: `{original_name}_geo.parquet` + +### Integration in Export Client + +Location: `isamples_export_client/export_client.py` (lines 96-101, 452-453) + +```python +class ExportClient: + def __init__(self, ..., format: str, ...): + # When user requests geoparquet format... 
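+        # The server-side export service only produces CSV and JSONL, so a
+        # "geoparquet" request is fulfilled by downloading JSONL and converting
+        # it locally once the download completes.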
+ if format == "geoparquet": + self._format = "jsonl" # Request JSONL from server + self.is_geoparquet = True + else: + self._format = format + self.is_geoparquet = False + + def perform_full_download(self): + # ... download JSONL file ... + filename = self.download(uuid) + + # Convert to GeoParquet if requested + parquet_filename = None + if self.is_geoparquet: + parquet_filename = write_geoparquet_from_json_lines(filename) +``` + +## Dependencies + +From `pyproject.toml`: + +```toml +[tool.poetry.dependencies] +python = "^3.11" +pandas = "^2.2.2" +geopandas = "^0.14.4" +geoarrow-pyarrow = "^0.1.2" +geoarrow-pandas = "^0.1.1" +duckdb = "^0.10.2" +``` + +Key libraries: +- **pandas** 2.2.2+ - Data manipulation +- **geopandas** 0.14.4+ - Geographic data operations +- **geoarrow-pyarrow** 0.1.2+ - Arrow/Parquet geographic data +- **duckdb** 0.10.2+ - For querying exported data + +## Usage Example + +### Command Line + +```bash +# 1. Login to get JWT token +isample login +# Browser opens for ORCID authentication +# Copy the JWT token + +# 2. Export to GeoParquet +export TOKEN="your_jwt_token_here" + +isample export \ + -j $TOKEN \ + -f geoparquet \ + -d /output/directory \ + -q 'source:SMITHSONIAN' +``` + +### What Gets Created + +The export creates a directory structure like: + +``` +/output/directory/ +└── 2025_04_21_16_23_46/ + ├── isamples_export_2025_04_21_16_23_46.jsonl # Original JSONL + ├── isamples_export_2025_04_21_16_23_46_geo.parquet # GeoParquet! + ├── manifest.json # Export metadata + └── stac.json # STAC metadata +``` + +### Output File Details + +**GeoParquet File**: `isamples_export_2025_04_21_16_23_46_geo.parquet` + +This file contains: +- All sample metadata fields from iSamples Core schema +- A `geometry` column with Point geometries +- Coordinate columns: `sample_location_latitude`, `sample_location_longitude` +- Full nested JSON structures preserved (produced_by, curation, etc.) +- Efficient columnar storage (Parquet format) +- Geographic metadata (GeoParquet specification) + +## Zenodo Export File + +The file available at https://zenodo.org/records/15278211/files/isamples_export_2025_04_21_16_23_46_geo.parquet +was created using this exact process: + +```bash +# Likely command used: +isample export \ + -j $TOKEN \ + -f geoparquet \ + -d /tmp \ + -q '*:*' # Export all records +``` + +## Data Schema + +### Input JSONL Schema (iSamples Core) + +Each line in the JSONL file contains a sample record like: + +```json +{ + "sample_identifier": "IGSN:BSU0005H1", + "@id": "https://isample.org/thing/BSU0005H1", + "label": "BJJ-4487", + "description": "...", + "source_collection": "SESAR", + "has_specimen_category": [...], + "has_material_category": [...], + "has_context_category": [...], + "keywords": [...], + "produced_by": { + "identifier": "event_id", + "label": "Event label", + "result_time": "2019-09-10T03:41:45Z", + "sampling_site": { + "label": "Site name", + "place_name": ["Arizona", "USA"], + "sample_location": { + "latitude": 31.8854, + "longitude": -110.7733, + "elevation": 1200.0 + } + }, + "responsibility": [...] + }, + "curation": {...}, + "registrant": {...} +} +``` + +### Output GeoParquet Schema + +The GeoParquet file has: + +1. **All original JSONL fields** (preserved as-is) +2. **Additional extracted fields**: + - `sample_location_latitude` (float64) + - `sample_location_longitude` (float64) +3. **Geometry column**: + - Name: `geometry` + - Type: Point (2D) + - CRS: EPSG:4326 + +## Why This Architecture? 
+ +The design choice to keep GeoParquet conversion **client-side** has several benefits: + +1. **Server Simplicity**: iSamples server only needs to support JSONL and CSV +2. **Flexibility**: Client can add new formats without server changes +3. **Bandwidth**: JSONL is more compact than GeoParquet for transmission +4. **Local Control**: Users can customize conversion if needed +5. **STAC Integration**: Client generates STAC metadata alongside GeoParquet + +## Comparison with isamples_inabox Export Service + +### isamples_inabox (Server) +- **Location**: `isb_web/export.py`, `isb_lib/utilities/solr_result_transformer.py` +- **Formats**: CSV, JSONL only +- **Architecture**: Server-side transformation +- **Output**: File available via API endpoint +- **Dependencies**: petl, no geographic libraries + +### export_client (Client) +- **Location**: `isamples_export_client/geoparquet_utilities.py` +- **Formats**: CSV, JSONL, GeoParquet +- **Architecture**: Client-side transformation (JSONL → GeoParquet) +- **Output**: Local file with STAC metadata +- **Dependencies**: pandas, geopandas, geoarrow + +## Extending the Export + +### To Add GeoParquet Support to isamples_inabox Server + +If you wanted to add native GeoParquet support to the server, you would: + +1. **Add dependencies** to `requirements.txt`: + ``` + geopandas>=0.14.4 + pyarrow>=10.0.0 + ``` + +2. **Update `TargetExportFormat` enum** in `isb_lib/utilities/solr_result_transformer.py`: + ```python + class TargetExportFormat(Enum): + CSV = "CSV" + JSONL = "JSONL" + GEOPARQUET = "GEOPARQUET" # Add this + ``` + +3. **Create `GeoParquetExportTransformer`** class: + ```python + class GeoParquetExportTransformer(AbstractExportTransformer): + @staticmethod + def transform(table: Table, dest_path_no_extension: str, append: bool) -> list[str]: + import pandas as pd + import geopandas as gpd + + # Convert petl table to pandas DataFrame + df = pd.DataFrame(table.dicts()) + + # Extract coordinates + lat = df[SOLR_PRODUCED_BY_SAMPLING_SITE_LOCATION_LATITUDE] + lon = df[SOLR_PRODUCED_BY_SAMPLING_SITE_LOCATION_LONGITUDE] + + # Create GeoDataFrame + gdf = gpd.GeoDataFrame( + df, + geometry=gpd.points_from_xy(lon, lat), + crs="EPSG:4326" + ) + + # Export + dest_path = f"{dest_path_no_extension}.parquet" + gdf.to_parquet(dest_path) + return [dest_path] + ``` + +However, the current client-side approach is probably better for the reasons listed above. 
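+
+Whichever path produces the file, the exported GeoParquet can be queried directly. DuckDB (already listed above as an export-client dependency for querying exported data) reads Parquet natively; a minimal sketch, assuming the Zenodo export file has been downloaded locally:
+
+```python
+import duckdb
+
+con = duckdb.connect()
+# Count samples per source collection straight from the Parquet file
+print(con.execute("""
+    SELECT source_collection, COUNT(*) AS n
+    FROM read_parquet('isamples_export_2025_04_21_16_23_46_geo.parquet')
+    GROUP BY source_collection
+    ORDER BY n DESC
+""").df())
+```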
+ +## Additional Resources + +- **Export Client Repository**: https://github.com/isamplesorg/export_client +- **Export Client Documentation**: https://github.com/isamplesorg/export_client/blob/main/README.md +- **iSamples Export Service Docs**: https://github.com/isamplesorg/isamples_inabox/blob/develop/docs/export_service.md +- **GeoParquet Specification**: https://geoparquet.org/ +- **iSamples Core Schema**: https://github.com/isamplesorg/metadata + +## Testing the Export Code + +```bash +# Clone the export_client repository +git clone https://github.com/isamplesorg/export_client.git +cd export_client + +# Install dependencies +poetry install + +# Run tests +poetry run pytest + +# Test GeoParquet conversion directly +poetry run python -c " +from isamples_export_client.geoparquet_utilities import write_geoparquet_from_json_lines +result = write_geoparquet_from_json_lines('test_data.jsonl') +print(f'Created: {result}') +" +``` + +--- + +**Document Updated**: 2025-11-14 +**Export Client Version**: 0.2.2 +**Repository**: https://github.com/rdhyee/export_client diff --git a/docs/geoparquet_export_findings.md b/docs/geoparquet_export_findings.md new file mode 100644 index 00000000..284e2afb --- /dev/null +++ b/docs/geoparquet_export_findings.md @@ -0,0 +1,214 @@ +# GeoParquet Export Code - Investigation Findings + +## Summary + +An investigation was conducted to locate the code that generates the iSamples GeoParquet export file available at: +https://zenodo.org/records/15278211/files/isamples_export_2025_04_21_16_23_46_geo.parquet + +## Investigation Results + +**🎉 UPDATE: CODE FOUND!** + +The GeoParquet export code is located in a **separate repository**: + +**Repository**: https://github.com/rdhyee/export_client (also at https://github.com/isamplesorg/export_client) + +**See**: [geoparquet_export_code.md](./geoparquet_export_code.md) for complete documentation of the export implementation. + +--- + +## Original Investigation + +**Initial Status**: The GeoParquet export code was **NOT FOUND** in the current `isamples_inabox` repository. + +## Search Methods Used + +1. **Pattern Matching**: Searched for keywords including: + - `geoparquet`, `GeoParquet`, `geo.parquet` + - `parquet`, `Parquet`, `PARQUET` + - `pyarrow`, `arrow`, `geopandas`, `gpd.to_parquet` + - `to_parquet` (the typical method for writing parquet files) + +2. **File Inspection**: Examined key files: + - `isb_web/export.py` - Main export service (only supports CSV and JSONL) + - `isb_lib/utilities/solr_result_transformer.py` - Export transformers (only CSV and JSONL) + - All scripts in `scripts/` directory + - Jupyter notebooks in `notes/` directory + +3. **Git History**: Searched commit history for export-related changes + +4. **Dependency Analysis**: Checked for parquet-related libraries in requirements + +## Current Export Capabilities + +The `isamples_inabox` repository **currently supports only two export formats**: + +### 1. CSV Export +- **Class**: `CSVExportTransformer` in `isb_lib/utilities/solr_result_transformer.py:61-69` +- **Method**: Uses `petl.io.csv.tocsv()` or `petl.io.csv.appendcsv()` +- **Output**: Flat CSV file with renamed columns + +### 2. 
JSONL Export (JSON Lines) +- **Class**: `JSONExportTransformer` in `isb_lib/utilities/solr_result_transformer.py:72-132` +- **Method**: Writes one JSON object per line +- **Output**: Structured JSON following iSamples metadata schema + +### Export Format Enum +```python +# From isb_lib/utilities/solr_result_transformer.py:38-50 +class TargetExportFormat(Enum): + """Valid target export formats""" + CSV = "CSV" + JSONL = "JSONL" +``` + +**Notable Absence**: No `PARQUET` or `GEOPARQUET` format option exists. + +## Export Service Architecture + +The current export service (`isb_web/export.py`) works as follows: + +1. User creates export job via API: `/export/create?q=...&export_format=CSV|JSONL` +2. Export job queued in database (`ExportJob` model) +3. Background worker queries Solr +4. `SolrResultTransformer` converts results to target format +5. File written to `/tmp/{uuid}.csv` or `.jsonl` +6. User downloads via `/export/download?uuid=...` + +## Likely Origins of GeoParquet Export + +Given the investigation results, the GeoParquet file was most likely created using **ONE** of the following methods: + +### Hypothesis 1: External Script (Most Likely) +A standalone Python script was created **outside the main repository** to: +1. Query the iSamples Solr index or PostgreSQL database +2. Fetch sample records with geospatial coordinates +3. Use `geopandas` to create GeoDataFrame +4. Export to GeoParquet using `geopandas.GeoDataFrame.to_parquet()` + +**Typical code pattern:** +```python +import geopandas as gpd +from shapely.geometry import Point +import pandas as pd + +# Query database/Solr for samples +samples = fetch_samples() # Custom function + +# Create geometry column +geometry = [Point(xy) for xy in zip(samples['longitude'], samples['latitude'])] +gdf = gpd.GeoDataFrame(samples, geometry=geometry, crs='EPSG:4326') + +# Export to GeoParquet +gdf.to_parquet('isamples_export_2025_04_21_16_23_46_geo.parquet') +``` + +### Hypothesis 2: Different Repository/Branch +The code may exist in: +- A different branch not checked out +- A separate repository for data exports/analytics +- A private/internal repository +- A personal development repository + +### Hypothesis 3: One-Time Script +The export may have been created using an ad-hoc script that was: +- Run manually on the server +- Not committed to version control +- Deleted after execution +- Created for a specific publication/dataset release + +### Hypothesis 4: Notebook-Based Export +The export may have been created in a Jupyter notebook that: +- Connected directly to the database +- Performed custom transformations +- Exported to GeoParquet +- Was not committed to the repository + +## Recommendations + +### To Locate the Original Code: + +1. **Ask the team member who created the Zenodo upload** + - Check Zenodo metadata for uploader information + - Ask about the script/method used + +2. **Check server/production environments** + - Look in `/home/` directories for user scripts + - Check cron jobs or scheduled tasks + - Search for `*.py` files with "parquet" in content + +3. **Search other repositories** + - Check `isamplesorg` GitHub organization for related repos + - Look for data analysis or export-specific repositories + +4. **Check documentation/notes** + - Look for data release documentation + - Check for README files describing export process + +### To Recreate the Export: + +If the original code cannot be found, a new GeoParquet export can be created by: + +1. 
**Extending the existing export service** (Recommended) + - Add `PARQUET` and `GEOPARQUET` to `TargetExportFormat` enum + - Create `ParquetExportTransformer` class + - Create `GeoParquetExportTransformer` class using `geopandas` + - Update export API to support new formats + +2. **Creating a standalone script** (Quick solution) + - Query Solr or PostgreSQL directly + - Transform to GeoDataFrame + - Export to GeoParquet + - See `docs/geoparquet_to_pqg_conversion_plan.md` for reference + +## Required Dependencies for GeoParquet Export + +To create GeoParquet exports, these packages would be needed (not currently in requirements): + +``` +geopandas>=0.14.0 +pyarrow>=10.0.0 +shapely>=2.0.0 +``` + +Current `requirements.txt` includes: +- ✓ `shapely==2.0.2` - For geometry creation +- ✗ `geopandas` - NOT present (would be needed) +- ✗ `pyarrow` - NOT present (would be needed for Parquet) + +## Investigation Statistics + +- **Files searched**: 153+ Python files +- **Keywords searched**: 8 different patterns +- **Directories examined**: All major directories (`isb_lib`, `isb_web`, `scripts`, `notes`) +- **Git commits reviewed**: 20+ export-related commits +- **Time spent**: Comprehensive search of codebase + +## Conclusion + +### Original Conclusion (Before Finding Code) +The GeoParquet export code does **not exist in the current `isamples_inabox` repository**. The file was most likely created using: +1. An external standalone script (most probable) ✅ **CORRECT** +2. A Jupyter notebook +3. Code in a different repository or branch ✅ **CORRECT** +4. An ad-hoc one-time export script + +### Final Conclusion (After Finding Code) + +**The investigation was correct!** The GeoParquet export code exists in a **separate repository**: https://github.com/rdhyee/export_client + +**Key Findings:** +- The `export_client` repository contains a CLI tool (`isample`) for exporting iSamples data +- GeoParquet conversion is implemented in `isamples_export_client/geoparquet_utilities.py` +- The export process: Server provides JSONL → Client converts to GeoParquet +- Uses pandas, geopandas, and pyarrow for the conversion +- The Zenodo file was created using: `isample export -f geoparquet -q '*:*'` + +**Documentation**: See [geoparquet_export_code.md](./geoparquet_export_code.md) for complete implementation details. + +--- + +**Investigation Date**: 2025-11-14 +**Code Found Date**: 2025-11-14 +**Repository Commit**: f8fd9d4 +**Investigator**: Claude (AI Assistant) diff --git a/docs/geoparquet_to_pqg_conversion_plan.md b/docs/geoparquet_to_pqg_conversion_plan.md new file mode 100644 index 00000000..27dc9757 --- /dev/null +++ b/docs/geoparquet_to_pqg_conversion_plan.md @@ -0,0 +1,966 @@ +# Conversion Plan: iSamples GeoParquet to PQG Format + +## Overview + +This document provides a detailed plan for converting the iSamples GeoParquet export file +(`isamples_export_2025_04_21_16_23_46_geo.parquet`) into the PQG (Property Graph in DuckDB) format +as documented at https://github.com/isamplesorg/pqg. + +**📝 Note**: The GeoParquet export code was located at https://github.com/rdhyee/export_client. +See [geoparquet_export_code.md](./geoparquet_export_code.md) for details on how the GeoParquet files are created. 
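+
+Before working through the mapping below, it can help to confirm which columns the source file actually contains. A minimal sketch using pyarrow (assumes the Zenodo file has already been downloaded):
+
+```python
+import pyarrow.parquet as pq
+
+pf = pq.ParquetFile("isamples_export_2025_04_21_16_23_46_geo.parquet")
+print(pf.metadata.num_rows, "rows")
+print(pf.schema_arrow)  # column names/types, including the GeoParquet geometry column
+```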
+ +## Background + +### Source Format: GeoParquet +- **File**: `isamples_export_2025_04_21_16_23_46_geo.parquet` (available on Zenodo: https://zenodo.org/records/15278211) +- **Format**: Apache Parquet with GeoParquet spatial extension +- **Content**: iSamples sample metadata including geospatial coordinates +- **Schema**: Based on iSamples Core metadata schema (see `isb_lib/models/isb_core_record.py`) +- **Creation Tool**: Generated using the `isample` CLI from https://github.com/rdhyee/export_client +- **Process**: Server exports JSONL → Client converts to GeoParquet (see `geoparquet_utilities.py`) + +### Target Format: PQG +- **Library**: Python library for property graphs using DuckDB backend +- **Architecture**: Single-table design with nodes and edges +- **Requirements**: Python 3.11+, dataclasses-based models +- **Graph Model**: Nodes (entities) with properties + Edges (relationships) + +## Understanding PQG Structure + +### PQG Nodes Structure +Each node in PQG contains: +- `row_id`: Auto-incrementing primary key +- `pid`: Unique persistent identifier (string) +- `otype`: Object/node type classification +- `label`: Human-readable name +- `description`: Optional text description +- `altids`: Alternative identifiers (list) +- Custom properties as defined by dataclass + +### PQG Edges Structure +Edges follow Subject-Predicate-Object model: +- `s`: Source node reference (internal integer ID) +- `p`: Relationship/predicate type (string) +- `o`: Target node reference(s) - array of integer IDs +- `n`: Optional named graph designation + +### Key PQG Features +- **Automatic decomposition**: Nested objects become separate nodes with edges +- **Geographic support**: Spatial data can be included +- **Export formats**: Parquet, GeoJSON, Graphviz +- **Columnar storage**: Fast queries via DuckDB + +## iSamples Data Model Analysis + +### Core Entity: Sample (Thing) + +Based on `isb_lib/models/isb_core_record.py` and the export service, each sample contains: + +**Primary Identifiers:** +- `sample_identifier` (id) - Main sample ID (e.g., "IGSN:BSU0005H1") +- `@id` (isb_core_id) - iSamples internal identifier +- `source_collection` - Source authority (SESAR, GEOME, etc.) 
+ +**Descriptive Metadata:** +- `label` - Short name/label +- `description` - Full description +- `keywords` - List of keywords +- `informal_classification` - Free-text classification + +**Controlled Vocabularies:** +- `has_specimen_category` - Sample object type (array) +- `has_material_category` - Material classification (array) +- `has_context_category` - Geological context (array) + +**Sampling Event (produced_by):** +- `identifier` - Sampling event ID +- `label`, `description` - Event metadata +- `result_time` - When sample was collected +- `has_feature_of_interest` - What was sampled +- `responsibility` - Array of {role, name} objects (collectors, owners) +- `sampling_site` - Nested location information: + - `place_name` - Array of place names + - `label`, `description` - Site metadata + - `sample_location`: + - `latitude`, `longitude` - Coordinates (decimal degrees) + - `elevation` - Elevation in meters + +**Curation:** +- `label`, `description` - Curation information +- `curation_location` - Where sample is stored +- `responsibility` - Curators (array) +- `access_constraints` - Access restrictions + +**Administrative:** +- `registrant` - {name} who registered the sample +- `sampling_purpose` - Purpose of sampling +- `related_resource` - Links to related resources +- `authorized_by`, `complies_with` - Authorization info +- `last_modified_time` - Source update timestamp + +## Conversion Strategy + +### Graph Model Design + +The iSamples data will be decomposed into a property graph with the following node types and relationships: + +``` +┌──────────────┐ +│ Sample │ +│ (otype: │ +│ "Sample") │ +└──────┬───────┘ + │ + │ has_material_category + ├──────────────────────────────► ┌───────────────────┐ + │ │ MaterialCategory │ + │ has_specimen_category │ (otype: │ + ├──────────────────────────────► │ "Vocabulary") │ + │ └───────────────────┘ + │ has_context_category + ├──────────────────────────────► ┌───────────────────┐ + │ │ ContextCategory │ + │ │ (otype: │ + │ │ "Vocabulary") │ + │ └───────────────────┘ + │ produced_by + ├──────────────────────────────► ┌───────────────────┐ + │ │ SamplingEvent │ + │ │ (otype: │ + │ │ "Event") │ + │ └─────────┬─────────┘ + │ │ + │ │ at_site + │ ├────────► ┌──────────────┐ + │ │ │ SamplingSite │ + │ │ │ (otype: │ + │ │ │ "Place") │ + │ │ │ + geometry │ + │ │ └──────────────┘ + │ │ + │ │ has_responsibility + │ └────────► ┌──────────────┐ + │ │ Person/Org │ + │ │ (otype: │ + │ │ "Agent") │ + │ └──────────────┘ + │ curated_by + ├──────────────────────────────► ┌───────────────────┐ + │ │ Curation │ + │ │ (otype: │ + │ │ "Activity") │ + │ └───────────────────┘ + │ registered_by + └──────────────────────────────► ┌───────────────────┐ + │ Registrant │ + │ (otype: "Agent") │ + └───────────────────┘ +``` + +### Node Types (otype values) + +1. **Sample** - Core sample entity +2. **SamplingEvent** - The event that produced the sample +3. **SamplingSite** - Geographic location (with geometry) +4. **Person** or **Organization** - Agents (collectors, curators, registrants) +5. **VocabularyTerm** - Controlled vocabulary terms (material, specimen, context categories) +6. **Curation** - Curation activity +7. **Keyword** - Keywords for search +8. 
**RelatedResource** - Links to external resources + +### Relationship Types (predicate values) + +- `produced_by` - Sample → SamplingEvent +- `at_site` - SamplingEvent → SamplingSite +- `has_responsibility` - Event/Curation → Person/Organization (with role property) +- `has_material_category` - Sample → VocabularyTerm +- `has_specimen_category` - Sample → VocabularyTerm +- `has_context_category` - Sample → VocabularyTerm +- `has_keyword` - Sample → Keyword +- `curated_by` - Sample → Curation +- `registered_by` - Sample → Person/Organization +- `related_to` - Sample → RelatedResource + +## Implementation Steps + +### Phase 1: Setup and Dependencies + +1. **Install required packages:** +```bash +pip install duckdb pyarrow geopandas pqg +``` + +2. **Create project structure:** +``` +conversion_project/ +├── src/ +│ ├── models.py # PQG dataclass definitions +│ ├── loader.py # Load GeoParquet data +│ ├── transformer.py # Transform to PQG format +│ └── exporter.py # Export PQG graph +├── scripts/ +│ └── convert.py # Main conversion script +├── tests/ +│ └── test_conversion.py # Unit tests +└── README.md +``` + +### Phase 2: Define PQG Data Models + +Create dataclass models in `src/models.py`: + +```python +from dataclasses import dataclass, field +from typing import Optional, List +from pqg import Base + +@dataclass +class Sample(Base): + """Main sample node""" + pid: str # sample_identifier + otype: str = "Sample" + label: str = "" + description: str = "" + altids: List[str] = field(default_factory=list) # e.g., isb_core_id + source_collection: str = "" + informal_classification: List[str] = field(default_factory=list) + last_modified_time: Optional[str] = None + +@dataclass +class SamplingEvent(Base): + """Sampling event that produced the sample""" + pid: str # Constructed from sample_id + "_event" + otype: str = "SamplingEvent" + label: str = "" + description: str = "" + result_time: Optional[str] = None + has_feature_of_interest: str = "" + +@dataclass +class SamplingSite(Base): + """Geographic location with spatial data""" + pid: str # Constructed from coordinates or site_label + otype: str = "SamplingSite" + label: str = "" + description: str = "" + place_names: List[str] = field(default_factory=list) + latitude: Optional[float] = None + longitude: Optional[float] = None + elevation: Optional[float] = None + # PQG supports geometry - can store as WKT or GeoJSON + geometry: Optional[str] = None + +@dataclass +class Agent(Base): + """Person or organization""" + pid: str # Name-based or unique ID + otype: str = "Agent" # Could be "Person" or "Organization" + label: str = "" + role: Optional[str] = None # Role in specific context + +@dataclass +class VocabularyTerm(Base): + """Controlled vocabulary term""" + pid: str # Vocabulary identifier URI + otype: str = "VocabularyTerm" + label: str = "" + category: str = "" # "material", "specimen", or "context" + +@dataclass +class Curation(Base): + """Curation information""" + pid: str # Constructed from sample + curation info + otype: str = "Curation" + label: str = "" + description: str = "" + location: str = "" + access_constraints: List[str] = field(default_factory=list) + +@dataclass +class Keyword(Base): + """Keyword for search""" + pid: str # The keyword itself + otype: str = "Keyword" + label: str = "" +``` + +### Phase 3: Load GeoParquet Data + +Create data loader in `src/loader.py`: + +```python +import geopandas as gpd +import pyarrow.parquet as pq + +class GeoParquetLoader: + """Load iSamples GeoParquet export""" + + def __init__(self, 
parquet_path: str): + self.parquet_path = parquet_path + + def load(self) -> gpd.GeoDataFrame: + """Load GeoParquet file as GeoDataFrame""" + gdf = gpd.read_parquet(self.parquet_path) + print(f"Loaded {len(gdf)} samples") + print(f"Columns: {gdf.columns.tolist()}") + return gdf + + def get_schema(self): + """Examine parquet schema""" + parquet_file = pq.ParquetFile(self.parquet_path) + return parquet_file.schema +``` + +### Phase 4: Transform to PQG Format + +Create transformer in `src/transformer.py`: + +```python +from typing import List, Dict, Set +import json +from pqg import Graph +from .models import ( + Sample, SamplingEvent, SamplingSite, Agent, + VocabularyTerm, Curation, Keyword +) + +class ISamplesToPQGTransformer: + """Transform iSamples data to PQG property graph""" + + def __init__(self): + self.graph = Graph() + self.seen_pids: Set[str] = set() # Track created nodes + + def transform_sample(self, row: dict) -> Sample: + """Transform a single sample record to Sample node""" + sample = Sample( + pid=row['sample_identifier'], + label=row.get('label', ''), + description=row.get('description', ''), + altids=[row.get('@id', '')], # isb_core_id as altid + source_collection=row.get('source_collection', ''), + informal_classification=self._to_list( + row.get('informal_classification', []) + ), + last_modified_time=row.get('last_modified_time') + ) + self.graph.add_node(sample) + return sample + + def transform_sampling_event(self, sample_pid: str, + produced_by: dict) -> SamplingEvent: + """Transform sampling event from produced_by field""" + event_pid = produced_by.get('identifier', + f"{sample_pid}_event") + + event = SamplingEvent( + pid=event_pid, + label=produced_by.get('label', ''), + description=produced_by.get('description', ''), + result_time=produced_by.get('result_time'), + has_feature_of_interest=produced_by.get( + 'has_feature_of_interest', '' + ) + ) + self.graph.add_node(event) + + # Create edge: Sample produced_by SamplingEvent + self.graph.add_edge(sample_pid, 'produced_by', event_pid) + + return event + + def transform_sampling_site(self, event_pid: str, + sampling_site: dict) -> SamplingSite: + """Transform sampling site with geographic data""" + # Use coordinates or label to create unique PID + lat = sampling_site.get('sample_location', {}).get('latitude') + lon = sampling_site.get('sample_location', {}).get('longitude') + + if lat and lon: + site_pid = f"site_{lat}_{lon}" + else: + site_pid = f"site_{sampling_site.get('label', 'unknown')}" + + # Create Point geometry if coordinates available + geometry = None + if lat and lon: + geometry = f"POINT({lon} {lat})" # WKT format + + site = SamplingSite( + pid=site_pid, + label=sampling_site.get('label', ''), + description=sampling_site.get('description', ''), + place_names=self._to_list(sampling_site.get('place_name', [])), + latitude=lat, + longitude=lon, + elevation=sampling_site.get('sample_location', {}).get( + 'elevation' + ), + geometry=geometry + ) + + if site_pid not in self.seen_pids: + self.graph.add_node(site) + self.seen_pids.add(site_pid) + + # Create edge: SamplingEvent at_site SamplingSite + self.graph.add_edge(event_pid, 'at_site', site_pid) + + return site + + def transform_agents(self, context_pid: str, + relationship_type: str, + responsibilities: List[dict]): + """Transform responsibility records to Agent nodes""" + for resp in responsibilities: + name = resp.get('name', '') + role = resp.get('role', '') + + # Create agent PID from name (could enhance with ORCID if available) + agent_pid = 
f"agent_{name.replace(' ', '_').lower()}" + + if agent_pid not in self.seen_pids: + agent = Agent( + pid=agent_pid, + label=name, + role=role + ) + self.graph.add_node(agent) + self.seen_pids.add(agent_pid) + + # Create edge with role as property + self.graph.add_edge( + context_pid, + relationship_type, + agent_pid, + properties={'role': role} + ) + + def transform_vocabulary_terms(self, sample_pid: str, + terms: List[dict], + category: str, + relationship: str): + """Transform controlled vocabulary terms""" + for term in terms: + term_id = term.get('identifier', '') + if not term_id: + continue + + term_pid = term_id # Use vocabulary URI as PID + + if term_pid not in self.seen_pids: + vocab_term = VocabularyTerm( + pid=term_pid, + label=term_id.split('/')[-1], # Extract label from URI + category=category + ) + self.graph.add_node(vocab_term) + self.seen_pids.add(term_pid) + + # Create edge: Sample → VocabularyTerm + self.graph.add_edge(sample_pid, relationship, term_pid) + + def transform_keywords(self, sample_pid: str, keywords: List[dict]): + """Transform keywords""" + for kw in keywords: + keyword_text = kw.get('keyword', '') + if not keyword_text: + continue + + kw_pid = f"keyword_{keyword_text.lower().replace(' ', '_')}" + + if kw_pid not in self.seen_pids: + keyword = Keyword( + pid=kw_pid, + label=keyword_text + ) + self.graph.add_node(keyword) + self.seen_pids.add(kw_pid) + + self.graph.add_edge(sample_pid, 'has_keyword', kw_pid) + + def transform_curation(self, sample_pid: str, curation: dict): + """Transform curation information""" + if not curation or not any(curation.values()): + return # Skip empty curation + + curation_pid = f"{sample_pid}_curation" + + curation_node = Curation( + pid=curation_pid, + label=curation.get('label', ''), + description=curation.get('description', ''), + location=curation.get('curation_location', ''), + access_constraints=self._to_list( + curation.get('access_constraints', []) + ) + ) + self.graph.add_node(curation_node) + + # Create edge: Sample curated_by Curation + self.graph.add_edge(sample_pid, 'curated_by', curation_pid) + + # Transform curators as agents + if 'responsibility' in curation: + self.transform_agents( + curation_pid, + 'has_curator', + curation['responsibility'] + ) + + def transform_row(self, row: dict): + """Transform a single GeoParquet row to graph nodes/edges""" + # Parse JSON fields if they're strings + row = self._parse_json_fields(row) + + # 1. Create Sample node + sample = self.transform_sample(row) + sample_pid = sample.pid + + # 2. Transform produced_by (sampling event and site) + if 'produced_by' in row and row['produced_by']: + produced_by = row['produced_by'] + event = self.transform_sampling_event(sample_pid, produced_by) + + # 3. Transform sampling site + if 'sampling_site' in produced_by: + self.transform_sampling_site( + event.pid, + produced_by['sampling_site'] + ) + + # 4. Transform event responsibilities (collectors, etc.) + if 'responsibility' in produced_by: + self.transform_agents( + event.pid, + 'has_responsibility', + produced_by['responsibility'] + ) + + # 5. 
Transform vocabulary terms + if 'has_specimen_category' in row: + self.transform_vocabulary_terms( + sample_pid, + row['has_specimen_category'], + 'specimen', + 'has_specimen_category' + ) + + if 'has_material_category' in row: + self.transform_vocabulary_terms( + sample_pid, + row['has_material_category'], + 'material', + 'has_material_category' + ) + + if 'has_context_category' in row: + self.transform_vocabulary_terms( + sample_pid, + row['has_context_category'], + 'context', + 'has_context_category' + ) + + # 6. Transform keywords + if 'keywords' in row: + self.transform_keywords(sample_pid, row['keywords']) + + # 7. Transform curation + if 'curation' in row: + self.transform_curation(sample_pid, row['curation']) + + # 8. Transform registrant + if 'registrant' in row and row['registrant']: + registrant = row['registrant'] + if isinstance(registrant, dict): + name = registrant.get('name', '') + agent_pid = f"agent_{name.replace(' ', '_').lower()}" + + if agent_pid not in self.seen_pids: + agent = Agent(pid=agent_pid, label=name) + self.graph.add_node(agent) + self.seen_pids.add(agent_pid) + + self.graph.add_edge( + sample_pid, + 'registered_by', + agent_pid + ) + + def _to_list(self, value): + """Ensure value is a list""" + if isinstance(value, str): + return [value] + elif isinstance(value, list): + return value + else: + return [] + + def _parse_json_fields(self, row: dict) -> dict: + """Parse JSON string fields to dicts/lists""" + for key, value in row.items(): + if isinstance(value, str) and value.startswith('{'): + try: + row[key] = json.loads(value) + except: + pass + elif isinstance(value, str) and value.startswith('['): + try: + row[key] = json.loads(value) + except: + pass + return row + + def get_graph(self) -> Graph: + """Return the constructed graph""" + return self.graph +``` + +### Phase 5: Main Conversion Script + +Create `scripts/convert.py`: + +```python +#!/usr/bin/env python3 +""" +Convert iSamples GeoParquet export to PQG format + +Usage: + python scripts/convert.py \\ + --input isamples_export_2025_04_21_16_23_46_geo.parquet \\ + --output isamples_graph.duckdb \\ + --export-geojson samples.geojson \\ + --limit 1000 +""" + +import argparse +import sys +from pathlib import Path + +# Add src to path +sys.path.insert(0, str(Path(__file__).parent.parent / 'src')) + +from loader import GeoParquetLoader +from transformer import ISamplesToPQGTransformer + +def main(): + parser = argparse.ArgumentParser( + description='Convert iSamples GeoParquet to PQG format' + ) + parser.add_argument( + '--input', + required=True, + help='Input GeoParquet file path' + ) + parser.add_argument( + '--output', + default='isamples_graph.duckdb', + help='Output DuckDB file path' + ) + parser.add_argument( + '--export-geojson', + help='Optional: Export geographic nodes as GeoJSON' + ) + parser.add_argument( + '--export-parquet', + help='Optional: Export graph as Parquet' + ) + parser.add_argument( + '--limit', + type=int, + help='Limit number of samples to process (for testing)' + ) + parser.add_argument( + '--verbose', + action='store_true', + help='Verbose output' + ) + + args = parser.parse_args() + + # 1. Load GeoParquet + print(f"Loading GeoParquet from {args.input}...") + loader = GeoParquetLoader(args.input) + gdf = loader.load() + + if args.verbose: + print(f"Schema: {loader.get_schema()}") + print(f"Sample columns: {gdf.columns.tolist()}") + + # Limit if requested + if args.limit: + print(f"Limiting to {args.limit} samples for testing") + gdf = gdf.head(args.limit) + + # 2. 
Transform to PQG + print("Transforming to PQG property graph...") + transformer = ISamplesToPQGTransformer() + + for idx, row in gdf.iterrows(): + if args.verbose and idx % 1000 == 0: + print(f"Processed {idx} samples...") + + transformer.transform_row(row.to_dict()) + + graph = transformer.get_graph() + + # 3. Save graph to DuckDB + print(f"Saving graph to {args.output}...") + graph.save(args.output) + + # 4. Export additional formats if requested + if args.export_geojson: + print(f"Exporting geographic data to {args.export_geojson}...") + graph.export_geojson(args.export_geojson) + + if args.export_parquet: + print(f"Exporting graph to Parquet: {args.export_parquet}...") + graph.export_parquet(args.export_parquet) + + # 5. Print statistics + print("\nConversion complete!") + print(f"Nodes: {graph.node_count()}") + print(f"Edges: {graph.edge_count()}") + print(f"Node types: {graph.node_types()}") + print(f"Relationship types: {graph.relationship_types()}") + +if __name__ == '__main__': + main() +``` + +### Phase 6: Testing and Validation + +Create `tests/test_conversion.py`: + +```python +import pytest +from src.loader import GeoParquetLoader +from src.transformer import ISamplesToPQGTransformer +from src.models import Sample, SamplingEvent, SamplingSite + +def test_sample_transformation(): + """Test basic sample transformation""" + row = { + 'sample_identifier': 'IGSN:TEST001', + '@id': 'https://isample.org/thing/TEST001', + 'label': 'Test Sample', + 'description': 'A test sample', + 'source_collection': 'TEST', + 'informal_classification': ['rock'] + } + + transformer = ISamplesToPQGTransformer() + sample = transformer.transform_sample(row) + + assert sample.pid == 'IGSN:TEST001' + assert sample.label == 'Test Sample' + assert 'https://isample.org/thing/TEST001' in sample.altids + +def test_sampling_site_with_coordinates(): + """Test sampling site with geographic coordinates""" + sampling_site = { + 'label': 'Test Site', + 'description': 'A test location', + 'place_name': ['California', 'USA'], + 'sample_location': { + 'latitude': 37.7749, + 'longitude': -122.4194, + 'elevation': 100.0 + } + } + + transformer = ISamplesToPQGTransformer() + site = transformer.transform_sampling_site('event_1', sampling_site) + + assert site.latitude == 37.7749 + assert site.longitude == -122.4194 + assert site.geometry == 'POINT(-122.4194 37.7749)' + assert 'California' in site.place_names + +def test_full_row_transformation(): + """Test complete row transformation with all components""" + row = { + 'sample_identifier': 'IGSN:TEST002', + '@id': 'https://isample.org/thing/TEST002', + 'label': 'Full Test Sample', + 'description': 'Complete test', + 'source_collection': 'TEST', + 'has_material_category': [ + {'identifier': 'http://vocab.org/Rock'} + ], + 'produced_by': { + 'identifier': 'event_test_002', + 'label': 'Test Sampling Event', + 'result_time': '2025-01-15', + 'responsibility': [ + {'name': 'John Doe', 'role': 'Collector'} + ], + 'sampling_site': { + 'label': 'Test Location', + 'sample_location': { + 'latitude': 40.7128, + 'longitude': -74.0060 + } + } + }, + 'keywords': [{'keyword': 'geology'}], + 'registrant': {'name': 'Jane Smith'} + } + + transformer = ISamplesToPQGTransformer() + transformer.transform_row(row) + graph = transformer.get_graph() + + # Verify nodes were created + assert graph.node_count() > 0 + # Verify edges were created + assert graph.edge_count() > 0 +``` + +## Execution Plan + +### Step-by-Step Execution + +1. 
**Download GeoParquet file:** +```bash +# Download from Zenodo +wget https://zenodo.org/records/15278211/files/isamples_export_2025_04_21_16_23_46_geo.parquet +``` + +2. **Setup Python environment:** +```bash +python3.11 -m venv venv +source venv/bin/activate +pip install duckdb pyarrow geopandas pqg +``` + +3. **Test with small subset:** +```bash +python scripts/convert.py \\ + --input isamples_export_2025_04_21_16_23_46_geo.parquet \\ + --output test_graph.duckdb \\ + --limit 100 \\ + --verbose +``` + +4. **Run full conversion:** +```bash +python scripts/convert.py \\ + --input isamples_export_2025_04_21_16_23_46_geo.parquet \\ + --output isamples_full_graph.duckdb \\ + --export-geojson isamples_sites.geojson \\ + --verbose +``` + +5. **Validate results:** +```bash +# Use DuckDB CLI to explore +duckdb isamples_full_graph.duckdb +# Run queries to verify data +``` + +## Expected Challenges and Solutions + +### Challenge 1: Large Data Volume +**Problem**: GeoParquet file may contain millions of samples +**Solution**: +- Process in batches +- Use streaming/iterative processing +- Monitor memory usage +- Consider parallel processing for large datasets + +### Challenge 2: Nested JSON Structures +**Problem**: GeoParquet may store complex nested JSON +**Solution**: +- Implement robust JSON parsing in `_parse_json_fields()` +- Handle both string and native JSON types +- Add error handling for malformed JSON + +### Challenge 3: Duplicate Node Detection +**Problem**: Same agents/locations may appear multiple times +**Solution**: +- Use `seen_pids` set to track created nodes +- Create consistent PID generation for agents (name-based) +- For sites, use coordinate-based PIDs + +### Challenge 4: Missing Geographic Data +**Problem**: Not all samples may have coordinates +**Solution**: +- Make latitude/longitude optional in SamplingSite +- Create site PIDs from labels when coordinates missing +- Skip geometry field if coordinates unavailable + +### Challenge 5: Vocabulary Term URIs +**Problem**: Controlled vocabulary may use full URIs +**Solution**: +- Use full URI as PID +- Extract human-readable label from URI +- Store category type for filtering + +## Query Examples (Post-Conversion) + +Once converted to PQG, you can query the graph using DuckDB SQL: + +```sql +-- Find all samples from SESAR +SELECT * FROM nodes +WHERE otype = 'Sample' +AND source_collection = 'SESAR'; + +-- Find all sampling sites in a region +SELECT * FROM nodes +WHERE otype = 'SamplingSite' +AND latitude BETWEEN 30 AND 40 +AND longitude BETWEEN -120 AND -110; + +-- Find samples by material category +SELECT s.* +FROM nodes s +JOIN edges e ON s.pid = e.s +JOIN nodes v ON e.o[1] = v.row_id +WHERE s.otype = 'Sample' +AND e.p = 'has_material_category' +AND v.category = 'material'; + +-- Find all samples collected by a specific person +SELECT s.* +FROM nodes s +JOIN edges e1 ON s.pid = e1.s +JOIN edges e2 ON e1.o[1] IN (SELECT row_id FROM nodes WHERE pid IN (SELECT o[1] FROM edges WHERE s = e1.o[1])) +JOIN nodes agent ON agent.row_id = e2.o[1] +WHERE s.otype = 'Sample' +AND agent.label = 'John Doe' +AND agent.role = 'Collector'; +``` + +## Performance Considerations + +- **Batch size**: Process 10,000-50,000 records per batch +- **Memory**: Monitor with `--limit` during testing +- **Indexing**: PQG/DuckDB handles indexing automatically +- **Export time**: Full dataset may take 30-60 minutes +- **Storage**: Expect 2-3x size increase due to graph structure + +## Next Steps + +1. 
Review the GeoParquet export code in the `export_client` repository (https://github.com/isamplesorg/export_client); see [geoparquet_export_code.md](./geoparquet_export_code.md)
+2. Implement the data models and transformer classes
+3. Test with a small subset (100-1000 samples)
+4. Validate graph structure and relationships
+5. Run full conversion
+6. Create sample queries for common use cases
+7. Document query patterns for end users
+
+## References
+
+- **PQG Documentation**: https://github.com/isamplesorg/pqg
+- **iSamples GeoParquet**: https://zenodo.org/records/15278211
+- **iSamples Metadata Schema**: See `isb_lib/models/isb_core_record.py`
+- **GeoParquet Specification**: https://geoparquet.org/
+
+---
+
+**Document Version**: 1.0
+**Last Updated**: 2025-11-14
+**Author**: Claude (AI Assistant)