@rdhyee rdhyee commented Nov 14, 2025

No description provided.

…plan

This commit adds three new documentation files to improve repository orientation:

1. REPOSITORY_OVERVIEW.md - Comprehensive guide covering:
   - Project purpose and architecture overview
   - Key components and directory structure
   - Getting started instructions
   - Data flow diagrams
   - API endpoints and CLI tools
   - Development workflow and testing
   - Deployment considerations
   - Common tasks and examples

2. docs/geoparquet_export_findings.md - Investigation report on GeoParquet export code:
   - Detailed search methodology
   - Current export capabilities (CSV/JSONL only)
   - Analysis of export service architecture
   - Hypotheses about GeoParquet file creation
   - Recommendations for locating/recreating export code

3. docs/geoparquet_to_pqg_conversion_plan.md - Detailed conversion plan:
   - Overview of PQG (Property Graph in DuckDB) format
   - iSamples data model analysis
   - Graph model design with node types and relationships
   - Complete implementation plan with Python code examples
   - Phase-by-phase execution steps
   - Testing and validation strategy
   - Performance considerations and query examples

These documents provide essential orientation for new developers and document
the path forward for converting iSamples data to the PQG graph format.

MAJOR UPDATE: Found the GeoParquet export code in the export_client repository!

This commit adds complete documentation of the actual GeoParquet export
implementation and updates all existing documentation to reference it.

New file:
- docs/geoparquet_export_code.md - Comprehensive documentation of:
  * Location: https://github.com/rdhyee/export_client
  * Implementation: geoparquet_utilities.py conversion code
  * Architecture: Server exports JSONL → Client converts to GeoParquet
  * Dependencies: pandas, geopandas, geoarrow-pyarrow
  * Usage examples with the 'isample' CLI tool
  * Complete code walkthrough and schema documentation
  * How the Zenodo file was created

Updated files:
- docs/geoparquet_export_findings.md:
  * Added prominent "CODE FOUND" notice at top
  * Confirmed original hypothesis was correct (external repository)
  * Added reference to new geoparquet_export_code.md documentation

- docs/geoparquet_to_pqg_conversion_plan.md:
  * Added note about export_client at top of document
  * Clarified how GeoParquet files are actually created
  * Referenced geoparquet_utilities.py implementation

- REPOSITORY_OVERVIEW.md:
  * Added "Related Repositories" section documenting export_client
  * Expanded export documentation with two options:
    1. Export Client (recommended for GeoParquet)
    2. Direct API (CSV/JSONL only)
  * Clarified that GeoParquet is only available via export_client
  * Added links to other iSamples repositories (metadata, vocabularies, pqg)

Key findings:
- GeoParquet export is client-side, not server-side
- export_client provides CLI tool with GeoParquet, CSV, JSONL support
- Uses pandas + geopandas to convert JSONL → GeoParquet
- Automatically generates STAC metadata catalogs
- Supports ORCID authentication and local web viewer

This resolves the investigation and provides a complete path forward for
working with iSamples GeoParquet exports and PQG conversion.
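
A minimal sketch of the first half of the JSONL → GeoParquet path described in the key findings, using only the standard library. It is not the code in geoparquet_utilities.py: the function name `jsonl_to_rows`, the nested `produced_by.sampling_site.sample_location` path, and the `geometry_wkt` column are assumptions made for illustration, and the real export schema may differ.

```python
import io
import json


def jsonl_to_rows(lines):
    """Flatten JSONL records into flat dicts carrying a WKT point geometry."""
    rows = []
    for line in lines:
        line = line.strip()
        if not line:
            continue  # tolerate blank lines between records
        record = json.loads(line)
        # Hypothetical nesting; `or {}` also covers explicit nulls.
        produced_by = record.get("produced_by") or {}
        site = produced_by.get("sampling_site") or {}
        location = site.get("sample_location") or {}
        lon = location.get("longitude")
        lat = location.get("latitude")
        has_point = lon is not None and lat is not None
        rows.append({
            "sample_identifier": record.get("sample_identifier"),
            "label": record.get("label"),
            # WKT puts x (longitude) first, then y (latitude).
            "geometry_wkt": f"POINT ({lon} {lat})" if has_point else None,
        })
    return rows


# Two made-up records: one with coordinates, one without.
sample = io.StringIO(
    '{"sample_identifier": "IGSN:XYZ", "label": "basalt", '
    '"produced_by": {"sampling_site": {"sample_location": '
    '{"latitude": 10.5, "longitude": -20.25}}}}\n'
    '{"sample_identifier": "IGSN:ABC", "label": "no coordinates"}\n'
)
rows = jsonl_to_rows(sample)
```

Rows of this shape could then be loaded into pandas and written out with geopandas. That step is left out here because this thread does not show how export_client performs it.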
@rdhyee rdhyee requested a review from Copilot November 14, 2025 15:09
Copilot finished reviewing on behalf of rdhyee November 14, 2025 15:11
Copilot AI left a comment

Pull Request Overview

This pull request adds comprehensive documentation to the iSamples in a Box repository, including a repository overview, conversion plans for GeoParquet to PQG format, and documentation of the GeoParquet export implementation.

Key changes:

  • Four new documentation files covering repository structure, data conversion strategies, and export implementations
  • Detailed technical documentation with code examples and architectural diagrams
  • Cross-referenced documentation linking related concepts across files

Reviewed Changes

Copilot reviewed 4 out of 4 changed files in this pull request and generated 10 comments.

| File | Description |
| --- | --- |
| REPOSITORY_OVERVIEW.md | Comprehensive repository guide covering architecture, setup, development workflow, and common tasks |
| docs/geoparquet_to_pqg_conversion_plan.md | Detailed conversion plan from GeoParquet format to PQG property graph format with implementation code |
| docs/geoparquet_export_findings.md | Investigation findings documenting the location of GeoParquet export code in a separate repository |
| docs/geoparquet_export_code.md | Technical documentation of the GeoParquet export implementation, architecture, and usage examples |

Comment on lines +630 to +633
python scripts/convert.py \\
--input isamples_export_2025_04_21_16_23_46_geo.parquet \\
--output isamples_graph.duckdb \\
--export-geojson samples.geojson \\
Copilot AI Nov 14, 2025

Inconsistent backslash usage in bash commands: lines 630 to 633 use \\ for line continuation. In Markdown code blocks, use a single backslash \ for line continuation in bash commands.

Suggested change

Before:
python scripts/convert.py \\
--input isamples_export_2025_04_21_16_23_46_geo.parquet \\
--output isamples_graph.duckdb \\
--export-geojson samples.geojson \\
After:
python scripts/convert.py \
--input isamples_export_2025_04_21_16_23_46_geo.parquet \
--output isamples_graph.duckdb \
--export-geojson samples.geojson \
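
Since the same doubled backslash recurs in several blocks of the conversion plan, the fix could be applied in one pass rather than by hand. The following is a rough sketch under stated assumptions: the helper name `fix_bash_continuations` is hypothetical, and it only handles fences opened with three backticks and tagged `bash`, `sh`, or `shell`.

```python
FENCE = "`" * 3  # built at runtime so this example holds no literal fence
SHELL_LANGS = {"bash", "sh", "shell"}


def fix_bash_continuations(markdown):
    """Collapse a doubled trailing backslash to a single one inside shell fences."""
    out = []
    in_fence = False
    is_shell = False
    for line in markdown.split("\n"):
        stripped = line.strip()
        if stripped.startswith(FENCE):
            if in_fence:
                # Any fence line closes the block that is currently open.
                in_fence, is_shell = False, False
            else:
                in_fence = True
                is_shell = stripped[len(FENCE):].strip().lower() in SHELL_LANGS
            out.append(line)
            continue
        body = line.rstrip()
        if is_shell and body.endswith("\\\\"):
            # Drop one of the two backslashes (and any trailing whitespace).
            line = body[:-1]
        out.append(line)
    return "\n".join(out)


doc = "\n".join([
    FENCE + "bash",
    "python scripts/convert.py \\\\",
    "    --limit 100 \\\\",
    "    --verbose",
    FENCE,
    "Prose ending in \\\\",
])
fixed = fix_bash_continuations(doc)
```

Text outside shell fences is left untouched, so a doubled backslash that is intentional in prose or in another language's code block survives.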

Copilot uses AI. Check for mistakes.
args = parser.parse_args()

# 1. Load GeoParquet
print(f"Loading GeoParquet from {args.input}...")
Copilot AI Nov 14, 2025

Typo: "GeoParqet" should be "GeoParquet".

Comment on lines +840 to +852
python scripts/convert.py \\
--input isamples_export_2025_04_21_16_23_46_geo.parquet \\
--output test_graph.duckdb \\
--limit 100 \\
--verbose
```

4. **Run full conversion:**
```bash
python scripts/convert.py \\
--input isamples_export_2025_04_21_16_23_46_geo.parquet \\
--output isamples_full_graph.duckdb \\
--export-geojson isamples_sites.geojson \\
Copilot AI Nov 14, 2025

Inconsistent backslash usage in bash commands. Should use single backslash \ for line continuation in bash commands, not double backslash \\.

Suggested change

Before:
python scripts/convert.py \\
--input isamples_export_2025_04_21_16_23_46_geo.parquet \\
--output test_graph.duckdb \\
--limit 100 \\
--verbose
```
4. **Run full conversion:**
```bash
python scripts/convert.py \\
--input isamples_export_2025_04_21_16_23_46_geo.parquet \\
--output isamples_full_graph.duckdb \\
--export-geojson isamples_sites.geojson \\
After:
python scripts/convert.py \
--input isamples_export_2025_04_21_16_23_46_geo.parquet \
--output test_graph.duckdb \
--limit 100 \
--verbose
  4. Run full conversion:
python scripts/convert.py \
    --input isamples_export_2025_04_21_16_23_46_geo.parquet \
    --output isamples_full_graph.duckdb \
    --export-geojson isamples_sites.geojson \

Comment on lines +501 to +502
curl -H "Authorization: Bearer <JWT>" \
"https://central.isample.xyz/isamples_central/export/create?q=source:SESAR&export_format=jsonl"
Copilot AI Nov 14, 2025

Inconsistent backslash usage in bash commands. Should use single backslash \ for line continuation in bash commands, not double backslash \\.

Comment on lines +507 to +508
curl -H "Authorization: Bearer <JWT>" \
"https://central.isample.xyz/isamples_central/export/status?uuid=..."
Copilot AI Nov 14, 2025

Inconsistent backslash usage in bash commands. Should use single backslash \ for line continuation in bash commands, not double backslash \\.

Comment on lines +519 to +520
curl "http://localhost:8000/solr/search?q=keywords:geology&rows=10"

Copilot AI Nov 14, 2025

Inconsistent backslash usage in bash commands. Should use single backslash \ for line continuation in bash commands, not double backslash \\.

Comment on lines +606 to +607
except:
    pass
Copilot AI Nov 14, 2025

Empty except block without any exception handling or logging. This bare except: will silently catch and ignore all exceptions, including system exits and keyboard interrupts, which can make debugging difficult. Consider either:

  1. Catching specific exceptions (e.g., except json.JSONDecodeError:)
  2. Adding logging to track when parsing fails
  3. Re-raising the exception if it can't be handled
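
A hedged sketch of options 1 and 2 combined, assuming the flagged block wraps a `json.loads` call (the surrounding code is not shown in this thread). The helper name `parse_json_field` and its `field_name` parameter are hypothetical.

```python
import json
import logging

logger = logging.getLogger(__name__)


def parse_json_field(raw, field_name="field"):
    """Return the decoded value, or None when `raw` is not valid JSON."""
    try:
        return json.loads(raw)
    except (json.JSONDecodeError, TypeError) as exc:
        # Only decoding problems are swallowed. KeyboardInterrupt and
        # SystemExit still propagate because they are not caught here.
        logger.warning("Could not parse %s: %s", field_name, exc)
        return None


good = parse_json_field('{"a": 1}')
bad = parse_json_field("not json", field_name="keywords")
missing = parse_json_field(None)
```

`TypeError` is included because `json.loads` raises it for non-string input such as `None`, which is a common case when a column is empty.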

Comment on lines +611 to +612
except:
    pass
Copilot AI Nov 14, 2025

Empty except block without any exception handling or logging. This bare except: will silently catch and ignore all exceptions. Consider either:

  1. Catching specific exceptions (e.g., except json.JSONDecodeError:)
  2. Adding logging to track when parsing fails
  3. Re-raising the exception if it can't be handled
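
Where a parse failure should stop the conversion rather than be skipped, option 3 might look like the sketch below. It again assumes the block decodes JSON; `RecordParseError` and `parse_record_strict` are hypothetical names introduced for illustration.

```python
import json


class RecordParseError(ValueError):
    """Raised when a record cannot be decoded; the message carries its position."""


def parse_record_strict(raw, line_number):
    """Decode one record, re-raising decode failures with the line number attached."""
    try:
        return json.loads(raw)
    except json.JSONDecodeError as exc:
        # Chain the original error so the traceback keeps the root cause.
        raise RecordParseError(
            f"line {line_number}: invalid JSON ({exc.msg})"
        ) from exc


parsed = parse_record_strict("[1, 2]", line_number=1)
```

Chaining with `from exc` keeps the original `JSONDecodeError` available as `__cause__`, so the position inside the record is not lost.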

Comment on lines +511 to +512
curl -H "Authorization: Bearer <JWT>" \
"https://central.isample.xyz/isamples_central/export/download?uuid=..."
Copilot AI Nov 14, 2025

Inconsistent backslash usage in bash commands. Should use single backslash \ for line continuation in bash commands, not double backslash \\.
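
For readers who would rather script the three export calls quoted in this review (create, status, download) than run curl, here is a minimal sketch that only builds the requests and does not send them. The base URL, query parameters, and Bearer header come from the quoted commands; the helper name `export_request` is hypothetical, and the response formats are not shown in this thread, so nothing is assumed about them.

```python
from urllib.parse import urlencode
from urllib.request import Request

BASE = "https://central.isample.xyz/isamples_central/export"
STEPS = ("create", "status", "download")


def export_request(step, token, **params):
    """Build (but do not send) an authenticated request for one export step."""
    if step not in STEPS:
        raise ValueError(f"unknown export step: {step}")
    # safe=":" keeps a Solr-style query such as source:SESAR unescaped.
    url = f"{BASE}/{step}?{urlencode(params, safe=':')}"
    return Request(url, headers={"Authorization": f"Bearer {token}"})


# "<JWT>" is the same placeholder used in the quoted curl commands.
create = export_request("create", "<JWT>", q="source:SESAR", export_format="jsonl")
```

Passing the returned object to `urllib.request.urlopen` would issue the call. The `uuid` argument for the status and download steps has to come from the server, and its exact location in the create response is not documented here.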

2 participants