some first efforts at trying to document this repo #429
base: develop
…plan

This commit adds three new documentation files to improve repository orientation:

1. REPOSITORY_OVERVIEW.md - Comprehensive guide covering:
   - Project purpose and architecture overview
   - Key components and directory structure
   - Getting started instructions
   - Data flow diagrams
   - API endpoints and CLI tools
   - Development workflow and testing
   - Deployment considerations
   - Common tasks and examples
2. docs/geoparquet_export_findings.md - Investigation report on GeoParquet export code:
   - Detailed search methodology
   - Current export capabilities (CSV/JSONL only)
   - Analysis of export service architecture
   - Hypotheses about GeoParquet file creation
   - Recommendations for locating/recreating export code
3. docs/geoparquet_to_pqg_conversion_plan.md - Detailed conversion plan:
   - Overview of PQG (Property Graph in DuckDB) format
   - iSamples data model analysis
   - Graph model design with node types and relationships
   - Complete implementation plan with Python code examples
   - Phase-by-phase execution steps
   - Testing and validation strategy
   - Performance considerations and query examples

These documents provide essential orientation for new developers and document the path forward for converting iSamples data to the PQG graph format.
MAJOR UPDATE: Found the GeoParquet export code in the export_client repository!

This commit adds complete documentation of the actual GeoParquet export implementation and updates all existing documentation to reference it.

New file:
- docs/geoparquet_export_code.md - Comprehensive documentation of:
  * Location: https://github.com/rdhyee/export_client
  * Implementation: geoparquet_utilities.py conversion code
  * Architecture: Server exports JSONL → Client converts to GeoParquet
  * Dependencies: pandas, geopandas, geoarrow-pyarrow
  * Usage examples with the 'isample' CLI tool
  * Complete code walkthrough and schema documentation
  * How the Zenodo file was created

Updated files:
- docs/geoparquet_export_findings.md:
  * Added prominent "CODE FOUND" notice at top
  * Confirmed original hypothesis was correct (external repository)
  * Added reference to new geoparquet_export_code.md documentation
- docs/geoparquet_to_pqg_conversion_plan.md:
  * Added note about export_client at top of document
  * Clarified how GeoParquet files are actually created
  * Referenced geoparquet_utilities.py implementation
- REPOSITORY_OVERVIEW.md:
  * Added "Related Repositories" section documenting export_client
  * Expanded export documentation with two options:
    1. Export Client (recommended for GeoParquet)
    2. Direct API (CSV/JSONL only)
  * Clarified that GeoParquet is only available via export_client
  * Added links to other iSamples repositories (metadata, vocabularies, pqg)

Key findings:
- GeoParquet export is client-side, not server-side
- export_client provides a CLI tool with GeoParquet, CSV, and JSONL support
- Uses pandas + geopandas to convert JSONL → GeoParquet
- Automatically generates STAC metadata catalogs
- Supports ORCID authentication and a local web viewer

This resolves the investigation and provides a complete path forward for working with iSamples GeoParquet exports and PQG conversion.
Pull Request Overview
This pull request adds comprehensive documentation to the iSamples in a Box repository, including a repository overview, conversion plans for GeoParquet to PQG format, and documentation of the GeoParquet export implementation.
Key changes:
- Four new documentation files covering repository structure, data conversion strategies, and export implementations
- Detailed technical documentation with code examples and architectural diagrams
- Cross-referenced documentation linking related concepts across files
Reviewed Changes
Copilot reviewed 4 out of 4 changed files in this pull request and generated 10 comments.
| File | Description |
|---|---|
| REPOSITORY_OVERVIEW.md | Comprehensive repository guide covering architecture, setup, development workflow, and common tasks |
| docs/geoparquet_to_pqg_conversion_plan.md | Detailed conversion plan from GeoParquet format to PQG property graph format with implementation code |
| docs/geoparquet_export_findings.md | Investigation findings documenting the location of GeoParquet export code in a separate repository |
| docs/geoparquet_export_code.md | Technical documentation of the GeoParquet export implementation, architecture, and usage examples |
```bash
python scripts/convert.py \\
--input isamples_export_2025_04_21_16_23_46_geo.parquet \\
--output isamples_graph.duckdb \\
--export-geojson samples.geojson \\
```
Copilot (AI), Nov 14, 2025
Inconsistent backslash usage in bash commands: line 630 uses `\\` for line continuation, while lines 840 and 849 use a single `\`. In markdown code blocks, use a single backslash `\` for line continuation in bash commands.
Suggested change (before):

```bash
python scripts/convert.py \\
--input isamples_export_2025_04_21_16_23_46_geo.parquet \\
--output isamples_graph.duckdb \\
--export-geojson samples.geojson \\
```

After:

```bash
python scripts/convert.py \
--input isamples_export_2025_04_21_16_23_46_geo.parquet \
--output isamples_graph.duckdb \
--export-geojson samples.geojson \
```
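Because the same `\\` slip recurs across several files, it can be caught mechanically. A minimal sketch, assuming the docs tag shell examples with `bash`, `sh`, `shell`, or `console` fences; the helper names are hypothetical and not part of this repository:

```python
import re

SHELL_LANGS = {"bash", "sh", "shell", "console"}
# Two backslashes at end of line: in bash, "\\" is an escaped backslash, so the
# newline is NOT escaped and the command ends here instead of continuing.
TRAILING_DOUBLE_BACKSLASH = re.compile(r"\\\\(\s*)$")


def _iter_lines(markdown):
    """Yield (line_number, line, inside_shell_fence) for every line."""
    in_fence = False
    fence_is_shell = False
    for lineno, line in enumerate(markdown.splitlines(), start=1):
        stripped = line.strip()
        if stripped.startswith("```"):
            if in_fence:  # closing fence
                in_fence, fence_is_shell = False, False
            else:  # opening fence: remember whether it is a shell block
                in_fence = True
                fence_is_shell = stripped[3:].strip().lower() in SHELL_LANGS
            yield lineno, line, False  # fence lines themselves are never flagged
            continue
        yield lineno, line, in_fence and fence_is_shell


def find_double_backslash_continuations(markdown):
    """Return 1-based line numbers of shell-fence lines ending in a doubled backslash."""
    return [n for n, line, in_shell in _iter_lines(markdown)
            if in_shell and TRAILING_DOUBLE_BACKSLASH.search(line)]


def fix_double_backslash_continuations(markdown):
    """Collapse a trailing doubled backslash to one, inside shell fences only."""
    out = []
    for _, line, in_shell in _iter_lines(markdown):
        if in_shell:
            line = TRAILING_DOUBLE_BACKSLASH.sub(r"\\\1", line)
        out.append(line)
    return "\n".join(out)
```

Restricting the rewrite to shell fences leaves Python or other code blocks untouched, where a trailing `\\` may be intentional. Note that `splitlines()` drops a final newline, so the fixed text does not end with one.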
```python
args = parser.parse_args()

# 1. Load GeoParquet
print(f"Loading GeoParquet from {args.input}...")
```
Copilot (AI), Nov 14, 2025
Typo: "GeoParqet" should be "GeoParquet".
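The context above shows `convert.py` calling `parser.parse_args()`, and the documented commands pass `--input`, `--output`, `--export-geojson`, `--limit`, and `--verbose`. A minimal argparse sketch that accepts exactly those flags; which flags are required, their types and defaults, and the `build_parser` name are assumptions, not taken from the actual script:

```python
import argparse


def build_parser():
    """Hypothetical parser for scripts/convert.py, reconstructed from the
    flags used in the documentation (not the script's actual code)."""
    parser = argparse.ArgumentParser(
        description="Convert an iSamples GeoParquet export to a PQG DuckDB file."
    )
    parser.add_argument("--input", required=True,
                        help="GeoParquet file to read")
    parser.add_argument("--output", required=True,
                        help="DuckDB file to write")
    # argparse stores "--export-geojson" as the attribute `export_geojson`.
    parser.add_argument("--export-geojson", metavar="PATH", default=None,
                        help="optionally also write a GeoJSON file")
    parser.add_argument("--limit", type=int, default=None,
                        help="only convert the first N rows (for test runs)")
    parser.add_argument("--verbose", action="store_true",
                        help="print progress messages")
    return parser
```

For the test run in the docs, `build_parser().parse_args(["--input", "isamples_export_2025_04_21_16_23_46_geo.parquet", "--output", "test_graph.duckdb", "--limit", "100", "--verbose"])` yields `args.limit == 100` and `args.verbose is True`.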
````markdown
python scripts/convert.py \\
--input isamples_export_2025_04_21_16_23_46_geo.parquet \\
--output test_graph.duckdb \\
--limit 100 \\
--verbose
```

4. **Run full conversion:**
```bash
python scripts/convert.py \\
--input isamples_export_2025_04_21_16_23_46_geo.parquet \\
--output isamples_full_graph.duckdb \\
--export-geojson isamples_sites.geojson \\
````
Copilot (AI), Nov 14, 2025
Inconsistent backslash usage in bash commands. Should use single backslash \ for line continuation in bash commands, not double backslash \\.
Suggested change (before):

````markdown
python scripts/convert.py \\
--input isamples_export_2025_04_21_16_23_46_geo.parquet \\
--output test_graph.duckdb \\
--limit 100 \\
--verbose
```
4. **Run full conversion:**
```bash
python scripts/convert.py \\
--input isamples_export_2025_04_21_16_23_46_geo.parquet \\
--output isamples_full_graph.duckdb \\
--export-geojson isamples_sites.geojson \\
````

After:

````markdown
python scripts/convert.py \
--input isamples_export_2025_04_21_16_23_46_geo.parquet \
--output test_graph.duckdb \
--limit 100 \
--verbose
```
4. **Run full conversion:**
```bash
python scripts/convert.py \
--input isamples_export_2025_04_21_16_23_46_geo.parquet \
--output isamples_full_graph.duckdb \
--export-geojson isamples_sites.geojson \
````

````markdown
python scripts/convert.py \\
--input isamples_export_2025_04_21_16_23_46_geo.parquet \\
--output test_graph.duckdb \\
--limit 100 \\
--verbose
```

4. **Run full conversion:**
```bash
python scripts/convert.py \\
--input isamples_export_2025_04_21_16_23_46_geo.parquet \\
--output isamples_full_graph.duckdb \\
--export-geojson isamples_sites.geojson \\
````
Copilot (AI), Nov 14, 2025
Inconsistent backslash usage in bash commands. Should use single backslash \ for line continuation in bash commands, not double backslash \\.
Suggested change (before):

````markdown
python scripts/convert.py \\
--input isamples_export_2025_04_21_16_23_46_geo.parquet \\
--output test_graph.duckdb \\
--limit 100 \\
--verbose
```
4. **Run full conversion:**
```bash
python scripts/convert.py \\
--input isamples_export_2025_04_21_16_23_46_geo.parquet \\
--output isamples_full_graph.duckdb \\
--export-geojson isamples_sites.geojson \\
````

After:

````markdown
python scripts/convert.py \
--input isamples_export_2025_04_21_16_23_46_geo.parquet \
--output test_graph.duckdb \
--limit 100 \
--verbose
```
4. **Run full conversion:**
```bash
python scripts/convert.py \
--input isamples_export_2025_04_21_16_23_46_geo.parquet \
--output isamples_full_graph.duckdb \
--export-geojson isamples_sites.geojson \
````

```bash
curl -H "Authorization: Bearer <JWT>" \
"https://central.isample.xyz/isamples_central/export/create?q=source:SESAR&export_format=jsonl"
```
Copilot (AI), Nov 14, 2025
Inconsistent backslash usage in bash commands. Should use single backslash \ for line continuation in bash commands, not double backslash \\.
```bash
curl -H "Authorization: Bearer <JWT>" \
"https://central.isample.xyz/isamples_central/export/status?uuid=..."
```
Copilot (AI), Nov 14, 2025
Inconsistent backslash usage in bash commands. Should use single backslash \ for line continuation in bash commands, not double backslash \\.
```bash
curl "http://localhost:8000/solr/search?q=keywords:geology&rows=10"
```
Copilot (AI), Nov 14, 2025
Inconsistent backslash usage in bash commands. Should use single backslash \ for line continuation in bash commands, not double backslash \\.
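The local Solr search example above puts the query directly into the URL, which breaks as soon as a query contains spaces or other reserved characters. A sketch using `urllib.parse.urlencode` to build the same URL, assuming the `/solr/search` endpoint accepts ordinary form-encoded `q` and `rows` parameters as in the curl example; `solr_search_url` is a hypothetical helper:

```python
from urllib.parse import urlencode


def solr_search_url(base, query, rows=10):
    """Build a search URL like the curl example; `base` is e.g. the local
    development server http://localhost:8000 (assumed, adjust as needed)."""
    # safe=":" keeps field-prefixed queries such as "keywords:geology" readable;
    # spaces become "+" and other reserved characters are percent-encoded.
    params = urlencode({"q": query, "rows": rows}, safe=":")
    return f"{base}/solr/search?{params}"
```

`solr_search_url("http://localhost:8000", "keywords:geology")` reproduces the URL from the curl command, and `"keywords:rock core"` is encoded as `q=keywords:rock+core`.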
```python
except:
    pass
```
Copilot (AI), Nov 14, 2025
Empty except block without any exception handling or logging. This bare `except:` will silently catch and ignore all exceptions, including system exits and keyboard interrupts, which can make debugging difficult. Consider one of the following:
- Catching specific exceptions (e.g., `except json.JSONDecodeError:`)
- Adding logging to track when parsing fails
- Re-raising the exception if it can't be handled
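The first two options combine naturally. A minimal sketch, assuming the swallowed error comes from parsing JSON Lines records (the reviewed code is not shown, so `parse_jsonl` is a hypothetical stand-in):

```python
import json
import logging

logger = logging.getLogger(__name__)


def parse_jsonl(lines):
    """Parse JSON Lines, skipping and logging malformed records instead of
    silently discarding every exception with a bare `except: pass`."""
    records = []
    for lineno, line in enumerate(lines, start=1):
        line = line.strip()
        if not line:
            continue  # tolerate blank lines
        try:
            records.append(json.loads(line))
        except json.JSONDecodeError as exc:
            # Only JSON syntax errors are handled here; KeyboardInterrupt,
            # SystemExit and unrelated bugs still propagate to the caller.
            logger.warning("Skipping malformed JSON on line %d: %s", lineno, exc)
    return records
```

Narrowing the `except` clause is what makes the logging trustworthy: every warning corresponds to a genuinely malformed record rather than an unrelated failure.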
```python
except:
    pass
```
Copilot (AI), Nov 14, 2025
Empty except block without any exception handling or logging. This bare `except:` will silently catch and ignore all exceptions. Consider one of the following:
- Catching specific exceptions (e.g., `except json.JSONDecodeError:`)
- Adding logging to track when parsing fails
- Re-raising the exception if it can't be handled
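When a failure cannot be handled locally, the third option applies: re-raise with context rather than `pass`. A sketch, again assuming JSON parsing; `RecordParseError` and `parse_record` are hypothetical names:

```python
import json


class RecordParseError(ValueError):
    """Hypothetical error type that records which input line failed."""

    def __init__(self, lineno, cause):
        super().__init__(f"line {lineno}: {cause}")
        self.lineno = lineno


def parse_record(line, lineno):
    """Parse one JSON record, re-raising failures with the line number."""
    try:
        return json.loads(line)
    except json.JSONDecodeError as exc:
        # `from exc` chains the original error so the traceback keeps it.
        raise RecordParseError(lineno, exc) from exc
```

Subclassing `ValueError` keeps any existing `except ValueError` handlers working, since `json.JSONDecodeError` is itself a `ValueError`.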
```bash
curl -H "Authorization: Bearer <JWT>" \
"https://central.isample.xyz/isamples_central/export/download?uuid=..."
```
Copilot (AI), Nov 14, 2025
Inconsistent backslash usage in bash commands. Should use single backslash \ for line continuation in bash commands, not double backslash \\.
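The three curl calls in this review (`export/create`, `export/status`, `export/download`) form a create, poll, download sequence. A standard-library sketch that builds the same authenticated requests without sending them; `export_request` and `poll` are hypothetical helpers, and because the JSON returned by `create` and `status` is not shown here, deciding when an export is ready is left to a caller-supplied `check` function:

```python
import time
from urllib.parse import urlencode
from urllib.request import Request

BASE = "https://central.isample.xyz/isamples_central/export"


def export_request(action, jwt, **params):
    """Build an authenticated GET request mirroring the curl commands,
    e.g. export_request("create", jwt, q="source:SESAR", export_format="jsonl")."""
    if action not in {"create", "status", "download"}:
        raise ValueError(f"unknown export action: {action}")
    url = f"{BASE}/{action}?{urlencode(params, safe=':')}"
    return Request(url, headers={"Authorization": f"Bearer {jwt}"})


def poll(check, interval=5.0, timeout=600.0, sleep=time.sleep):
    """Call check() until it returns a truthy value; raise TimeoutError once
    `timeout` seconds of waiting have elapsed without success."""
    waited = 0.0
    while True:
        result = check()
        if result:
            return result
        if waited >= timeout:
            raise TimeoutError(f"export not ready after {timeout} s")
        sleep(interval)
        waited += interval
```

To send a request, pass it to `urllib.request.urlopen`, for example `urlopen(export_request("status", jwt, uuid=export_uuid))`, where `export_uuid` comes from the create response. The injectable `sleep` argument keeps the polling loop testable without real delays.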
No description provided.