Support maps for CoL Extended taxonomy

### Background
The [Spark module](https://github.com/gbif/maps/tree/master/spark-generate-maps) currently generates HBase tables for point and tile maps using taxonomic identifiers from the GBIF Backbone taxonomy. To support the Catalogue of Life (CoL) Extended release, we need to extend the existing infrastructure to process and store maps based on CoL taxonomic identifiers.  

### Current Architecture
The current implementation: 
- Reads GBIF occurrence data from Avro files
- Extracts GBIF Backbone taxonomic keys: `kingdomKey`, `phylumKey`, `classKey`, `orderKey`, `familyKey`, `genusKey`, `speciesKey`, `taxonKey`
- Generates point maps (for low-occurrence views) and tile pyramids (for high-occurrence views)
- Stores data in HBase tables with multiple EPSG projections (4326, 3857, 3575, 3031)
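
The split between point maps and tile pyramids can be pictured as a simple threshold check. The sketch below is illustrative only: the class, the enum, and the parameter name `tilesThreshold` are hypothetical, and the exact boundary condition (here `>=`) is an assumption rather than something taken from the module.

```java
/** Hypothetical helper illustrating the point-map vs. tile-pyramid split. */
final class MapStorageChoice {

  enum Storage { POINT_MAP, TILE_PYRAMID }

  /**
   * Picks a storage form for one map view.
   * Assumption: views at or above the threshold are stored as tile pyramids,
   * views below it as point maps.
   */
  static Storage choose(long occurrenceCount, long tilesThreshold) {
    return occurrenceCount >= tilesThreshold ? Storage.TILE_PYRAMID : Storage.POINT_MAP;
  }
}
```

Whatever rule the module actually applies, the same decision would need to hold for CoL views, with the threshold coming from the CoL-specific configuration.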

### Proposed Solution
Extend the existing Spark jobs to support CoL identifiers and create parallel processing workflows for CoL Extended release maps.  

### Suggested Tasks

#### 1. **Spark Job Modifications**

**Files requiring changes:**

- [ ] **`MapBuilder.java`**: Update the `readAvroSource()` method to select CoL identifier fields
  - Add CoL taxonomic key columns (e.g., `colKingdomKey`, `colPhylumKey`, etc.) to the `.select()` statement
  - Parameterize taxonomy source to support both GBIF and CoL

- [ ] **`MapKeysUDF.java`**: Extend the UDF to handle CoL taxonomic hierarchies
  - Add support for CoL identifier parameters
  - Ensure map key generation works with CoL taxonomy structure
  - Consider adding a taxonomy source parameter to distinguish between GBIF and CoL keys

- [ ] **`PointMapBuilder.java`** and **`TileMapBuilder.java`**: Update SQL queries to use CoL fields
  - Modify the `mapKeys()` UDF calls to include CoL identifiers
  - Ensure proper grouping and aggregation with CoL taxonomy
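
One possible way to parameterize the taxonomy source is an enum that yields the key column names for each taxonomy. This is a minimal sketch, not the module's code: `TaxonomySource` and its methods are hypothetical, and only `colKingdomKey` and `colPhylumKey` are named in this issue. The remaining CoL column names are assumed to follow the same `col` + rank + `Key` pattern.

```java
import java.util.List;
import java.util.stream.Collectors;

/** Hypothetical taxonomy selector; not part of the existing module. */
enum TaxonomySource {
  GBIF(""),
  COL("col");

  // Ranks taken from the GBIF Backbone key names listed under Current Architecture.
  private static final List<String> RANKS =
      List.of("kingdom", "phylum", "class", "order", "family", "genus", "species", "taxon");

  private final String prefix;

  TaxonomySource(String prefix) {
    this.prefix = prefix;
  }

  /** Column names to select for this taxonomy, in rank order. */
  List<String> keyColumns() {
    return RANKS.stream().map(this::columnFor).collect(Collectors.toList());
  }

  /** Builds e.g. "kingdomKey" for GBIF and "colKingdomKey" for CoL (assumed naming). */
  String columnFor(String rank) {
    String base = rank + "Key";
    if (prefix.isEmpty()) {
      return base;
    }
    return prefix + Character.toUpperCase(base.charAt(0)) + base.substring(1);
  }
}
```

The resulting names could then be passed to the `.select()` call in `readAvroSource()` and on to the `mapKeys()` UDF, so that a single job parameter switches between GBIF and CoL keys.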

#### 2. **Configuration Management**
- [ ] Create separate configuration files for CoL Extended (e.g., `col-extended-dev.yml`, `col-extended-prod.yml`)
  - Define CoL-specific HBase table names (e.g., `col_maps_points`, `col_maps_tiles`)
  - Configure separate target directories for CoL map outputs
  - Set appropriate threshold values for tile pyramid generation
  - Define separate Hive database/table names for CoL processing

#### 3. **Airflow Workflows**
- [ ] Create a new Airflow DAG for CoL point map generation
- [ ] Create a new Airflow DAG for CoL tile map generation
- [ ] Add monitoring and alerting for CoL maps

#### 4. **HBase Table Management**
- [ ] Create separate HBase tables for CoL Extended maps
  - `col_maps_points_<timestamp>` for point maps
  - `col_maps_tiles_<timestamp>` for tile pyramids
- [ ] Update the table creation logic in `PrepareBackfill.java` to support a taxonomy parameter
- [ ] Ensure cleanup and snapshot management work for CoL tables
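
As an illustration of the naming scheme above, the table name could be derived from a taxonomy-specific prefix, the map type, and a creation timestamp. The class `MapTableNames`, its method, and the `yyyyMMdd_HHmm` timestamp format are assumptions made for this sketch. The format used by the existing tables is not stated in this issue.

```java
import java.time.LocalDateTime;
import java.time.format.DateTimeFormatter;
import java.util.Locale;

/** Hypothetical helper for building timestamped HBase table names. */
final class MapTableNames {

  enum MapType { POINTS, TILES }

  // Assumed timestamp layout; adjust to whatever the existing tables use.
  private static final DateTimeFormatter STAMP = DateTimeFormatter.ofPattern("yyyyMMdd_HHmm");

  /** Builds e.g. "col_maps_points_20250101_0930" from the prefix "col_maps". */
  static String tableName(String prefix, MapType type, LocalDateTime createdAt) {
    return prefix + "_" + type.name().toLowerCase(Locale.ROOT) + "_" + STAMP.format(createdAt);
  }
}
```

Keeping the prefix as a parameter means the same code path can serve both the existing tables and the CoL ones, which is also what cleanup and snapshot jobs would need in order to match tables by prefix.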

#### 5. **Testing & Validation**
- [ ] Unit tests for CoL-specific UDF functionality
- [ ] Integration tests with sample CoL occurrence data
- [ ] Validate map generation for various CoL taxonomic ranks

#### 6. **Documentation**
- [ ] Update README.md with CoL Extended support details
- [ ] Document configuration parameters for CoL workflows
- [ ] Add examples for running CoL map generation
- [ ] Update deployment and operational documentation

Support maps for CoL Extended taxonomy #105