Allow arbitrary flavor strings for trainer CLI without code changes #1388

## Background

`TrainerRunner.java` currently contains a hardcoded if-else chain that maps CLI model name strings (e.g., `segmentation-dh-law-footnotes`) to Java trainer classes and `Flavor` enum values. Every time a new document-type flavor is added, this file must be modified. Similarly, the `Flavor` enum in `GrobidModels.java` must be extended.

## Analysis

### Hard technical reasons (code changes remain necessary)
The only genuine code-level constraint is **SAX parser selection**: trainer classes (`SegmentationTrainer`, `HeaderTrainer`, `FulltextTrainer`) select a TEI XML parser based on the `Flavor` enum, because different document types have structurally different TEI annotations. Adding a genuinely new document structure still requires a new parser class.

### What does NOT need to be hardcoded
- **Model/dataset path resolution** already works with arbitrary strings: `GrobidModels.modelFor(String)` creates `GrobidModel` objects for any string, deriving paths from `getFolderName()`. No enum entry is needed.
- **The if-else in `TrainerRunner`**: Only the base model name (e.g., `segmentation`, `header`) determines the trainer class, and that set is small and stable. The flavor part is the one that grows with every new document type, and it does not need its own branch.
- **Parser selection in trainers**: For new flavors that reuse an existing SAX parser (or fall back to the default), no code changes should be needed at all.
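
To illustrate the second point, the base-model dispatch could be reduced to a small lookup table. This is only a sketch: `TrainerDispatch` and `DemoTrainer` are hypothetical stand-ins for the real trainer classes, and the `fulltext` key is assumed from the name `FulltextTrainer`.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.function.Function;

/** Hypothetical sketch: the base model name alone picks the trainer; the flavor is passed through. */
final class TrainerDispatch {

    /** Stand-in for a real trainer such as SegmentationTrainer; it only records what was requested. */
    static final class DemoTrainer {
        final String trainerName;
        final String flavorLabel; // null means "no flavor", i.e. the default model

        DemoTrainer(String trainerName, String flavorLabel) {
            this.trainerName = trainerName;
            this.flavorLabel = flavorLabel;
        }
    }

    // One entry per base model. This table stays stable when new flavors appear.
    private static final Map<String, Function<String, DemoTrainer>> FACTORIES = new LinkedHashMap<>();

    static {
        FACTORIES.put("segmentation", flavor -> new DemoTrainer("SegmentationTrainer", flavor));
        FACTORIES.put("header", flavor -> new DemoTrainer("HeaderTrainer", flavor));
        // Assumed key, inferred from the FulltextTrainer class name.
        FACTORIES.put("fulltext", flavor -> new DemoTrainer("FulltextTrainer", flavor));
    }

    static DemoTrainer create(String baseModel, String flavorLabel) {
        Function<String, DemoTrainer> factory = FACTORIES.get(baseModel);
        if (factory == null) {
            throw new IllegalArgumentException("Unknown base model: " + baseModel);
        }
        return factory.apply(flavorLabel);
    }
}
```

With a table like this, `create("segmentation", "my-domain")` needs no new entry; only a new base model would.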

## Aim

The aim of this issue is to allow flavor names to be arbitrary strings that identify the models or datasets to be used, independent of the codebase. Concretely:

- A user creates a new corpus directory, e.g. `resources/dataset/segmentation/my-domain/corpus`
- They train on it immediately by passing `segmentation/my-domain` as the model argument — **without any Java code changes**
- Only when a new document type requires a new SAX parser does any code need updating (and only in the relevant trainer class, not in `TrainerRunner`)

## Proposed Changes

### 1. New CLI argument format for TrainerRunner

Support `{baseModel}/{flavorLabel}` where the first `/`-delimited segment identifies the trainer class and the remainder is the flavor path (used directly as the model folder suffix):

| CLI argument | Result |
|---|---|
| `segmentation` | `SegmentationTrainer()` — unchanged |
| `segmentation/article/light` | `SegmentationTrainer("article/light")` — same as current `segmentation-light` |
| `segmentation/my-domain` | `SegmentationTrainer("my-domain")` — **new, no code changes needed** |

**Backward compatibility**: Keep all existing hardcoded cases, and add a final `else` branch that parses `{baseModel}/{flavorLabel}` dynamically.
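
A minimal sketch of how the dynamic branch might split the argument, assuming the split happens at the first `/` as described above. `ModelArgument` is a hypothetical helper name, not an existing class in the codebase.

```java
/** Hypothetical helper: splits a CLI model argument into base model and flavor label. */
final class ModelArgument {
    final String baseModel;
    final String flavorLabel; // null when the argument carries no flavor

    private ModelArgument(String baseModel, String flavorLabel) {
        this.baseModel = baseModel;
        this.flavorLabel = flavorLabel;
    }

    static ModelArgument parse(String arg) {
        if (arg == null || arg.isEmpty()) {
            throw new IllegalArgumentException("Model argument must not be empty");
        }
        int slash = arg.indexOf('/');
        if (slash < 0) {
            // No flavor part, e.g. "segmentation"
            return new ModelArgument(arg, null);
        }
        // Everything after the FIRST slash is the flavor path, so "article/light" stays intact.
        String base = arg.substring(0, slash);
        String flavor = arg.substring(slash + 1);
        if (base.isEmpty() || flavor.isEmpty()) {
            throw new IllegalArgumentException("Expected {baseModel}/{flavorLabel}, got: " + arg);
        }
        return new ModelArgument(base, flavor);
    }
}
```

Legacy names such as `segmentation-light` contain no `/`, so they would still be matched by the existing hardcoded cases before this branch is reached.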

### 2. String-based constructor in trainer classes

Add a `String flavorLabel` constructor to `SegmentationTrainer`, `HeaderTrainer`, and `FulltextTrainer`:

```java
public SegmentationTrainer(String flavorLabel) {
    super(GrobidModels.modelFor("segmentation/" + flavorLabel));
    // Flavor.fromLabel returns null for unknown flavors → default parser is used
    this.flavor = Flavor.fromLabel(flavorLabel);
}
```

### 3. `Flavor` enum unchanged

The `Flavor` enum in `GrobidModels.java` stays as the registry for named flavors with custom parsers, but is no longer the only way to express a flavor. Unknown flavor strings fall back to the default parser.
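
The fallback could look roughly like the sketch below. It is self-contained for illustration only: `DemoFlavor` stands in for the real `Flavor` enum, and the parser names are placeholder strings rather than actual parser classes.

```java
/** Hypothetical sketch of the fallback: unknown flavor labels resolve to the default parser. */
final class ParserSelection {

    /** Stand-in for the real Flavor enum; the single constant is illustrative. */
    enum DemoFlavor {
        ARTICLE_LIGHT("article/light");

        final String label;

        DemoFlavor(String label) {
            this.label = label;
        }

        /** Returns the matching constant, or null when the label is not registered. */
        static DemoFlavor fromLabel(String label) {
            for (DemoFlavor flavor : values()) {
                if (flavor.label.equals(label)) {
                    return flavor;
                }
            }
            return null;
        }
    }

    /** Returns a placeholder parser name; real code would instantiate a SAX parser instead. */
    static String parserFor(String flavorLabel) {
        DemoFlavor flavor = DemoFlavor.fromLabel(flavorLabel);
        if (flavor == null) {
            // Unregistered flavor (or no flavor at all): reuse the default parser.
            return "DefaultSaxParser";
        }
        switch (flavor) {
            case ARTICLE_LIGHT:
                return "ArticleLightSaxParser"; // placeholder name for a flavor-specific parser
            default:
                return "DefaultSaxParser";
        }
    }
}
```

Under this scheme a label like `my-domain` trains with the default parser, and a new enum constant is only needed once a flavor requires its own parser.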

### Files to modify

- `grobid-trainer/src/main/java/org/grobid/trainer/TrainerRunner.java`
- `grobid-trainer/src/main/java/org/grobid/trainer/SegmentationTrainer.java`
- `grobid-trainer/src/main/java/org/grobid/trainer/HeaderTrainer.java`
- `grobid-trainer/src/main/java/org/grobid/trainer/FulltextTrainer.java`

## Verification

1. Build: `./gradlew :grobid-trainer:shadowJar --no-daemon`
2. Existing flavor still works: `... 0 segmentation-light -gH grobid-home`
3. New format equivalent: `... 0 segmentation/article/light -gH grobid-home`
4. Arbitrary new flavor (with corpus dir present): `... 0 segmentation/my-domain -gH grobid-home` resolves corpus at `resources/dataset/segmentation/my-domain/corpus`
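
For step 4, a pre-flight check along these lines could confirm that the corpus directory exists before training starts. This is a sketch under stated assumptions: `CorpusLocator` is a hypothetical helper, and it simply appends the model argument to a dataset root, whereas the real resolution goes through `GrobidModels.modelFor(String)` and `getFolderName()`.

```java
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;

/** Hypothetical helper: derives the corpus directory for a {baseModel}/{flavorLabel} argument. */
final class CorpusLocator {

    static Path corpusDir(Path datasetRoot, String modelArg) {
        Path root = datasetRoot.normalize();
        Path resolved = root.resolve(modelArg).resolve("corpus").normalize();
        // Guard against arguments such as "../x" or absolute paths that leave the dataset root.
        if (!resolved.startsWith(root)) {
            throw new IllegalArgumentException("Model argument escapes dataset root: " + modelArg);
        }
        return resolved;
    }

    static boolean corpusExists(Path datasetRoot, String modelArg) {
        return Files.isDirectory(corpusDir(datasetRoot, modelArg));
    }

    public static void main(String[] args) {
        Path root = Paths.get("resources", "dataset");
        System.out.println(corpusDir(root, "segmentation/my-domain"));
        System.out.println(corpusExists(root, "segmentation/my-domain"));
    }
}
```

Since a flavor string is now user-supplied, rejecting arguments that resolve outside the dataset root seems a reasonable precaution, although the issue itself does not require it.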
