🚀 Feat: template structured data expansion #37

### Description
Given a small sample of structured data (e.g., 100 JSON records), generate a much larger dataset (e.g., 1,000 records) that is statistically and semantically similar to the original.

### Constraints
- **Maintaining Statistical Distribution**: the LLM must reproduce not only the types of values but also their frequencies. If 20% of users in the original set are "admin" and 80% are "user", the 1,000-record set should reflect this ratio. This is very difficult for an LLM, which naturally generates values according to linguistic likelihood rather than the statistical distribution of the sample.

- **Preserving Correlations**: the LLM must learn implicit rules. For example, "if plan_type is 'Free', storage_limit is always '1GB', but if plan_type is 'Pro', storage_limit is '10GB' or '50GB'." It needs to generate new, valid combinations of these correlated fields.

- **Avoiding Repetition**: the generated records must be novel and not just slight rephrasings or duplicates of the original 100.
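
To make the three constraints above testable, here is a minimal sketch of checks that could be run on a generated dataset. It is not a proposed solution to the generation problem itself. The function names are hypothetical, the field names (`role`, `plan_type`, `storage_limit`) are taken from the examples above, and it assumes every record is a flat JSON object with categorical values.

```python
import json
from collections import Counter


def frequency_gap(sample, generated, field):
    """Total variation distance between the value frequencies of `field`
    in the sample and in the generated set (0.0 means identical ratios)."""
    def freqs(records):
        counts = Counter(r[field] for r in records)
        total = sum(counts.values())
        return {value: n / total for value, n in counts.items()}

    p, q = freqs(sample), freqs(generated)
    return 0.5 * sum(abs(p.get(v, 0.0) - q.get(v, 0.0)) for v in set(p) | set(q))


def unseen_combinations(sample, generated, fields):
    """Value combinations of `fields` that appear in the generated set
    but never in the sample, e.g. ('Free', '50GB')."""
    allowed = {tuple(r[f] for f in fields) for r in sample}
    produced = {tuple(r[f] for f in fields) for r in generated}
    return produced - allowed


def exact_duplicates(sample, generated):
    """Generated records identical to a sample record or to an earlier
    generated record. Near-duplicates are NOT detected."""
    seen = {json.dumps(r, sort_keys=True) for r in sample}
    dupes = []
    for record in generated:
        key = json.dumps(record, sort_keys=True)
        if key in seen:
            dupes.append(record)
        seen.add(key)
    return dupes
```

For example, `frequency_gap(sample, generated, "role")` should stay close to 0 if the admin/user ratio is preserved, and `unseen_combinations(sample, generated, ["plan_type", "storage_limit"])` should be empty if the plan rules hold. Treating every combination absent from the sample as invalid may be too strict for a 100-record sample, and catching "slight rephrasings" would need a similarity measure in addition to the exact-match check.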

Before starting work on this issue, propose a solution and wait for approval.
