A Go library for inferring JSON Schema from JSON samples. This library analyzes multiple JSON documents and automatically generates a JSON Schema that describes their structure, types, and patterns.
- ✅ Infer basic types: string, boolean, number, integer
- ✅ Detect optional fields: tracks which fields appear in all samples vs. some samples
- ✅ Handle arrays: treats all array items as the same type and infers their schema
- ✅ Nested objects: full support for deeply nested object structures
- ✅ Arrays of objects: infers schemas for complex array items with optional fields
- ✅ Unified format detection: all formats, built-in and custom, are detected through the same mechanism (`FormatDetector` functions)
- ✅ Built-in formats: datetime (ISO 8601), email, UUID, IPv4, IPv6, and URL (HTTP/HTTPS/FTP/FTPS)
- ✅ Custom format detectors: register user-defined format detection functions
- ✅ Configurable: disable built-in formats for full control
- ✅ Predefined types: configure specific field types (e.g., `created_at` as `DateTime`)
- ✅ Flexible root types: supports objects, arrays, and primitives at root level
- ✅ Incremental updates: schema evolves after each sample is added
- ✅ Load/Resume: load previously generated schemas and continue adding samples
- ✅ Tree-based architecture: clean recursive structure for maintainability
- ✅ Max samples limit: optionally limit the number of samples to process
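The unified format-detection idea can be pictured as an ordered list of predicate functions over string values. The stdlib-only sketch below is hypothetical: the `FormatDetector` signature, the format names (for example `uri` for URLs), the detector order, and the rule that a format is kept only when every observed value matches it are assumptions, not the library's documented behavior.

```go
package main

import (
	"fmt"
	"net"
	"net/mail"
	"net/url"
	"regexp"
	"strings"
	"time"
)

// FormatDetector is an assumed signature: it reports whether a string
// value matches one format.
type FormatDetector func(s string) bool

type namedDetector struct {
	name   string // assumed JSON Schema format name
	detect FormatDetector
}

var uuidRe = regexp.MustCompile(`^[0-9a-fA-F]{8}-[0-9a-fA-F]{4}-[0-9a-fA-F]{4}-[0-9a-fA-F]{4}-[0-9a-fA-F]{12}$`)

func isDateTime(s string) bool { _, err := time.Parse(time.RFC3339, s); return err == nil }

func isEmail(s string) bool {
	addr, err := mail.ParseAddress(s)
	return err == nil && addr.Address == s // reject "Name <addr>" forms
}

func isIPv4(s string) bool {
	ip := net.ParseIP(s)
	return ip != nil && ip.To4() != nil && !strings.Contains(s, ":")
}

func isIPv6(s string) bool { return net.ParseIP(s) != nil && strings.Contains(s, ":") }

func isURL(s string) bool {
	u, err := url.Parse(s)
	if err != nil || u.Host == "" {
		return false
	}
	switch u.Scheme {
	case "http", "https", "ftp", "ftps":
		return true
	}
	return false
}

// Built-in detectors, tried in order; custom ones could be appended here.
var detectors = []namedDetector{
	{"date-time", isDateTime},
	{"email", isEmail},
	{"uuid", uuidRe.MatchString},
	{"ipv4", isIPv4},
	{"ipv6", isIPv6},
	{"uri", isURL},
}

// detectFormat returns the first format that every sample satisfies, or "".
func detectFormat(samples []string) string {
	for _, d := range detectors {
		all := len(samples) > 0
		for _, s := range samples {
			if !d.detect(s) {
				all = false
				break
			}
		}
		if all {
			return d.name
		}
	}
	return ""
}

func main() {
	fmt.Println(detectFormat([]string{"2023-01-15T10:30:00Z", "2023-02-20T14:45:00Z"})) // date-time
	fmt.Println(detectFormat([]string{"192.168.0.1"}))                                  // ipv4
}
```

Requiring all values to match keeps a field with one stray non-email value from being labeled `email`.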
- Go 1.25 or higher
```bash
go get github.com/JLugagne/jsonschema-infer
```

- Usage Guide - Detailed examples and best practices
- API Documentation - Complete API reference
- Architecture - Internal design and algorithms
- Examples - Runnable example programs
```go
package main

import (
	"fmt"

	jsonschema "github.com/JLugagne/jsonschema-infer"
)

func main() {
	// Create a new generator
	generator := jsonschema.New()

	// Add JSON samples
	generator.AddSample(`{"name": "John", "age": 30, "active": true}`)
	generator.AddSample(`{"name": "Jane", "age": 25, "active": false}`)
	generator.AddSample(`{"name": "Bob", "age": 35}`)

	// Generate the schema
	schema, err := generator.Generate()
	if err != nil {
		panic(err)
	}

	fmt.Println(schema)
}
```

Output:
```json
{
  "$schema": "http://json-schema.org/draft-07/schema#",
  "type": "object",
  "properties": {
    "name": {
      "type": "string"
    },
    "age": {
      "type": "integer"
    },
    "active": {
      "type": "boolean"
    }
  },
  "required": ["name", "age"]
}
```

Note: `active` is not in `required` because it doesn't appear in all samples.
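The required/optional rule can be sketched by counting how many samples contain each top-level key and marking a key required only when that count equals the number of samples. This stdlib-only sketch is a hypothetical illustration, not the library's code: `fieldStats`, `jsonType`, and `inferRequired` are made-up names. It also shows one way to separate `integer` from `number`, since `encoding/json` decodes every number into `float64` when the target is `any`.

```go
package main

import (
	"encoding/json"
	"fmt"
	"math"
	"sort"
)

// fieldStats is a hypothetical accumulator for one top-level key.
type fieldStats struct {
	count int             // number of samples containing the key
	types map[string]bool // JSON Schema types observed for the key
}

// jsonType maps a decoded encoding/json value to a JSON Schema type name.
func jsonType(v any) string {
	switch x := v.(type) {
	case nil:
		return "null"
	case bool:
		return "boolean"
	case string:
		return "string"
	case float64:
		if x == math.Trunc(x) { // whole numbers are reported as integers
			return "integer"
		}
		return "number"
	case []any:
		return "array"
	default:
		return "object"
	}
}

// inferRequired returns per-key stats and the sorted keys present in every sample.
func inferRequired(samples []string) (map[string]*fieldStats, []string, error) {
	stats := map[string]*fieldStats{}
	for _, s := range samples {
		var obj map[string]any
		if err := json.Unmarshal([]byte(s), &obj); err != nil {
			return nil, nil, err
		}
		for k, v := range obj {
			st, ok := stats[k]
			if !ok {
				st = &fieldStats{types: map[string]bool{}}
				stats[k] = st
			}
			st.count++
			st.types[jsonType(v)] = true
		}
	}
	var required []string
	for k, st := range stats {
		if st.count == len(samples) {
			required = append(required, k)
		}
	}
	sort.Strings(required) // map iteration order is random
	return stats, required, nil
}

func main() {
	stats, required, err := inferRequired([]string{
		`{"name": "John", "age": 30, "active": true}`,
		`{"name": "Jane", "age": 25, "active": false}`,
		`{"name": "Bob", "age": 35}`,
	})
	if err != nil {
		panic(err)
	}
	fmt.Println(required)                                             // [age name]
	fmt.Println(stats["active"].count, stats["age"].types["integer"]) // 2 true
}
```

Unlike the output above, the sketch sorts `required` alphabetically to get a stable order.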
Configure specific fields to have predefined types:
```go
generator := jsonschema.New(
	jsonschema.WithPredefined("created_at", jsonschema.DateTime),
	jsonschema.WithPredefined("updated_at", jsonschema.DateTime),
)

generator.AddSample(`{"id": 1, "created_at": "2023-01-15T10:30:00Z"}`)
generator.AddSample(`{"id": 2, "created_at": "2023-02-20T14:45:00Z"}`)

schema, _ := generator.Generate()
```

Available predefined types:

- `DateTime` - string with `date-time` format
- `String` - string type
- `Boolean` - boolean type
- `Number` - number type
- `Integer` - integer type
- `Array` - array type
- `Object` - object type
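One way to picture predefined types is as a lookup keyed by field name that takes precedence over whatever inference produced. The sketch below is hypothetical: `typeSpec`, the local `DateTime`/`String`/`Integer` values, and `resolve` are stand-ins rather than the library's `jsonschema.DateTime` and friends, and matching on the bare field name (not a full path) is an assumption.

```go
package main

import "fmt"

// typeSpec is a hypothetical pairing of a JSON Schema type and optional format.
type typeSpec struct {
	Type   string
	Format string
}

// Local stand-ins for some of the predefined types listed above.
var (
	DateTime = typeSpec{Type: "string", Format: "date-time"}
	String   = typeSpec{Type: "string"}
	Integer  = typeSpec{Type: "integer"}
)

// resolve returns the predefined spec for a field if one is configured,
// otherwise the inferred spec.
func resolve(field string, inferred typeSpec, predefined map[string]typeSpec) typeSpec {
	if spec, ok := predefined[field]; ok {
		return spec
	}
	return inferred
}

func main() {
	predefined := map[string]typeSpec{
		"created_at": DateTime,
		"updated_at": DateTime,
	}
	// Inference alone would only see a plain string for created_at.
	fmt.Println(resolve("created_at", String, predefined)) // {string date-time}
	fmt.Println(resolve("id", Integer, predefined))        // {integer }
}
```

Letting the configured type win avoids depending on automatic detection for fields whose type is already known.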
The library handles arrays of objects and detects optional fields within array items:
```go
generator := jsonschema.New()

generator.AddSample(`{
  "users": [
    {"id": 1, "name": "John", "email": "john@example.com"},
    {"id": 2, "name": "Jane"}
  ]
}`)

generator.AddSample(`{
  "users": [
    {"id": 3, "name": "Bob", "email": "bob@example.com"}
  ]
}`)

schema, _ := generator.Generate()
```

The resulting schema will show that `email` is optional in the array items because it doesn't appear in every object.
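Because all array items are treated as the same type, the item objects from every sample can be pooled and the same counting rule applied to them: a key is required in `items` only if it appears in every pooled object. The stdlib-only sketch below assumes that pooling strategy; `itemRequired` is a hypothetical helper, not part of the library.

```go
package main

import (
	"encoding/json"
	"fmt"
	"sort"
)

// itemRequired pools the objects found in the array under key across all
// samples and returns the keys present in every pooled object.
func itemRequired(key string, samples []string) ([]string, error) {
	counts := map[string]int{}
	total := 0
	for _, s := range samples {
		var doc map[string]any
		if err := json.Unmarshal([]byte(s), &doc); err != nil {
			return nil, err
		}
		items, _ := doc[key].([]any)
		for _, it := range items {
			obj, ok := it.(map[string]any)
			if !ok {
				continue // non-object items are ignored in this sketch
			}
			total++
			for k := range obj {
				counts[k]++
			}
		}
	}
	var required []string
	for k, c := range counts {
		if c == total {
			required = append(required, k)
		}
	}
	sort.Strings(required)
	return required, nil
}

func main() {
	req, err := itemRequired("users", []string{
		`{"users": [{"id": 1, "name": "John", "email": "john@example.com"}, {"id": 2, "name": "Jane"}]}`,
		`{"users": [{"id": 3, "name": "Bob", "email": "bob@example.com"}]}`,
	})
	if err != nil {
		panic(err)
	}
	fmt.Println(req) // [id name]
}
```

With three pooled objects, `email` appears only twice, so it stays optional.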
Load a previously generated schema and continue adding samples:
```go
// Generate initial schema
generator1 := jsonschema.New()
generator1.AddSample(`{"name": "John", "age": 30}`)
schemaJSON, _ := generator1.Generate()

// Later, load the schema and add more samples
generator2 := jsonschema.New()
err := generator2.Load(schemaJSON)
if err != nil {
	panic(err)
}

// Add new samples with additional fields
generator2.AddSample(`{"name": "Jane", "age": 25, "email": "jane@example.com"}`)

// Generate updated schema
updatedSchema, _ := generator2.Generate()
```

Retrieve the current schema as a `Schema` object after any sample:
```go
generator := jsonschema.New()
generator.AddSample(`{"name": "John"}`)

// Get the current schema as an object (not a JSON string)
schema := generator.GetCurrentSchema()

// Access properties
fmt.Println(schema.Type)                    // "object"
fmt.Println(schema.Properties["name"].Type) // "string"
```

Build the library:

```bash
go build
```

Or use the Makefile:

```bash
make build
```

Run the tests:

```bash
go test -v
```

Or use the Makefile:

```bash
make test
```

Generate a coverage report:

```bash
make test-coverage
```

This generates `coverage.html`, which you can open in a browser.
The library uses a tree-based recursive architecture:
- `SchemaNode`: each node represents a part of the JSON structure
  - Handles primitive values (string, number, boolean, null) directly
  - Delegates to child nodes for complex types (arrays, objects)
  - Accumulates observations across all samples
- Incremental updates: the schema is rebuilt after each `AddSample()` call
  - No need to wait until all samples are collected
  - Schema evolution can be inspected at any point
- Optional field detection: tracks how many times each field appears
  - Fields appearing in all samples → required
  - Fields appearing in some samples → optional
See the examples/ directory for runnable examples:
- basic - Basic type inference and optional fields
- arrays - Arrays of objects with optional fields
- datetime - Automatic datetime detection
- predefined - Configuring predefined types
- load_resume - Loading and resuming schemas
- nested - Deeply nested structures
- incremental - Watching schema evolution
Run all examples:
```bash
cd examples
./run-examples.sh
```

This library brings sample-based JSON Schema inference to Go. Similar functionality exists in other languages:
- Python: genson - similar approach
- JavaScript: @jsonhero/schema-infer
- Online: jsonschema.net - web-based tool
Key advantages of jsonschema-infer:
- ✅ Pure Go implementation
- ✅ Incremental schema updates
- ✅ Load/resume capability
- ✅ Tree-based recursive architecture
- ✅ Optional field frequency tracking
Contributions are welcome! Please feel free to submit issues and pull requests.
[Specify your license here]
- The library uses Go's standard `encoding/json` package for JSON parsing
- All array items are treated as having the same schema (merged together)
- Multiple type detection is supported (e.g., a field that is sometimes a string and sometimes a number)
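In draft-07, the `type` keyword may hold an array, so a mixed field can be written as `"type": ["number", "string"]`. The sketch below shows one way to collapse observed types into that form. `typeKeyword` is a hypothetical helper. Widening `integer` to `number` when both appear is also an assumption, not confirmed library behavior.

```go
package main

import (
	"fmt"
	"sort"
)

// typeKeyword collapses observed JSON Schema type names into the value of
// a "type" keyword: a single string, or a sorted array for mixed types.
func typeKeyword(observed []string) any {
	set := map[string]bool{}
	for _, t := range observed {
		set[t] = true
	}
	// Every integer is also a number, so keep only "number" when both occur.
	if set["integer"] && set["number"] {
		delete(set, "integer")
	}
	types := make([]string, 0, len(set))
	for t := range set {
		types = append(types, t)
	}
	sort.Strings(types)
	if len(types) == 1 {
		return types[0]
	}
	return types
}

func main() {
	fmt.Println(typeKeyword([]string{"string", "string"}))            // string
	fmt.Println(typeKeyword([]string{"string", "integer", "number"})) // [number string]
}
```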