to_json_schema

deep_dataclasses.to_json_schema(cls, strict=False, allow_additional_properties=False)[source]

Generate a JSON Schema object for a dataclass.

Recursively converts the field types of cls to their JSON Schema equivalents. The resulting schema can be used directly with any JSON Schema validator (e.g. jsonschema.validate).

The following Python types are supported:

  • Primitives: bool, int, float, str, None / type(None)

  • Collections: List[T], Tuple[T, ...], Tuple[T1, T2, ...], Set[T], FrozenSet[T], Dict[K, V]

  • Composites: Optional[T], Union[A, B, ...], Literal[...]

  • Nested dataclasses (recursed automatically)

  • typing.Any (no constraint)

Parameters:
  • cls (type) – A @dataclass or @deep_dataclass class.

  • strict (bool, default False) –

    When False (default), only fields without a default value and not typed as Optional are listed under "required". This allows partial dicts to validate successfully as long as omitted fields have defaults.

    When True, every field — even those with defaults — is added to "required" unless explicitly typed as Optional. Use this when you want to enforce that all fields are always present.

  • allow_additional_properties (bool, default False) –

    When False (default), "additionalProperties": false is added to the schema, rejecting any keys not declared as fields. This is the right choice for closed schemas such as Union variant discrimination.

    When True, extra keys are silently accepted. Useful when the dataclass represents a partial view of a larger document.

Returns:

A JSON Schema object with the following keys:

"type"

Always "object".

"properties"

A dict mapping each field name to its type schema, including a "default" key when the field has a default value or factory.

"required"

List of field names that must be present (only included when at least one field is required).

"additionalProperties"

False unless allow_additional_properties is True.

Return type:

dict

Notes

The validate-then-construct pattern works end-to-end with @deep_dataclass because construction coerces nested dicts to their declared types. With plain @dataclass, validation succeeds but nested dicts are not coerced, so the constructed object may contain raw dicts in place of typed fields.

Examples

Basic usage — validate a raw dict before constructing:

>>> import jsonschema
>>> from dataclasses import dataclass
>>> from deep_dataclasses import deep_dataclass, to_json_schema
>>>
>>> @deep_dataclass
... class Config:
...     class Optimizer:
...         lr: float = 1e-3
...         momentum: float = 0.9
...     epochs: int = 100
>>>
>>> schema = to_json_schema(Config)
>>> jsonschema.validate({"Optimizer": {"lr": 0.01}, "epochs": 50}, schema)
>>> cfg = Config(**{"Optimizer": {"lr": 0.01}, "epochs": 50})
>>> cfg.Optimizer.lr
0.01

strict=True requires every field to be present, even those with defaults:

>>> strict = to_json_schema(Config, strict=True)
>>> jsonschema.validate({"epochs": 50}, strict)  # raises — Optimizer missing
Traceback (most recent call last):
    ...
jsonschema.exceptions.ValidationError: 'Optimizer' is a required property

allow_additional_properties=True accepts extra keys:

>>> open_schema = to_json_schema(Config, allow_additional_properties=True)
>>> jsonschema.validate({"epochs": 10, "unknown_key": 42}, open_schema)

Literal fields are enforced via "enum":

>>> from typing import Literal
>>>
>>> @dataclass
... class Run:
...     device: Literal["cpu", "cuda"] = "cpu"
>>>
>>> jsonschema.validate({"device": "tpu"}, to_json_schema(Run))
Traceback (most recent call last):
    ...
jsonschema.exceptions.ValidationError: 'tpu' is not valid under any of the given schemas

Overview

to_json_schema converts any @dataclass or @deep_dataclass class into a JSON Schema object descriptor. The resulting dict can be passed directly to any JSON Schema validator.

to_json_schema works with any standard dataclass — not only classes produced by @deep_dataclass. If you already have plain @dataclass classes, you can export and validate their schemas without any changes.


Basic usage

import jsonschema
from deep_dataclasses import deep_dataclass, to_json_schema

@deep_dataclass
class Config:
    class Optimizer:
        lr: float = 1e-3
        momentum: float = 0.9
    epochs: int = 100

schema = to_json_schema(Config)
# {'type': 'object',
#  'properties': {
#      'Optimizer': {'type': 'object', 'properties': {...}, ...},
#      'epochs':    {'type': 'integer', 'default': 100}
#  },
#  'additionalProperties': False}

jsonschema.validate({"epochs": 50}, schema)          # passes — Optimizer has a default
jsonschema.validate({"epochs": "fifty"}, schema)     # raises — wrong type

Supported types

Python type

JSON Schema

bool

{"type": "boolean"}

int

{"type": "integer"}

float

{"type": "number"}

str

{"type": "string"}

None / type(None)

{"type": "null"}

typing.Any

{} (no constraint)

List[T]

{"type": "array", "items": <T schema>}

Set[T] / FrozenSet[T]

{"type": "array", "uniqueItems": true, "items": <T schema>}

Tuple[T, ...]

{"type": "array", "items": <T schema>}

Tuple[T1, T2, T3]

fixed-length array with prefixItems

Dict[K, V]

{"type": "object", "additionalProperties": <V schema>}

Optional[T]

{"anyOf": [<T schema>, {"type": "null"}]}

Union[A, B, …]

{"anyOf": [<A schema>, <B schema>, …]}

Literal[x, y, …]

{"enum": [x, y, …]}

Nested dataclass

Recursed — full object schema inline


strict mode

By default (strict=False), only fields that have no default value and are not Optional are listed under "required". This means a partial dict validates successfully as long as the missing fields have defaults:

schema = to_json_schema(Config)               # strict=False
jsonschema.validate({"epochs": 50}, schema)  # passes — Optimizer has a default

With strict=True, every field is required unless it is Optional[T]:

strict = to_json_schema(Config, strict=True)
jsonschema.validate({"epochs": 50}, strict)
# ValidationError: 'Optimizer' is a required property

Use strict=True when you want to enforce that callers always provide every field explicitly, for example in an API boundary or config file validator.


allow_additional_properties

By default "additionalProperties": false is added to the schema, so any key not declared as a field is rejected. Pass allow_additional_properties=True to accept extra keys silently — useful when the dataclass represents a partial view of a larger document:

open_schema = to_json_schema(Config, allow_additional_properties=True)
jsonschema.validate({"epochs": 10, "experiment_id": "run-42"}, open_schema)

Validate-then-construct: a poor man’s pydantic

The combination of to_json_schema and @deep_dataclass’s dict coercion gives you a lightweight validate-then-construct pattern that covers most use cases where you might otherwise reach for Pydantic:

import json, jsonschema
from deep_dataclasses import deep_dataclass, to_json_schema
from typing import Literal, List
from dataclasses import field

@deep_dataclass
class TrainingRun:
    class Optimizer:
        lr: float = 1e-3
        momentum: float = 0.9
    epochs: int = 100
    device: Literal["cpu", "cuda"] = "cpu"
    tags: List[str] = field(default_factory=list)

schema = to_json_schema(TrainingRun)

# --- load and validate raw data (e.g. from a config file) ---
raw = json.loads(open("run.json").read())
jsonschema.validate(raw, schema)     # raises on bad data — before any construction

# --- construct fully-typed object ---
run = TrainingRun(**raw)
isinstance(run.Optimizer, TrainingRun.Optimizer)  # True — dicts coerced recursively

Compared to Pydantic:

to_json_schema + @deep_dataclass

Pydantic

Validation

At construction (jsonschema)

At construction

Type coercion

Dataclass-typed fields only

All fields

Schema export

to_json_schema(cls)

Model.model_json_schema()

Dependencies

jsonschema (optional)

pydantic

stdlib compatibility

Full @dataclass

Separate BaseModel

Nested dict coercion

Field validators

❌ (use __post_init__)

The key trade-off: @deep_dataclass stays in the stdlib dataclass world — you get full compatibility with everything that consumes dataclasses, at the cost of no per-field validators.


Optional and Union schemas

Optional[T] becomes anyOf: [<T schema>, null]:

from typing import Optional
from dataclasses import dataclass

@dataclass
class Record:
    name: str
    score: Optional[float] = None

to_json_schema(Record)
# {'type': 'object',
#  'properties': {
#      'name':  {'type': 'string'},
#      'score': {'anyOf': [{'type': 'number'}, {'type': 'null'}], 'default': None}
#  },
#  'required': ['name'],
#  'additionalProperties': False}

Union[A, B] where both variants are dataclasses becomes anyOf with their full inline schemas, which enables union best-match coercion at construction time:

from typing import Union
from dataclasses import field
from deep_dataclasses import deep_dataclass, auxiliary, to_json_schema

@deep_dataclass
class Config:
    @auxiliary
    class TrainMode:
        lr: float = 1e-3
    @auxiliary
    class TestMode:
        metric: str = "accuracy"
    mode: Union[TrainMode, TestMode] = field(default_factory=TrainMode)

schema = to_json_schema(Config)
# mode schema: {'anyOf': [{'type': 'object', ...TrainMode...},
#                         {'type': 'object', ...TestMode...}]}

Using with plain @dataclass

to_json_schema works on any standard dataclass. The only limitation is that construction will not coerce nested dicts — that feature requires @deep_dataclass. Validation still works correctly:

from dataclasses import dataclass
from deep_dataclasses import to_json_schema
import jsonschema

@dataclass
class Point:
    x: float
    y: float

schema = to_json_schema(Point)
jsonschema.validate({"x": 1.0, "y": 2.0}, schema)   # passes
jsonschema.validate({"x": "bad", "y": 2.0}, schema)  # raises

Integration with validate_defaults

The validate_defaults decorator from deep_dataclasses.extras uses to_json_schema internally to check that a class’s own default instance satisfies its schema. This is useful as a module-load-time sanity check on configuration classes:

from deep_dataclasses import deep_dataclass
from deep_dataclasses.extras import validate_defaults

@validate_defaults
@deep_dataclass
class Config:
    device: Literal["cpu", "cuda"] = "cpu"  # checked at import time