quicktype under the hood
The Pipeline
quicktype does its magic in three stages, much like a classic compiler:
- Read: quicktype reads the input and converts it to an internal representation. The formats it can read so far are JSON and JSON Schema. If the input is JSON, it also has to unify the types of heterogeneous values at this point, as described in the Read section below.
- Simplify: this stage only runs for JSON input. In this stage, quicktype tries to simplify the internal representation it constructed in the Read stage. So far, it does two things: unify classes that look similar, and detect when it's better to represent a class as a map.
- Render: quicktype renders the simplified internal representation as source code in Java, C#, Swift, TypeScript, Go, JSON Schema,[1] etc.
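As a rough illustration, the three stages can be read as a composition of functions in which Simplify is skipped for JSON Schema input. This is a toy TypeScript sketch: `read`, `simplify`, `render`, `runPipeline`, and the stage-recording `IR` are hypothetical stand-ins, not quicktype's API (quicktype itself is written in PureScript).

```typescript
// Toy model of the pipeline: the "IR" only records the input kind and which stages ran.
// read, simplify, render, and runPipeline are hypothetical names, not quicktype functions.
type InputKind = "json" | "schema";

interface IR {
  input: InputKind;
  stages: string[];
}

function read(kind: InputKind): IR {
  return { input: kind, stages: ["read"] };
}

function simplify(ir: IR): IR {
  return { ...ir, stages: [...ir.stages, "simplify"] };
}

function render(ir: IR, language: string): string {
  return `${language} code from ${ir.stages.join(" -> ")}`;
}

function runPipeline(kind: InputKind, language: string): string {
  const ir = read(kind);
  // Simplify only runs for JSON samples; a JSON Schema is used exactly as given.
  const simplified = ir.input === "json" ? simplify(ir) : ir;
  return render(simplified, language);
}

console.log(runPipeline("json", "C#")); // C# code from read -> simplify
console.log(runPipeline("schema", "Go")); // Go code from read
```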
Types
Before exploring these stages in detail, let's discuss the internal representation. Here's quicktype's definition of what a JSON value's type can be:
data IRType
  = IRNoInformation
  | IRAnyType
  | IRNull
  | IRInteger
  | IRDouble
  | IRBool
  | IRString
  | IRArray IRType
  | IRClass Int
  | IRMap IRType
  | IRUnion IRUnionRep
Most of these correspond directly to JSON types:
- IRNull, IRBool, and IRString represent null values, booleans, and strings, respectively.
- IRInteger and IRDouble are for numbers. JSON doesn't distinguish between integers and doubles semantically, but many applications do, so quicktype does, too.
- IRArray is for arrays with elements of a specific type, for example [1, 2, 3, 4, 5]. In JSON, arrays are not homogeneous (each element can be any type), but quicktype keeps arrays homogeneous if possible. Once it can't do that any longer (e.g. for the array [1, true, "foo"]), it uses IRUnion, which we discuss below.
- IRClass represents JSON objects, such as { "name": "Frank", "age": 34 }, with a specific set of property names and types. It's weird that it has an associated integer and nothing else; we'll get into that later.
- IRMap also represents JSON objects, but only where quicktype has determined (in the Simplify stage) that the property names are probably not fixed, and that it's better to consider the object a map from strings to a fixed type, like in this U.S. climate data sample, where data is rendered as a map type.
- quicktype uses IRNoInformation when it knows nothing about a type. This currently only happens with empty arrays: when quicktype encounters an empty array, it represents it as IRArray IRNoInformation. Before quicktype renders output, it replaces all IRNoInformations with IRAnyType.
- The permitted values for IRAnyType can be anything, hence the name. The array type IRArray IRAnyType, for example, will accept any combination of element types, such as in [1, true, "foo"]. IRAnyType can come about not only via IRNoInformation, but also when reading JSON Schema, which can express the concept directly.
- Finally, IRUnion describes values that can be any one type within a set of types (e.g. an integer or a string). When quicktype encounters heterogeneous JSON values, it creates an IRUnion rather than simply inferring a more general type like object. Heterogeneous values are extremely common in JSON, and this approach preserves more type information. We'll see how those IRUnions come about when we talk about transforming the input into the internal representation.
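For readers more at home in TypeScript, the same type can be sketched as a discriminated union. This is a hypothetical mirror, not quicktype's code: IRUnionRep is simplified to a plain list of members, and `replaceNoInformation` (our name) shows the IRNoInformation-to-IRAnyType replacement on a single type only; a full pass would also have to visit the classes that IRClass indices point to.

```typescript
// Hypothetical TypeScript mirror of IRType; IRUnionRep is simplified to a list.
type IRType =
  | { kind: "IRNoInformation" }
  | { kind: "IRAnyType" }
  | { kind: "IRNull" }
  | { kind: "IRInteger" }
  | { kind: "IRDouble" }
  | { kind: "IRBool" }
  | { kind: "IRString" }
  | { kind: "IRArray"; items: IRType }
  | { kind: "IRClass"; index: number } // index into a class table
  | { kind: "IRMap"; values: IRType } // map from strings to `values`
  | { kind: "IRUnion"; members: IRType[] };

// Replace IRNoInformation with IRAnyType everywhere inside a type.
function replaceNoInformation(t: IRType): IRType {
  switch (t.kind) {
    case "IRNoInformation":
      return { kind: "IRAnyType" };
    case "IRArray":
      return { kind: "IRArray", items: replaceNoInformation(t.items) };
    case "IRMap":
      return { kind: "IRMap", values: replaceNoInformation(t.values) };
    case "IRUnion":
      return { kind: "IRUnion", members: t.members.map(replaceNoInformation) };
    default:
      return t; // primitives and class references are left as they are
  }
}

// The type read for an empty array, before and after the replacement.
const emptyArray: IRType = { kind: "IRArray", items: { kind: "IRNoInformation" } };
console.log(JSON.stringify(replaceNoInformation(emptyArray)));
// {"kind":"IRArray","items":{"kind":"IRAnyType"}}
```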
The Class Graph
Many data formats contain some self-referential parts. Let's say you have some JSON data for a family tree. Each object describing a person could have a biologicalMother field, the value of which would also be a person. The type for person, therefore, is self-referential. quicktype's internal representation is able to represent such self-referential types. PureScript (the language quicktype is implemented in) cannot construct self-referential values[2], so quicktype does this indirectly through a table. Each slot in the table stores the data for a class, and classes are referred to via integer indices into that table.[3]
Entries in that class table look like this:
newtype IRClassData = IRClassData
  { names :: Named (Set String)
  , properties :: Map String IRType
  }
names is a value used to name classes, which we will cover in a future blog post. properties is a map with the IRTypes of all the class's properties.
As an example, let's represent a classic binary tree as an IRType. The equivalent PureScript type would be:
newtype Tree = Tree
  { data :: Int
  , left :: Maybe Tree
  , right :: Maybe Tree
  }
An example tree in JSON:
{
"data": 31415,
"left": { "data": 9265, "left": null, "right": null },
"right": null
}
left and right are either null or a tree, so they have to be IRUnions of IRNull and an IRClass. The Int of that class is the index of the table slot holding the IRClassData of the Tree class itself, so if we put the Tree class at index 123, its IRType would be IRClass 123, and its IRClassData (at slot 123 in the table) would look like this (IRUnion details omitted, and Map syntax simplified):
IRClassData
  { names: ...
  , properties:
      { "data": IRInteger
      , "left": IRUnion [ IRNull, IRClass 123 ]
      , "right": IRUnion [ IRNull, IRClass 123 ]
      }
  }
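To see why an integer index is enough, here is a small TypeScript sketch of the same Tree entry. It is an illustration under stated assumptions: the table is a JS `Map` keyed by slot number rather than a Seq (so slot 123 needs no predecessors), `names` is omitted, and `matches` is a hypothetical checker (not part of quicktype) that follows IRClass indices through the table to test the example JSON above against the type.

```typescript
// Sketch of the Tree class entry; IRClass refers to a table slot by index.
type IRType =
  | { kind: "IRNull" }
  | { kind: "IRInteger" }
  | { kind: "IRClass"; index: number }
  | { kind: "IRUnion"; members: IRType[] };

interface IRClassData {
  properties: Map<string, IRType>; // `names` omitted
}

const TREE_SLOT = 123;
const classTable = new Map<number, IRClassData>();
const treeRef: IRType = { kind: "IRClass", index: TREE_SLOT };
const nullableTree: IRType = { kind: "IRUnion", members: [{ kind: "IRNull" }, treeRef] };

// Slot 123 mentions itself only through the number 123: no cyclic value is built.
classTable.set(TREE_SLOT, {
  properties: new Map<string, IRType>([
    ["data", { kind: "IRInteger" }],
    ["left", nullableTree],
    ["right", nullableTree],
  ]),
});

// Hypothetical checker: does a parsed JSON value fit the type? Class lookups go via the table.
function matches(value: unknown, t: IRType): boolean {
  switch (t.kind) {
    case "IRNull":
      return value === null;
    case "IRInteger":
      return typeof value === "number" && Number.isInteger(value);
    case "IRUnion":
      return t.members.some((m) => matches(value, m));
    case "IRClass": {
      const entry = classTable.get(t.index);
      if (entry === undefined || typeof value !== "object" || value === null || Array.isArray(value)) {
        return false;
      }
      const obj = value as Record<string, unknown>;
      let ok = true;
      entry.properties.forEach((propType, name) => {
        ok = ok && matches(obj[name], propType);
      });
      return ok;
    }
  }
}

const example = JSON.parse(`{
  "data": 31415,
  "left": { "data": 9265, "left": null, "right": null },
  "right": null
}`);
console.log(matches(example, treeRef)); // true
console.log(matches({ data: 31415, left: { data: "x" }, right: null }, treeRef)); // false
```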
Finally, the type that pulls together everything we've discussed so far is IRGraph:
newtype IRGraph = IRGraph
  { classes :: Seq Entry
  , toplevels :: Map String IRType
  }
classes is the class table we just covered[4], and toplevels holds one or more top-level types: the top-level types of your JSON sample data or JSON Schema. The quicktype.io web app currently allows only a single top-level type, but the quicktype CLI accepts any number of them. Note that the top-level types don't have to be IRClasses; quicktype will happily accept an array as a top-level input, or even a primitive type, like a boolean[5].
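Because recursive classes refer to themselves only through indices, code that walks an IRGraph can detect cycles simply by remembering which slots it has visited. The TypeScript sketch below illustrates this on a simplified graph in which `classes` is an array of property maps and `toplevels` is a `Map`; `danglingClassRefs` is a hypothetical consistency check, not a quicktype function. It also shows a top-level that is not a class.

```typescript
// Simplified IRGraph: classes is the class table, toplevels names the top-level types.
type IRType =
  | { kind: "IRBool" }
  | { kind: "IRString" }
  | { kind: "IRArray"; items: IRType }
  | { kind: "IRClass"; index: number };

interface IRGraph {
  classes: Array<Map<string, IRType>>; // slot i holds class i's properties
  toplevels: Map<string, IRType>;
}

// Hypothetical check: class indices reachable from the top-levels that have no table slot.
// `seen` stops the walk from looping forever on recursive classes.
function danglingClassRefs(graph: IRGraph): number[] {
  const seen = new Set<number>();
  const dangling: number[] = [];
  const visit = (t: IRType): void => {
    if (t.kind === "IRArray") {
      visit(t.items);
    } else if (t.kind === "IRClass" && !seen.has(t.index)) {
      seen.add(t.index);
      const props = graph.classes[t.index];
      if (props === undefined) dangling.push(t.index);
      else props.forEach(visit);
    }
  };
  graph.toplevels.forEach(visit);
  return dangling;
}

const graph: IRGraph = {
  classes: [
    new Map<string, IRType>([
      ["name", { kind: "IRString" }],
      ["friends", { kind: "IRArray", items: { kind: "IRClass", index: 0 } }], // recursive
    ]),
  ],
  toplevels: new Map<string, IRType>([
    ["Person", { kind: "IRClass", index: 0 }],
    ["Flags", { kind: "IRArray", items: { kind: "IRBool" } }], // not a class
    ["Broken", { kind: "IRClass", index: 7 }], // no slot 7 in the table
  ]),
};
console.log(danglingClassRefs(graph).join(", ")); // 7
```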
Read
The Read stage is simple except for one detail. For the most part, it just converts JSON values into their corresponding IRTypes as described above. The one complication arises when it encounters arrays that have elements of more than one type. To solve this, it "unifies" the element types. These are the rules of unification:
- Unifying any type T with itself gives that same type T.
- Unifying IRInteger and IRDouble gives IRDouble.
- Arrays are unified with other arrays by unifying their element types.
- Classes are unified with other classes by building the union of their properties. Where the classes have properties with the same name but different types, the types of those properties are unified.
- Unifying IRNoInformation with any type T gives that same type T. Since IRNoInformation is the element type of empty arrays, it might seem that IRNoInformation would never have to be unified with any other type. Consider, however, the array [ [1, 2, 3], [] ]. The first element array has type IRArray IRInteger, and the second one IRArray IRNoInformation. These two types have to be unified to get the element type of the whole array, and per the IRArray rule it's therefore necessary to unify IRInteger and IRNoInformation. Per this very rule, the result of that is IRInteger, so the unified element type is IRArray IRInteger, and the whole array has type IRArray (IRArray IRInteger).
- Unifying IRAnyType with any type T results in IRAnyType. Unifying two types always generalizes, and IRAnyType can't be generalized any further.
- Unifying two IRUnions: IRUnions cannot directly contain other IRUnions, so the catch-all rule (putting both types into a new IRUnion) cannot unify them. Instead, quicktype forms a new IRUnion containing the union of the types found in the two IRUnions; however, in our current implementation, an IRUnion can contain at most one class, one array, and one map[6]. When unifying two unions that both contain, for example, an array, the unified union will contain one array that's the result of unifying the two arrays.
- Unifying any other pair of types produces an IRUnion containing both of them.
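The rules above, together with the conversion of JSON values into types, can be sketched in TypeScript. This is an illustration under simplifying assumptions, not quicktype's implementation: classes carry their properties inline instead of an index into the class table, IRMap is left out (Read never produces it), a property present in only one class is copied over unchanged, a non-union type unified with a union is treated as a one-member union so that unions stay flat, and inside a union IRInteger and IRDouble share one slot so that they unify to IRDouble. `show` is a hypothetical printer for the post's notation.

```typescript
// Sketch of Read-stage unification; illustrative, not quicktype's implementation.
type IRType =
  | { kind: "IRNoInformation" }
  | { kind: "IRAnyType" }
  | { kind: "IRNull" }
  | { kind: "IRInteger" }
  | { kind: "IRDouble" }
  | { kind: "IRBool" }
  | { kind: "IRString" }
  | { kind: "IRArray"; items: IRType }
  | { kind: "IRClass"; properties: Map<string, IRType> } // inline, not a table index
  | { kind: "IRUnion"; members: IRType[] };

function unify(a: IRType, b: IRType): IRType {
  if (a.kind === "IRAnyType" || b.kind === "IRAnyType") return { kind: "IRAnyType" };
  if (a.kind === "IRNoInformation") return b;
  if (b.kind === "IRNoInformation") return a;
  if (a.kind === "IRUnion" || b.kind === "IRUnion") return unifyUnions(membersOf(a), membersOf(b));
  if (a.kind === "IRArray" && b.kind === "IRArray") {
    return { kind: "IRArray", items: unify(a.items, b.items) };
  }
  if (a.kind === "IRClass" && b.kind === "IRClass") return unifyClasses(a.properties, b.properties);
  if (a.kind === b.kind) return a; // a primitive unified with itself
  if (slotOf(a) === slotOf(b)) return { kind: "IRDouble" }; // IRInteger with IRDouble
  return { kind: "IRUnion", members: [a, b] }; // any other pair
}

// IRInteger and IRDouble share a slot, so a union never holds both.
function slotOf(t: IRType): string {
  return t.kind === "IRDouble" ? "IRInteger" : t.kind;
}
// Assumption: a non-union joining a union is treated as a one-member union.
function membersOf(t: IRType): IRType[] {
  return t.kind === "IRUnion" ? t.members : [t];
}

// At most one member per slot (one class, one array, ...); collisions are unified.
function unifyUnions(xs: IRType[], ys: IRType[]): IRType {
  const members: IRType[] = [];
  for (const t of xs.concat(ys)) {
    const i = members.findIndex((m) => slotOf(m) === slotOf(t));
    if (i === -1) members.push(t);
    else members[i] = unify(members[i], t);
  }
  return members.length === 1 ? members[0] : { kind: "IRUnion", members };
}

// Union of the properties; same-named properties get their types unified.
function unifyClasses(p: Map<string, IRType>, q: Map<string, IRType>): IRType {
  const properties = new Map<string, IRType>();
  p.forEach((t, name) => properties.set(name, t));
  q.forEach((t, name) => {
    const existing = properties.get(name);
    properties.set(name, existing === undefined ? t : unify(existing, t));
  });
  return { kind: "IRClass", properties };
}

// Read: convert a parsed JSON value into its IRType.
function read(value: unknown): IRType {
  if (value === null) return { kind: "IRNull" };
  if (typeof value === "boolean") return { kind: "IRBool" };
  if (typeof value === "string") return { kind: "IRString" };
  if (typeof value === "number") {
    return Number.isInteger(value) ? { kind: "IRInteger" } : { kind: "IRDouble" };
  }
  if (Array.isArray(value)) {
    // Element types are unified; an empty array keeps IRNoInformation.
    const empty: IRType = { kind: "IRNoInformation" };
    return { kind: "IRArray", items: value.map(read).reduce(unify, empty) };
  }
  const obj = value as Record<string, unknown>;
  const properties = new Map<string, IRType>();
  Object.keys(obj).forEach((key) => properties.set(key, read(obj[key])));
  return { kind: "IRClass", properties };
}

// Print a type in the post's notation, e.g. "IRArray (IRArray IRInteger)".
function show(t: IRType): string {
  const wrap = (s: string): string => (s.indexOf(" ") === -1 ? s : `(${s})`);
  switch (t.kind) {
    case "IRArray":
      return `IRArray ${wrap(show(t.items))}`;
    case "IRUnion":
      return `IRUnion [${t.members.map(show).join(", ")}]`;
    case "IRClass": {
      const props: string[] = [];
      t.properties.forEach((p, name) => props.push(`${name}: ${show(p)}`));
      return `IRClass { ${props.join(", ")} }`;
    }
    default:
      return t.kind;
  }
}

console.log(show(read(JSON.parse("[[1, 2, 3], []]")))); // IRArray (IRArray IRInteger)
console.log(show(read([1, true, "foo"]))); // IRArray (IRUnion [IRInteger, IRBool, IRString])
```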
The way IRUnion currently works makes it impossible to express some type constraints. For example, quicktype can represent the type of arrays that contain either integers or strings (Array<number | string> in TypeScript), such as [1, 2, "foo", "bar", 3], but it cannot express the type "array of integers or array of strings" (Array<number> | Array<string> in TypeScript). This was a design choice to keep things simple for now, but we plan to enable this later.
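TypeScript itself can express both types, which makes the difference easy to see. In this sketch, `number` stands in for int (TypeScript has no separate integer type), and the two type guards are hypothetical helpers that check the integer constraint at runtime with `Number.isInteger`.

```typescript
// number stands in for int: TypeScript has no separate integer type.
type IntOrStringArray = Array<number | string>; // expressible with quicktype's IRUnion
type IntArrayOrStringArray = Array<number> | Array<string>; // not expressible yet

const isInt = (x: unknown): boolean => typeof x === "number" && Number.isInteger(x);

// Hypothetical runtime guards mirroring the two types.
function isIntOrStringArray(xs: unknown[]): xs is IntOrStringArray {
  return xs.every((x) => isInt(x) || typeof x === "string");
}

function isIntArrayOrStringArray(xs: unknown[]): xs is IntArrayOrStringArray {
  return xs.every(isInt) || xs.every((x) => typeof x === "string");
}

const sample: unknown[] = [1, 2, "foo", "bar", 3];
console.log(isIntOrStringArray(sample)); // true
console.log(isIntArrayOrStringArray(sample)); // false
```

The mixed sample fits the first type but not the second, which is exactly the distinction the current IRUnion cannot draw.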
Simplify
We only run Simplify when generating code based on JSON sample data rather than JSON Schema; when generating code from JSON Schema, quicktype assumes that the user wants that exact schema, so there's no room for interpretation.
Currently, quicktype performs two transformations in this stage:
- Unifying similar classes. This transformation considers two classes A and B similar if at least three-fourths of A's properties are also in B with the same type, and vice versa. Similar classes are unified into a single one[7], via the unification algorithm discussed above.
- Transforming classes into maps. quicktype has a simple heuristic for deciding when to represent a class as a map: if the class has 20 or more properties, and all its properties are of the same type, then it becomes a map. There are more heuristics we want to implement.
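Both heuristics are simple enough to sketch directly. In the TypeScript below, a class is reduced to a `Map` from property name to a type-name string (a simplification; quicktype compares IRTypes), and `overlap`, `similar`, `shouldBeMap`, and `fromObject` are hypothetical names. Treating a class with no properties as fully contained in any other class is our assumption; the post doesn't address that case.

```typescript
// A class reduced to property name -> type name (a simplification of IRClassData).
type Properties = Map<string, string>;

// Fraction of a's properties that b also has, with the same type.
// Assumption: an empty class counts as fully contained.
function overlap(a: Properties, b: Properties): number {
  if (a.size === 0) return 1;
  let shared = 0;
  a.forEach((type, name) => {
    if (b.get(name) === type) shared++;
  });
  return shared / a.size;
}

// Similar: at least three-fourths overlap in both directions.
function similar(a: Properties, b: Properties): boolean {
  return overlap(a, b) >= 0.75 && overlap(b, a) >= 0.75;
}

// Map heuristic: 20 or more properties, all of the same type.
function shouldBeMap(props: Properties): boolean {
  if (props.size < 20) return false;
  const types = new Set<string>();
  props.forEach((type) => types.add(type));
  return types.size === 1;
}

const fromObject = (o: Record<string, string>): Properties =>
  new Map(Object.keys(o).map((k): [string, string] => [k, o[k]]));

const person = fromObject({ name: "string", age: "int", email: "string", phone: "string" });
const contact = fromObject({ name: "string", age: "int", email: "string", phone: "string", fax: "string" });
const other = fromObject({ name: "string", age: "double", city: "string" });
console.log(similar(person, contact)); // true: overlaps of 4/4 and 4/5
console.log(similar(person, other)); // false: only 1/4 of person's properties match

const wide: Properties = new Map();
for (let i = 0; i < 20; i++) wide.set(`key${i}`, "double");
console.log(shouldBeMap(wide)); // true
console.log(shouldBeMap(person)); // false
```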
Render
Each renderer targets a different language, but there are many commonalities among them. A renderer for a particular target language is a record of type Renderer. It includes metadata like the target language name, contains strategies for naming types in the target language, and, most importantly, has a function for producing the output. Here's what it looks like for C#:
renderer =
  { name: "C#"
  , aceMode: "csharp"
  , extension: "cs"
  , doc: csharpDoc
  , options: [listOption.specification]
  , transforms:
      { nameForClass: simpleNamer nameForClass
      , nextName: \s -> "Other" <> s
      , forbiddenNames
      , topLevelName: noForbidNamer csNameStyle
      , unions: Just
          { predicate: unionIsNotSimpleNullable
          , properName: simpleNamer nameForType
          , nameFromTypes: simpleNamer (unionNameIntercalated csNameStyle "Or")
          }
      }
  }
- name is the renderer's name in the UI.
- aceMode is the name of the syntax highlighting mode for Ace, the code editor in the UI.
- extension is the file name extension used for the language.
- doc is the rendering function, to be discussed in a future blog post.
- options is an array of customization options for this renderer.
- transforms.nameForClass is a function for naming classes, given the name that comes from the original JSON or JSON Schema.
- transforms.nextName produces a new name if a name is already taken. quicktype is not very smart about this yet, and just prepends "Other" to the original name.
- transforms.forbiddenNames are names that must not be used for types from JSON. Those are usually keywords and names of common types in the target language.
- transforms.topLevelName is a naming function for top-level types.
- transforms.unions is a Maybe that can be Nothing for renderers that don't treat IRUnions specially. quicktype generates C# classes to "emulate" unions, for example, but in TypeScript it expresses them directly with native union types, so this field isn't needed in the TypeScript renderer. If we do have unions, it contains:
  - transforms.unions.predicate decides which unions require special treatment, like generating classes for them. C#, for example, has a Nullable type that allows expressing the union of IRBool and IRNull directly as "bool?", and reference types like classes, arrays, and strings are always nullable, so "emulated" unions are not needed in those cases; hence predicate returns false for them.
  - transforms.unions.properName and transforms.unions.nameFromTypes are naming functions for those unions that do get special treatment. The former is used when quicktype has inferred a name for the union, the latter when it hasn't, in which case this particular naming function will produce something like "IntOrString".
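To make the naming strategies concrete, here is a hypothetical TypeScript sketch of a few of these fields. `RendererSketch`, `freshName`, and the three forbidden names are illustrative assumptions, not quicktype's actual Renderer type or C# keyword list; the predicate only approximates unionIsNotSimpleNullable by treating a two-member union containing null as simple, and `freshName` assumes forbidden names are resolved with nextName just like taken ones.

```typescript
// Hypothetical shape for a few renderer fields; not quicktype's Renderer type.
interface UnionNaming {
  predicate: (members: string[]) => boolean; // does this union need special treatment?
  nameFromTypes: (members: string[]) => string; // name used when none was inferred
}

interface RendererSketch {
  name: string;
  extension: string;
  nextName: (name: string) => string;
  forbiddenNames: string[];
  unions: UnionNaming | null; // null plays the role of PureScript's Nothing
}

const capitalize = (s: string): string => s.charAt(0).toUpperCase() + s.slice(1);

const csharpSketch: RendererSketch = {
  name: "C#",
  extension: "cs",
  nextName: (s) => "Other" + s,
  forbiddenNames: ["class", "object", "string"], // a few C# keywords, not a full list
  unions: {
    // null plus one other type becomes `T?` or a nullable reference, so no class is needed.
    predicate: (members) => !(members.length === 2 && members.indexOf("null") !== -1),
    nameFromTypes: (members) => members.map(capitalize).join("Or"),
  },
};

// Assumption: forbidden names are resolved with nextName, just like taken ones.
function freshName(r: RendererSketch, wanted: string, taken: Set<string>): string {
  let name = wanted;
  while (taken.has(name) || r.forbiddenNames.indexOf(name) !== -1) name = r.nextName(name);
  return name;
}

const unions = csharpSketch.unions;
if (unions !== null) {
  console.log(unions.nameFromTypes(["int", "string"])); // IntOrString
  console.log(unions.predicate(["null", "bool"])); // false: rendered as bool?
}
console.log(freshName(csharpSketch, "Person", new Set(["Person"]))); // OtherPerson
```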
We'll see how these rendering components work together to produce valid source code in future posts. If you're interested, please subscribe to our RSS feed, check out our GitHub repo, or simply say "👋" on Slack.
[1] Yes, that means you can make quicktype produce JSON Schema from a JSON Schema input. Our test suite actually checks this configuration and requires that input JSON Schemas be identical to the outputs.
[2] PureScript supports self-referential types, such as linked lists (List), but quicktype doesn't represent JSON types as PureScript types; it represents them as PureScript values (of type IRType), so it needs self-referential values.
[3] We could have done that for all IRTypes instead of just for classes, which would make quicktype able to represent recursive array types, for example. Whichever way we do it when we revise the internal representation, the implementation details of this indirection shouldn't be exposed anymore.
[4] For the purposes of this blog post, please pretend Entry is the same as IRClassData.
[5] Swift's JSONSerialization only accepts objects and arrays as top-level values, so only IRClass, IRArray, and IRMap will work there.
[6] Allowing both maps and classes in the same IRUnion is a bug.
[7] We actually look for groups of similar classes rather than just pairs, but we're not rigorous about it.