Transformed String Types: How quicktype Converts JSON String Values to Stronger Types

JSON has a limited selection of data types; more complex types, such as dates, are often represented as strings. When writing code, we would prefer to use our programming language’s stronger, more expressive types (e.g. DateTimeOffset in C#) rather than strings.

For example, given the JSON:

{
    "name": "David",
    "favoriteDate": "2012-04-23T18:25:43.511Z"
}

quicktype generates this type and corresponding marshaling code in C#:

public partial class Person
{
    [JsonProperty("name")]
    public string Name { get; set; }

    [JsonProperty("favoriteDate")]
    public DateTimeOffset FavoriteDate { get; set; }
}

This allows the programmer to write safer, more expressive code:

var person = Person.FromJson(jsonString);
// This comparison would not compile if FavoriteDate were string:
if (person.FavoriteDate < DateTimeOffset.Now)
{
    Console.WriteLine($"{person.Name} is nostalgic");
}

Many JSON APIs even string-encode values for which JSON types exist, such as booleans: { "goodIdea": "false" }. Here, too, we would rather use booleans and rely on JSON de/serialization to do the conversion for us transparently.

In this post, I'll show how we added two new transformed string types (TSTs) to C# and Python: UUIDs and booleans. With this, you can take your customization of quicktype a step further, and maybe contribute a PR or two!

The feature

You've probably seen JSON that looks like this:

{
    "name": "Mark",
    "weight": "215",
    "can-juggle": "true",
    "id": "8352a2a8-b0cb-4cb6-8484-357cbcb6d5aa"
}

Using strings for numbers and booleans isn’t ideal because there is more potential for error. JSON frameworks will treat them as regular strings, which means you have to write the code to do the conversion back and forth. If you ask quicktype to infer the type for the above data, however, it'll give you this C# class:

public partial class Person
{
   public string Name { get; set; }
   public long Weight { get; set; }
   public bool CanJuggle { get; set; }
   public Guid Id { get; set; }
}

Now it's easy to manipulate those values directly and, if you let quicktype generate de/serializers as well, to convert from and to JSON:

var person = Person.FromJson(jsonString);
person.Weight -= 5;
jsonString = person.ToJson();

Let's see how we implemented booleans and UUIDs!

Adding booleans

The code for each of these two TSTs is split into three commits. In the first commit for booleans, we tell quicktype about stringified booleans and how they are represented as strings in JSON.

Teaching quicktype about the new TST is one line of code in Type.ts, adding a property to transformedStringTypeTargetTypeKinds:

"bool-string": { jsonSchema: "boolean", primitive: "bool" } as TransformedStringTypeTargets

What it says is that we want a new TST called bool-string, which has the JSON Schema “format” boolean, and which corresponds to quicktype's primitive type bool.

To detect stringified booleans in input JSON we modify the function inferTransformedStringTypeKindForString in StringTypes.ts:

/**
 * JSON inference calls this function to figure out whether a given string is to be
 * transformed into a higher level type.  Must return undefined if not, otherwise the
 * type kind of the transformed string type.
 *
 * @param s The string for which to determine the transformed string type kind.
 */
export function inferTransformedStringTypeKindForString(s: string): TransformedStringTypeKind | undefined {
    if (s.length === 0 || "0123456789-abcdeft".indexOf(s[0]) < 0) return undefined;

    if (isDate(s)) {
        return "date";
    } else if (isTime(s)) {
        return "time";
    } else if (isDateTime(s)) {
        return "date-time";
    } else if (isIntegerString(s)) {
        return "integer-string";
    } else if (s === "false" || s === "true") {
        return "bool-string";
    } else if (isUUID(s)) {
        return "uuid";
    }
    return undefined;
}

These two new lines do the actual business:

} else if (s === "false" || s === "true") {
   return "bool-string";

The first line of the function is an optimization: if the string is empty, or it doesn't start with one of the characters in that list, then it can't be one of the string types we detect, so we bail out early. All we have to do for booleans is to add the letters f and t. If we wanted to also interpret the strings "no" and "yes" as booleans, we'd have to add the letters n and y, too.
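To make the early bail-out concrete, here is a minimal Python sketch of the same idea. It is not quicktype's code: infer_kind, FIRST_CHARS, and the simplified integer and UUID patterns are assumptions made for illustration, and the date and time cases are left out.

```python
import re
from typing import Optional

# Hypothetical stand-in for the character list in the TypeScript function.
FIRST_CHARS = "0123456789-abcdeft"

# Simplified patterns (assumptions; quicktype's own checks may differ).
INTEGER_RE = re.compile(r"0|-?[1-9][0-9]*")
UUID_RE = re.compile(r"[0-9a-f]{8}-[0-9a-f]{4}-[0-9a-f]{4}-[0-9a-f]{4}-[0-9a-f]{12}")


def infer_kind(s: str) -> Optional[str]:
    # Cheap pre-filter: an empty string, or one whose first character is not
    # in the list, cannot match any of the checks below.
    if not s or s[0] not in FIRST_CHARS:
        return None
    if INTEGER_RE.fullmatch(s):
        return "integer-string"
    if s in ("false", "true"):
        return "bool-string"
    if UUID_RE.fullmatch(s):
        return "uuid"
    return None
```

In this sketch, "yes" is rejected by the pre-filter alone because y is not in the list, while "tree" passes the pre-filter and is then rejected by every individual check.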

At this point, if we ask quicktype to produce a JSON Schema for this JSON:

{
    "foo": "true"
}

we get this:

{
    "$schema": "http://json-schema.org/draft-06/schema#",
    "$ref": "#/definitions/TopLevel",
    "definitions": {
        "TopLevel": {
            "type": "object",
            "additionalProperties": false,
            "properties": {
                "foo": {
                    "type": "string",
                    "format": "boolean"
                }
            },
            "required": ["foo"],
            "title": "TopLevel"
        }
    }
}

Notice how the type for foo is string, with boolean as its format. That's what we specified as the jsonSchema property in the first part of the commit. format is an "assertion" that's part of JSON Schema which allows string types to be constrained. An example of a format that's part of the JSON Schema specification is "date-time" for date-time strings. boolean isn't in the spec, but we're allowed to invent our own. Unfortunately, the JSON Schema validator Ajv that we use in our test suite fails if it sees this new unknown format, which is why the last part of the commit tells Ajv to ignore it:

let ajv = new Ajv({ format: "full", unknownFormats: ["integer", "boolean"] });
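As a rough illustration of what a custom format asserts, the Python sketch below checks a value against "type": "string" plus a format. FORMAT_CHECKS and check_string_format are hypothetical names, and this is not how Ajv works internally; formats the table doesn't know are simply accepted, which mirrors the "ignore it" behavior described above.

```python
from typing import Callable, Dict

# Hypothetical table of format predicates; only our invented format is listed.
FORMAT_CHECKS: Dict[str, Callable[[str], bool]] = {
    "boolean": lambda s: s in ("true", "false"),
}


def check_string_format(value: object, fmt: str) -> bool:
    # The schema says "type": "string", so anything else fails outright.
    if not isinstance(value, str):
        return False
    check = FORMAT_CHECKS.get(fmt)
    if check is None:
        # Unknown format: ignore the assertion instead of failing.
        return True
    return check(value)
```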

C#

The commit that adds support for stringified booleans to C# starts out by adding a case to the function needTransformerForType:

if (t.kind === "integer-string" || t.kind === "bool-string") return "manual";

manual here means that when the renderer emits the C# code for a bool-string property, it adds a JsonConverterAttribute to the property that specifies a JsonConverter to convert said property. Since only some booleans are decoded from strings, this explicit attribute is required. Here is the code for one real boolean, and one converted from strings:

[JsonProperty("real-bool")]
public bool RealBool { get; set; }

[JsonProperty("string-bool")]
[JsonConverter(typeof(ParseStringConverter))]
public bool StringBool { get; set; }

The next part adds a line to stringTypeMapping:

mapping.set("bool-string", "bool-string");

All entries not explicitly set in mapping are treated as if they were set to string, which means that without this line, quicktype would make each bool-string in the input into a string for C#. You would get correct code, but stringified booleans would still just be handled like plain strings. You might think that mapping should just be a set of all types which are to be kept as-is, but sometimes types are mapped to other non-string types:

mapping.set("date", "date-time");
mapping.set("time", "date-time");
mapping.set("date-time", "date-time");

Dates, times, and date-times are all mapped to the same type date-time, which in C# is represented by DateTimeOffset. That's because C# doesn't have specialized types for dates and times, only a combined date-time type.
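The lookup rule can be sketched in a few lines of Python. The dictionary only contains the entries shown above, and target_kind is a hypothetical helper rather than a quicktype function.

```python
from typing import Dict

# Mirrors only the mapping.set calls shown above for C#.
CSHARP_STRING_TYPE_MAPPING: Dict[str, str] = {
    "date": "date-time",
    "time": "date-time",
    "date-time": "date-time",
    "bool-string": "bool-string",
}


def target_kind(kind: str, mapping: Dict[str, str]) -> str:
    # Any kind without an explicit entry falls back to a plain string.
    return mapping.get(kind, "string")
```

With an empty mapping, target_kind("bool-string", {}) yields "string", which is the "correct but unhelpful" outcome described above.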

The last two pieces are about how to do the actual conversions between strings and booleans when your JSON is de/serialized at runtime. The code generators for the conversions are cases for the type bool in the transformers ParseStringTransformer and StringifyTransformer. C# does not have a type that corresponds to bool-string; it only has bool and string. Transformers are the pieces that convert between bool and string, and the CSharpRenderer has to generate the code for them. Here's the case for ParseStringTransformer:

case "bool":
   this.emitLine("bool b;");
   this.emitLine("if (Boolean.TryParse(", variable, ", out b))");
   this.emitBlock(() => this.emitConsume("b", xfer.consumer, targetType, emitFinish));
   break;

Here’s code that it generates:

bool b;
if (Boolean.TryParse(value, out b))
{
   return b;
}

variable is where the string to be converted is stored. emitBlock produces the curly-braces block, the insides of which are generated by emitConsume. xfer.consumer is an optional transformer for the result (the variable b) of this conversion. In this example, we don't have a consumer, but we'll see one later on. targetType is the type that is produced at the end of this transformer chain, i.e., once the consumer, and its consumer, and its consumer's consumer, etc., are done. It just has to be passed through. emitFinish emits the code for what needs to happen at the end of the transformer chain, which is the return statement in this example. There might be other transformers to try after this one, which is why if TryParse fails, the generated code just continues on. On the other hand, if this is the last hope of decoding this string, the code to throw an exception is generated automatically by the CSharpRenderer.
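The runtime behavior of such a chain can be modeled in Python. This is a sketch, not quicktype's generated code: decode, try_parse_bool, and try_parse_int are hypothetical, and try_parse_bool is stricter than Boolean.TryParse, which also accepts other casings such as "True".

```python
from typing import Callable, List, Optional, Tuple

# A parser returns (True, value) on success and (False, None) on failure,
# loosely mirroring the TryParse pattern in the generated C#.
Parser = Callable[[str], Tuple[bool, object]]


def try_parse_bool(s: str) -> Tuple[bool, object]:
    if s == "true":
        return True, True
    if s == "false":
        return True, False
    return False, None


def try_parse_int(s: str) -> Tuple[bool, object]:
    try:
        return True, int(s)
    except ValueError:
        return False, None


def decode(s: str, parsers: List[Parser],
           consumer: Optional[Callable[[object], object]] = None) -> object:
    for parse in parsers:
        ok, value = parse(s)
        if ok:
            # Hand the parsed value to the optional consumer, then finish.
            return consumer(value) if consumer is not None else value
        # On failure, fall through and try the next parser.
    # Last hope exhausted: this corresponds to the generated throw.
    raise ValueError(f"cannot decode {s!r}")
```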

On to the last part of this commit to support stringified booleans in C#: the case for StringifyTransformer, which turns boolean values back into JSON string values! It looks similar to the switch case for ParseStringTransformer:

case "bool":
   this.emitLine("var boolString = ", variable, ' ? "true" : "false";');
   return this.emitConsume("boolString", xfer.consumer, targetType, emitFinish);

The first line uses a conditional expression on the boolean to get its string value, and the second line is almost identical to what we've seen before. The relevant difference is that the code returns the result of emitConsume. I mentioned above that the CSharpRenderer generates code for the case when the runtime transformation fails. This code is only generated, however, if the transformation can actually fail. How does the renderer know? The function we modified returns true if the transformation is guaranteed to succeed, i.e., never falls through. By default it returns false, signifying the transformation can fail, which is why we only had to break for the ParseStringTransformer. Here, we know that our part of the transformer chain always succeeds: a boolean can only be either false or true, and we handle both cases. However, the consumer might still fail, so we return the failability result of emitConsume.
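Here is a small Python model of that convention. The emitter functions, the emitted C# strings, and the exception message are all made up for illustration; the point is only that the return value decides whether the fallback throw is appended.

```python
from typing import Callable, List


def emit_parse_bool(lines: List[str]) -> bool:
    lines.append("bool b;")
    lines.append("if (Boolean.TryParse(value, out b)) { return b; }")
    # Parsing can fail, so control may fall through past this code.
    return False


def emit_stringify_bool(lines: List[str]) -> bool:
    lines.append('var boolString = value ? "true" : "false";')
    lines.append("return boolString;")
    # Both boolean values are handled, so this never falls through.
    return True


def emit_transformer(emit: Callable[[List[str]], bool]) -> List[str]:
    lines: List[str] = []
    always_succeeds = emit(lines)
    if not always_succeeds:
        # Only emitted when the transformation can actually fail.
        lines.append('throw new Exception("Cannot unmarshal value");')
    return lines
```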

That's all that's needed to make stringified booleans work in C#.

Python

Now let me explain how I implemented stringified boolean support in Python. The first part, which we've already seen above, lets quicktype know that PythonRenderer supports stringified booleans:

mapping.set("bool-string", "bool-string");

The next bit, in needsTransformerForType, is easier to explain than the C# version. In C# we have help from the Json.NET framework, which has lots of converters built in, but in Python our generated code has to do it all on its own, so it doesn’t distinguish between “manual” and “automatic” conversion. That’s why, for bool-string, we always return true:

return t.kind === "integer-string" || t.kind === "bool-string";

In Python, if we can express a transformer as a simple, short expression, we emit it inline; otherwise, we emit a function for it. As we'll see below, converting a boolean into a string is indeed a short expression, but the reverse is not, so we need a function. We only want to emit this function when we actually have stringified booleans in our JSON, which is why the PythonRenderer keeps track of which converter functions it has to emit. First, we have to add a new case to the type of all converter functions, ConverterFunction:

| "from-stringified-bool"

Next is the code that emits the converter function:

protected emitFromStringifiedBoolConverter(): void {
   this.emitBlock(
       ["def from_stringified_bool(x", this.typeHint(": str"), ")", this.typeHint(" -> bool"), ":"],
       () => {
           this.emitBlock('if x == "true":', () => this.emitLine("return True"));
           this.emitBlock('if x == "false":', () => this.emitLine("return False"));
           this.emitLine("assert False");
       }
   );
}

The calls to typeHint make sure that the type hints are only emitted when the user selects a Python version that supports types. Note another difference from C#: if the conversion fails, the function raises an exception (via assert False) instead of falling through.
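With type hints enabled, the emitter above should produce a function along these lines. This is reconstructed by hand from the emitter code, not copied from quicktype's output, so whitespace and details may differ.

```python
def from_stringified_bool(x: str) -> bool:
    if x == "true":
        return True
    if x == "false":
        return False
    # Anything else is not a stringified boolean.
    assert False
```

Calling from_stringified_bool("yes") raises an AssertionError. One caveat worth knowing: Python strips assert statements when run with the -O flag, in which case this function would return None for unexpected input.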

The function emitConverter has a case for each converter function that just calls its emitter:

case "from-stringified-bool":
   return this.emitFromStringifiedBoolConverter();

Next up is producing code for the bool case of ParseStringTransformer:

case "bool":
   vol = this.convFn("from-stringified-bool", inputTransformer);
   break;

convFn emits a call to the function from_stringified_bool, and its argument is the result of the transformation inputTransformer. In Python we emit a transformer chain the other way around from C#: the consumers are generated first, with the producers (the transformers that preceded them) nested inside.

The case for StringifyTransformer doesn't have to produce a function call since we can do the conversion as a simple, inline expression:

case "bool":
   vol = compose(inputTransformer, v => ["str(", v, ").lower()"]);
   break;

The compose function emits code for the inputTransformer and then passes the result on to the expression that the arrow function emits. Here’s the line of code that’s generated for converting the boolean property foo to a string:

result["foo"] = from_str(str(self.foo).lower())
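To see how nesting produces that line, here is a simplified Python model in which an expression is just a source string. The real compose and convFn in the PythonRenderer work on richer values, so treat these two helpers as assumptions for illustration only.

```python
from typing import Callable

Expr = str  # in this sketch, an expression is its source text


def compose(inner: Expr, wrap: Callable[[Expr], Expr]) -> Expr:
    # Emit the inner expression first, then wrap it in the outer one.
    return wrap(inner)


def conv_fn(name: str, inner: Expr) -> Expr:
    # A converter function call with the inner expression as its argument.
    return f"{name}({inner})"


stringified = compose("self.foo", lambda v: f"str({v}).lower()")
call = conv_fn("from_str", stringified)
line = 'result["foo"] = ' + call
```

The inline expression itself relies on str(True) being "True" in Python, so lowercasing it yields the JSON-style "true".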

That's it for Python!

Adding UUIDs

UUIDs (universally unique identifiers) are strings that are commonly used to identify items, such as in primary keys for rows in a database table, or data items in REST APIs, and as such are often found in JSON data.

After having seen the code for booleans, most of the code for UUIDs doesn't hold any surprises, but there are still a few details that are different. Let's start at the first commit, which adds the uuid type. The main point of interest here is its entry in transformedStringTypeTargetTypeKinds:

uuid: { jsonSchema: "uuid", primitive: undefined },

Unlike bool-string, for which we set primitive to "bool", the uuid type doesn't have a primitive. That's because it doesn't correspond to any primitive type that's already in quicktype: it's a new type of its own.

More small differences await us in the commit that adds support for C#. To start with, uuid has been added to isValueType:

return ["integer", "double", "bool", "enum", "date-time", "uuid"].indexOf(t.kind) >= 0;

That's needed because the C# type for UUIDs, Guid, is a value type in C#. We didn't have to do this for booleans because we didn't introduce a new type in C#: bool was already there. Speaking of Guid, there's a new case that introduces it in csTypeForTransformedStringType:

if (t.kind === "uuid") {
    return "Guid";
}

One last little change that we didn't see with booleans above is that Guid is now a "forbidden name" for the global namespace, which essentially means that quicktype won't name a C# class Guid, which would result in a name collision:

protected forbiddenNamesForGlobalNamespace(): string[] {
   return ["QuickType", "Type", "System", "Console", "Exception", "DateTimeOffset", "Guid"];
}

In the Python commit we see similar changes, in particular a new case in pythonType to produce the UUID Python type:

if (transformedStringType.kind === "uuid") {
    return this.withImport("uuid", "UUID");
}

The withImport invocation returns UUID and imports the uuid package. The commit adds an identical case in typeObject, which returns the Python type object so the generated code can check whether a given value is an instance of that type.
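At runtime, the conversion in generated Python comes down to the standard library's uuid.UUID class. The two helpers below are hypothetical names for a sketch of both directions, including the kind of instance check mentioned above.

```python
from uuid import UUID


def from_uuid_string(x: str) -> UUID:
    # UUID() raises ValueError for strings that are not valid UUIDs.
    return UUID(x)


def to_uuid_string(x: UUID) -> str:
    # Check against the type object before converting back to a string.
    assert isinstance(x, UUID)
    return str(x)


parsed = from_uuid_string("8352a2a8-b0cb-4cb6-8484-357cbcb6d5aa")
round_tripped = to_uuid_string(parsed)
```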

The rest of the changes are very similar to those for booleans.

What we get for free

We only had to implement converting individual values, but data transformers use that building block to give us a lot more for free. To illustrate, let's quicktype this JSON input:

{
    "boolOrInts": ["true", 123]
}

quicktype generates this type for the array items of boolOrInts, which is a union of boolean and integer:

    public partial struct BoolOrInt
    {
        public bool? Bool;
        public long? Integer;

        public static implicit operator BoolOrInt(bool Bool) => new BoolOrInt { Bool = Bool };
        public static implicit operator BoolOrInt(long Integer) => new BoolOrInt { Integer = Integer };
    }

Taking a look at the code for converting array items we see these lines:

if (Boolean.TryParse(stringValue, out b))
{
   return new BoolOrInt { Bool = b };
}

This is an example of a consumer transformer: The consumer of this ParseStringTransformer is a UnionInstantiationTransformer, which generates code that takes the boolean value and makes a BoolOrInt out of it. There's much more to data transformers than this, a lot of which we'll get to in a future blog post.
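The same parse-then-instantiate shape can be sketched in Python. The BoolOrInt dataclass and decode_item are hypothetical stand-ins for the C# struct and its converter, not something quicktype generates for Python.

```python
import json
from dataclasses import dataclass
from typing import Optional


@dataclass
class BoolOrInt:
    # Exactly one of the two fields is set, as in the C# struct.
    bool_value: Optional[bool] = None
    integer: Optional[int] = None


def decode_item(x: object) -> BoolOrInt:
    if isinstance(x, str) and x in ("true", "false"):
        b = x == "true"                  # parse step
        return BoolOrInt(bool_value=b)   # consumer step: build the union
    # bool is a subclass of int in Python, so exclude it explicitly.
    if isinstance(x, int) and not isinstance(x, bool):
        return BoolOrInt(integer=x)
    raise ValueError(f"cannot decode {x!r} as BoolOrInt")


data = json.loads('{"boolOrInts": ["true", 123]}')
items = [decode_item(v) for v in data["boolOrInts"]]
```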

How to contribute

quicktype is open source. If you'd like to see it improve, please consider contributing. When it comes to TSTs, there are two kinds of contributions that would be very helpful:

  • Implement more TSTs, in particular null, floating point numbers, base64 blobs, and URLs. I hope I managed to convince you in this blog post that the scope of this is fairly manageable. It's a little tricky to understand at first, but once you do, you can contribute powerful features to quicktype, and we're always there to help.

  • Implement data transformers for other languages. Right now transformers are only implemented in C# and Python, with no other language getting the benefit of TSTs, or other future enhancements to transformers. I won't lie: this is quite a challenge, but if you're up for one, please talk to us on Slack; we're always delighted to help.

Mark
