Type-Level GraphQL Parsing and Desirable Features for TypeScript

Introduction

The story around TypeScript’s interoperability with GraphQL is far from picture-perfect. Between libraries, language services and bundler plugins, we have options––options such as the Guild’s tools, Nexus, TypeGraphQL, QuickType, and more. However, while they let us avoid defining types twice for both type systems, they all fail to let us do so without codegen.

I don’t enjoy configuring and regularly executing codegen scripts. But I love TypeScript. I want the language––by itself––to hold the solution to type-level interoperability with others. And the pieces seemingly began to come together:

With this release, we became capable of slicing and recombining strings within the type system. Also part of the release: we became capable of embedding conditional logic within recursive types. The door to new type-safe experiences swung off its hinges.

A few tiny type-level parsers came out of the woodworks––most notably, ts-sql. I believed it was only a matter of time before someone bridged the GraphQL and TypeScript type systems using template string types, and I was impatient for that person to come around, so I figured I’d take a crack at giving myself and other developers a unified experience:

  1. From a single source of truth––the GraphQL schema––defined as a TypeScript string literal…

2. … we could define type-checked resolvers…

3. … and GraphQL documents as string literals, which could be type-checked against the schema…

4. … which would allow overlain libraries to obtain the correct signature of requests.

It seemed that the first step to building such an experience would be to create a reliable type-level parser.

This turned into a grueling battle with the TypeScript compiler’s recursion limiter and with shortcomings of the language, which I’ll detail. Let’s dive in.

Structure

How do we describe the building blocks of our type-level AST?

Let’s say we’re defining a GraphQL input object. Here is the GraphQL source:

Its AST node counterpart would look as follows:

The outer-most node––an input object type definition––contains a name and a list of fields––input value definitions––, which each contain a name and named type.

In representing these nodes within the type system, we may want to begin with an atom, such as the leaf node Name.

In other nodes, we’ll want to pass in Name type instances via type params. For example, the NamedType node:

What do we write in place of the constraint (currently place-held by an ellipsis)? To constrain the Name param, we’ll need a “widest” instance of Name, so that any subtype of Name is valid.

This kind of namespacing allows us to group our nodes and their widest-instance counterparts more legibly.

For good hygiene, it often makes sense to have even more deeply-nested namespacing. Value.Object.Field for example:

We can re-use this structure to organize our node-specific parsing logic. For instance, we can place the argument parsing logic relative to the Argument AST node definition.

Meanwhile, Argument.Parse can make use of other Parse types––in this case Value.Parse.

Many nodes will look similar to the following, with a description, an identifier, a kind, and some directives.

Aside from kind––which is a hard-coded literal type––every field is generic. We can think of these generic types as factories. The runtime equivalent might look as follows.

Techniques

A quick look at string parsing. Let’s say we want to split a string at its first space, we can do it like so.

We use this template literal syntax to match and (essentially) “destructure” by the first occurrence of a character (in this case a space).

If we want to do this for a longer sequence of words, we can create a recursive Split type.

The parsing continues until there are no more spaces to match, at which point we return the final iteration’s type param ("today?”), wrapped with brackets, as to be spread in the return type of its parent call.

This unraveling of a source string is the foundation of our parser. However, there is a distinct difference between runtime string parsing and type-level string parsing, and it’s a difference of which we’ll need be wary: we want to parse one AST. When it comes to TypeScript 4.1’s pattern matching, this can be difficult.

Let’s say we’re parsing the following Src string and we want to gather whatever characters precede the first instance of whitespace.

Whitespace can be a line break, a space or a tab. How would we go about gathering the text following the leading whitespace?

Your first instinct might be to match on Src like so:

If we take a look at the tuple type contained within Parsed, we’ll see it’s not quite what we expected.

We ended up with a tuple containing two string literal unions. These unions contain the possibilities as if we’d passed in each whitespace kind independently. With respect to type systems, this is desirable behavior. Yet it does throw a wrench in our approach to certain kinds of parsing: if we don’t account for branching, we can end up with an AST type that represents multiple ASTs (or even thousands!). How do we safeguard against unintentional variance?

From here on, we’ll want to use a dedicated Match utility. Your first instinct may be to create Match as a recursive type that iterates over a string and checks whether one of the delimiters matches the left-shrinking slice (n²). Already, recursion limiting is of concern. If we use such a Match utility within other recursive types and their lineage (for instance Schema.ParseObject.ParseField.ParseInput.ParseDirective.ParseValue.Parse) our CPU could very well turn into a modern-day Chernobyl plant. I’ll share the solution which worked best for me. While it is just as complex, it incurs less recursion limiting, ostensibly because of the checker’s mapped-type optimizations.

1. Create a Match type for convenience of later access (M extends Match<infer X, infer Y, infer Z> is simple to reason about within conditional branches).

2. Create a type which accepts a source string and a union of delimiter strings, and gives us the union of delimiter-leading strings.

For instance, Leading<”a b c d”, “a” | “b” | “c” | “d”> would give us “” | “a “ | “a b “ | “a b c “.

3. Create our Match type as follows.

And Voila! We’ll be using the Match.First utility and Match types in most of our node-specific parsers. Here’s a quick look at how we can use this type:

So, we’ve outlined our approach to node factories and to safely matching the first occurrence of a member of a union of delimiters. Now, let’s parse some GraphQL source! Specifically, let’s parse the fields contained within an enum. Here’s a complex example of what we may encounter:

The CAT option has the directive mean, which accepts no input. The HAMSTER option has a description. And the DOG option has a description and two directives, one of (breed) which accepts an input with a single field name.

Building off of the aforementioned Argument.Parse, we’ll first need to define a Directive.SequenceAndTail.Parse. This utility type will accept whatever slice follows the enum value, and give us a list of directives (could be empty) followed by the tail (the next enum option and beyond).

Our Directive & its widened type will look like this:

Next we’ll create a type that encapsulates the return of our Directive.SequenceAndTail.Parse. A tuple should be just fine.

And finally, we’ll defineParse.

Let’s break this down a bit.

  1. The type params are (A) Src, the source string in need of parsing, (B) Directives, the previously accumulated list of directives to be returned upon encountering a non-directive, (C) Trimmed, Src, stripped of its leading whitespace, and (D) FirstMatch, which defaults to the result of Match.First, checking for the @ symbol (signaling a directive) or whitespace on Trimmed.
  2. If the first match is in fact on whitespace, then there’s no leading directive, and we can return whatever’s been accumulated in a Directive.SequenceAndTail. Otherwise, we’ll want to gather the name and (optional) arguments from the currently-visited leading directive before passing the formed Directive node into the next SequenceAndTail.Parse call.

Now that we defined the directive and tail parsing, we’ll want to do the same for descriptions.

And finally, we can dive into Enum.Value.Sequence.Parse:

First, we define Enum.Value.NameAndTail.Parse to gather the leading value name and its tail (which may contain a directive or sequence of directives).

And then we tie the sub-parsers together. Notice how their tail-calls coalesce quite legibly.

But wait!

Type instantiation is excessively deep and possibly infinite.

Although we know that Src is eventually whittled down to an empty string, and that the Tail2 extends “” check protects us from infinity, the checker is cramping our style. This simple trick gives us the upper hand:

1. Define a SafeTail utility type.

2. Use the SafeTail utility to (effectively) re-alias Tail2. Our parse type will now look like this:

And with that, we say goodbye to pre-emptive recursion limiting.

This is the essence of any type-level parsers in TS 4.1: we can define recursive tail call types, which produce wrappers of specific node types, which can be neatly unwrapped inside the conditional branches of other types. When we encounter recursion limiting, we use SafeTail, and we slowly build up to our elder nodes.

The Requiem

You can get all the way to the tippy top of your tree––all the way to the elusive Schema.Parse and Document.Parse utility types––only to have your dreams crushed upon instantiation (as mine were).

Type instantiation is excessively deep and possibly infinite.

If matching first-characters wasn’t so complex, perhaps we could get further. Perhaps we could parse 100-LOC schemas. Maybe even 1000-LOC schemas. But what happens if we’re trying to work with a schema such as GitHub’s (30K+ LOC)?

Even if we could overcome the recursion limiting, this would be very taxing on semantic analysis. The checker is NOT incremental. According to Ryan Cavanaugh (one of the TypeScript program managers)

semantic information produced from type computation is recomputed fresh on each edit.

Although the GraphQL source may remain unchanged between edits, its computed structure is not preserved.

Cavanaugh continued…

The intent of template literals is not to use them to write parsers using the type system. I can’t stress this enough; I know everyone was doing it for fun in various issues and on twitter, but this is not what they’re for and we’re not going to expend engineering effort to enable this scenario. Transforms like this should happen out-of-band in processing steps once when the GraphQL changes; by doing this in the type system you’re redoing this work literally on every keystroke.

I asked for more detail:

I’m [me, Harry] still uncertain about why moving parsing into TypeScript — as to unify the environments — would be undesirable. To me it feels like a natural step. I’d love to know why you feel like this would be a mistake.

His response seemed to address mechanics, not ideology.

Moving parsing (for the sake of generating .d.ts) into the TypeScript pipeline is perhaps a thing we will consider eventually, but that isn’t what template literals is for and it isn’t designed to support that level of modification. It’s for trivial transforms like foo -> getFoo.

Moreover:

Incremental semantic analysis is what flow is trying to do and… they’re not really able to do much else except making that work 🙂. It’s a big trade-off.

Parsing with template literals is fine from a correctness perspective, but you’re going to encounter a lot of performance issues doing this on too many large strings, and various depth limiters are subject to change from version to version as we try to stop runaway or useless recursion in other scenarios. If we ever make something that’s specifically for turning external DSLs into something TS understands, we’ll make it very clear that that’s what it’s for.

And while that subtle jab at Flow made me smile, the pursuit of my DX vision came to a grinding halt. It was officially time to stop development and write a blog post or something.

Aside from performance, there doesn’t seem to be a “strict” reason to not map DSL types into the TypeScript environment. However, it does seem aggravating that we’d need trade battle-tested open source parsers for those written in the extremely-limited type-level language. It also seems that––putting performance hurtles aside––this mapping “should” be possible. What would need to come into place to ease the creation of type-level parsers? Let’s now speculate about the future of TypeScript?

Potential Future TypeScript Features

Matching

To be conservative in the face of recursion limits, we created a parser without a lexing step. This isn’t necessarily preferable. Normally, we might prefer to define and match rules to a stream of tokens. If we were to define such rules in the type system (imagining type-level parser generators), we would likely represent these rules as tuples.

In a single GraphQL object field––for instance––we have a group of characters, followed by a colon, followed by camelcase characters, followed by an optional sequence of directives, which may or may not contain arguments. How would we try to represent this rule? Spoiler: we cannot, and the following is invalid.

Although the above does not work, the hope would be that we could iterate over the tokens and check––in order––which of our rules (such as FieldRule) match. From there, we could decide whether to process the next tokens as belonging to a sibling node, or decide to descend or ascent.

Beyond the illegal treatment of arrays, the code above assumes that AlphaCharGroup and CamelCaseCharGroup can be matched like subtypes. Aka. “HelloWorld” extends CamelCaseCharGroup. This is not the case. Let’s look at another example: differentiating between an integer string and others. How would we model the type-level check to determine whether a string is of an integer?

First we would create an AreDigits utility type, which returns true if all characters in Src are digits.

Next we would use AreDigits within an IsInt type (which allows for a single leading sign).

How would we then use this utility type to match an integer string?

We could of course check to see if IsInt<”1618”> extends true. But we cannot use IsInt as part of a pattern (for instance X extends `hello ${IsInt}` extends true). This is very limiting in the world of parsers. Fortunately, this could soon change for TypeScript, should “Conditional Assignability”––a proposed language feature––make its way into future versions of TypeScript.

Here is a conditional assignability snippet written by Anders Hejlsberg (TypeScript lead architect / Obi-Wan Anders):

The look of this is strikingly similar to that of parser combinators. Conditional assignability would give us a greater simplicity leap than that of promise callbacks to async-await. Moreover, we could begin to model type-level parsers as rule-based.

For now––however––we’re stuck. And even if we could effectively match arrangements by constraints other than subtype, look-ahead is limited. This makes it difficult to mark regions as off-limits for certain kinds of processing (such as in comments and descriptions). When it comes to comments, the temporary solution is to do a first-pass strip of all commented lines (splitting the GraphQL source by #, recursively stripping trailing text until a new line, and then recombining the remaining text). This is costly.

Errors

Who doesn’t love a good “never-hunt”––that is, debugging unintentional bottom-up never propagation. Much of the time, this has to do with conditional branches for which we did not want to account in the first place.

Why is it that we end up writing never?

Moreover, why is it that––when we have multiple conditions that evaluate to the same branch––that we need repeat them?

For instance, let’s say we have three booleans, and we want a type that evaluates to the number of truthy. If writing runtime code, we’d do something along the following lines.

This makes sense. This is easy to read. Meanwhile, the type-level equivalent is harder to grok.

When we have deep nesting of conditions, this becomes unwieldy, as the size of our code can become exponentially greater with each level of nesting. While the technique I showed (result-and-tail-type unwrapping) somewhat helps with this, the presence of excessive statements is stark. If you want to push for cleaner expression of conditional types, please upvote this issue.

Aside from the ergonomics of never-”domino-ing” conditionals, we lack type-level error throwing. While there is a draft PR out to address this shortcoming, it is unclear whether this is currently a priority to the TypeScript team. Meanwhile, I put out my own proposal on type-level erroring, although I’m not sure if I fully stand by it in retrospect.

Type-level erroring would allow developers to communicate unmet constraints within conditional types, which would likely take over for ESLint in the long-term. As of recently, I’ve stopped using ESLint. I still use Prettier (I soon hope to switch to DPrint), but the TypeScript compiler covers most of my static analysis needs. ESLint is starting to seem trivial.

Documentation

Type-level documentation is one of the most underrated features of TypeScript: the documentation flows through re-aliasing and mapped types as if part of the language itself (and if you’re judging based on the compiler internals, it is!).

While it abides by a particular control flow, documentation is not accessible within the type system, and it’s difficult to be deliberate about where and how to assign and validate the documentation.

For the GraphQL use case, this is a major problem, as we want the GraphQL descriptions to flow into the TypeScript environment.

Let’s say we have an object type Person.

In the resulting TypeScript type Person, we would want the name field to contain type-level documentation drawn from the GraphQL field description. We currently have no way of specifying this behavior. Please upvote this proposal to help secure type-level documentation’s place in the future of TypeScript.

Conclusion

There’s a lot of fun to be had with type-level programming.

There’s also a lot of excess. While I love the idea that––through the type system––one can make an API impossible to misuse, there is a point at which type-level considerations hinder productivity. I’m trying hard to steer clear of the “ideal” in favor of the “actually completed and released.” I’m coming to terms with the fact that this is my major shortcoming as a programmer: I don’t always code with an explicit purpose or achievable goal. I code to explore uncharted territory to trigger the release of dopamine in my brain.

Through this process and through interacting in TypeScript issues, I met someone with whom I’ve had some of the very best of conversations: an incredibly creative and thoughtful individual by the alias of tjjfvi. Definitely worth following him/her on GitHub. They have chosen to remain anonymous. Could be Elvis on a private island for all we know. Death faked. Mojito in hand. Palm fronds rustlang… I mean rustling in the wind.

Thank you all for reading. Here is my twitter. My mom is my only follower. Please help a guy out.

Tech & futurism geek