Automating the Migration From JS to TS for the ZK Framework

I was recently involved in the TypeScript migration of the ZK Framework. For those who are new to ZK, ZK is the Java counterpart of the Node.js stack; i.e., ZK is a Java full-stack web framework where you can implement event callbacks in Java and control the frontend UI with Java alone. Over more than a decade of development and expansion, we have reached a code base of more than 50K lines of JavaScript and over 400K lines of Java, but we noticed that we are spending almost the same amount of time and effort maintaining the Java and the JavaScript code, which means that, in our project, JavaScript is 8 times harder to maintain than Java.

I would like to share the reason we made the move to migrate from JavaScript to TypeScript, the options we evaluated, how we automated a large part of the migration, and how it changed the way we work and gave us confidence.

The Problem

ZK has been a server-centric solution for more than a decade. In recent years, we noticed the need for cloud-native support and have made this the main goal of our upcoming new version, ZK 10. The new feature will alleviate the servers’ burden by transferring much of the model-view-viewmodel (MVVM) bindings to the client side so that the server side becomes as stateless as possible. This brings benefits such as reduced server memory consumption, simplified load balancing for ZK 10 clustered backends, and potentially easier integration with other frontend frameworks.

We call this effort “Client MVVM.” However, this implies huge growth of JavaScript code. As we are already aware that JavaScript is harder to maintain, it is high time that we made our JavaScript codebase easier to work with at 50k lines of code. Otherwise, extending the existing JavaScript code with the whole MVVM stack will become Sisyphean, if not impossible. We started to look at why Java has higher productivity and how we can bring the same productivity to our client side.

Why Does Java Beat JavaScript at Large-Scale Development?

What did Java get right to give us an 8x boost in productivity? We conclude that the availability of static analysis is the primary factor.

We design and write programs long before programs are executed and often before compilation. Normally, we refactor, implement new features, and fix bugs by modifying source code instead of modifying the compiler-generated machine code or the memory of the live program. That is, programmers analyze programs statically (before execution) as opposed to dynamically (during execution).

Not only is static analysis more natural to humans, but static analysis is also easier to automate. Nowadays, compilers not only generate machine code from source code but also perform the sort of analysis that humans would do on source code like name resolution, initialization guards, dead-code analysis, etc.

Humans can still perform static analysis on JavaScript code. However, without the help of automated static analyzers (compilers and linters), reasoning about JavaScript code becomes extremely error-prone and time-consuming. What value does the following JavaScript function return? It’s actually undefined instead of 1. Surprised?

function f() {
  return
    1
}
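
One way to confirm this at runtime is sketched below. It builds the same function body with the Function constructor, so that the TypeScript compiler's own unreachable-code diagnostic does not get in the way, and then inspects the result. The name returnsNothing is purely illustrative.

```typescript
// Build the same function body at runtime. The line break after `return`
// triggers automatic semicolon insertion, so the body is parsed as `return; 1;`.
const returnsNothing = new Function("return\n    1") as () => unknown;

const returned = returnsNothing();
console.log(returned === undefined); // true
```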

Compare this with Java, where we have the compiler to aid our reasoning “as we type.” With TypeScript, the compiler performs “automatic semicolon insertion” analysis followed by dead-code analysis, and reports the stranded 1 on the line after return as unreachable code.

Humans can never beat the meticulousness of machines. By delegating these monotonous but critical tasks to machines, we can free up a huge amount of time while achieving unprecedented reliability.

How Can We Enable Static Analysis for JavaScript?

We evaluated the following five options and settled on TypeScript due to its extensive ECMA standard conformance, complete support for all mainstream JS module systems, and massive ecosystem. A detailed comparison is provided at the end of the article; here is a short synopsis.

Google’s Closure Compiler: All types are specified in JSDoc, thereby bloating code and making inline type assertion very clumsy
Facebook’s Flow: A much smaller ecosystem in terms of tooling and libraries compared to TypeScript
Microsoft’s TypeScript: The most mature and complete solution
Scala.js: Subpar emitted JavaScript code
ReScript: Requires a paradigm shift to purely functional programming; otherwise, very promising

Semi-Automated Migration to TypeScript

Prior to the TypeScript migration, our JavaScript code largely consisted of prototype inheritance via our ad-hoc zk.$extends function, as shown in the first snippet below. We intend to transform it into the semantically equivalent TypeScript shown in the second snippet.

Module.Class = zk.$extends(Super, {
  field: 1,
  field_: 2,
  _field: 3,

  $define: {
    field2: function () {
      // Do something in setter.
    },
  },

  $init: function() {},

  method: function() {},
  method_: function() {},
  _method: function() {},
}, {
  staticField: 1,
  staticField_: 2,
  _staticField: 3,

  staticMethod: function() {},
  staticMethod_: function() {},
  _staticMethod: function() {},
});

export namespace Module {
  @decorator('meta-data')
  export class Class extends Super {
    public field = 1;
    protected field_ = 2;
    private _field = 3;

    private _field2?: T;
    public getField2(): T | undefined {
      return this._field2;
    }
    public setField2(field2: T): this {
      const old = this._field2;
      this._field2 = field2;
      if (old !== field2) {
        // Do something in setter.
      }
      return this;
    }

    public constructor() {
      super();
    }

    public method() {}
    protected method_() {}
    private _method() {}

    public static staticField = 1;
    protected static staticField_ = 2;
    private static _staticField = 3;

    public static staticMethod() {}
    protected static staticMethod_() {}
    private static _staticMethod() {}
  }
}

There are hundreds of such cases, many of which have close to 50 properties. If we were to rewrite them manually, it would not only take a very long time but also be riddled with typos. Upon closer inspection, the transformation rules are quite straightforward. They should be subject to automation! Then, the process would be fast and reliable.
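
To give a feel for how mechanical these rules are, the naming convention visible in the two snippets above can be captured in a few lines. This is only a sketch of one such rule, not the actual migration script; Visibility, visibilityOf, and renderField are hypothetical names introduced for illustration.

```typescript
type Visibility = "public" | "protected" | "private";

// Naming convention taken from the example above:
//   _name -> private, name_ -> protected, anything else -> public.
function visibilityOf(name: string): Visibility {
  if (/^_/.test(name)) return "private";
  if (/_$/.test(name)) return "protected";
  return "public";
}

// Render one field of the old object literal as a TypeScript class field.
// `initializer` is the source text of the original value.
function renderField(name: string, initializer: string, isStatic = false): string {
  const modifiers: string[] = [visibilityOf(name)];
  if (isStatic) modifiers.push("static");
  return modifiers.join(" ") + " " + name + " = " + initializer + ";";
}

console.log(renderField("field_", "2"));             // protected field_ = 2;
console.log(renderField("_staticField", "3", true)); // private static _staticField = 3;
```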

Indeed, it is a matter of parsing the original JavaScript code into an abstract syntax tree (AST), modifying the AST according to some specific rules, and consolidating the modified AST into formatted source code.

Fortunately, there is jscodeshift that does the parsing and consolidation of source code and provides a set of useful APIs for AST modification. Furthermore, there is AST Explorer that acts as a real-time IDE for jscodeshift so we can develop our jscodeshift transformation script productively. Better yet, we can author a custom typescript-eslint rule that spawns the jscodeshift script upon the presence of zk.$extends. Then, we can automatically apply the transformation to the whole codebase with the command eslint --fix.

Let’s turn to the type T in the example above. Since jscodeshift presents us with a lossless AST (including comments), we can author a visitor that extracts the @return JSDoc of the getter if one can be found; if not, we can let the visitor walk into the method body of the getter and try to deduce the type T, e.g., deduce T to be string if the return value of the getter is the concatenation of this._field2 with some string. If that is still to no avail, we specify T as void, so that after jscodeshift is applied, the TypeScript compiler will warn us about a type mismatch. This way, we perform as much automated inference as possible before manual intervention, and the sections that require manual inspection are accurately surfaced by the compiler thanks to this deliberate fault injection.
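
The fallback chain described above might look roughly like the following. This is a simplified sketch that works on plain strings (a JSDoc comment and the source text of the returned expression) rather than on a real jscodeshift AST, and inferGetterType is a hypothetical name.

```typescript
// Simplified stand-in for the AST visitor: string inputs instead of AST nodes.
function inferGetterType(jsdoc: string | undefined, returnedExpr: string | undefined): string {
  // 1. Prefer an explicit `@return {Type}` or `@returns {Type}` tag.
  const tag = jsdoc ? /@returns?\s*\{([^}]+)\}/.exec(jsdoc) : null;
  if (tag) return tag[1].trim();

  // 2. Otherwise try a crude deduction: concatenation with a string literal yields a string.
  if (returnedExpr && /\+\s*['"`]|['"`]\s*\+/.test(returnedExpr)) return "string";

  // 3. Give up on purpose: `void` makes the compiler flag the spot for manual review.
  return "void";
}

console.log(inferGetterType("/** @return {number} the width */", undefined)); // number
console.log(inferGetterType(undefined, "this._field2 + 'px'"));               // string
console.log(inferGetterType(undefined, "compute()"));                         // void
```

Returning void in the last step is the deliberate fault: any later use of the value as a real type fails to compile, which points the reviewer to exactly the places that need a hand-written annotation.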

Besides whole-file transformations with jscodeshift, which can only run in batch mode, the typescript-eslint project allows us to author small and precise rules that update source code in real time in an IDE such as VSCode. For instance, we can author a rule that marks properties of classes or namespaces whose names begin or end with a single underscore as @internal, so that documentation extraction tools and type definition bundlers can ignore them:

export namespace N {
  export function _helper() {}
  export class A {
    /**
     * Description ...
     */
    protected doSomething_() {}
  }
}

export namespace N {
  /** @internal */
  export function _helper() {}
  export class A {
    /**
     * Description ...
     * @internal
     */
    protected doSomething_() {}
  }
}

Regarding the example above, one has to determine whether a JSDoc comment is associated with the property, whether the @internal tag is already present, and where to insert the @internal tag if it is missing. Since typescript-eslint also presents us with a lossless AST, it is easy to find the JSDoc associated with class or namespace properties. The only non-trivial task left is to parse, transform, and consolidate JSDoc fragments. Fortunately, this can be achieved with the TSDoc parser. Similar to activating jscodeshift via typescript-eslint in the first example, this second example is a case of delegating JSDoc transformation to the TSDoc parser upon a typescript-eslint rule match.
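
The three decisions listed above can be sketched as follows. This version manipulates the comment as a plain string for brevity; it does not use the TSDoc parser or the typescript-eslint rule API, and INTERNAL_NAME and markInternal are hypothetical names.

```typescript
// Matches names that begin or end with a single underscore.
const INTERNAL_NAME = /^_[^_]|[^_]_$/;

// Returns the JSDoc text the member should carry, or undefined if it needs none.
function markInternal(name: string, jsdoc: string | undefined): string | undefined {
  if (!INTERNAL_NAME.test(name)) return jsdoc;        // not internal: leave untouched
  if (jsdoc === undefined) return "/** @internal */"; // no JSDoc yet: create one
  if (/@internal\b/.test(jsdoc)) return jsdoc;        // tag already present: nothing to do

  // Single-line comment: append the tag before the closing `*/`.
  if (jsdoc.indexOf("\n") === -1) return jsdoc.replace(/\s*\*\/$/, " @internal */");

  // Multi-line comment: add the tag on its own line before the closing `*/`,
  // reusing the indentation of the closing line.
  return jsdoc.replace(/\n([ \t]*)\*\/$/, "\n$1* @internal\n$1*/");
}

console.log(markInternal("_helper", undefined)); // /** @internal */
console.log(markInternal("doSomething_", "/**\n * Description ...\n */"));
```

Because the function returns its input unchanged when the tag is already there, applying the fix repeatedly is harmless, which is a useful property for anything driven by eslint --fix.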

With sufficient knowledge of JavaScript, TypeScript, and their build systems, one can utilize jscodeshift, typescript-eslint, AST Explorer, and the TSDoc parser to make further semantic guarantees of one’s codebase, and whenever possible, automate the fix with the handy eslint --fix command. The importance of static analysis cannot be emphasized enough!

Bravo! ZK 10 Has Completely Migrated to TypeScript

For ZK 10, we have put all existing JavaScript code in our codebase through static analysis with TypeScript. Not only were we able to fix existing errors (some automatically with eslint --fix), but thanks to the typescript-eslint project, which enables many extra type-aware rules, we also wrote our own rules, so we are guaranteed never to make those mistakes again. This means less mental burden and a clearer conscience for the ZK development team.

Our Client MVVM effort also becomes much more manageable with TypeScript in place. The development experience is close to that of Java. In fact, some aspects are even better, as TypeScript has better type narrowing, structural typing, refinement types via literal types, and intersection/union types.
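
As a small illustration of these features (not taken from the ZK codebase; State, Timestamped, and summarize are made-up names), the sketch below combines literal types, a union, an intersection, and narrowing in a switch:

```typescript
// A union of object types discriminated by a literal-typed `kind` field.
type Loading = { kind: "loading" };
type Loaded = { kind: "loaded"; rows: string[] };
type Failed = { kind: "failed"; reason: string };
type State = Loading | Loaded | Failed;

// An intersection type adds a field to every variant of the union.
type Timestamped = { at: number };

function summarize(state: State & Timestamped): string {
  switch (state.kind) {
    case "loading":
      return "loading since " + state.at;
    case "loaded":
      // Narrowed to Loaded here, so `rows` is known to exist.
      return state.rows.length + " rows";
    case "failed":
      // Narrowed to Failed here, so `reason` is known to exist.
      return "failed: " + state.reason;
  }
}

console.log(summarize({ kind: "loaded", rows: ["a", "b"], at: 0 })); // 2 rows
```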

As for our users, ZK 10 has become more reliable. Furthermore, our type definitions are freely available, so that ZK 10 users can customize the ZK frontend components with ease and confidence. In addition, users can scale their applications during execution with Client MVVM. Adopting TypeScript in ZK 10 further enables us to scale correctness during development. Both are fundamental improvements.

Annex: Comparing Static Typing Solutions for JavaScript

Google’s Closure Compiler

Type system soundness unknown; assumed to be unsound, as sound type systems are rare
@interface denotes nominal types whereas @record denotes structural types
All type annotations are specified in comments leading to code bloat, and comments often go out of sync with the code.
Most advanced and aggressive code optimization among all options listed here
Find more information on GitHub

Facebook’s Flow

Unsound type system
Nominal types for ES6 classes and structural types for everything else, unlike TypeScript where all types are structural; whereas in Java, all types are nominal
Compared to TypeScript, Flow has a much smaller ecosystem in terms of tooling (compatible formatter, linter, IDE plugin) and libraries (TypeScript even has the DefinitelyTyped project to host type definitions on NPM)
Find more information in Flow Documentation

Microsoft’s TypeScript

Supports all JavaScript features and follows the ECMA standard closely even for subtleties: class fields and TC39 decorators
Seamless interoperation between all mainstream JavaScript module systems: ES modules, CommonJS, AMD, and UMD
Unsound type system
All types are structural, which is the most natural way to model dynamic types statically, but the ability to mark certain types as nominal would be good to have. Flow and the Closure Compiler have an edge in this respect.
Also supports Closure-Compiler-style type annotations in comments
Best-in-class tooling and a massive ecosystem; built-in support by VSCode; hence, its availability is almost ubiquitous
Each enum variant is a separate subtype, unlike all other type systems we have ever encountered, including Rust, Scala 3, Lean 4, and Coq
Find more information in The TypeScript Handbook
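
Two of the points above, structural typing and enum variants as separate subtypes, can be seen in a short sketch. HasX, Point2D, getX, Color, and onlyRed are illustrative names only.

```typescript
// Structural typing: Point2D never declares that it implements HasX,
// yet it is accepted because it has a compatible shape.
interface HasX { x: number }
class Point2D {
  x: number;
  y: number;
  constructor(x: number, y: number) { this.x = x; this.y = y; }
}
function getX(v: HasX): number { return v.x; }
console.log(getX(new Point2D(3, 4))); // 3

// Each enum member is also a type of its own, a subtype of the enum.
enum Color { Red, Green }
function onlyRed(c: Color.Red): Color { return c; }
onlyRed(Color.Red);      // OK
// onlyRed(Color.Green); // compile-time error: Color.Green is not assignable to Color.Red
```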

Scala.js

Leverages the awesome type system of Scala 3, which is sound
Seamlessly shares build scripts (sbt) and code with any Scala 3 project
The emitted JavaScript code is often bloated and sometimes less efficient than that of the Closure Compiler, Flow, and TypeScript.
Learn more on the Scala.js site

ReScript

Touted to have a sound type system (where is the proof?) like that of Scala 3, but the syntax of ReScript is closer to JavaScript and OCaml
The type system is highly regular like all languages in the ML family, allowing for efficient type checking, fast JavaScript emission, and aggressive optimizations.
The emitted JavaScript code is very readable. This is a design goal of ReScript.
Interoperation with TypeScript via genType
As of ReScript 10.1, async/await is supported.
Might require familiarity with more advanced functional programming techniques and purely functional data structures
Learn more in the ReScript Language Manual documentation
