openapic: openapi compiler like protoc

Introduction

No matter how large or small your project is, dealing with APIs, SDKs, and integrations is always a burden. There is a lot of boilerplate: you have to write clients and servers, track protocol updates, and so on. To cut down on boilerplate you can use code generators, and there are a lot of great tools for that, like openapi-generator. OpenAPI itself is a great way to define an API without having to duplicate models in different languages. I'm biased here, since I'm a fan of schema-first development, so I think tooling like this is quite handy.

Last year, my friend and I started writing Mify to save time not only on writing clients and servers but also on more specific boilerplate for backend services, like dependency initialization, configuration, logging, and basic service structure. I'll expand more on Mify in a separate article; right now I want to focus on API SDK generators.

We chose openapi-generator to handle server and client generation for REST APIs, and we're using it in a specific way:

  1. Continuously regenerating API-related code on schema changes. I think this is a common use case; we used it similarly in products we've worked on before.
  2. Overriding templates to add our custom code.
  3. Separating generated handlers from the rest of the generated code, so that users can edit the handlers but not the internal code.

openapi-generator does its job, but for this use case it has problems. Granted, we may have been using it the wrong way, and I've received feedback to that effect, but I want to highlight some noticeable problems and propose a way to solve them to improve the generator. Here are the problems we've seen:

  • This is probably a matter of taste, but for us the Mustache and Handlebars template formats are quite complex and hard to read.
  • It is hard to embed in other tooling: it depends on the Java runtime and can't be used as a library. We settled on using it as a CLI wrapped in a Docker container, which adds overhead and reduces performance.
  • It generates too many files: generated code is mixed with implementation stubs and various helpers, and there is no distinction between editable and non-editable code. Essentially, openapi-generator has a fire-and-forget model by default.
  • There are a lot of generators for the same language, each with a different server/client implementation that has to be supported.
  • Partly because of the previous point, support for the generated code is poor; there are always some bugs and omissions in the implementations.

Because of these problems, we want to propose a different kind of generator, one that is more akin to a compiler for OpenAPI schemas. Essentially, this would be a protobuf compiler for OpenAPI.

Before starting to work on it, however, I want to check with the community to get feedback and see if we're missing something. This post is essentially a design doc for this compiler.

Right now, we have a repository with a PoC. There isn't much there yet, it's mostly just the skeleton, but you can still check it out: https://github.com/mify-io/openapic/.

Goals

Here’s the list of goals that we want to have in openapic:

  • Embeddable and usable as a library: we'll write it in Rust for that, which will also make it fast (at least in terms of start-up time).
  • Minimal amount of generated code: only the code necessary for types, serialization, and deserialization.
  • Plugin support.
  • Fewer moving parts in the generated code: ideally we don't want to allow changing templates on the fly; customization should be done with plugins.

Next, I'll describe the implementation in detail.

High-level architecture

[Diagram: high-level architecture of the generator]

This is a high-level description of the generator. It will mainly be split into two components:

  • frontendc - this module consists of two components:
    • SchemaValidator - processes the arguments and checks the schema.
    • SchemaNormalizer - a resolver that reads all refs into a single schema. I'll expand on this later.
  • backendc - this is where all the rendering logic will live. We'll have roughly one backend per language; each backend will receive the schema and arguments through stdin via a CompileRequest struct containing the already processed schema (a sketch follows below).
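
To make the stdin contract a bit more concrete, here is a minimal sketch of what CompileRequest could look like, assuming serde and serde_json for serialization; the field names are illustrative, not final:

// A hypothetical shape for CompileRequest, serialized and written to the
// backend's stdin. Field names are illustrative.
use serde::{Deserialize, Serialize};

#[derive(Serialize, Deserialize)]
pub struct CompileRequest {
    // Which artifacts to render: models, client, server.
    pub targets: Vec<String>,
    // Package/module name for the generated code.
    pub package_name: Option<String>,
    // The normalized OpenAPI schema produced by frontendc, with all
    // external refs already resolved.
    pub schema: serde_json::Value,
}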

Generator input and output

The generator will take one OpenAPI schema YAML file at a time. By a YAML schema, we mean the usual OpenAPI file, like this:

# myservice_api.yaml
openapi: "3.0.0"
info:
…
paths:
  /api/v1/something:
    ..
components:
  schemas:
    TypeX:

For this file, we'll generate models with a client or a server, without creating a lot of files: one file for the models and one for the client/server:

myservice_api_models.oapi.<lang>
myservice_api_client.oapi.<lang>
myservice_api_server.oapi.<lang>

CLI Usage

The CLI interface should be pretty straightforward:

openapic generate models|client|server|both --output-path=dir [--package-name=package]

I'm open to more suggestions on this, and on what else would be useful to add.
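
For example, a hypothetical invocation might look like this (passing the schema file as the last argument is an assumption; the exact interface isn't settled yet):

openapic generate client --output-path=generated --package-name=petstore myservice_api.yaml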

Processing the schema

Now, here's a rough description of how we'll process the OpenAPI schema. I'll use pseudocode to show how the generated code might look.

Schema normalization

OpenAPI allows users to split a schema definition into multiple files: any component defined in the schema can be a reference pointing to a separate file. To simplify rendering, we first want to gather all components in one place, and this is the main job of the frontendc layer. The backendc layer will just take the processed schema and use it for rendering.
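
As an illustration (the file and component names here are made up), a schema that pulls a component from another file:

# pets_api.yaml (before normalization)
paths:
  /pets:
    post:
      requestBody:
        content:
          application/json:
            schema:
              $ref: "common.yaml#/components/schemas/Pet"

would be turned into a single self-contained schema before it reaches a backend:

# after normalization: the external component is inlined
paths:
  /pets:
    post:
      requestBody:
        content:
          application/json:
            schema:
              $ref: "#/components/schemas/Pet"
components:
  schemas:
    Pet:
      type: object
      properties:
        name:
          type: string

Now, let's move on to rendering.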

Models

For each model, we’ll generate a model definition and serialization/deserialization logic with defaults and some validation:

struct CreatePetRequestBody {
    pet_name string
    pet_type int
}

serialize(CreatePetRequestBody r) JSON {
    // In the serialize function we'll generate the setters for default values
    if r.pet_type == nil {
        r.pet_type = 1; // default (cat)
    }
    ...
}

deserialize(JSON json) CreatePetRequestBody {
    // same in the deserialize function
    CreatePetRequestBody r;
    if json["pet_type"] == nil {
        r.pet_type = 1;
    }
    ...
    return r;
}
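
As a more concrete illustration, here is roughly what such a generated model could look like in Rust, assuming serde and serde_json as the serialization backend; the names mirror the pseudocode above and are not final:

// Hypothetical generated Rust model: serde handles (de)serialization,
// and the default is applied when pet_type is missing from the input.
use serde::{Deserialize, Serialize};

fn default_pet_type() -> i32 {
    1 // default (cat)
}

#[derive(Serialize, Deserialize)]
pub struct CreatePetRequestBody {
    pub pet_name: String,
    #[serde(default = "default_pet_type")]
    pub pet_type: i32,
}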

As for request/response models, I think we need to avoid creating methods with many parameters, because that would make backward compatibility harder during code regeneration. So each function will take a single struct for the request, e.g. CreatePetRequest, which will contain the headers, query/path params, and the request body. Same for the response.

struct CreatePetRequest {
    Body CreatePetRequestBody
    Headers CreatePetRequestHeaders
    Params CreatePetRequestParams
}

Let’s move to the client and server implementation.

Client

First, for both the client and the server we'll take the list of operations and, using either the operationId or the path + method, derive the handler names and create interfaces:

interface PetClient {
    // operationId: createPet
    // or we can use path, e.g. if we have path /pets/ with post method, it'll be petsPost
    CreatePetResponse createPet(CreatePetRequest req);
}

interface PetServer {
    CreatePetResponse createPet(CreatePetRequest req);
}

For the client, we’ll provide the default implementation, which will essentially boil down to this:

ClientImpl {
    CreatePetResponse createPet(CreatePetRequest req) {
        req_json = serialize(req);
        resp_json = http_client.make_request(method, req_json);
        return deserialize(resp_json);
    }
}

Because we'll provide an interface for it, anyone will be able to wrap the client in something that collects logs and metrics, or adds any other logic you want.
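
For instance, here is a minimal sketch of such a wrapper in Rust, assuming the generated client is exposed as a trait; the trait and model types below are placeholders mirroring the pseudocode above:

// Placeholder types standing in for the generated request/response models.
pub struct CreatePetRequest;
pub struct CreatePetResponse;

// The generated client interface from the pseudocode above, as a trait.
pub trait PetClient {
    fn create_pet(&self, req: CreatePetRequest) -> CreatePetResponse;
}

// A wrapper that logs around every call and delegates to any PetClient,
// e.g. the generated default implementation.
pub struct LoggingClient<C: PetClient> {
    pub inner: C,
}

impl<C: PetClient> PetClient for LoggingClient<C> {
    fn create_pet(&self, req: CreatePetRequest) -> CreatePetResponse {
        println!("createPet: sending request");
        let resp = self.inner.create_pet(req);
        println!("createPet: got response");
        resp
    }
}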

Server

For the server code, we don't want to generate a full implementation for handling and parsing requests, as that would force us to support multiple frameworks for each language and would always constrain users.

What we're thinking of instead is generating framework-agnostic components to help people build a server quickly. So, as described before, we'll generate models with serialization/deserialization for types, requests, and responses, we'll provide an interface for the service, and we'll have some kind of mapping from paths to operations to help you wire up routes.

Pseudocode:

route_map = {
    "/pets": createPet,
}

// your server code
for route, operation_func in route_map {
    router.add(route, (http_request) => handler_wrapper(http_request, operation_func));
}

handler_wrapper(http_request, operation_func) {
    req = deserialize(http_request);
    resp = operation_func(req);
    send_response(serialize(resp));
}

Plugins or backends

We want to support plugins as well. Plugins are essentially our backendc layer: they will be implemented as separate executables that openapic calls, providing the prepared schema data in the request. In a plugin, you'd be able to parse additional fields defined as x-custom-field in the schema and generate code for a different language.
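
To sketch the mechanics (the serialization format and function shape are assumptions), openapic could hand the prepared CompileRequest to a backend executable over stdin roughly like this:

// Spawn a backend/plugin executable and stream the serialized
// CompileRequest to its stdin, then wait for it to finish.
use std::io::Write;
use std::process::{Command, Stdio};

fn run_backend(plugin_path: &str, request_json: &[u8]) -> std::io::Result<()> {
    let mut child = Command::new(plugin_path)
        .stdin(Stdio::piped())
        .spawn()?;
    child
        .stdin
        .as_mut()
        .expect("stdin was configured as piped above")
        .write_all(request_json)?;
    let status = child.wait()?;
    println!("backend {} exited with {}", plugin_path, status);
    Ok(())
}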

Embedding the generator

Rust makes it straightforward to expose the library API to other languages; there are a lot of bindings generators, like pyo3 for Python.
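
As a sketch, assuming the pre-0.21 pyo3 module API and a hypothetical generate() entry point (the name and signature are illustrative), the Python bindings could look roughly like this:

// Expose a stubbed generate() function to Python as an "openapic" module.
use pyo3::prelude::*;

#[pyfunction]
fn generate(schema_path: &str, target: &str) -> PyResult<String> {
    // The real implementation would call into the openapic library here.
    Ok(format!("would generate {} from {}", target, schema_path))
}

#[pymodule]
fn openapic(_py: Python, m: &PyModule) -> PyResult<()> {
    m.add_function(wrap_pyfunction!(generate, m)?)?;
    Ok(())
}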

So that's it for the overview. I'm looking forward to more discussion about this project, so feel free to leave comments, visit our Discord channel, and check out the current repo.