Clever Engineering Blog — Always a Student

Wag: A Go Web API Generator

By Kyle Vigen

This story begins someplace familiar to many startups: our monolithic API had become unwieldy, and we wanted to transition towards a microservice architecture. And, like other young, scrappy startups, we couldn’t afford to freeze development while we re-architected the entire system. So, instead, each time we wrote a feature we carved off the related chunk from our Node monolith and rewrote it in Go, a language we had just begun to embrace.

At first, every new Go service had its own flavor. Some used little more than the standard HTTP libraries. Others wrapped those libraries in their own handlers and injected custom middleware. A few used Thrift. And though we did accumulate some tech debt along the way, the variety was worth it. At the time, Go wasn’t just a new language to Cleverites, it was a new language to the world; we were all experimenting, learning, figuring out how to use it best.

But as the differences piled up, the benefits stopped outweighing the costs. Sometimes the differences were subtle, like HTTP middleware with slightly different implementations, and sometimes they weren’t, like HTTP versus Thrift, but in all cases they added cognitive overhead. Every engineer had to learn not just four different ways to write a server, but also multiple tools and libraries. And while the costs of experimentation grew, the benefits shrank as we continued to crystallize our vision for Go microservices. It was time to standardize.

Principles

About a year and a half ago we decided to turn what we had learned into a standard for writing best-practices-infused microservices based on a couple of guiding principles:

Use HTTP

Historically, the primary argument against HTTP as an RPC protocol has been poor performance, and to some extent, it’s been a valid criticism. For most of its history you weren’t able to do things like compress headers or multiplex HTTP requests on a TCP connection. These challenges meant that when Google and Facebook began to scale their latency-sensitive internal services to never-before-seen levels, they had a few options:

  • Live with high TCP handshake latency
  • Maintain huge keep-alive-enabled connection pools
  • Build their own protocol optimized for the inter-service use case

They both chose the final option, building Protobufs RPC and Thrift respectively.

While Google and Facebook faced these problems inside their data centers, everyday web users encountered similar problems on the internet at large. And though web developers mostly worked around them, the industry realized that HTTP needed to be improved and started working on its successor. That effort ended in HTTP/2, a protocol that addressed end-users’ issues, while, not coincidentally, addressing the problems that had originally led Google and Facebook to build their own inter-service frameworks.

At Clever, we knew the advent of HTTP/2 as a scale-ready RPC alternative meant we could mostly ignore performance. Instead, we could focus on comparing the remaining major benefit of Thrift, auto-generated client libraries, against some of the reasons we loved HTTP: unified internal and external APIs1, a rich ecosystem, and built-in AWS monitoring and metrics. Since we knew many HTTP-based code generators existed, the case seemed pretty clear.

Write Code Like You’re in a Monolith

These days, if you read many tech blogs, you’ve certainly seen countless posts about how great microservices are. And at Clever we agree the benefits outweigh the costs, but we don’t forget the costs. For instance, if you were a single engineer working on a small project, would you write a single app or a slew of microservices? A single app, obviously. Why?

It’s easier: deploying one thing is easier, reviewing one thing is easier, testing one thing is easier.

It’s simpler: global stack traces, data types, and static analysis are nice; you don’t have to worry about networking.

You would only move to microservices as your product and team grew, when your “one thing” is big enough that testing, deploying, monitoring, and debugging it isn’t easier or simpler anymore.

At Clever we wanted to retain as much of that feeling of a small monolith as possible, while also using microservices to help scale our team. For us, that meant a couple of things:

Auto-Generate Client Libraries and Servers With Structured Data Types

We had been writing a hodgepodge of HTTP boilerplate. Sometimes we handcrafted client libraries, and sometimes we expected callers to build their own requests. Sometimes we wrapped server handlers in a custom interface, and sometimes we used raw request objects. We weren’t really happy with any of the approaches. None of them felt like what we would write in a monolith, especially when compared to our Thrift code. Thrift’s auto-generated code felt idiomatic and eliminated lots of repetitive work. We knew that the cost of bringing that with us was the engineering time to build HTTP-based code generation (assuming we couldn’t use an off-the-shelf solution), and that investment seemed easily worth it.

Instrument and Extend the Bindings Between Services

Although you can hide a lot of the nitty-gritty of microservices behind auto-generated code, the reality of distributed systems still, inevitably, leaks in. Function calls go from taking microseconds to milliseconds; the network and dependent services expose new failure modes; stack traces disappear. These challenges have been explored in depth by companies like Netflix (with Hystrix) and Twitter (with Finagle), and I don’t want to rehash all their ideas for solving them here, but looking at their solutions, a pattern is clear: they address problems where they’re created, at the bindings between services. You need the power to change and instrument that code2.

Enter Wag

Once we agreed on our core principles, we began the search for a declarative format to base our code generation on. We started with Swagger, an open, well-supported, though verbose, format. Overall, we liked it; the only problem was that our primary use case was Go3, and none of the existing Go Swagger implementations captured the “feel like you’re coding in a monolith” mantra we were going for. So we looked at a couple of other frameworks, including the popular, though at the time new, gRPC. We liked gRPC (its interface and pluggability matched the monolith code we had in our heads), but it required HTTP/2 and used a custom serialization format (Protocol Buffers V3). Though neither of those was a blocker, at that point we had analyzed the space enough to know that writing our own framework wouldn’t be too hard, especially if we leveraged the ideas, and in some cases code, from other projects. This led us back to Swagger, and soon after, to the birth of Wag4.

The Interface

At the core, Wag translates between the network and application code, between good RESTful design and idiomatic Go. When an engineer decides to build a service they:

  1. Define their Swagger YAML.
  2. Use Wag to build the bridge over to idiomatic Go.
  3. Implement the Go interface.

In this example, we’ve defined this Swagger YAML to build a service to answer the kind of questions we ask at Clever: “what sections does a given student belong to”5. The generated Go code from step 2 looks something like:

type Section struct {
    ID     string
    Name   string
    Period string
}

type Server interface {
    // GetSectionsForStudent looks up the sections for a student
    GetSectionsForStudent(ctx context.Context, studentID string) ([]Section, error)
}

We go about step 3 the same way we implement any other interface6.

import (
    "context"
    "log"

    "github.com/Clever/wag/samples/gen-go-blog/models"
    "github.com/Clever/wag/samples/gen-go-blog/server"
)

// MyServer implements the generated server interface.
type MyServer struct {
    db DB
}

// GetSectionsForStudent delegates the lookup to the database.
func (m *MyServer) GetSectionsForStudent(ctx context.Context, studentID string) ([]models.Section, error) {
    return m.db.GetSectionsForStudent(ctx, studentID)
}

Then we pass the instantiated interface into server.New to start the server.

func main() {
    m := MyServer{}
    s := server.New(&m, "localhost:6000")
    // Serve should not return
    log.Fatal(s.Serve())
}

> curl localhost:6000/students/abc/sections

[
  {
    "id": "abc",
    "name": "Algebra",
    "period": "2"
  }
]

That’s it: no hint of the network or serialization/deserialization, just an HTTP service definition, a Go interface, and the auto-generated bridge between the two.

Adding Another Endpoint

If we later decide we want our service to answer the same question, except now about teachers, we start back on the HTTP side of the bridge and add the teacher endpoint to the Swagger YAML. When we generate the code this time, Wag overwrites the old interface with a larger one:

type Server interface {
  // GetSectionsForStudent looks up the sections for a student
  GetSectionsForStudent(ctx context.Context, studentID string) ([]models.Section, error)
  // GetSectionsForTeacher looks up the sections for a teacher
  GetSectionsForTeacher(ctx context.Context, teacherID string) ([]models.Section, error)
}

And we return to Go-land, one compiler error away from done:

cannot use &m (type *MyServer) as type server.Controller in argument to server.New: *MyServer does not implement server.Controller (missing GetSectionsForTeacher method)

 

Client Libraries

Along with a Go server, Wag also generates Go and Node client libraries. In the spirit of making it feel like you’re just using a Go interface, the Go clients are just an implementation of the server interface (and the Node clients are similar7). Theoretically you could merge the client and server into a single service without changing a line of code in either of them. The code should look familiar:

c := client.New("localhost")
sections, err := c.GetSectionsForTeacher(ctx, teacherID)
...

Instrumenting the Bindings Between Services

As I mentioned in “Principles” above, to have robust, debuggable microservices, you need a way to instrument the connective tissue between your services. To do this in Wag, we built an interface that allows any user-defined code to intercept HTTP requests on both the client and server.

For the server interface, we use the well-established Go pattern:

func (f HandlerFunc) ServeHTTP(w ResponseWriter, r *Request)

 
On the client side, we couldn’t find any established Go patterns, so we designed one inspired by the server interface, but mirroring the Do client HTTP call:

Do(c *http.Client, r *http.Request) (*http.Response, error)

 
What we can do with this is probably best explained with an example, so let’s walk through how we use it to instrument our code for tracing.

Tracing

If you aren’t already familiar with tracing, the core idea is fairly straightforward8. It arose from the challenges of debugging and monitoring complex, distributed systems where a single end-user request can trigger dozens, or perhaps hundreds, of internal service calls. In these environments, developers debugging issues found it hard to go from user-facing problems, like slow requests, to the root cause, because “tracing” the call through all the sub-services to find the source of the problem was a massive pain. In response, developers started tagging every sub-request with the unique identifier of the initial user request. Then they used this identifier to tie all the calls together into one large request tree, something like this.

In practice, instrumenting it for Wag meant two things:

  1. We needed a way for clients to pass their unique identifier across the network to servers.
  2. We needed a transparent way to pass that unique identifier through the server code to the next client library.

We did this with a combination of HTTP headers and contexts9.

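Server

// traceKey is the context key for the trace ID (helper names here are illustrative).
type traceKey struct{}

// withTracing reads the caller's trace ID, or starts a new trace, and stores it in the context.
func withTracing(next http.Handler) http.Handler {
    return http.HandlerFunc(func(w http.ResponseWriter, r *http.Request) {
        traceID := r.Header.Get("X-TRACEID")
        if traceID == "" {
            // Assign with "=", not ":=", so the outer traceID is updated rather than shadowed.
            traceID = tracing.GenerateNewTrace()
        }
        ctx := context.WithValue(r.Context(), traceKey{}, traceID)
        next.ServeHTTP(w, r.WithContext(ctx))
    })
}

Client

// doer matches the client-side Do signature shown above.
type doer interface {
    Do(c *http.Client, r *http.Request) (*http.Response, error)
}

// tracingDoer copies the trace ID from the request context into the outgoing header.
type tracingDoer struct{ next doer }

func (t tracingDoer) Do(c *http.Client, r *http.Request) (*http.Response, error) {
    // The comma-ok assertion avoids a panic when the context carries no trace ID.
    if traceID, ok := r.Context().Value(traceKey{}).(string); ok && traceID != "" {
        r.Header.Set("X-TRACEID", traceID)
    }
    return t.next.Do(c, r)
}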

The actual code is slightly more complicated, but it still takes barely over a hundred lines to implement, and once you’ve written it, you get it in all your services automatically!
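To see the two halves working together, here is a self-contained sketch that sends a request through a "frontend" service, which calls a "backend" service, and reports the trace ID the backend observed. It uses only the standard library. The helper names (`withTracing`, `doWithTracing`, `traceSeenByBackend`) and the fixed fallback ID `trace-1` are assumptions made for this example, not Wag's implementation.

```go
package main

import (
	"context"
	"fmt"
	"io"
	"net/http"
	"net/http/httptest"
)

const traceHeader = "X-TRACEID"

// traceKey is the context key under which the trace ID is stored.
type traceKey struct{}

// withTracing is the server half: header -> context.
func withTracing(next http.Handler) http.Handler {
	return http.HandlerFunc(func(w http.ResponseWriter, r *http.Request) {
		traceID := r.Header.Get(traceHeader)
		if traceID == "" {
			traceID = "trace-1" // hypothetical stand-in for real ID generation
		}
		ctx := context.WithValue(r.Context(), traceKey{}, traceID)
		next.ServeHTTP(w, r.WithContext(ctx))
	})
}

// doWithTracing is the client half: context -> header.
func doWithTracing(c *http.Client, r *http.Request) (*http.Response, error) {
	if traceID, ok := r.Context().Value(traceKey{}).(string); ok && traceID != "" {
		r.Header.Set(traceHeader, traceID)
	}
	return c.Do(r)
}

// traceSeenByBackend sends one request to the frontend and returns
// the trace ID that the backend found in its request context.
func traceSeenByBackend(incoming string) (string, error) {
	backend := httptest.NewServer(withTracing(http.HandlerFunc(
		func(w http.ResponseWriter, r *http.Request) {
			id, _ := r.Context().Value(traceKey{}).(string)
			fmt.Fprint(w, id)
		})))
	defer backend.Close()

	frontend := httptest.NewServer(withTracing(http.HandlerFunc(
		func(w http.ResponseWriter, r *http.Request) {
			// Passing r.Context() is what carries the trace ID to the client call.
			req, err := http.NewRequestWithContext(r.Context(), http.MethodGet, backend.URL, nil)
			if err != nil {
				http.Error(w, err.Error(), http.StatusInternalServerError)
				return
			}
			resp, err := doWithTracing(http.DefaultClient, req)
			if err != nil {
				http.Error(w, err.Error(), http.StatusBadGateway)
				return
			}
			defer resp.Body.Close()
			io.Copy(w, resp.Body)
		})))
	defer frontend.Close()

	req, err := http.NewRequest(http.MethodGet, frontend.URL, nil)
	if err != nil {
		return "", err
	}
	if incoming != "" {
		req.Header.Set(traceHeader, incoming)
	}
	resp, err := http.DefaultClient.Do(req)
	if err != nil {
		return "", err
	}
	defer resp.Body.Close()
	body, err := io.ReadAll(resp.Body)
	return string(body), err
}

func main() {
	id, err := traceSeenByBackend("user-request-42")
	fmt.Println(id, err) // prints "user-request-42 <nil>"
}
```

The frontend handler never mentions the header itself; the ID rides along in the context, which is what makes the propagation transparent to application code.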

Wag Today

When we first thought about the challenges that would eventually lead us to build Wag, we just focused on standardizing on an RPC protocol and making sure we didn’t have too many ways to do the same thing. But as we explored the space, we realized that we could be a bit more ambitious; we could try to build a framework for idiomatic and robust Go HTTP microservices. Zoom forward to today, and we’re really happy with the results.

We’ve been running Wag in production for over a year now, and it powers almost every Go service we’ve written since. We’ve also migrated many of our highest-traffic services to it. Every time a student logs into a learning application, that request makes its way through many Wag-based services, at peak tens of thousands of times a second.

That all said, we also still see lots of opportunities to make Wag better. We want to add more resiliency safeguards, extend tracing to the database, and make testing easier, just to name a few.

If you’re interested in helping make Wag even better or in any of the other challenges we’re facing at Clever, check out our jobs page and apply!

Footnotes

1 Like everyone else, our external APIs are HTTP-based.

2 Over the past year, some companies have embraced service meshes, as opposed to “fat clients”, to solve this problem. We haven’t moved in that direction yet, but we’re keeping our eye on the space.

3 To fully realize the benefits of microservices, we knew we wanted multi-language support eventually, but to get broad internal adoption we needed a solid Go foundation. Given that Swagger provides support for lots of other languages, we knew we weren’t painting ourselves into a corner.

4 Originally, we picked the name by just pulling random letters out of Swagger to make it shorter, but it didn’t take too long for us to discover the backronym we use today: “Web API Generator”.

5 This example is inspired by actual code we have.

6 Where we have a magic DB object that answers all our questions.

7 For more information on Node, check out using the JavaScript client.

8 For a deeper dive I would recommend Google’s Dapper paper, or the open-source work it inspired, like Zipkin and OpenTracing.

9 In Node we replace Go’s context with the Express request object.
