
Cloud Traffic

I recently watched
Build an enterprise-grade service mesh with Traffic Director, featuring
Stewart Reichling and Kelsey Hightower of GCP, and of course Google Cloud’s
Traffic Director. Coming at this with a brain steeped in 5½ years of AWS
technology and culture was surprising in ways that seem worth sharing.

Stewart presents the problem of a retail app’s shopping-cart checkout code. Obviously, first you need to call a
payment service. However it’s implemented, this needs to be a synchronous call because you’re not going to start any
fulfillment work until you know the payment is OK.
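That ordering constraint is easy to sketch. Here's a minimal Go version, where `ChargePayment` and `StartFulfillment` are hypothetical stand-ins for the services in Stewart's example, not anything from the talk:

```go
package main

import (
	"errors"
	"fmt"
)

// ChargePayment is a hypothetical stand-in for the synchronous call to the
// payment service. Fulfillment must not start until it succeeds.
func ChargePayment(orderID string, cents int) error {
	if cents <= 0 {
		return errors.New("invalid amount")
	}
	return nil // pretend the payment processor said OK
}

// StartFulfillment kicks off the downstream work, but only for paid orders.
func StartFulfillment(orderID string) string {
	return "fulfilling " + orderID
}

// Checkout blocks on the payment call on purpose: no payment, no fulfillment.
func Checkout(orderID string, cents int) (string, error) {
	if err := ChargePayment(orderID, cents); err != nil {
		return "", fmt.Errorf("payment failed, not fulfilling: %w", err)
	}
	return StartFulfillment(orderID), nil
}

func main() {
	msg, err := Checkout("order-42", 1999)
	fmt.Println(msg, err)
}
```

The point is just the shape: the payment call is on the critical path, so whatever sits between you and the payment service (a proxy, a sidecar, a load balancer) is on the critical path too.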

If you’re a big-league operation, your payment processing needs to scale and is quite likely an external service you call out
to. Which raises the questions of how you deploy and scale it, and how clients find it. Since this is GCP, both Kubernetes and
a service fabric are assumed. I’m not going to explain “service fabric” here; if you need to know go and web-search
some combination of Envoy and Istio and Linkerd.

The first thing that surprised me was Stewart talking about the difficulty of scaling the payment service’s load balancer, and
it being yet another thing in the service to configure, bearing in mind that you need health checks, and might need to load-balance
multiple services. Fair enough, I guess. Their solution was a client-local load
balancer, embedded in sidecar code in the service mesh. Wow… in such an environment, everything I think I know about
load-balancing issues is probably wrong. There seemed to be an implicit claim that client-side load balancing is a win, but I
couldn’t quite parse the argument. Counterintuitive! Need to dig into this.
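My mental model of "client-local load balancer" is something like the following: each client holds the endpoint list and picks for itself, so there's no central proxy to scale. This is a toy round-robin picker; in a real mesh the endpoint list would be pushed to the sidecar by the control plane (e.g. over xDS), not hard-coded:

```go
package main

import (
	"fmt"
	"sync/atomic"
)

// Balancer does client-side load balancing: each caller picks the next
// endpoint itself instead of routing traffic through a central proxy.
type Balancer struct {
	endpoints []string // in a real mesh, fed to the client by the control plane
	next      uint64
}

func NewBalancer(endpoints []string) *Balancer {
	return &Balancer{endpoints: endpoints}
}

// Pick returns the next endpoint round-robin; safe for concurrent callers.
func (b *Balancer) Pick() string {
	n := atomic.AddUint64(&b.next, 1)
	return b.endpoints[(n-1)%uint64(len(b.endpoints))]
}

func main() {
	b := NewBalancer([]string{"10.0.0.1:443", "10.0.0.2:443", "10.0.0.3:443"})
	for i := 0; i < 4; i++ {
		fmt.Println(b.Pick())
	}
}
```

The hard parts this sketch waves away, health checks and keeping every client's endpoint list fresh, are exactly what the control plane is for.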

And the AWS voice in the back of my head is saying “Why don’t you put your payments service behind API Gateway? Or ALB? Or
maybe even make direct calls out to a Lambda function? (Or, obviously, their GCP equivalents.) They come with
load-balancing and monitoring and error reporting built-in. And anyhow, you’re probably going to need application-level canaries,
whichever way you go.” I worry a little bit about hiding the places where the networking happens, just like I worry about ORM
hiding the SQL. Because you can’t ignore either networking or SQL.

Google Traffic Director

Traffic Director ·
It’s an interesting beast. It turns out that there’s a set of APIs called “xDS”, originally
from Envoy, nicely introduced in
The universal data plane API. They manage the
kinds of things a sidecar provides: Endpoint discovery and routing, health checks, secrets, listeners. What Google has done is
arrange for gRPC to support xDS for configuration, and it seems Traffic Director can configure and deploy your services using a
combination of K8s with a service mesh, gRPC, and even on-prem stuff; plus pretty well anything that supports xDS. Which
apparently includes Google Cloud Run.

It does a lot of useful things. Things that are useful, at least, in the world where you build your distributed app by
turning potentially any API call into a proxied, load-balanced, monitored, logged service, via the service fabric.

Is this a good thing? Sometimes, I guess, otherwise people wouldn’t be putting all this work into tooling and facilitation.
When would you choose this approach to wiring services together, as opposed to consciously
building more or less everything as a service with an endpoint, in the AWS style? I don’t know. Hypothesis: You do this when
you’re already bought-in to Kubernetes, because in that context service fabric is the native integration idiom.

I was particularly impressed by how you could set up “global” routing, which means load balancing against resources that run in
multiple Google regions (which don’t mean the same things as AWS regions or Azure regions). AWS would encourage you to use
multiple AZ’s to achieve this effect.

Also there’s a lot of support for automated-deployment operations, and I don’t know if they extend the current GCP state of the
art, but they looked decent.

Finally, I was once again taken aback when Stewart pointed out that with Traffic Director, you don’t have to screw around with
iptables to get things working. I had no idea that was something people still had to do; if this makes that go away, that’s gotta
be a good thing.

Kelsey makes it go ·
Kelsey Hightower takes 14 of the video’s 47 minutes to show how you can deploy a simple demo app on your laptop then, with the
help of Traffic Director, on various combinations of virts and K8s resources and then Google Cloud Run.
It’s impressive, but as with most K8s demos, it assumes you’ve got everything up and running and configured, because if you
didn’t, it’d take a galaxy-brain expert like Kelsey a couple of hours (probably?) to pull that together; someone like me, who’s
mostly a K8s noob, would probably need days.

I dunno, I’m in a minority here but damn, is that stuff ever complicated. The number of moving parts you have to have
configured just right to get “Hello world” happening is really super intimidating.

But bear in mind it’s perfectly possible that someone coming into AWS for the first time would find the configuration work
there equally scary. To do something like this on AWS you’d spend (I think) less time doing the service configuration, but
then you’d have to get all the IAM roles and permissions wired up so that anything could talk to
anything, which can get hairy fast. I noticed the GCP preso entirely omitted access-control issues. So, all in, I don’t have
evidence to claim “Wow, this would be simpler on AWS!”, just that the number of knobs and dials seemed roughly comparable.

One thing made me gasp, then laugh. Kelsey said “for the next step, you just have to put this in your Go
imports, you don’t have to use it or anything”:

_ "google.golang.org/xds"

I was all “WTF how can that do anything?” but then a few minutes later he started wiring endpoint URIs into config files that
began with xds: and oh, of course. Still, is there a bit of a code smell happening, or is that just me?
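If I understand the trick, the blank import works the way Go’s database/sql drivers do: the package’s init() registers a resolver for the “xds” URI scheme in a global table, and gRPC consults that table at dial time. Here’s a toy version of the registration pattern; the registry and resolver functions are my invention to show the mechanism, not gRPC’s actual API:

```go
package main

import "fmt"

// resolvers maps URI schemes to resolver implementations. gRPC keeps a
// similar global registry that packages populate from init().
var resolvers = map[string]func(target string) string{}

// Register is what an imported-for-side-effect package would call from init().
func Register(scheme string, fn func(string) string) {
	resolvers[scheme] = fn
}

// init simulates what a blank import of an xds package does: no symbol is
// referenced directly, but the scheme becomes resolvable as a side effect.
func init() {
	Register("xds", func(target string) string {
		return "resolved-via-control-plane:" + target
	})
}

// Resolve looks the scheme up and delegates, reporting unknown schemes.
func Resolve(scheme, target string) (string, bool) {
	fn, ok := resolvers[scheme]
	if !ok {
		return "", false
	}
	return fn(target), true
}

func main() {
	addr, ok := Resolve("xds", "payment-service")
	fmt.Println(addr, ok)
}
```

So the import isn’t doing nothing, it’s mutating global state at program startup, which is exactly why it smells: the dependency is invisible at every call site that relies on it.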

Anyhow ·
If I were already doing a bunch of service-fabric stuff, I think that Traffic Director might meet some needs of today and could
become really valuable when my app started getting heterogeneous and needed to talk to various sorts of things that aren’t in the
same service mesh.

What I missed ·
Stewart’s narrative stopped after the payment, and I’d been waiting for the fulfillment part of the puzzle, because for that,
synchronous APIs quite likely aren’t what you want; event-driven and message-based asynchronous infrastructure would come into
play. Which of course is what I spent a lot of time working on recently. I wonder how that fits into the K8s/service-fabric
world.
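The asynchronous half might look like this: checkout publishes an event and returns immediately, and a fulfillment worker consumes it later. The event type is made up, and a buffered channel stands in for whatever message bus you’d actually use:

```go
package main

import "fmt"

// OrderPaid is the event the checkout path publishes once payment clears.
type OrderPaid struct {
	OrderID string
}

// Publish drops the event on a queue and returns; the caller doesn't wait
// for fulfillment. A buffered channel stands in for a real message bus.
func Publish(queue chan<- OrderPaid, ev OrderPaid) {
	queue <- ev
}

// FulfillmentWorker drains the queue; in real life this runs in another
// service entirely, decoupled from checkout.
func FulfillmentWorker(queue <-chan OrderPaid, done chan<- string) {
	for ev := range queue {
		done <- "fulfilled " + ev.OrderID
	}
	close(done)
}

func main() {
	queue := make(chan OrderPaid, 8)
	done := make(chan string, 8)
	go FulfillmentWorker(queue, done)

	Publish(queue, OrderPaid{OrderID: "order-42"})
	close(queue)

	for msg := range done {
		fmt.Println(msg)
	}
}
```

The interesting question the talk left open is who owns that queue in a mesh-centric world, since the sidecar model is built around proxying synchronous request/response traffic.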
