Almost one and a half years ago, I wrote “Building GraphQL API with Nodejs, TypeGraphQL, Typegoose and Troubleshooting common challenges”, which covers building a GraphQL API and the most common challenges you may face along the way. The major challenge I faced was the N+1 problem, which was solved using DataLoader. A lot has changed since then! I have since started working at Tekie as a Full Stack Developer, and this post covers everything we learned while building and scaling out our backend.
Refresher: What is GraphQL?
GraphQL is a query language for APIs and a runtime for fulfilling those queries with your existing data. GraphQL provides a complete and understandable description of the data in your API, gives clients the power to ask for exactly what they need and nothing more, makes it easier to evolve APIs over time, and enables powerful developer tools.
So GraphQL has two major components:
- Schema: a GraphQL server uses a schema to describe the shape of requests/graphs. The schema defines a hierarchy of types with fields that are populated from back-end data stores.
- Resolvers: the server needs to know how to populate data for every field in the schema so that it can respond to requests for that data. To accomplish this, it uses resolvers. Resolvers come in three flavors:
- Queries
- Mutations
- Subscriptions
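To make the schema/resolver split concrete, here is a minimal sketch in Node.js. The type and field names are illustrative, not our real schema, and the resolver is called directly instead of through a GraphQL server:

```javascript
// SDL schema: describes the shape of the graph (illustrative types).
const typeDefs = `
  type Topic {
    id: ID!
    title: String!
  }
  type Query {
    topic(id: ID!): Topic
  }
`;

// Resolver map: tells the server how to populate each field.
// Here a plain in-memory object stands in for the data store.
const topics = { '1': { id: '1', title: 'Recursion' } };
const resolvers = {
  Query: {
    topic: (_parent, args) => topics[args.id] || null,
  },
};

// A GraphQL server would invoke this resolver for each matching query:
console.log(resolvers.Query.topic(null, { id: '1' }).title); // "Recursion"
```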
To know more, please read this post “Building GraphQL API with Nodejs, TypeGraphQL , Typegoose and Troubleshooting common challenges”.
The backstory
A year ago, when I started working at Tekie, we were focused on building and shipping our MVP (Minimum Viable Product), both the user-facing and the internal application. We had no issues with our architecture, i.e. Node.js and GraphQL. But as we kept growing and onboarding more and more users, we started to face latency issues in our GraphQL API.
Adoption of new features was really fast, but as our MongoDB collections grew and queries kept getting more complex, latency became a bigger problem. We started to scale out by beefing up our MongoDB clusters and autoscaling our backend infrastructure on Kubernetes and AWS. And this is exactly what you don’t want to do: burn cash at a blazingly fast pace.
The usual…
We quickly turned our attention to optimizations in development, and the first thing we explored was DataLoader, as every GraphQL developer would, to reduce database round trips.
After a while we found out that, the way our backend architecture was configured, it was not possible to use DataLoader. Every field we requested was actually resolved before the result was returned, i.e. using GraphQL’s field.resolve we would resolve each and every field of every individual object in an array. This meant resolution always happened on the next tick in Node.js, so DataLoader couldn’t batch and cache those requests.
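For readers unfamiliar with why the tick matters: DataLoader coalesces all loads issued during the same tick of the event loop into one batch call. The sketch below is a stripped-down, hypothetical batcher (not the real dataloader package) that shows the mechanism — two loads in the same tick produce a single “database” trip:

```javascript
// Minimal DataLoader-style batcher (illustrative, not the real library).
class TinyLoader {
  constructor(batchFn) {
    this.batchFn = batchFn; // receives all keys collected this tick
    this.queue = [];
  }
  load(key) {
    return new Promise((resolve, reject) => {
      if (this.queue.length === 0) {
        // Schedule a single flush at the end of the current tick.
        process.nextTick(() => this.flush());
      }
      this.queue.push({ key, resolve, reject });
    });
  }
  async flush() {
    const batch = this.queue;
    this.queue = [];
    try {
      const results = await this.batchFn(batch.map((b) => b.key));
      batch.forEach((b, i) => b.resolve(results[i]));
    } catch (err) {
      batch.forEach((b) => b.reject(err));
    }
  }
}

// Two loads in the same tick trigger only one batch call.
let calls = 0;
const loader = new TinyLoader(async (keys) => {
  calls += 1; // one trip for the whole batch
  return keys.map((k) => ({ id: k, title: `Topic ${k}` }));
});

Promise.all([loader.load(1), loader.load(2)]).then((rows) => {
  console.log(calls, rows.map((r) => r.title)); // calls === 1
});
```

If fields are instead resolved one by one across later ticks, as in our architecture, each load flushes alone and the batching advantage disappears.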
The square one
Okay, after a lot of experimentation and reflection, we started to test out a few queries with MongoDB aggregation. The results were simply amazing: we saw a reduction of almost 80% in our latency tests.
We quickly jumped into integrating aggregation pipelines into our generic code so that every query could be optimized. As part of the plan, we first decided to build a helper library for easily constructing aggregation pipelines, but a similar library called mongodb-pipeline-builder already existed. So we forked the project, built our custom solutions on top of it, and published a new version: mongodb-aggregation-builder.
The showdown…
As soon as we had the helper library in place, we started integrating its logic into our generic backend architecture. This was a major change to our architecture that could lead to production bugs, so we decided to create a custom directive called databaseController on schema types, which accepts the mode of the query.
type Topic @model @databaseController(mode: "aggregation") {
order: Int!
title: String! @trim
description: String @trim
status: ContentStatus! @defaultValue(value: "unpublished")
}
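Conceptually, the generated resolvers can then branch on the directive’s mode. Here is a hypothetical sketch of that dispatch (all names are illustrative, not our real code), with a fake collection standing in for MongoDB:

```javascript
// Hypothetical sketch: honor a @databaseController(mode) directive
// at query time by picking the fetch strategy.
function resolveListQuery({ mode, collection, pipeline, filter }) {
  if (mode === 'aggregation') {
    // Single round trip: the whole graph is fetched by one pipeline.
    return collection.aggregate(pipeline);
  }
  // Default path: classic find + per-field resolution (many round trips).
  return collection.find(filter);
}

// Fake collection, just to demonstrate the dispatch:
const fakeCollection = {
  aggregate: (pipeline) => ({ strategy: 'aggregation', stages: pipeline.length }),
  find: (filter) => ({ strategy: 'find', filter }),
};

const result = resolveListQuery({
  mode: 'aggregation',
  collection: fakeCollection,
  pipeline: [{ $match: { status: 'published' } }],
});
console.log(result.strategy); // "aggregation"
```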
So now we were able to control which types were allowed to be queried through an aggregation pipeline. With this in place, we were able to transform the query below into the corresponding pipeline.
GraphQL Query:
query {
topics(filter: { status: published }, order_by: createdAt_DESC) {
id
title
courses(filter: { status: published }, first: 1, order_by: createdAt_DESC) {
id
title
}
}
}
Aggregation Pipelines :
[
  { $match: { status: "published" } },
  { $sort: { createdAt: -1 } },
  { $limit: 1000 },
  { $lookup: {
      from: "Course",
      let: { coursesId: "$courses.typeId" },
      pipeline: [
        { $match: { $expr: { $in: ["$id", "$$coursesId"] } } },
        { $match: { status: "published" } },
        { $sort: { createdAt: -1 } },
        { $limit: 1 },
        { $project: { id: 1, title: 1 } }
      ],
      as: "courses"
    }
  },
  { $project: { id: 1, title: 1, courses: 1 } }
]
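A generic helper can assemble the outer stages of such a pipeline from the query’s filter and ordering arguments. The following is a hypothetical sketch of that idea, not the actual mongodb-aggregation-builder API:

```javascript
// Hypothetical builder: turns GraphQL-style list arguments into
// aggregation stages (a sketch, not the real library's API).
function buildListPipeline({ filter = {}, sort = {}, limit = 1000, project = null }) {
  const pipeline = [];
  if (Object.keys(filter).length) pipeline.push({ $match: filter });
  if (Object.keys(sort).length) pipeline.push({ $sort: sort });
  pipeline.push({ $limit: limit }); // default cap when no `first` is given
  if (project) pipeline.push({ $project: project });
  return pipeline;
}

// Mirrors the outer stages of the pipeline above:
const pipeline = buildListPipeline({
  filter: { status: 'published' },
  sort: { createdAt: -1 },
  project: { id: 1, title: 1, courses: 1 },
});
```

Nested relations like courses would add a $lookup stage between the $limit and the final $project, built recursively from the sub-query’s arguments.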
As a result, we were able to reduce the latency of complex queries by an average of 50%, and the database was now hit just once per query 🤯 🚀, compared to 60-70 database calls earlier.
The more the merrier…
With aggregation implemented, we placed our GraphQL endpoint behind GraphCDN. With GraphCDN’s help, we were able to cache our static content at edge locations, which resulted in fewer queries hitting our backend: only dynamic data required a round trip, while static content was always served from the cache.
What's next
We were able to scale and optimize our GraphQL backend’s latency and resolve various issues. But there are always a few trade-offs to make. After a while we learned that the $lookup stage in an aggregation pipeline is a very costly operation when overused, so for the moment we simply try to limit fetching relational fields.
But this continues to be a challenge for us. We now look forward to integrating Redis at the query level with aggregation pipelines, and to restructuring the architecture to support DataLoader, or complex filters inside DataLoader combined with aggregation.
The Possibilities Are Endless 🔥…
Thanks for reading, stay awesome! ❤