title: From REST to GraphQL
url: https://blog.jacobwgillespie.com/from-rest-to-graphql-b4e95e94c26b
hash_url: 69b00e4dda3dc06e585c2b7bc6bf1fc2
<p name="b73a" id="b73a" class="graf--p graf-after--h3"><strong class="markup--strong markup--p-strong">Disclaimer:</strong> GraphQL is still new and best practices are still emerging. This post describes some of my journey with implementing a GraphQL backend service, so it is a snapshot of what I’ve learned so far, presented in the hopes that it will be useful to others. Also, some of the specific real-world implementation details internal to Playlist have been paraphrased / simplified / anonymized for obvious reasons.</p><p name="2d84" id="2d84" class="graf--p graf-after--p">This post assumes a basic familiarity with GraphQL. If you are not already familiar with GraphQL:</p><div name="52de" id="52de" class="graf--mixtapeEmbed graf-after--p graf--last"><a href="https://code.facebook.com/posts/1691455094417024/graphql-a-data-query-language/" class="js-mixtapeImage mixtapeImage u-ignoreBlock" data-media-id="d108e529edd4185fa604ce6c2952d9ce" data-thumbnail-img-id="0*gAMwEKtvS0iiv-F_." 
style="background-image: url(https://d262ilb51hltx0.cloudfront.net/max/1200/0*gAMwEKtvS0iiv-F_.);"></a><a href="https://code.facebook.com/posts/1691455094417024/graphql-a-data-query-language/" data-href="https://code.facebook.com/posts/1691455094417024/graphql-a-data-query-language/" class="markup--anchor markup--mixtapeEmbed-anchor" title="https://code.facebook.com/posts/1691455094417024/graphql-a-data-query-language/" rel="nofollow"><strong class="markup--strong markup--mixtapeEmbed-strong">GraphQL: A data query language</strong><br><em class="markup--em markup--mixtapeEmbed-em">When we built Facebook&#39;s mobile applications, we needed a data-fetching API powerful enough to describe all of Facebook…</em>code.facebook.com</a></div></div></div></section><section name="ad04" class=" section--body"><div class="section-divider layoutSingleColumn"><hr class="section-divider"></div><div class="section-content"><div class="section-inner layoutSingleColumn"><h3 name="f76d" id="f76d" class="graf--h3 graf--first">REST</h3><p name="b34c" id="b34c" class="graf--p graf-after--h3">At Playlist, we have a Rails / REST-based API that powers the app. When it was created initially, we used Github’s V3 API as an inspiration and generally modeled our API structure after theirs.</p><p name="0a45" id="0a45" class="graf--p graf-after--p">Need track information?</p><pre name="5985" id="5985" class="graf--pre graf-after--p">GET /tracks/ID</pre><p name="f722" id="f722" class="graf--p graf-after--pre">Need to fetch a playlist?</p><pre name="e282" id="e282" class="graf--pre graf-after--p">GET /playlists/ID</pre><p name="2605" id="2605" class="graf--p graf-after--pre">Need a playlist’s tracks?</p><pre name="66f1" id="66f1" class="graf--pre graf-after--p">GET /playlists/ID/tracks</pre><p name="df2b" id="df2b" class="graf--p graf-after--pre">It had the benefit of simplicity — endpoints are intuitively named and can be browsed easily. 
Initially we even implemented URL properties on all objects so the API was browsable just by clicking (this was eventually removed in favor of smaller response payloads). Documentation described what was returned by each endpoint so our mobile team could easily integrate.</p><h4 name="a9fe" id="a9fe" class="graf--h4 graf-after--p">Bloat and Slowdowns</h4><p name="4a35" id="4a35" class="graf--p graf-after--h4">However, as time passed, payloads got larger as requirements grew. As an example, here is a simplified playlist object response:</p><figure name="4645" id="4645" class="graf--figure graf--iframe graf-after--p"><div class="iframeContainer"><iframe width="700" height="250" src="https://blog.jacobwgillespie.com/media/edeb0585b3a0ee44560b92fb8ff3a2a0?maxWidth=700" data-media-id="edeb0585b3a0ee44560b92fb8ff3a2a0" frameborder="0"></iframe></div></figure><p name="4071" id="4071" class="graf--p graf-after--figure">It contains all the basic information about the playlist, but (almost) none of the associated objects. As a client, you would be expected to call other endpoints like <em class="markup--em markup--p-em">/playlists/ID/tracks</em> to fetch sub-resources.</p><p name="d198" id="d198" class="graf--p graf-after--p">As more associations were added, more data kept getting stuffed into the playlist response. 
Specifically, because we used Rails and ActionView partials, more data was added to the <em class="markup--em markup--p-em">_playlist.json.jbuilder</em> partial as lists of playlists needed more and more data.</p><p name="b30b" id="b30b" class="graf--p graf-after--p">Mobile requirements would state something like “we need to show the first three tags for each user playlist when displaying a user’s profile,” so rather than call <em class="markup--em markup--p-em">/users/USERNAME/playlists</em>, then have to make an HTTP request to <em class="markup--em markup--p-em">/playlists/ID/tags </em>once for each returned playlist, the tags got added to the playlist partial.</p><figure name="d473" id="d473" class="graf--figure graf--iframe graf-after--p"><div class="iframeContainer"><iframe width="700" height="250" src="https://blog.jacobwgillespie.com/media/cd7f51ce5528758d1724d676ba351a7e?maxWidth=700" data-media-id="cd7f51ce5528758d1724d676ba351a7e" frameborder="0"></iframe></div></figure><p name="193e" id="193e" class="graf--p graf-after--figure">Eventually, we got to something like the following for a <em class="markup--em markup--p-em">/playlists/ID</em> response:</p><figure name="a9b4" id="a9b4" class="graf--figure graf--iframe graf-after--p"><div class="iframeContainer"><iframe width="700" height="250" src="https://blog.jacobwgillespie.com/media/ef89b1c678246ddd33eee141e2bf0cfc?maxWidth=700" data-media-id="ef89b1c678246ddd33eee141e2bf0cfc" frameborder="0"></iframe></div></figure><p name="0f15" id="0f15" class="graf--p graf-after--figure">Here we’re embedding tracks and even a subset of their associations, with enough data to cover all the <em class="markup--em markup--p-em">possible</em> places an individual playlist could appear. 
And this data was returned for <em class="markup--em markup--p-em">every</em> place the playlist appeared.</p><p name="77af" id="77af" class="graf--p graf-after--p">This was a conscious design decision to augment responses rather than add more endpoints — we could have done something like <em class="markup--em markup--p-em">/playlists/ID/forProfile</em>, <em class="markup--em markup--p-em">/playlists/ID/forNotifications</em>, etc.</p><figure name="e913" id="e913" class="graf--figure graf--layoutOutsetLeft graf-after--p"><div class="aspectRatioPlaceholder is-locked" style="max-width: 525px; max-height: 309px;"><div class="aspect-ratio-fill" style="padding-bottom: 58.9%;"></div><div class="progressiveMedia js-progressiveMedia graf-image" data-image-id="1*meBOFStXEfFTmtLQzGFuig.jpeg" data-width="540" data-height="318" data-action="zoom" data-action-value="1*meBOFStXEfFTmtLQzGFuig.jpeg"><img src="https://d262ilb51hltx0.cloudfront.net/freeze/max/30/1*meBOFStXEfFTmtLQzGFuig.jpeg?q=20" crossorigin="anonymous" class="progressiveMedia-thumbnail js-progressiveMedia-thumbnail"><canvas class="progressiveMedia-canvas js-progressiveMedia-canvas"></canvas><img class="progressiveMedia-image js-progressiveMedia-image" data-src="https://d262ilb51hltx0.cloudfront.net/max/600/1*meBOFStXEfFTmtLQzGFuig.jpeg"><noscript class="js-progressiveMedia-inner"><img class="progressiveMedia-noscript js-progressiveMedia-inner" src="https://d262ilb51hltx0.cloudfront.net/max/600/1*meBOFStXEfFTmtLQzGFuig.jpeg"></noscript></div></div></figure><p name="6dc9" id="6dc9" class="graf--p graf-after--figure">There is something to be said for the simplicity that provides. To add a field to a track, for example, you locate the <em class="markup--em markup--p-em">_track.json.jbuilder</em> partial and add the additional field. 
However, as views grew, performance quickly became an issue in two distinct ways.</p><p name="dca0" id="dca0" class="graf--p graf-after--p"><strong class="markup--strong markup--p-strong">First</strong>, response payloads were large, to the point that the mobile app sometimes struggled with the amount of effort it took to parse, deserialize, and store the JSON. Response times were longer, caches were larger, and every change to a small partial expanded to a much larger change all over the app.</p><p name="0cef" id="0cef" class="graf--p graf-after--p"><strong class="markup--strong markup--p-strong">Second</strong>, query performance took a hit as more and more data (especially relationships) was fetched for each request. In development with caching disabled, a single request for a playlist could trigger upwards of 170 database queries to pull all the relevant information.</p><p name="26c9" id="26c9" class="graf--p graf-after--p">In production, we made heavy use of Rails “Russian Doll” style caching, so for a fully cached playlist there is only one database query involved. 
Still, on that first load it had to execute those 170 queries to build the full response (usually fewer thanks to Russian doll caching and shared sub-resources).</p><div name="38d0" id="38d0" class="graf--mixtapeEmbed graf-after--p"><a href="http://edgeguides.rubyonrails.org/caching_with_rails.html#russian-doll-caching" class="js-mixtapeImage mixtapeImage mixtapeImage--empty u-ignoreBlock" data-media-id="ff1cfb1f6fb362986386dd3ea6ffa68a"></a><a href="http://edgeguides.rubyonrails.org/caching_with_rails.html#russian-doll-caching" data-href="http://edgeguides.rubyonrails.org/caching_with_rails.html#russian-doll-caching" class="markup--anchor markup--mixtapeEmbed-anchor" title="http://edgeguides.rubyonrails.org/caching_with_rails.html#russian-doll-caching" rel="nofollow"><strong class="markup--strong markup--mixtapeEmbed-strong">Caching with Rails: An Overview - Ruby on Rails Guides</strong><br><em class="markup--em markup--mixtapeEmbed-em">This is an introduction to three types of caching techniques: page, action and fragment caching. By default Rails…</em>edgeguides.rubyonrails.org</a></div><p name="51cf" id="51cf" class="graf--p graf-after--mixtapeEmbed">What pushed us over the edge was the <em class="markup--em markup--p-em">have_liked</em> field above. This was a boolean field indicating whether or not the currently authenticated user had liked the track. Product requirements stated that this field had to be accessible on the playlist detail view, and thus had to be included in the playlist response for each track.</p><p name="972d" id="972d" class="graf--p graf-after--p">This broke the Russian doll caching.</p><p name="196a" id="196a" class="graf--p graf-after--p">The <em class="markup--em markup--p-em">_track.json.jbuilder</em> partial became a combination of a cached portion containing “static” information about the tracks and an uncached portion containing the call to <em class="markup--em markup--p-em">current_user.have_liked?(track)</em>. 
Subsequently, <em class="markup--em markup--p-em">_playlist.json.jbuilder</em> and every view that referenced the track partial transformed similarly to contain a cached portion and an uncached portion.</p><p name="b5b7" id="b5b7" class="graf--p graf-after--p">Worse still, for a playlist request with 50 tracks, 50 calls to <em class="markup--em markup--p-em">have_liked?</em> were executed (N+1 query bug).</p><p name="c0b6" id="c0b6" class="graf--p graf-after--p graf--last">We had several different possible solutions, including separate sub-resource view files for separate endpoints, custom query cache management to reduce the number of additional queries, etc. However, we wanted a solution that addressed both issues and allowed for greater control.</p></div></div></section><section name="d8a2" class=" section--body section--last"><div class="section-divider layoutSingleColumn"><hr class="section-divider"></div><div class="section-content"><div class="section-inner layoutSingleColumn"><h3 name="2322" id="2322" class="graf--h3 graf--first">GraphQL</h3><p name="16e4" id="16e4" class="graf--p graf-after--h3">Enter GraphQL. Using GraphQL to power our backend, we were able to provide the mobile client exactly what it needed for each request, with no additional bloat, and were able to optimize the database and cache layer to do everything in an extremely performant way.</p><p name="4207" id="4207" class="graf--p graf-after--p">Before getting into some of the specific details, here are a few common questions / misconceptions I often encountered or experienced myself while learning about GraphQL:</p><h4 name="2359" id="2359" class="graf--h4 graf-after--p">Common Questions / Misconceptions</h4><p name="76b8" id="76b8" class="graf--p graf-after--h4"><strong class="markup--strong markup--p-strong">GraphQL sounds like graph. Does my data need to be a “graph” or do I need a “graph” database? 
Does it work with relational databases?</strong></p><p name="7802" id="7802" class="graf--p graf-after--p">No, you do not need a graph database; it works just fine with whatever database you have.</p><p name="b27e" id="b27e" class="graf--p graf-after--p">While IMO you can think about almost any “relational” database in terms of a “graph”, something like:</p><pre name="15c5" id="15c5" class="graf--pre graf-after--p">user --- OWNS --- playlist<br> |                    |<br>LIKES             CONTAINS<br> |                    |<br> v                    |<br>track &lt;---------------┘</pre><p name="b7d0" id="b7d0" class="graf--p graf-after--pre">GraphQL describes and fetches data like a tree:</p><pre name="6845" id="6845" class="graf--pre graf-after--p">user<br>┖-OWNS-&gt; playlist<br>         ┖-CONTAINS-&gt; track<br>                      ┖-LIKED_BY-&gt; users</pre><p name="1b4e" id="1b4e" class="graf--p graf-after--pre">You can use a graph database, a relational database, an in-memory array, a key-value store, whatever. At Playlist, we use Neo4j as a “primary” database, operating in full graph mode, and Redis, acting as a cache layer utilizing various data structures including hashes, key-value pairs, and sets. Redis essentially represents the data in Neo as key-value stores by ID and ZSETs for associations by type, closely mirroring Facebook’s TAO model:</p><div name="29d8" id="29d8" class="graf--mixtapeEmbed graf-after--p"><a href="https://www.facebook.com/notes/facebook-engineering/tao-the-power-of-the-graph/10151525983993920" class="js-mixtapeImage mixtapeImage u-ignoreBlock" data-media-id="268a4f1a93077c24e3eb52a1c8bf5d24" data-thumbnail-img-id="0*CFDkYQjVcjUpmEHr." 
style="background-image: url(https://d262ilb51hltx0.cloudfront.net/max/1200/0*CFDkYQjVcjUpmEHr.);"></a><a href="https://www.facebook.com/notes/facebook-engineering/tao-the-power-of-the-graph/10151525983993920" data-href="https://www.facebook.com/notes/facebook-engineering/tao-the-power-of-the-graph/10151525983993920" class="markup--anchor markup--mixtapeEmbed-anchor" title="https://www.facebook.com/notes/facebook-engineering/tao-the-power-of-the-graph/10151525983993920" rel="nofollow"><strong class="markup--strong markup--mixtapeEmbed-strong">TAO: The power of the graph</strong><br><em class="markup--em markup--mixtapeEmbed-em">Facebook puts an extremely demanding workload on its data backend. Every time any one of over a billion active users…</em>www.facebook.com</a></div><p name="cebe" id="cebe" class="graf--p graf-after--mixtapeEmbed">This allows us to have an authoritative data source in Neo with the full power of Cypher queries but the performance of an in-memory key-value store for 90% of all queries.</p><p name="7d97" id="7d97" class="graf--p graf-after--p"><strong class="markup--strong markup--p-strong">GraphQL sounds like “query language” which sounds like I’m exposing the ability to query my database on the client. This sounds dangerous. What about malicious clients?</strong></p><p name="55cb" id="55cb" class="graf--p graf-after--p">No, you’re not exposing your database queries to the client any more than you were with your REST API. Okay, maybe a bit.</p><p name="6c39" id="6c39" class="graf--p graf-after--p">GraphQL is more or less a DSL on top of your own backend data fetching logic. It does not connect directly to a database. In fact, the schema you expose over GraphQL will likely not mirror your database exactly. 
It provides a way to describe a request for structured data, but it is then up to your backend to fulfill that request.</p><p name="2e5f" id="2e5f" class="graf--p graf-after--p">One concern is that GraphQL supports “nested” fetching, so should a malicious client request a particular recursive nested relationship an arbitrary but large number of times (like user.followers.followers…), there could be a potential performance hit on the backend. See the final section for a few ideas on how to mitigate this risk.</p><p name="8ea5" id="8ea5" class="graf--p graf-after--p"><strong class="markup--strong markup--p-strong">So, GraphQL doesn’t provide unauthenticated access to my database?</strong></p><p name="4643" id="4643" class="graf--p graf-after--p">No. Authentication is most likely handled outside of GraphQL entirely and your backend is still responsible for handling data fetching / authorization in a secure way, just like how you were doing before with REST.</p><p name="5046" id="5046" class="graf--p graf-after--p">For our new GraphQL backend, we perform authentication outside of GraphQL entirely, passing it as a request header and having the server authenticate the request and then pass the authentication context down to the GraphQL data resolvers.</p><p name="08f2" id="08f2" class="graf--p graf-after--p">At Playlist, we would eventually like to make our GraphQL backend “transport-agnostic”, so we could grab data over HTTP like normal or request data via a non-HTTP wire protocol. It would even be cool to implement some kind of live streaming updates for real-time data changes over something like MQTT. 
As such, we’ve considered embedding authentication information, either authentication tokens or username/password pairs, in the GraphQL requests themselves, but as of yet we have not fully explored those paths.</p><p name="435b" id="435b" class="graf--p graf-after--p"><strong class="markup--strong markup--p-strong">What about security?</strong></p><p name="f8af" id="f8af" class="graf--p graf-after--p">Again, this is completely up to your backend and is not a primary concern of GraphQL. We will see an authenticated resolver (the function that fetches and returns data) below. There seem to be two predominant approaches to handling a client attempting to access something they are not authorized to view.</p><p name="0bad" id="0bad" class="graf--p graf-after--p"><strong class="markup--strong markup--p-strong">First,</strong> return null for the requested field. This seems to work well in cases where there is no real harm in asking for a particular set of data and no real harm in denying it.</p><p name="69d5" id="69d5" class="graf--p graf-after--p">A good example would be asking for the email of a user where the backend only provides the user’s email to that user themselves. If I request my own user object with the email field included, I’ll get my email. If I request another user object, the email will be null, and I can code my application to be okay with that null.</p><p name="d9d7" id="d9d7" class="graf--p graf-after--p"><strong class="markup--strong markup--p-strong">Second,</strong> return an actual error. This seems to work best if the client asking for the data needs to know why it was not provided the requested data so that it can take action on that information.</p><p name="6753" id="6753" class="graf--p graf-after--p">A good example would be attempting to access an object that requires authentication when no authentication was provided.</p><p name="913a" id="913a" class="graf--p graf--startsWithDoubleQuote graf-after--p">“404s” are usually returned as nulls. 
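</p>
<p class="graf--p graf-after--p">Here is a minimal sketch of both approaches as plain resolver functions. The <em class="markup--em markup--p-em">users</em> data, the shape of <em class="markup--em markup--p-em">context.auth</em>, and the function names are assumptions made up for illustration, not Playlist’s actual code.</p>

```javascript
// Hypothetical in-memory data standing in for a real backend.
const users = {
  u1: { id: 'u1', name: 'Ada', email: 'ada@example.com' },
  u2: { id: 'u2', name: 'Grace', email: 'grace@example.com' },
};

// Approach 1: resolve to null when the viewer may not see the field.
// Only the user themselves gets their own email back.
function resolveEmail(user, args, context) {
  const viewer = context.auth && context.auth.user;
  return viewer && viewer.id === user.id ? user.email : null;
}

// Approach 2: throw, so the client can learn why the data was withheld.
function resolveViewer(root, args, context) {
  if (!context.auth || !context.auth.user) {
    throw new Error('Authentication required');
  }
  return context.auth.user;
}
```

<p class="graf--p graf-after--pre">With graphql-js, an error thrown from a resolver is reported in the response’s <em class="markup--em markup--p-em">errors</em> list while the field itself comes back as null, so the client sees both the null and the reason.</p>
<p class="graf--p graf-after--p">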
As per convention (like on Github’s API), unauthorized objects are sometimes returned as null as well, like in the case of asking for a user’s profile when that user has blocked the currently authenticated user. A null mimics a 404 and does not leak the fact that the hidden user exists.</p><p name="c53a" id="c53a" class="graf--p graf-after--p"><strong class="markup--strong markup--p-strong">The Github repositories are confusing! Which one is really GraphQL?</strong></p><p name="c2df" id="c2df" class="graf--p graf-after--p"><a href="https://github.com/facebook/graphql" data-href="https://github.com/facebook/graphql" class="markup--anchor markup--p-anchor" rel="nofollow">facebook/graphql</a> is the specification for the GraphQL language and its implementation — it is not tied to any specific language / backend. It’s great to read to fully understand the language, especially if you’re into those things or learn best by digging into concepts and theories.</p><p name="adb1" id="adb1" class="graf--p graf-after--p"><a href="https://github.com/graphql/graphql-js" data-href="https://github.com/graphql/graphql-js" class="markup--anchor markup--p-anchor" rel="nofollow">graphql/graphql-js</a> is a reference implementation of that specification provided by Facebook, written in JS/Node. This is the place to start if you’d like to use GraphQL with a Node-based backend or just want to play around. To the best of my knowledge, this is the most complete implementation of the specification, being more or less the official reference implementation. Read the README.</p><p name="af11" id="af11" class="graf--p graf-after--p"><a href="https://github.com/graphql/express-graphql" data-href="https://github.com/graphql/express-graphql" class="markup--anchor markup--p-anchor" rel="nofollow">graphql/express-graphql</a> is a middleware for Express.js to easily create a GraphQL server with Express. 
I’d highly recommend reading the entire source code as it’s not terribly long, is quite easy to understand, and lends itself to explaining how to use graphql-js, even if you don’t end up using express-graphql directly.</p><p name="84aa" id="84aa" class="graf--p graf-after--p"><a href="https://github.com/graphql/graphql-relay-js" data-href="https://github.com/graphql/graphql-relay-js" class="markup--anchor markup--p-anchor" rel="nofollow">graphql/graphql-relay-js</a> is a set of helpers to implement Relay-compatible IDs and “connections” (one-to-many associations, or array fields). It is not required to use GraphQL; however, we have found that being Relay-compatible has benefited us with ID handling, pagination, etc., even though we’re not using Relay. For more information on the Relay GraphQL specification, see the <a href="https://facebook.github.io/relay/docs/graphql-relay-specification.html#content" data-href="https://facebook.github.io/relay/docs/graphql-relay-specification.html#content" class="markup--anchor markup--p-anchor" rel="nofollow">Relay docs</a>.</p><p name="ab3b" id="ab3b" class="graf--p graf-after--p"><a href="https://github.com/graphql/graphiql" data-href="https://github.com/graphql/graphiql" class="markup--anchor markup--p-anchor" rel="nofollow">graphql/graphiql</a> is a web-based IDE for GraphQL. This thing is freaking awesome. GraphQL provides schema introspection, and GraphiQL provides autocomplete and syntax validation using those introspection capabilities. 
You can download this project directly, embed it in your app, or, my favorite, download it as a standalone app in an Electron-based wrapper at <a href="https://github.com/skevy/graphiql-app" data-href="https://github.com/skevy/graphiql-app" class="markup--anchor markup--p-anchor" rel="nofollow">skevy/graphiql-app</a>.</p><p name="1b39" id="1b39" class="graf--p graf-after--p"><a href="https://github.com/facebook/dataloader" data-href="https://github.com/facebook/dataloader" class="markup--anchor markup--p-anchor" rel="nofollow">facebook/dataloader</a> is a utility module that has revolutionized data fetching in our Playlist backend. Its foundation is extremely simple: it collects the arguments of calls to load() while in the current frame of execution (an event loop tick) and then uses the logic you provide to batch-fetch data based on the collected arguments. More on how we use DataLoader below.</p><p name="dc3a" id="dc3a" class="graf--p graf-after--p"><a href="https://github.com/graphql/swapi-graphql" data-href="https://github.com/graphql/swapi-graphql" class="markup--anchor markup--p-anchor" rel="nofollow">graphql/swapi-graphql</a> is an example project exposing the existing SWAPI as a GraphQL server. It utilizes graphql-js, express-graphql, GraphiQL, and DataLoader.</p><p name="0c33" id="0c33" class="graf--p graf-after--p"><a href="https://github.com/chentsulin/awesome-graphql" data-href="https://github.com/chentsulin/awesome-graphql" class="markup--anchor markup--p-anchor" rel="nofollow">chentsulin/awesome-graphql</a> is an awesome collection of links to GraphQL resources, projects, posts, and more. Check it out!</p><p name="2c5f" id="2c5f" class="graf--p graf-after--p"><strong class="markup--strong markup--p-strong">What is Relay? 
Do I need Relay too?</strong></p><p name="2852" id="2852" class="graf--p graf-after--p"><a href="https://github.com/facebook/relay" data-href="https://github.com/facebook/relay" class="markup--anchor markup--p-anchor" rel="nofollow">Relay</a> is a framework for connecting GraphQL and React in an intelligent way. You absolutely do not need Relay to take advantage of GraphQL, though if you’re using React, check it out — it may be useful in your app.</p><p name="ddbf" id="ddbf" class="graf--p graf-after--p">Relay requires a few special conventions in your GraphQL query design to support its operation, and at Playlist we’ve decided to be Relay-compatible, even though we do not use Relay itself. This has provided a consistent API for fetching by ID and representing and paginating collections of associations. The Relay documentation has more information.</p><p name="0d9a" id="0d9a" class="graf--p graf-after--p"><strong class="markup--strong markup--p-strong">Is GraphQL only for React?</strong></p><p name="17f8" id="17f8" class="graf--p graf-after--p">Nope. You can use it anyplace you used HTTP/REST previously.</p><h4 name="a47b" id="a47b" class="graf--h4 graf-after--p">Playlists and Tracks in GraphQL</h4><p name="7f39" id="7f39" class="graf--p graf-after--h4">Let’s delve into how we can solve the performance issues from the above playlist endpoint with GraphQL. 
We want to return only the data that is needed and optimize our database queries so that we can avoid the N+1 bug.</p><p name="8e08" id="8e08" class="graf--p graf-after--p">Our GraphQL query will look like the following:</p><figure name="e1d1" id="e1d1" class="graf--figure graf--iframe graf-after--p"><div class="iframeContainer"><iframe width="700" height="250" src="https://blog.jacobwgillespie.com/media/aa9b758811f29b5c7d9e033ac30bbaa6?maxWidth=700" data-media-id="aa9b758811f29b5c7d9e033ac30bbaa6" frameborder="0"></iframe></div></figure><p name="e0a7" id="e0a7" class="graf--p graf-after--figure">For simplicity, the playlist ID was embedded in the query, though in practice we’d be passing the ID as a typed parameter (a GraphQL variable) rather than embedding it inside the query. See the GraphQL docs for more info.</p><p name="3ec1" id="3ec1" class="graf--p graf-after--p">We assume that authentication has taken place outside of GraphQL and the authentication state has been provided in the rootValue object of the GraphQL call so that our resolvers can access it. See the docs for graphql-js and express-graphql for more information about rootValue, and see below for it in action.</p><p name="7ae5" id="7ae5" class="graf--p graf-after--p">First, we have to define a root query object, which is the entry point for the query. The root query object should have a field called <em class="markup--em markup--p-em">playlist</em>, since that’s what we’re requesting in the query above:</p><figure name="cf58" id="cf58" class="graf--figure graf--iframe graf-after--p"><div class="iframeContainer"><iframe width="700" height="250" src="https://blog.jacobwgillespie.com/media/5dc8ce0df75b37ad4c5531c16a24bb30?maxWidth=700" data-media-id="5dc8ce0df75b37ad4c5531c16a24bb30" frameborder="0"></iframe></div></figure><p name="8251" id="8251" class="graf--p graf-after--figure">Note that we’re using ES6 syntax here. 
We use babel with stage set to 0 to take advantage of all the latest and greatest ES7 stuff.</p><p name="efd3" id="efd3" class="graf--p graf-after--p">We define a field that returns a playlist type (a GraphQL type definition that we define in another file and import here), set up a single argument named <em class="markup--em markup--p-em">id</em> of type non-null string, and then most importantly we define a function to “resolve” the object.</p><p name="db96" id="db96" class="graf--p graf-after--p">The first argument to resolve is the current object itself (since we’re at the root level, we ignore this argument). The second argument is the args passed to the GraphQL call, so we extract out the <em class="markup--em markup--p-em">id</em> field. The third argument provides us access to the GraphQL context, so we extract out our backend instance that we passed down from the <em class="markup--em markup--p-em">rootValue</em> elsewhere in the app and use it to fetch a playlist by ID.</p><p name="0c91" id="0c91" class="graf--p graf-after--p">It’s that simple! We load the playlist from the database, return a JS object, and we’re done at this level.</p><p name="81bc" id="81bc" class="graf--p graf-after--p">Next, let’s define the playlist schema type:</p><figure name="7014" id="7014" class="graf--figure graf--iframe graf-after--p"><div class="iframeContainer"><iframe width="700" height="250" src="https://blog.jacobwgillespie.com/media/8e888d1e7359d19f72ab240bda2a9d08?maxWidth=700" data-media-id="8e888d1e7359d19f72ab240bda2a9d08" frameborder="0"></iframe></div></figure><p name="f8e8" id="f8e8" class="graf--p graf-after--figure">So, here we define a new object type for Playlist. Since our root query resolver returned the playlist model instance, the first argument to our resolve functions at this level (named <em class="markup--em markup--p-em">it</em>) is that instance. 
So, for the <em class="markup--em markup--p-em">id</em> field, we are resolving by calling <em class="markup--em markup--p-em">it.uuid</em> thus exposing the <em class="markup--em markup--p-em">uuid</em> model field under the name <em class="markup--em markup--p-em">id</em>. Remember that your GraphQL schema does not need to mirror your database schema.</p><p name="f59a" id="f59a" class="graf--p graf-after--p">For the <em class="markup--em markup--p-em">name</em> field, we do not provide a resolver, because the default for a field named <em class="markup--em markup--p-em">x</em> is <em class="markup--em markup--p-em">model.x</em>.</p><p name="b80e" id="b80e" class="graf--p graf-after--p">For <em class="markup--em markup--p-em">tracks</em>, we call <em class="markup--em markup--p-em">it.tracks()</em> on the model to load tracks from the database.</p><p name="80ef" id="80ef" class="graf--p graf-after--p"><strong class="markup--strong markup--p-strong">Note:</strong> there is a resolve function for every field, but this does not mean that an individual database query is required to fetch each field. You can fetch as much or as little on <em class="markup--em markup--p-em">root.playlist</em>, so each of the sub field resolvers can return something already fetched by their parent or issue further queries as necessary.</p><p name="0872" id="0872" class="graf--p graf-after--p">Finally, let’s define the GraphQL object type for a track:</p><figure name="ddd5" id="ddd5" class="graf--figure graf--iframe graf-after--p"><div class="iframeContainer"><iframe width="700" height="250" src="https://blog.jacobwgillespie.com/media/874c6682fca25420b252174634ce1576?maxWidth=700" data-media-id="874c6682fca25420b252174634ce1576" frameborder="0"></iframe></div></figure><p name="351b" id="351b" class="graf--p graf-after--figure">Similar to before, we define the <em class="markup--em markup--p-em">id</em> and <em class="markup--em markup--p-em">title</em> fields as simple resolvers. 
We also add a field <em class="markup--em markup--p-em">viewerHasLiked</em> and check authentication. If the user has not been authenticated, we return <em class="markup--em markup--p-em">null</em>. Otherwise we call <em class="markup--em markup--p-em">track.userHasLiked()</em> with the currently authenticated user. Again, the <em class="markup--em markup--p-em">auth</em> object is coming from our app outside of GraphQL in an Express middleware.</p><p name="d663" id="d663" class="graf--p graf-after--p">Given that <em class="markup--em markup--p-em">Playlist.load()</em> loads a playlist, <em class="markup--em markup--p-em">playlist.tracks()</em> loads the array of tracks for that playlist from the database, and <em class="markup--em markup--p-em">track.userHasLiked()</em> queries the database for the existence of an association between a user and a track, then our GraphQL query will resolve correctly and we have essentially duplicated the functionality of the REST API, once we get the other fields defined, omitted here for brevity.</p><p name="e0c2" id="e0c2" class="graf--p graf-after--p">This solves one of our two issues with our REST API: clients can now request only the data they need, beneficial for mobile app performance in a variety of different ways. But we still have the problem of N+1 queries — if we request <em class="markup--em markup--p-em">viewerHasLiked</em> for all 50 tracks of this playlist, we will get 50 queries. 
We solved this using a quite ingenious little npm module from Facebook called DataLoader.</p><h4 name="ca37" id="ca37" class="graf--h4 graf-after--p">DataLoader FTW</h4><figure name="a15e" id="a15e" class="graf--figure graf--layoutOutsetLeft graf-after--h4"><div class="aspectRatioPlaceholder is-locked" style="max-width: 525px; max-height: 374px;"><div class="aspect-ratio-fill" style="padding-bottom: 71.3%;"></div><div class="progressiveMedia js-progressiveMedia graf-image" data-image-id="1*lnmwsHhskd48m8Rm1Wt3sg.jpeg" data-width="540" data-height="385" data-action="zoom" data-action-value="1*lnmwsHhskd48m8Rm1Wt3sg.jpeg"><img src="https://d262ilb51hltx0.cloudfront.net/freeze/max/30/1*lnmwsHhskd48m8Rm1Wt3sg.jpeg?q=20" crossorigin="anonymous" class="progressiveMedia-thumbnail js-progressiveMedia-thumbnail"><canvas class="progressiveMedia-canvas js-progressiveMedia-canvas"></canvas><img class="progressiveMedia-image js-progressiveMedia-image" data-src="https://d262ilb51hltx0.cloudfront.net/max/600/1*lnmwsHhskd48m8Rm1Wt3sg.jpeg"><noscript class="js-progressiveMedia-inner"><img class="progressiveMedia-noscript js-progressiveMedia-inner" src="https://d262ilb51hltx0.cloudfront.net/max/600/1*lnmwsHhskd48m8Rm1Wt3sg.jpeg"></noscript></div></div></figure><p name="bd33" id="bd33" class="graf--p graf-after--figure"><a href="https://github.com/facebook/dataloader" data-href="https://github.com/facebook/dataloader" class="markup--anchor markup--p-anchor" rel="nofollow">DataLoader</a> provides an API that consolidates any calls to <em class="markup--em markup--p-em">load()</em> in a frame of execution (event loop tick) and then batch-loads data based on the collection of calls. 
Additionally, it caches results by key, so subsequent calls to <em class="markup--em markup--p-em">load()</em> with the same arguments return the cached result directly.</p><p name="8861" id="8861" class="graf--p graf-after--p">So, if we call <em class="markup--em markup--p-em">myDataLoader.load(id)</em> many different times in a frame of execution, then once that frame completes, the data loader is provided with an array of all the IDs and can batch-load the requested data. I would highly recommend reading the README to better understand DataLoader’s workings.</p><p name="5545" id="5545" class="graf--p graf-after--p">In our case, we can model <em class="markup--em markup--p-em">track.userHasLiked()</em> as a call to a DataLoader instance designed for resolving the relationship between a user and a track in bulk. Something like this:</p><figure name="e931" id="e931" class="graf--figure graf--iframe graf-after--p"><div class="iframeContainer"><iframe width="700" height="250" src="https://blog.jacobwgillespie.com/media/1448b4aee34252869dd9efad9d2708f7?maxWidth=700" data-media-id="1448b4aee34252869dd9efad9d2708f7" frameborder="0"></iframe></div></figure><p name="41d0" id="41d0" class="graf--p graf-after--figure">With this code in place, the 50 calls to <em class="markup--em markup--p-em">likeLoader.load()</em> will result in one call to the batch load function, meaning that our GraphQL query will now execute 3 database queries rather than 52.</p><p name="fe56" id="fe56" class="graf--p graf-after--p">As suggested in the DataLoader README, we take this one step further by composing DataLoader instances all the way to the database query level.</p><p name="4afa" id="4afa" class="graf--p graf-after--p">For example, if we wanted to fetch users by username, we would have:</p><ul class="postList"><li name="86ff" id="86ff" class="graf--li graf-after--p"><em class="markup--em markup--li-em">batchQueryLoader</em> — a DataLoader with caching disabled that accepts database queries, 
executes them against the database (using batch / parallel features for performance speedups), and returns the results.</li><li name="0d14" id="0d14" class="graf--li graf-after--li"><em class="markup--em markup--li-em">userByIDLoader</em> — a DataLoader that accepts IDs, uses <em class="markup--em markup--li-em">batchQueryLoader</em> to query the database, and returns user objects.</li><li name="180f" id="180f" class="graf--li graf-after--li"><em class="markup--em markup--li-em">userByUsernameLoader — </em>a DataLoader that accepts usernames, uses <em class="markup--em markup--li-em">batchQueryLoader</em> to query the database for user IDs, then calls <em class="markup--em markup--li-em">userByIDLoader</em> to return user objects.</li></ul><p name="e64a" id="e64a" class="graf--p graf-after--li">With this DataLoader composition, the batchQueryLoader, used by all other DataLoaders, ensures database activity is batched and latency is reduced. And since <em class="markup--em markup--p-em">userByUsernameLoader</em> resolves IDs then calls <em class="markup--em markup--p-em">userByIDLoader</em>, <em class="markup--em markup--p-em">userByIDLoader </em>becomes a shared cache, reducing queries overall. In our setup, we even added a DataLoader for Redis using pipelines and integrated it into our other loaders as a caching layer, further reducing query time.</p><p name="729a" id="729a" class="graf--p graf-after--p">Also, as mentioned before, DataLoaders cache their results by the arguments of <em class="markup--em markup--p-em">load()</em>. 
Because of this, we initialize new DataLoaders for each request: data is cached for the life of a single request and discarded once the request completes.</p><p name="6c3c" id="6c3c" class="graf--p graf-after--p">Using this architecture, the entire requested playlist from the beginning, the one that took 170 queries and around 15s to render, returns in about 250ms with only 3 database queries, and around 17ms reading data from the Redis cache. This solves both performance issues.</p><h3 name="5920" id="5920" class="graf--h3 graf-after--p">Future Puzzles</h3><p name="8f2c" id="8f2c" class="graf--p graf-after--h3">Looking forward, here are a few things we are currently looking to solve:</p><p name="f886" id="f886" class="graf--p graf-after--p"><strong class="markup--strong markup--p-strong">Mutations (Writes)</strong></p><p name="39c8" id="39c8" class="graf--p graf-after--p">Our GraphQL server provides read capabilities for our entire API surface, but writes have yet to be implemented. graphql-js provides an easy DSL for handling GraphQL mutations, so we will shortly be integrating writes into the GraphQL system. This appears to be a straightforward task, but it will be interesting to discover what, if any, insights or best practices emerge from the implementation.</p><p name="1d1d" id="1d1d" class="graf--p graf-after--p"><strong class="markup--strong markup--p-strong">Client-side Caching</strong></p><p name="f3f4" id="f3f4" class="graf--p graf-after--p">We have yet to solve caching GraphQL responses on the client. Ideally the system fetching data from a GraphQL endpoint would understand the underlying schema by utilizing schema introspection and thus would be able to intelligently cache sub-resources, so updates to a model at one location would update everywhere. Further considerations like TTLs, forced updates, etc. 
would need to be implemented.</p><p name="d708" id="d708" class="graf--p graf-after--p">If I understand correctly, Relay may solve some of these concerns; however, Relay is still new, does not currently support React Native, and does not run in a native code environment.</p><p name="cbcf" id="cbcf" class="graf--p graf-after--p"><strong class="markup--strong markup--p-strong">Real-time or Push Updates</strong></p><p name="6573" id="6573" class="graf--p graf-after--p">There are several aspects of our platform that are “real-time,” and it would be awesome to integrate these aspects into our GraphQL backend, perhaps allowing live “subscriptions” to particular sets of data.</p><p name="f862" id="f862" class="graf--p graf-after--p"><strong class="markup--strong markup--p-strong">Query Performance Protection</strong></p><p name="62ab" id="62ab" class="graf--p graf-after--p">If we expose something like the followers of a given user, then theoretically a malicious client could submit a request like <em class="markup--em markup--p-em">user.followers.followers</em>… until the server struggled to respond. We do not have a full solution for this yet, especially if we decide to expose our GraphQL endpoints as a public API at some future point. Three possible paths to explore come to mind:</p><ol class="postList"><li name="b22e" id="b22e" class="graf--li graf-after--p">Perform schema AST inspection to validate that the query is not too “complex,” rejecting queries over a threshold.</li><li name="73be" id="73be" class="graf--li graf-after--li">Have some form of query “timeout,” kill requests that take too long to resolve, and rate-limit the ability of a single request to query the database.</li><li name="225f" id="225f" class="graf--li graf-after--li">Take a note from Facebook and implement a “query cache” where queries are stored in a cache and clients refer to them by ID in production rather than passing the full query, essentially whitelisting queries. 
This only works if the GraphQL API is only for internal clients.</li></ol><h3 name="f502" id="f502" class="graf--h3 graf-after--li">Conclusion</h3><p name="a8be" id="a8be" class="graf--p graf-after--h3">In conclusion, GraphQL is pretty awesome and has been solving some real-world problems at Playlist. For us, it is more than hype, and I wanted to share some of our findings in the hopes that it may help others understand. Cutting edge technologies and projects are fun, but can sometimes be difficult to comprehend and apply.</p><p name="2ee3" id="2ee3" class="graf--p graf-after--p">One more thing — check out this video. It was immensely helpful for me in understanding some of the benefits of GraphQL and its real-world implementation at the Financial Times.</p>