Complex Filtering in REST | The Need

alok sinha
7 min readJul 28, 2021

--

Filtering in REST is the ability to filter the results returned by a Read API.

In a restful architecture, we define CRUD (Create, Read, Update, Delete) APIs on a collection. A collection in this context is a list of a domain entity (think a collection — movies, for an entity — movie, for IMDB).

In this article, we’ll explore complex filtering using an example and also discuss a library that I’ve created to implement the same.

What is the need to filter the results?

Consider the example of a movie database MyIMDB which exposes an API to find movies:
http://myimdb.com/movies
This API is backed by the following table in the backend:

A database table of movies

This API on invocation will return a list of all the movies we have. So far so good.

What if I want to find only the movies released in the year 2014?
You can say, you can have a repository method specifically for this.
Something like:
func moviesInYear(year int)[]movie
where year will be passed as a query parameter like:
http://myimdb.com/movies?year=2014

What if I now want to find only the movies released after the year 2014?How would you pass an operation like greater than via the API to the backend. You can come up with a not so convincing solution to support another query parameter yearAfter and add one more repository method to handle this query param.
Something like:
func moviesAfterYear(year int)[]movie
where year will be passed as a query parameter like:
http://myimdb.com/movies?yearAfter=2014

What if for a new business use case, we also want to find only those movies released after 2018 and have a rating greater than 8?
You see now that as our queries become more and more complex, the approach to add a specific query parameter and a repository method suddenly starts spiraling out of control leading to a method explosion.

This is where complex REST filtering comes in handy which would allow you to query the collection in different ways without the need to add a new query parameter and a specific repository method for every new business use case.

Golang support

The first thing any developer would do is, try to find a ready to use library that can help achieve this and avoid reinventing the wheel. To my surprise, I could not find any library that does this or does this in a simple way in Golang.
In the rest of this article, I’ll cover the design considerations in developing a library and also cover some of the implementation specifics of the library.

The design

Let’s start with a complex query. We will try to come up with a solution to interpret this complex query.

A complex query
Find me all the movies with rating above 8 released after 2010 OR all the Thriller movies released before 2021.

The desired SQL query
SELECT id, title, year, rating, genre from Movies WHERE
(year > 2010 AND rating > 8) OR (year < 2021 AND genre = ‘Thriller’)

Can’t I just pass the SQL query as a query parameter?
http://myimdb.com/movies?filter=(year > 2010 AND rating > 8) OR (year < 2021 AND genre = ‘Thriller’)

Multiple reasons why one should not pass an exact SQL query as a query parameter:

  1. Attempting to take a SQL query as is in a query parameter and then use it in your data layer tightly couples your REST filtering logic to SQL. What if tomorrow you want to switch to Mongo or Cassandra or MySQL?
  2. This is also a security nightmare. Using a query parameter blindly in your data layer is highly prone to SQL injection.

Hence, we need to come up with our own query language which can be translated to the desired database specific query language.

Define a query language
We should be able to parse our home-grown query language to generate any possible SQL query an API user is allowed to and wants to fire on our collection of domain entity.

We can imagine passing a REST API query parameter filter like:
http://myimdb.com/movies?filter=(year gt 2010 and rating gt 8) or (year lt 2021 and genre eq ‘Thriller’)
You can see that our query language looks like SQL but is not SQL.

A custom query will need custom Parser Added below a flow chart of how this query parameter goes through various stages of processing, namely — Parsing(uses a token strategy) and Conversion to yield a Filter.

A flow chart illustrating creation of filter object from filter query parameter.

Parsing the query
We can come up with different parsing strategies for our custom query language. To enlist a few:

1. Simple token based: In this approach the query parameter filter string is broken apart by whitespace into a token list and then parsed for compliance Compliance checks if the query is not malformed .

2. Binary expression tree based: In this approach we break down the filter query parameter into a Binary Expression Tree after tokenization and then traverse the tree for compliance.

3. Regex based: Use regex pattern matching to tokenize the query parameter filters string.

For the more curious readers, I’ll do a deep dive into the parsing strategies in my next article.

Whichever way we parse, the goal is to come up with something called Commands.

What is a Command?
The parse output is stored into Commands. A Command is an intermediate object which is SQL or any database agnostic. A Command is made up of 3 main parts:
a. Property:
b. Operator
c. Value

Looking closely at our query parameter filter example:
filter=(year gt 2010 and rating gt 8) or (year lt 2021 and genre eq ‘Thriller’)
This filter can be parsed into Commands as below.

Token to Commands

Please note, that a left or right parenthesis is as much a Command as any other word or combination of words in the table below. This consideration allows us to carry over the parenthesis as is to the final state, which is a database specific query.

Using something called a Convertor, a Command can be converted into a Filter.

What is a Convertor and a Filter?
A command is database agnostic. A Convertor is what helps you to convert a Command to the final object which is database aware — a Filter. So, I can define and use a Postgres Convertor to convert my Command to a Postgres Filter.

How is the Filter used?
A Filter is the final object in the REST filtration process. This Filter has two attributes:
a. key
For a Postgres Filter, this will have a Postgres specific SQL query string with conditions those were passed in the query parameter filter. This will however be a formatted string with placeholders for the values.
b. value
Value is an array of values which need to be replaced in the key above.

This is how the Filter would look like for our example:

The last step would be to create a prepared statement using the Filter.key to which we pass the Filter.value. This protects us from potential SQL injection attacks.

Some more reading

Added below the UML diagram which shows the low level design of the classes/interfaces used in the library.
If you observe closely, you can see that I’ve used the Visitor Pattern to step through my tokenized array or binary expression tree and convert them into Commands.

A UML depicting the structs and interfaces.

Conclusion

In this article, we discussed the need of complex filtering in REST. We stepped through how a library can be implemented for the same. We also touched base on a very critical aspect as to how we can avoid SQL injection while allowing the API caller to pass in a query string.

Ability to handle complex filtering gives immense powers to your READ APIs. The same API which earlier used to just return a list of movies can now help you query the movie list in so many ways. While this is empowering for the business, designing your service and repository layers using filters saves you from method explosion.

In my next article, I’ll walk you through some of the parsing strategies we can use and their pros and cons. Stay tuned!

--

--

alok sinha
alok sinha

Written by alok sinha

Undivided attention to Divided (or Distributed) systems. Strive to be more informative rather than writing perfect articles.

Responses (1)