When formulaic testing becomes counter-productive
This story is meant mostly for new software developers or CS students who are wondering about some of the trade-offs encountered in professional software development. It will be hard to follow without some SQL and JavaScript knowledge.
I recently came across an interesting issue with my team at FINRA. We have some AWS lambdas written in JavaScript, which orchestrate SQL calls to change various things in our tables. Naturally, before any changes or additions to these lambdas are deployed, we need to be able to test locally. It does not matter whether the tables are in a DEV, QA or PROD environment; we really don’t want to be touching a live database that other people use concurrently. That’s why mocking solutions like Mockito and Jasmine have been popularized for unit testing, and we have solutions for end-to-end testing, one of which will be detailed in this post.
For our use cases, we want to be able to do both unit testing and integration testing. For the purposes of this blog post, let’s encompass both of those testing philosophies under the term formulaic testing. It is “formulaic” in the sense that, whether one authors unit, integration or e2e tests, one has to set up mock input and several assertions to ensure that the output is the expected one. So one has to formulate (maybe orchestrate is a better term?) the code flow; the entire process has been built explicitly by the developer / tester to represent a few pre-planned scenarios.
For integration testing, we need to interact with some database framework which:
Speaks the Postgres SQL dialect, since our live databases are Postgres.
Allows quick creation of the database, tables and insertions / deletions of mock data. These lambdas run every 4 hours automatically, and we would really like each of them to be done in under 1 minute, to avoid performance hits in production.
Everything needs to be deleted in the end; we don’t want tests to leave any kind of stale data, either on a local or a remote drive. Our CI/CD pipelines automatically run all tests; what are we gonna do, have tests that run on every deployment leave stale data on the relevant EC2 cluster?
To achieve these 3 goals, for our integration testing, our team has converged on using pg-mem, an in-memory JavaScript Postgres database that works entirely with memory-resident data structures. pg-mem is pretty good and very configurable, but it is still an open-source, community-driven solution, which means that patches are few and far between. We don’t have a vendor on the phone that can assist with new features. So, if something is time-sensitive and we hit a pg-mem snag, we have to fix it ourselves, through pg-mem or otherwise.
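To make this concrete, here is a minimal sketch (with made-up table and column names) of the kind of setup this gives us: a brand-new Postgres-flavored database materializes in memory for each test, mock rows go in, queries run against it, and nothing survives the process, which covers all three requirements above.

// Minimal pg-mem sketch; the table and data are hypothetical.
const { newDb } = require('pg-mem');

const db = newDb(); // a fresh, throwaway Postgres-dialect database living in memory

db.public.none(`CREATE TABLE staff (staff_username TEXT, org_code TEXT)`);
db.public.none(`INSERT INTO staff VALUES ('jdoe', 'ORG1'), ('asmith', 'ORG2')`);

// Query it like a real Postgres database; no files, no network, no cleanup needed.
const rows = db.public.many(`SELECT staff_username FROM staff WHERE org_code = 'ORG1'`);
console.log(rows); // [ { staff_username: 'jdoe' } ]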
At some point I was tasked with adding a JavaScript module that expanded one of our lambdas to update one table based on information from other tables. Given the way that our pipeline operates, I had to make sure that:
The SQL commands orchestrated from the script work out-of-the-box when querying a local (non-live) database.
An integration test is written and passes, where the test itself needs to cover as many cases as the original issue required.
When executed against our live databases, the script affects the desired table and outputs the number of rows that were affected (a rough sketch of this follows the list).
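For illustration, the shape of that last requirement is roughly the following. This is a hedged sketch, not our actual lambda; the table, column and status values are placeholders. node-postgres exposes the number of affected rows on the query result, and the handler simply reports it.

// Hypothetical handler sketch: run an UPDATE and report how many rows it touched.
const { Client } = require('pg');

exports.handler = async () => {
  const client = new Client(); // connection settings come from the environment
  await client.connect();
  try {
    // Placeholder statement; the real lambda orchestrates several such calls.
    const result = await client.query(
      `UPDATE target_table SET status = 'SYNCED' WHERE status = 'PENDING'`
    );
    return { affectedRows: result.rowCount };
  } finally {
    await client.end();
  }
};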
And that’s where the problems begin. Problem #1 was that pg-mem did not support TRIM() and UPPER(), which are a good failsafe when you compare character string fields, and we definitely need them since some of our partner databases follow the exact same syntactic conventions, while some others do not... Very thankfully, the framework allows us to re-define common SQL functions, for example like this:
const { newDb, DataType } = require('pg-mem');
const moment = require('moment');

const pgMem = newDb(); // our in-memory database instance

// TRIM() is not built in, so we register our own implementation.
pgMem.public.registerFunction({
    name: 'TRIM',
    args: [DataType.text],
    returns: DataType.text,
    implementation: str => (str != null ? str.trim() : null)
});

// Same idea for now(), backed by moment.
pgMem.getSchema().registerFunction({
    name: 'now',
    returns: DataType.timestamp,
    implementation: () => moment().toISOString()
});
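With those registrations in place, queries that call the re-defined functions run against the in-memory database; a quick sanity check might look roughly like this (a sketch, assuming the registration above succeeded):

// Confirm that the re-registered TRIM() is picked up when a query uses it.
const row = pgMem.public.one(`SELECT TRIM('  hello  ') AS trimmed`);
console.log(row.trimmed); // 'hello'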
Alright, no problem now, pg-mem sees those functions. Within 2 days I submitted a solution and we merged it into develop. Unfortunately, in a meeting maybe an hour after that happened, a smart product person spotted a few special cases that we had missed. While easy to convey in language, these findings were quite hard to implement in code. But no problem, we still had one week in what was a very successful sprint up until that point, so there was a lot of bandwidth to deal with this.
Now, a point about the nature of the existing implementation: it was JavaScript-heavy. Lots of in-memory map()s, filter()s, even groupBy()s. So now I have to extend this logic. Let’s keep this in mind for later.
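To give a flavor of what “JavaScript-heavy” means here, a made-up sketch (not the actual module) of the kind of in-memory reshaping involved:

// Illustrative only: the two arrays stand in for result sets already fetched from the DB.
const staffRows = [
  { staff_username: 'jdoe',   org_code: 'ORG1', status: 'ACTIVE' },
  { staff_username: 'asmith', org_code: 'ORG2', status: 'INACTIVE' },
];
const orgRows = [{ org_code: 'ORG1' }, { org_code: 'ORG2' }];

// Filter, then a poor man's group-by: org_code -> list of staff usernames.
const staffByOrg = staffRows
  .filter(row => row.status === 'ACTIVE')
  .reduce((acc, row) => {
    (acc[row.org_code] = acc[row.org_code] || []).push(row.staff_username);
    return acc;
  }, {});

// A join done by hand in JavaScript instead of in SQL.
const report = orgRows.map(org => ({
  org_code: org.org_code,
  staff: staffByOrg[org.org_code] || [],
}));
console.log(report); // [ { org_code: 'ORG1', staff: ['jdoe'] }, { org_code: 'ORG2', staff: [] } ]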
I worked a full week to fix this. New, more contrived tests, additional JavaScript logic, new SQL. And by Friday I was ready to test against the live database.
When I ran it on AWS, the script was timing out. My heart sank! I had been working on this for parts of 2 sprints, and now I have 3 days off starting on Monday, and oh my God, will I be able to have this on time before EOD today, and will we need to move this ticket down another sprint… Lots of concern there. Concern that one of my senior partners told me I should not have, and that they would help me and have this ready for me by the time I came back. Profusely thanking them, I went back to another ticket I could pump out by the end of the day.
When I returned to work a few days later, I noted that my collaborator had indeed been able to solve this issue by changing my implementation to one that executed a single SQL query that did the lion’s share of the heavy lifting! Very little Javascript! Super efficient, deployed, no errors in the logs, tables on our live DB look exactly as intended, perfect. We are talking about a person with decades of experience in backend engineering, so their efficiency, when compared to mine, is probably not too surprising.
But as far as surprises go, we still did have one! Once I pulled our latest develop branch, I found out that the test I had authored was no longer passing! This is troublesome for us: we can’t have failing tests in our develop branch! In fact, several of our Jenkins jobs will simply fail if we have test failures, and we won’t be able to deploy new versions of our system. Not a good thing. So off I go to investigate what’s going on; maybe the tests I wrote were crap, after all!
Here is a high-level view of what the SQL command that my collaborator used looks like:
(
    select DISTINCT tab1.staff_username as staff_id, tab3.rcm_organization_code as org_code
    from table1 tab1
    JOIN table2 tab2 ON predicate1
    JOIN table3 tab3 ON predicate2
    UNION
    select DISTINCT tab4.staff_username as staff_id, tab5.rcm_organization_code as org_code
    from table4 tab4
    JOIN table5 tab5 ON predicate3
    JOIN table6 tab6 ON predicate4
    WHERE NOT EXISTS(SELECT 1 from some_other_table where some_other_predicate)
)
EXCEPT
select other_tab.staff_id, other_tab.organization_code
from another_table other_tab
Not the most complex SQL query for an engine to execute or even for a reader to understand. Unfortunately, pg-mem does not understand what EXCEPT means:
Error: Error: 💔 Your query failed to parse.
This is most likely due to a SQL syntax error. However, you might also have hit a bug, or an unimplemented feature of pg-mem.
. . .
💀 Syntax error at line 22 col 17:
    EXCEPT
    ^
Unexpected kw_except token: "except". Instead, I was expecting to see one of the following:
    - A "kw_union" token
. . .
This is unfortunately something that is not straightforward (if at all possible) to define using registerFunction() as we did above with TRIM() and now(). I did raise the issue on pg-mem’s GitHub, but naturally I did not expect that I would get a response in time: this is a community-driven project and the community has jobs and families. Went to StackOverflow as well, did not have much luck there either. We can’t wait on info from busy third parties; we have to find something that works for now, even if it is not as elegant. Speaking of inelegant solutions, here is a way to circumvent the EXCEPT:
select subquery.staff_id, subquery.org_code
from (
    select DISTINCT tab1.staff_username as staff_id, tab3.rcm_organization_code as org_code
    from table1 tab1
    JOIN table2 tab2 ON predicate1
    JOIN table3 tab3 ON predicate2
    UNION
    select DISTINCT tab4.staff_username as staff_id, tab5.rcm_organization_code as org_code
    from table4 tab4
    JOIN table5 tab5 ON predicate3
    JOIN table6 tab6 ON predicate4
    WHERE NOT EXISTS(SELECT 1 from some_other_table where some_other_predicate)
) subquery
WHERE (subquery.staff_id, subquery.org_code) not in
    (SELECT new_tab.staff_id, new_tab.org_code from another_table new_tab)
So what’s going on here is that I just replaced the EXCEPT with a WHERE clause that explicitly throws away records that are already contained in another_table. This should work well with pg-mem. Note that, while this is a fix for the issue, it is very likely to slow down the query on the live system, since we have another predicate fed into a new clause. So we are already forced to change production code to fit a test, and that is not something that we should have to do: the live system is what customers see, not our tests.
But if only that was the sole problem :( With the EXCEPT crash out of the way, the script now crashed in the WHERE NOT EXISTS clause that completes the second record set fed into the UNION operation:
select DISTINCT tab4.staff_username as staff_id, tab5.rcm_organization_code as org_code
from table4 tab4
JOIN table5 tab5 ON predicate3
JOIN table6 tab6 ON predicate4
WHERE NOT EXISTS(SELECT 1 from some_other_table where some_other_predicate)
And this was because some_other_predicate includes references to tab5 and tab6, aliases declared in the outer FROM clause. References that pg-mem cannot, currently, resolve!
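For what it’s worth, here is a minimal way to observe this class of limitation yourself (a hedged sketch with made-up tables; the exact behavior may vary across pg-mem versions):

// Hypothetical reproduction sketch: a NOT EXISTS subquery that references an
// alias from the outer FROM clause, which our pg-mem version could not resolve.
const { newDb } = require('pg-mem');

const db = newDb();
db.public.none(`CREATE TABLE table5 (org_code TEXT)`);
db.public.none(`CREATE TABLE some_other_table (org_code TEXT)`);
db.public.none(`INSERT INTO table5 VALUES ('ORG1')`);

try {
  const result = db.public.query(`
    SELECT tab5.org_code
    FROM table5 tab5
    WHERE NOT EXISTS (
      SELECT 1 FROM some_other_table sot WHERE sot.org_code = tab5.org_code
    )
  `);
  console.log('Correlated subquery resolved fine on this pg-mem version:', result.rows);
} catch (err) {
  console.log('pg-mem could not handle the correlated reference:', err.message);
}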
So, at this point, we have ourselves production code that works well, but to be able to validate this state to a third party, through a test, we have to either wait until pg-mem resolves all of those issues (and others that we might not yet be aware of) or start reverting the implementation so that it executes simpler SQL queries with a lot of in-memory JavaScript manipulations, likely making the entire lambda time out in production again, like the week before.
So let’s recap what our options were at this point:
1. Wait for pg-mem to enable at least one of:
   - EXCEPT.
   - Embedded queries having the ability to reference outside tables.
2. Revert the implementation to depend less on “complicated” (biiiig double quotes there) SQL queries and more on JavaScript manipulations, which would invariably lead us back to the previous week’s blockage.
3. Keep a failing test in develop.
4. Erase the test and call it a day.
Of all these approaches, (4) was the only one that would free me up to do more important things for the team and improve our sprint velocity, while simultaneously keeping our code base free of test failures. When I brought this up in a retro meeting, everybody agreed that (4) was the only possible solution. I was in fact shown examples from other lambdas that we authored years ago and are still deployed, working fine, but do NOT have pg-mem integration tests!
So all this means that parts of our codebase, currently being executed against our live system, have not been formulaically tested. From my own rather academic and perfectionistic standpoint, this sounded unacceptable. But is it really? Even if my collaborator’s SQL misses something that represents an actual use case our client systems encounter, we will find out about it in the future, and we will fix it with a Flyway script right there and then.
The entire point here is that there is a trade-off between thorough formulaic testing, which can be very expensive (in developer time and computational resources), and the more important tasks that a dev team should concern itself with. It is never your tests that the client sees, but the live system. Never tune your live code in favor of tests, even if this means leaving parts of the code not formulaically tested. There are always qualitative ways to test, and future patches are always in the schedule.