Should You Build Or Buy Data Preboarding?

The decision to build or buy comes up constantly in commerce. With software, rolling your own often looks innocuous in the moment, but it may ultimately be a life-or-death choice. Time to market, time to refresh, time to fix, time between failures, and many other indicators are driven by what engineers thought was a good idea at the time. There’s a difference between starting simple and simple thinking.

What we learned creating FlightPath Data

The people contributing to the open-source CsvPath Framework and the FlightPath products have not only built data preboarding for enterprises but have also built it in a more general way as an open-source product. On top of that, we use FlightPath Data and FlightPath Server daily, so in effect we’re buying a pre-built tool every day. What did we learn?

If you let it be manual it is likely to remain manual

To paraphrase Frank Herbert, once you've processed a kind of data manually, you must always process that data manually. The reason is that it’s much easier to check data in a spreadsheet or SQL console than it is to build software and automate processes. Moreover, if someone is willing to hack on data by hand, everyone else is happy to move on to other work. Instant gratification + Somebody Else’s Problem == persistent under-investment in automation.

The preboarding efforts that succeed over the long term automate from the beginning. Staying hyper-focused on automation while building everything yourself is hard, so buying becomes the obvious choice.

Preboarding is actually hard

No, really. What could be hard about grabbing files from an SFTP folder and jamming them into a relational database with Python? The naive solution to preboarding? Barely do it. Yes, that would be easy. On day one.

The problems with that approach are many. They include restated data, lost files, unmanaged scripts, changing business rules, lack of process visibility, workflows that incorporate human-driven loops, the risks of fallible human judgement, and more. All these conspire against you. The naive solution quickly turns into a risky, expensive nightmare. And that’s before we even get into the details. The details are even harder.

The core challenge is getting business rules out of human heads and into a validation and upgrading framework for automation. Building a validation framework with the power and flexibility to replace human judgement is above most developers’ pay grade.
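To make "business rules out of human heads" concrete, here is a minimal sketch in plain Python. It is not the CsvPath Framework API. The column names (invoice_id, amount), the two rules, and the validate function are all hypothetical, chosen only to show the shape of the problem: each rule a person would check by eye becomes a named, testable function.

```python
import csv
import io
from typing import Callable, Dict, List, Tuple

Row = Dict[str, str]
Rule = Callable[[Row], bool]


def has_invoice_id(row: Row) -> bool:
    # Hypothetical rule: every row must carry a non-blank invoice_id.
    return bool((row.get("invoice_id") or "").strip())


def is_positive_amount(row: Row) -> bool:
    # Hypothetical rule: amount must parse as a number greater than zero.
    try:
        return float(row.get("amount") or "") > 0
    except ValueError:
        return False


# Rules are named so a failure report reads like the business rule it encodes.
RULES: Dict[str, Rule] = {
    "invoice_id is present": has_invoice_id,
    "amount is a positive number": is_positive_amount,
}


def validate(text: str, rules: Dict[str, Rule] = RULES) -> List[Tuple[int, str]]:
    """Return (line_number, rule_name) for every rule a data row fails."""
    failures: List[Tuple[int, str]] = []
    reader = csv.DictReader(io.StringIO(text))
    # Line 1 is the header, so data rows start at line 2.
    for line_number, row in enumerate(reader, start=2):
        for name, rule in rules.items():
            if not rule(row):
                failures.append((line_number, name))
    return failures


SAMPLE = "invoice_id,amount\nA-1,100.00\n,25.00\nA-3,-5\n"

if __name__ == "__main__":
    for line_number, name in validate(SAMPLE):
        print(f"line {line_number}: failed rule '{name}'")
```

Two rules take a screen of code. A real feed also has to handle rules that span rows and files, rules that change over time, and reporting that non-developers can act on, which is where the engineering cost climbs.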

Solving for all those challenges with a bespoke solution requires more engineering investment than most companies are willing to make up front. When there’s the option to buy data preboarding, you should buy it.

Manual is expensive

This may seem obvious. On the other hand, most companies accept the operational overhead of manual processing rather than invest in automated preboarding. That overhead comes in the form of risk, as well as headcount. Let’s start with the FTEs.

In two recent file-feed preboarding efforts at PE-backed B2B services companies, manual BizOps data handlers initially outnumbered the technical staff involved 2 to 1. Even allowing for engineers being more expensive, the manual processing tax was high. If we agree that a small number of manual checks is inherent in those businesses (in those cases, invoice management and insurance benefits), the ops overhead on the inbound data flow was still roughly doubled. Moreover, in a perfect world the engineers can do other work between manual preboarding crises. That additional opportunity cost pushes the tax higher.

With a pre-built data preboarding solution you reduce costs in multiple ways:

Developers don’t have to build preboarding
There are fewer BizOps FTEs dedicated to manual data intervention
Developers do far less firefighting when infrastructure is robust
Tech team focus can be on what the company does, rather than how the systems do it

The last bullet is subtle. Developers tend to know their tools and systems intimately, but often don’t know the business very well. That is a continual source of inefficiency and lost opportunities. When developers can focus more on business goals and less on infrastructure, it is a win for everyone.

Nobody likes to lose customers

Data problems happen, particularly in preboarding scenarios where data arrives from data partners that have completely different context and incentives. The sooner you catch bad inbound data the better.

From a cost perspective, it is often said that remediation cost increases non-linearly the further in from the edge a problem is caught. For example, a $10 catch by the data preboarding system at the edge saves you from a $1000 problem if the error gets all the way to a production system.
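As a rough illustration of that arithmetic, the sketch below assumes cost multiplies by a fixed factor at each stage past the edge. The 10x factor and the two-stage distance from edge to production are assumptions picked only so the numbers match the $10 and $1000 figures above. They are not measurements.

```python
def remediation_cost(edge_cost: float, stages_past_edge: int,
                     multiplier: float = 10.0) -> float:
    """Illustrative model: cost grows by `multiplier` per stage past the edge."""
    return edge_cost * multiplier ** stages_past_edge


# Assumed stages: 0 = preboarding at the edge, 2 = production system.
caught_at_edge = remediation_cost(10, 0)
caught_in_production = remediation_cost(10, 2)
print(caught_at_edge, caught_in_production)
```

The exact multiplier matters less than the shape of the curve: every stage you let bad data pass through multiplies the cost of catching it.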

The immediate problem is the increasing difficulty of technical triage as the data flows inbound into data lakes and the downstream applications, analytics, and AI. Each step from data source to production use adds more logic to check, more people who can make mistakes, more layers of data to untangle, etc.

The longer-term problem is that customer patience is finite. If a customer keeps catching data problems, they soon won’t trust the data and will go elsewhere. This is not abstract speculation. It happens all the time. The customer’s exit is sometimes prevented by claw-backs, discounts, and simple lock-in. Whether or not the customer leaves, reputational and financial damage is done. Moreover, morale takes a hit, and that matters. It particularly hurts when the Sales team begins to doubt the correctness of the data underlying the product they are trying to sell.

If you let it be complicated, it will be

One of the arguments against buying technology is “it doesn’t fit our process”. Sure, COTS software is always opinionated. If a tool isn’t opinionated it isn’t guiding you towards best practice.

If you are at the build vs. buy choice point with data preboarding and someone says “our workflow is too specific to buy”, push back. The goal is to run the simplest business possible. Revenue being equal, simplicity is one of the great drivers of margin. If the buy option doesn’t fit the process, consider whether the reason is physics or fashion. Some things you can change. If change would result in a simpler, off-the-rack business, that’s something to strive for, not avoid. If change is doable and NPV is good, invest in change.

So, then, it’s a buy

Vertical integration is having a moment. SpaceX, Apple, and Amazon are all deeply vertically integrated. Most engineers like to build. But Apple doesn’t own semiconductor fabs and SpaceX does not, as far as we know, manufacture bolts and screws. You have to pick your battles.

In our case, we back open-source packaged software. You can download FlightPath Data and FlightPath Server from Apple or Microsoft and start creating consistent, maintainable, and scalable preboarding projects today with no money changing hands. For us, “buy” is not necessarily a transaction. But make no mistake, we want you to buy into CsvPath Framework and the FlightPath products. We’re here to give you honest assessments of what can work for you and to make your data preboarding successful.

For us the decision is build vs. try. We hope you'll agree and take our product out for a test flight.
