Your Data Lake Is Flying Blind

The Airline Analogy For Why You Need Data Preboarding


This past week, FlightPath Server has (finally!) taken to the skies, in tandem with FlightPath Data.

In honor of FlightPath Server's release, let's do the airline analogy for data preboarding. We all talk about landing data and data in-flight. As analogies go, air travel is for sure a good one.

Imagine an airline lands a plane

The pilot has identified the flight and taxied. The plane approaches the gate only a little late. The crew docks and the door opens. The ground crew does a double-take: this wasn’t the flight on the clipboard.

Regardless, deplaning starts. 70% of the people need to catch a connecting flight. They elbow their way to the front, spill out onto the gangway, and run for the next gate.

Most of them, but not all, find the correct gate in a timely way, despite the hub airport's best attempts to confuse and mislead.

At the gate they rush onto the plane without showing tickets and grab any seat they can. The flight is, of course, overbooked, but nobody checks. Moreover, at least a few of the rushing flight-catchers don't realize they caught the wrong flight until they are in the air. The flight attendants are puzzled and immediately upgrade them to business class, because isn’t that what you do? When the flight lands... wherever it does, the misdirected travelers do the mad scramble again for another flight, hoping they end up in the right city this time. Bags have gone missing.

Now, let's pause for breath and look at that picture. And this is why it's not quite as good an analogy as retail deliveries. Basically, because it's factually correct!

Ok, ok, just kidding. It is not correct, and it's not fair to the hardworking, caring airline employees who somehow manage to make flying not this experience. So I take it back, with apologies!

What really happens up there?

In the real world passengers who have connecting flights exit the plane in good order, sometimes first, if there were delays. They follow clear signs and instructions to the next gate. At the gate they check in. If they have questions, the gate attendants are there to answer them.

Boarding is announced 20 minutes beforehand, and again as boarding nears. Boarding starts after the plane is cleaned. It goes by seating groups and classes. People are more or less polite and turn-taking. Tickets are scanned carefully and emergency exit questions are asked. Regulation-sized bags are stowed; others are diverted to checked baggage. Assigned seats are taken. If a passenger is not on the manifest, they are rerouted before the doors shut.

The flight is announced over and over throughout the process. Everyone knows the flight’s identity and destination. Nobody is surprised by anything, much less after the plane pulls back from the gate.

That's how it works for 12 million fliers every day. Generally, it goes surprisingly well. Seven hundredths of one percent of trips worldwide result in lost or delayed bags, and far fewer in the US, I’m happy to say. I'll take those odds! We remember the problems because they are personal, but the vast majority of those millions of trips are uneventful. Data should be so lucky!

You see what I'm driving, er, flying at?

In the world of data, the trip is often more chaotic. Data enters the organization from data partners to the tune of millions of data points per day. With loose processes and low-flying governance, things get messy fast.

In many organizations, the identity of the data set — crucially, the version of the set, not the set as a concept — is unclear. The seating assignment is scrambled. The individual data points aren't checked against a schema and may not have a ticket. The next leg in the journey is often unclear. And there is no record of what data points passed what gates managed by what attendant.

Moreover, when a data point is eventually found to be in the wrong seat or on the wrong flight, getting it into the right seat or off the plane disrupts our clarity about the other data points from earlier flights, calling the whole database’s fitness for production into question. Ultimately the whole corpus of data, all of it essentially in-flight, is repeatedly perturbed and becomes suspect because of the poor handling of in-place modifications that result from new data rushing the gates. The whole data flow grinds to a halt for re-ticketing.

Data preboarding: your traffic control, pilot, and attendant

The data preboarding process is about bringing airline-like operations to data file feed engineering and operations. Ingestion of data file feeds should have two clear stages: preboarding, to land, register, validate, and generate metadata history; and loading, to move "ideal-form", trustworthy raw data into the data lake, data warehouse, applications, analytics, and AI.
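To make the two stages concrete, here is a minimal sketch in plain Python. It is not FlightPath's or CsvPath Framework's API; the function names `preboard` and `load`, the `EXPECTED_HEADERS` schema, and the in-memory `registry` and `lake` are all assumptions made up for illustration. The point is only the shape: every arrival is registered and checked, and nothing boards the lake without a valid record.

```python
import csv
import hashlib
import io
from datetime import datetime, timezone

# Hypothetical schema for the feed; a real feed would define its own.
EXPECTED_HEADERS = ["id", "name", "amount"]


def preboard(feed_name: str, raw: bytes, registry: list) -> dict:
    """Stage 1: land, register, validate, and record metadata for one arrival."""
    # Identify this exact delivery by its content, not just its name.
    fingerprint = hashlib.sha256(raw).hexdigest()

    rows = list(csv.reader(io.StringIO(raw.decode("utf-8"))))
    headers = rows[0] if rows else []

    # Validate: check the "ticket" (headers) and each "seat" (field count).
    errors = []
    if headers != EXPECTED_HEADERS:
        errors.append(f"unexpected headers: {headers}")
    for line_no, row in enumerate(rows[1:], start=2):
        if len(row) != len(EXPECTED_HEADERS):
            errors.append(
                f"line {line_no}: expected {len(EXPECTED_HEADERS)} fields, got {len(row)}"
            )

    # Register: every arrival gets a record, valid or not.
    record = {
        "feed": feed_name,
        "fingerprint": fingerprint,
        "arrived_at": datetime.now(timezone.utc).isoformat(),
        "row_count": max(len(rows) - 1, 0),
        "errors": errors,
        "valid": not errors,
    }
    registry.append(record)
    return record


def load(record: dict, raw: bytes, lake: dict) -> bool:
    """Stage 2: move data downstream only if preboarding passed."""
    if not record["valid"]:
        return False  # rerouted before the doors shut
    lake[(record["feed"], record["fingerprint"])] = raw
    return True
```

In this sketch a rejected file still leaves a registry entry, so there is a record of what arrived and why it did not board, even though it never reaches the lake.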

This isn't complicated and it's not controversial. We try to take in data methodically, just like travelers try to get to the right gate. A solid preboarding process is how to take the drama and heroics out, lower review and triage costs, minimize customer risks, and help everyone be more agile and responsive.

The FlightPath Team Takes Wing

As mentioned at the top, FlightPath Server recently joined FlightPath Data and CsvPath Framework to complete the leading data file feeds preboarding solution. FlightPath Server's role is, first, to listen for inbound data arrivals and begin the preboarding process and, second, to provide an API for downstream data consumers to find trustworthy data and metadata published in an immutable archive. FlightPath + CsvPath is an open and free architecture for preboarding that you can roll out rapidly.

What you get, besides peace of mind, lower costs, etc., is a solution that makes data intake simple through:

  • Immutable staging

  • Durable identification

  • Validation and upgrading

  • Descriptive and lineage metadata

  • A permanent archive queryable from downstream
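As a rough illustration of what immutable staging, durable identification, and a queryable archive can mean in practice, here is a small Python sketch. The `Archive` and `Arrival` names are hypothetical and do not come from FlightPath or CsvPath; this is an in-memory toy under those assumptions, not a description of how the product is built.

```python
import hashlib
from dataclasses import dataclass


@dataclass(frozen=True)  # frozen: a staged arrival cannot be modified in place
class Arrival:
    name: str         # the data set as a concept, e.g. "orders.csv"
    version: int      # which delivery of that data set this is
    fingerprint: str  # content hash that durably identifies these exact bytes
    content: bytes


class Archive:
    """Hypothetical append-only archive: arrivals are added, never overwritten."""

    def __init__(self):
        self._arrivals = []

    def stage(self, name: str, content: bytes) -> Arrival:
        fingerprint = hashlib.sha256(content).hexdigest()
        # The same bytes re-sent under the same name keep their identity.
        for arrival in self._arrivals:
            if arrival.name == name and arrival.fingerprint == fingerprint:
                return arrival
        # New content becomes a new version; earlier versions stay untouched.
        version = 1 + sum(1 for a in self._arrivals if a.name == name)
        arrival = Arrival(name, version, fingerprint, content)
        self._arrivals.append(arrival)
        return arrival

    def versions(self, name: str) -> tuple:
        """Downstream query: every version of a named data set, oldest first."""
        return tuple(a for a in self._arrivals if a.name == name)

    def latest(self, name: str):
        """Downstream query: the most recent version, or None if never seen."""
        found = self.versions(name)
        return found[-1] if found else None
```

Because a correction arrives as a new version instead of an in-place edit, downstream consumers in this sketch can always say exactly which delivery they read.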

And it's a solution that fits into your current data estate, integrated with the same cloud services, MFT servers, databases, metadata protocols, and webhook senders and receivers you already use.

Without data we'd get nowhere. Without data preboarding we won't enjoy the trip. With FlightPath Data the air is smooth and the sun is shining. Come fly with us!

Data Preboarding Analogies

Part 1 of 2

This short series lays out some analogies between the way data file feeds are typically brought into the enterprise -- or should be -- and common scenarios that are similar but often better handled.

Up next

Your Data Lake Is Selling Sketchy Goods

The Retail Analogy For Why You Need Data Preboarding
