Are These Activities On Your Data Arrival and Preboarding Map?

A.k.a. a salty menu of all the salt in the data ingestion salt mines

Structured data file feeds are simple in concept. Dig below the surface into actually making them happen, though, and you see a complicated set of activities that must be orchestrated correctly for the first stage of ingestion to work reliably. Many of you know that, of course. Still, at both the small and large ends of the scale it is easy to forget the whole chain. Large company operations are often so specialized that individuals become insulated from activities they don't directly participate in. They become arborists, not land managers. And small companies often merge steps, edit out activities, or otherwise lighten the load wherever possible.

Stepping back to see the complete picture can help

What I'm trying to do here is simply catalog the activities. Breaking down each one is a job for follow-up posts. Because we are focused on preboarding here, and CsvPath Framework in particular, I'll indicate which steps can be (better) addressed by adding an explicit and methodical preboarding stage to ingestion. That obviously isn't the whole list, but preboarding covers some activities completely and helps more of them move smoothly.

The first stages of data ingestion also have an exit point. The data has to go somewhere. With a focus on data preboarding, your exit is often into the data lake or ETL staging area, though other possibilities exist. In the case of a data lake, the area containing raw data (bronze, if you like) may act as the storage layer for preboarding, or it may be where preboarded, trustworthy "ideal form" raw data is transferred to. Either way, that is a topic for another follow-on discussion. It is also reasonable to say the data hasn't been fully ingested until it's in the application(s) or analytics system(s). That's fair, of course; different roles have different processes, or own different parts of the larger process.

Last (for now), but not least (not remotely least!), there is the financial impact of the MFT and preboarding stage of ingestion. All these activities require expensive time, attention, and technology. And they all carry risk in terms of liability, SLA metric consequences, hard-to-value-but-valuable reputation hits, and excessive cost-of-doing-business losses. The scale of data file feed value and risk can occasionally be eye-opening, even to those of us who have long been around it. Definitely a topic to explore further.

The preflight checklist, so to speak

So, without further ado, here is a bulleted list of ingestion activities. It runs from MFT arrival to preboarding acceptance to availability downstream. No doubt I've missed or mashed together many things. You may think I'm making a mountain out of a molehill, or a molehill out of nothing. Please send me your edits and suggestions!

Customer onboarding *

  • Credentials exchange

  • Configuration of customer->MFT (file/data formats, paths, naming, schedule, protocol, error handling, whitelisting, testing)

  • MFT system configuration (infrastructure capacity, events and triggers, account setup)

  • Observability configuration (alerts config, dashboard create/edit)

  • Configuration of MFT->DataOps/biz ops teams (archiving, integration scripting/config, replay process create/edit, testing)

  • Documentation and metadata update

Operations

  • Timeliness config

  • Registration (data’s birthday, social security number, family name, street address)

  • Conformance checks (readability, size, encoding, canonical forms, datasets expected, attribution, etc.)

  • File handling (backups, rotation, versioning, retention)

  • Forwarding (workflow steps, notification)
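
To make the registration and conformance items a little more concrete, here is a minimal Python sketch using only the standard library. It is an illustration, not CsvPath Framework's API: the function name `register_and_check`, the record fields, and the 50 MB size limit are all assumptions made up for this example. It records a file's arrival time, size, and fingerprint, then checks readability, size, encoding, and the expected header.

```python
import csv
import hashlib
import io
from datetime import datetime, timezone


def register_and_check(data: bytes, name: str, expected_header: list[str],
                       max_bytes: int = 50_000_000) -> dict:
    """Hypothetical helper: register an arrived file and run basic conformance checks."""
    # Registration: the file's "birthday", size, and a content fingerprint.
    record = {
        "name": name,
        "arrived_at": datetime.now(timezone.utc).isoformat(),
        "size_bytes": len(data),
        "sha256": hashlib.sha256(data).hexdigest(),
        "problems": [],
    }
    # Conformance: size bounds (the limit is an assumed example value).
    if len(data) == 0:
        record["problems"].append("empty file")
    if len(data) > max_bytes:
        record["problems"].append("file larger than expected")
    # Conformance: encoding. UTF-8 is assumed here; real feeds may differ.
    try:
        text = data.decode("utf-8")
    except UnicodeDecodeError:
        record["problems"].append("not valid UTF-8")
        return record
    # Conformance: the first row should be the agreed header.
    header = next(csv.reader(io.StringIO(text)), [])
    if header != expected_header:
        record["problems"].append(f"unexpected header: {header}")
    return record


result = register_and_check(b"id,name\n1,Ann\n", "feed.csv", ["id", "name"])
print(result["sha256"], result["problems"])
```

A file that passes would move on to forwarding; one with entries in `problems` would be held for review or replay, depending on how your workflow is set up.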

Data acceptance

  • SME review

  • Data validation and quality management

  • Customer change negotiations

  • Data mastering

  • Internal data publishing

Configuration update

  • Review and reset on essentially any of the above

Forensics

  • Arrival how and when (provenance, arrival metrics, point-in-time MFT config review)

  • Data statistics at registration

  • Change management (lineage tracing, change data capture, point-in-time script review)

  • Chain of custody (user access tracking, workflow/transfers, permissions/credentials review)

  • Business rules review

  • Testing review (data testing, config testing, workflow testing)
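
As one small illustration of the forensics side, "data statistics at registration" could be as simple as the Python sketch below. The function name `registration_stats` and the particular statistics are assumptions chosen for this example, not something CsvPath Framework prescribes. The point is only that numbers captured at arrival give you something to compare against when questions come up later.

```python
import csv
import io
from collections import Counter


def registration_stats(text: str) -> dict:
    """Hypothetical helper: summarize a delimited file as it looked on arrival."""
    reader = csv.reader(io.StringIO(text))
    header = next(reader, [])
    row_count = 0
    row_widths = Counter()    # how many rows have each field count
    blank_counts = Counter()  # blank values per named column
    for row in reader:
        row_count += 1
        row_widths[len(row)] += 1
        for column, value in zip(header, row):
            if value.strip() == "":
                blank_counts[column] += 1
    return {
        "columns": header,
        "row_count": row_count,
        "row_widths": dict(row_widths),
        "blank_counts": dict(blank_counts),
    }


stats = registration_stats("id,name\n1,Ann\n2,\n3\n")
print(stats)
```

Stored alongside the registration record, a summary like this lets a later review tell "the file arrived short" apart from "something downstream dropped rows".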

Right, then, that’s your 30,000-foot view. A map for future exploration. What is missing? Discuss! And happy preboarding!

⦿ Bold items are part of CsvPath Framework or FlightPath Server’s preboarding remit. Many can be completely handled in the Framework; for others, CsvPath is just one piece of the puzzle.