Are These Activities On Your Data Arrival and Preboarding Map?
A.k.a. a salty menu of all the salt in the data ingestion salt mines

Structured data file feeds are simple in concept. Dig below the surface to what it takes to actually make them happen, though, and you see a complicated set of activities that must be orchestrated correctly for the first stage of ingestion to work reliably. Many of you know that, of course. Still, at both the small and large ends of the scale it is easy to forget the whole chain. Large company operations are often so specialized that individuals become insulated from activities they don't directly participate in. They become arborists, not land managers. And small companies often merge steps, edit out activities, or otherwise lighten the load wherever possible.
Stepping back to see the complete big picture can be a help
What I'm trying to do here is simply catalog the activities. Breaking down each one is a job for follow-up posts. Because we are focused on preboarding here, and CsvPath Framework in particular, I'll indicate which steps can be (better) addressed by adding an explicit and methodical preboarding stage to ingestion. That obviously isn't the whole list, but preboarding covers some activities completely and helps more of them move smoothly.
The first stages of data ingestion also have an exit point. The data has to go somewhere. With a focus on data preboarding, your exit is often into the data lake or ETL staging area, though other possibilities exist. In the case of a data lake, the area containing raw data -- bronze, if you like -- may act as the storage layer for preboarding, or it may be where preboarded, trustworthy "ideal form" raw data is transferred to. Either way, that is a topic for follow-on discussion. It is also reasonable to say the data hasn't been fully ingested until it's in the application(s) or analytics system(s). That's fair, of course; different roles have different processes, or different parts of the larger process.
Last (for now), but not least (not remotely least!), there is the financial impact of the MFT and preboarding stage of ingestion. All these activities require expensive time, attention, and technology. And they all embody risk in terms of liability, SLA metric consequences, hard-to-value-but-valuable reputation hits, and excessive cost-of-doing-business losses. The scale of data file feed value and risk can occasionally be eye-opening, even to those of us who have long been around it. Definitely a topic to explore further.
The preflight checklist, so to speak
So, without further ado, here is a bulleted list of ingestion activities. It runs from MFT arrival to preboarding acceptance to availability downstream. No doubt I've missed or mashed together many things. You may think I'm making a mountain out of a mole hill, or a mole hill out of nothing. Please send me your edits and suggestions!
Customer onboarding *
- Credentials exchange
- Configuration of customer->MFT (file/data formats, paths, naming, schedule, protocol, error handling, whitelisting, testing)
- MFT system configuration (infrastructure capacity, events and triggers, account setup)
- Observability configuration (alerts config, dashboard create/edit)
- Configuration of MFT->DataOps/biz ops teams (archiving, integration scripting/config, replay process create/edit, testing)
- Documentation and metadata update

Operations
- Timeliness config
- Registration (data’s birthday, social security number, family name, street address)
- Conformance checks (readability, size, encoding, canonical forms, datasets expected, attribution, etc.)
- File handling (backups, rotation, versioning, retention)
- Forwarding (workflow steps, notification)

Data acceptance
- SME review
- Data validation and quality management
- Customer change negotiations
- Data mastering
- Internal data publishing

Configuration update
- Review and reset on essentially any of the above

Forensics
- Arrival how and when (provenance, arrival metrics, point-in-time MFT config review)
- Data statistics at registration
- Change management (lineage tracing, change data capture, point-in-time script review)
- Chain of custody (user access tracking, workflow/transfers, permissions/credentials review)
- Business rules review
- Testing review (data testing, config testing, workflow testing)
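To make two of the Operations items a little more concrete, here is a minimal Python sketch of registration and conformance checks. To be clear, this is not CsvPath Framework's API. It uses only the standard library, and every name in it (`Registration`, `register`, `conformance_problems`, the field names, the size limit) is a hypothetical stand-in chosen for illustration. It assumes a UTF-8 CSV feed with a known header row.

```python
import csv
import hashlib
import io
import uuid
from dataclasses import dataclass, field
from datetime import datetime, timezone


@dataclass
class Registration:
    """Identity record for one arrived file. All field names are illustrative."""
    named_file: str    # "family name": the logical feed this file belongs to
    location: str      # "street address": where the bytes are stored
    fingerprint: str   # "social security number": SHA-256 of the content
    arrived_at: str    # "birthday": UTC timestamp taken at registration
    registration_id: str = field(default_factory=lambda: uuid.uuid4().hex)


def register(named_file: str, location: str, content: bytes) -> Registration:
    """Give an arriving file a durable identity before anything else touches it."""
    return Registration(
        named_file=named_file,
        location=location,
        fingerprint=hashlib.sha256(content).hexdigest(),
        arrived_at=datetime.now(timezone.utc).isoformat(),
    )


def conformance_problems(
    content: bytes,
    expected_header: list[str],
    max_bytes: int = 50_000_000,  # assumed limit, tune per feed
) -> list[str]:
    """Return a list of basic conformance problems; an empty list means none found."""
    if len(content) == 0:
        return ["file is empty"]
    problems = []
    if len(content) > max_bytes:
        problems.append(f"file exceeds {max_bytes} bytes")
    try:
        text = content.decode("utf-8")
    except UnicodeDecodeError:
        # Without a readable encoding there is no point checking the header.
        return problems + ["not valid UTF-8"]
    header = next(csv.reader(io.StringIO(text)), None)
    if header != expected_header:
        problems.append(f"unexpected header: {header}")
    return problems


if __name__ == "__main__":
    data = b"id,name\n1,Ada\n"
    print(register("orders", "inbox/orders.csv", data))
    print(conformance_problems(data, ["id", "name"]))
```

The point of the sketch is the ordering: the file gets its identity (fingerprint, timestamp, name, location) first, so that any later conformance failure can be tied back to exactly one arrival.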
Right, then, that’s your 30,000-foot view. A map for future exploration. What is missing? Discuss! And happy preboarding!
⦿ Bold items are part of CsvPath Framework or FlightPath Server’s preboarding remit. Many can be completely handled in the Framework; for others, CsvPath is just one piece of the puzzle.






