by Tim Marley
Last week we talked about data management at a high level. The operating model, the responsibility, the reality that organizations are accountable for the data they collect, process, and store whether they have formally acknowledged it or not.
This week is narrower: establish and maintain a data inventory.
A solid data management program requires knowing what data you have. That means building and maintaining an inventory, with a focus on sensitive data at a minimum. But most organizations do not go into a great deal of detail about what “sensitive” actually means. That is where they get stuck.
What “sensitive” actually means
When I think about sensitivity, I go back to the CIA triad: confidentiality, integrity, and availability.
Confidentiality is keeping secret “stuff” secret. The obvious examples are regulated data: protected health information (PHI), cardholder data (CHD), controlled unclassified information (CUI), and personally identifiable information (PII). But intellectual property (IP) belongs in this conversation too. You may not have any of the data we typically worry about from a regulatory standpoint, but you would certainly rather not see the organizational secrets that help you survive as a business divulged to anyone you do not want to have them.
Integrity is making sure that only people authorized to modify data are the ones doing so. In academia, we want students to know what their grades are, but we do not want them walking directly into the system to change them. In an organizational context, you are probably fine with everyone seeing what is on your website, but you do not want them editing it.
Availability is making sure data is accessible when it is needed. The example that usually comes up is payroll. We want to make sure we pay our people on time. If the systems holding that information are down when payroll is due, that is a real problem.
Payroll actually hits all three. We do not want everyone knowing what their peers are being paid, which is confidentiality. There is an integrity aspect: each employee should know what their payroll and benefits look like, but should not have the ability to change them. And it absolutely has to be available on time. Most critical datasets work this way. That is why simply labeling something “sensitive” without understanding the impact is not enough.
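One way to make that concrete is to rate each dataset on all three dimensions instead of giving it a single “sensitive” label. The sketch below is only an illustration: the `Impact` scale, the `SensitivityProfile` name, and the ratings assigned to each dataset are assumptions made for the example, not a prescribed scheme.

```python
from dataclasses import dataclass
from enum import IntEnum


class Impact(IntEnum):
    """Hypothetical three-level impact scale."""
    LOW = 1
    MODERATE = 2
    HIGH = 3


@dataclass(frozen=True)
class SensitivityProfile:
    """Rates one dataset on each leg of the CIA triad."""
    dataset: str
    confidentiality: Impact
    integrity: Impact
    availability: Impact

    def overall(self) -> Impact:
        # Treat the dataset as being as sensitive as its highest rating.
        return max(self.confidentiality, self.integrity, self.availability)


# Illustrative ratings only; your own assessment may differ.
payroll = SensitivityProfile("payroll", Impact.HIGH, Impact.HIGH, Impact.HIGH)
website = SensitivityProfile(
    "public website content",
    confidentiality=Impact.LOW,     # everyone is allowed to read it
    integrity=Impact.HIGH,          # but nobody outside should edit it
    availability=Impact.MODERATE,   # assumed for the example
)
```

The public website is the useful case here: it carries no confidentiality concern at all, yet it still comes out as high impact because of integrity. A single yes/no “sensitive” flag would hide that.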
Inventory follows flow
In my experience, data inventory is best achieved by tracking how data flows through your environment rather than trying to catalog every file name in every folder. A file-by-file catalog isn’t realistic, and it isn’t what we’re trying to accomplish.
What you need to understand is:
- Where do you collect or generate that data?
- Where is it stored?
- Where is it processed or modified?
- Does it leave the organization, and if so, how?
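Those four questions can be captured as one record per dataset. The following is a minimal sketch, assuming Python 3.9 or later; `DataFlowRecord`, its field names, and the payroll locations are all hypothetical and exist only to show the shape of an entry.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class DataFlowRecord:
    """One inventory entry. None means 'not yet assessed'."""
    dataset: str
    collected_at: Optional[list[str]] = None
    stored_at: Optional[list[str]] = None
    processed_at: Optional[list[str]] = None
    # An empty list means 'confirmed: does not leave the organization'.
    leaves_via: Optional[list[str]] = None

    def open_questions(self) -> list[str]:
        """Return the questions that still have no answer."""
        questions = {
            "Where is it collected or generated?": self.collected_at,
            "Where is it stored?": self.stored_at,
            "Where is it processed or modified?": self.processed_at,
            "Does it leave the organization, and how?": self.leaves_via,
        }
        return [q for q, answer in questions.items() if answer is None]


# Hypothetical entry: three questions answered, one still open.
payroll = DataFlowRecord(
    dataset="payroll",
    collected_at=["HR onboarding form"],
    stored_at=["HRIS database"],
    leaves_via=["SFTP to payroll provider"],
)
print(payroll.open_questions())
```

Keeping “not yet assessed” separate from “confirmed empty” is a deliberate choice in this sketch. It lets the inventory show what you have not looked at yet, which is often the more important finding.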
Data typically does not live in a single repository. It may be stored at rest in one place, but it moves through your environment, and that means it is at rest at different points along the way. That matters, because the controls you need at each stage may be different.
If you have a folder that is heavily secured on a server but your users can export that data directly to their workstations, the controls around the server may be strong. The controls around the workstation may not be. Inventory is about identifying those realities.
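The server-to-workstation case can be sketched as a comparison of controls at each hop along a flow. This is an assumption-heavy illustration (Python 3.9 or later): `Location`, `control_gaps`, and the control names are made up for the example, and a real assessment would need a richer model than a set of labels.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Location:
    """A place where the data rests, with the controls applied there."""
    name: str
    controls: frozenset[str]


def control_gaps(path: list[Location]) -> list[tuple[str, str, list[str]]]:
    """For each hop, list controls present at the source but absent at the destination."""
    gaps = []
    for source, dest in zip(path, path[1:]):
        missing = sorted(source.controls - dest.controls)
        if missing:
            gaps.append((source.name, dest.name, missing))
    return gaps


# Hypothetical flow: a secured share exported to a user's workstation.
server = Location(
    "file server share",
    frozenset({"encryption at rest", "access logging", "restricted ACL"}),
)
workstation = Location("user workstation", frozenset({"encryption at rest"}))

for source, dest, missing in control_gaps([server, workstation]):
    print(f"{source} -> {dest}: missing {', '.join(missing)}")
```

A reported gap is a prompt for review, not a verdict. The controls needed at each stage may legitimately differ, so the point is to make the difference visible and decide on purpose.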
This does not have to be overwhelming
The complexity of this exercise is directly tied to your environment and how structured and organized you already are. In a small or mid-sized business, this may be a manageable effort. The number of systems is smaller, and the data flows are easier to map. At the enterprise level, it gets harder. More systems, more integrations, more exceptions.
There are tools that can help. It could be as simple as scripts that search your data repositories. It could be a scanning tool or a DLP solution that helps identify where certain data types exist. Those tools are useful, but they do not replace understanding. A scanner can tell you where Social Security numbers exist. It cannot tell you whether they should be there, who owns that dataset, or what obligations apply to it.
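As an example of the “simple script” end of that range, here is a minimal sketch that walks a directory tree and counts strings shaped like Social Security numbers. The function names, the file extensions, and the dashed `NNN-NN-NNNN` pattern are assumptions for illustration. It will miss undashed numbers and will flag other nine-digit identifiers written in the same shape, so treat the output as leads to investigate.

```python
import re
from pathlib import Path

# Matches the dashed NNN-NN-NNNN shape only; it does not validate the number.
SSN_PATTERN = re.compile(r"\b\d{3}-\d{2}-\d{4}\b")


def count_ssn_like(text: str) -> int:
    """Count substrings that look like a dashed SSN."""
    return len(SSN_PATTERN.findall(text))


def scan_tree(root: Path, suffixes: tuple[str, ...] = (".txt", ".csv")) -> dict[str, int]:
    """Map each matching file under root to its number of SSN-like strings."""
    hits: dict[str, int] = {}
    for path in root.rglob("*"):
        if not path.is_file() or path.suffix.lower() not in suffixes:
            continue
        try:
            text = path.read_text(encoding="utf-8", errors="ignore")
        except OSError:
            continue  # unreadable file: skip it rather than abort the scan
        count = count_ssn_like(text)
        if count:
            hits[str(path)] = count
    return hits


if __name__ == "__main__":
    # Scans the current directory; point this at a share you are authorized to scan.
    for file_path, count in sorted(scan_tree(Path(".")).items()):
        print(f"{count:>5}  {file_path}")
```

The script answers only the “where” question. Whether the data should be there, who owns it, and what obligations apply still have to come from people who understand the dataset.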
Why this matters
You cannot manage what you do not know. You cannot control what you have not identified.
Inventory is the foundation for everything that follows: access control, retention, disposal. We will get deeper into the data lifecycle in the coming weeks, but it starts here. If you do not know where your sensitive data lives and how it moves, every other control is guesswork.
