Data is good. We all know that modern enterprise organizations run on data and that the digitally encoded streams of information that traverse the cloud and the web enable us to work smarter, faster, better and so on.
But data, unfortunately, also has a bad side: some negative management aspects and more than a few difficult strands of DNA. That bad side often comes down to something as simple as data access, i.e. organizations know that they have data inside company systems, but these blocks of IT are disconnected, badly integrated or siloed away in some way that makes them altogether tough to work with.
According to an IDC InfoBrief sponsored by data access and security company Immuta, data access issues remain the most frequent and tangible bottleneck to achieving what C-suite executives (often collectively referred to as CXOs) would define as the point where a business has become a so-called data-driven intelligent organization.
The lack of a modern, continuously managed and well-audited data access strategy is often the problem, especially when firms look at their time-to-data measure.
“Cloud migration, especially when companies have adopted multiple cloud data platforms, has created significant problems for businesses, in particular the ‘time-to-data delay’ where organizations struggle to efficiently manage data access demands whilst balancing privacy and security requirements,” explains Colin Mitchell, Immuta’s EMEA general manager.
This data access bottleneck gets in the way of the business being able to use data to make intelligent, data-led decisions. Given these barriers to data access, Mitchell suggests it is important to make the C-suite aware of a problem that affects all professionals and processes across an organization, and of how to address it in a way that is both efficient and future-proof.
“Organizations need to automate data access at scale, and the technology to do that is Attribute-Based Access Control, or ABAC,” added Mitchell. “The move from legacy Role-Based Access Control (RBAC) to ABAC simplifies data access control and provides dynamic policy enforcement. It enables organizations to move from a default ‘no’ to a default ‘yes,’ enabling more teams to gain value from their data.”
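To make the RBAC-versus-ABAC contrast concrete, here is a minimal Python sketch. It is an illustration under stated assumptions, not Immuta’s product or API: the roles, datasets, tags and both policies (`pii_requires_training`, `region_must_match`) are hypothetical. RBAC grants access only where a role was explicitly given a dataset (default ‘no’), while the ABAC side evaluates policies over user and dataset attributes and allows access unless some policy objects (default ‘yes’).

```python
from dataclasses import dataclass, field
from typing import Callable, Optional


@dataclass
class User:
    name: str
    role: str
    attributes: dict[str, str] = field(default_factory=dict)


@dataclass
class Dataset:
    name: str
    tags: set[str] = field(default_factory=set)
    attributes: dict[str, str] = field(default_factory=dict)


# RBAC: access only if the user's role was explicitly granted the dataset.
# Anything not listed is denied (a default "no"). Grants are hypothetical.
ROLE_GRANTS: dict[str, set[str]] = {
    "finance_analyst": {"invoices"},
    "marketing_analyst": {"campaigns"},
}


def rbac_allows(user: User, dataset: Dataset) -> bool:
    return dataset.name in ROLE_GRANTS.get(user.role, set())


# ABAC: each policy inspects attributes and returns a denial reason, or None
# if it has no objection. No objections means access (a default "yes").
# Both policies below are hypothetical examples.
Policy = Callable[[User, Dataset], Optional[str]]


def pii_requires_training(user: User, dataset: Dataset) -> Optional[str]:
    if "pii" in dataset.tags and user.attributes.get("privacy_training") != "complete":
        return "PII requires completed privacy training"
    return None


def region_must_match(user: User, dataset: Dataset) -> Optional[str]:
    region = dataset.attributes.get("region")
    if region is not None and user.attributes.get("region") != region:
        return f"data restricted to region {region}"
    return None


POLICIES: list[Policy] = [pii_requires_training, region_must_match]


def abac_allows(user: User, dataset: Dataset,
                policies: list[Policy] = POLICIES) -> tuple[bool, list[str]]:
    # Collect every policy's objection; access is allowed only if there are none.
    reasons = [r for policy in policies if (r := policy(user, dataset)) is not None]
    return (not reasons, reasons)


if __name__ == "__main__":
    analyst = User("ana", "marketing_analyst", {"region": "EU", "privacy_training": "complete"})
    invoices = Dataset("invoices", attributes={"region": "EU"})
    customers = Dataset("customers", tags={"pii"}, attributes={"region": "US"})
    print(rbac_allows(analyst, invoices))   # False: role never granted "invoices"
    print(abac_allows(analyst, invoices))   # (True, [])
    print(abac_allows(analyst, customers))  # (False, ['data restricted to region US'])
```

The design point is where new cases land: under RBAC, every new dataset needs a new grant per role, whereas under the ABAC sketch a new dataset is governed automatically by whatever tags and attributes it carries.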
Concurring with Mitchell’s sentiments, but coming from a data quality rather than a data access and audit perspective, is Chad Sanderson. He leads the data platform team at Convoy, a company that uses data-centric technology to make freight more efficient; Sanderson and his team work to deliver a platform designed to reduce shippers’ costs, increase carriers’ earnings and eliminate carbon emissions.
Existential data quality threats
Sanderson has his own personal take on what factors create the real existential threats that data quality faces. If there were four horsemen of the data apocalypse, he would list them as:
- omission data,
- waste data,
- divergence data,
- downtime data.
Taking them in order, Sanderson explains that data omission refers to missing metadata that would allow data consumers to understand how data is used, who owns it, what it means and where it’s located.
“Data cataloging tools can solve a piece of this problem; some solve where data is located or who owns it, but there’s no one-size-fits-all tool today that solves the full problem, leading to users bouncing between too many tools to try to figure out data specifics like who owns the data and what it means, adding a lot of pain to the data discovery, modification and creation experience,” said Sanderson.
He next points to waste data. This refers to the growth of unused, unmaintained or duplicated (or near-duplicate) queries that inflate cost while obscuring clarity. Waste typically happens when the cost of recreating data to fit specific parameters is lower than the cost of reusing what already exists.
“So someone might say: ‘I’m going to create my own column, I’m going to modify that column and potentially add filters on top of it. I’m also going to add columns to other data across the company, and will tweak and produce them in the way that I want.’ But now you’ve created two different states of the world. So if you ask a relatively simple business question, such as ‘how many total active customers do I have?’, then depending on the team you ask, you might get two different answers,” explained Sanderson.
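Sanderson’s “two different states of the world” can be reproduced in a few lines. The Python sketch below is a hypothetical illustration: the customer rows, the as-of date and both team definitions of “active” are invented for the example. Each definition is reasonable on its own, yet the same data yields different answers to the same question.

```python
from datetime import date, timedelta

# Hypothetical customer records: (customer_id, last_order_date, account_status)
CUSTOMERS = [
    ("c1", date(2024, 5, 20), "open"),
    ("c2", date(2024, 3, 1), "open"),
    ("c3", date(2024, 5, 28), "closed"),
    ("c4", date(2023, 12, 15), "open"),
]
AS_OF = date(2024, 6, 1)


def active_customers_team_a(rows, as_of):
    """Team A's (hypothetical) definition: any account still open is active."""
    return {cid for cid, _, status in rows if status == "open"}


def active_customers_team_b(rows, as_of, window_days=30):
    """Team B's (hypothetical) definition: ordered within the last 30 days."""
    cutoff = as_of - timedelta(days=window_days)
    return {cid for cid, last_order, _ in rows if last_order >= cutoff}


if __name__ == "__main__":
    a = active_customers_team_a(CUSTOMERS, AS_OF)
    b = active_customers_team_b(CUSTOMERS, AS_OF)
    print(len(a), len(b))  # 3 2: same data, two answers
```

Neither function is wrong; the waste lies in maintaining two private definitions of one business concept, which is why the same question returns 3 from one team and 2 from the other.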
Human in the loop risk
Horseman number three would be data divergence. This refers to the growing divide between what’s happening in ‘the real world’ and what’s being reflected in the data warehouse.
“When you’re consuming all of this data exhaust and reverse engineering business logic on top of it, now you have to have a human in the loop that’s thinking about how they need to modify their queries when the business evolves to keep up, otherwise those queries are going to reflect an old state of the world. Things will be actively changing and if someone is not modifying those queries (which may be hundreds of lines long with many joins), then a single line of SQL can cause the interpretation of a metric or table to be vastly different than what’s reflected in reality,” detailed Sanderson.
Finally, we come to data downtime. This refers to periods when data is partial, erroneous, absent or otherwise inaccurate. According to Sanderson, this represents a traditional data quality problem: you’re aware that your data is not serving the business in the way it needs to, and there’s a tangible (or at least perceivable) gap before it can be fixed and continue delivering value to customers.
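Downtime is usually caught by routine checks on each table. The Python sketch below maps three of the symptoms named above (partial, absent, erroneous) plus staleness onto simple checks. It is a hypothetical illustration, not Convoy’s tooling: the `TableSnapshot` fields, the 24-hour staleness limit and the 5% null threshold are all assumed values.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta, timezone


@dataclass
class TableSnapshot:
    # Hypothetical summary statistics gathered about one table.
    name: str
    row_count: int
    expected_min_rows: int
    last_loaded: datetime
    null_fraction: dict[str, float]


def downtime_issues(snap: TableSnapshot, now: datetime,
                    max_staleness: timedelta = timedelta(hours=24),
                    max_null_fraction: float = 0.05) -> list[str]:
    """Return human-readable issues; an empty list means no downtime detected."""
    issues = []
    # Absent or partial: the table is empty or smaller than expected.
    if snap.row_count == 0:
        issues.append("absent: table is empty")
    elif snap.row_count < snap.expected_min_rows:
        issues.append(f"partial: {snap.row_count} rows, expected at least {snap.expected_min_rows}")
    # Stale: the last successful load is older than the allowed window.
    if now - snap.last_loaded > max_staleness:
        issues.append(f"stale: last loaded {now - snap.last_loaded} ago")
    # Erroneous: a column has far more missing values than tolerated.
    for column, fraction in snap.null_fraction.items():
        if fraction > max_null_fraction:
            issues.append(f"erroneous: {column} is {fraction:.0%} null")
    return issues


if __name__ == "__main__":
    now = datetime(2024, 6, 1, 12, tzinfo=timezone.utc)
    snap = TableSnapshot("shipments", 800, 1000,
                         datetime(2024, 5, 30, 12, tzinfo=timezone.utc),
                         {"carrier_id": 0.12, "origin": 0.0})
    for issue in downtime_issues(snap, now):
        print(issue)
```

Running checks like these on a schedule shortens the gap Sanderson describes, because the team learns about the problem before a consumer does.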
Of all the types of data that we define today, we are not about to coin the terms evil data or bad data with any seriousness, but we can definitely agree that enterprise data estates have a fair few pitfalls, perils and potholes to steer around. The fact that a company called Convoy should suggest we think about navigating this route is almost too good to be true.
Now we know where Martin ‘Rubber Duck’ Penwald was coming from when he said: “My daddy always told me to be like a duck. Stay smooth on the surface and paddle like the devil underneath!”