Data Handcuffs: The Aftermath of Success

Cleaning sensitive data at scale to enable software/data teams.

When Success = Compliance

For many of our customers, success has come with accolades, trophies, and a fresh round of compliance and governance requirements to adhere to!

For teams that didn’t start out with data compliance in mind (which is most of them), this often means that volumes of PII, PHI, or trade secrets have accrued in data stores.

Once these teams look to implement a compliance regimen (be it PCI, SOC 2, HIPAA, etc.), they often find that they have less access to data than they did before.

In these cases, a new compliance boundary drawn around these environments means:

  1. Less data for functional testing (Did we break anything?).
  2. Less data for analytics and product telemetry (How is my product being used? What is un/popular in my offerings?).
  3. Less data for load testing (Will it still perform under this real-world load?).

If teams DO retain access to datasets, those datasets quickly become unusable (undersized, dated, or incompatible because of schema drift).

These production teams (be they software or data teams) are still responsible for enhancing and maintaining a product whose data is now gated behind a compliance boundary.

Data Cleaning That Doesn’t Slow You Down

We’ve been asked to build a couple of versions of this solution. Essentially, we want a capability that:

  1. Doesn’t slow down production workloads
  2. Removes or tokenizes sensitive fields (PII, PHI, trade secrets) from data, making it safe/compliant to work with
  3. Is easy to configure/update for schema changes or tenant/customer preference
  4. Allows custom handling of sensitive fields per tenant database (e.g. should I redact, replace, or tokenize out a sensitive field?)
  5. Allows for pattern recognition and data mining (see tokenization approach)
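
To make requirements 2, 4, and 5 concrete, here is a minimal Python sketch of per-tenant field handling. It is an illustration under stated assumptions, not the implementation we built: the tenant names, field names, the TENANT_RULES layout, and the clean_record and tokenize helpers are all hypothetical. Tokenization is sketched as a keyed HMAC-SHA256 digest, so the same input always maps to the same token and joins, counts, and other pattern analysis still work on cleaned data.

```python
import hashlib
import hmac

# Hypothetical per-tenant rules: field name -> action.
# Fields not listed are passed through unchanged.
TENANT_RULES = {
    "tenant_a": {"email": "tokenize", "ssn": "redact", "name": "replace"},
    "tenant_b": {"email": "redact"},
}

PLACEHOLDER = "REDACTED"  # assumed replacement value for the "replace" action


def tokenize(value: str, key: bytes) -> str:
    """Return a deterministic token: equal inputs give equal tokens."""
    digest = hmac.new(key, value.encode("utf-8"), hashlib.sha256).hexdigest()
    return "tok_" + digest[:16]


def clean_record(record: dict, rules: dict, key: bytes) -> dict:
    """Apply redact/replace/tokenize rules to a single record."""
    cleaned = {}
    for field, value in record.items():
        action = rules.get(field, "keep")
        if action == "redact":
            continue  # drop the field entirely
        if action == "replace":
            cleaned[field] = PLACEHOLDER
        elif action == "tokenize":
            cleaned[field] = tokenize(str(value), key)
        elif action == "keep":
            cleaned[field] = value
        else:
            raise ValueError(f"unknown action {action!r} for field {field!r}")
    return cleaned


if __name__ == "__main__":
    key = b"demo-key-only"  # in practice this would come from a secrets store
    row = {"name": "Ada", "email": "ada@example.com", "ssn": "000-00-0000", "plan": "pro"}
    print(clean_record(row, TENANT_RULES["tenant_a"], key))
```

Because the rules live in plain data rather than code, a schema change or a new tenant preference becomes a configuration update. Note that truncating the digest shortens tokens at the cost of a higher collision chance, and a keyed hash is one-way; a reversible token vault would be a different design.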

A simplified diagram of this approach can be seen below.

Keep Teams Capable/Productive

We’ve seen the above adopted by several Data/BI teams with great results.

Reach out via our website about using DevOps and Cloud Automation to enable teams that are challenged with compliance constraints.