Newsletters




DATABASE ELABORATIONS - Should PII and GDPR Drive Database Design?


When designing normalized databases, the data modeler has an option in choosing how objects are interpreted. Objects may be very specific or be abstracted to a greater or lesser extent. Entities may be created that are very abstract, such as Vehicle. Or be very specific, such as Car, Truck, Airplane; or even Red Car, Blue Car, White Car, etc. The same “abstraction versus concrete” approach flows down to the individual data items. Choosing between these perspectives is where much of the art in data modeling arises. A data modeler should abstract where necessary; and for the database model, “necessary” means when certain organizational objects are fluid or will change, or perhaps when business folks are still changing what a business concept means. Sometimes, a data modeler may abstract to assist the physical implementation of a solution. Very abstract database models may save on some storage space and may allow for writing application code that contains more reusable components.

There has always been a need to tightly control some data items, such as passwords and Social Security numbers. Today, with the rise of concerns over personally identifiable information (PII), the General Data Protection Regulation (GDPR), and other legal mandates, a much larger group of data elements must be controlled. These legal data governance issues may need to guide our hands as we establish database designs. For example, within an abstract address design, there might be a single address table. Each unique address exists as a row within the address table once. The various uses of an address by a customer, vendor, or employee is exposed only via links to an associative entity. When circumstances exist wherein only some roles and uses of a given address are restricted, then each query becomes responsible for ensuring proper governance control enforcement.

In the example of the abstracted address, if a unique address exists only once, then discerning whether it can be accessed or not can be known only in performing the proper joins for its use and relationships. Some proper access may be controlled via views, but views may not ensure absolute control. Users are very creative, and they certainly will be able to exploit any possible holes. In other scenarios, such as when all information about a specific user must be removed, the associative links may be easily identified and dropped. However, the removal of an actual address can only be done once it has been verified that the address is unused by everyone for any purpose. Another variant of abstraction is user-defined fields (UDFs). UDFs are great for adding flexibility to a solution. Should any of these restricted/controlled data content end up inside a UDF, managing access and usage can become quite difficult.

The opposite of abstraction is specific/concrete. In the context of the address example above, one would have explicit columns for customer mailing address, customer business address, vendor mailing address, employee home address, and so forth. Such explicitly named data elements make their usage more direct and simple and easier to govern and administer. On occasion, the phrase “Space is cheap” is heard. If that is true, then splitting out unique buckets covering every variant of data element user/role combination for protected information should not be a database space problem. Such an explicitly named column approach would potentially increase initial development efforts for developers of the solution, but a one-time development cost would be cheaper than any legal fees, should potential abstraction holes be accidentally exposed later. In a world coping with the legal landmines that surround the ever-tightening restrictions of PII, GDPR, and whatever may come next, it may be a better practice to bias database design choices to help manage and govern. Keep controlled elements in explicitly named columns; avoid using UDFs for anything that may be controlled; and consider further isolating restricted data items into separate tables or schemas. Remember, safety first.


Sponsors