Newsletters




Database Elaborations: Fact Tables and Their Lying Primary Keys


When working on a multidimensional design, every fact table within scope should be handled with care. In an ideal world, each low-level fact table represents the metrics related to a business event. The meaning of a fact table, ideally, should be evident based on the table name and the composition of the fact table’s primary key. Deciding on a primary key for a fact table is an important choice.

Some designers simply create a surrogate key to play the role of a primary key. The surrogate key, when used as a primary key, ensures that no semantics could ever be derived from a simple review of the table’s design. The surrogate-keyed fact table has no specific meaning at all, as even 100% duplicated data can be added with impunity to the fact as long as inserts generate a new surrogate value. The logic within the code that determines when to insert is the actual home of the table’s meaning. Hopefully, that table-insert logic is documented elsewhere as well.

When a data architect chooses not to use a specially generated surrogate key, the usual and expected fact table primary key is a
composite of foreign keys pointing to the dimensions associated with a fact. Under this composite primary key approach, the combination of dimensions describes the grain of the given fact table. A row cannot physically be inserted that is at any lower grain than that of the combined values from all those dimensions. However, it is not unusual to find dimensions associated with a fact table that are not contributing to the grain of the fact. When a designer knows that a dimension is not impacting a fact table’s grain, the designer should not include the foreign key for that dimension as part of the primary key, as doing so may mislead people about the nature of the fact.

This non-contribution-to-the-grain condition most often arises when dimensions are correlated. Data modelers should stop and give some consideration to why they are including these dimensions. A worst-case scenario might be that the designer has a hierarchy that has not been included within another dimension. It could be these hierarchy-component dimensions were added directly to the fact as a work-around to avoid snowflaking. The correction would be to expand the appropriate dimension to include the additional levels of the correlated hierarchy.

Even so, some circumstances may still cause the design to include dimensions not contributing to the fact table’s grain. If too many of these kinds of relationships exist, one may have a centipede fact with an unusually large number of dimensional relationships. Designers should seek a balance to avoid an excess.

When only a sub-set of the foreign keys comprise the full primary key of the fact table, it is not unusual for developers, or query writers, to gloss over the documentation and assume that all foreign keys are the primary key. Functionally, they have assumed a non-intended super key. This may not cause them to fall down a wrong rabbit hole all the time. After all, if [A] is a unique set of values, then [A, B] is also a unique set of values. But in some circumstances, unnecessary work might be done by these overachieving query writers.

Some database designers may feel that it’s OK to arbitrarily assign all dimensional foreign keys as part of a fact’s primary key even when one or more do not actually contribute to the fact table’s granularity. Such arbitrary labels are only exposing the designer’s sloppy approach to their models. Sloppy approaches are often found in weak designs that likely contain flaws. In such cases, these super-keyed facts are lying to users about the grain of the fact table.

 

Todd Schraml has more than 20 years of IT management, project development, business analysis, and database design experience across many industries from telecommunications to healthcare. He can be reached at TWSchraml@gmail.com.


Sponsors