A common data warehouse design will incorporate foreign keys
on the fact table related to its dimension tables. These keys improve the performance of inner
joins and make the joins reliable by insuring rows will not be dropped in a
query because we know the primary key table has the referenced values. However, they do come with a price during the loading
process -- the database needs to validate that each and every value exists in
the primary key table before they are inserted.
Using Adventure works, I’ll mock up a scenario to demonstrate the
point. First, let’s create a new table:
Create table dbo.FactInternetProductSales
(productkey int
not null,
TotalProduct money not null,
constraint [fk_FactInternetProductSales_Product]
Foreign key (productkey) references [dbo].[DimProduct] (productkey)
);
GO
Let’s suppose that we need to insert lots of data into this
table each night. Now, factInternetSales
has only a whopping 60k rows. But let’s
suppose that we have to insert that into our new table each night (with a few
additional rows from a cross join to exaggerate the point):
Insert into dbo.FactInternetProductSales
(productkey, TotalProduct)
SELECT
p.ProductKey
, d.LineTotal
FROM [AdventureWorks].[Sales].[SalesOrderDetail] d
INNER JOIN AdventureWorks.Production.Product pr
ON d.ProductID = pr.ProductID
INNER JOIN DimProduct p on pr.ProductNumber = p.ProductAlternateKey
cross join (SELECT 1 as col UNION ALL SELECT 1 UNION ALL SELECT 1 UNION ALL SELECT 1) as t;
This produces a query plan like this:
Note the sort which prepares the dataset for a
merge join with the product dimension. This
lookup validates the foreign key, even though we just did an inner to get those very values. This whole operation is
about half the cost of the query plan.
On my system, this query
executes in about 5s. However, if I
first disable the Foreign key and then re-enable them, doing all the checking
at once I get a total duration of 2.6 seconds.
The disabling and re-enabling looks like this:
Alter table dbo.FactInternetProductSales with
check nocheck constraint all;
Alter table dbo.FactInternetProductSales with
check check constraint all;
In this case, it was about a 50% cost savings, but it may be
a bit more moderate than that depending on the volume of data. A couple of seconds do not matter on a small
fact table, but if you expand this same scenario out to wider tables with more foreign keys and larger
row counts, the cost savings may be much more dramatic. In the case of very large tables, the task of checking foreign key constraints can be done on a staging table and then switched into the main table.
Considering that this operation needs to happen for every foreign key, the incoming dataset will be sorted for each foreign key. It’s worth comparing for your largest tables and longest running
jobs.