Friday, March 23, 2012

Configuring Referential Integrity

Find a table 'tblTableName' in the database that doesn't have any referential integrity configured. This will appear as tables with no links in the diagram window and with no right menu button
Option to add related tables.

Create a group by query on the table to analyze the data. Look for anything appearing to be a unique record constraint based on real data (No auto numbers). Include all columns to start and cut away small pieces or start with a single column and add more until you find the unique record constraint based on real data. These records will appear as HAVING COUNT(*) = 1.

SELECT [ColumnName] , Recs = COUNT(*)
FROM [tblTableName]
GROUP BY [ColumnName]
--HAVING COUNT(*) = 1
ORDER BY COUNT(*) DESC

A 1
B 1
C 1

We only have one of each value so this is probably a primary table that will be linked to through foreign keys of other tables.

I use my code to search for tables containing a column named 'ColumnName'.
I'm using wild cards to look for any column somewhat matching the text we are searching for.

-- Find it somewhere!
EXEC dbatools.dbo.usp_srvr_object_find
@CSVServerNameListInclude = @@SERVERNAME
, @CSVDatabaseNameListInclude = 'DatabaseName'
, @CSVObjectTypeList ='TABLE_COLUMN'
, @CSVWildCardObjectNameList = '%ColumnName%'

Open a diagram and try linking the tables and saving. The save will only work if the data is related.
Even one missing value will keep the save from completing. Records/Values will either need to be added to the primary table or removed from the foreign table before RI can be configured.
After the data is reconciled (deleted or added) try linking the tables again.

Another method is to search the database for data matching our primary tables Primary Key.
I use my DBATools.usp_srvr_db_table_data_exists procedure to search the database for
references to a few code values. I don't use the @UseLike feature as I am looking for
specific code values.

-- Find data in any table.
EXEC DBATools.dbo.usp_srvr_db_table_data_exists
@ServerName = @@SERVERNAME
, @DatabaseName ='DatabaseName'
, @EscapeCharacter = '~'
, @SearchText = 'A'
, @UseLike = 0

I would then look at the number of occurrences of more than one value from the primary table.
Any table with one or more matches could potentially be related. Any table with just one match of each value could be the primary table.

SELECT [ColumnName] , Recs = COUNT(*)
FROM [tblTableName]                                                      
GROUP BY [ColumnName]
--HAVING COUNT(*) = 1
--HAVING COUNT(*) > 1
ORDER BY COUNT(*) DESC

Next I would make educated guesses to whittle down the tables I think may contain foreign key references to this data. Look at both the table names and column names. Some will obviously not be related. You can also look through the code for procedures using the tables in question as you may see how it is being used through existing code.

-- Find it somewhere!
EXEC dbatools.dbo.usp_srvr_object_find
@CSVServerNameListInclude = @@SERVERNAME
, @CSVDatabaseNameListInclude = 'DatabaseName'
, @CSVObjectTypeList ='TABLES,TABLE_COLUMN,PROCEDURES,PROCEDURE_CODE,SYNONYMS,PROCEDURE_CODE2'
, @CSVWildCardObjectNameList = '%tblTableName%'


As we find data we want to configure RI by linking 'like' columns in the SSMS diagram designer.
Denormalized data should never be linked to anything but a code table so we may end up having to manually create and populate some code tables. You can always try linking the columns to confirm or deny a data relationship although this may not be practical on very large tables. If your data is from a legacy (and old) system you will may have a lot of multiple column RI relationships.

Please note that configuring referential integrity is also covered in my post on legacy database conversions  as the two processes are similar in some cases. 

No comments:

Post a Comment