Monday, June 6, 2011

Database Development - Design Strategy

Using Business Objects correctly will make your application more robust and adaptable. If your business logic is encapsulated into Business Objects a term I use to refer to SQL Server user defined tables, functions, procedures, triggers and CLRs. Business Objects are constructed primarily of SQL statements a relatively simple language. When used correctly they simplify development and maintenance and are relatively transportable. They also keep your business logic in a central location which you can easily script. You can then perform text searches on the SQL statement based text/code to help you perform on going code review and analysis. Business Objects help separate your business logic from your reporting tool and user interface and can also reduce your QA efforts if the same Business Objects are used to retrieve data for the user interface as are used to populate your reports. Make sure to write your procedures and user defined tables so they return zero or more records so they can be used for both your reports and UI. Use Business Objects whenever possible for data updates and data retrieval operations for the user interface, reporting, SSIS packages, CLRs and database triggers enforcing your business rules.

Now a little about database development. Make sure every table has a clustered index so it can be defragmented automatically. For each table create a real data unique record constraint and a single column surrogate/identity/auto-number primary key (PK). The single column PK works great for an audit logging strategy since it provides record access through a single value. The single column PK also simplifies your schema diagrams and SQL statements. A real data unique record constraint is often confused with a primary key. While a primary key can serve as both I prefer to keep them separate unless I'm in MS Access and there isn't a choice. The main purpose of a unique constraint is to define what would constitute a duplicate record and to configure the table so that they are not allowed. If you haven't defined what is in your table then aggregating the data in it will not be of much use as the data would just represent something as of yet undefined. Counting values in a table without a real data unique record constraint is as meaningless as summing unrelated percentages or aggregating distinct values. In short put a real data unique record constraint on every table even most of your temp tables. Use a trigger to implement it if you can't do it any other way.  Although there may be times when it is necessary to include an identity/auto-number column in the the unique record constraint it should be avoided.

Try not to store the same data in more than one table unless it is audit log or temporal information. If you do store the same data in more than one location (denormalized data) define something about it to differentiate it from the other data such as the date and time when it was saved. If your data is supposed to stay synchronized with the other data use referential integrity and create database triggers to enforce the business rule(s). Use referential integrity on all  tables except possibly for some temp tables and use triggers to enforce business logic where necessary.

Lock your data down and have the ability to store audit logs for all transactions. Audit logging should be done at the lowest level possible with table triggers. Please do not use application code in yet another sad attempt at audit logging. Make sure your audit log tables are not in your application database. Audit log data needs to be stored somewhere where it won't bloat the database.

SSRS reporting is a pretty nice tool for it's price especially if you designed your reports using the same SQL objects you are populating your user interface with. Use CLR routines for report formatting features so one person can manage 100s of report formatting changes.

SSIS packages can be adjusted for band width usage so they can be immensely helpful as long as they are used to call your business objects or fill another need without taking on business logic that belongs in the business objects or CLR functions. 

In closing I would suggest you make sure your SQL developers can identify common SQL errors. I will try to review each of these at a later date but for now I will have to just mention a few such as

Row duplication caused by a join to a non-unique index.

Joins on denormalized data vs. joins on unique code lists.

Semi-random updates: This is where there is one record to be updated but the SQL statement specifying the value to be used in the update returns multiple values. Obviously only one value can be used to update one column of one row.

Omission of data due to inner joins.

Omission of data due to active inactive flags instead of origination_date/acquired date - maturity date/disposed date etc.

Omission of data by putting where clause criteria on left outer joined table elements in the where clause instead of the join clause.

Omission of data by joining outer or inner joined tables to each other instead of a root data set containing all possible keys for both tables.

Summing unrelated percentages.

Aggregating distinct values.

Excluding data in calculations due to bad join and where clause criteria.

That's enough for now! Thank you for your time. Hopefully my suggestions can help you with your development efforts.

No comments:

Post a Comment