Changing Database Naming Conventions

Hungarian Notation

The idea of adding a descriptive prefix to a name is called Hungarian notation. The convention came about during the development of Excel at Microsoft. A team would use this strategy to encode what the type of object was in the name of the object to avoid the misuse of it. So, for example, you might have strName or intCount.  This is a poor example because name and count already imply their data types. Surprisingly though, they are not uncommon ones to find.

Variables aside, database developers have adopted this methodology for their use and, both, user and system code began to look like this:
·         usp_MyProc
·         sp_SystemProc
·         udf_MyFunction
·         fn_SystemFunction
·         vw_MyView
·         tbl_MyTable (well, in extreme cases)

We probably all learned from code that looked more or less like this and continued the tradition by reflex in new development.

In the app-dev world, Hungarian notation was abandoned long ago. Intellisense helped to provide metadata during coding, missing casts are highlighted, refactoring with an IDE made it easy to rename objects.  People realized that an encoded prefix only added noise. Or worse yet, slowed down on-boarding as a new person would need to be trained in the lingo. It is also rather painful to enforce and, hence, a bit of a maintenance nightmare.

So what do we get for it in SQL code?  It sure doesn't help with sorting in the object explorer. It doesn't help distinguish anything in the code, itself. Procs always have EXEC before them.  Functions are followed by parentheses. Tables and views follow SELECT.  In fact, our brains are so smart that they start to not see these prefixes after a while.  Yet, it still clutters the code with additional noise and makes reading and maintaining code more work. 

Why Tables and Views Should be Named in the Same Way

Tables and views, in the very same way, represent relational sets which users access the data by.  There is not really a need to distinguish between them from a relational theory point of view.  If you have a good IDE, like SSDT, you can easily trace back to either definition by hitting F12.  For tables and views as an API to applications that access the database, such as SSIS, it is bad form altogether. What should the application care whether it is accessing a table or a view?  What if it needs to change from one type to the other?  You should not have to change the referencing application because you changed a table to a view or a view to a table.

What’s the main reason that people insist on distinguishing between tables and views with a prefix in the name?  I would give an educated guess that it is because of the conventional layout in the object explorer in SSMS which puts these two objects in two different folders.  That makes hunting down the definition of an object a pain if you have to look in multiple places.  Imagine then, placing all tables and views in one folder in SSDT in the solution explorer?  That would make it easy to find no matter which it was.  Additionally, if you needed to switch out one for the other, you would not also need to move the object to a new folder.

Domain-Based Names

When it comes to a new project, I prefer using a domain-based naming convention. It takes a bit more thought and it's a little more verbose, but in a useful way.  The prefix becomes a domain type, rather than the literal object or data type.  So, for example, a Customer table in the data warehouse might look like:
·         Dimension.Customer
o   Customer_Name
o   Customer_Id
o   Customer_Address
o   Customer_Phone

Using Schemas Effectively

Note the use of the schema in the table name above.  If our eyes learn to ignore “usp_” in front of a proc, they certainly learn to ignore “dbo.” In front of a table. So I am a supporter of useful schema names, though these can get a little sticky as I’ll explain in a moment.  For a data warehouse, having a schema for fact and dimension (fully spelt out) makes the dimensional model more self-evident.  It also sidesteps calling your customers and employees “dim” (DimCustomer, DimEmployee), which is always awkward to explain away during a presentation.  Schemas are also helpful in separating concerns in the ETL staging objects, i.e. stage, cleanse, conform, load.

When it comes to a normalized data model, schemas can be much more prolific.  For example, Microsoft promotes a blurry department/object type of categorization in Adventure Works – HumanResources (dept), Purchasing (dept), Sales (dept), and then Person (object)?  Whatever you decide, be sure to stay consistent.  A department approach is difficult to maintain for one reason because numerous departments will share data, as in the Person object in adventure works.  Also, a “junk drawer” schema might eventually appear called “common” or “shared.”  Be wary of these because, like a junk drawer, it makes things difficult to find.  The Adventure Works model is promoting the use of schemas for security purposes, rather than organizational.  For example, someone working in HR could be granted access to the entire HumanResources schema.  I do not usually draw such a hard line on security.  The point of a database is to serve up data, not hide it from organizational groups within the same company.  It makes much more sense to protect those few sensitive things well, rather than bind people from doing their job and offering up new insights.

If we use schemas organizationally, this drives straight at the heart of the data model altogether.  We ask questions like what’s the difference between HumanResources.Employee and Person.Person?  Employee has an FK to Person, so do these things really belong together with some encrypted columns?  In the end, we wind up with a model that it far more clear on where to find things -- all employee related things are in one table, shipper related things in another.  Using the domain type modeling, we have product, vendor, shipper, currency, customer.  We avoid the sales person being relegated off to their own table for no particular reason.

Column Names

Moving on to column names, they inherit the object name, such as Customer_Name, Customer_Address. Etc. this helps distinguish between columns of the same name in the same query or report coming from different sources.  We never leave a column named, id, start_date, name or something as generic as that, which often leads to a lot of poking around to understand where something came from.

If you find yourself enhancing column names with extended properties, consider how these might be eliminated with a clearer data model and better naming convention.  Extended properties are essentially comments and, like comments, indicate an inadequacy in the code’s ability to explain itself.

Procs and Functions

All that applied to tables applies to views because these should be interchangeable from a programmatic point of view.  Also, since tables and views represent the data itself, they are, indeed, things.  Thus, tables and views should always be nouns.  A view doesn’t do anything, it is that thing, so a view should never be vw_GetHighValuedCustomers, but rather, High_Valued_Customers or Customer.High_Valued_Customers.
Procs and Functions do things and, therefore, should be verbs describing their action.  Here are some examples:
·         Cleanse.Insert_Customers
·         Conform.Insert_Product
·         Report.Convert_CSV_To_Table()
·         Test.AssertExpectedEqualsActualString

Final Thoughts

The most important rule is “be consistent.”  By that, I do not mean “do not change,” but rather, do what you do the same way everywhere.  It is possible to change names in database objects, in fact, I think it’s a really good thing to get in the habit of it.  Using refactoring in SSDT, can make light work of this and so, keeps you from getting tied to a mistake or a poor convention.

Labels: ,