Hungarian Notation
The idea of adding a descriptive prefix to a name is called
Hungarian notation.
The convention came about during the development of Excel at Microsoft. A team
would use this strategy to encode what the type of object was in the name of
the object to avoid the misuse of it. So, for example, you might have strName
or intCount. This is a poor example
because name and count already imply their data types. Surprisingly though,
they are not uncommon ones to find.
Variables aside, database developers have adopted this methodology
for their use and, both, user and system code began to look like this:
·
usp_MyProc
·
sp_SystemProc
·
udf_MyFunction
·
fn_SystemFunction
·
vw_MyView
·
tbl_MyTable (well, in
extreme cases)
We probably all learned from code that looked more or less like
this and continued the tradition by reflex in new development.
In the app-dev world, Hungarian notation was abandoned long ago.
Intellisense helped to provide metadata during coding, missing casts are highlighted,
refactoring with an IDE made it easy to rename objects. People realized that an encoded prefix only
added noise. Or worse yet, slowed down on-boarding as a new person would need
to be trained in the lingo. It is also rather painful to enforce and, hence, a
bit of a maintenance nightmare.
So what do we get for it in SQL code? It sure doesn't help with sorting in the
object explorer. It doesn't help distinguish anything in the code, itself.
Procs always have EXEC before them.
Functions are followed by parentheses. Tables and views follow
SELECT. In fact, our brains are so smart
that they start to not see these prefixes after a while. Yet, it still clutters the code with
additional noise and makes reading and maintaining code more work.
Why Tables and
Views Should be Named in the Same Way
Tables and views, in the very same way, represent relational sets
which users access the data by. There is
not really a need to distinguish between them from a relational theory point of
view. If you have a good IDE, like SSDT,
you can easily trace back to either definition by hitting F12. For tables and views as an API to
applications that access the database, such as SSIS, it is bad form altogether.
What should the application care whether it is accessing a table or a
view? What if it needs to change from
one type to the other? You should not
have to change the referencing application because you changed a table to a
view or a view to a table.
What’s the main reason that people insist on distinguishing
between tables and views with a prefix in the name? I would give an educated guess that it is
because of the conventional layout in the object explorer in SSMS which puts
these two objects in two different folders.
That makes hunting down the definition of an object a pain if you have
to look in multiple places. Imagine
then, placing all tables and views in one folder in SSDT in the solution explorer? That would make it easy to find no matter which
it was. Additionally, if you needed to
switch out one for the other, you would not also need to move the object to a
new folder.
Domain-Based Names
When it comes to a new project, I prefer using a domain-based
naming convention. It takes a bit more thought and it's a little more verbose,
but in a useful way. The prefix becomes
a domain type, rather than the literal object or data type. So, for example, a Customer table in the data
warehouse might look like:
·
Dimension.Customer
o
Customer_Name
o
Customer_Id
o
Customer_Address
o
Customer_Phone
Using Schemas
Effectively
Note the use of the schema in the table name above. If our eyes learn to ignore “usp_” in front of
a proc, they certainly learn to ignore “dbo.” In front of a table. So I am a
supporter of useful schema names, though these can get a little sticky as I’ll
explain in a moment. For a data
warehouse, having a schema for fact and dimension (fully spelt out) makes the
dimensional model more self-evident. It
also sidesteps calling your customers and employees “dim” (DimCustomer,
DimEmployee), which is always awkward to explain away during a presentation. Schemas are also helpful in separating
concerns in the ETL staging objects, i.e. stage, cleanse, conform, load.
When it comes to a normalized data model, schemas can be much more
prolific. For example, Microsoft
promotes a blurry department/object type of categorization in Adventure Works –
HumanResources (dept), Purchasing (dept), Sales (dept), and then Person (object)? Whatever you decide, be sure to stay
consistent. A department approach is
difficult to maintain for one reason because numerous departments will share
data, as in the Person object in adventure works. Also, a “junk drawer” schema might eventually
appear called “common” or “shared.” Be
wary of these because, like a junk drawer, it makes things difficult to
find. The Adventure Works model is
promoting the use of schemas for security purposes, rather than organizational. For example, someone working in HR could be
granted access to the entire HumanResources schema. I do not usually draw such a hard line on
security. The point of a database is to
serve up data, not hide it from organizational groups within the same company. It makes much more sense to protect those few
sensitive things well, rather than bind people from doing their job and
offering up new insights.
If we use schemas organizationally, this drives straight at the
heart of the data model altogether. We
ask questions like what’s the difference between HumanResources.Employee and
Person.Person? Employee has an FK to
Person, so do these things really belong together with some encrypted columns? In the end, we wind up with a model that it
far more clear on where to find things -- all employee related things are in
one table, shipper related things in another.
Using the domain type modeling, we have product, vendor, shipper,
currency, customer. We avoid the sales
person being relegated off to their own table for no particular reason.
Column Names
Moving on to column names, they inherit the object name, such as
Customer_Name, Customer_Address. Etc. this helps distinguish between columns of
the same name in the same query or report coming from different sources. We never leave a column named, id,
start_date, name or something as generic as that, which often leads to a lot of
poking around to understand where something came from.
If you find yourself enhancing column names with extended
properties, consider how these might be eliminated with a clearer data model
and better naming convention. Extended
properties are essentially comments and, like comments, indicate an inadequacy
in the code’s ability to explain itself.
Procs and Functions
All that applied to tables applies to views because these
should be interchangeable from a programmatic point of view. Also, since tables and views represent the
data itself, they are, indeed, things.
Thus, tables and views should always be nouns. A view doesn’t do anything, it is that
thing, so a view should never be vw_GetHighValuedCustomers, but rather,
High_Valued_Customers or Customer.High_Valued_Customers.
Procs and Functions do things and, therefore, should be
verbs describing their action. Here are
some examples:
·
Cleanse.Insert_Customers
·
Conform.Insert_Product
·
Report.Convert_CSV_To_Table()
·
Test.AssertExpectedEqualsActualString
Final Thoughts
The most important rule is “be consistent.” By that, I do not mean “do not change,” but
rather, do what you do the same way everywhere. It is possible to change names in database
objects, in fact, I think it’s a really good thing to get in the habit of
it. Using refactoring in SSDT, can make
light work of this and so, keeps you from getting tied to a mistake or a poor
convention.
Labels: SQL, SSDT