Cleaning Up Strings with Regular Expressions in SSMS


Mark Wojciechowicz
2013-10-11

On occasion, I find it very handy to use regular expressions to clean up scripts in SSMS.  Regular Expressions (RegEx) are special strings which describe a search pattern.  They can be very powerful for cleaning up data in SSIS, but they are also very useful in modifying code.

As an example, I had a CREATE TABLE definition that I needed to convert into a SELECT statement.  I did not have access to the table in SSMS to just right click the table and Script Table As > SELECT To. 
By way of example, I will use AdventureWorks:

CREATE TABLE [dbo].[DimCustomer](
    [CustomerKey] [int] IDENTITY(1,1) NOT NULL,
    [GeographyKey] [int] NULL,
    [CustomerAlternateKey] [nvarchar](15) NOT NULL,
    [Title] [nvarchar](8) NULL,
    [FirstName] [nvarchar](50) NULL,
    [MiddleName] [nvarchar](50) NULL,
    [LastName] [nvarchar](50) NULL,
    [NameStyle] [bit] NULL,
    [BirthDate] [datetime] NULL,
    [MaritalStatus] [nchar](1) NULL,
    [Suffix] [nvarchar](10) NULL,
    [Gender] [nvarchar](1) NULL,
    [EmailAddress] [nvarchar](50) NULL,
    [YearlyIncome] [money] NULL,
    [TotalChildren] [tinyint] NULL,
    [NumberChildrenAtHome] [tinyint] NULL,
    [EnglishEducation] [nvarchar](40) NULL,
    [SpanishEducation] [nvarchar](40) NULL,
    [FrenchEducation] [nvarchar](40) NULL,
    [EnglishOccupation] [nvarchar](100) NULL,
    [SpanishOccupation] [nvarchar](100) NULL,
    [FrenchOccupation] [nvarchar](100) NULL,
    [HouseOwnerFlag] [nchar](1) NULL,
    [NumberCarsOwned] [tinyint] NULL,
    [AddressLine1] [nvarchar](120) NULL,
    [AddressLine2] [nvarchar](120) NULL,
    [Phone] [nvarchar](20) NULL,
    [DateFirstPurchase] [datetime] NULL,
    [CommuteDistance] [nvarchar](15) NULL
    )

I needed to remove everything from the end of the column name to the comma.  Additionally, I wanted to remove the brackets because they do not really add anything and I thought it would improve readability.
To do this in SSMS:

 1 - Hit CTRL + H for find and replace
 2 - Expand the Find Options and check Use Regular Expressions

 3 - In the find box enter: \].+\n
 4 - In the Replace box enter: ,\n
 5 - Hit Find next and see that the first instance of ] to the end of the line is selected

 6 - Replace all will remove all of the column definitions

Here is what the expression is doing:
To find and replace all brackets, we can use the following code:  [\[\]]  
The brackets [] mean find any of the characters in this set.  We then escape each bracket, so the search will actually find both characters.  We blank out the replace box and this will replace all brackets with an empty string.

One more useful one for tidying up is removing multiple spaces.  This can be done with [ ]+ or just a space and the plus sign.  The replacement value can be a single space so all multiple spaces will be replaced with a single space.

There are a ton of other things to do with regular expressions, so if you are interested in reading up on more, see http://www.regular-expressions.info/ for an exhaustive resource.

Labels: ,