The Wrong Way to Write a Table-Valued Function

Mark Wojciechowicz
2014-05-22

SQL offers two different structures for a table-valued function (taken directly from SSMS):
Inline Table-Valued Function which looks like this:
CREATE FUNCTION <Inline_Function_Name, sysname, FunctionName>
(     
       -- Add the parameters for the function here
       <@param1, sysname, @p1> <Data_Type_For_Param1, , int>,
       <@param2, sysname, @p2> <Data_Type_For_Param2, , char>
)
RETURNS TABLE
AS
RETURN
(
       -- Add the SELECT statement with parameter references here
       SELECT 0
)
GO

Or, Multi-Statement table-Valued Function which looks like this:
CREATE FUNCTION <Table_Function_Name, sysname, FunctionName>
(
       -- Add the parameters for the function here
       <@param1, sysname, @p1> <data_type_for_param1, , int>,
       <@param2, sysname, @p2> <data_type_for_param2, , char>
)
RETURNS
<@Table_Variable_Name, sysname, @Table_Var> TABLE
(
       -- Add the column definitions for the TABLE variable here
       <Column_1, sysname, c1> <Data_Type_For_Column1, , int>,
       <Column_2, sysname, c2> <Data_Type_For_Column2, , int>
)
AS
BEGIN
       -- Fill the table variable with the rows for your result set
      
       RETURN
END
GO

If you go to the definition on BOL, http://technet.microsoft.com/en-us/library/ms191165(v=sql.105).aspx, it gives the latter as an example, leading us to start with the more complicated and worse performing structure.  And, unless you go looking, you’d never be the wiser that there is a better performing version.  So let’s look at an example of each to understand the difference in query plans.  I’m going to use the same exact query for both:
First the inline:
CREATE FUNCTION Get_Reseller_Product_Sales ()
RETURNS TABLE
AS
RETURN
(
       SELECT
              p.EnglishProductName
              , r.ResellerName
              , sum(s.SalesAmount) AS SalesAmount
       FROM [dbo].[FactResellerSales] s
              JOIN [dbo].[DimReseller] r ON r.ResellerKey = s.ResellerKey
              JOIN [dbo].[DimProduct] p on s.ProductKey = p.ProductKey
       GROUP BY p.EnglishProductName, r.ResellerName
)
GO

Then the Multi-Statement:
CREATE FUNCTION Get_Reseller_Product_Sales_MultiStatement ()
RETURNS @Return TABLE
(
        [EnglishProductName] nvarchar(50)
       , ResellerName nvarchar(50)
       , SalesAmount money
)
AS
BEGIN
       Insert INTO @Return
       SELECT
              p.EnglishProductName
              , r.ResellerName
              , sum(s.SalesAmount) AS SalesAmount
       FROM [dbo].[FactResellerSales] s
              JOIN [dbo].[DimReseller] r ON r.ResellerKey = s.ResellerKey
              JOIN [dbo].[DimProduct] p on s.ProductKey = p.ProductKey
       GROUP BY p.EnglishProductName, r.ResellerName
       RETURN
END
GO

Now compare the Query plans:


Hmmm…  That’s surprising.  The same query resulted in two terribly different plans and the inline is assigned 100% of the cost!?  But actually, the inline finished in 440 ms., while the multi-statement finished in 868 ms.  The inline was twice as fast!  This should not be surprising because it has to both Select and Insert the dataset, while the inline is only doing the Select.

Also, notable is that the multi-statement only returns IO statistics for what happens to the table variable.  It seems the complexity and performance are both hidden from the optimizer.  Compare the inline:

Table 'DimReseller'. Scan count 5, logical reads 73, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.
Table 'DimProduct'. Scan count 5, logical reads 810, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.
Table 'FactResellerSales'. Scan count 5, logical reads 1644, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.
Table 'Worktable'. Scan count 0, logical reads 0, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.

to the multi-statement: 
Table '#0F975522'. Scan count 1, logical reads 270, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.

The multi seems deceptively nicer.  If you look at the estimated execution plan, you will actually see two queries for the multi-statement – one is like the plan above, but the second looks exactly the same as the inline with an additional insert. 

Because io stats and execution plans are obscured, it is difficult to see that this is even a problem.  It is far worse if the TVF is buried in a query that is joining lots of tables.  I have been able to chop query times in half by finessing a multi-statement into an inline.  So my advice is simply, do whatever you can to avoid multi-statement TVFs.


Labels: ,