Mark Wojciechowicz
2014-05-22
SQL offers two different structures for a table-valued
function (taken directly from SSMS):
Inline Table-Valued Function which looks like this:
CREATE FUNCTION <Inline_Function_Name,
sysname,
FunctionName>
(
-- Add the parameters for the function here
<@param1, sysname, @p1> <Data_Type_For_Param1, , int>,
<@param2, sysname, @p2> <Data_Type_For_Param2, , char>
)
RETURNS TABLE
AS
RETURN
(
-- Add the SELECT statement with parameter references here
SELECT 0
)
GO
Or, Multi-Statement table-Valued Function which looks like
this:
CREATE FUNCTION <Table_Function_Name,
sysname,
FunctionName>
(
-- Add the parameters for the function here
<@param1, sysname, @p1> <data_type_for_param1, , int>,
<@param2, sysname, @p2> <data_type_for_param2, , char>
)
RETURNS
<@Table_Variable_Name, sysname, @Table_Var> TABLE
(
-- Add the column definitions for the TABLE variable here
<Column_1, sysname, c1> <Data_Type_For_Column1, , int>,
<Column_2, sysname, c2> <Data_Type_For_Column2, , int>
)
AS
BEGIN
-- Fill the table variable with the rows for your result
set
RETURN
END
GO
If you go to the definition on BOL,
http://technet.microsoft.com/en-us/library/ms191165(v=sql.105).aspx,
it gives the latter as an example, leading us to start with the more
complicated and worse performing structure.
And, unless you go looking, you’d never be the wiser that there is a
better performing version. So let’s look
at an example of each to understand the difference in query plans. I’m going to use the same exact
query for both:
First the inline:
CREATE FUNCTION
Get_Reseller_Product_Sales ()
RETURNS TABLE
AS
RETURN
(
SELECT
p.EnglishProductName
, r.ResellerName
, sum(s.SalesAmount) AS SalesAmount
FROM [dbo].[FactResellerSales]
s
JOIN [dbo].[DimReseller]
r ON r.ResellerKey
= s.ResellerKey
JOIN [dbo].[DimProduct]
p on s.ProductKey
= p.ProductKey
GROUP BY p.EnglishProductName,
r.ResellerName
)
GO
Then the Multi-Statement:
CREATE FUNCTION
Get_Reseller_Product_Sales_MultiStatement ()
RETURNS @Return TABLE
(
[EnglishProductName] nvarchar(50)
, ResellerName nvarchar(50)
, SalesAmount money
)
AS
BEGIN
Insert INTO @Return
SELECT
p.EnglishProductName
, r.ResellerName
, sum(s.SalesAmount) AS SalesAmount
FROM [dbo].[FactResellerSales]
s
JOIN [dbo].[DimReseller]
r ON r.ResellerKey
= s.ResellerKey
JOIN [dbo].[DimProduct]
p on s.ProductKey
= p.ProductKey
GROUP BY p.EnglishProductName,
r.ResellerName
RETURN
END
GO
Now compare the Query plans:
Hmmm… That’s
surprising. The same query resulted in two
terribly different plans and the inline is assigned
100% of the cost!? But actually, the inline finished in 440 ms., while the multi-statement
finished in 868 ms. The
inline was twice as fast! This should not be surprising because it has to both Select and Insert the dataset, while the inline is only doing the Select.
Also, notable is that the multi-statement only returns IO
statistics for what happens to the table variable. It seems the complexity and performance are
both hidden from the optimizer. Compare the inline:
Table 'DimReseller'. Scan count 5, logical reads 73, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.
Table 'DimProduct'. Scan count 5, logical reads 810, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.
Table 'FactResellerSales'. Scan count 5, logical reads 1644, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.
Table 'Worktable'. Scan count 0, logical reads 0, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.
to the multi-statement:
Table '#0F975522'. Scan count 1, logical reads 270, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.
The multi seems deceptively nicer. If you
look at the estimated execution plan, you will actually see two queries for the
multi-statement – one is like the plan above, but the second looks exactly the
same as the inline with an additional insert.
Because io stats and execution plans are obscured,
it is difficult to see that this is even a problem. It is far worse if the TVF is buried in a
query that is joining lots of tables. I have been able to chop query times in half by finessing a multi-statement into an inline. So
my advice is simply, do whatever you can to avoid multi-statement TVFs.
Labels: T-SQL, Table-Valued Functions