An outlier is a value 1.5 times greater than the interquartile range. To understand what that means, let's look at an example dataset.

The list of numbers above is broken out into quartiles where the second quartile begins after 3, the third begins after 6 and fourth begins after 9. The interquartile range is the third quartile ending point minus the first. i.e. 9 - 3 = 6. Then to calculate if there are any outliers we can use the following formulas:

For the end of the range: Third Quartile + Interquartile Range * 1.5

For the beginning of the range: First Quartile - Interquartile Range * 1.5

Or,

End of Range (anything above this number): 9 + (6 * 1.5) = 18

Beginning of Range (anything below this number): 3 - (6 * 1.5) = -6

In the case of this dataset, that would mean the 46 would be considered an outlier and sure enough it looks like one:

To get at this calculation in SQL, we can use one of the new window functions that was introduced in 2012, Percentile_Disc(). This takes a parameter between 0 and 1 which represents the point of the quartile. To get the first quartile, you would use 0.25.

Here is a sample of how to solve it in SQL:

Create table
#SetWithOutliers (Amount
int);

Insert into
#SetWithOutliers values (1),(3),(3),(5),(6),(8),(9),(15),(46);

Declare @OutlierPoint int;

SELECT

@OutlierPoint
= (max(ThirdQuartile) - max(FirstQuartile)) * 1.5 + max(ThirdQuartile)

FROM

(

SELECT

percentile_disc(0.75) within group (order by Amount) over() as ThirdQuartile,

percentile_disc(0.25) within group (order by Amount) over() as FirstQuartile

FROM #SetWithOutliers

) quartiles

SELECT * FROM #SetWithOutliers WHERE
amount > @OutlierPoint;

DROP TABLE #SetWithOutliers;

GO

Mark