Mark Wojciechowicz
2014-08-13
I see this mistake come up a lot in which someone has constructed a query with OUTER joins that includes a WHERE clause. The difference is probably best explained by example, so have a look at the results of these queries:
Create table #vehicles
(id int
identity(1,1) primary key clustered not null
, Description varchar(100) null
, Type varchar(100) null);
Create table #traffic
(id int
not null
, Location varchar(100) null
, Volume varchar(100) null
, traffic_date date null);
Insert into #vehicles (Description, Type)
Values ('Planes', 'Air'),('Trains', 'Ground'),('Automobiles', 'Ground');
Insert into #traffic values
(1,'New York','Heavy','20140813'),
(1,'Philly','Medium','20140813'),
(1,'NJ','Light','20140813'),
(2,'New York','Heavy','20140813'),
(2,'Philly','Medium','20140813'),
(2,'NJ','Light','20140813'),
(3,'New York','Heavy','20140813'),
(3,'Philly','Medium','20140813'),
(3,'NJ','Light','20140813');
SELECT
*
FROM #vehicles v
RIGHT OUTER JOIN #traffic t on v.id = t.id
WHERE v.Type =
'Air'
SELECT
*
FROM #vehicles v
RIGHT OUTER JOIN #traffic t
on v.id = t.id
AND v.Type = 'Air'
So we have two tables we want to join, but from the vehicles table, we only want "Air" type vehicles, while from our traffic table, we still want to see all of the data. So we create a RIGHT OUTER JOIN and then throw the Type criteria in the WHERE clause. But when we look at the results, we only return 3 rows - the ones with Air type vehicles.
We take another attempt by putting the criteria in the join itself and this returns all 9 rows from the traffic table and just the 3 "Air" rows from the vehicles table. So why does that happen? In the simplest terms, when this query is evaluated by SQL, it works in the following order:
- Joins
- WHERE
- GROUP BY
- HAVING
- SELECT
- ORDER BY
The tables are first joined and then the WHERE condition is applied. So in our first case, when we filter out the rows where v.Type = 'Air', 6 of those rows are NULL when we finally apply the WHERE condition. This results in no rows being returned for those 6 rows. Ultimately, this makes the behavior of the query more like an INNER JOIN, completely defeating the purpose.
Labels: TSQL