JOINs in T-SQL
And Then There Were Two
- The Latex Manufacturing Company Example
- Inner Joins
- SQL-92 Join Syntax
- SQL-92 Two-Way Inner Joins
- SQL-92 Three-Way Inner Joins
- Join Query Table Order
- Cross Joins
- Outer Joins
- SQL-92 Two-Way Outer Joins
- SQL-92 Two-Way Left Outer Joins
- SQL-92 Two-Way Right Outer Joins
- SQL-92 Two-Way Full Outer Joins
- SQL-92 Three-Way Outer Joins
- The Catch
- Controlling the Order of Join Processing
- Self and Non-Equal Joins
- The Dating Service Scenario
- The Salary Levels Scenario
- Using Joins to Modify Data
- Using Joins to Delete Data
- Using Joins to Update Data
- Performance Considerations
- Completely Qualified Filter Criteria
- Join Hints
- Performance Guidelines
And Then There Were Two:
- The Latex Manufacturing Company Example
- Inner Joins
- SQL-92 Join Syntax
- SQL-92 Two-Way Inner Joins
Suppose we’re using the HumanResources database, and we wish to return employee and department information. The keyword INNER is optional because it is the default join type. Here is how the query would go:
USE HumanResources; SELECT empid, empname, salary, e.deptno, deptname FROM Employees AS e JOIN Departments AS d on e.deptno=d.deptno;
(Note: ANSI defines a different logical order for the phases involved in processing the query for SQL-89 and SQL-92 syntax, but because the output is the same in both cases, the query processor will probably come up with the same execution plan and process them in the same way internally. It’s a totally different story with Outer joins, and in certain situations each of the syntaxes produces a different output).
- SQL-92 Three-Way Inner Joins
Let’s say we wish to add job information to the previous output. This is how the query would go:
USE HumanResources; SELECT empid, empname, salary, e.deptno, deptname e.jobid FROM Employees AS e JOIN Departments AS d on e.deptno=d.deptno JOIN Jobs AS J ON e.jobid=j.jobid;
Adding a third table to an SQL-92 inner join requires adding another JOIN clause and another join condition in a separate ON clause.
- Join Query Table Order
In the case of inner joins involving more than two tables, it doesn’t matter in which order the tables appear in the query. The performance and output of the query would be the same. The query processor will decide on the internal order in which it accesses the tables based on cost estimation, and it will come up with the same execution plan regardless of the order of the tables in the query. With other types of joins, the order of the tables in the query may produce different results, and therefore might require a different execution plan. If we were testing the performance of a query and would like to see the effects of forcing a specific join order, we can add to the query an OPTION clause that contains a FORCE ORDER hint, as shown here:
USE HumanResources; SELECT empid, empname, salary, e.deptno, deptname e.jobid FROM Employees AS e JOIN Departments AS d on e.deptno=d.deptno; JOIN Jobs AS j ON e.jobid=j.jobid OPTION(FORCE ORDER);
- Cross Joins
A cross join produces a Cartesian product of the tables involved. It matches all the rows from one table to each of the rows in the other table. If no filter is used, the number of rows in the result set is the number of rows in one table multiplied by the number of rows in the other table. Let’s say we wish to return all possible combinations of jobs and departments. Here is how the query would go:
4
Advantage 1 of SQL-92 vs SQL-89: Cross Joins in the SQL-92 syntax explicitly includes the CROSS JOIN keywords, whereas in SQL-89 a cross join looks like an inner join where the join condition is simply not specified. This makes the SQL-89 syntax prone to errors because a programmer might forget to mention a join condition and instead of getting an error, gets a cross join. This can be a really sneaky mistake if we are joining two tables on two columns and specify only one column by mistake.
- Outer Joins
Outer joins enables us to define either one or both of the tables participating in the join as preserved tables. This means that besides returning matching rows from both tables, the rows from the preserved table that have no match in the other table are also returned, padded with NULLs instead of the values that were supposed to come from the other table.
- SQL-92 Two-Way Outer Joins
The SQL-92 syntax for an outer join between two tables is shown below:
SELECT FROM T1 {LEFT | RIGHT | FULL} [OUTER] JOIN T2 ON <join_condition> WHERE <filter>
- SQL-92 Two-Way Left Outer Joins
Say we wish to return employee and department information using the two-way left outer join. Here’s how the query would go:
USE HumanResources; SELECT * FROM Employees AS e LEFT OUTER JOIN Departments AS d ON e.deptno=d.deptno;
- SQL-92 Two-Way Right Outer Joins
Let’s say we wish to preserve all rows from the department table. We accomplish this with a right outer join query.
USE HumanResources; SELECT * FROM Employees AS e RIGHT OUTER JOIN Departments AS d ON e.deptno=d.deptno;
- SQL-92 Two-Way Full Outer Joins
If we wish to preserve all employees and all departments records, we can write the query as follows:
USE HumanResources; SELECT * FROM Employees AS e FULL OUTER JOIN Departments AS d ON e.deptno=d.deptno;
The above full outer query consists of 1.) All matching rows from both tables. 2.) All rows from the Employees tables with no matching department, with NULLs replacing the values that were supposed to come from the Departments table. 3.) All rows from the Departments table with no matching employee, with NULLS replacing the values that were supposed to come from the Employees table.
- SQL-92 Three-Way Outer Joins
Suppose we wish to produce employee, department, and job information, preserving all employee and all jobs. This requires joining all three tables. Here’s how the query would go:
USE HumanResources; SELECT * FROM Employees AS e LEFT OUTER JOIN Departments AS d ON e.deptno=d.deptno RIGHT OUTER JOIN Jobs AS j ON e.jobid=j.jobid
With left and right outer joins, the order in which the tables appear in the query determines the order of their processing. This, of course, makes sense because their order affects the output. In the case of full outer joins however, their order doesn’t affect the output; hence, it doesn’t determine their processing order.
If we wish to provide employee, department, and job information for matching employees and departments, preserving all jobs, we can write the query as follows:
USE HumanResources; SELECT * FROM Employees AS e RIGHT OUTER JOIN Jobs AS j ON e.jobid=j.jobid;
- Controlling the Order of Join Processing:
With the SQL-92 syntax, we can control the order of the query execution phases. Suppose we wish to produce department and employee information, preserving all departments, and filtering to only return departments with no matching employees. This is how the query would go:
USE HumanResources; SELECT * FROM Departments AS d LEFT OUTER JOIN Employees AS e ON d.deptno=e.deptno WHERE e.deptno IS NULL;
The nice thing about the SQL-92 syntax is that we can request a different order of execution by including the filter in the join condition, as shown below:
USE HumanResources; SELECT * FROM Departments AS d LEFT OUTER JOIN Employees AS e ON d.deptno=e.deptno AND e.deptno IS NULL;
Advantage 2 of SQL-92 vs SQL-89: We SQL-92 syntax, we can control the order of the query execution, whereas with the old-style syntax we can’t.
- Self and Non-Equal Joins
There are some situations where we need to join a table to itself (self-join). There are also situations where the relationship between rows in the participating tables is not based on exact matches; rather, it’s based on another logical condition (non-equal join).
- The Dating Service Scenario
Say, we have a database for a dating service, containing a table with the name of the candidates and their gender. We need to match all possible candidates, so for this example, the request is to match males with females. Let’s first write a CROSS JOIN query:
USE Candidates; SELECT t1.candname t2.candname FROM Candidates AS t1 CROSS JOIN Candidates AS t2;
Now, we don’t want anyone go on a date with themselves, so here is an modification to the query:
USE Candidates; SELECT t1.candname t2.candname FROM Candidates AS t1 CROSS JOIN Candidates AS t2 WHERE t1.candname <> t2.candname;
Since we’re trying to match males with females, we need to eliminate couples with the same gender. Here’s how the query would go:
USE Candidates; SELECT t1.candname, t2.candname FROM Candidates AS t1 CROSS JOIN candidates AS t2 WHERE t1.gender <> t2.gender;
Finally, we need to remove duplicates since we don’t wish the couple going on a date twice. We do this by requesting a specific gender for each column, as shown below:
USE Candidates; SELECT m.candname AS guy f.candname AS girl FROM Candidates AS m JOIN Candidates AS f ON m.gender>f.gender;
Here, nobody dates himself or herself, the genders are different, and since the letter M is higher than the letter F, we ensure that only males appear in the first column.
- The Salary Levels Scenario
Here, we have another scenario that requires non-equal joins. Suppose we’re using the HumanResources database and we wish to retreive employee and salary-level information. This involves joining the Employees and SalaryLevels tables, as shown below:
USE HumanResources; SELECT e.*, sallevel FROM Employees AS e JOIN SalaryLevels AS s on e.salary between s.lowbound AND s.highbound;
- Using Joins to Modify Data
Sometimes we may need to modify data but the criteria that define which rows need to be affected is based on data that doesn’t exist in the modified table. Instead, the required data is another table. Besides the use of subqueries, we can accomplish this by using joins in the DELETE and UPDATE statements. Please note that the queries in this section are not ANSI compliant.
- Using Joins to Delete Data
Suppose we’re using the Northwind database, and we wish to delete all rows from the Order Details table for orders made by a customer ‘VINET.’ The problem is that the Order Details table does not hold the customer information. This information exists in the Orders table, so this is how the query would go:
USE NorthWind; DELETE FROM [Order Details] FROM [Order Details] AS od JOIN orders As o ON od.orderid=o.orderid WHERE CustomerID = 'VINET';
The first occurrence (after the first FROM clause) specified which table is modified, and the second occurrence (after the second FROM clause) is used for the join operation. This syntax doesn’t allow you to specify more than one table after the first FROM clause. If it did, it wouldn’t be possible to determine which table is modified.
- Using Joins to Update Data
The UPDATE statement has a similar syntax to the DELETE statement. Suppose we wish to add a 5 percent discount to items in the Order Details table whose parts are supplied by Exotic Liquids, SupplierID 1. The issue is that the SupplierID column is in the Products table, and we wish to update the Order Details table. Here is how we can write the update query:
USE NorthWind; UPDATE od SET discount = discount + 0.05 FROM [Order Details] AS od JOIN products AS p ON p.productid = od.productid WHERE p.supplierid = 1;
- Performance Considerations
Here we discuss using hints to specify a certain join strategy and provide a few guidelines to help achieve better performance with our join queries.
- Completely Qualified Filter Criteria
While it may seem obvious that if A=B and B=C then A=C, this is not as obvious to SQL Server’s optimizer. Let’s say we want a list of orders and their details for all orderIDs from 11,000 upward. We could filter the Orders table or the Order Details table, but if we apply the filter to both tables, there is far less I/O. Here is a query for an incompletely qualified filter criteria:
USE NorthWind; SET STATISTICS IO ON SELECT * FROM orders AS o JOIN [Order Details] AS od ON od.orderid=o.orderid WHERE o.orderid >= 11000;
Let’s revise this query so that it uses a completely qualified filter criteria, as shown below:
SET STATISTICS IO ON; SELECT * FROM orders AS o JOIN [Order Details] AS od ON od.orderid = o.orderid WHERE o.orderid >= 11000 AND od.orderid >= 11000;
Notice the significant difference in the number of logical reads required to satisfy the first query as opposed to the second one. Also, the first query plan uses a nested-loops join algorithm, and the second one uses the more efficient merge join algorithm.
- Join Hints
As of SQL Server 7, three join strategies are available: nested loops, hash, and merge. If we are testing the performance of our query, and would like to force a certain join algorithm, we can specify it by using a join hint, as shown below:
FROM T1 {join_type}{join_hint} JOIN T2
The <join_hint> clause stands for LOOP, MERGE, or HASH. If we specify a join hint, specifying a join type becomes mandatory, and we cannot rely on defaults.
- Performance Guidelines
The following can help achieve better performance for our join queries:
- Create indexes on frequently joined columns. Use the Index Tuning Wizard for recommendations.
- Create covering (composite) indexes on combinations of frequently fetched columns.
- Create indexes with care, especially covering indexes. Indexes improve the performance of SELECT queries but degrade the performance of modifications.
- Separate tables that participate in joins onto different disks by using filegroups to exploit parallel disk I/O.