Joins
Version 2.57
Copyright 1997 by James Hoffman
Copyright 1996-1997, James Hoffman. This document can
be used for free by any Internet user, but cannot be included in another
document, published in any other form, or mass produced in any way.
In this section, we will only discuss inner joins, and equijoins,
as in general, they are the most useful. For more information, try the
SQL links at the bottom of the page.
Good database design suggests that each table lists data only about
a single entity, and detailed information can be obtained in a relational
database, by using additional tables, and by using a join.
First, take a look at these example tables:
AntiqueOwners
| OwnerID |
OwnerLastName |
OwnerFirstName |
| 01 |
Jones |
Bill |
| 02 |
Smith |
Bob |
| 15 |
Lawson |
Patricia |
| 21 |
Akins |
Jane |
| 50 |
Fowler |
Sam |
Orders
| OwnerID |
ItemDesired |
| 02 |
Table |
| 02 |
Desk |
| 21 |
Chair |
| 15 |
Mirror |
Antiques
| SellerID |
BuyerID |
Item |
| 01 |
50 |
Bed |
| 02 |
15 |
Table |
| 15 |
02 |
Chair |
| 21 |
50 |
Mirror |
| 50 |
01 |
Desk |
| 01 |
21 |
Cabinet |
| 02 |
21 |
Coffee Table |
| 15 |
50 |
Chair |
| 01 |
15 |
Jewelry Box |
| 02 |
21 |
Pottery |
| 21 |
02 |
Bookcase |
| 50 |
01 |
Plant Stand |
Keys
First, let's discuss the concept of keys. A primary key
is a column or set of columns that uniquely idenifies the rest of the data
in any given row. For example, in the AntiqueOwners table, the OwnerID
column uniquely identifies that row. This means two things: no two rows
can have the same OwnerID, and, even if two owners have the same first
and last names, the OwnerID column ensures that the two owners will
not be confused with each other, because the unique OwnerID column will
be used throughout the database to track the owners, rather than the names.
A foreign key is a column in a table where that column is a primary
key of another table, which means that any data in a foreign key column
must have corresponding data in the other table where that column is the
primary key. In DBMS-speak, this correspondence is known as referential
integrity. For example, in the Antiques table, both the BuyerID and
SellerID are foreign keys to the primary key of the AntiqueOwners table
(OwnerID; for purposes of argument, one has to be an Antique Owner before
one can buy or sell any items), as, in both tables, the ID rows are used
to identify the owners or buyers and sellers, and that the OwnerID is the
primary key of the AntiqueOwners table. In other words, all of this "ID"
data is used to refer to the owners, buyers, or sellers of antiques, themselves,
without having to use the actual names.
Performing a Join
The purpose of these keys is so that data can be related across
tables, without having to repeat data in every table--this is the power
of relational databases. For example, you can find the names of those who
bought a chair without having to list the full name of the buyer in the
Antiques table...you can get the name by relating those who bought a chair
with the names in the AntiqueOwners table through the use of the OwnerID,
which relates the data in the two tables. To find the names of those
who bought a chair, use the following query:
SELECT OWNERLASTNAME, OWNERFIRSTNAME
FROM ANTIQUEOWNERS, ANTIQUES
WHERE BUYERID = OWNERID AND ITEM = 'Chair';
Note the following about this query...notice that both tables involved
in the relation are listed in the FROM clause of the statement. In the
WHERE clause, first notice that the ITEM = 'Chair' part restricts the listing
to those who have bought (and in this example, thereby owns) a chair. Secondly,
notice how the ID columns are related from one table to the next by use
of the BUYERID = OWNERID clause. Only where ID's match across tables and
the item purchased is a chair (because of the AND), will the names from
the AntiqueOwners table be listed. Because the joining condition used an
equal sign, this join is called an equijoin. The result of this
query is two names: Smith, Bob & Fowler, Sam.
Dot notation refers to prefixing the table names to column names,
to avoid ambiguity, as such:
SELECT ANTIQUEOWNERS.OWNERLASTNAME, ANTIQUEOWNERS.OWNERFIRSTNAME
FROM ANTIQUEOWNERS, ANTIQUES
WHERE ANTIQUES.BUYERID = ANTIQUEOWNERS.OWNERID AND ANTIQUES.ITEM = 'Chair';
As the column names are different in each table, however, this wasn't
necessary.
DISTINCT
and Eliminating Duplicates
Let's say that you want to list the ID and names of only those
people who have sold an antique. Obviously, you want a list where each
seller is only listed once--you don't want to know how many antiques a
person sold, just the fact that this person sold one (for counts, see the
Aggregate Function section below). This means that you will need to tell
SQL to eliminate duplicate sales rows, and just list each person only once.
To do this, use the DISTINCT keyword.
First, we will need an equijoin to the AntiqueOwners table to get the
detail data of the person's LastName and FirstName. However, keep in mind
that since the SellerID column in the Antiques table is a foreign key to
the AntiqueOwners table, a seller will only be listed if there is a row
in the AntiqueOwners table listing the ID and names. We also want to eliminate
multiple occurences of the SellerID in our listing, so we use DISTINCT
on the column where the repeats may occur.
To throw in one more twist, we will also want the list alphabetized
by LastName, then by FirstName (on a LastName tie), then by OwnerID (on
a LastName and FirstName tie). Thus, we will use the ORDER BY clause:
SELECT DISTINCT SELLERID, OWNERLASTNAME, OWNERFIRSTNAME
FROM ANTIQUES, ANTIQUEOWNERS
WHERE SELLERID = OWNERID
ORDER BY OWNERLASTNAME, OWNERFIRSTNAME, OWNERID
In this example, since everyone has sold an item, we will get a listing
of all of the owners, in alphabetical order by last name. For future reference
(and in case anyone asks), this type of join is considered to be in the
category of inner joins.
Aliases &
In/Subqueries
In this section, we will talk about Aliases, In and the
use of subqueries, and how these can be used in a 3-table example. First,
look at this query which prints the last name of those owners who have
placed an order and what the order is, only listing those orders which
can be filled (that is, there is a buyer who owns that ordered item):
SELECT OWN.OWNERLASTNAME Last Name, ORD.ITEMDESIRED Item Ordered
FROM ORDERS ORD, ANTIQUEOWNERS OWN
WHERE ORD.OWNERID = OWN.OWNERID
AND ORD.ITEMDESIRED IN
This gives:
Last Name Item Ordered
--------- ------------
Smith Table
Smith Desk
Akins Chair
Lawson Mirror
There are several things to note about this query:
- First, the "Last Name" and "Item Ordered" in the
Select lines gives the headers on the report.
- The OWN & ORD are aliases; these are new names for the two tables
listed in the FROM clause that are used as prefixes for all dot notations
of column names in the query (see above). This eliminates ambiguity, especially
in the equijoin WHERE clause where both tables have the column named OwnerID,
and the dot notation tells SQL that we are talking about two different
OwnerID's from the two different tables.
- Note that the Orders table is listed first in the FROM clause; this
makes sure listing is done off of that table, and the AntiqueOwners table
is only used for the detail information (Last Name).
- Most importantly, the AND in the WHERE clause forces the In Subquery
to be invoked ("= ANY" or "= SOME" are two equivalent
uses of IN). What this does is, the subquery is performed, returning all
of the Items owned from the Antiques table, as there is no WHERE clause.
Then, for a row from the Orders table to be listed, the ItemDesired must
be in that returned list of Items owned from the Antiques table, thus listing
an item only if the order can be filled from another owner. You can think
of it this way: the subquery returns a set of Items from which each
ItemDesired in the Orders table is compared; the In condition is true only
if the ItemDesired is in that returned set from the Antiques table.
Whew! That's enough on the topic of complex SELECT queries for now.
Now on to other SQL statements.
|