Using JOIN for Suppression

We usually use JOIN to find the intersection between two different tables.  For example, if you want all the orders from the order table who belong to a certain set of customers from the customer table.

This is the most common usage of a JOIN, specifically an INNER JOIN.  The results only include the entries that matchup from each table.  There are many other situations that you can incorporate a JOIN to solve.  Particularly here we are going to talk about using a JOIN to suppress information from one table by using another.

For this example I will be using two simple tables.  The first KeyTable containing integer keys.  The second is a SuppressionTable containing integer keys as well.  Here are the sample tables with some sample data.

KeyTable

SuppressionTable

The goal that we have is to generate a list of Keys from the KeyTable that are NOT in the SuppressionTable.  A simple way to accomplish this is using the following SQL which utilizes an sub query to accomplish that.

SELECT KeyTable.Key
FROM KeyTable
WHERE Key NOT IN (
         SELECT SuppressionTable.Key 
         FROM SuppressionTable
         )

Here we incorporate a sub query to suppress the Keys.  This method is perfectly acceptable for small amounts of data, also it is simple to understand.  However, it is not an efficient method when you have millions of records in both tables.  In most database engines the code around the JOIN operation is very well optimized.  If the tables are constructed properly a JOIN result can be generated very quickly.

OUTER JOINs return all the records from one or both tables and match up the tables where the JOIN ON clause condition is meet.  We will start with the FULL OUTER JOIN which will return all records for each table and line up the results when the ON clause condition is meet between the two tables.

SELECT KeyTable.Key, SuppressionTable.Key
FROM KeyTable FULL OUTER JOIN SuppressionTable
ON KeyTable.Key = SuppressionTable.Key

The table below is the result of the FULL OUTER JOIN operation.  You can see from the result set that there are NULL values filled in when there is a entry from on of the JOIN tables that does not have a matching entry for the ON clause in the other table.

FullOuterJoin

We can use this side effect as a filter condition to remove the instances where there is no JOIN value.  Adding a condition in the WHERE clause to suppress anything from the SuppressionTable that has a NULL value.

SELECT KeyTable.Key
FROM KeyTable FULL OUTER JOIN SuppressionTable
ON KeyTable.Key = SuppressionTable.Key
WHERE SuppressionTable.Key IS NULL

This query will generate the results that we are looking for as it will display only the unmatched entries from the KeyTable that are not in the SuppressionTable.  While this is a much more efficient method for suppression data from large sets there is still a step further than we can go.

SuppressedFullOuterJoin

An alternative method would be to incorporate a LEFT or RIGHT OUTER JOIN.   Since you only need all the values from one of the tables involved this JOIN would be more efficient.  Depending on your database server you will need to look into the syntax to use for LEFT or RIGHT OUTER JOINs.  The following code will produce the same results as the FULL OUTER JOIN.

SELECT KeyTable.Key
FROM KeyTable LEFT OUTER JOIN SuppressionTable
ON KeyTable.Key = SuppressionTable.Key
WHERE SuppressionTable.Key IS NULL

This query will only pull back the records from the SuppressionTable that we find a match to for the KeyTable.  This eliminates all the unmatched entries from the SuppressionTable that have no match to the KeyTable.

Another point in using JOINs for suppression is that you can have multiple conditions in the ON clause allowing for a more flexible query than a suppression query will allow.  Though you must take care in the ON clause if you are trying to use it for any table filtering as the conditions on an ON clause do not always act as expected in an OUTER JOIN.

There we have it, an efficient and quick method for suppressing a large amount of records are a large amount of records.

* The Next page contains all the SQL used in this example *

Author: Axe
I am a senior developer. Ride my motorcycle whenever I can. Love photography, hiking, camping, cooking, coding, and cheese cake!

Leave a Reply