SQL Adventures - Flagging Records as Not Found
AKA: Choosing where you filter (WHERE vs JOIN)
I came across an interesting scenario for one of my clients. They have a system which scans database hardware, and then merges the metadata into another database for reporting.
When metadata is pulled, it goes into a harvesting database along with metadata such as which job it was part of, when it was ran, which servers were harvested, etc. After the data is harvested, another process is responsible for updating a target database, which is used by alternate applications for reporting “current status”.
For simplicity, we’ll keep the operations limited to two tables in a single database: a Scan
table, and a Target
table.
ObjectId
is shared identifier across both tables.
Some Starting Historial Data
Assume that a previous harvest->merge cycle has already put this data into our tables.
Scan Data
ScanId | ObjectId | Name | Type |
---|---|---|---|
1 | 1 | Persons |
Table |
1 | 2 | Name |
Column |
1 | 3 | Zip |
Column |
Target Data
ObjectId | Name | Type | NotFound |
---|---|---|---|
1 | Persons |
Table | NULL |
2 | Name |
Column | NULL |
3 | Zip |
Column | NULL |
Results from a new Scan after the column Zip
was removed
Now assume that between the last run, and this run, somebody removed the Zip
column. Here’s the state of our Scan
table and the Target
table is unchanged. Our task is to ensure that Zip
is marked as NotFound
Scan Data
ScanId | ObjectId | Name | Type |
---|---|---|---|
1 | 1 | Persons |
Table |
1 | 2 | Name |
Column |
1 | 3 | Zip |
Column |
2 | 1 | Persons |
Table |
2 | 2 | Name |
Column |
Take note that even though it wasn’t in this scan, Zip
is still in the table, with ObjectId = 1.
How to identify NotFound
Objects
Here’s the data if you want to follow along
-- TABLES
DECLARE @TargetObjects TABLE (
ObjectId INT NOT NULL,
NAME VARCHAR(20) NOT NULL,
TYPE VARCHAR(20) NOT NULL,
NotFound BIT NULL
)
DECLARE @ScanObjects TABLE (
ScanId INT NOT NULL,
ObjectId INT NOT NULL,
NAME VARCHAR(20) NOT NULL,
TYPE VARCHAR(20) NOT NULL
)
-- DATA
INSERT INTO @TargetObjects
( ObjectId, Name, Type )
VALUES ( 1, 'Persons', 'Table')
, ( 2, 'Name', 'Column')
, ( 3, 'Zip', 'Column')
INSERT INTO @ScanObjects
( ScanId, ObjectId, Name, Type )
VALUES ( 1, 1, 'Persons', 'Table')
, ( 1, 2, 'Name', 'Column')
, ( 1, 3, 'Zip', 'Column')
, ( 2, 1, 'Persons', 'Table')
, ( 2, 2, 'Name', 'Column')
DECLARE @ScanId INT = 2
The Wrong Way
-- The Complete Wrong Way
SELECT
'TargetObjectId' = TargetObject.ObjectId
, 'ScanObjectId' = ScanObject.ObjectId
, 'ScanScanId' = ScanObject.ScanId
, 'Name' = TargetObject.Name
, 'Type' = TargetObject.Type
FROM @TargetObjects AS TargetObject
LEFT JOIN @ScanObjects AS ScanObject
ON TargetObject.ObjectId = ScanObject.ObjectId
WHERE
ScanObject.ObjectId IS NULL
AND ScanObject.ScanId = @ScanId
The final result will give an empty dataset, which is not what we want.
TargetObjectId | ScanObjectId | ScanScanId | Name | Type |
---|---|---|---|---|
Why the Wrong Way doesn’t work
Let’s look at the data without the WHERE
clause to see what’s going on.
-- No Filter
SELECT
'TargetObjectId' = TargetObject.ObjectId
, 'ScanObjectId' = ScanObject.ObjectId
, 'ScanScanId' = ScanObject.ScanId
, 'Name' = TargetObject.Name
, 'Type' = TargetObject.Type
FROM @TargetObjects AS TargetObject
LEFT JOIN @ScanObjects AS ScanObject
ON TargetObject.ObjectId = ScanObject.ObjectId
We get this:
TargetObjectId | ScanObjectId | ScanScanId | Name | Type |
---|---|---|---|---|
1 | 1 | 1 | Persons |
Table |
2 | 2 | 1 | Name |
Column |
3 | 3 | 1 | Zip |
Column |
1 | 1 | 2 | Persons |
Table |
2 | 2 | 2 | Name |
Column |
Obviously there are no NULL
, so ScanObject.ObjectId IS NULL
will always return empty. Even we filter by ScanId
, you can see how, unsurprisingly, still nothing is NULL
, and our Zip
column does not appear at all.
-- Filtering by ScanId in WHERE
SELECT
'TargetObjectId' = TargetObject.ObjectId
, 'ScanObjectId' = ScanObject.ObjectId
, 'ScanScanId' = ScanObject.ScanId
, 'Name' = TargetObject.Name
, 'Type' = TargetObject.Type
FROM @TargetObjects AS TargetObject
LEFT JOIN @ScanObjects AS ScanObject
ON TargetObject.ObjectId = ScanObject.ObjectId
WHERE
ScanObject.ScanId = @ScanId
TargetObjectId | ScanObjectId | ScanScanId | Name | Type |
---|---|---|---|---|
1 | 1 | 2 | Persons |
Table |
2 | 2 | 2 | Name |
Column |
Right Way
(Scroll down a little for the final answer.)
The way to get what we want is to filter inside the JOIN ON
clause, and not in the WHERE
.
Here’s what we want to on the opposite side of our “LEFT JOIN”.
-- The other side of our "LEFT JOIN"
SELECT *
FROM @ScanObjects AS ScanObject
WHERE ScanObject.ScanId = @ScanId
ScanId | ObjectId | Name | Type |
---|---|---|---|
2 | 1 | Persons |
Table |
2 | 2 | Name |
Column |
Notice how ObjectId = 3
and ScanId = 2
is missing. This is what we want, because when we do our LEFT JOIN
, the record from our Target will exist, and the Scan will not.
And here’s the Final Answer:
-- The Complete Right Way
SELECT
'TargetObjectId' = TargetObject.ObjectId
, 'ScanObjectId' = ScanObject.ObjectId
, 'ScanScanId' = ScanObject.ScanId
, 'Name' = TargetObject.Name
, 'Type' = TargetObject.Type
FROM @TargetObjects AS TargetObject
LEFT JOIN @ScanObjects AS ScanObject
ON TargetObject.ObjectId = ScanObject.ObjectId
-- Do not move this filter into the WHERE clause.
AND ScanObject.ScanId = @ScanId
WHERE
ScanObject.ObjectId IS NULL
TargetObjectId | ScanObjectId | ScanScanId | Name | Type |
---|---|---|---|---|
3 | NULL |
NULL |
Zip |
Column |
Hurray!
Then, it’s a simple matter of marking the Target as “Not Found”
-- Flag missing records as NotFound
UPDATE TargetObject
SET NotFound = 1
FROM @TargetObjects AS TargetObject
LEFT JOIN @ScanObjects AS ScanObject
ON TargetObject.ObjectId = ScanObject.ObjectId
-- Do not move this filter into the WHERE clause.
AND ScanObject.ScanId = @ScanId
WHERE
ScanObject.ObjectId IS NULL
Checking our answer…
SELECT * FROM @TargetObjects
Looks good!
ObjectId | NAME | TYPE | NotFound |
---|---|---|---|
1 | Persons |
Table | NULL |
2 | Name |
Column | NULL |
3 | Zip |
Column | 1 |
In a follow-up post, I may explore this concept using MERGE