Minus operator versus 'not exists' for faster SQL query

十二月 06, 2013

From: h t t p s : // forums .oracle.com/thread/19816

It really depends on a bunch of factors.

A MINUS will do a full table scan on both tables unless there is some criteria in the where clause of both queries that allows an index range scan. A MINUS also requires that both queries have the same number of columns, and that each column has the same data type as the corresponding column in the other query (or one convertible to the same type). A MINUS will return all rows from the first query where there is not an exact match column for column with the second query. A MINUS also requires an implicit sort of both queries

NOT EXISTS will read the sub-query once for each row in the outer query. If the correlation field (you are running a correlated sub-query?) is an indexed field, then only an index scan is done.

The choice of which construct to use depends on the type of data you want to return, and also the relative sizes of the two tables/queries. If the outer table is small relative to the inner one, and the inner table is indexed (preferrable a unique index but not required) on the correlation field, then NOT EXISTS will probably be faster since the index lookup will be pretty fast, and only executed a relatively few times. If both tables a roughly the same size, then MINUS might be faster, particularly if you can live with only seeing fields that you are comparing on.

For example, if you have two tables

EMPLOYEE
EMPID      NUMBER
NAME       VARCHAR2(45)
JOB        VARCHAR2(45)
HIRE_DATE  DATE

and

RETIREE
EMPID      NUMBER
NAME       VARCHAR2(45)
RET_DATE   DATE

If you wanted to see if you had retirees that were not in the employee table, then you could use either MINUS or NOT EXISTS (or even NOT IN, but you didn't ask). However you could possibly get different results.

SELECT empid,name
FROM retirees
MINUS
SELECT empid,name
FROM employees

would show retirees not in the emp table, but it would also show a retiree who had changed their name while retired. A safer version of the above using MINUS would be

SELECT empid,name
FROM retirees
WHERE empid IN (SELECT empid
                FROM retirees
                MINUS
                SELECT empid
                FROM employees)

A full scan of retirees, a full scan of employees (Assuming indexes on both, then maybe an index fast full scan) and two sorts for the minus, at best an index probe then table access by rowid on retirees, possibly a full scan of retirees for the outer query.

Assuming that employees is indexd on empid then

SELECT empid,name
FROM retirees
WHERE NOT EXISTS (SELECT 1
                  FROM employees
                  WHERE retirees.empid = employees.empid)

requires a full scan of retirees and an index access into employees index for each row.

As with most things SQL, the only real way to tell is to benchmark.

HTH
John

搜索此博客

辛淼 Jin

Minus operator versus 'not exists' for faster SQL query

评论

发表评论

此博客中的热门博文

XML, XSL, HTML

Input in element.eleme.io

Java Base == & equals