Minus operator versus 'not exists' for faster SQL query
From: h t t p s : // forums .oracle.com/thread/19816
It really depends on a bunch of factors.
A MINUS will do a full table scan on both tables unless there is some criteria in the where clause of both queries that allows an index range scan. A MINUS also requires that both queries have the same number of columns, and that each column has the same data type as the corresponding column in the other query (or one convertible to the same type). A MINUS will return all rows from the first query where there is not an exact match column for column with the second query. A MINUS also requires an implicit sort of both queries
NOT EXISTS will read the sub-query once for each row in the outer query. If the correlation field (you are running a correlated sub-query?) is an indexed field, then only an index scan is done.
The choice of which construct to use depends on the type of data you want to return, and also the relative sizes of the two tables/queries. If the outer table is small relative to the inner one, and the inner table is indexed (preferrable a unique index but not required) on the correlation field, then NOT EXISTS will probably be faster since the index lookup will be pretty fast, and only executed a relatively few times. If both tables a roughly the same size, then MINUS might be faster, particularly if you can live with only seeing fields that you are comparing on.
For example, if you have two tables
Assuming that employees is indexd on empid then
As with most things SQL, the only real way to tell is to benchmark.
HTH
John
It really depends on a bunch of factors.
A MINUS will do a full table scan on both tables unless there is some criteria in the where clause of both queries that allows an index range scan. A MINUS also requires that both queries have the same number of columns, and that each column has the same data type as the corresponding column in the other query (or one convertible to the same type). A MINUS will return all rows from the first query where there is not an exact match column for column with the second query. A MINUS also requires an implicit sort of both queries
NOT EXISTS will read the sub-query once for each row in the outer query. If the correlation field (you are running a correlated sub-query?) is an indexed field, then only an index scan is done.
The choice of which construct to use depends on the type of data you want to return, and also the relative sizes of the two tables/queries. If the outer table is small relative to the inner one, and the inner table is indexed (preferrable a unique index but not required) on the correlation field, then NOT EXISTS will probably be faster since the index lookup will be pretty fast, and only executed a relatively few times. If both tables a roughly the same size, then MINUS might be faster, particularly if you can live with only seeing fields that you are comparing on.
For example, if you have two tables
EMPLOYEE
EMPID NUMBER
NAME VARCHAR2(45)
JOB VARCHAR2(45)
HIRE_DATE DATE
and
RETIREE
EMPID NUMBER
NAME VARCHAR2(45)
RET_DATE DATE
If you wanted to see if you had retirees
that were not in the employee table, then you could use either MINUS or
NOT EXISTS (or even NOT IN, but you didn't ask). However you could
possibly get different results. SELECT empid,name
FROM retirees
MINUS
SELECT empid,name
FROM employees
would show retirees not in the emp table, but
it would also show a retiree who had changed their name while retired. A
safer version of the above using MINUS would beSELECT empid,name
FROM retirees
WHERE empid IN (SELECT empid
FROM retirees
MINUS
SELECT empid
FROM employees)
A full scan of retirees, a
full scan of employees (Assuming indexes on both, then maybe an index
fast full scan) and two sorts for the minus, at best an index probe then
table access by rowid on retirees, possibly a full scan of retirees for
the outer query.Assuming that employees is indexd on empid then
SELECT empid,name
FROM retirees
WHERE NOT EXISTS (SELECT 1
FROM employees
WHERE retirees.empid = employees.empid)
requires a full scan of retirees and an index access into employees index for each row.As with most things SQL, the only real way to tell is to benchmark.
HTH
John
评论
发表评论