Main page
Correlated subqueries.
S. Moiseenko
Correlated subqueries sometimes let to create the query very briefly, which can
look more cumbersome when you use other means. We shall remind that correlated
subquery contains reference to the query which it contains (we shall name it
the main query) with the result that subquery executes for each string of the
main query.
Let's consider the following example . It is required to determine the date and
the flights of every passenger, committed by them at its last flight day. To
say the other way, you have to find the maximum dateof the flight for each
passenger and to find all of his flights. With determination of the maximum
date there is no problems :
SELECT id_psg, MAX( date) FROM pass_in_trip GROUP BY id_psg
However, there is no flight. If we'll try to include flight in the list of
output:
SELECT id_psg, trip_no, MAX( date) FROM pass_in_trip GROUP BY
id_psg
we receive an error message, because number of flight is not used in aggregate
function and also does not enter into the list of columns of the grouping. If
we'll do the last one:
SELECT id_psg, trip_no, MAX( date) FROM pass_in_trip GROUP BY
id_psg, trip_no
we receive last flight of the passenger by each flight which he flied. It is
completely not that problem which we try to solve. Use of a с orrelated
subquery
SELECT id_psg, trip_no, [date] FROM pass_in_trip pt_1
WHERE [date ]= (
SELECT MAX([date]) FROM pass_in_trip pt_2
WHERE pt_1.id_psg=pt_2.id_psg)
gives that is necessary for us:
id_psg |
trip_no |
date
|
10 |
1187 |
2003-04-14 00:00:00.000 |
9 |
1182 |
2003-04-13 00:00:00.000 |
8 |
1187 |
2003-04-14 00:00:00.000 |
6 |
1123 |
2003-04-08 00:00:00.000 |
5 |
1145 |
2003-04-25 00:00:00.000 |
3 |
1145 |
2003-04-05 00:00:00.000 |
3 |
1123 |
2003-04-05 00:00:00.000 |
2 |
1124 |
2003-04-02 00:00:00.000 |
1 |
1100 |
2003-04-29 00:00:00.000 |
Here for each flight it is checked, whether it is accomplished last flight day
of the given passenger. Thus if such flights there was a several, we receive
all of them.
Seeming lack of the resulted solution is just also that the subquery should be
calculated for every string of main query. To avoid it, it is possible to offer
the alternative solution, which is using joining of the table pass_in_trip with
a subquery resulted at the beginning which calculates the maximal dates for
each passenger:
SELECT pt_1.id_psg, trip_no, [date]
FROM pass_in_trip pt_1
JOIN (
SELECT id_psg, MAX( [date]) md
FROM pass_in_trip GROUP BY id_psg) pt_2
ON pt_1.id_psg=pt_2.id_psg AND [date]=md
However, as it will seem strangely , optimizer of the SQL Server 2000 will
choose for both queries the identical plan of execution at presence of a
natural index ( cluster ) on a primary key. Accordingly, and time of execution
of queries should not differ.
» Given examples here can be done directly on the website by selecting the
check box “Without checking” on the page with
SELECT exercises.
Main page