Solved: Proc Sql Joins

PrinceAde · Posted 06-10-2024 05:19 AM

Hello there,

Data three;
input x a$;
datalines;
1 a1
1 a2
2 b1
2 b2
4 d
;
run;

Data four;
input x b$;
datalines;
2 x1
2 x2
3 y
5 v
;
run;

proc sql;
title 'Table Merged';
select coalesce(three.x, four.x)
as x, a, b
from three
full join
four
on three.x = four.x;
quit;

table1
x a b

1 a2

data merged;
merge three four;
by x;
run;

table2;

x a b

1 a1

I don't understand While the proc sq join starts with table1 instead of table2.

Please clariification?

Thanks

ballardw · Posted 06-11-2024 09:40 AM

With some SQL queries involving the same data set you can get different output order. SQL is designed for SET operations not row -by-row operations. So if you do not specify something to control the order it can be problematic.

And see if you get the order I do by changing the Select to a Select Distinct:

proc sql;
title 'Table Merged';
select distinct coalesce(three.x, four.x)
as x, a, b
from three
full join
four
on three.x = four.x;
quit;

Which gives me an order of:

x	a	b
1	a1
1	a2
2	b1	x1
2	b1	x2
2	b2	x1
2	b2	x2
3		y
4	d
5		v

Merge however does process row-by-row.

View solution in original post

PaigeMiller · Posted 06-10-2024 08:09 AM

I don't understand the question.

--
Paige Miller

ballardw · Posted 06-10-2024 09:47 AM

Please reread your posted code very carefully. Then rephrase you question in terms of the code shown.

Your code does not mention "table1" or "table2" anywhere. It shows something that would make you think that those "tables" were created but if you run that code you will see all sorts of errors because where Table1 and Table2 appear they are not valid as code.

My best guess is that you are looking at a miss-matched "code" with "expected results" that do not use the table names with "example" results because of a cut-and-paste error.

@PrinceAde wrote:

Hello there,

Data three;
input x a$;
datalines;
1 a1
1 a2
2 b1
2 b2
4 d
;
run;

Data four;
input x b$;
datalines;
2 x1
2 x2
3 y
5 v
;
run;

proc sql;
title 'Table Merged';
select coalesce(three.x, four.x)
as x, a, b
from three
full join
four
on three.x = four.x;
quit;

table1
x a b

1 a2

data merged;
merge three four;
by x;
run;

table2;

x a b

1 a1

I don't understand While the proc sq join starts with table1 instead of table2.

Please clariification?

Thanks

PrinceAde · Posted 06-11-2024 06:26 AM

Hi @ballardw @PaigeMiller

I'm sorry for the confusion. If you check the result of the match merging, you will notice that the first observation is

x a b

1 a1

while for the proc sql full join it is

x a b

1 a2

I'm not sure why the sql join jumped the first observation 1 a1 in table three and decide to start from 1 a2

Data three;
input x a$;
datalines;
1 a1
1 a2
2 b1
2 b2
4 d
;
run;

Data four;
input x b$;
datalines;
2 x1
2 x2
3 y
5 v
;
run;

proc sql;
title 'Table Merged';
select coalesce(three.x, four.x)
as x, a, b
from three
full join
four
on three.x = four.x;
quit;

data merged;
merge three four;
by x;
run;

PaigeMiller · Posted 06-11-2024 07:13 AM

In SQL, the order of the output can be changed by using ORDER BY in your SQL statements.

Also please note that although SQL and MERGE in a DATA step do very similar things, they are not identical and in some cases will not produce the same results.

--
Paige Miller

PrinceAde · Posted 06-11-2024 08:14 AM

@PaigeMiller

Thank you, sir. I understand this; I just want to know if there is any rationale for the Proc SQL join behaving this way.

PaigeMiller · Posted 06-11-2024 08:16 AM

@PrinceAde wrote:

@PaigeMiller

Thank you, sir. I understand this; I just want to know if there is any rationale for the SQL behaving this way.

The default ordering of records in an SQL output is determined by the internal SQL algorithm. I am not aware of any explanation of what that internal SQL algorithm does, and its kind of irrelevant anyway as you can change the ordering of the records to whatever you want.

--
Paige Miller

ballardw · Posted 06-11-2024 09:40 AM

With some SQL queries involving the same data set you can get different output order. SQL is designed for SET operations not row -by-row operations. So if you do not specify something to control the order it can be problematic.

And see if you get the order I do by changing the Select to a Select Distinct:

proc sql;
title 'Table Merged';
select distinct coalesce(three.x, four.x)
as x, a, b
from three
full join
four
on three.x = four.x;
quit;

Which gives me an order of:

x	a	b
1	a1
1	a2
2	b1	x1
2	b1	x2
2	b2	x1
2	b2	x2
3		y
4	d
5		v

Merge however does process row-by-row.

Tom · Posted 06-11-2024 01:44 PM

Nothing has been "skipped". All observations are accounted for.

The SQL full join produces these records.

So you can see both the observations with X=1 are there.

The data step merge does a completely different operation and produces these results:

Notice the big difference in the way that the multiple observations with X=2 are handled. In SQL a full cartesian product is produced. So 2 observations from TABLE3 and 2 observations from TABLE4 generate 2x2=4 observations output. In general for N observations from one dataset and M observations from the other then the full join will produce NXM observations.

In the DATA step the observations a matched in the order they appear. So the 2 observations from TABLE3 are matched with the 2 observations from TABLE4 and the result is 2 observations. In general for N observations from one dataset and M observations from the other the merge will result in MAX(N,M) observations.

Proc Sql Joins

Re: Proc Sql Joins

Re: Proc Sql Joins

Re: Proc Sql Joins

Re: Proc Sql Joins

Re: Proc Sql Joins

Re: Proc Sql Joins

Re: Proc Sql Joins

Re: Proc Sql Joins

Re: Proc Sql Joins

Ready to join fellow brilliant minds for the SAS Hackathon?

Classroom Training Available!