Hello there,
Data three;
input x a$;
datalines;
1 a1
1 a2
2 b1
2 b2
4 d
;
run;
Data four;
input x b$;
datalines;
2 x1
2 x2
3 y
5 v
;
run;
proc sql;
title 'Table Merged';
select coalesce(three.x, four.x)
as x, a, b
from three
full join
four
on three.x = four.x;
quit;
table1
x a b
1 a2
data merged;
merge three four;
by x;
run;
table2;
x a b
1 a1
I don't understand While the proc sq join starts with table1 instead of table2.
Please clariification?
Thanks
With some SQL queries involving the same data set you can get different output order. SQL is designed for SET operations not row -by-row operations. So if you do not specify something to control the order it can be problematic.
And see if you get the order I do by changing the Select to a Select Distinct:
proc sql; title 'Table Merged'; select distinct coalesce(three.x, four.x) as x, a, b from three full join four on three.x = four.x; quit;
Which gives me an order of:
x | a | b |
---|---|---|
1 | a1 | |
1 | a2 | |
2 | b1 | x1 |
2 | b1 | x2 |
2 | b2 | x1 |
2 | b2 | x2 |
3 | y | |
4 | d | |
5 | v |
Merge however does process row-by-row.
I don't understand the question.
Please reread your posted code very carefully. Then rephrase you question in terms of the code shown.
Your code does not mention "table1" or "table2" anywhere. It shows something that would make you think that those "tables" were created but if you run that code you will see all sorts of errors because where Table1 and Table2 appear they are not valid as code.
My best guess is that you are looking at a miss-matched "code" with "expected results" that do not use the table names with "example" results because of a cut-and-paste error.
@PrinceAde wrote:
Hello there,
Data three;
input x a$;
datalines;
1 a1
1 a2
2 b1
2 b2
4 d
;
run;
Data four;
input x b$;
datalines;
2 x1
2 x2
3 y
5 v
;
run;
proc sql;
title 'Table Merged';
select coalesce(three.x, four.x)
as x, a, b
from three
full join
four
on three.x = four.x;
quit;table1
x a b1 a2
data merged;
merge three four;
by x;
run;table2;
x a b
1 a1
I don't understand While the proc sq join starts with table1 instead of table2.
Please clariification?
Thanks
I'm sorry for the confusion. If you check the result of the match merging, you will notice that the first observation is
x a b
1 a1
while for the proc sql full join it is
x a b
1 a2
I'm not sure why the sql join jumped the first observation 1 a1 in table three and decide to start from 1 a2
Data three;
input x a$;
datalines;
1 a1
1 a2
2 b1
2 b2
4 d
;
run;
Data four;
input x b$;
datalines;
2 x1
2 x2
3 y
5 v
;
run;
proc sql;
title 'Table Merged';
select coalesce(three.x, four.x)
as x, a, b
from three
full join
four
on three.x = four.x;
quit;
data merged;
merge three four;
by x;
run;
In SQL, the order of the output can be changed by using ORDER BY in your SQL statements.
Also please note that although SQL and MERGE in a DATA step do very similar things, they are not identical and in some cases will not produce the same results.
Thank you, sir. I understand this; I just want to know if there is any rationale for the Proc SQL join behaving this way.
@PrinceAde wrote:
Thank you, sir. I understand this; I just want to know if there is any rationale for the SQL behaving this way.
The default ordering of records in an SQL output is determined by the internal SQL algorithm. I am not aware of any explanation of what that internal SQL algorithm does, and its kind of irrelevant anyway as you can change the ordering of the records to whatever you want.
With some SQL queries involving the same data set you can get different output order. SQL is designed for SET operations not row -by-row operations. So if you do not specify something to control the order it can be problematic.
And see if you get the order I do by changing the Select to a Select Distinct:
proc sql; title 'Table Merged'; select distinct coalesce(three.x, four.x) as x, a, b from three full join four on three.x = four.x; quit;
Which gives me an order of:
x | a | b |
---|---|---|
1 | a1 | |
1 | a2 | |
2 | b1 | x1 |
2 | b1 | x2 |
2 | b2 | x1 |
2 | b2 | x2 |
3 | y | |
4 | d | |
5 | v |
Merge however does process row-by-row.
Nothing has been "skipped". All observations are accounted for.
The SQL full join produces these records.
So you can see both the observations with X=1 are there.
The data step merge does a completely different operation and produces these results:
Notice the big difference in the way that the multiple observations with X=2 are handled. In SQL a full cartesian product is produced. So 2 observations from TABLE3 and 2 observations from TABLE4 generate 2x2=4 observations output. In general for N observations from one dataset and M observations from the other then the full join will produce NXM observations.
In the DATA step the observations a matched in the order they appear. So the 2 observations from TABLE3 are matched with the 2 observations from TABLE4 and the result is 2 observations. In general for N observations from one dataset and M observations from the other the merge will result in MAX(N,M) observations.
Save $250 on SAS Innovate and get a free advance copy of the new SAS For Dummies book! Use the code "SASforDummies" to register. Don't miss out, May 6-9, in Orlando, Florida.
Learn how use the CAT functions in SAS to join values from multiple variables into a single value.
Find more tutorials on the SAS Users YouTube channel.
Ready to level-up your skills? Choose your own adventure.