BookmarkSubscribeRSS Feed
🔒 This topic is solved and locked. Need further help from the community? Please sign in and ask a new question.
J_hoon
Calcite | Level 5

Hi

 

I have some problem in proc sql when I merge two data sets.

 

test1                                                                                       test2

캡처1.PNG캡처2.PNG

 

 

When I merge these two data sets, I want this data set

 

res (This result made by merge options in DATA)

캡처3.PNG

 

But Using proc sql, It print wrong data set.

 

Wrong res (This result made by proc sql)

캡처4.PNG

 

This is my proc sql codes.

data test1;
input id$ id2 var1 var2;
cards;
A01 12 221 42
A02 15 947 79
A02 15 837 19
A03 21 217 12
A04 23 612 79
A04 23 182 25
A04 23 352 81
A05 30 797 42
;
run;

/* missing이 있는 data set */
data test2;
input id$ id2 var3$ var4;
cards;
A01 12 QP 96
A02 15 IM 81
A02 15 CB 81
A03 21 NR 71
A04 23 WV .
A04 23 DO .
A04 23 GV .
A05 30 MS .
;
run;

proc sql noprint;
	create table All1_2 as
		select *
	from test1, test2
	where test1.id=test2.id and test1.id2=test2.id2;
quit;

Please help me!!!

 

1 ACCEPTED SOLUTION

Accepted Solutions
PGStats
Opal | Level 21

What a SQL join does is not the same as a data step merge.SQL operations are performed without consideration for data record ordering. If you want to emulate a datastep merge operation in SQL you have to add an explicit ordering column to your data:

 

data test1;
input id$ id2 var1 var2;
if id ne lag(id) or id2 ne lag(id2) then obs = 0;
obs + 1;
cards;
A01 12 221 42
A02 15 947 79
A02 15 837 19
A03 21 217 12
A04 23 612 79
A04 23 182 25
A04 23 352 81
A05 30 797 42
;

data test2;
input id$ id2 var3$ var4;
if id ne lag(id) or id2 ne lag(id2) then obs = 0;
obs + 1;
cards;
A01 12 QP 96
A02 15 IM 81
A02 15 CB 81
A03 21 NR 71
A04 23 WV .
A04 23 DO .
A04 23 GV .
A05 30 MS .
;

proc sql;
create table all1_2 as 
select * from test1 natural full join test2;
quit;

But I can't see any good reason for trying to do that Smiley Happy

 

EDIT : Corrected code typo. Thank you @ChrisNZ .

PG

View solution in original post

5 REPLIES 5
ChrisNZ
Tourmaline | Level 20

1. See @PGStats's post for a row-by-row merge/join.

2. Read about the fundamental differences between how a data step and a SQL query work.

Many resources, like these 2:

[Edited to point to @PGStats 's post]

 

PGStats
Opal | Level 21

What a SQL join does is not the same as a data step merge.SQL operations are performed without consideration for data record ordering. If you want to emulate a datastep merge operation in SQL you have to add an explicit ordering column to your data:

 

data test1;
input id$ id2 var1 var2;
if id ne lag(id) or id2 ne lag(id2) then obs = 0;
obs + 1;
cards;
A01 12 221 42
A02 15 947 79
A02 15 837 19
A03 21 217 12
A04 23 612 79
A04 23 182 25
A04 23 352 81
A05 30 797 42
;

data test2;
input id$ id2 var3$ var4;
if id ne lag(id) or id2 ne lag(id2) then obs = 0;
obs + 1;
cards;
A01 12 QP 96
A02 15 IM 81
A02 15 CB 81
A03 21 NR 71
A04 23 WV .
A04 23 DO .
A04 23 GV .
A05 30 MS .
;

proc sql;
create table all1_2 as 
select * from test1 natural full join test2;
quit;

But I can't see any good reason for trying to do that Smiley Happy

 

EDIT : Corrected code typo. Thank you @ChrisNZ .

PG
ChrisNZ
Tourmaline | Level 20

@PGStats meant 

if id ne lag(id) or id2 ne lag(id2) then obs = 0;

not

if id ne lag(id) or id2 ne id2 then obs = 0;

 

ErikLund_Jensen
Rhodochrosite | Level 12

Hi @J_hoon 

 

A merge with duplicate keys in both data sets is always problematic, and there is no correct solution. A data step merge joins observations as they occur in input, while a sql join gives a cartesian product.

 

In your example data seems to be unordered within a set of key values, and with no information present to maintain/restore a certain order, so your result is based on the assumption that data happens to be in a useful order to do perform a row-by row merge.

 

Consider the following example:

 

data a
Strawberry red
Strawberry green

 

data b
Strawberry unripe
Strawberry ripe

 

merge a b
Strawberry red unripe
Strawberry green ripe

 

join a b
Strawberry red unripe
Strawberry red ripe
Strawberry green unripe
Strawberry green ripe

 

What do you prefer for further analysis, a merge with wrong observations connected or a join that gives you both right and wrong? - This is not a coding problem, but a design problem, and I would take it as a stop sign and reconsider the data model and the previous steps that led to this situation.

pdhokriya
Pyrite | Level 9
Kindly share proc sql query for below mentioned query



data final;
length flag1 $100.;
merge ae (in=a) ex(in=b);
by subject site;
if a and not b ;
flag1 = "Subject is NOT present in Exposure form";
run;

SAS Innovate 2025: Save the Date

 SAS Innovate 2025 is scheduled for May 6-9 in Orlando, FL. Sign up to be first to learn about the agenda and registration!

Save the date!

How to Concatenate Values

Learn how use the CAT functions in SAS to join values from multiple variables into a single value.

Find more tutorials on the SAS Users YouTube channel.

SAS Training: Just a Click Away

 Ready to level-up your skills? Choose your own adventure.

Browse our catalog!

Discussion stats
  • 5 replies
  • 1048 views
  • 1 like
  • 5 in conversation