Solved: merging data in proc sql

J_hoon · Posted 04-14-2019 09:23 PM

Hi

I have some problem in proc sql when I merge two data sets.

test1 test2

When I merge these two data sets, I want this data set

res (This result made by merge options in DATA)

But Using proc sql, It print wrong data set.

Wrong res (This result made by proc sql)

This is my proc sql codes.

data test1;
input id$ id2 var1 var2;
cards;
A01 12 221 42
A02 15 947 79
A02 15 837 19
A03 21 217 12
A04 23 612 79
A04 23 182 25
A04 23 352 81
A05 30 797 42
;
run;

/* missing이 있는 data set */
data test2;
input id$ id2 var3$ var4;
cards;
A01 12 QP 96
A02 15 IM 81
A02 15 CB 81
A03 21 NR 71
A04 23 WV .
A04 23 DO .
A04 23 GV .
A05 30 MS .
;
run;

proc sql noprint;
	create table All1_2 as
		select *
	from test1, test2
	where test1.id=test2.id and test1.id2=test2.id2;
quit;

Please help me!!!

PGStats · Posted 04-14-2019 10:17 PM

What a SQL join does is not the same as a data step merge.SQL operations are performed without consideration for data record ordering. If you want to emulate a datastep merge operation in SQL you have to add an explicit ordering column to your data:

data test1;
input id$ id2 var1 var2;
if id ne lag(id) or id2 ne lag(id2) then obs = 0;
obs + 1;
cards;
A01 12 221 42
A02 15 947 79
A02 15 837 19
A03 21 217 12
A04 23 612 79
A04 23 182 25
A04 23 352 81
A05 30 797 42
;

data test2;
input id$ id2 var3$ var4;
if id ne lag(id) or id2 ne lag(id2) then obs = 0;
obs + 1;
cards;
A01 12 QP 96
A02 15 IM 81
A02 15 CB 81
A03 21 NR 71
A04 23 WV .
A04 23 DO .
A04 23 GV .
A05 30 MS .
;

proc sql;
create table all1_2 as 
select * from test1 natural full join test2;
quit;

But I can't see any good reason for trying to do that

EDIT : Corrected code typo. Thank you @ChrisNZ .

PG

View solution in original post

ChrisNZ · Posted 04-14-2019 09:50 PM

1. See @PGStats's post for a row-by-row merge/join.

2. Read about the fundamental differences between how a data step and a SQL query work.

Many resources, like these 2:

https://support.sas.com/resources/papers/proceedings/proceedings/sugi30/249-30.pdf

https://www.lexjansen.com/nesug/nesug06/hw/hw06.pdf

[Edited to point to @PGStats 's post]

High-Performance SAS Coding - Third Edition

PGStats · Posted 04-14-2019 10:17 PM

What a SQL join does is not the same as a data step merge.SQL operations are performed without consideration for data record ordering. If you want to emulate a datastep merge operation in SQL you have to add an explicit ordering column to your data:

data test1;
input id$ id2 var1 var2;
if id ne lag(id) or id2 ne lag(id2) then obs = 0;
obs + 1;
cards;
A01 12 221 42
A02 15 947 79
A02 15 837 19
A03 21 217 12
A04 23 612 79
A04 23 182 25
A04 23 352 81
A05 30 797 42
;

data test2;
input id$ id2 var3$ var4;
if id ne lag(id) or id2 ne lag(id2) then obs = 0;
obs + 1;
cards;
A01 12 QP 96
A02 15 IM 81
A02 15 CB 81
A03 21 NR 71
A04 23 WV .
A04 23 DO .
A04 23 GV .
A05 30 MS .
;

proc sql;
create table all1_2 as 
select * from test1 natural full join test2;
quit;

But I can't see any good reason for trying to do that

EDIT : Corrected code typo. Thank you @ChrisNZ .

PG

ChrisNZ · Posted 04-14-2019 11:18 PM

@PGStats meant

if id ne lag(id) or id2 ne lag(id2) then obs = 0;

not

if id ne lag(id) or id2 ne id2 then obs = 0;

High-Performance SAS Coding - Third Edition

ErikLund_Jensen · Posted 04-15-2019 05:08 AM

Hi @J_hoon

A merge with duplicate keys in both data sets is always problematic, and there is no correct solution. A data step merge joins observations as they occur in input, while a sql join gives a cartesian product.

In your example data seems to be unordered within a set of key values, and with no information present to maintain/restore a certain order, so your result is based on the assumption that data happens to be in a useful order to do perform a row-by row merge.

Consider the following example:

data a
Strawberry red
Strawberry green

data b
Strawberry unripe
Strawberry ripe

merge a b
Strawberry red unripe
Strawberry green ripe

join a b
Strawberry red unripe
Strawberry red ripe
Strawberry green unripe
Strawberry green ripe

What do you prefer for further analysis, a merge with wrong observations connected or a join that gives you both right and wrong? - This is not a coding problem, but a design problem, and I would take it as a stop sign and reconsider the data model and the previous steps that led to this situation.

pdhokriya · Posted 09-03-2019 09:39 AM

Kindly share proc sql query for below mentioned query

data final;
length flag1 $100.;
merge ae (in=a) ex(in=b);
by subject site;
if a and not b ;
flag1 = "Subject is NOT present in Exposure form";
run;

merging data in proc sql

Re: merging data in proc sql

Re: merging data in proc sql

Re: merging data in proc sql

Re: merging data in proc sql

Re: merging data in proc sql

Re: merging data in proc sql

SAS Innovate 2025: Call for Content

Classroom Training Available!