Solved: Re: A tricky Merge (well tricky for me at least)

righcoastmike · Posted 05-01-2019 02:31 PM

Hi All,

I have 2 datasets that look something like this:

data have1;
input studyID Var1 ;
cards;
1 1
1 1
1 1
2 0
2 0
2 0
2 0
2 0
3 1
3 1
4 0 
5 0
5 0
5 0
5 0
run;

data have2;
input studyID var2;
cards;
1 23
2 42
3 12
4 22
5 56

run;

what I want to do is merge the two datasets, but only keep the Var2 variable for the first record for each studyID with zeros for the rest.

So the "want" dataset would look like:

data want;
input studyID var1 var2 ;
cards;
1 1 23
1 1 0
1 1 0
2 0 42
2 0 0
2 0 0 
2 0 0
2 0 0
3 1 12
3 1 0
4 0 22
5 0 56
5 0 0
5 0 0
5 0 0 
run;

For context - Var 2 is an exposure variable that represents the total exposure for each studyID. I can get it so that every record for the same study ID has the same value, but it causes issues when I try to sum var2 to get the total exposure for the whole dataset.

for example:

The total exposure time for the above dataset should be 23+42+12+22+56 = 155

but if I have the var2 value on each instance of a studyid, thenI get - 23(3)+42(5)+12(2)+22+56(4) = 549

As always any help/thoughts would be much appreciated

Thanks

Mike

novinosrin · Posted 05-01-2019 02:36 PM









data have1;
input studyID Var1 ;
cards;
1 1
1 1
1 1
2 0
2 0
2 0
2 0
2 0
3 1
3 1
4 0 
5 0
5 0
5 0
5 0
run;

data have2;
input studyID var2;
cards;
1 23
2 42
3 12
4 22
5 56
;
run;


data want;
merge have1 have2;
by studyid;
if not first.studyid then var2=0;
run;

View solution in original post

novinosrin · Posted 05-01-2019 02:36 PM









data have1;
input studyID Var1 ;
cards;
1 1
1 1
1 1
2 0
2 0
2 0
2 0
2 0
3 1
3 1
4 0 
5 0
5 0
5 0
5 0
run;

data have2;
input studyID var2;
cards;
1 23
2 42
3 12
4 22
5 56
;
run;


data want;
merge have1 have2;
by studyid;
if not first.studyid then var2=0;
run;

righcoastmike · Posted 05-01-2019 03:01 PM

@novinosrin for the win (again!)

Thanks so much.

Mike

ccaulkins9 · Posted 05-21-2019 03:55 PM

and kind of like the Belmont Stakes this year. ?::O)

e-SAS regards,

ccaulkins9 · Posted 05-28-2019 12:45 PM

I'm wondering is there a way to do this with pure SQL i.e. using a proc sql statement?

e-SAS regards,

novinosrin · Posted 05-28-2019 08:35 PM

Hi @ccaulkins9


data have1;
input studyID Var1 ;
cards;
1 1
1 1
1 1
2 0
2 0
2 0
2 0
2 0
3 1
3 1
4 0 
5 0
5 0
5 0
5 0
run;

data have2;
input studyID var2;
cards;
1 23
2 42
3 12
4 22
5 56
;
run;

proc sql;
create table want(drop=rn) as
select a.*,ifn(min(rn)=rn,var2,0) as var2
from (select *,monotonic() as rn from have1) a full join have2	b
on a.studyid=b.studyid
group by a.studyid
order by studyid, rn;
quit;

However, not recommending the above to be implemented as production code as datastep is faster and straight forward

ccaulkins9 · Posted 05-30-2019 12:12 PM

@novinosrin, I think you could improve the SQL query like so:

proc sql;
create table want as /*(drop=rn)*/
select

a.studyID, a.Var1,

/*ifn(min(rn)=a.rn,b.var2,0)*/

/*I'm getting confused as to

what the initial intent was but

I'll come back to

this Var2 selecting later*/

from (

select

studyid,

var1,

monotonic() as rn from have1

) a /*var1_table*/,

have2 b /*var2_table*/
where a.studyid=b.studyid

group by

a.studyid

order by

studyid,

rn;

quit;

e-SAS regards,

Register Today!

SAS Training: Just a Click Away