Hi All,
I have 2 datasets that look something like this:
data have1;
input studyID Var1 ;
cards;
1 1
1 1
1 1
2 0
2 0
2 0
2 0
2 0
3 1
3 1
4 0
5 0
5 0
5 0
5 0
run;
data have2;
input studyID var2;
cards;
1 23
2 42
3 12
4 22
5 56
run;
what I want to do is merge the two datasets, but only keep the Var2 variable for the first record for each studyID with zeros for the rest.
So the "want" dataset would look like:
data want;
input studyID var1 var2 ;
cards;
1 1 23
1 1 0
1 1 0
2 0 42
2 0 0
2 0 0
2 0 0
2 0 0
3 1 12
3 1 0
4 0 22
5 0 56
5 0 0
5 0 0
5 0 0
run;
For context - Var 2 is an exposure variable that represents the total exposure for each studyID. I can get it so that every record for the same study ID has the same value, but it causes issues when I try to sum var2 to get the total exposure for the whole dataset.
for example:
The total exposure time for the above dataset should be 23+42+12+22+56 = 155
but if I have the var2 value on each instance of a studyid, thenI get - 23(3)+42(5)+12(2)+22+56(4) = 549
As always any help/thoughts would be much appreciated
Thanks
Mike
data have1;
input studyID Var1 ;
cards;
1 1
1 1
1 1
2 0
2 0
2 0
2 0
2 0
3 1
3 1
4 0
5 0
5 0
5 0
5 0
run;
data have2;
input studyID var2;
cards;
1 23
2 42
3 12
4 22
5 56
;
run;
data want;
merge have1 have2;
by studyid;
if not first.studyid then var2=0;
run;
data have1;
input studyID Var1 ;
cards;
1 1
1 1
1 1
2 0
2 0
2 0
2 0
2 0
3 1
3 1
4 0
5 0
5 0
5 0
5 0
run;
data have2;
input studyID var2;
cards;
1 23
2 42
3 12
4 22
5 56
;
run;
data want;
merge have1 have2;
by studyid;
if not first.studyid then var2=0;
run;
Hi @ccaulkins9
data have1;
input studyID Var1 ;
cards;
1 1
1 1
1 1
2 0
2 0
2 0
2 0
2 0
3 1
3 1
4 0
5 0
5 0
5 0
5 0
run;
data have2;
input studyID var2;
cards;
1 23
2 42
3 12
4 22
5 56
;
run;
proc sql;
create table want(drop=rn) as
select a.*,ifn(min(rn)=rn,var2,0) as var2
from (select *,monotonic() as rn from have1) a full join have2 b
on a.studyid=b.studyid
group by a.studyid
order by studyid, rn;
quit;
However, not recommending the above to be implemented as production code as datastep is faster and straight forward
@novinosrin, I think you could improve the SQL query like so:
proc sql;
create table want as /*(drop=rn)*/
select
a.studyID, a.Var1,
/*ifn(min(rn)=a.rn,b.var2,0)*/
/*I'm getting confused as to
what the initial intent was but
I'll come back to
this Var2 selecting later*/
from (
select
studyid,
var1,
monotonic() as rn from have1
) a /*var1_table*/,
have2 b /*var2_table*/
where a.studyid=b.studyid
group by
a.studyid
order by
studyid,
rn;
quit;
Join us for SAS Innovate 2025, our biggest and most exciting global event of the year, in Orlando, FL, from May 6-9. Sign up by March 14 for just $795.
Learn how use the CAT functions in SAS to join values from multiple variables into a single value.
Find more tutorials on the SAS Users YouTube channel.
Ready to level-up your skills? Choose your own adventure.