BookmarkSubscribeRSS Feed
🔒 This topic is solved and locked. Need further help from the community? Please sign in and ask a new question.
righcoastmike
Quartz | Level 8

Hi All, 

I have 2 datasets that look something like this: 

 

data have1;
input studyID Var1 ;
cards;
1 1
1 1
1 1
2 0
2 0
2 0
2 0
2 0
3 1
3 1
4 0 
5 0
5 0
5 0
5 0
run;

data have2;
input studyID var2;
cards;
1 23
2 42
3 12
4 22
5 56

run;

what I want to do is merge the two datasets, but only keep the Var2 variable for the first record for each studyID with zeros for the rest. 

 

So the "want" dataset would look like: 

 

data want;
input studyID var1 var2 ;
cards;
1 1 23
1 1 0
1 1 0
2 0 42
2 0 0
2 0 0 
2 0 0
2 0 0
3 1 12
3 1 0
4 0 22
5 0 56
5 0 0
5 0 0
5 0 0 
run;

For context - Var 2 is an exposure variable that represents the total exposure for each studyID. I can get it so that every record for the same study ID has the same value, but it causes issues when I try to sum var2 to get the total exposure for the whole dataset. 

for example: 

 

The total exposure time for the above dataset should be 23+42+12+22+56 = 155

but if I have the var2 value on each instance of a studyid, thenI get - 23(3)+42(5)+12(2)+22+56(4) = 549

 

As always any help/thoughts would be much appreciated 

 

Thanks 

 

Mike 

1 ACCEPTED SOLUTION

Accepted Solutions
novinosrin
Tourmaline | Level 20








data have1;
input studyID Var1 ;
cards;
1 1
1 1
1 1
2 0
2 0
2 0
2 0
2 0
3 1
3 1
4 0 
5 0
5 0
5 0
5 0
run;

data have2;
input studyID var2;
cards;
1 23
2 42
3 12
4 22
5 56
;
run;


data want;
merge have1 have2;
by studyid;
if not first.studyid then var2=0;
run;

View solution in original post

6 REPLIES 6
novinosrin
Tourmaline | Level 20








data have1;
input studyID Var1 ;
cards;
1 1
1 1
1 1
2 0
2 0
2 0
2 0
2 0
3 1
3 1
4 0 
5 0
5 0
5 0
5 0
run;

data have2;
input studyID var2;
cards;
1 23
2 42
3 12
4 22
5 56
;
run;


data want;
merge have1 have2;
by studyid;
if not first.studyid then var2=0;
run;
righcoastmike
Quartz | Level 8

@novinosrin  for the win (again!) 

 

Thanks so much. 

 

Mike 

 

 

ccaulkins9
Pyrite | Level 9
and kind of like the Belmont Stakes this year. ?::O)
e-SAS regards,

ccaulkins9
Pyrite | Level 9
I'm wondering is there a way to do this with pure SQL i.e. using a proc sql statement?
e-SAS regards,

novinosrin
Tourmaline | Level 20

Hi @ccaulkins9 

 


data have1;
input studyID Var1 ;
cards;
1 1
1 1
1 1
2 0
2 0
2 0
2 0
2 0
3 1
3 1
4 0 
5 0
5 0
5 0
5 0
run;

data have2;
input studyID var2;
cards;
1 23
2 42
3 12
4 22
5 56
;
run;

proc sql;
create table want(drop=rn) as
select a.*,ifn(min(rn)=rn,var2,0) as var2
from (select *,monotonic() as rn from have1) a full join have2	b
on a.studyid=b.studyid
group by a.studyid
order by studyid, rn;
quit;

However, not recommending the above to be implemented as production code as datastep is faster and straight forward

ccaulkins9
Pyrite | Level 9

@novinosrin, I think you could improve the SQL query like so:

 

proc sql;
create table want as /*(drop=rn)*/
select

a.studyID, a.Var1,

 

/*ifn(min(rn)=a.rn,b.var2,0)*/

/*I'm getting confused as to

what the initial intent was but

I'll come back to

this Var2 selecting later*/

 

from (

select

studyid,

var1,

monotonic() as rn from have1

) a /*var1_table*/,

have2 b /*var2_table*/
where a.studyid=b.studyid

 

group by

a.studyid

 

order by

studyid,

rn;

 

quit;

e-SAS regards,

sas-innovate-wordmark-2025-midnight.png

Register Today!

Join us for SAS Innovate 2025, our biggest and most exciting global event of the year, in Orlando, FL, from May 6-9. Sign up by March 14 for just $795.


Register now!

How to Concatenate Values

Learn how use the CAT functions in SAS to join values from multiple variables into a single value.

Find more tutorials on the SAS Users YouTube channel.

SAS Training: Just a Click Away

 Ready to level-up your skills? Choose your own adventure.

Browse our catalog!

Discussion stats
  • 6 replies
  • 1415 views
  • 2 likes
  • 3 in conversation