Hello SAS communities,
I have a data set like this:
id day drinks
1 2 1
1 3 3
1 5 4
2 3 1
3 2 1
3 4 2
The study lasts 5 days, and days a person doesn't have any drink (drink = 0) are not reported in the data. How could I create these observations (drink = 0 in the days the person doesn't drink) in this data set? The data I want looks like this:
id day drinks
1 1 0
1 2 1
1 3 3
1 4 0
1 5 4
2 1 0
2 2 0
2 3 1
2 4 0
2 5 0
3 1 0
3 2 1
3 3 0
3 4 2
3 5 0
Thank you!
No, just ignore that option. Not needed. Will work without it
/*Added additional variables to the sample HAVE*/
data have;
input id day drinks ;
blah=rand('integer',1,100);
blahblah=rand('integer',1,100);
blahblahblah=rand('integer',1,100);
cards;
1 2 1
1 3 3
1 5 4
2 3 1
3 2 1
3 4 2
;
data want;
if _n_=1 then do;
if 0 then set have;
dcl hash H (dataset:'have') ;
h.definekey ("id",'day') ;
h.definedata ("drinks") ;
h.definedone () ;
end;
set have(keep=id blah blahblah blahblahblah);
by id;/*assuming it's sorted by id as your sample suggests*/
if first.id;
do day=1 to 5;
rc=h.find();
if rc ne 0 then drinks=0;
output;
end;
drop rc;
run;
data have;
input id day drinks;
cards;
1 2 1
1 3 3
1 5 4
2 3 1
3 2 1
3 4 2
;
data want;
if _n_=1 then do;
dcl hash H (dataset:'have') ;
h.definekey ("id",'day') ;
h.definedata ("drinks") ;
h.definedone () ;
end;
set have(keep=id);;
by id;/*assuming it's sorted by id as your sample suggests*/
if first.id;
do day=1 to 5;
rc=h.find();
if rc ne 0 then drinks=0;
output;
end;
drop rc;
run;
Thanks a lot! I will try and update the results if it works.
My data actually has more variables in addition to these 3 variables. What if I want to keep all of the rest variables together with the three variables when creating these new observations?
The code you wrote works. However, it only keeps the 3 variables (as I posted a simplified version of my data). The other variables are just demographics and do not change in different days. How could I keep all of them when I create new observations = 0?
Thank you
Include you other variables in definedata like
h.definedata ("drinks","var1","var2"....and so on up to "varn") ;
Also add this statement
do day=1 to 5;
rc=h.find();
if rc ne 0 then do;
drinks=0;
call missing(var1,var2, .....upto varn);
output;
end;
Also please post a more representative sample so that we can avoid going back and forth.
Thanks. Can you rewrite the whole code in one block? I added the part you suggested and the variables to keep the old ones. SAS returns that There was 1 unclosed DO block.
Could you please help?
Hi @mkt_apprentice Please review the revised sample below. You will get it
/*Added additional variables to the sample HAVE*/
data have;
input id day drinks ;
blah=rand('integer',1,100);
blahblah=rand('integer',1,100);
blahblahblah=rand('integer',1,100);
cards;
1 2 1
1 3 3
1 5 4
2 3 1
3 2 1
3 4 2
;
data want;
if _n_=1 then do;
dcl hash H (dataset:'have') ;
h.definekey ("id",'day') ;
h.definedata ("drinks",'blah','blahblah','blahblahblah') ;
h.definedone () ;
end;
set have(keep=id);;
by id;/*assuming it's sorted by id as your sample suggests*/
if first.id;
do day=1 to 5;
rc=h.find();
if rc ne 0 then do;
drinks=0;
call missing(blah,blahblah,blahblahblah);
end;
output;
end;
drop rc;
run;
Thanks a lot for the code. Just one more thing that I need your help. The demographic variables that are associated each person (id) are not kept for the observation drinks = 0. They all become "." or missing data. Is there any way I can keep them in the newly created drinks=0 observations? I copied your code and added the demographic variables to the appropriate location in the code but the results include . rather than these demographic variables. Can you please help?
Comment out this part
/*call missing(blah,blahblah,blahblahblah);*/
Complete code
/*Added additional variables to the sample HAVE*/
data have;
input id day drinks ;
blah=rand('integer',1,100);
blahblah=rand('integer',1,100);
blahblahblah=rand('integer',1,100);
cards;
1 2 1
1 3 3
1 5 4
2 3 1
3 2 1
3 4 2
;
data want;
if _n_=1 then do;
if 0 then set have;
dcl hash H (dataset:'have') ;
h.definekey ("id",'day') ;
h.definedata ("drinks",'blah','blahblah','blahblahblah') ;
h.definedone () ;
end;
set have(keep=id);;
by id;/*assuming it's sorted by id as your sample suggests*/
if first.id;
do day=1 to 5;
rc=h.find();
if rc ne 0 then do;
drinks=0;
/*call missing(blah,blahblah,blahblahblah);*/
end;
output;
end;
drop rc;
run;
What does call missing do? I want these demographic variables to be the same as the ones in observations drinks differ than 0.
For example:
id day drinks age gender
1 1 2 24 1
1 2 0 24 1
Currently the resulting data is like this:
id day drinks age gender
1 1 2 24 1
1 2 0 . .
@mkt_apprentice Edited the previous with complete code
It still produces . values 😞
I owe an apology as I overlooked a minor issue. Really sorry
Please find the corrected code below
/*Added additional variables to the sample HAVE*/
data have;
input id day drinks ;
blah=rand('integer',1,100);
blahblah=rand('integer',1,100);
blahblahblah=rand('integer',1,100);
cards;
1 2 1
1 3 3
1 5 4
2 3 1
3 2 1
3 4 2
;
data want;
if _n_=1 then do;
if 0 then set have;
dcl hash H (dataset:'have',multidata:'y') ;
h.definekey ("id",'day') ;
h.definedata ("drinks") ;
h.definedone () ;
end;
set have(keep=id blah blahblah blahblahblah);
by id;/*assuming it's sorted by id as your sample suggests*/
if first.id;
do day=1 to 5;
rc=h.find();
if rc ne 0 then drinks=0;
output;
end;
drop rc;
run;
What do you mean in multidata: 'y'? Do I need to create a separate data set for the demographic variables?
April 27 – 30 | Gaylord Texan | Grapevine, Texas
Walk in ready to learn. Walk out ready to deliver. This is the data and AI conference you can't afford to miss.
Register now and lock in 2025 pricing—just $495!
Learn how use the CAT functions in SAS to join values from multiple variables into a single value.
Find more tutorials on the SAS Users YouTube channel.
Ready to level-up your skills? Choose your own adventure.