Hello SAS communities,
I have a data set like this:
id day drinks
1 2 1
1 3 3
1 5 4
2 3 1
3 2 1
3 4 2
The study lasts 5 days, and days a person doesn't have any drink (drink = 0) are not reported in the data. How could I create these observations (drink = 0 in the days the person doesn't drink) in this data set? The data I want looks like this:
id day drinks
1 1 0
1 2 1
1 3 3
1 4 0
1 5 4
2 1 0
2 2 0
2 3 1
2 4 0
2 5 0
3 1 0
3 2 1
3 3 0
3 4 2
3 5 0
Thank you!
No, just ignore that option. Not needed. Will work without it
/*Added additional variables to the sample HAVE*/
data have;
input id day drinks ;
blah=rand('integer',1,100);
blahblah=rand('integer',1,100);
blahblahblah=rand('integer',1,100);
cards;
1 2 1
1 3 3
1 5 4
2 3 1
3 2 1
3 4 2
;
data want;
if _n_=1 then do;
if 0 then set have;
dcl hash H (dataset:'have') ;
h.definekey ("id",'day') ;
h.definedata ("drinks") ;
h.definedone () ;
end;
set have(keep=id blah blahblah blahblahblah);
by id;/*assuming it's sorted by id as your sample suggests*/
if first.id;
do day=1 to 5;
rc=h.find();
if rc ne 0 then drinks=0;
output;
end;
drop rc;
run;
data have;
input id day drinks;
cards;
1 2 1
1 3 3
1 5 4
2 3 1
3 2 1
3 4 2
;
data want;
if _n_=1 then do;
dcl hash H (dataset:'have') ;
h.definekey ("id",'day') ;
h.definedata ("drinks") ;
h.definedone () ;
end;
set have(keep=id);;
by id;/*assuming it's sorted by id as your sample suggests*/
if first.id;
do day=1 to 5;
rc=h.find();
if rc ne 0 then drinks=0;
output;
end;
drop rc;
run;
Thanks a lot! I will try and update the results if it works.
My data actually has more variables in addition to these 3 variables. What if I want to keep all of the rest variables together with the three variables when creating these new observations?
The code you wrote works. However, it only keeps the 3 variables (as I posted a simplified version of my data). The other variables are just demographics and do not change in different days. How could I keep all of them when I create new observations = 0?
Thank you
Include you other variables in definedata like
h.definedata ("drinks","var1","var2"....and so on up to "varn") ;
Also add this statement
do day=1 to 5;
rc=h.find();
if rc ne 0 then do;
drinks=0;
call missing(var1,var2, .....upto varn);
output;
end;
Also please post a more representative sample so that we can avoid going back and forth.
Thanks. Can you rewrite the whole code in one block? I added the part you suggested and the variables to keep the old ones. SAS returns that There was 1 unclosed DO block.
Could you please help?
Hi @mkt_apprentice Please review the revised sample below. You will get it
/*Added additional variables to the sample HAVE*/
data have;
input id day drinks ;
blah=rand('integer',1,100);
blahblah=rand('integer',1,100);
blahblahblah=rand('integer',1,100);
cards;
1 2 1
1 3 3
1 5 4
2 3 1
3 2 1
3 4 2
;
data want;
if _n_=1 then do;
dcl hash H (dataset:'have') ;
h.definekey ("id",'day') ;
h.definedata ("drinks",'blah','blahblah','blahblahblah') ;
h.definedone () ;
end;
set have(keep=id);;
by id;/*assuming it's sorted by id as your sample suggests*/
if first.id;
do day=1 to 5;
rc=h.find();
if rc ne 0 then do;
drinks=0;
call missing(blah,blahblah,blahblahblah);
end;
output;
end;
drop rc;
run;
Thanks a lot for the code. Just one more thing that I need your help. The demographic variables that are associated each person (id) are not kept for the observation drinks = 0. They all become "." or missing data. Is there any way I can keep them in the newly created drinks=0 observations? I copied your code and added the demographic variables to the appropriate location in the code but the results include . rather than these demographic variables. Can you please help?
Comment out this part
/*call missing(blah,blahblah,blahblahblah);*/
Complete code
/*Added additional variables to the sample HAVE*/
data have;
input id day drinks ;
blah=rand('integer',1,100);
blahblah=rand('integer',1,100);
blahblahblah=rand('integer',1,100);
cards;
1 2 1
1 3 3
1 5 4
2 3 1
3 2 1
3 4 2
;
data want;
if _n_=1 then do;
if 0 then set have;
dcl hash H (dataset:'have') ;
h.definekey ("id",'day') ;
h.definedata ("drinks",'blah','blahblah','blahblahblah') ;
h.definedone () ;
end;
set have(keep=id);;
by id;/*assuming it's sorted by id as your sample suggests*/
if first.id;
do day=1 to 5;
rc=h.find();
if rc ne 0 then do;
drinks=0;
/*call missing(blah,blahblah,blahblahblah);*/
end;
output;
end;
drop rc;
run;
What does call missing do? I want these demographic variables to be the same as the ones in observations drinks differ than 0.
For example:
id day drinks age gender
1 1 2 24 1
1 2 0 24 1
Currently the resulting data is like this:
id day drinks age gender
1 1 2 24 1
1 2 0 . .
@mkt_apprentice Edited the previous with complete code
It still produces . values 😞
I owe an apology as I overlooked a minor issue. Really sorry
Please find the corrected code below
/*Added additional variables to the sample HAVE*/
data have;
input id day drinks ;
blah=rand('integer',1,100);
blahblah=rand('integer',1,100);
blahblahblah=rand('integer',1,100);
cards;
1 2 1
1 3 3
1 5 4
2 3 1
3 2 1
3 4 2
;
data want;
if _n_=1 then do;
if 0 then set have;
dcl hash H (dataset:'have',multidata:'y') ;
h.definekey ("id",'day') ;
h.definedata ("drinks") ;
h.definedone () ;
end;
set have(keep=id blah blahblah blahblahblah);
by id;/*assuming it's sorted by id as your sample suggests*/
if first.id;
do day=1 to 5;
rc=h.find();
if rc ne 0 then drinks=0;
output;
end;
drop rc;
run;
What do you mean in multidata: 'y'? Do I need to create a separate data set for the demographic variables?
Available on demand!
Missed SAS Innovate Las Vegas? Watch all the action for free! View the keynotes, general sessions and 22 breakouts on demand.
Learn how use the CAT functions in SAS to join values from multiple variables into a single value.
Find more tutorials on the SAS Users YouTube channel.