Hi All,
I have a dataset as below:
SUBJECT | PERIOD | DAY | TEST | RESULT |
100008 | 1 | 1 | Hb | 10 |
100008 | 1 | 2 | Hb | 20 |
100008 | 1 | 3 | Hb | 15 |
100008 | 2 | 1 | Hb | 14 |
100008 | 2 | 2 | Hb | 13 |
100008 | 2 | 3 | Hb | 16 |
100008 | 3 | 1 | Hb | 19 |
100008 | 3 | 2 | Hb | 10 |
100008 | 3 | 3 | Hb | 11 |
100009 | 1 | 1 | Hb | 12 |
100009 | 1 | 2 | Hb | 14 |
100009 | 1 | 3 | Hb | 11 |
100010 | 1 | 1 | Hb | 12 |
100010 | 1 | 2 | Hb | 18 |
100010 | 1 | 3 | Hb | 13 |
100010 | 2 | 1 | Hb | 11 |
100010 | 2 | 2 | Hb | 13 |
100010 | 2 | 3 | Hb | 15 |
100010 | 3 | 1 | Hb | 17 |
100010 | 3 | 2 | Hb | 11 |
100010 | 3 | 3 | Hb | 11 |
For example in this study there are 3 periods and each period is for 3 days. Here subject 100008 has completed all the three periods for all the 3 days. But subject 100009 has completed only period 1 and then got exited from the study due to some reason. Here i would like to add dummy records for all the other existing variables in dataset though subject has not completed all the visits. Output should be as below:
SUBJECT | PERIOD | DAY | TEST | RESULT |
100008 | 1 | 1 | Hb | 10 |
100008 | 1 | 2 | Hb | 20 |
100008 | 1 | 3 | Hb | 15 |
100008 | 2 | 1 | Hb | 14 |
100008 | 2 | 2 | Hb | 13 |
100008 | 2 | 3 | Hb | 16 |
100008 | 3 | 1 | Hb | 19 |
100008 | 3 | 2 | Hb | 10 |
100008 | 3 | 3 | Hb | 11 |
100009 | 1 | 1 | Hb | 12 |
100009 | 1 | 2 | Hb | 14 |
100009 | 1 | 3 | Hb | 11 |
100009 | 2 | 1 | Hb | NA |
100009 | 2 | 2 | Hb | NA |
100009 | 2 | 3 | Hb | NA |
100009 | 3 | 1 | Hb | NA |
100009 | 3 | 2 | Hb | NA |
100009 | 3 | 3 | Hb | NA |
100010 | 1 | 1 | Hb | 12 |
100010 | 1 | 2 | Hb | 18 |
100010 | 1 | 3 | Hb | 13 |
100010 | 2 | 1 | Hb | 11 |
100010 | 2 | 2 | Hb | 13 |
100010 | 2 | 3 | Hb | 15 |
100010 | 3 | 1 | Hb | 17 |
100010 | 3 | 2 | Hb | 11 |
100010 | 3 | 3 | Hb | 11 |
It means though the subject has not visited the next days or period i need to create dummy records for all the available variables in the dataset.
Please help.
data want;
set have;
by subject;
output;
if last.subject then do period = period + 1 to 3;
do day = 1 to 3;
result = "NA";
output;
end;
end;
run;
Untested; for tested code, post your example data as a data step with datalines.
Usually you can just create a skeleton dataset with all possible records and the default value and then overwrite the skeleton with the real data.
data period_day ;
if 0 then set have(keep=period day test result);
test = 'Hb';
result='NA';
do period=1 to 3;
do day=1 to 3;
output;
end;
end;
run;
proc sql ;
create table skeleton as
select *
from (select distinct subject from have) a
, period_day b
order by subject, period, day
;
quit;
data want;
merge skeleton have;
by subject period day;
run;
If you want to take care of incomplete days as well as missing periods:
data want;
set have; by subject;
output;
if last.subject then do;
call missing(result);
do p = period to 3;
do d = day+1 to 3;
period = p;
day = d;
output;
end;
day = 0;
end;
end;
drop d p;
run;
Good news: We've extended SAS Hackathon registration until Sept. 12, so you still have time to be part of our biggest event yet – our five-year anniversary!
ANOVA, or Analysis Of Variance, is used to compare the averages or means of two or more populations to better understand how they differ. Watch this tutorial for more.
Find more tutorials on the SAS Users YouTube channel.