I have a scenario, where I need to create a dummy row based on the data. In the following scenario. there are four subjects who need to take four medications on the first day of every month. However, some subjects missing the medication. Only one subject (104) took all four medications in all four months and have the result. So how can I create the missing medications in each month for all subjects who missed and make the result as '.'(missing) using the do loop and based on the min and max of the 'dt' variable?
How I want the result ( Ex: I manually created for 01JAN1985 in the image, I am looking to achieve the same for all the dates, I think the final data should have 64 observations!)
Thank you for your inputs.
How I want.
data have;
input subjid dt:ddmmyy10. Med result;
format dt date9.;
cards;
101 01/01/1985 1 10
104 01/01/1985 1 10
104 01/01/1985 2 10
104 01/01/1985 3 10
104 01/01/1985 4 10
102 01/02/1985 2 10
104 01/02/1985 1 10
104 01/02/1985 2 10
104 01/02/1985 3 10
104 01/02/1985 4 10
103 01/03/1985 3 10
104 01/03/1985 1 10
104 01/03/1985 2 10
104 01/03/1985 3 10
104 01/03/1985 4 10
104 01/04/1985 1 10
104 01/04/1985 2 10
104 01/04/1985 3 10
104 01/04/1985 4 10
;
run;
If you have the distinct set of dates and drugs and ids you can easily generate the cartesian product of the three sets with SQL code. Then just merge that back with what you have.
proc sql ;
create table skeleton as
select subjid,dt,med
from subjects, dates, meds
order by subjid,dt,med
;
quit;
data want;
merge skeleton have;
by subjid dt med;
run;
Do you know which months are missing? If not is ok to just use the distinct dates in the data?
Do you know which drugs are missing? If not is ok to just use the distinct drugs in the data?
Do you know which subjects are missing? If not is ok to just use the distinct subjects in the data?
proc sql ;
create table skeleton as
select subjid,dt,med
from (select distinct subjid from have) subjects
, (select distinct dt from have) dates
, (select distinct med from have) meds
order by subjid,dt,med
;
quit;
May SQL be the solution? It doesn't use "Do Loop" though.
proc sql noprint;
create table want as
select a.subjid, b.dt, c.med, d.result
from (select distinct subjid from have) as a
cross join (select distinct dt from have) as b
cross join (select distinct med from have) as c
left join have as d on
a.subjid=d.subjid and b.dt=d.dt and c.med=d.med
;
quit;
You don't show any result for the months without any and would require some additional information. Such as are you sure you always have some med taken in the first month???
This comes close to matching what you do show though it includes the missing months as I understand the question:
data have; input subjid dt:ddmmyy10. Med result; format dt date9.; cards; 101 01/01/1985 1 10 104 01/01/1985 1 10 104 01/01/1985 2 10 104 01/01/1985 3 10 104 01/01/1985 4 10 102 01/02/1985 2 10 104 01/02/1985 1 10 104 01/02/1985 2 10 104 01/02/1985 3 10 104 01/02/1985 4 10 103 01/03/1985 3 10 104 01/03/1985 1 10 104 01/03/1985 2 10 104 01/03/1985 3 10 104 01/03/1985 4 10 104 01/04/1985 1 10 104 01/04/1985 2 10 104 01/04/1985 3 10 104 01/04/1985 4 10 ; run; /* needs to be in order later for merging*/ proc sort data=have; by subjid dt med; run; /* get first month present for each subjid*/ proc summary data=have nway; class subjid; var dt; output out=need (drop=_type_ _freq_) min=; run; /* add med and increment month values*/ data tomerge; set need; do i=0 to 3; /* if your med values in reality are not actually nice integers provide a list of the values in MED SORT ORDER (SEE THE SORT ABOVE) such as med='medabc','medjkl','medmno','medpdq'; */ do med=1 to 4; output; end; dt=intnx('month',dt,1,'b'); end; drop i; run; data want; merge tomerge have ; by subjid dt med; run;
Basically the plan is build as set with the needed dates and med values for each subjid and merge to bring in the Result value.
In FREQ procedure, there is an option "SPARSE" can generate the cartesian product, too.
proc freq data=have noprint;
tables subjid*dt*Med/sparse out=want;
run;
data want(drop=COUNT PERCENT);
set want;
if count=1 then result=10; *If result is always 10;
else result=.;
run;
Results are NOT always 10. Sorry for the confusion.
If you have the distinct set of dates and drugs and ids you can easily generate the cartesian product of the three sets with SQL code. Then just merge that back with what you have.
proc sql ;
create table skeleton as
select subjid,dt,med
from subjects, dates, meds
order by subjid,dt,med
;
quit;
data want;
merge skeleton have;
by subjid dt med;
run;
Do you know which months are missing? If not is ok to just use the distinct dates in the data?
Do you know which drugs are missing? If not is ok to just use the distinct drugs in the data?
Do you know which subjects are missing? If not is ok to just use the distinct subjects in the data?
proc sql ;
create table skeleton as
select subjid,dt,med
from (select distinct subjid from have) subjects
, (select distinct dt from have) dates
, (select distinct med from have) meds
order by subjid,dt,med
;
quit;
If you have the list of dates and drugs you can also use DO loops to generate the skeleton dataset.
data skeleton;
set have;
by subjid;
if first.subjid;
do dt='01AN2020'd,'01MAR2020'd,.....;
do med='Drug1','Drug2',..... ;
output;
end;
end;
keep subjid dt med;
run;
Thank you for taking time on this. I do have the dates, but the problem it increases/ changes every time new data comes in. Is there any way I can create the minimum and maximum date and create a macro variable to use it like "do = &mindt to &maxdt". I only know the &mindt will be the first day of any month and Year, and the same applies to the &maxdt.
To increment dates by MONTH (or really any date/time/datetime interval that is not the base unit used to store the values days/seconds/seconds) use an OFFSET integer and INTNX() function.
do offset=0 to intck('month',&mindt,&maxdt);
dt = intnx('month',&mindt,offset);
...
end;
Your sample data are sorted by DT/SUBJID, and you want data expanded and sorted by SUBJID/DT:
data have;
input subjid dt:ddmmyy10. Med result;
format dt date9.;
cards;
101 01/01/1985 1 10
104 01/01/1985 1 10
104 01/01/1985 2 10
104 01/01/1985 3 10
104 01/01/1985 4 10
102 01/02/1985 2 10
104 01/02/1985 1 10
104 01/02/1985 2 10
104 01/02/1985 3 10
104 01/02/1985 4 10
103 01/03/1985 3 10
104 01/03/1985 1 10
104 01/03/1985 2 10
104 01/03/1985 3 10
104 01/03/1985 4 10
104 01/04/1985 1 10
104 01/04/1985 2 10
104 01/04/1985 3 10
104 01/04/1985 4 10
run;
data need /view=need;
set have;
by dt subjid;
array reslts {4} _temporary_;
if first.subjid then call missing(of reslts{*});
reslts{med}=result;
if last.subjid then do med=1 to 4;
result=reslts{med};
output;
end;
run;
proc sort data=need out=want;
by subjid dt;
run;
Are you ready for the spotlight? We're accepting content ideas for SAS Innovate 2025 to be held May 6-9 in Orlando, FL. The call is open until September 25. Read more here about why you should contribute and what is in it for you!
Learn how use the CAT functions in SAS to join values from multiple variables into a single value.
Find more tutorials on the SAS Users YouTube channel.