Solved: Create dummy rows Based on Data using 'Do Loops'

SASuserlot · Posted 07-19-2023 09:13 PM

I have a scenario, where I need to create a dummy row based on the data. In the following scenario. there are four subjects who need to take four medications on the first day of every month. However, some subjects missing the medication. Only one subject (104) took all four medications in all four months and have the result. So how can I create the missing medications in each month for all subjects who missed and make the result as '.'(missing) using the do loop and based on the min and max of the 'dt' variable?

How I want the result ( Ex: I manually created for 01JAN1985 in the image, I am looking to achieve the same for all the dates, I think the final data should have 64 observations!)

Thank you for your inputs.

How I want.

data have;
input subjid dt:ddmmyy10.  Med result;
format dt date9.;
cards;
101 01/01/1985 1 10
104 01/01/1985 1 10
104 01/01/1985 2 10
104 01/01/1985 3 10
104 01/01/1985 4 10
102 01/02/1985 2 10
104 01/02/1985 1 10
104 01/02/1985 2 10
104 01/02/1985 3 10
104 01/02/1985 4 10
103 01/03/1985 3 10
104 01/03/1985 1 10
104 01/03/1985 2 10
104 01/03/1985 3 10
104 01/03/1985 4 10
104 01/04/1985 1 10
104 01/04/1985 2 10
104 01/04/1985 3 10
104 01/04/1985 4 10

;
run;

Tom · Posted 07-19-2023 10:41 PM

If you have the distinct set of dates and drugs and ids you can easily generate the cartesian product of the three sets with SQL code. Then just merge that back with what you have.

proc sql ;
create table skeleton as
  select subjid,dt,med 
  from subjects, dates, meds
  order by subjid,dt,med
;
quit;
data want;
  merge skeleton have;
  by subjid dt med;
run;

Do you know which months are missing? If not is ok to just use the distinct dates in the data?

Do you know which drugs are missing? If not is ok to just use the distinct drugs in the data?

Do you know which subjects are missing? If not is ok to just use the distinct subjects in the data?

proc sql ;
create table skeleton as
  select subjid,dt,med 
  from (select distinct subjid from have) subjects
     , (select distinct dt from have) dates
     , (select distinct med from have) meds
  order by subjid,dt,med
;
quit;

View solution in original post

whymath · Posted 07-19-2023 10:30 PM

May SQL be the solution? It doesn't use "Do Loop" though.

proc sql noprint;
  create table want as 
  select a.subjid, b.dt, c.med, d.result
  from (select distinct subjid from have) as a
  cross join (select distinct dt from have) as b
  cross join (select distinct med from have) as c
  left join have as d on
    a.subjid=d.subjid and b.dt=d.dt and c.med=d.med
  ;
quit;

ballardw · Posted 07-19-2023 10:38 PM

You don't show any result for the months without any and would require some additional information. Such as are you sure you always have some med taken in the first month???

This comes close to matching what you do show though it includes the missing months as I understand the question:

data have;
input subjid dt:ddmmyy10.  Med result;
format dt date9.;
cards;
101 01/01/1985 1 10
104 01/01/1985 1 10
104 01/01/1985 2 10
104 01/01/1985 3 10
104 01/01/1985 4 10
102 01/02/1985 2 10
104 01/02/1985 1 10
104 01/02/1985 2 10
104 01/02/1985 3 10
104 01/02/1985 4 10
103 01/03/1985 3 10
104 01/03/1985 1 10
104 01/03/1985 2 10
104 01/03/1985 3 10
104 01/03/1985 4 10
104 01/04/1985 1 10
104 01/04/1985 2 10
104 01/04/1985 3 10
104 01/04/1985 4 10

;
run;
/* needs to be in order later for merging*/
proc sort data=have;
   by subjid dt med;
run;
/* get first month present for each subjid*/
proc summary data=have nway;
   class subjid;
   var dt;
   output out=need (drop=_type_ _freq_) min=;
run;
/* add med and increment month values*/
data tomerge;
   set need;
   do i=0 to 3;
      /* if your med values in reality are not actually 
         nice integers provide a list of the values
         in MED SORT ORDER (SEE THE SORT ABOVE)
         such as med='medabc','medjkl','medmno','medpdq';
      */
      do med=1 to 4;
         output;
      end;
      dt=intnx('month',dt,1,'b');
   end;
   drop i;
run;

data want;
   merge tomerge
         have
   ;
   by subjid dt med;
run;

Basically the plan is build as set with the needed dates and med values for each subjid and merge to bring in the Result value.

whymath · Posted 07-19-2023 10:39 PM

In FREQ procedure, there is an option "SPARSE" can generate the cartesian product, too.

proc freq data=have noprint;
  tables subjid*dt*Med/sparse out=want;
run;

data want(drop=COUNT PERCENT);
  set want;
  if count=1 then result=10;  *If result is always 10;
  else result=.;
run;

SASuserlot · Posted 07-20-2023 09:01 AM

Results are NOT always 10. Sorry for the confusion.

Tom · Posted 07-19-2023 10:41 PM

If you have the distinct set of dates and drugs and ids you can easily generate the cartesian product of the three sets with SQL code. Then just merge that back with what you have.

proc sql ;
create table skeleton as
  select subjid,dt,med 
  from subjects, dates, meds
  order by subjid,dt,med
;
quit;
data want;
  merge skeleton have;
  by subjid dt med;
run;

Do you know which months are missing? If not is ok to just use the distinct dates in the data?

Do you know which drugs are missing? If not is ok to just use the distinct drugs in the data?

Do you know which subjects are missing? If not is ok to just use the distinct subjects in the data?

proc sql ;
create table skeleton as
  select subjid,dt,med 
  from (select distinct subjid from have) subjects
     , (select distinct dt from have) dates
     , (select distinct med from have) meds
  order by subjid,dt,med
;
quit;

whymath · Posted 07-19-2023 10:48 PM

I think you've thought it through.

Tom · Posted 07-19-2023 10:52 PM

If you have the list of dates and drugs you can also use DO loops to generate the skeleton dataset.

data skeleton;
  set have;
  by subjid;
  if first.subjid;
  do dt='01AN2020'd,'01MAR2020'd,.....;
     do med='Drug1','Drug2',..... ;
        output;
     end;
  end;
  keep subjid dt med;
run;

SASuserlot · Posted 07-20-2023 09:11 AM

Thank you for taking time on this. I do have the dates, but the problem it increases/ changes every time new data comes in. Is there any way I can create the minimum and maximum date and create a macro variable to use it like "do = &mindt to &maxdt". I only know the &mindt will be the first day of any month and Year, and the same applies to the &maxdt.

Tom · Posted 07-20-2023 09:38 AM

To increment dates by MONTH (or really any date/time/datetime interval that is not the base unit used to store the values days/seconds/seconds) use an OFFSET integer and INTNX() function.

do offset=0 to intck('month',&mindt,&maxdt);
  dt = intnx('month',&mindt,offset);
  ...
end;

mkeintz · Posted 07-20-2023 09:33 AM

Your sample data are sorted by DT/SUBJID, and you want data expanded and sorted by SUBJID/DT:

data have;
  input subjid dt:ddmmyy10.  Med result;
  format dt date9.;
cards;
101 01/01/1985 1 10
104 01/01/1985 1 10
104 01/01/1985 2 10
104 01/01/1985 3 10
104 01/01/1985 4 10
102 01/02/1985 2 10
104 01/02/1985 1 10
104 01/02/1985 2 10
104 01/02/1985 3 10
104 01/02/1985 4 10
103 01/03/1985 3 10
104 01/03/1985 1 10
104 01/03/1985 2 10
104 01/03/1985 3 10
104 01/03/1985 4 10
104 01/04/1985 1 10
104 01/04/1985 2 10
104 01/04/1985 3 10
104 01/04/1985 4 10
run;

data need /view=need;
  set have;
  by dt subjid;
  array reslts {4} _temporary_;
  if first.subjid then call missing(of reslts{*});
  reslts{med}=result;
  if last.subjid then do med=1 to 4;
    result=reslts{med};
    output;
  end;
run;

proc sort data=need out=want;
  by subjid dt;
run;

--------------------------
The hash OUTPUT method will overwrite a SAS data set, but not append. That can be costly. Consider voting for Add a HASH object method which would append a hash object to an existing SAS data set

Would enabling PROC SORT to simultaneously output multiple datasets be useful? Then vote for
Allow PROC SORT to output multiple datasets

--------------------------

Create dummy rows Based on Data using 'Do Loops'

Re: Create dummy rows Based on Data using 'Do Loops'

Re: Create dummy rows Based on Data using 'Do Loops'

Re: Create dummy rows Based on Data using 'Do Loops'

Re: Create dummy rows Based on Data using 'Do Loops'

Re: Create dummy rows Based on Data using 'Do Loops'

Re: Create dummy rows Based on Data using 'Do Loops'

Re: Create dummy rows Based on Data using 'Do Loops'

Re: Create dummy rows Based on Data using 'Do Loops'

Re: Create dummy rows Based on Data using 'Do Loops'

Re: Create dummy rows Based on Data using 'Do Loops'

Re: Create dummy rows Based on Data using 'Do Loops'

Catch up on SAS Innovate 2026

Catch up on SAS Innovate 2026

SAS Training: Just a Click Away