BookmarkSubscribeRSS Feed
☑ This topic is solved. Need further help from the community? Please sign in and ask a new question.
SASuserlot
Barite | Level 11

I have a scenario, where I need to create a dummy row based on the data. In the following scenario. there are four subjects who need to take four medications on the first day of every month. However, some subjects missing the medication. Only one subject (104) took all four medications in all four months and have the result. So how can I create the missing medications in each month for all subjects who missed and make the result as '.'(missing) using the do loop and based on the min and max of the 'dt' variable?

 

How I want the result ( Ex: I manually created for 01JAN1985 in the image, I am looking to achieve the same for all the dates, I think the final data should have 64 observations!)

 

 

Thank you for your inputs.

 

How I want.

SASuserlot_0-1689815554872.png

 

data have;
input subjid dt:ddmmyy10.  Med result;
format dt date9.;
cards;
101 01/01/1985 1 10
104 01/01/1985 1 10
104 01/01/1985 2 10
104 01/01/1985 3 10
104 01/01/1985 4 10
102 01/02/1985 2 10
104 01/02/1985 1 10
104 01/02/1985 2 10
104 01/02/1985 3 10
104 01/02/1985 4 10
103 01/03/1985 3 10
104 01/03/1985 1 10
104 01/03/1985 2 10
104 01/03/1985 3 10
104 01/03/1985 4 10
104 01/04/1985 1 10
104 01/04/1985 2 10
104 01/04/1985 3 10
104 01/04/1985 4 10

;
run;

 

 

1 ACCEPTED SOLUTION

Accepted Solutions
Tom
Super User Tom
Super User

If you have the distinct set of dates and drugs and ids you can easily generate the cartesian product of the three sets with SQL code. Then just merge that back with what you have.

proc sql ;
create table skeleton as
  select subjid,dt,med 
  from subjects, dates, meds
  order by subjid,dt,med
;
quit;
data want;
  merge skeleton have;
  by subjid dt med;
run;

Do you know which months are missing?  If not is ok to just use the distinct dates in the data?

Do you know which drugs are missing?  If not is ok to just use the distinct drugs in the data?

Do you know which subjects are missing?  If not is ok to just use the distinct subjects in the data?

proc sql ;
create table skeleton as
  select subjid,dt,med 
  from (select distinct subjid from have) subjects
     , (select distinct dt from have) dates
     , (select distinct med from have) meds
  order by subjid,dt,med
;
quit;

 

 

 

View solution in original post

10 REPLIES 10
whymath
Lapis Lazuli | Level 10

May SQL be the solution? It doesn't use "Do Loop" though.

proc sql noprint;
  create table want as 
  select a.subjid, b.dt, c.med, d.result
  from (select distinct subjid from have) as a
  cross join (select distinct dt from have) as b
  cross join (select distinct med from have) as c
  left join have as d on
    a.subjid=d.subjid and b.dt=d.dt and c.med=d.med
  ;
quit;
ballardw
Super User

You don't show any result for the months without any and would require some additional information. Such as are you sure you always have some med taken in the first month???

 

This comes close to matching what you do show though it includes the missing months as I understand the question:

data have;
input subjid dt:ddmmyy10.  Med result;
format dt date9.;
cards;
101 01/01/1985 1 10
104 01/01/1985 1 10
104 01/01/1985 2 10
104 01/01/1985 3 10
104 01/01/1985 4 10
102 01/02/1985 2 10
104 01/02/1985 1 10
104 01/02/1985 2 10
104 01/02/1985 3 10
104 01/02/1985 4 10
103 01/03/1985 3 10
104 01/03/1985 1 10
104 01/03/1985 2 10
104 01/03/1985 3 10
104 01/03/1985 4 10
104 01/04/1985 1 10
104 01/04/1985 2 10
104 01/04/1985 3 10
104 01/04/1985 4 10

;
run;
/* needs to be in order later for merging*/
proc sort data=have;
   by subjid dt med;
run;
/* get first month present for each subjid*/
proc summary data=have nway;
   class subjid;
   var dt;
   output out=need (drop=_type_ _freq_) min=;
run;
/* add med and increment month values*/
data tomerge;
   set need;
   do i=0 to 3;
      /* if your med values in reality are not actually 
         nice integers provide a list of the values
         in MED SORT ORDER (SEE THE SORT ABOVE)
         such as med='medabc','medjkl','medmno','medpdq';
      */
      do med=1 to 4;
         output;
      end;
      dt=intnx('month',dt,1,'b');
   end;
   drop i;
run;

data want;
   merge tomerge
         have
   ;
   by subjid dt med;
run;

Basically the plan is build as set with the needed dates and med values for each subjid and merge to bring in the Result value.

 

 

whymath
Lapis Lazuli | Level 10

In FREQ procedure, there is an option "SPARSE" can generate the cartesian product, too.

proc freq data=have noprint;
  tables subjid*dt*Med/sparse out=want;
run;

data want(drop=COUNT PERCENT);
  set want;
  if count=1 then result=10;  *If result is always 10;
  else result=.;
run;
SASuserlot
Barite | Level 11

Results are NOT always 10. Sorry for the confusion.

Tom
Super User Tom
Super User

If you have the distinct set of dates and drugs and ids you can easily generate the cartesian product of the three sets with SQL code. Then just merge that back with what you have.

proc sql ;
create table skeleton as
  select subjid,dt,med 
  from subjects, dates, meds
  order by subjid,dt,med
;
quit;
data want;
  merge skeleton have;
  by subjid dt med;
run;

Do you know which months are missing?  If not is ok to just use the distinct dates in the data?

Do you know which drugs are missing?  If not is ok to just use the distinct drugs in the data?

Do you know which subjects are missing?  If not is ok to just use the distinct subjects in the data?

proc sql ;
create table skeleton as
  select subjid,dt,med 
  from (select distinct subjid from have) subjects
     , (select distinct dt from have) dates
     , (select distinct med from have) meds
  order by subjid,dt,med
;
quit;

 

 

 

whymath
Lapis Lazuli | Level 10
I think you've thought it through.
Tom
Super User Tom
Super User

If you have the list of dates and drugs you can also use DO loops to generate the skeleton dataset.

data skeleton;
  set have;
  by subjid;
  if first.subjid;
  do dt='01AN2020'd,'01MAR2020'd,.....;
     do med='Drug1','Drug2',..... ;
        output;
     end;
  end;
  keep subjid dt med;
run;
SASuserlot
Barite | Level 11

Thank you for taking time on this. I do have the dates, but the problem it increases/ changes every time new data comes in. Is there any way I can create the minimum and maximum date and create a macro variable to use it like "do = &mindt to &maxdt". I only know the &mindt  will be the first day of any month and Year, and the same applies to the &maxdt.

Tom
Super User Tom
Super User

To increment dates by MONTH (or really any date/time/datetime interval that is not the base unit used to store the values days/seconds/seconds) use an OFFSET integer and INTNX() function.

do offset=0 to intck('month',&mindt,&maxdt);
  dt = intnx('month',&mindt,offset);
  ...
end;
mkeintz
PROC Star

Your sample data are sorted by DT/SUBJID, and you want data expanded and sorted by SUBJID/DT:

 

data have;
  input subjid dt:ddmmyy10.  Med result;
  format dt date9.;
cards;
101 01/01/1985 1 10
104 01/01/1985 1 10
104 01/01/1985 2 10
104 01/01/1985 3 10
104 01/01/1985 4 10
102 01/02/1985 2 10
104 01/02/1985 1 10
104 01/02/1985 2 10
104 01/02/1985 3 10
104 01/02/1985 4 10
103 01/03/1985 3 10
104 01/03/1985 1 10
104 01/03/1985 2 10
104 01/03/1985 3 10
104 01/03/1985 4 10
104 01/04/1985 1 10
104 01/04/1985 2 10
104 01/04/1985 3 10
104 01/04/1985 4 10
run;

data need /view=need;
  set have;
  by dt subjid;
  array reslts {4} _temporary_;
  if first.subjid then call missing(of reslts{*});
  reslts{med}=result;
  if last.subjid then do med=1 to 4;
    result=reslts{med};
    output;
  end;
run;

proc sort data=need out=want;
  by subjid dt;
run;
--------------------------
The hash OUTPUT method will overwrite a SAS data set, but not append. That can be costly. Consider voting for Add a HASH object method which would append a hash object to an existing SAS data set

Would enabling PROC SORT to simultaneously output multiple datasets be useful? Then vote for
Allow PROC SORT to output multiple datasets

--------------------------

SAS Innovate 2025: Call for Content

Are you ready for the spotlight? We're accepting content ideas for SAS Innovate 2025 to be held May 6-9 in Orlando, FL. The call is open until September 25. Read more here about why you should contribute and what is in it for you!

Submit your idea!

How to Concatenate Values

Learn how use the CAT functions in SAS to join values from multiple variables into a single value.

Find more tutorials on the SAS Users YouTube channel.

Click image to register for webinarClick image to register for webinar

Classroom Training Available!

Select SAS Training centers are offering in-person courses. View upcoming courses for:

View all other training opportunities.

Discussion stats
  • 10 replies
  • 2329 views
  • 6 likes
  • 5 in conversation