BookmarkSubscribeRSS Feed
hwangnyc
Quartz | Level 8

Hi everyone, I'm looking for a way to clean up this code. I'd like to create several binary variables, one for each month dating back to November 2007 to today. At the moment, I need to change the 3 dates per variable and rename the variable twice. Is there anyway I could streamline this process? Thanks in advance!


if assessmentdate LT '01dec2007'd and (programtypedate GE '01jan2008'd or casecloseddate GE '01jan2008'd or casedispositionid = 1) then Nov07 = 1; else Nov07 =0;


if assessmentdate LT '01jan2008'd and (programtypedate GE '01feb2008'd or casecloseddate GE '01feb2008'd or casedispositionid = 1) then Dec07 = 1; else dec07 =0;

7 REPLIES 7
Reeza
Super User
An array is the only method that springs to mind. You need to provide some sample data and more examples of your rules though. You don't state how you're using it afterward, that may be worth sharing, since there may be an easier way than creating a bunch of indicator variables.
Astounding
PROC Star

Here's an abbreviated version, using just 6 months.  Obviously, you will need to expand the list of variables to cover 10 years.  Equally obvious, I guess, is that you will need to learn the INTNX function:

 

data want;

set have;

array months {6} Nov07 Dec07 Jan08 Feb08 Mar08 Apr08;

do j=1 to 6;

   if assessmentdate < intnx('month', '15nov2007'd, j) and

   ( programtypedate ge intnx('month', '15nov2007'd, j+1) or

     casecloseddate ge intnx('month', '15nov2007'd, j+1) or

     casedispositionid = 1)

   then months{j} = 1;

   else months{j} = 0;

end;

run;

 

So adding another 9+ years of data doesn't significantly lengthen the program, except for the ARRAY statement elements.

 

It's untested code, so let me know if there's anything that needs to be tweaked.

mkeintz
PROC Star

Given the program fragment you provide, if you have

  ASSESSMENTDATE='01OCT2007'd
  PROGRAMTYPEDATE='01MAR2008'd

  CASEDISPOSITIONID=1

then BOTH nov07 and dec07 = 1

 

Does your rule intend to permit more than one dummy=1 in a single record?

--------------------------
The hash OUTPUT method will overwrite a SAS data set, but not append. That can be costly. Consider voting for Add a HASH object method which would append a hash object to an existing SAS data set

Would enabling PROC SORT to simultaneously output multiple datasets be useful? Then vote for
Allow PROC SORT to output multiple datasets

--------------------------
ballardw
Super User

You might describe what you will actually do with the result. Sometimes adding all of those variables isn't actually needed. Not to mention that rerunning the process later could mean that your data set structure and other programming has to accommodate the additions.

hwangnyc
Quartz | Level 8

Hi everyone, Thanks for all of the responses.

 

I have data dating back to late 2007 and what I want to know is if a case was active or not active at any given month.

 

The variables: 

assessmentdate - is the date the when the case entered the program. 

casecloseddate - is the date the when the case excited the program. 

casedispostionid - is a categorical variable indiciating the status of the case i.e. active, lost to follow up, graduated. 

Programtypedate - was added recently to replacecasecloseddate 

 

Any suggestions would be great! Thank you!

mkeintz
PROC Star

As @Reeza said, this is an array problem.  Since you have not provided sample input and corresponding desired output, the program below is based on conjecture:

 

data have;
  input (assessmentdate programtypedate casecloseddate) (:date9.)  casedispositionid;
  format assessmentdate programtypedate casecloseddate date9. ;
datalines;
01aug1987 . . 1
01dec1987 . . 1
30nov1987 . . 1
01oct2017 . . 1
30sep2017 . . 1
01aug1987 01oct1988 . 0
01aug1987 .     01oct1988 0
01aug1987 30sep1988 . 0
01aug1987 .     30sep1988 0
01aug1997 01oct1998 . 0
01aug1997 .     01oct1998 0
01aug1997 30sep1998 . 0
01aug1997 .     30sep1998 0
run;


data want;
  set have;

  /* Array of monthly dummies starting with Y1987_11 (Nov 1987), ending with Y2017_10 (Oct 2017) */
  array dummies {*}                    Y1987_11-Y1987_12 Y1988_01-Y1988_12 Y1989_01-Y1989_12
   Y1990_01-Y1990_12 Y1991_01-Y1991_12 Y1992_01-Y1992_12 Y1993_01-Y1993_12 Y1994_01-Y1994_12
   Y1995_01-Y1995_12 Y1996_01-Y1996_12 Y1997_01-Y1997_12 Y1998_01-Y1998_12 Y1999_01-Y1999_12

   Y2000_01-Y2000_12 Y2001_01-Y2001_12 Y2002_01-Y2002_12 Y2003_01-Y2003_12 Y2004_01-Y2004_12
   Y2005_01-Y2005_12 Y2006_01-Y2006_12 Y2007_01-Y2007_12 Y2008_01-Y2008_12 Y2009_01-Y2009_12

   Y2010_01-Y2010_12 Y2011_01-Y2011_12 Y2012_01-Y2012_12 Y2013_01-Y2013_12 Y2014_01-Y2014_12
   Y2015_01-Y2015_12 Y2016_01-Y2016_12 Y2017_01-Y2017_10
   ;

  array adates{600} _temporary_     /* Monthly dummy dates, for up to 50 years following NOV1987*/;
  if _n_=1 then do I=1 to dim(dummies)+1;
    adates{I}=intnx('month','01nov1987'd,I,'begin');
  end;

  do I=dim(dummies) to 1 by -1 while(assessmentdate<adates{I});
    if casedispositionid=1 then dummies{I}=1;
    else if max(programtypedate,casecloseddate)>=adates{I+1} then dummies{I}=1;
    else dummies{I}=0;
  end;
  if I>0 then do J=1 to I;
    dummies{J}=0;
  end;
  drop I J;
run;

 

This uses to corresponding arrays: DUMMIES (one per month starting in NOV1987) and an array of target ADATES, where the i'th element of ADATES is the first of the month following the month for i'th dummy.

 

This loop starts from the upper bound of DUMMIES, and it relies on the fact that you can start at the target assessment date (in array ADATES) corresponding to Y2017_10 and step backwards in time.  At each iteration of a qualifying target assessment date, test the other vars to determine the value of the corresponding dummy. 

 

But once assessment date is no longer less than a given ADATES element, all preceding dummies must be zero.  That's the second DO loop.

--------------------------
The hash OUTPUT method will overwrite a SAS data set, but not append. That can be costly. Consider voting for Add a HASH object method which would append a hash object to an existing SAS data set

Would enabling PROC SORT to simultaneously output multiple datasets be useful? Then vote for
Allow PROC SORT to output multiple datasets

--------------------------
s_lassen
Meteorite | Level 14

@hwangnyc:

Another possibility is to create the wanted data in long format first, and then transpose:

data long;
  set have;
  datevar='01nov2007'd;
  do while(datevar<='01nov2017'd);
    month_id=put(datevar,monyy5.);
    next_date=intnx('month',datevar,1,'B');
    result=assessmentdate LT datevar and (programtypedate GE next_date or casecloseddate GE next_date or casedispositionid = 1);
    output;
    datevar=next_date;
    end;
  drop datevar next_date;
run;

proc transpose data=long out=want;
  by assessmentdate programtypedate casecloseddate casedispositionid notsorted;
  var result;
  id month_id;
run;

 Just remember to include any other variables that you want on the output in the BY statement.

SAS Innovate 2025: Save the Date

 SAS Innovate 2025 is scheduled for May 6-9 in Orlando, FL. Sign up to be first to learn about the agenda and registration!

Save the date!

How to Concatenate Values

Learn how use the CAT functions in SAS to join values from multiple variables into a single value.

Find more tutorials on the SAS Users YouTube channel.

SAS Training: Just a Click Away

 Ready to level-up your skills? Choose your own adventure.

Browse our catalog!

Discussion stats
  • 7 replies
  • 1108 views
  • 0 likes
  • 6 in conversation