DATA Step, Macro, Functions and more

Not sure if I should use ARRAY and/or TRANSPOSE

Accepted Solution Solved
Reply
Contributor
Posts: 45
Accepted Solution

Not sure if I should use ARRAY and/or TRANSPOSE

Hi Everyone!

 

I have data showing every month a member is enrolled in a program. A person can be continuously enrolled, but most often a member is in and out of the program. The first observation for every member indicates the first date the member is enrolled in the program. Here is an example of the data:

 

ID

SEX

Eligible_DT

1A

F

10/01/2010

1A

F

11/20/2010

1A

F

12/05/2010

1B

M

01/01/2011

1B

M

03/06/2011

1B

M

05/12/2011

1B

M

06/14/2011

1B

M

10/15/2011

1B

M

11/27/2011

1C

F

09/16/2011

1C

F

11/30/2011

1D

M

10/01/2010

Let's call this dataset A

 

What I want to do is create dummy variables from the start date of my analysis to the last date of my analysis (Oct2009 - Sep2015). However, for the dummy variable dates, I only want MonthYear dummy variables (72 dummy variables in total: FORMAT 200910 - 201509). I did an ARRAY but the code I used created dummy variables for every single day between Oct2009 - Sep2015 instead of every single month. This resulted in thousands of dummy variables for the number of days between Oct2009 - Sep2015.

 

Here is the code I used:

 

PROC SQL;

      SELECT MIN(Eligible_DT) FORMAT YYMMN6., MAX(Eligible_DT) FORMAT YYMMN6. INTO :FIRST_DATE, :LAST_DATE

      FROM A;

QUIT;

 

DATA B1;

      SET A;

            BY ID SEX;

            ARRAY DATE(&FIRST_DATE : &LAST_DATE));

RUN;

 

 

My second try used the TRANSPOSE; here is the code:

 

PROC SORT DATA=A;

      BY ID SEX ELIGIBLE_DT;

RUN;

 

PROC TRANSPOSE DATA=A OUT=B2 PREFIX=DATE;

      BY ID SEX;

      VAR ID;

      ID ELIGIBLE_DT;

FORMAT ELIGIBLE_DT YYMMN6.;

RUN;

 

This gets me really close to what I want; however, the MonthYear dummy variables do not follow chronological order for the whole dataset (instead, it follows chronological order per member). I would like for the MonthYear dummy variables to go in chronological order for the whole dataset. Also, I would like SAS to input certain values for each dummy MonthYear depending on a certain condition of the member.

 

For every member, I want to assign "." for people not yet enrolled in the program (my analysis starts before some people enroll), "1" for people enrolled in the program at that particular month in time, and "0" if the person has been incorporated into the analysis but not enrolled in the program at that particular month.

 

Taking dataset A, here's how I want the final dataset to look like (note, it does not include all the dummy MonthYear variables of my study because I can't fit them all in here):

 

ID

SEX

DATE 2010 10

DATE 2010 11

DATE 2010 12

DATE 2011 01

DATE 2011 02

DATE 2011 03

DATE 2011 04

DATE 2011 05

DATE 2011 06

DATE 2011 07

DATE 2011 08

DATE 2011 09

DATE 2011 10

DATE 2011 11

1A

F

1

1

1

0

0

0

0

0

0

0

0

0

0

0

1B

M

.

.

.

1

0

1

0

1

1

0

0

0

1

1

1C

F

.

.

.

.

.

.

.

.

.

.

.

1

0

1

1D

M

1

0

0

0

0

0

0

0

0

0

0

0

0

0

 

 

I've been trying to figure this out all day and just can't seem to overcome the hurdle on my own.

 

I am using SAS 9.4

 

Muchas Gracias


Accepted Solutions
Solution
‎02-17-2017 06:37 PM
Trusted Advisor
Posts: 1,022

Re: Not sure if I should use ARRAY and/or TRANSPOSE

I don't know if my datast NEED is exactly the dummy set Peter had in mind, but I think it is transpose-friendly.

 

The idea is to make dataset need as 1 record for each month from OCT2009 through SEP2015 with the dummy var set to one for months in which eligible_dt occurs.  The "trick" is to use standard dates - always the first of the month for managing the calendar.

 

The other trick is to loop date variable d from the lagged value of eligible_dt to the month prior to the current eligible_dt, outputting a record each time.

 

data have;
  input ID :$2. SEX :$1. Eligible_DT :mmddyy10.;
  format eligible_dt date9.;
datalines;
1A F 10/01/2010
1A F 11/20/2010
1A F 12/05/2010
1B M 01/01/2011
1B M 03/06/2011
1B M 05/12/2011
1B M 06/14/2011
1B M 10/15/2011
1B M 11/27/2011
1C F 09/16/2011
1C F 11/30/2011
1D M 10/01/2010
run;

data need (keep=id sex monthname dummy d);
  set have;
  by id;
  if eod2=0 then set have (firstobs=2 keep=eligible_dt rename=(eligible_dt=nxtdt)) end=eod2;

  d1=intnx('month',eligible_dt,0,'beg');

  dummy=0;
  do d= ifn(first.id,'01oct2009'd,intnx('month',lag(d1),1,'beg')) by 0 while (d<d1);
    monthname=put(d,yymon7.);
    output;
    d=intnx('month',d,1,'beg');
  end;

  dummy=1; d=d1; monthname=put(d,yymon7.); output;

  dummy=0;
  if last.id then do d=  intnx('month',d1,1,'beg') by 0 while (d<='01sep2015'd);
    monthname=put(d,yymon7.);
    output;
    d=intnx('month',d,1,'beg');
  end;

  format d yymmddn8.;
run;


proc transpose data=need out=want (drop=_name_);
  by id sex ;
  var dummy;
  id monthname;
run;

 

 

Note the expressions like

   do d=xxx by 0 while (d<yyyy);

 

are identical to

   d=xxx;

   do while (d<yyyy);

View solution in original post


All Replies
Valued Guide
Posts: 2,177

Re: Not sure if I should use ARRAY and/or TRANSPOSE

Posted in reply to TXSASneophyte
I would work on generating the column names for an array.
However, if you prefer the transpose approach then prefix your data with a dummy set (ID missing) with all the dates you want in the result. PROC TRANSPOSE will use this prefix data to organise the column column order and the following, proper data, will be placed where you want it. Apply a WHERE clause on the OUT= data to remove the dummy row with missing ID
Contributor
Posts: 45

Re: Not sure if I should use ARRAY and/or TRANSPOSE

Thanks for the response! Unfortunately I have no clue what you just said, and definitely no idea how to execute what you're suggesting. 

Contributor
Posts: 45

Re: Not sure if I should use ARRAY and/or TRANSPOSE

Posted in reply to TXSASneophyte

I think I understand what you mean but can't seem to execute it correctly. What exactly goes in the prefix? And what do i put in the where statement? 

Solution
‎02-17-2017 06:37 PM
Trusted Advisor
Posts: 1,022

Re: Not sure if I should use ARRAY and/or TRANSPOSE

I don't know if my datast NEED is exactly the dummy set Peter had in mind, but I think it is transpose-friendly.

 

The idea is to make dataset need as 1 record for each month from OCT2009 through SEP2015 with the dummy var set to one for months in which eligible_dt occurs.  The "trick" is to use standard dates - always the first of the month for managing the calendar.

 

The other trick is to loop date variable d from the lagged value of eligible_dt to the month prior to the current eligible_dt, outputting a record each time.

 

data have;
  input ID :$2. SEX :$1. Eligible_DT :mmddyy10.;
  format eligible_dt date9.;
datalines;
1A F 10/01/2010
1A F 11/20/2010
1A F 12/05/2010
1B M 01/01/2011
1B M 03/06/2011
1B M 05/12/2011
1B M 06/14/2011
1B M 10/15/2011
1B M 11/27/2011
1C F 09/16/2011
1C F 11/30/2011
1D M 10/01/2010
run;

data need (keep=id sex monthname dummy d);
  set have;
  by id;
  if eod2=0 then set have (firstobs=2 keep=eligible_dt rename=(eligible_dt=nxtdt)) end=eod2;

  d1=intnx('month',eligible_dt,0,'beg');

  dummy=0;
  do d= ifn(first.id,'01oct2009'd,intnx('month',lag(d1),1,'beg')) by 0 while (d<d1);
    monthname=put(d,yymon7.);
    output;
    d=intnx('month',d,1,'beg');
  end;

  dummy=1; d=d1; monthname=put(d,yymon7.); output;

  dummy=0;
  if last.id then do d=  intnx('month',d1,1,'beg') by 0 while (d<='01sep2015'd);
    monthname=put(d,yymon7.);
    output;
    d=intnx('month',d,1,'beg');
  end;

  format d yymmddn8.;
run;


proc transpose data=need out=want (drop=_name_);
  by id sex ;
  var dummy;
  id monthname;
run;

 

 

Note the expressions like

   do d=xxx by 0 while (d<yyyy);

 

are identical to

   d=xxx;

   do while (d<yyyy);

Contributor
Posts: 45

Re: Not sure if I should use ARRAY and/or TRANSPOSE

Wow this is incredible! For the most part I see what you did (need a bit more time digesting it to fully understand it), but it does most of what I need!! 

 

One quick thing, is there a way to have "." (missing value) for the dummy MonthYear variables before the first enrolled date? Let me explain my rationale for this. For the analysis I want to do, members can be categorized into three groups: A) Member not in the study (which by default means they are not in the program) which I want to code as "." (missing); B) Member is in the study but not enrolled in the program, which I want to code as zero; C) Member is in the study and enrolled in the program, which I want to code as one. Your current code captures groups B and C perfectly! However, it also codes zero in places I want missing values.  Like this, when SAS runs the analysis, it can correctly calculate the results.

 

Sorry for the complexity with all this and do appreciate your help!   

 

 

 

 

 

Contributor
Posts: 45

Re: Not sure if I should use ARRAY and/or TRANSPOSE

Posted in reply to TXSASneophyte

Actually, nevermind! I thought my analysis some more and realized the way you set up the dataset is exactly how I need it! Ignore my message about needing missing values. 

 

Thanks for the help! 

Valued Guide
Posts: 2,177

Re: Not sure if I should use ARRAY and/or TRANSPOSE

Posted in reply to TXSASneophyte
Data prefix ;
Id = '..' ;
Sef
X = 'm' ;
Do eligible_dt = '1Oct2009'd to '29Sep2015'd by 31 ;
Output ;
End ;
Run;
Data combined ;
Set prefix your.data ;
Eligible_dt = intnx( 'month', eligible_dt, 0 );
Run ;

Proc transpose data= combined
Out= bettershape( where=( id ne '..' )) ;
Id id sex ;
Var eligible_dt ;
Run ;

This is not tested and only represents the ideas.
..... designed to indicate building and using a prefix file to control column order in transpose.
Good luck
Valued Guide
Posts: 2,177

Re: Not sure if I should use ARRAY and/or TRANSPOSE

Posted in reply to TXSASneophyte
sorry about my mobile typos
Sef
X = 'm'

Should have been
sex= 'm' ;


peterC
Valued Guide
Posts: 2,177

Re: Not sure if I should use ARRAY and/or TRANSPOSE

Posted in reply to TXSASneophyte
Sounds like I need to look at this tomorrow
Contributor
Posts: 45

Re: Not sure if I should use ARRAY and/or TRANSPOSE

Hey Peter_C, I really appreciate you taking time off your day to help me with this. I did try your code but it didn't run smoothly for me. I suspect there was something I needed to adapt on my end but failed to do so. Since MKeintz's solution works for me, I decided to go ahead and use that solution. You don't have to spend anymore time on this problem unless you really just want to challenge yourself.

 

Thanks for the assistance!

 

 

Super User
Super User
Posts: 7,076

Re: Not sure if I should use ARRAY and/or TRANSPOSE

[ Edited ]
Posted in reply to TXSASneophyte

Bascially take the min/max month for each id and use that to blow up to all possible months. 

First let's start with your sample data and convert the dates you have to the first of the month.

data have ;
  input id :$2. sex $1. eligdt :yymmdd10. @@ ;
  month = intx('month',eligdt,0,'b');
  format eligdt month yymmdd10.;
cards;
1A F 2010-10-01 1A F 2010-11-20 1A F 2010-12-05
1B M 2011-01-01 1B M 2011-03-06 1B M 2011-05-12 
1B M 2011-06-14 1B M 2011-10-15 1B M 2011-11-27
1C F 2011-09-16 1C F 2011-11-30
1D M 2010-10-01
;;;;

Now a quick PROC SUMMARY step will create a table with min/max month for each ID and add an extra row for the overall min/max.

proc summary data=have ;
  class id sex ;
  types () id*sex ;
  var month ;
  output out=ranges min=first_month max=last_month ;
run;

Now add a data step to blow that out into one record for each month in the intervals.

data all_dates ;
  set ranges ;
  do i=0 to intck('month',first_month,last_month) ;
    month = intnx('month',first_month,i,'b');
    format month yymmdd10. ;
    output;
  end;
run;

Then merge it back with the source table so that we can create the flag variable with the .,0,1 values.  We can also convert the MONTH into a value that makes a nice SAS variable name, like YyyyyMmm .

data all ;
  merge all_dates (in=in1) have (in=in2);
  by id month ;
  flag = in2 ;
  length _name_ $8 ;
  _name_ = cats('Y',put(month,yymm7.));
run;

And finally use PROC TRANSPOSE to create the dummy variables, use a WHERE= dataset option to eliminate the dummy row that is setting the overall list of months.

proc transpose data=all out=want(drop=_: where=(not missing(id))) ;
  by id sex ;
  var flag ;
run;
                Y    Y    Y    Y    Y    Y    Y    Y    Y    Y    Y    Y    Y    Y
                2    2    2    2    2    2    2    2    2    2    2    2    2    2
                0    0    0    0    0    0    0    0    0    0    0    0    0    0
                1    1    1    1    1    1    1    1    1    1    1    1    1    1
                0    0    0    1    1    1    1    1    1    1    1    1    1    1
O          s    M    M    M    M    M    M    M    M    M    M    M    M    M    M
b    i     e    1    1    1    0    0    0    0    0    0    0    0    0    1    1
s    d     x    0    1    2    1    2    3    4    5    6    7    8    9    0    1

1    1A    F    1    1    1    .    .    .    .    .    .    .    .    .    .    .
2    1B    M    .    .    .    1    0    1    0    1    1    0    0    0    1    1
3    1C    F    .    .    .    .    .    .    .    .    .    .    .    1    0    1
4    1D    M    1    .    .    .    .    .    .    .    .    .    .    .    .    .

 

Contributor
Posts: 45

Re: Not sure if I should use ARRAY and/or TRANSPOSE

Thanks for the response! I'll try it on Monday when I get back to work to see how it goes!!

☑ This topic is solved.

Need further help from the community? Please ask a new question.

Discussion stats
  • 12 replies
  • 179 views
  • 0 likes
  • 4 in conversation