Array help

Contributor
Posts: 57

I have a data set, test, and I am trying to find the number of months it takes for an ID to change status (between 1 and 0).

For example, for ID = XX, I want to create the variables M1, M2, M3, ..., M12,

where M1 = number of times the gap between a status change and the previous status change is 1 month,

M2 = number of times the gap between a status change and the previous status change is 2 months,

and so on. Please advise.

 

data test;
input ID $2. mth1 mth2 mth3 mth4 mth5 mth6 mth7 mth8 mth9 mth10 mth11 mth12;
infile datalines;
datalines;
XX 1 1 0 1 1 1 1 1 0 0 0 1
YY 0 0 0 1 1 1 1 0 1 0 0 0
ZZ 0 0 0 1 1 1 1 0 1 0 0 0
WW 0 0 1 1 0 0 0 0 1 0 0 0
PP 1 1 1 0 0 0 0 1 0 1 1 1
;

PROC PRINT;
RUN;

 

Super User
Posts: 5,516

Re: Array help

Posted in reply to Siddharth123

Best advice: give an example of what the result should look like. It's a little tough to figure out from your description.

SAS Super FREQ
Posts: 3,755

Re: Array help

Posted in reply to Siddharth123

I predict that someone will answer this soon, but I wanted to mention that this analysis is closely related to the "Runs Test". There are many ways to implement the Runs Test in SAS, including the DATA step. I wrote an article about how to count the number of runs in a binary sequence, but it uses SAS/IML whereas I assume you want to use the DATA step. Depending on what you are trying to accomplish, the articles that I linked to might help you analyze your data.
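As a rough illustration of counting runs in the DATA step (not the SAS/IML approach from the article), here is a minimal sketch. It assumes the TEST data set from the original post, with no missing values in mth1-mth12. The names RUNS and N_RUNS are made up for this example.

```sas
/* Sketch: count the runs (maximal blocks of equal values) in each row of TEST. */
/* RUNS and N_RUNS are hypothetical names chosen for this illustration.         */
data runs;
    set test;
    array k[*] mth1-mth12;
    n_runs = 1;                          /* the first month always starts a run */
    do i = 2 to dim(k);
        /* every change of status starts a new run */
        if k[i] ne k[i-1] then n_runs = n_runs + 1;
    end;
    keep ID n_runs;
run;
```

For ID XX (1 1 0 1 1 1 1 1 0 0 0 1) the runs are 11, 0, 11111, 000 and 1, so N_RUNS would be 5.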

Occasional Contributor
Posts: 19

Re: Array help

Posted in reply to Siddharth123

Hi

You really need to provide an example of the results you expect to see, to help the community provide an answer.

However, I have a potential solution that should work based on your needs, and I have added plenty of comments.

If you have any questions then drop me a message.

Cheers

Chris


data have /* (drop = gap: i) */;

/*

There are four possible solutions, depending on the answers to these two questions
(both illustrated using the first row of your input, ID XX, as the example):

Q1. Does the month of status change count as a gap of 0 or 1?
In the example mth2 = 1 and mth3 = 0, so the status changes in mth3, but is this gap counted as 0 or 1?

Q2. Do you want to count the intermediate gap values or not?
In example, mth3 = 0, mth4 to mth8 all = 1, so assuming month of
status change counts as 1, gap @ mth4 = 1, @ mth5 = 2, @ mth6 = 3, @ mth7 = 4 & @mth8 = 5.
So do you want to count the gap for mth4 and mth5 and mth6 and mth7 and mth8, or just mth8?

If answer to Q1 is 0, change the two lines with comment <=== Q1

If answer to Q2 is count only maximum gap for a sequence uncomment the line with comment <===Q2

*/

infile datalines;
input ID $2. mth1-mth12;

* define array to hold the mths data as read in (makes it easier to process);
array mth(12);
* define array to hold the calculated gaps;
array gap(12);
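* define array to hold the count of the different gaps as calculated;
array m(12) (12 * 0);
* initial values in an ARRAY statement imply RETAIN so reset the counters for each row;
* without this reset m1-m12 would accumulate across IDs;
do i = 1 to 12;
    m(i) = 0;
end;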

* gap(1) will always be the same value on each record as no prior month to compare against;
gap(1) = 1; /* <=== Q1 */

* work through the rest of months and calculate the gaps;
* if change from previous month then set gap to 1 (or 0 if change counts as a gap of 0 and gap(1) is set to 0);
* if no change add 1 to previous gap;
* if you only want to count the maximum gap between status changes then reset the previous gap to 0;
* e.g. in row with ID XX, mth8 has a gap of 5 months since previous change;

do i = 2 to 12;
    if mth(i) ne mth(i-1) then
        gap(i) = 1; /* <=== Q1 */
    else do;
        gap(i) = gap(i-1) + 1;
        *gap(i - 1) = 0; /* <=== Q2 */
    end;
end;

* cycle through gaps and increment appropriate gap counter;
* add one to the appropriate gap counter, e.g. if value of gap8 is 5 add 1 to m5;
do i = 1 to 12;
    if gap(i) gt 0 then
        m(gap(i)) + 1;
end;

datalines;
XX 1 1 0 1 1 1 1 1 0 0 0 1
YY 0 0 0 1 1 1 1 0 1 0 0 0
ZZ 0 0 0 1 1 1 1 0 1 0 0 0
WW 0 0 1 1 0 0 0 0 1 0 0 0
PP 1 1 1 0 0 0 0 1 0 1 1 1
;

Super Contributor
Posts: 298

Re: Array help

Posted in reply to Siddharth123

Here is one way. Look at the data set WANT. STATUS takes the value 0 or 1, and COUNT is the number of consecutive months spent at that STATUS before the next change.

I am not sure how you want M1, M2, ... to be filled. Show how you want them derived from the WANT data set.

 

data test;
input ID $2. mth1 mth2 mth3 mth4 mth5 mth6 mth7 mth8 mth9 mth10 mth11 mth12;
infile datalines;
datalines;
XX 1 1 0 1 1 1 1 1 0 0 0 1
YY 0 0 0 1 1 1 1 0 1 0 0 0
ZZ 0 0 0 1 1 1 1 0 1 0 0 0
WW 0 0 1 1 0 0 0 0 1 0 0 0
PP 1 1 1 0 0 0 0 1 0 1 1 1
;
run;

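data want;
      set test ;
      array k[*] mth1 - mth12;
      status = .;
      count = 0;
      /* compare each month with the previous one */
      do i = 2 to dim(k);
         /* same status: the current run gets one month longer */
         if k[i] = k[i-1] then count + 1;
         else do;
            /* status changed: close the run that ended at month i-1 */
            count + 1;
            status = k[i - 1];
            output;
            /*put ID =  k[i-1] = count =; */
            count = 0;
         end;
      end;
      /* the final run is not output because no change follows it */
keep ID status count;
run;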
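
If the goal is one row per ID with M1-M12 holding how many runs of each length occurred, one possible way to tabulate WANT is sketched below. This is only one interpretation of the request. It assumes WANT is still in the order produced above (all rows of an ID together), and the names M_COUNTS and J are made up for this example.

```sas
/* Sketch: turn the run lengths in WANT into counters M1-M12, one row per ID. */
/* M_COUNTS and J are hypothetical names chosen for this illustration.        */
data m_counts;
    set want;
    by ID notsorted;                 /* rows of an ID are adjacent but not sorted */
    array m[12] m1-m12;
    retain m1-m12;                   /* keep the counters across the rows of an ID */
    if first.ID then do j = 1 to 12;
        m[j] = 0;                    /* start every ID from zero */
    end;
    m[count] = m[count] + 1;         /* COUNT is the length of the run just closed */
    if last.ID then output;          /* write one summary row per ID */
    keep ID m1-m12;
run;
```

For ID XX, WANT holds runs of length 2, 1, 5 and 3, so M1, M2, M3 and M5 would each be 1 and the remaining counters 0. An ID whose status never changes has no rows in WANT and therefore would not appear in M_COUNTS.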