counting distinct episodes

Accepted Solution Solved
Reply
Occasional Contributor
Posts: 19
Accepted Solution

counting distinct episodes

Hello all,

 

I'm stuck and my google searches are spiraling out of control.  I need to count the distinct episodes (and number of days in each episode) in my dataset.  What I have so far looks like this:

 

data have;

      input scrssn drugclass $ day1 day2 day3 day4 day5 day6 day7;

      datalines ;

      1 CS200 1 1 1 0 0 2 2

      1 CB300 0 0 0 1 1 1 1

      1 BB100 1 1 1 1 1 0 0

      2 CC600 0 0 0 0 0 0 1

      2 FR900 1 1 1 2 2 2 2

      2 GE200 1 1 2 2 1 1 1

run;

 

What I want is the number of days in each discinct episode, where episode is any block of contiguous number of days >0 (without changes).  So for the above dataset I want:

 

data want;

      input scrssn drugclass episode ndays;

      datalines ;

      1 CS200 1 3

      1 CS200 2 2

      1 CB300 1 4

      1 BB100 1 5

      2 CC600 1 1

      2 FR900 1 3

      2 FR900 2 4

      2 GE200 1 2

      2 GE200 2 2

      2 GE200 3 3

run;

 

Any ideas??

 

Much thanks,

Kelly


Accepted Solutions
Solution
‎10-15-2015 09:07 AM
Respected Advisor
Posts: 4,606

Re: counting distinct episodes

One of many ways to do this:

 


proc transpose data=have out=temp;
by scrssn drugclass notsorted;
var day1-day7;
run;

data want;
do ndays = 1 by 1 until (last.col1);
    set temp; by scrssn drugclass col1 notsorted;
    if first.drugclass then episode = 0;
    end;
if col1 > 0 then do;
    episode + 1;
    output;
    end;
keep scrssn drugclass episode ndays;
run;
PG

View solution in original post


All Replies
Solution
‎10-15-2015 09:07 AM
Respected Advisor
Posts: 4,606

Re: counting distinct episodes

One of many ways to do this:

 


proc transpose data=have out=temp;
by scrssn drugclass notsorted;
var day1-day7;
run;

data want;
do ndays = 1 by 1 until (last.col1);
    set temp; by scrssn drugclass col1 notsorted;
    if first.drugclass then episode = 0;
    end;
if col1 > 0 then do;
    episode + 1;
    output;
    end;
keep scrssn drugclass episode ndays;
run;
PG
Valued Guide
Posts: 763

Re: counting distinct episodes

Hi, another approach that uses a transposed data set ...

 

proc transpose data=have out=temp (where=(col1 ne 0) drop=_: );
by scrssn drugclass notsorted;
var day1-day7;
run;

 

data want (keep=scrssn drugclass ndays episode);

do episode = 1 by 1 until (last.drugclass);
do ndays   = 1 by 1 until (last.col1);
  set temp; 
  by scrssn drugclass col1 notsorted;
end;
  output;
end;
run;

Respected Advisor
Posts: 4,972

Re: counting distinct episodes

Using an array should make this feasible.  See how close this comes ...

 

data want;

set have;

array day {7};

episode=day1;

ndays=1;

do i=2 to 7;

   if day{i} ne episode then do;

      if episode > 0 then output;

      ndays=1;

      episode=day{i};

   end;

   else ndays + 1;

end;

if episode > 0 then output;

run;

Valued Guide
Posts: 763

Re: counting distinct episodes

[ Edited ]

Hi, another idea with some explanation ...

 

#1  concatenate all the non-zero values of day1-day7 and add an 'X' to the end of the string  (DAYS, a character variable)

#2  use a loop to search the string DAYS

#3  start the search at position 1 in the string and look for the first character in the string

#4  search for the first NON-OCCURENCE of the first character (use the modifier 'k' in the FINDC function)

#5  use the value of START to compute NDAYS and output an observation

#6  repeat steps 3 through 5, but start the search at at a new location and search for a different value

#7  stop searching when you encounter the 'X' at the end of the string 

 

data have;
input scrssn drugclass :$5. day1-day7;
datalines ;
1 CS200 1 1 1 0 0 2 2
1 CB300 0 0 0 1 1 1 1
1 BB100 1 1 1 1 1 0 0
2 CC600 0 0 0 0 0 0 1
2 FR900 1 1 1 2 2 2 2
2 GE200 1 1 2 2 1 1 1
;


data want (keep=scrssn drugclass episode ndays);
length days $10;
set have;
days  = compress(catt(of day:,'X' ),'0');
start = 1;
hold  = 1;
do episode=1 by 1 until(char(days,start) eq 'X');
   start = findc(days,char(days,start),'k',start);
   ndays = start - hold;
   hold  = start;
   output;
end;
run;

 

data set WANT ...

Obs    scrssn    drugclass    episode    ndays

  1       1        CS200         1         3
  2       1        CS200         2         2
  3       1        CB300         1         4
  4       1        BB100         1         5
  5       2        CC600         1         1
  6       2        FR900         1         3
  7       2        FR900         2         4
  8       2        GE200         1         2
  9       2        GE200         2         2
 10       2        GE200         3         3

Occasional Contributor
Posts: 19

Re: counting distinct episodes

These are all great, thank you SO much.  I will try all of them (great SAS/coding experience for me)--so far I've used the first code/suggestion with my data and it works perfectly.  So much SAS to learn...

☑ This topic is SOLVED.

Need further help from the community? Please ask a new question.

Discussion stats
  • 5 replies
  • 394 views
  • 4 likes
  • 4 in conversation