Solved
Contributor
Posts: 23

# counting distinct episodes

Hello all,

I'm stuck and my google searches are spiraling out of control.  I need to count the distinct episodes (and number of days in each episode) in my dataset.  What I have so far looks like this:

data have;

input scrssn drugclass \$ day1 day2 day3 day4 day5 day6 day7;

datalines ;

1 CS200 1 1 1 0 0 2 2

1 CB300 0 0 0 1 1 1 1

1 BB100 1 1 1 1 1 0 0

2 CC600 0 0 0 0 0 0 1

2 FR900 1 1 1 2 2 2 2

2 GE200 1 1 2 2 1 1 1

run;

What I want is the number of days in each discinct episode, where episode is any block of contiguous number of days >0 (without changes).  So for the above dataset I want:

data want;

datalines ;

1 CS200 1 3

1 CS200 2 2

1 CB300 1 4

1 BB100 1 5

2 CC600 1 1

2 FR900 1 3

2 FR900 2 4

2 GE200 1 2

2 GE200 2 2

2 GE200 3 3

run;

Any ideas??

Much thanks,

Kelly

Accepted Solutions
Solution
‎10-15-2015 09:07 AM
Posts: 5,535

## Re: counting distinct episodes

One of many ways to do this:

``````
proc transpose data=have out=temp;
var day1-day7;
run;

data want;
do ndays = 1 by 1 until (last.col1);
set temp; by scrssn drugclass col1 notsorted;
if first.drugclass then episode = 0;
end;
if col1 > 0 then do;
episode + 1;
output;
end;
run;``````
PG

All Replies
Solution
‎10-15-2015 09:07 AM
Posts: 5,535

## Re: counting distinct episodes

One of many ways to do this:

``````
proc transpose data=have out=temp;
var day1-day7;
run;

data want;
do ndays = 1 by 1 until (last.col1);
set temp; by scrssn drugclass col1 notsorted;
if first.drugclass then episode = 0;
end;
if col1 > 0 then do;
episode + 1;
output;
end;
run;``````
PG
Valued Guide
Posts: 765

## Re: counting distinct episodes

Hi, another approach that uses a transposed data set ...

proc transpose data=have out=temp (where=(col1 ne 0) drop=_: );
var day1-day7;
run;

data want (keep=scrssn drugclass ndays episode);

do episode = 1 by 1 until (last.drugclass);
do ndays   = 1 by 1 until (last.col1);
set temp;
end;
output;
end;
run;

Super User
Posts: 6,781

## Re: counting distinct episodes

Using an array should make this feasible.  See how close this comes ...

data want;

set have;

array day {7};

episode=day1;

ndays=1;

do i=2 to 7;

if day{i} ne episode then do;

if episode > 0 then output;

ndays=1;

episode=day{i};

end;

else ndays + 1;

end;

if episode > 0 then output;

run;

Valued Guide
Posts: 765

## Re: counting distinct episodes

[ Edited ]

Hi, another idea with some explanation ...

#1  concatenate all the non-zero values of day1-day7 and add an 'X' to the end of the string  (DAYS, a character variable)

#2  use a loop to search the string DAYS

#3  start the search at position 1 in the string and look for the first character in the string

#4  search for the first NON-OCCURENCE of the first character (use the modifier 'k' in the FINDC function)

#5  use the value of START to compute NDAYS and output an observation

#6  repeat steps 3 through 5, but start the search at at a new location and search for a different value

#7  stop searching when you encounter the 'X' at the end of the string

data have;
datalines ;
1 CS200 1 1 1 0 0 2 2
1 CB300 0 0 0 1 1 1 1
1 BB100 1 1 1 1 1 0 0
2 CC600 0 0 0 0 0 0 1
2 FR900 1 1 1 2 2 2 2
2 GE200 1 1 2 2 1 1 1
;

data want (keep=scrssn drugclass episode ndays);
length days \$10;
set have;
days  = compress(catt(of day:,'X' ),'0');
start = 1;
hold  = 1;
do episode=1 by 1 until(char(days,start) eq 'X');
start = findc(days,char(days,start),'k',start);
ndays = start - hold;
hold  = start;
output;
end;
run;

data set WANT ...

1       1        CS200         1         3
2       1        CS200         2         2
3       1        CB300         1         4
4       1        BB100         1         5
5       2        CC600         1         1
6       2        FR900         1         3
7       2        FR900         2         4
8       2        GE200         1         2
9       2        GE200         2         2
10       2        GE200         3         3

Contributor
Posts: 23