BookmarkSubscribeRSS Feed
🔒 This topic is solved and locked. Need further help from the community? Please sign in and ask a new question.
kstolzmann
Obsidian | Level 7

Hello all,

 

I'm stuck and my google searches are spiraling out of control.  I need to count the distinct episodes (and number of days in each episode) in my dataset.  What I have so far looks like this:

 

data have;

      input scrssn drugclass $ day1 day2 day3 day4 day5 day6 day7;

      datalines ;

      1 CS200 1 1 1 0 0 2 2

      1 CB300 0 0 0 1 1 1 1

      1 BB100 1 1 1 1 1 0 0

      2 CC600 0 0 0 0 0 0 1

      2 FR900 1 1 1 2 2 2 2

      2 GE200 1 1 2 2 1 1 1

run;

 

What I want is the number of days in each discinct episode, where episode is any block of contiguous number of days >0 (without changes).  So for the above dataset I want:

 

data want;

      input scrssn drugclass episode ndays;

      datalines ;

      1 CS200 1 3

      1 CS200 2 2

      1 CB300 1 4

      1 BB100 1 5

      2 CC600 1 1

      2 FR900 1 3

      2 FR900 2 4

      2 GE200 1 2

      2 GE200 2 2

      2 GE200 3 3

run;

 

Any ideas??

 

Much thanks,

Kelly

1 ACCEPTED SOLUTION

Accepted Solutions
PGStats
Opal | Level 21

One of many ways to do this:

 


proc transpose data=have out=temp;
by scrssn drugclass notsorted;
var day1-day7;
run;

data want;
do ndays = 1 by 1 until (last.col1);
    set temp; by scrssn drugclass col1 notsorted;
    if first.drugclass then episode = 0;
    end;
if col1 > 0 then do;
    episode + 1;
    output;
    end;
keep scrssn drugclass episode ndays;
run;
PG

View solution in original post

5 REPLIES 5
PGStats
Opal | Level 21

One of many ways to do this:

 


proc transpose data=have out=temp;
by scrssn drugclass notsorted;
var day1-day7;
run;

data want;
do ndays = 1 by 1 until (last.col1);
    set temp; by scrssn drugclass col1 notsorted;
    if first.drugclass then episode = 0;
    end;
if col1 > 0 then do;
    episode + 1;
    output;
    end;
keep scrssn drugclass episode ndays;
run;
PG
MikeZdeb
Rhodochrosite | Level 12

Hi, another approach that uses a transposed data set ...

 

proc transpose data=have out=temp (where=(col1 ne 0) drop=_: );
by scrssn drugclass notsorted;
var day1-day7;
run;

 

data want (keep=scrssn drugclass ndays episode);

do episode = 1 by 1 until (last.drugclass);
do ndays   = 1 by 1 until (last.col1);
  set temp; 
  by scrssn drugclass col1 notsorted;
end;
  output;
end;
run;

Astounding
PROC Star

Using an array should make this feasible.  See how close this comes ...

 

data want;

set have;

array day {7};

episode=day1;

ndays=1;

do i=2 to 7;

   if day{i} ne episode then do;

      if episode > 0 then output;

      ndays=1;

      episode=day{i};

   end;

   else ndays + 1;

end;

if episode > 0 then output;

run;

MikeZdeb
Rhodochrosite | Level 12

Hi, another idea with some explanation ...

 

#1  concatenate all the non-zero values of day1-day7 and add an 'X' to the end of the string  (DAYS, a character variable)

#2  use a loop to search the string DAYS

#3  start the search at position 1 in the string and look for the first character in the string

#4  search for the first NON-OCCURENCE of the first character (use the modifier 'k' in the FINDC function)

#5  use the value of START to compute NDAYS and output an observation

#6  repeat steps 3 through 5, but start the search at at a new location and search for a different value

#7  stop searching when you encounter the 'X' at the end of the string 

 

data have;
input scrssn drugclass :$5. day1-day7;
datalines ;
1 CS200 1 1 1 0 0 2 2
1 CB300 0 0 0 1 1 1 1
1 BB100 1 1 1 1 1 0 0
2 CC600 0 0 0 0 0 0 1
2 FR900 1 1 1 2 2 2 2
2 GE200 1 1 2 2 1 1 1
;


data want (keep=scrssn drugclass episode ndays);
length days $10;
set have;
days  = compress(catt(of day:,'X' ),'0');
start = 1;
hold  = 1;
do episode=1 by 1 until(char(days,start) eq 'X');
   start = findc(days,char(days,start),'k',start);
   ndays = start - hold;
   hold  = start;
   output;
end;
run;

 

data set WANT ...

Obs    scrssn    drugclass    episode    ndays

  1       1        CS200         1         3
  2       1        CS200         2         2
  3       1        CB300         1         4
  4       1        BB100         1         5
  5       2        CC600         1         1
  6       2        FR900         1         3
  7       2        FR900         2         4
  8       2        GE200         1         2
  9       2        GE200         2         2
 10       2        GE200         3         3

kstolzmann
Obsidian | Level 7

These are all great, thank you SO much.  I will try all of them (great SAS/coding experience for me)--so far I've used the first code/suggestion with my data and it works perfectly.  So much SAS to learn...

sas-innovate-2024.png

Join us for SAS Innovate April 16-19 at the Aria in Las Vegas. Bring the team and save big with our group pricing for a limited time only.

Pre-conference courses and tutorials are filling up fast and are always a sellout. Register today to reserve your seat.

 

Register now!

How to Concatenate Values

Learn how use the CAT functions in SAS to join values from multiple variables into a single value.

Find more tutorials on the SAS Users YouTube channel.

Click image to register for webinarClick image to register for webinar

Classroom Training Available!

Select SAS Training centers are offering in-person courses. View upcoming courses for:

View all other training opportunities.

Discussion stats
  • 5 replies
  • 967 views
  • 4 likes
  • 4 in conversation