Hello all,
I'm stuck and my google searches are spiraling out of control. I need to count the distinct episodes (and number of days in each episode) in my dataset. What I have so far looks like this:
data have;
input scrssn drugclass $ day1 day2 day3 day4 day5 day6 day7;
datalines ;
1 CS200 1 1 1 0 0 2 2
1 CB300 0 0 0 1 1 1 1
1 BB100 1 1 1 1 1 0 0
2 CC600 0 0 0 0 0 0 1
2 FR900 1 1 1 2 2 2 2
2 GE200 1 1 2 2 1 1 1
run;
What I want is the number of days in each discinct episode, where episode is any block of contiguous number of days >0 (without changes). So for the above dataset I want:
data want;
input scrssn drugclass episode ndays;
datalines ;
1 CS200 1 3
1 CS200 2 2
1 CB300 1 4
1 BB100 1 5
2 CC600 1 1
2 FR900 1 3
2 FR900 2 4
2 GE200 1 2
2 GE200 2 2
2 GE200 3 3
run;
Any ideas??
Much thanks,
Kelly
One of many ways to do this:
proc transpose data=have out=temp;
by scrssn drugclass notsorted;
var day1-day7;
run;
data want;
do ndays = 1 by 1 until (last.col1);
set temp; by scrssn drugclass col1 notsorted;
if first.drugclass then episode = 0;
end;
if col1 > 0 then do;
episode + 1;
output;
end;
keep scrssn drugclass episode ndays;
run;
One of many ways to do this:
proc transpose data=have out=temp;
by scrssn drugclass notsorted;
var day1-day7;
run;
data want;
do ndays = 1 by 1 until (last.col1);
set temp; by scrssn drugclass col1 notsorted;
if first.drugclass then episode = 0;
end;
if col1 > 0 then do;
episode + 1;
output;
end;
keep scrssn drugclass episode ndays;
run;
Hi, another approach that uses a transposed data set ...
proc transpose data=have out=temp (where=(col1 ne 0) drop=_: );
by scrssn drugclass notsorted;
var day1-day7;
run;
data want (keep=scrssn drugclass ndays episode);
do episode = 1 by 1 until (last.drugclass);
do ndays = 1 by 1 until (last.col1);
set temp;
by scrssn drugclass col1 notsorted;
end;
output;
end;
run;
Using an array should make this feasible. See how close this comes ...
data want;
set have;
array day {7};
episode=day1;
ndays=1;
do i=2 to 7;
if day{i} ne episode then do;
if episode > 0 then output;
ndays=1;
episode=day{i};
end;
else ndays + 1;
end;
if episode > 0 then output;
run;
Hi, another idea with some explanation ...
#1 concatenate all the non-zero values of day1-day7 and add an 'X' to the end of the string (DAYS, a character variable)
#2 use a loop to search the string DAYS
#3 start the search at position 1 in the string and look for the first character in the string
#4 search for the first NON-OCCURENCE of the first character (use the modifier 'k' in the FINDC function)
#5 use the value of START to compute NDAYS and output an observation
#6 repeat steps 3 through 5, but start the search at at a new location and search for a different value
#7 stop searching when you encounter the 'X' at the end of the string
data have;
input scrssn drugclass :$5. day1-day7;
datalines ;
1 CS200 1 1 1 0 0 2 2
1 CB300 0 0 0 1 1 1 1
1 BB100 1 1 1 1 1 0 0
2 CC600 0 0 0 0 0 0 1
2 FR900 1 1 1 2 2 2 2
2 GE200 1 1 2 2 1 1 1
;
data want (keep=scrssn drugclass episode ndays);
length days $10;
set have;
days = compress(catt(of day:,'X' ),'0');
start = 1;
hold = 1;
do episode=1 by 1 until(char(days,start) eq 'X');
start = findc(days,char(days,start),'k',start);
ndays = start - hold;
hold = start;
output;
end;
run;
data set WANT ...
Obs scrssn drugclass episode ndays
1 1 CS200 1 3
2 1 CS200 2 2
3 1 CB300 1 4
4 1 BB100 1 5
5 2 CC600 1 1
6 2 FR900 1 3
7 2 FR900 2 4
8 2 GE200 1 2
9 2 GE200 2 2
10 2 GE200 3 3
These are all great, thank you SO much. I will try all of them (great SAS/coding experience for me)--so far I've used the first code/suggestion with my data and it works perfectly. So much SAS to learn...
Join us for SAS Innovate April 16-19 at the Aria in Las Vegas. Bring the team and save big with our group pricing for a limited time only.
Pre-conference courses and tutorials are filling up fast and are always a sellout. Register today to reserve your seat.
Learn how use the CAT functions in SAS to join values from multiple variables into a single value.
Find more tutorials on the SAS Users YouTube channel.