hi all:
the below data set -have
data have; input id route $ value; datalines; 101 intro 10.00 101 intro 30.00 101 oral 2.50 101 intro 11.00 101 intro 20.00 101 intro 42.00 101 intro 25.00 101 oral 5.00 101 intro 29.00 102 intro 10.00 102 intro . 102 oral 1.25 102 intro 10.00 102 intro 45.00 102 oral 3.00 ; run;
Task : to find out when route="oral" but NO proceeding value>40
the final dataset -Want look like this
101 intro 10.00
101 intro 30.00
101 oral 2.50
102 intro 10.00
102 intro .
102 oral 1.25
thank you ,
purpleclothlady
It can be done in a single data step. like this:
data want;
set have;
by id;
if first.id then do;
_start=_N_;
_max_value=value;
end;
else
_max_value=max(value,_max_value);
if route='oral';
if _max_value<=40 then do _p=_start to _N_;
set have point=_p;
output;
end;
_start=_N_+1;
_max_value=0;
retain _:;
drop _:;
run;
data intermediate;
set have;
by id;
retain maxvalue;
if first.id then maxvalue=0;
if value>maxvalue then maxvalue=value;
prev_route=lag(route);
if prev_route='oral' or first.id then group+1;
drop prev_route;
run;
proc summary nway data=intermediate(where=(route='oral'));
class group;
var maxvalue;
output out=intermediate2(drop=_:) max=groupmax;
run;
data want;
merge intermediate intermediate2;
by group;
if groupmax>40 or groupmax=. then delete;
run;
This works. I tested. thanks so much.
it is a solution too , but I don't know how to set it as another solution. this one I think is easier for me to understand.
I added comment
data intermediate; set have; by id; retain maxvalue; /*reset for maxvalue for each id*/ if first.id then maxvalue=0; /*compare value vs Retained maxvalue*/ if value>maxvalue then maxvalue=value; /*can we use lag funcation for character*/ prev_route=lag(route); /*assign route by group for later calculation*/ if prev_route='oral' or first.id then group+1; *drop prev_route; run; /*get the maxvalue by group-based on route="Oral"*/ proc summary nway data=intermediate(where=(route='oral')); class group; var maxvalue; output out=intermediate2(drop=_:) max=groupmax; run; /*merge two datesets to get the orginal data and delete those who met the criteria by group */ data want; merge intermediate intermediate2; by group; if groupmax>40 or groupmax=. then delete="DELETE"; run;
the final result and the intermediate steps look like this.
this is a great place to seek solution again many thanks @PaigeMiller @s_lassen
I don't know how to set it as another solution
@purpleclothlady , I don't think you can indicate it is another solution, so don't worry about it.
It can be done in a single data step. like this:
data want;
set have;
by id;
if first.id then do;
_start=_N_;
_max_value=value;
end;
else
_max_value=max(value,_max_value);
if route='oral';
if _max_value<=40 then do _p=_start to _N_;
set have point=_p;
output;
end;
_start=_N_+1;
_max_value=0;
retain _:;
drop _:;
run;
The first SET statement just reads the values and finds the first relevant observation for this subgroup (the _start pointer).
The second set statement is called when there is an 'oral' value and the maximum is not too high, and it reads and outputs the observations from the start of the subgroup up to and including the current observation (automatic variable _N_).
After an 'oral' observation (and. if relevant, output of the subgroup), the _max_value is set to 0 because we are starting a new subgroup, and the current calculated maximum is not part of the new subgroup. Also, the observation pointer _start is set to the next observation to be read, in case we do not get a new ID in the next observation. What happens after the DO loop is really the same as what happens at the top (when encountering a new ID), but it has to be done in both situations, unless we are absolutely sure that each ID group ends with an 'oral' observation.
If we are certain that the last observation in an ID group is an 'oral', the code can be shortened to this:
data want;
set have;
retain _max_value 0 _start 1;
_max_value=max(value,_max_value);
if route='oral';
if _max_value<=40 then do _p=_start to _N_;
set have point=_p;
output;
end;
_start=_N_+1;
_max_value=0;
retain _:;
drop _:;
run
as we do not need to worry about the ID changing without an 'oral' observation first.
I hope this explains the logic behind my quite terse (but efficient) code.
Registration is now open for SAS Innovate 2025 , our biggest and most exciting global event of the year! Join us in Orlando, FL, May 6-9.
Sign up by Dec. 31 to get the 2024 rate of just $495.
Register now!
Learn how use the CAT functions in SAS to join values from multiple variables into a single value.
Find more tutorials on the SAS Users YouTube channel.
Ready to level-up your skills? Choose your own adventure.