BookmarkSubscribeRSS Feed
☑ This topic is solved. Need further help from the community? Please sign in and ask a new question.

hi all:

the below data set -have

data have;
input id route $ value;
datalines;
101 intro 10.00 
101 intro 30.00 
101 oral 2.50 
101 intro 11.00 
101 intro 20.00 
101 intro 42.00 
101 intro 25.00 
101 oral 5.00 
101 intro 29.00 
102 intro 10.00 
102 intro .  
102 oral 1.25 
102 intro 10.00 
102 intro 45.00 
102 oral 3.00
;
run;

Task : to find out when route="oral" but NO proceeding value>40 

the final dataset -Want look like this 

101 intro 10.00

101 intro 30.00

101 oral 2.50

102 intro 10.00

102 intro .

102 oral 1.25

 

thank you ,

purpleclothlady

1 ACCEPTED SOLUTION

Accepted Solutions
s_lassen
Meteorite | Level 14

It can be done in a single data step. like this:

data want;
  set have;
  by id;
  if first.id then do;
    _start=_N_;
    _max_value=value;
    end;
  else
    _max_value=max(value,_max_value);
  if route='oral';
  if _max_value<=40 then do _p=_start to _N_;
    set have point=_p;
    output;
    end;
  _start=_N_+1;
  _max_value=0;
  retain _:;
  drop _:;
run;

View solution in original post

6 REPLIES 6
PaigeMiller
Diamond | Level 26
data intermediate;
    set have;
    by id;
    retain maxvalue;
    if first.id then maxvalue=0;
    if value>maxvalue then maxvalue=value;
    prev_route=lag(route);
    if prev_route='oral' or first.id then group+1;
    drop prev_route;
run;
proc summary nway data=intermediate(where=(route='oral'));
    class group;
    var maxvalue;
    output out=intermediate2(drop=_:) max=groupmax;
run;
data want;
    merge intermediate intermediate2;
    by group;
    if groupmax>40 or groupmax=. then delete;
run;
--
Paige Miller
purpleclothlady
Pyrite | Level 9

@PaigeMiller :

This works. I tested. thanks so much. 

it is a solution too , but I don't know how to set it as another solution. this one I think is easier for me to understand. 

I added comment

data intermediate;
set have;
by id;
retain maxvalue;
/*reset for maxvalue for each id*/
if first.id then maxvalue=0;

/*compare value vs Retained maxvalue*/
if value>maxvalue then maxvalue=value;

/*can we use lag funcation for character*/
prev_route=lag(route);

/*assign route by group for later calculation*/
if prev_route='oral' or first.id then group+1;
*drop prev_route;
run;

/*get the maxvalue by group-based on route="Oral"*/
proc summary nway data=intermediate(where=(route='oral'));
class group;
var maxvalue;
output out=intermediate2(drop=_:) max=groupmax;
run;

/*merge two datesets to get the orginal data and delete those who met the criteria by group */
data want;
merge intermediate
intermediate2;
by group;

if groupmax>40 or groupmax=. then delete="DELETE";
run;

the final result and the intermediate steps look like this. 

purpleclothlady_0-1674141116460.png

this is a great place to seek solution again many thanks @PaigeMiller @s_lassen 

 

 

PaigeMiller
Diamond | Level 26

I don't know how to set it as another solution

 

@purpleclothlady , I don't think you can indicate it is another solution, so don't worry about it.

--
Paige Miller
s_lassen
Meteorite | Level 14

It can be done in a single data step. like this:

data want;
  set have;
  by id;
  if first.id then do;
    _start=_N_;
    _max_value=value;
    end;
  else
    _max_value=max(value,_max_value);
  if route='oral';
  if _max_value<=40 then do _p=_start to _N_;
    set have point=_p;
    output;
    end;
  _start=_N_+1;
  _max_value=0;
  retain _:;
  drop _:;
run;
purpleclothlady
Pyrite | Level 9
Hi @s_lassen:
thank you for the solution. it worked. the code is short but it might take a while to digest the logic.
will you explain the logic , I don't quite understand the 2nd SET and why seting _max_value=0;
thanks so much.
purple
s_lassen
Meteorite | Level 14

The first SET statement just reads the values and finds the first relevant observation for this subgroup (the _start pointer).

The second set statement is called when there is an 'oral' value and the maximum is not too high, and it reads and outputs the observations from the start of the subgroup up to and including the current observation (automatic variable _N_). 

 

After an 'oral' observation (and. if relevant, output of the subgroup), the _max_value is set to 0 because we are starting a new subgroup, and the current calculated maximum is not part of the new subgroup. Also, the observation pointer _start is set to the next observation to be read, in case we do not get a new ID in the next observation. What happens after the DO loop is really the same as what happens at the top (when encountering a new ID), but it has to be done in both situations, unless we are absolutely sure that each ID group ends with an 'oral' observation.

 

If we are certain that the last observation in an ID group is an 'oral', the code can be shortened to this:

data want;
  set have;
  retain _max_value 0 _start 1;
  _max_value=max(value,_max_value);
  if route='oral';
  if _max_value<=40 then do _p=_start to _N_;
    set have point=_p;
    output;
    end;
  _start=_N_+1;
  _max_value=0;
  retain _:;
  drop _:;
run

as we do not need to worry about the ID changing without an 'oral' observation first.

 

I hope this explains the logic behind my quite terse (but efficient) code.

 

Ready to join fellow brilliant minds for the SAS Hackathon?

Build your skills. Make connections. Enjoy creative freedom. Maybe change the world. Registration is now open through August 30th. Visit the SAS Hackathon homepage.

Register today!
How to Concatenate Values

Learn how use the CAT functions in SAS to join values from multiple variables into a single value.

Find more tutorials on the SAS Users YouTube channel.

Click image to register for webinarClick image to register for webinar

Classroom Training Available!

Select SAS Training centers are offering in-person courses. View upcoming courses for:

View all other training opportunities.

Discussion stats
  • 6 replies
  • 404 views
  • 2 likes
  • 3 in conversation