Solved: Re: compare proceeding values

purpleclothlady · Posted 01-18-2023 12:27 PM

hi all:

the below data set -have

data have;
input id route $ value;
datalines;
101 intro 10.00 
101 intro 30.00 
101 oral 2.50 
101 intro 11.00 
101 intro 20.00 
101 intro 42.00 
101 intro 25.00 
101 oral 5.00 
101 intro 29.00 
102 intro 10.00 
102 intro .  
102 oral 1.25 
102 intro 10.00 
102 intro 45.00 
102 oral 3.00
;
run;

Task : to find out when route="oral" but NO proceeding value>40

the final dataset -Want look like this

101 intro 10.00

101 intro 30.00

101 oral 2.50

102 intro 10.00

102 intro .

102 oral 1.25

thank you ,

purpleclothlady

s_lassen · Posted 01-19-2023 05:44 AM

It can be done in a single data step. like this:

data want;
  set have;
  by id;
  if first.id then do;
    _start=_N_;
    _max_value=value;
    end;
  else
    _max_value=max(value,_max_value);
  if route='oral';
  if _max_value<=40 then do _p=_start to _N_;
    set have point=_p;
    output;
    end;
  _start=_N_+1;
  _max_value=0;
  retain _:;
  drop _:;
run;

View solution in original post

PaigeMiller · Posted 01-18-2023 03:38 PM

data intermediate;
    set have;
    by id;
    retain maxvalue;
    if first.id then maxvalue=0;
    if value>maxvalue then maxvalue=value;
    prev_route=lag(route);
    if prev_route='oral' or first.id then group+1;
    drop prev_route;
run;
proc summary nway data=intermediate(where=(route='oral'));
    class group;
    var maxvalue;
    output out=intermediate2(drop=_:) max=groupmax;
run;
data want;
    merge intermediate intermediate2;
    by group;
    if groupmax>40 or groupmax=. then delete;
run;

--
Paige Miller

purpleclothlady · Posted 01-19-2023 10:12 AM

@PaigeMiller :

This works. I tested. thanks so much.

it is a solution too , but I don't know how to set it as another solution. this one I think is easier for me to understand.

I added comment

data intermediate;
set have;
by id;
retain maxvalue;
/*reset for maxvalue for each id*/
if first.id then maxvalue=0;

/*compare value vs Retained maxvalue*/
if value>maxvalue then maxvalue=value;

/*can we use lag funcation for character*/
prev_route=lag(route);

/*assign route by group for later calculation*/
if prev_route='oral' or first.id then group+1;
*drop prev_route;
run;

/*get the maxvalue by group-based on route="Oral"*/
proc summary nway data=intermediate(where=(route='oral'));
class group;
var maxvalue;
output out=intermediate2(drop=_:) max=groupmax;
run;

/*merge two datesets to get the orginal data and delete those who met the criteria by group */
data want;
merge intermediate
intermediate2;
by group;

if groupmax>40 or groupmax=. then delete="DELETE";
run;

the final result and the intermediate steps look like this.

this is a great place to seek solution again many thanks @PaigeMiller @s_lassen

PaigeMiller · Posted 01-19-2023 10:15 AM

I don't know how to set it as another solution

@purpleclothlady , I don't think you can indicate it is another solution, so don't worry about it.

--
Paige Miller

s_lassen · Posted 01-19-2023 05:44 AM

It can be done in a single data step. like this:

data want;
  set have;
  by id;
  if first.id then do;
    _start=_N_;
    _max_value=value;
    end;
  else
    _max_value=max(value,_max_value);
  if route='oral';
  if _max_value<=40 then do _p=_start to _N_;
    set have point=_p;
    output;
    end;
  _start=_N_+1;
  _max_value=0;
  retain _:;
  drop _:;
run;

purpleclothlady · Posted 01-19-2023 09:51 AM

Hi @s_lassen:
thank you for the solution. it worked. the code is short but it might take a while to digest the logic.
will you explain the logic , I don't quite understand the 2nd SET and why seting _max_value=0;
thanks so much.
purple

s_lassen · Posted 01-19-2023 05:05 PM

The first SET statement just reads the values and finds the first relevant observation for this subgroup (the _start pointer).

The second set statement is called when there is an 'oral' value and the maximum is not too high, and it reads and outputs the observations from the start of the subgroup up to and including the current observation (automatic variable _N_).

After an 'oral' observation (and. if relevant, output of the subgroup), the _max_value is set to 0 because we are starting a new subgroup, and the current calculated maximum is not part of the new subgroup. Also, the observation pointer _start is set to the next observation to be read, in case we do not get a new ID in the next observation. What happens after the DO loop is really the same as what happens at the top (when encountering a new ID), but it has to be done in both situations, unless we are absolutely sure that each ID group ends with an 'oral' observation.

If we are certain that the last observation in an ID group is an 'oral', the code can be shortened to this:

data want;
  set have;
  retain _max_value 0 _start 1;
  _max_value=max(value,_max_value);
  if route='oral';
  if _max_value<=40 then do _p=_start to _N_;
    set have point=_p;
    output;
    end;
  _start=_N_+1;
  _max_value=0;
  retain _:;
  drop _:;
run

as we do not need to worry about the ID changing without an 'oral' observation first.

I hope this explains the logic behind my quite terse (but efficient) code.

SAS Innovate 2025: Register Now

SAS Training: Just a Click Away