Re: Monotonicity and equality check across rows by group

lpy0521 · Posted 08-02-2019 04:02 PM

Hi there,

suppose I have dataset looks like this:

Data Have;
INPUT id $ group $ var ;
Cards;
AAA group1 0.1
AAA group2 0.2
AAA group2 0.2
AAA group3 0.3
AAA group3 0.2
AAA group3 0.3
AAA group4 0.4
AAA group4 0.4
AAA group5 0.3
BBB group1 0.4
BBB group2 0.3
BBB group2 0.2
BBB group3 0.3
BBB group3 0.3
BBB group3 0.2
BBB group4 0.3
BBB group4 0.2
BBB group5 0.3
;
RUN;

Data WANT;
INPUT id $ group $ var new_var;
Cards;
AAA group1 0.1 0.1
AAA group2 0.2 0.2
AAA group2 0.2 0.2
AAA group3 0.3 0.3
AAA group3 0.2 0.3
AAA group3 0.3 0.3
AAA group4 0.4 0.4
AAA group4 0.4 0.4
AAA group5 0.3 0.4
BBB group1 0.4 0.4
BBB group2 0.3 0.4
BBB group2 0.2 0.4
BBB group3 0.3 0.4
BBB group3 0.3 0.4
BBB group3 0.2 0.4
BBB group4 0.3 0.4
BBB group4 0.2 0.4
BBB group5 0.3 0.4
;
RUN;

in order to obtain the want data, I have 2 rules to check:

1. var should be monotonically increase from group 1 to group 5 for each id.

2. within the same group, var should be forced to equal to each other.

3. to fix the violation, I need to generate this new_var, and it should be always scanned from top.

Please advise how to do this in SAS, thanks!

Urban_Science · Posted 08-02-2019 04:34 PM

Here is a shot in the dark for you. This appears to solve your problem:

proc sort data = have;
   by id group descending var;
run;

data want;
   set have;
   retain new_var 0;
   new_var = max(var, new_var);
run;

lpy0521 · Posted 08-02-2019 04:36 PM

Sorry but I may create some confusion here. New_var doesn't exist, I actually need to generate it by following the check rules.

Urban_Science · Posted 08-02-2019 04:43 PM

The lines:

retain new_var 0;
new_var = max(var, new_var);

first create new_var, setting its value to 0 and it tells SAS that we want to keep the value of new_var when we move to the next row. Then the second line tells SAS to compare new_var's value with the current row's value for var and assign new_var the larger of the two. The result of my data step produces an identical data sets. Validation code here:

Data Have;
INPUT id $ group $ var ;
Cards;
AAA group1 0.1
AAA group2 0.2
AAA group2 0.2
AAA group3 0.3
AAA group3 0.2
AAA group3 0.3
AAA group4 0.4
AAA group4 0.4
AAA group5 0.3
BBB group1 0.4
BBB group2 0.3
BBB group2 0.2
BBB group3 0.3
BBB group3 0.3
BBB group3 0.2
BBB group4 0.3
BBB group4 0.2
BBB group5 0.3
;
RUN;

proc sort data = have;
   by id group descending var;
run;

data want;
   set have;
   retain new_var 0;
   new_var = max(var, new_var);
run;

Data WANT2;
INPUT id $ group $ var new_var;
Cards;
AAA group1 0.1 0.1
AAA group2 0.2 0.2
AAA group2 0.2 0.2
AAA group3 0.3 0.3
AAA group3 0.2 0.3
AAA group3 0.3 0.3
AAA group4 0.4 0.4
AAA group4 0.4 0.4
AAA group5 0.3 0.4
BBB group1 0.4 0.4
BBB group2 0.3 0.4
BBB group2 0.2 0.4
BBB group3 0.3 0.4
BBB group3 0.3 0.4
BBB group3 0.2 0.4
BBB group4 0.3 0.4
BBB group4 0.2 0.4
BBB group5 0.3 0.4
;
RUN;

proc sort data = WANT2;
   by id group descending var;
run;

proc compare base = want comp = want2;
run;

ballardw · Posted 08-02-2019 04:46 PM

@lpy0521 wrote:

Hi there,

suppose I have dataset looks like this:

in order to obtain the want data, I have 2 rules to check:

1. var should be monotonically increase from group 1 to group 5 for each id.

2. within the same group, var should be forced to equal to each other.

Please advise how to do this in SAS, thanks!

I don't think 1 and 2 make much sense. "forced to equal" is going to have an issue with "increase". I think may be missing a bit of either your description or the rules. As written it appears that you want a single variable to have the same value for multiple records.

Urban_Science · Posted 08-02-2019 05:00 PM

I think what Ipy0521 has a typo there. For both rules, it should be new_var instead of var. And specifically for 2, I think it should be "new_var should be the same value for the same id+group pair, e.g. both rows with id = AAA and group=group2 have new_val=0.2"

lpy0521 · Posted 08-02-2019 05:26 PM

I may not be very clear here. I want my rule to be applied in sequential order, that has been saying:

1. we first check for each id, within each group, make sure all var in this group is equal. (may using most frequent ones to replace the other or such).

2. once we make sure all values within each group are all same, we then next check var_group1 < var_group2 < var_group3 < var_group4 < var_group5 is true. (i.e., 0.1 0.2 0.3 0.4 0.5). If we find something like (0.1 0.2 0.1 0.3 0.4) then it need to be modified as (0.1, 0.2,0.2,0.2,0.2);

So here I would like to prioritize within group rule first and then monotonic by group.

Hope this is clear.

Urban_Science · Posted 08-02-2019 05:47 PM

Help me understand where my logic differs from yours:

proc sort data = have;
   by id group descending var;
run;

The makes it so that the data properly organized by id and group. "descending var" means that the first value of var for the id+group combination is going to be the largest in the group.

data want;
   set have;
   retain new_var 0;
   new_var = max(var, new_var);
run;

This data step does both of your rules at once. Since the largest value of the group is first, the max function will assign all of the same id+group combinations the same value. Since max is a monotonic function, new_var will only increase when the next group has a larger value.

Perhaps you want new_var to reset for each id, then we need to add some extra code:

data want;
   set have;
   by id;
   retain new_var;
   new_var = ifn(first.id, var, max(var, new_var));
run;

Now the data step will set the value of new_var to var at the first row of an id, and then use the larger of var and new_var thereafter.

PGStats · Posted 08-03-2019 12:38 AM

The only tricky part is "may using most frequent ones to replace the other or such". The most common value of a set is the mode which is not defined when all values are different (as with BBB group2). I chose to use the maximum value when the mode is undefined:

Data Have;
INPUT id $ group $ var ;
Cards;
AAA group1 0.1
AAA group2 0.2
AAA group2 0.2
AAA group3 0.3
AAA group3 0.2
AAA group3 0.3
AAA group4 0.4
AAA group4 0.4
AAA group5 0.3
BBB group1 0.4
BBB group2 0.3
BBB group2 0.2
BBB group3 0.3
BBB group3 0.3
BBB group3 0.2
BBB group4 0.3
BBB group4 0.2
BBB group5 0.3
;

proc sort data = have;
   by id group;
run;

proc means data=have noprint;
by id group;
output out=modes(drop=_:) mode=mode max=max;
run;

data want;
merge have modes; by id group;
retain newvar;
if first.id then newvar = coalesce(mode, max);
else if first.group then newvar = max(newvar, coalesce(mode, max));
drop mode max;
run;

PG

Ready to join fellow brilliant minds for the SAS Hackathon?

Classroom Training Available!