BookmarkSubscribeRSS Feed
lpy0521
Fluorite | Level 6

Hi there,

 

suppose I have dataset looks like this:

Data Have;
INPUT id $ group $ var ;
Cards;
AAA group1 0.1
AAA group2 0.2
AAA group2 0.2
AAA group3 0.3
AAA group3 0.2
AAA group3 0.3
AAA group4 0.4
AAA group4 0.4
AAA group5 0.3
BBB group1 0.4
BBB group2 0.3
BBB group2 0.2
BBB group3 0.3
BBB group3 0.3
BBB group3 0.2
BBB group4 0.3
BBB group4 0.2
BBB group5 0.3
;
RUN;

Data WANT;
INPUT id $ group $ var new_var;
Cards;
AAA group1 0.1 0.1
AAA group2 0.2 0.2
AAA group2 0.2 0.2
AAA group3 0.3 0.3
AAA group3 0.2 0.3
AAA group3 0.3 0.3
AAA group4 0.4 0.4
AAA group4 0.4 0.4
AAA group5 0.3 0.4
BBB group1 0.4 0.4
BBB group2 0.3 0.4
BBB group2 0.2 0.4
BBB group3 0.3 0.4
BBB group3 0.3 0.4
BBB group3 0.2 0.4
BBB group4 0.3 0.4
BBB group4 0.2 0.4
BBB group5 0.3 0.4
;
RUN;

in order to obtain the want data, I have 2 rules to check:

1. var should be monotonically increase from group 1 to group 5 for each id.

2. within the same group, var should be forced to equal to each other.

3. to fix the violation, I need to generate this new_var, and it should be always scanned from top.

 

Please advise how to do this in SAS, thanks!

8 REPLIES 8
Urban_Science
Quartz | Level 8

Here is a shot in the dark for you.  This appears to solve your problem:

proc sort data = have;
   by id group descending var;
run;

data want;
   set have;
   retain new_var 0;
   new_var = max(var, new_var);
run;
lpy0521
Fluorite | Level 6

Sorry but I may create some confusion here. New_var doesn't exist, I actually need to generate it by following the check rules.

Urban_Science
Quartz | Level 8

The lines:

 

retain new_var 0;
new_var = max(var, new_var);


first create new_var, setting its value to 0 and it tells SAS that we want to keep the value of new_var when we move to the next row. Then the second line tells SAS to compare new_var's value with the current row's value for var and assign new_var the larger of the two. The result of my data step produces an identical data sets.  Validation code here:

 

Data Have;
INPUT id $ group $ var ;
Cards;
AAA group1 0.1
AAA group2 0.2
AAA group2 0.2
AAA group3 0.3
AAA group3 0.2
AAA group3 0.3
AAA group4 0.4
AAA group4 0.4
AAA group5 0.3
BBB group1 0.4
BBB group2 0.3
BBB group2 0.2
BBB group3 0.3
BBB group3 0.3
BBB group3 0.2
BBB group4 0.3
BBB group4 0.2
BBB group5 0.3
;
RUN;

proc sort data = have;
   by id group descending var;
run;

data want;
   set have;
   retain new_var 0;
   new_var = max(var, new_var);
run;

Data WANT2;
INPUT id $ group $ var new_var;
Cards;
AAA group1 0.1 0.1
AAA group2 0.2 0.2
AAA group2 0.2 0.2
AAA group3 0.3 0.3
AAA group3 0.2 0.3
AAA group3 0.3 0.3
AAA group4 0.4 0.4
AAA group4 0.4 0.4
AAA group5 0.3 0.4
BBB group1 0.4 0.4
BBB group2 0.3 0.4
BBB group2 0.2 0.4
BBB group3 0.3 0.4
BBB group3 0.3 0.4
BBB group3 0.2 0.4
BBB group4 0.3 0.4
BBB group4 0.2 0.4
BBB group5 0.3 0.4
;
RUN;

proc sort data = WANT2;
   by id group descending var;
run;

proc compare base = want comp = want2;
run;

image.png

ballardw
Super User

@lpy0521 wrote:

Hi there,

 

suppose I have dataset looks like this:

in order to obtain the want data, I have 2 rules to check:

1. var should be monotonically increase from group 1 to group 5 for each id.

2. within the same group, var should be forced to equal to each other.

 

Please advise how to do this in SAS, thanks!


I don't think 1 and 2 make much sense. "forced to equal" is going to have an issue with "increase". I think may be missing a bit of either your description or the rules. As written it appears that you want a single variable to have the same value for multiple records.

Urban_Science
Quartz | Level 8
I think what Ipy0521 has a typo there. For both rules, it should be new_var instead of var. And specifically for 2, I think it should be "new_var should be the same value for the same id+group pair, e.g. both rows with id = AAA and group=group2 have new_val=0.2"
lpy0521
Fluorite | Level 6

I may not be very clear here. I want my rule to be applied in sequential order, that has been saying:

1. we first check for each id, within each group, make sure all var in this group is equal. (may using most frequent ones to replace the other or such).

2.  once we make sure all values within each group are all same, we then next check var_group1 < var_group2 < var_group3 < var_group4 < var_group5  is true. (i.e., 0.1 0.2 0.3 0.4 0.5). If we find something like (0.1 0.2 0.1 0.3 0.4) then it need to be modified as (0.1, 0.2,0.2,0.2,0.2);

 

So here I would like to prioritize within group rule first and then monotonic by group.

Hope this is clear.

   

Urban_Science
Quartz | Level 8

Help me understand where my logic differs from yours:

 

proc sort data = have;
   by id group descending var;
run;

The makes it so that the data properly organized by id and group.  "descending var" means that the first value of var for the id+group combination is going to be the largest in the group.

 

 

data want;
   set have;
   retain new_var 0;
   new_var = max(var, new_var);
run;

This data step does both of your rules at once.  Since the largest value of the group is first, the max function will assign all of the same id+group combinations the same value.  Since max is a monotonic function, new_var will only increase when the next group has a larger value.

 

Perhaps you want new_var to reset for each id, then we need to add some extra code:

data want;
   set have;
by id; retain new_var; new_var = ifn(first.id, var, max(var, new_var)); run;

Now the data step will set the value of new_var to var at the first row of an id, and then use the larger of var and new_var thereafter.

PGStats
Opal | Level 21

The only tricky part is "may using most frequent ones to replace the other or such". The most common value of a set is the mode which is not defined when all values are different (as with BBB group2). I chose to use the maximum value when the mode is undefined:

 

Data Have;
INPUT id $ group $ var ;
Cards;
AAA group1 0.1
AAA group2 0.2
AAA group2 0.2
AAA group3 0.3
AAA group3 0.2
AAA group3 0.3
AAA group4 0.4
AAA group4 0.4
AAA group5 0.3
BBB group1 0.4
BBB group2 0.3
BBB group2 0.2
BBB group3 0.3
BBB group3 0.3
BBB group3 0.2
BBB group4 0.3
BBB group4 0.2
BBB group5 0.3
;

proc sort data = have;
   by id group;
run;

proc means data=have noprint;
by id group;
output out=modes(drop=_:) mode=mode max=max;
run;

data want;
merge have modes; by id group;
retain newvar;
if first.id then newvar = coalesce(mode, max);
else if first.group then newvar = max(newvar, coalesce(mode, max));
drop mode max;
run;

 

PG

Ready to join fellow brilliant minds for the SAS Hackathon?

Build your skills. Make connections. Enjoy creative freedom. Maybe change the world. Registration is now open through August 30th. Visit the SAS Hackathon homepage.

Register today!
How to Concatenate Values

Learn how use the CAT functions in SAS to join values from multiple variables into a single value.

Find more tutorials on the SAS Users YouTube channel.

Click image to register for webinarClick image to register for webinar

Classroom Training Available!

Select SAS Training centers are offering in-person courses. View upcoming courses for:

View all other training opportunities.

Discussion stats
  • 8 replies
  • 941 views
  • 0 likes
  • 4 in conversation