Solved: Re: Getting max value by group - complex table

DmytroYermak · Posted 09-11-2017 06:00 AM

Hi all,

Could you please help in the following case - how to find a max within each group in one step:

data test;
infile datalines missover;
input group $ value1 value 2 value 3 value4;
datalines;
g1 12 30
g2 14 37 15
g2 16 76 34
g3 12 22
g3 4 84
g4 7 15 10
g4 24 21 17 67
g4 14 18 94
g4 26 73 ;
run;

I wanted to get this output:

group value

g1 30

g2 76

g3 84

g4 94

Thank You!

RW9 · Posted 09-11-2017 07:09 AM

Ah, I see, then:

data test;
  infile datalines missover;
  input group $ value1 value2 value3 value4;
datalines;
g1 12 30
g2 14 37 15
g2 16 76 34
g3 12 22
g3 4 84
g4 7 15 10
g4 24 21 17 67
g4 14 18 94
g4 26 73 
;
run;

data want (keep=group max_val);
  set test;
  by group;
  retain max_val;
  if first.group then max_val=.;
  if max(of value:) > max_val then max_val=max(of value:);
  if last.group then output;
run;

View solution in original post

RW9 · Posted 09-11-2017 06:21 AM

Something like:

data test;
 infile datalines missover;
 input group $ value1 value2 value3 value4;
datalines;
g1 12 30
g2 14 37 15
g2 16 76 34
g3 12 22
g3 4 84
g4 7 15 10
g4 24 21 17 67
g4 14 18 94
g4 26 73 
;
run;
proc sql;
 create table WANT as
 select GROUP,
 sum(sum(VALUE1,VALUE2,VALUE3,VALUE4)) as VALUE
 from TEST
 group by GROUP;
quit;

However your output want dataset doesn't seem to make sense to me. Why would G1=30, when there is 12 and 30?

Much like G4, how can you get 94 when there are 9 values to add together?

DmytroYermak · Posted 09-11-2017 06:34 AM

I have received the output where sums were calculated but I need max values within each group. That is the case.

The SAS System

Obs group VALUE 1 2 3 4

g1	42
g2	192
g3	122
g4	386

RW9 · Posted 09-11-2017 07:09 AM

Ah, I see, then:

data test;
  infile datalines missover;
  input group $ value1 value2 value3 value4;
datalines;
g1 12 30
g2 14 37 15
g2 16 76 34
g3 12 22
g3 4 84
g4 7 15 10
g4 24 21 17 67
g4 14 18 94
g4 26 73 
;
run;

data want (keep=group max_val);
  set test;
  by group;
  retain max_val;
  if first.group then max_val=.;
  if max(of value:) > max_val then max_val=max(of value:);
  if last.group then output;
run;

DmytroYermak · Posted 09-11-2017 09:09 AM

Thank you, very useful!

Astounding · Posted 09-11-2017 07:15 AM

I'm not sure this is all that different:

data want;

do until (last.group);

set have;

by group;

maxval = max(maxval, of value1-value4);

end;

keep group maxval;

run;

DmytroYermak · Posted 09-11-2017 09:09 AM

It is definitely a solution as well. Thank you!

Davanden · Posted 09-11-2017 10:08 AM

That looks clever, but I don't understand it. How does BY function inside a DO loop? I've never seen that before.

--Dav

gamotte · Posted 09-11-2017 10:36 AM

@Davanden, here is how i understand it :

do until (last.group);

set have;

by group;

maxval = max(maxval, of value1-value4);

end;

The interest of this loop is to treat an entire group for each iteration of the datastep.

By putting the set ... by instructions inside the loop, you can read each record

of the group without outputting or reinitializing the PDV so that the value of MAXVAL for the group is kept

between records..

It is only at the run instruction that the result is output and the PDV is reset before handling the next group.

You end up with a shorter program than if you explicitely reset Maxval on first.group and output

on last.group.

Clever indeed.

Astounding · Posted 09-11-2017 11:04 AM

Each SET statement is permitted to use a BY statement. That sets up FIRST. and LAST. variables (regardless of whether inside a loop or not).

One of the key issues here is how the SET statement operates. It is not a label that shows where the data is coming from. Rather, it is an instruction to read the next observation from a SAS data set. It only reads one observation at a time (whether inside or outside a DO loop). So the BY statement merely identifies whether that observation is the first (or last) one for a value of GROUP.

***** EDITED HERE:

Maybe I shouldn't add this ... it might add to the confusion. Yet some of the community will find this interesting. Technically, the BY statement refers to the most recent SET / MERGE / UPDATE statement. There is no reason that it has to immediatley follow a SET, MERGE, or UPDATE statement. So this variation would work exactly the same way:

data want;

do until (last.group);

set have;

maxval = max(maxval, of value1-value4);

end;

by group;

keep group maxval;

run;

novinosrin · Posted 09-11-2017 01:41 PM

@DmytroYermak My apologies for the late entry to the party. Proc sql is the best solution for this kind of problem coz you can avoid a sort and can do it one select query-

data test;

infile datalines missover;

input group $ value1 value2 value3 value4;

datalines;

g1 12 30

g2 14 37 15

g2 16 76 34

g3 12 22

g3 4 84

g4 7 15 10

g4 24 21 17 67

g4 14 18 94

g4 26 73

;

run;

proc sql;

create table want as

select group, max(value1, value2, value3, value4) as value

from test

group by group

having value=max(value);

quit;

Regards,

Naveen Srinivasan

Astounding · Posted 09-11-2017 01:58 PM

@novinosrin,

One thing to be wary of ... I believe the SQL approach can give you multiple observations for a GROUP, if you have a tie for the largest value. If you're good enough at SQL (which rules me out), you might be able to get around that using DISCREET.

Classroom Training Available!