Hi all,
Could you please help in the following case - how to find a max within each group in one step:
data test;
infile datalines missover;
input group $ value1 value 2 value 3 value4;
datalines;
g1 12 30
g2 14 37 15
g2 16 76 34
g3 12 22
g3 4 84
g4 7 15 10
g4 24 21 17 67
g4 14 18 94
g4 26 73 ;
run;
I wanted to get this output:
group value
g1 30
g2 76
g3 84
g4 94
Thank You!
Ah, I see, then:
data test; infile datalines missover; input group $ value1 value2 value3 value4; datalines; g1 12 30 g2 14 37 15 g2 16 76 34 g3 12 22 g3 4 84 g4 7 15 10 g4 24 21 17 67 g4 14 18 94 g4 26 73 ; run; data want (keep=group max_val); set test; by group; retain max_val; if first.group then max_val=.; if max(of value:) > max_val then max_val=max(of value:); if last.group then output; run;
Something like:
data test;
infile datalines missover;
input group $ value1 value2 value3 value4;
datalines;
g1 12 30
g2 14 37 15
g2 16 76 34
g3 12 22
g3 4 84
g4 7 15 10
g4 24 21 17 67
g4 14 18 94
g4 26 73
;
run;
proc sql;
create table WANT as
select GROUP,
sum(sum(VALUE1,VALUE2,VALUE3,VALUE4)) as VALUE
from TEST
group by GROUP;
quit;
However your output want dataset doesn't seem to make sense to me. Why would G1=30, when there is 12 and 30?
Much like G4, how can you get 94 when there are 9 values to add together?
I have received the output where sums were calculated but I need max values within each group. That is the case.
The SAS System |
g1 | 42 |
g2 | 192 |
g3 | 122 |
g4 | 386 |
Ah, I see, then:
data test; infile datalines missover; input group $ value1 value2 value3 value4; datalines; g1 12 30 g2 14 37 15 g2 16 76 34 g3 12 22 g3 4 84 g4 7 15 10 g4 24 21 17 67 g4 14 18 94 g4 26 73 ; run; data want (keep=group max_val); set test; by group; retain max_val; if first.group then max_val=.; if max(of value:) > max_val then max_val=max(of value:); if last.group then output; run;
I'm not sure this is all that different:
data want;
do until (last.group);
set have;
by group;
maxval = max(maxval, of value1-value4);
end;
keep group maxval;
run;
That looks clever, but I don't understand it. How does BY function inside a DO loop? I've never seen that before.
--Dav
@Davanden, here is how i understand it :
do until (last.group);
set have;
by group;
maxval = max(maxval, of value1-value4);
end;
The interest of this loop is to treat an entire group for each iteration of the datastep.
By putting the set ... by instructions inside the loop, you can read each record
of the group without outputting or reinitializing the PDV so that the value of MAXVAL for the group is kept
between records..
It is only at the run instruction that the result is output and the PDV is reset before handling the next group.
You end up with a shorter program than if you explicitely reset Maxval on first.group and output
on last.group.
Clever indeed.
Each SET statement is permitted to use a BY statement. That sets up FIRST. and LAST. variables (regardless of whether inside a loop or not).
One of the key issues here is how the SET statement operates. It is not a label that shows where the data is coming from. Rather, it is an instruction to read the next observation from a SAS data set. It only reads one observation at a time (whether inside or outside a DO loop). So the BY statement merely identifies whether that observation is the first (or last) one for a value of GROUP.
***** EDITED HERE:
Maybe I shouldn't add this ... it might add to the confusion. Yet some of the community will find this interesting. Technically, the BY statement refers to the most recent SET / MERGE / UPDATE statement. There is no reason that it has to immediatley follow a SET, MERGE, or UPDATE statement. So this variation would work exactly the same way:
data want;
do until (last.group);
set have;
maxval = max(maxval, of value1-value4);
end;
by group;
keep group maxval;
run;
@DmytroYermak My apologies for the late entry to the party. Proc sql is the best solution for this kind of problem coz you can avoid a sort and can do it one select query-
data test;
infile datalines missover;
input group $ value1 value2 value3 value4;
datalines;
g1 12 30
g2 14 37 15
g2 16 76 34
g3 12 22
g3 4 84
g4 7 15 10
g4 24 21 17 67
g4 14 18 94
g4 26 73
;
run;
proc sql;
create table want as
select group, max(value1, value2, value3, value4) as value
from test
group by group
having value=max(value);
quit;
Regards,
Naveen Srinivasan
One thing to be wary of ... I believe the SQL approach can give you multiple observations for a GROUP, if you have a tie for the largest value. If you're good enough at SQL (which rules me out), you might be able to get around that using DISCREET.
Don't miss out on SAS Innovate - Register now for the FREE Livestream!
Can't make it to Vegas? No problem! Watch our general sessions LIVE or on-demand starting April 17th. Hear from SAS execs, best-selling author Adam Grant, Hot Ones host Sean Evans, top tech journalist Kara Swisher, AI expert Cassie Kozyrkov, and the mind-blowing dance crew iLuminate! Plus, get access to over 20 breakout sessions.
Learn how use the CAT functions in SAS to join values from multiple variables into a single value.
Find more tutorials on the SAS Users YouTube channel.