Hi,
I have a dataset from that I want to extract second maximum value from each group.
data scr;
input id value;
datalines;
11 23
11 56
11 43
11 98
12 22
12 54
12 56
13 54
13 76
13 34
;
run;
anticipating output as
id value
11 56
12 54
13 54
I have worked out my way and successfully got output but I am looking for some better code/method than this, can u please suggest any other way to solve this
here my code
data scr;
input id value;
datalines;
11 23
11 56
11 43
11 98
12 22
12 54
12 56
13 54
13 76
13 34
;
proc sort data = scr;
by id descending value;
run;
data ex (drop = row);
set scr;
by id descending value;
if first.id then row = 0;
row + 1;
if row = 2 ;
run;
Thanks in advance
manoj
Hi @s_manoj,
One option would be to use the LARGEST function:
proc transpose data=scr out=ttt;
by id;
var value;
run;
data ex;
set ttt;
array col col:;
value=largest(2, of col[*]);
keep id value;
run;
You could avoid using PROC TRANSPOSE and keep it to a single data step by writing the values of each BY group to an array (in a DOW loop) or using a hash object ...
Edit:
Here are the two single-step solutions mentioned above:
/* Alternative 1 */
data ex;
array tmp[999]; /* make sure array dimension is >= maximum group size */
do until(last.id);
set scr;
by id;
i+1;
tmp[i]=value;
end;
value=largest(2, of tmp[*]);
keep id value;
run;
/* Alternative 2 */
data ex;
dcl hash h(multidata: 'y', ordered: 'd');
h.definekey('value');
h.definedone();
dcl hiter hi('h');
do until(last.id);
set scr;
by id;
h.add();
end;
hi.first();
if hi.next() ne 0 then value=.;
run;
data scr;
input id value;
datalines;
11 23
11 56
11 43
11 98
12 22
12 54
12 56
13 54
13 76
13 34
;
proc rank data=scr out=results(where=(rvalue=2)) ties=low descending;
by id;
var value;
ranks rvalue;
run;
data scr;
input id value;
datalines;
11 23
11 56
11 43
11 98
12 22
12 54
12 56
13 54
13 76
13 34
;
proc sql;
create table want as
select id, max(value) as value
from (select * from scr group by id having max(value) ne value)
group by id;
quit;
I, personally, don't think any other methods would be any better than you current code. In fact, the only things I would change are (1) the by statement of your datastep (since descending value isn't necessary) and (2) your counter statements could be very slightly optimized. i.e.,
data ex (drop = row);
set scr;
by id;
if first.id then row = 1;
else row + 1;
if row = 2 ;
run;
Art, CEO, AnalystFinder.com
Not really "better" but just another option.
data ex;
set scr;
by id descending value;
if not first.id and lag(first.id) then
output;
run;
Above code assumes you don't care about ties but just need to pull the 2nd record per group - and no record in case there is only a single record in a group.
Please note that the suggested solutions are not all equivalent with regard to (e.g.)
For example, @novinosrin's PROC SQL solution -- unlike your own code -- would eliminate tied maximum values and deliver the next smaller value of the respective BY group. Of course, it can be modified in order to match the results of your code even in the presence of ties:
proc sql;
create table ex as
select id, max(value) as value
from ((select * from scr)
except all
(select id, max(value) from scr group by id))
group by id;
quit;
Edit: The PROC RANK approach (using the option ties=low) would exclude the entire BY group from the output dataset in the event of a tied maximum value, as no observation would receive rvalue=2.
What is the requirement if there is no second item in a group? That is when a group has only a single item? Missing value? No reporting value?
Are you ready for the spotlight? We're accepting content ideas for SAS Innovate 2025 to be held May 6-9 in Orlando, FL. The call is open until September 25. Read more here about why you should contribute and what is in it for you!
Learn how use the CAT functions in SAS to join values from multiple variables into a single value.
Find more tutorials on the SAS Users YouTube channel.