DATA Step, Macro, Functions and more

Print second maximum value from each group

Reply
Contributor
Posts: 22

Print second maximum value from each group

[ Edited ]

Hi,

I have a dataset from that I want to extract second maximum value from each group.

 

data scr;
input id value;
datalines;
11 23
11 56
11 43
11 98
12 22
12 54
12 56
13 54
13 76
13 34

;

run;

anticipating output as

id value

11 56

12 54

13 54

 

I have worked out my way and successfully got output but I am looking for some better code/method than this, can u please suggest any other way to solve this

here my code

data scr;
input id value;
datalines;
11 23
11 56
11 43
11 98
12 22
12 54
12 56
13 54
13 76
13 34
;
proc sort data = scr;
by id descending value;
run;

data ex (drop = row);
set scr;
by id descending value;
if first.id then row = 0;
row + 1;
if row = 2 ;
run;

 

 

Thanks in advance

manoj

 

 

Trusted Advisor
Posts: 1,171

Re: Print second maximum value from each group

[ Edited ]

Hi @s_manoj,

 

One option would be to use the LARGEST function:

proc transpose data=scr out=ttt;
by id;
var value;
run;

data ex;
set ttt;
array col col:;
value=largest(2, of col[*]);
keep id value;
run;

You could avoid using PROC TRANSPOSE and keep it to a single data step by writing the values of each BY group to an array (in a DOW loop) or using a hash object ... 

 

Edit:

Here are the two single-step solutions mentioned above:

/* Alternative 1 */

data ex;
array tmp[999]; /* make sure array dimension is >= maximum group size */
do until(last.id);
  set scr;
  by id;
  i+1;
  tmp[i]=value;
end;
value=largest(2, of tmp[*]);
keep id value;
run;

/* Alternative 2 */

data ex;
dcl hash h(multidata: 'y', ordered: 'd');
h.definekey('value');
h.definedone();
dcl hiter hi('h');
do until(last.id);
  set scr;
  by id;
  h.add();
end;
hi.first();
if hi.next() ne 0 then value=.;
run;
Contributor
Posts: 22

Re: Print second maximum value from each group

Posted in reply to FreelanceReinhard
Thank you @FreelanceReinhard
PROC Star
Posts: 1,605

Re: Print second maximum value from each group

data scr;
input id value;
datalines;
11 23
11 56
11 43
11 98
12 22
12 54
12 56
13 54
13 76
13 34
;

proc rank data=scr out=results(where=(rvalue=2)) ties=low descending;
   by id;
	var value;
   ranks rvalue;
run;
PROC Star
Posts: 1,605

Re: Print second maximum value from each group

data scr;
input id value;
datalines;
11 23
11 56
11 43
11 98
12 22
12 54
12 56
13 54
13 76
13 34
;

proc sql;
create table want as
select id, max(value) as value
from (select * from scr group by id having max(value) ne value)
group by id;
quit;
Contributor
Posts: 22

Re: Print second maximum value from each group

Posted in reply to novinosrin
than you @novinosrin
PROC Star
Posts: 8,151

Re: Print second maximum value from each group

I, personally, don't think any other methods would be any better than you current code. In fact, the only things I would change are (1) the by statement of your datastep (since descending value isn't necessary) and (2) your counter statements could be very slightly optimized.  i.e.,

data ex (drop = row);
  set scr;
  by id;
  if first.id then row = 1;
  else row + 1;
  if row = 2 ;
run;

Art, CEO, AnalystFinder.com

 

Respected Advisor
Posts: 4,702

Re: Print second maximum value from each group

[ Edited ]

@s_manoj

Not really "better" but just another option.

data ex;
  set scr;
  by id descending value;

  if not first.id and lag(first.id) then
    output;
run;

Above code assumes you don't care about ties but just need to pull the 2nd record per group - and no record in case there is only a single record in a group.

Contributor
Posts: 22

Re: Print second maximum value from each group

Thank you @Patrick
Contributor
Posts: 22

Re: Print second maximum value from each group

Thank you @art297
Trusted Advisor
Posts: 1,171

Re: Print second maximum value from each group

[ Edited ]

Please note that the suggested solutions are not all equivalent with regard to (e.g.)

  1. tied maximum values (i. e. more than one observation in a BY group having the maximum value)
  2. BY groups containing only one observation.

For example, @novinosrin's PROC SQL solution -- unlike your own code -- would eliminate tied maximum values and deliver the next smaller value of the respective BY group. Of course, it can be modified in order to match the results of your code even in the presence of ties:

proc sql;
create table ex as
select id, max(value) as value
from ((select * from scr)
      except all
      (select id, max(value) from scr group by id))
group by id;
quit;

 

Edit: The PROC RANK approach (using the option ties=low) would exclude the entire BY group from the output dataset in the event of a tied maximum value, as no observation would receive rvalue=2.

SAS Employee
Posts: 2

Re: Print second maximum value from each group

[ Edited ]

What is the requirement if there is no second item in a group? That is when a group has only a single item? Missing value? No reporting value?

Ask a Question
Discussion stats
  • 11 replies
  • 223 views
  • 9 likes
  • 6 in conversation