Re: Print second maximum value from each group

s_manoj · Posted 06-10-2018 12:09 PM

Hi,

I have a dataset from that I want to extract second maximum value from each group.

data scr;
input id value;
datalines;
11 23
11 56
11 43
11 98
12 22
12 54
12 56
13 54
13 76
13 34

;

run;

anticipating output as

id value

11 56

12 54

13 54

I have worked out my way and successfully got output but I am looking for some better code/method than this, can u please suggest any other way to solve this

here my code

data scr;
input id value;
datalines;
11 23
11 56
11 43
11 98
12 22
12 54
12 56
13 54
13 76
13 34
;
proc sort data = scr;
by id descending value;
run;

data ex (drop = row);
set scr;
by id descending value;
if first.id then row = 0;
row + 1;
if row = 2 ;
run;

Thanks in advance

manoj

FreelanceReinh · Posted 06-10-2018 12:46 PM

Hi @s_manoj,

One option would be to use the LARGEST function:

proc transpose data=scr out=ttt;
by id;
var value;
run;

data ex;
set ttt;
array col col:;
value=largest(2, of col[*]);
keep id value;
run;

You could avoid using PROC TRANSPOSE and keep it to a single data step by writing the values of each BY group to an array (in a DOW loop) or using a hash object ...

Edit:

Here are the two single-step solutions mentioned above:

/* Alternative 1 */

data ex;
array tmp[999]; /* make sure array dimension is >= maximum group size */
do until(last.id);
  set scr;
  by id;
  i+1;
  tmp[i]=value;
end;
value=largest(2, of tmp[*]);
keep id value;
run;

/* Alternative 2 */

data ex;
dcl hash h(multidata: 'y', ordered: 'd');
h.definekey('value');
h.definedone();
dcl hiter hi('h');
do until(last.id);
  set scr;
  by id;
  h.add();
end;
hi.first();
if hi.next() ne 0 then value=.;
run;

s_manoj · Posted 06-11-2018 04:18 AM

Thank you @FreelanceReinhard

novinosrin · Posted 06-10-2018 02:11 PM

data scr;
input id value;
datalines;
11 23
11 56
11 43
11 98
12 22
12 54
12 56
13 54
13 76
13 34
;

proc rank data=scr out=results(where=(rvalue=2)) ties=low descending;
   by id;
	var value;
   ranks rvalue;
run;

novinosrin · Posted 06-10-2018 02:22 PM

data scr;
input id value;
datalines;
11 23
11 56
11 43
11 98
12 22
12 54
12 56
13 54
13 76
13 34
;

proc sql;
create table want as
select id, max(value) as value
from (select * from scr group by id having max(value) ne value)
group by id;
quit;

s_manoj · Posted 06-11-2018 04:21 AM

than you @novinosrin

art297 · Posted 06-10-2018 02:57 PM

I, personally, don't think any other methods would be any better than you current code. In fact, the only things I would change are (1) the by statement of your datastep (since descending value isn't necessary) and (2) your counter statements could be very slightly optimized. i.e.,

data ex (drop = row);
  set scr;
  by id;
  if first.id then row = 1;
  else row + 1;
  if row = 2 ;
run;

Art, CEO, AnalystFinder.com

Patrick · Posted 06-11-2018 01:39 AM

@s_manoj

Not really "better" but just another option.

data ex;
  set scr;
  by id descending value;

  if not first.id and lag(first.id) then
    output;
run;

Above code assumes you don't care about ties but just need to pull the 2nd record per group - and no record in case there is only a single record in a group.

s_manoj · Posted 06-11-2018 04:23 AM

Thank you @Patrick

s_manoj · Posted 06-11-2018 04:22 AM

Thank you @art297

FreelanceReinh · Posted 06-10-2018 05:18 PM

Please note that the suggested solutions are not all equivalent with regard to (e.g.)

tied maximum values (i. e. more than one observation in a BY group having the maximum value)
BY groups containing only one observation.

For example, @novinosrin's PROC SQL solution -- unlike your own code -- would eliminate tied maximum values and deliver the next smaller value of the respective BY group. Of course, it can be modified in order to match the results of your code even in the presence of ties:

proc sql;
create table ex as
select id, max(value) as value
from ((select * from scr)
      except all
      (select id, max(value) from scr group by id))
group by id;
quit;

Edit: The PROC RANK approach (using the option ties=low) would exclude the entire BY group from the output dataset in the event of a tied maximum value, as no observation would receive rvalue=2.

LeonidBatkhan · Posted 06-11-2018 03:18 PM

What is the requirement if there is no second item in a group? That is when a group has only a single item? Missing value? No reporting value?

➤ Leonid's SAS blog

SAS Innovate 2025: Call for Content

Classroom Training Available!