Select all observations that their weight=Second largest weight

Ronein · Posted 10-28-2023 10:35 AM

Hello

The following query select one observation that its weight value is second highest.

Data class;
input Cust_ID  weight age;
1 59 20
2 61 25
3 95 40
4 92 25
5 51 18
6 92 21
7 50 43
8 60 23
9 70 71
10 92 62
;
Run;

Data class;
set sashelp.class;
Run;

proc sql outobs = 1;
    select name, weight 
    from class
    where weight not in (select max(weight) from class)
    order by weight desc
;quit;

My question-

Let's say that there are multiple observations with weight=second highest value and I want to see these observations. What is the way to do it in one step? (So in this example I will see 3 observations since there are 3 observations with weight=92)

PaigeMiller · Posted 10-28-2023 10:53 AM

Use PROC RANK.

--
Paige Miller

Ronein · Posted 10-28-2023 11:50 AM

May you please show the full code

PaigeMiller · Posted 10-28-2023 12:06 PM

Please look at the examples at the link I gave. One of the examples is analogous to your problem, only trivial changes are needed to use that example in your situation.

--
Paige Miller

Ksharp · Posted 10-29-2023 04:59 AM

/*
It is not right tool by using SQL.
Why not try data step ?
*/
Data class;
input Cust_ID  weight age;
cards;
1 59 20
2 61 25
3 95 40
4 92 25
5 51 18
6 92 21
7 50 43
8 60 23
9 70 71
10 92 62
;
Run;

proc sql;
select * 
 from (select * from class except select * from class having weight= max(weight))
   having weight= max(weight)
;quit;

mkeintz · Posted 10-29-2023 06:23 PM

A data step can pass through the data once to establish the MAX and 2ND highest values. It can then re-read the data and keep those obs that match the second.

data want  (drop=_:);
  set have (in=firstpass)  have (in=secondpass);

  retain _2ND _MAX;
  array  w{2} _2ND _MAX; 
  
  if firstpass then do;      /*Update array W, if needed*/
    if weight^=_MAX and weight>_2ND then do;
      _2ND=weight;
      call sortn(of w{*});
    end;
  end;
  if secondpass and weight=_2ND;
run;

--------------------------
The hash OUTPUT method will overwrite a SAS data set, but not append. That can be costly. Consider voting for Add a HASH object method which would append a hash object to an existing SAS data set

Would enabling PROC SORT to simultaneously output multiple datasets be useful? Then vote for
Allow PROC SORT to output multiple datasets

--------------------------

Patrick · Posted 10-29-2023 07:46 PM

As proposed by @PaigeMiller

data have;
  input Cust_ID  weight age;
  datalines;
1 59 20
2 61 25
3 95 40
4 92 25
5 51 18
6 92 21
7 50 43
8 60 23
9 70 71
10 92 62
;

proc rank data=have ties=dense descending out=want(where=(rank=2));
  ranks rank;
  var weight;
run;

proc print data=want;
run;

PaigeMiller · Posted 10-30-2023 05:50 AM

In the long run, I think SAS users are better off using built-in SAS capabilities (like PROC RANK) rather than writing their own DATA step code or SQL code to achieve the same thing. SAS has done the hard job of writing this code, making it robust to different possible problems including handling of missing values, debugging their code and testing it, and then SAS's code is proved to work in many real-world applications. In addition, if the problem changes to keeping the 5th largest ... trivial changes are needed to the PROC RANK code; I'm not sure the changes are trivial to DATA step or SQL solutions.

--
Paige Miller

mkeintz · Posted 10-30-2023 06:27 PM

@PaigeMiller wrote:

In the long run, I think SAS users are better off using built-in SAS capabilities (like PROC RANK) rather than writing their own DATA step code or SQL code

...

In addition, if the problem changes to keeping the 5th largest ... trivial changes are needed to the PROC RANK code; I'm not sure the changes are trivial to DATA step or SQL solutions.

In general (and in this case), I agree with this sentiment. But with my clients/customers, I often find it's a good idea to anticipate a sort of mission creep, where the initial simple request (as in this topic) is expanded to include other objectives that won't be satisfied by the purpose-built SAS PROC.

In this particular case, changing the problem to the N'th largest would need only a trivial modification of the DATA step solution, but (I believe) a real headache for the PROC SQL approach.

--------------------------
The hash OUTPUT method will overwrite a SAS data set, but not append. That can be costly. Consider voting for Add a HASH object method which would append a hash object to an existing SAS data set

Would enabling PROC SORT to simultaneously output multiple datasets be useful? Then vote for
Allow PROC SORT to output multiple datasets

--------------------------

PaigeMiller · Posted 10-31-2023 07:30 AM

Yes, "mission creep" can affect the programming that you do, its often a good thing to plan for it. On the other hand, for programmers who just want to learn how to do some task in SAS, that's not an issue.

--
Paige Miller

Select all observations that their weight=Second largest weight

Re: Select all observations that their weight=Second largest weight

Re: Select all observations that their weight=Second largest weight

Re: Select all observations that their weight=Second largest weight

Re: Select all observations that their weight=Second largest weight

Re: Select all observations that their weight=Second largest weight

Re: Select all observations that their weight=Second largest weight

Re: Select all observations that their weight=Second largest weight

Re: Select all observations that their weight=Second largest weight

Re: Select all observations that their weight=Second largest weight

Registration is open

SAS Training: Just a Click Away