- Mark as New
- Bookmark
- Subscribe
- Mute
- RSS Feed
- Permalink
- Report Inappropriate Content
Hello
The following query select one observation that its weight value is second highest.
Data class;
input Cust_ID weight age;
1 59 20
2 61 25
3 95 40
4 92 25
5 51 18
6 92 21
7 50 43
8 60 23
9 70 71
10 92 62
;
Run;
Data class;
set sashelp.class;
Run;
proc sql outobs = 1;
select name, weight
from class
where weight not in (select max(weight) from class)
order by weight desc
;quit;
My question-
Let's say that there are multiple observations with weight=second highest value and I want to see these observations. What is the way to do it in one step? (So in this example I will see 3 observations since there are 3 observations with weight=92)
- Mark as New
- Bookmark
- Subscribe
- Mute
- RSS Feed
- Permalink
- Report Inappropriate Content
Use PROC RANK.
Paige Miller
- Mark as New
- Bookmark
- Subscribe
- Mute
- RSS Feed
- Permalink
- Report Inappropriate Content
- Mark as New
- Bookmark
- Subscribe
- Mute
- RSS Feed
- Permalink
- Report Inappropriate Content
Please look at the examples at the link I gave. One of the examples is analogous to your problem, only trivial changes are needed to use that example in your situation.
Paige Miller
- Mark as New
- Bookmark
- Subscribe
- Mute
- RSS Feed
- Permalink
- Report Inappropriate Content
/*
It is not right tool by using SQL.
Why not try data step ?
*/
Data class;
input Cust_ID weight age;
cards;
1 59 20
2 61 25
3 95 40
4 92 25
5 51 18
6 92 21
7 50 43
8 60 23
9 70 71
10 92 62
;
Run;
proc sql;
select *
from (select * from class except select * from class having weight= max(weight))
having weight= max(weight)
;quit;
- Mark as New
- Bookmark
- Subscribe
- Mute
- RSS Feed
- Permalink
- Report Inappropriate Content
A data step can pass through the data once to establish the MAX and 2ND highest values. It can then re-read the data and keep those obs that match the second.
data want (drop=_:);
set have (in=firstpass) have (in=secondpass);
retain _2ND _MAX;
array w{2} _2ND _MAX;
if firstpass then do; /*Update array W, if needed*/
if weight^=_MAX and weight>_2ND then do;
_2ND=weight;
call sortn(of w{*});
end;
end;
if secondpass and weight=_2ND;
run;
The hash OUTPUT method will overwrite a SAS data set, but not append. That can be costly. Consider voting for Add a HASH object method which would append a hash object to an existing SAS data set
Would enabling PROC SORT to simultaneously output multiple datasets be useful? Then vote for
Allow PROC SORT to output multiple datasets
--------------------------
- Mark as New
- Bookmark
- Subscribe
- Mute
- RSS Feed
- Permalink
- Report Inappropriate Content
As proposed by @PaigeMiller
data have;
input Cust_ID weight age;
datalines;
1 59 20
2 61 25
3 95 40
4 92 25
5 51 18
6 92 21
7 50 43
8 60 23
9 70 71
10 92 62
;
proc rank data=have ties=dense descending out=want(where=(rank=2));
ranks rank;
var weight;
run;
proc print data=want;
run;
- Mark as New
- Bookmark
- Subscribe
- Mute
- RSS Feed
- Permalink
- Report Inappropriate Content
In the long run, I think SAS users are better off using built-in SAS capabilities (like PROC RANK) rather than writing their own DATA step code or SQL code to achieve the same thing. SAS has done the hard job of writing this code, making it robust to different possible problems including handling of missing values, debugging their code and testing it, and then SAS's code is proved to work in many real-world applications. In addition, if the problem changes to keeping the 5th largest ... trivial changes are needed to the PROC RANK code; I'm not sure the changes are trivial to DATA step or SQL solutions.
Paige Miller
- Mark as New
- Bookmark
- Subscribe
- Mute
- RSS Feed
- Permalink
- Report Inappropriate Content
@PaigeMiller wrote:
In the long run, I think SAS users are better off using built-in SAS capabilities (like PROC RANK) rather than writing their own DATA step code or SQL code
...
In addition, if the problem changes to keeping the 5th largest ... trivial changes are needed to the PROC RANK code; I'm not sure the changes are trivial to DATA step or SQL solutions.
In general (and in this case), I agree with this sentiment. But with my clients/customers, I often find it's a good idea to anticipate a sort of mission creep, where the initial simple request (as in this topic) is expanded to include other objectives that won't be satisfied by the purpose-built SAS PROC.
In this particular case, changing the problem to the N'th largest would need only a trivial modification of the DATA step solution, but (I believe) a real headache for the PROC SQL approach.
The hash OUTPUT method will overwrite a SAS data set, but not append. That can be costly. Consider voting for Add a HASH object method which would append a hash object to an existing SAS data set
Would enabling PROC SORT to simultaneously output multiple datasets be useful? Then vote for
Allow PROC SORT to output multiple datasets
--------------------------
- Mark as New
- Bookmark
- Subscribe
- Mute
- RSS Feed
- Permalink
- Report Inappropriate Content
Yes, "mission creep" can affect the programming that you do, its often a good thing to plan for it. On the other hand, for programmers who just want to learn how to do some task in SAS, that's not an issue.
Paige Miller