data s;
input id price;
datalines;
100 25
100 60
100 65
100 80
100 91
101 55
101 25
101 85
101 100
;
run;
/* proc sql method : Second max price for each id */
proc sql;
select *
from s as a
where 2=(select distinct price
from s as b
where a.price <= b.price
) ;
quit;
/* Datastep method :Second max price for each id */
data rank;
set s;
run;
proc sort data=s;
by descending id;
run;
data ranking;
set s;
by descending id ;
if first.id then ranks=1;
else ranks+1;
proc print noobs;
run;
data maxvalue ;
set ranking ;
if _n_=2;
proc print ;
run;
Q1.)Here I didn't get correct output for nth max value as of now i want to find second max price for each id how to fix this problem
You want the second highest price per id, so you need to sort by id and descending price.
Then do
data want;
set have;
by id;
if first.id
then count = 1;
else count + 1;
if count = 2;
drop count;
run;
You could use Proc Rank
data have;
input id price;
datalines;
100 25
100 25
100 60
100 60
100 65
100 80
100 91
101 55
101 25
101 85
101 100
;
proc sort data=have out=have_sorted;
by id;
run;
proc rank data=have_sorted ties=dense out=want(where=(rank=2));
by id;
var price;
ranks rank;
run;
proc print data=want;
run;
@pavank wrote:
But interviewer ask proc sql and datastep only not procedures now a days interviews ask most of secnario questions
The correct answer to the interviewer is that you should not try to do this in PROC SQL when much simpler methods are available in SAS.
I do not quite understand what you are trying to do with the SQL code.
A data step solution can be done like this:
proc sort data=s;
by id descending price;
run;
data ranks;
do rank=1 by 1 until(last.id);
set s;
by id;
output;
end;
run;
An then it just to use "where rank=2" if that is the rank you want.
SAS SQL is not really suited for this kind of question (other dialects have RANK summary functions, which can be used), so to get the second highest PRICE for each ID you will have to do something like
proc sql;
select id,max(price) from(
select * from s except
select id,max(price) from s
group by id)
group by id;
in other words, subtract the rows with the highest values from the table in order to find the second highest. If you want to use SQL to find the third or fourth highest value, things begin to get complicated, with a larger number of nested queries.
Not that this solution will handle ties differently than the data step I wrote, as that will give different ranks to tied values - a data step that gives the same rank to ties can be written like this:
proc sort data=s;
by id descending price;
run;
data ranks;
rank=0;
do _N_=1 by 1 until(last.id);
set s;
by id;
if _N_=1 or price ne lag(price) then rank+1;
output;
end;
run;
But probably the most natural solution in SAS is to use PROC RANK, as suggested by @Patrick
Here is for PROC SQL solution.But I would not advocate it.
data s;
input id price;
datalines;
100 25
100 60
100 65
100 80
100 91
101 55
101 25
101 85
101 100
;
run;
%let n=2;
proc sql;
create table want as
select a.*
from s as a,s as b
where a.id=b.id and a.price<=b.price
group by a.id,a.price
having count(*)<=&n.;
quit;
Hi Sharp
Thanks for your solution
But interviewers ask reason why we can not achieved in proc sql can you explain how to explain proper reason
SQL does not have tools for sequences (with the exception of the undocumented MONOTONIC function, which does not work as expected under certain circumstances).
As @Kurt_Bremser said PROC SQL is standard SQL (unlike ORACLE , DB2 ) can not handle the sequence records , the order of record does not honored by PROC SQL. That is why I do not recommend SQL for top nth MAX or MIN question . Date Step is right tool .
As you can see ,although I got solution by PROC SQL, that is still NOT a efficient way , if you have a big table , my PROC SQL would run forever .
I would say that PROC RANK ought to be preferred over a DATA step here. DATA steps require the user to create and then program the logic, not really that easy for most of us (even me). Whereas PROC RANK simply requires the normal information that most PROCs require: what data set, what variable(s), what output; the user does not have to create and then program the logic. Also, PROC RANK can handle ties; whereas handling ties in a DATA step requires the user to program additional logic.
Assuming the data set is amenable to transposing then Proc Transpose and Data step with the LARGEST function or CALL SORTN on an array of the transposed values would work depending on exactly how the output is desired.
/* assuming Have is output of transposed data by Id */ data ranks; set have; array c(*) col: ; do rank=1 to n(of c(*)); value = largest(rank,of c(*)); output; end; keep id rank value; run;
Skeleton of program for relatively "nice" input data and ignores any tie breaking.
But first ask the interviewer about which rules for tie-breaking are to be used for ranking tied values. Can't start the process without all the constraints.
If the data are already sorted by ID, but not necessarily by price within ID, then a single DATA step as below:
data s;
input id price;
datalines;
100 25
100 60
100 65
100 80
100 91
101 55
101 25
101 85
101 100
run;
%let rank=2;
data nth_largest (label="for N=&rank");
set s;
by id;
array larg {&rank} _temporary_;
if price>larg{1} then do;
larg{1}=price;
call sort(of larg{*});
end;
if last.id;
price=larg{1};
call missing(of larg{*});
run;
The array LARG, after the call sort, will be sorted from smallest (larg{1}) to largest (larg{&rank}). Because the array is _temporary_, its values are retained across observations.
Additional note:
There has been some discussion of tied values. In the code above, if there are duplicate instances of high values, each instance is treated as a separate value in the sort call (i.e. for ID=100, if there were two instances of price=91, then 91 would be the 2nd largest, not 85).
But if you want to control for ties (i.e. you want the nth largest distinct value), just change
if price>larg{1} then do;
to
if price>larg{1} and whichn(price,of larg{*})=0 then do;
Hello
This is a common question in SAS interviews.
This is my solution. If there are duplicate values for the nth position, reports them.
May not be the most elegant or efficient approach but it does works for the interview.
data s;
input Id Price;
datalines;
100 25
100 60
100 65
100 80
100 91
101 55
101 25
101 85
101 100
103 100
104 91
;
run;
%Let N=2;
PROC SQL;
Select Id, Price from s s1
where (&N.-1)=(select count(Distinct Price) from s s2 where s2.Price> s1.Price);
QUIT;
SAS Innovate 2025 is scheduled for May 6-9 in Orlando, FL. Sign up to be first to learn about the agenda and registration!
Learn how use the CAT functions in SAS to join values from multiple variables into a single value.
Find more tutorials on the SAS Users YouTube channel.
Ready to level-up your skills? Choose your own adventure.