How do you find the 3rd highest salary using a self-join?
@Hakob wrote:
How do you find the 3rd highest salary using a self-join?
Can you provide more context please?
Why SQL? There are better tools for this type of question.
It was interview question.
It was interview question.
The answer I would give in the interview is the same answer I gave you. SQL is not the right tool for this.
An interviewer insisting on PROC SQL code for the 3rd highest value is not attempting to assess your SAS expertise.
would that be using order by desc together with number? The output gives nth highest salary, not only the 3rd one. Something like this:
proc sql number;
select id, salary,var1,var2
from tab1 left join
tab2
on tab1.id=tab2.id
order by salary desc;
quit;
As others said, in SAS world, proc sql is not right tool to process this kind of question. Pick up the right tool like data step or proc univariate.
Anyway, here is an example SQL to solve this question.
data have; set sashelp.baseball; keep name salary; run; proc sql; select distinct a.* from have as a,have as b where a.salary < b.salary group by a.salary having count(distinct b.salary)=2; quit; /*Check the result is right or not*/ ods select ExtremeValues; proc univariate data=have nextrval=3; var salary; run;
Hi @Ksharp , I tested your code, it works good, I think this is the only way to solve @Hakob 's question, which in fact is, using a single step to select and output the 3rd largest value of a column, and it can be achieved only through sql self join. Any other techniques consume more than one step. I tried it using a sql step and a data step, which uses one more step than yours but the logic is simple and easier. All of the above mentioned code and output are as follows.
1) using one single step to select the 3rd largest value for a column, according to @Ksharp 's solution (a single proc sql self join step):
data have;
input id salary;
datalines;
1 12
2 34
3 43
4 32
5 21
;
run;
/*find the 3rd highest salary
using a self-join*/
proc sql;
select distinct a.*
from have as a,
have as b
where a.salary<b.salary
group by a.salary
having count(distinct b.salary)=2;
quit;
2) using a proc sql and a data step:
proc sql;
create table salranks as
select id,
salary
from have
order by salary desc;
select * from salranks;
quit;
data saltop3(keep=id salary);
set salranks;
n=_n_;
if n=3;
run;
proc print data=saltop3;run;
The WHERE clause limits the cartesian product results that are considered to just those where the salary from a the second copy is larger than the current salary. Then the HAVING clause further limits the records to only those that have exactly two different salaries that are larger.
Which is ONE definition of the third largest. In this algorithm multiple similar salaries are counted as one. So in the series 100,100,95,95,95,90,80 it will pick 90.
Another definition might select 95 instead since there are two observations (both with salary=100) above it. That you could do even simpler in SQL if the goal is just for a HUMAN to see the answer (and not put the answer into a dataset or something a computer could use. Just use the OUTOBS= option of PROC SQL.
proc sql outobs=3;
select *
from have
order by salary desc
;
quit;
Note that with PROC RANK you can choose between these two and many other options for how to rank the values.
Hello, thank you for your answer.
Hi @Hakob , because I remember there is a function can return the 1st, 2nd, and 3rd largest or smallest value, I searched all over the help document and find it is the largest(n, var) function, the code is as follows:
data salary;
input id salary1-salary4;
datalines;
1 12 34 50 70
2 34 45 52 12
3 43 12 36 22
4 32 29 80 37
5 21 10 90 43
;
run;
proc print data=salary;run;
data want;
set salary;
salary3rd=largest(3,of salary1-salary4);
run;
proc print data=want;run;
This function also works in proc sql (however, I cannot write the function as largest(3, of salary1-salary4), if I write it like this I got error message):
proc sql;
select largest(3,salary1,salary2,salary3,salary4)
as salary3rd
from salary;
quit;
One problem is if the dataset only have one salary column, if I want to use this function, I need to transpose the dataset first (not sure I am right about this or not) because the code for one column does not work and I got error message like this:
proc sql;
select largest(3,salary1)
as salary3rd
from salary;
quit;
69 proc sql; 70 select largest(3,salary1) 71 as salary3rd 72 from salary; NOTE: Invalid argument 1 to function LARGEST. Missing values may be generated. 73 quit;
Other techniques to find out the 3rd largest value may include: proc univariate, which by default gives 5 largest and smallest values in the output and proc sort.
Good news: We've extended SAS Hackathon registration until Sept. 12, so you still have time to be part of our biggest event yet – our five-year anniversary!
Learn how use the CAT functions in SAS to join values from multiple variables into a single value.
Find more tutorials on the SAS Users YouTube channel.
Ready to level-up your skills? Choose your own adventure.