I have a dataset which needs to be summarized using PROC SQL.
Below is the sample data:
Location | Name | Bank | Amount |
DEL | A0BC | PNB | 5000 |
DEL | A1BC | SBI | 6000 |
DEL | A2BC | PNB | 4000 |
DEL | A3BC | SBI | 2000 |
DEL | A4BC | PNB | 6000 |
DEL | A5BC | SBI | 7000 |
MUM | A4BC | SBI | 5000 |
MUM | A5BC | SBI | 7000 |
MUM | A6BC | PNB | 8000 |
MUM | A7BC | PNB | 9000 |
MUM | A8BC | SBI | 8000 |
MUM | A9BC | PNB | 5000 |
BAN | A8BC | SBI | 4000 |
BAN | A9BC | PNB | 3000 |
BAN | A10BC | PNB | 7000 |
BAN | A11BC | SBI | 6000 |
BAN | A12BC | PNB | 5000 |
BAN | A13BC | SBI | 8000 |
How can i summarize it to get the top two clients identified by location based on Total volume and bank share?
Kindly guide me to get my output as below:
Location | PNB | SBI | Total | %PNB | %SBI |
DEL | 6000 | 7000 | 13000 | 46% | 54% |
DEL | 5000 | 6000 | 11000 | 45% | 55% |
MUM | 9000 | 8000 | 17000 | 53% | 47% |
MUM | 8000 | 7000 | 15000 | 53% | 47% |
BAN | 7000 | 6000 | 13000 | 54% | 46% |
BAN | 5000 | 5000 | 10000 | 50% | 50% |
Update: I have attached DS with the desired out sheet.
Airlines to be clubbed in 2 groups.
Group1. Jet Airways and Air India
Group2. Others
To identify top 15 clients location wise based upon the total volume and airline group share.
Is it necessary to do this in SQL?
Hello, @Sahil_Nanda. I am agreeing with @collinelliot, this is a difficult problem in SQL, but not too difficult in other PROCs.
By the way, you don't explain how your original 6 rows of DEL wind up being summarized into 2 rows of DEL in the output table, I doubt anyone will give attempt to explain any code to solve the problem until you explain that.
I forgot to mention rank. The result which i need is based on total volume. My DS consists of 7 entities(2 banks mentioned here) and i have to club them in two groups. Top 15 clients are required for each airline (bank here) location wise. Ok i have attached DS for your reference. Sheet 2 contains desired output.
Rank | Location | PNB | SBI | Total Volume | %PNB | %SBI |
1 | DEL | 6000 | 7000 | 13000 | 46% | 54% |
2 | DEL | 5000 | 6000 | 11000 | 45% | 55% |
1 | MUM | 9000 | 8000 | 17000 | 53% | 47% |
2 | MUM | 8000 | 7000 | 15000 | 53% | 47% |
1 | BAN | 7000 | 6000 | 13000 | 54% | 46% |
2 | BAN | 5000 | 5000 | 10000 | 50% | 50% |
Thanks for your help.
Below are two possibilities. One uses more SQL, since that's what you asked for. The other uses transpose and a data step. I did not identify the top two in SQL, which might be possible using 'monotonic' after getting the right sort order, but doing this entirely in SQL feels forced and unnecessary. I also did not replicate your final sort order, but you could do that in SQL with a boolean condition. I'll definitely be curious to see other solutions.
data have;
input Location $ Name $ Bank $ Amount;
datalines;
DEL A0BC PNB 5000
DEL A1BC SBI 6000
DEL A2BC PNB 4000
DEL A3BC SBI 2000
DEL A4BC PNB 6000
DEL A5BC SBI 7000
MUM A4BC SBI 5000
MUM A5BC SBI 7000
MUM A6BC PNB 8000
MUM A7BC PNB 9000
MUM A8BC SBI 8000
MUM A9BC PNB 5000
BAN A8BC SBI 4000
BAN A9BC PNB 3000
BAN A10BC PNB 7000
BAN A11BC SBI 6000
BAN A12BC PNB 5000
BAN A13BC SBI 8000
;
proc sort data = have;
by location bank descending amount ;
run;
data top2;
set have;
by location bank descending amount;
if first.bank then rank = 1;
else rank + 1;
if rank < 3 then output;
run;
proc sql;
CREATE TABLE want AS
SELECT location, rank,
sum(amount * (bank = 'PNB')) AS pnb,
sum(amount * (bank = 'SBI')) AS sbi,
CALCULATED pnb / sum(amount) AS pct_pnb format percent8.,
CALCULATED sbi / sum(amount) AS pct_sbi format percent8.
FROM top2
GROUP BY location, rank;
quit;
proc sort data = top2;
by location rank;
run;
proc transpose data = top2 out = top2trans;
by location rank;
id bank;
var amount;
run;
data want_alt;
set top2trans;
pct_pnb = pnb / sum(pnb, sbi);
pct_sbi = sbi / sum(pnb, sbi);
format pct_pnb pct_sbi percent8.;
run;
Thanks for your early help. In the actual DS i have 7 entities and hence selecting the top 15 from each is a lengthy process.
I am unable to work on iterative method.
Hi,
This is kind of long but basically it is repeated code. Someone might have more compact code.
data abc;
input
Location $ Name $ Bank $ Amount ;
datalines;
DEL A0BC PNB 5000
DEL A1BC SBI 6000
DEL A2BC PNB 4000
DEL A3BC SBI 2000
DEL A4BC PNB 6000
DEL A5BC SBI 7000
MUM A4BC SBI 5000
MUM A5BC SBI 7000
MUM A6BC PNB 8000
MUM A7BC PNB 9000
MUM A8BC SBI 8000
MUM A9BC PNB 5000
BAN A8BC SBI 4000
BAN A9BC PNB 3000
BAN A10BC PNB 7000
BAN A11BC SBI 6000
BAN A12BC PNB 5000
BAN A13BC SBI 8000
quit;
/* this brings first max row*/
proc sql;
create table firstmaxrowtable as
select location, bank,
max(case when bank ='PNB' or bank = 'SBI' then amount else 0 end ) as amount from abc
group by 1, 2;
quit;
/*transpose first max row data*/
proc sql;
create table firstmaxrowtransposetab as
select location,
max(case when bank ='PNB' then amount else 0 end ) as PNB,
max(case when bank ='SBI' then amount else 0 end ) as SBI from firstmaxrowtable
group by 1;
quit;
/*Second max row*/
proc sql;
create table secondmaxrowtable as
select a.location, a.bank,
max(case when a.bank ='PNB' or a.bank = 'SBI' then a.amount else 0 end ) as amount
from abc a
inner join
firstmaxrowtable b
on a.bank = b.bank
and a.location= b.location
and a.amount<>b.amount
group by a.location, a.bank;
quit;
/*transpose second max row data*/
proc sql;
create table secondmaxrowtransposetab as
select location,
max(case when bank ='PNB' then amount else 0 end ) as PNB,
max(case when bank ='SBI' then amount else 0 end ) as SBI from secondmaxrowtable
group by 1;
quit;
/*appending first and second max row tables and percentage calculation */
proc sql;
create table finaltable as
select a.*,
PNB/(PNB+SBI) as PercentPNB label='%PNB' format =percent10.,
SBI/(PNB+SBI) as PercentSBI label='%SBI' format = percent10.
from firstmaxrowtransposetab a
union all
select b.*,
PNB/(PNB+SBI) as PercentPNB label='%PNB' format =percent10.,
SBI/(PNB+SBI) as PercentSBI label='%SBI' format = percent10.
from secondmaxrowtransposetab b
order by location, PNB, SBI;
quit;
proc sql;
select * from finaltable;
quit;
Are you ready for the spotlight? We're accepting content ideas for SAS Innovate 2025 to be held May 6-9 in Orlando, FL. The call is open until September 25. Read more here about why you should contribute and what is in it for you!
Learn how use the CAT functions in SAS to join values from multiple variables into a single value.
Find more tutorials on the SAS Users YouTube channel.