BookmarkSubscribeRSS Feed
Sahil_Nanda
Calcite | Level 5

I have a dataset which needs to be summarized using PROC SQL.

Below is the sample data:

  

 

LocationNameBankAmount
DELA0BCPNB5000
DELA1BCSBI6000
DELA2BCPNB4000
DELA3BCSBI2000
DELA4BCPNB6000
DELA5BCSBI7000
MUMA4BCSBI5000
MUMA5BCSBI7000
MUMA6BCPNB8000
MUMA7BCPNB9000
MUMA8BCSBI8000
MUMA9BCPNB5000
BANA8BCSBI4000
BANA9BCPNB3000
BANA10BCPNB7000
BANA11BCSBI6000
BANA12BCPNB5000
BANA13BCSBI8000

 

 How can i summarize it to get the top two clients identified by location based on Total volume and bank share?

Kindly guide me to get my output as below:

 

LocationPNBSBITotal%PNB%SBI
DEL600070001300046%54%
DEL500060001100045%55%
MUM900080001700053%47%
MUM800070001500053%47%
BAN700060001300054%46%
BAN500050001000050%50%

  

Update: I have attached DS with the desired out sheet.

Airlines to be clubbed in 2 groups.

Group1. Jet Airways and Air India

Group2. Others

To identify top 15 clients location wise based upon the total volume and airline group share.

 

6 REPLIES 6
collinelliot
Barite | Level 11

Is it necessary to do this in SQL?

PaigeMiller
Diamond | Level 26

Hello, @Sahil_Nanda. I am agreeing with @collinelliot, this is a difficult problem in SQL, but not too difficult in other PROCs.

 

By the way, you don't explain how your original 6 rows of DEL wind up being summarized into 2 rows of DEL in the output table, I doubt anyone will give attempt to explain any code to solve the problem until you explain that.

--
Paige Miller
Sahil_Nanda
Calcite | Level 5

I forgot to mention rank. The result which i need is based on total volume. My DS consists of 7 entities(2 banks mentioned here) and i have to club them in two groups. Top 15 clients are required for each airline (bank here) location wise. Ok i have attached DS for your reference. Sheet 2 contains desired output.

RankLocationPNBSBITotal Volume%PNB%SBI
1DEL600070001300046%54%
2DEL500060001100045%55%
1MUM900080001700053%47%
2MUM800070001500053%47%
1BAN700060001300054%46%
2BAN500050001000050%50%

Thanks for your help.

collinelliot
Barite | Level 11

 

Below are two possibilities. One uses more SQL, since that's what you asked for. The other uses transpose and a data step. I did not  identify the top two in SQL, which might be possible using 'monotonic' after getting the right sort order, but doing this entirely in SQL feels forced and unnecessary. I also did not replicate your final sort order, but you could do that in SQL with a boolean condition. I'll definitely be curious to see other solutions.

 

data have;
    input Location $ Name $ Bank $ Amount;
datalines;
DEL A0BC PNB 5000
DEL A1BC SBI 6000
DEL A2BC PNB 4000
DEL A3BC SBI 2000
DEL A4BC PNB 6000
DEL A5BC SBI 7000
MUM A4BC SBI 5000
MUM A5BC SBI 7000
MUM A6BC PNB 8000
MUM A7BC PNB 9000
MUM A8BC SBI 8000
MUM A9BC PNB 5000
BAN A8BC SBI 4000
BAN A9BC PNB 3000
BAN A10BC PNB 7000
BAN A11BC SBI 6000
BAN A12BC PNB 5000
BAN A13BC SBI 8000
;

proc sort data = have;
    by location bank descending amount ;
run;

data top2;
    set have;
    by location bank descending amount;
    if first.bank then rank = 1;
    else rank + 1;
    if rank < 3 then output;
run;

proc sql;
    CREATE TABLE want AS
    SELECT location, rank,
           sum(amount * (bank = 'PNB')) AS pnb,
           sum(amount * (bank = 'SBI')) AS sbi,
           CALCULATED pnb / sum(amount) AS pct_pnb format percent8.,
           CALCULATED sbi / sum(amount) AS pct_sbi format percent8.
    FROM top2
    GROUP BY location, rank;
quit;

proc sort data = top2;
    by location rank;
run;

proc transpose data = top2 out = top2trans;
    by location rank;
    id bank;
    var amount;
run;

data want_alt;
    set top2trans;
    pct_pnb = pnb / sum(pnb, sbi);
    pct_sbi = sbi / sum(pnb, sbi);
    format pct_pnb pct_sbi percent8.;
run;

Sahil_Nanda
Calcite | Level 5

Thanks for your early help. In the actual DS i have 7 entities and hence selecting the top 15 from each is a lengthy process. 

I am unable to work on iterative method.

kiranv_
Rhodochrosite | Level 12

Hi,

 

This is kind of long but basically it is repeated code. Someone might have more compact code.

 

data abc;

input

Location $ Name $ Bank $ Amount ;

datalines;

DEL A0BC PNB 5000

DEL A1BC SBI 6000

DEL A2BC PNB 4000

DEL A3BC SBI 2000

DEL A4BC PNB 6000

DEL A5BC SBI 7000

MUM A4BC SBI 5000

MUM A5BC SBI 7000

MUM A6BC PNB 8000

MUM A7BC PNB 9000

MUM A8BC SBI 8000

MUM A9BC PNB 5000

BAN A8BC SBI 4000

BAN A9BC PNB 3000

BAN A10BC PNB 7000

BAN A11BC SBI 6000

BAN A12BC PNB 5000

BAN A13BC SBI 8000

quit;

 

/* this brings first max row*/

proc sql;

create table firstmaxrowtable as

select location, bank,

max(case when bank ='PNB' or bank = 'SBI' then amount else 0 end ) as amount from abc

group by 1, 2;

quit;

/*transpose first max row data*/

proc sql;

create table firstmaxrowtransposetab as

select location,

max(case when bank ='PNB' then amount else 0 end ) as PNB,

max(case when bank ='SBI' then amount else 0 end ) as SBI from firstmaxrowtable

group by 1;

quit;

/*Second max row*/

proc sql;

create table secondmaxrowtable as

select a.location, a.bank,

max(case when a.bank ='PNB' or a.bank = 'SBI' then a.amount else 0 end ) as amount

from abc a

inner join

firstmaxrowtable b

on a.bank = b.bank

and a.location= b.location

and a.amount<>b.amount

group by a.location, a.bank;

quit;

/*transpose second max row data*/

proc sql;

create table secondmaxrowtransposetab as

select location,

max(case when bank ='PNB' then amount else 0 end ) as PNB,

max(case when bank ='SBI' then amount else 0 end ) as SBI from secondmaxrowtable

group by 1;

quit;

/*appending first and second max row tables and percentage calculation */

proc sql;

create table finaltable as

select a.*,

PNB/(PNB+SBI) as PercentPNB label='%PNB' format =percent10.,

SBI/(PNB+SBI) as PercentSBI label='%SBI' format = percent10.

from firstmaxrowtransposetab a

union all

select b.*,

PNB/(PNB+SBI) as PercentPNB label='%PNB' format =percent10.,

SBI/(PNB+SBI) as PercentSBI label='%SBI' format = percent10.

from secondmaxrowtransposetab b

order by location, PNB, SBI;

quit;

proc sql;

select * from finaltable;

quit;

sas-innovate-2024.png

Join us for SAS Innovate April 16-19 at the Aria in Las Vegas. Bring the team and save big with our group pricing for a limited time only.

Pre-conference courses and tutorials are filling up fast and are always a sellout. Register today to reserve your seat.

 

Register now!

How to Concatenate Values

Learn how use the CAT functions in SAS to join values from multiple variables into a single value.

Find more tutorials on the SAS Users YouTube channel.

Click image to register for webinarClick image to register for webinar

Classroom Training Available!

Select SAS Training centers are offering in-person courses. View upcoming courses for:

View all other training opportunities.

Discussion stats
  • 6 replies
  • 1502 views
  • 1 like
  • 4 in conversation