DATA Step, Macro, Functions and more

Using PROC SQL to summarize the data

Reply
New Contributor
Posts: 3

Using PROC SQL to summarize the data

[ Edited ]

I have a dataset which needs to be summarized using PROC SQL.

Below is the sample data:

  

 

LocationNameBankAmount
DELA0BCPNB5000
DELA1BCSBI6000
DELA2BCPNB4000
DELA3BCSBI2000
DELA4BCPNB6000
DELA5BCSBI7000
MUMA4BCSBI5000
MUMA5BCSBI7000
MUMA6BCPNB8000
MUMA7BCPNB9000
MUMA8BCSBI8000
MUMA9BCPNB5000
BANA8BCSBI4000
BANA9BCPNB3000
BANA10BCPNB7000
BANA11BCSBI6000
BANA12BCPNB5000
BANA13BCSBI8000

 

 How can i summarize it to get the top two clients identified by location based on Total volume and bank share?

Kindly guide me to get my output as below:

 

LocationPNBSBITotal%PNB%SBI
DEL600070001300046%54%
DEL500060001100045%55%
MUM900080001700053%47%
MUM800070001500053%47%
BAN700060001300054%46%
BAN500050001000050%50%

  

Update: I have attached DS with the desired out sheet.

Airlines to be clubbed in 2 groups.

Group1. Jet Airways and Air India

Group2. Others

To identify top 15 clients location wise based upon the total volume and airline group share.

 

PROC Star
Posts: 307

Re: Using PROC SQL to summarize the data

Posted in reply to Sahil_Nanda

Is it necessary to do this in SQL?

Trusted Advisor
Posts: 1,918

Re: Using PROC SQL to summarize the data

[ Edited ]
Posted in reply to Sahil_Nanda

Hello, @Sahil_Nanda. I am agreeing with @collinelliot, this is a difficult problem in SQL, but not too difficult in other PROCs.

 

By the way, you don't explain how your original 6 rows of DEL wind up being summarized into 2 rows of DEL in the output table, I doubt anyone will give attempt to explain any code to solve the problem until you explain that.

New Contributor
Posts: 3

Re: Using PROC SQL to summarize the data

[ Edited ]
Posted in reply to PaigeMiller

I forgot to mention rank. The result which i need is based on total volume. My DS consists of 7 entities(2 banks mentioned here) and i have to club them in two groups. Top 15 clients are required for each airline (bank here) location wise. Ok i have attached DS for your reference. Sheet 2 contains desired output.

RankLocationPNBSBITotal Volume%PNB%SBI
1DEL600070001300046%54%
2DEL500060001100045%55%
1MUM900080001700053%47%
2MUM800070001500053%47%
1BAN700060001300054%46%
2BAN500050001000050%50%

Thanks for your help.

PROC Star
Posts: 307

Re: Using PROC SQL to summarize the data

Posted in reply to Sahil_Nanda

 

Below are two possibilities. One uses more SQL, since that's what you asked for. The other uses transpose and a data step. I did not  identify the top two in SQL, which might be possible using 'monotonic' after getting the right sort order, but doing this entirely in SQL feels forced and unnecessary. I also did not replicate your final sort order, but you could do that in SQL with a boolean condition. I'll definitely be curious to see other solutions.

 

data have;
    input Location $ Name $ Bank $ Amount;
datalines;
DEL A0BC PNB 5000
DEL A1BC SBI 6000
DEL A2BC PNB 4000
DEL A3BC SBI 2000
DEL A4BC PNB 6000
DEL A5BC SBI 7000
MUM A4BC SBI 5000
MUM A5BC SBI 7000
MUM A6BC PNB 8000
MUM A7BC PNB 9000
MUM A8BC SBI 8000
MUM A9BC PNB 5000
BAN A8BC SBI 4000
BAN A9BC PNB 3000
BAN A10BC PNB 7000
BAN A11BC SBI 6000
BAN A12BC PNB 5000
BAN A13BC SBI 8000
;

proc sort data = have;
    by location bank descending amount ;
run;

data top2;
    set have;
    by location bank descending amount;
    if first.bank then rank = 1;
    else rank + 1;
    if rank < 3 then output;
run;

proc sql;
    CREATE TABLE want AS
    SELECT location, rank,
           sum(amount * (bank = 'PNB')) AS pnb,
           sum(amount * (bank = 'SBI')) AS sbi,
           CALCULATED pnb / sum(amount) AS pct_pnb format percent8.,
           CALCULATED sbi / sum(amount) AS pct_sbi format percent8.
    FROM top2
    GROUP BY location, rank;
quit;

proc sort data = top2;
    by location rank;
run;

proc transpose data = top2 out = top2trans;
    by location rank;
    id bank;
    var amount;
run;

data want_alt;
    set top2trans;
    pct_pnb = pnb / sum(pnb, sbi);
    pct_sbi = sbi / sum(pnb, sbi);
    format pct_pnb pct_sbi percent8.;
run;

New Contributor
Posts: 3

Re: Using PROC SQL to summarize the data

Posted in reply to collinelliot

Thanks for your early help. In the actual DS i have 7 entities and hence selecting the top 15 from each is a lengthy process. 

I am unable to work on iterative method.

PROC Star
Posts: 325

Re: Using PROC SQL to summarize the data

Posted in reply to Sahil_Nanda

Hi,

 

This is kind of long but basically it is repeated code. Someone might have more compact code.

 

data abc;

input

Location $ Name $ Bank $ Amount ;

datalines;

DEL A0BC PNB 5000

DEL A1BC SBI 6000

DEL A2BC PNB 4000

DEL A3BC SBI 2000

DEL A4BC PNB 6000

DEL A5BC SBI 7000

MUM A4BC SBI 5000

MUM A5BC SBI 7000

MUM A6BC PNB 8000

MUM A7BC PNB 9000

MUM A8BC SBI 8000

MUM A9BC PNB 5000

BAN A8BC SBI 4000

BAN A9BC PNB 3000

BAN A10BC PNB 7000

BAN A11BC SBI 6000

BAN A12BC PNB 5000

BAN A13BC SBI 8000

quit;

 

/* this brings first max row*/

proc sql;

create table firstmaxrowtable as

select location, bank,

max(case when bank ='PNB' or bank = 'SBI' then amount else 0 end ) as amount from abc

group by 1, 2;

quit;

/*transpose first max row data*/

proc sql;

create table firstmaxrowtransposetab as

select location,

max(case when bank ='PNB' then amount else 0 end ) as PNB,

max(case when bank ='SBI' then amount else 0 end ) as SBI from firstmaxrowtable

group by 1;

quit;

/*Second max row*/

proc sql;

create table secondmaxrowtable as

select a.location, a.bank,

max(case when a.bank ='PNB' or a.bank = 'SBI' then a.amount else 0 end ) as amount

from abc a

inner join

firstmaxrowtable b

on a.bank = b.bank

and a.location= b.location

and a.amount<>b.amount

group by a.location, a.bank;

quit;

/*transpose second max row data*/

proc sql;

create table secondmaxrowtransposetab as

select location,

max(case when bank ='PNB' then amount else 0 end ) as PNB,

max(case when bank ='SBI' then amount else 0 end ) as SBI from secondmaxrowtable

group by 1;

quit;

/*appending first and second max row tables and percentage calculation */

proc sql;

create table finaltable as

select a.*,

PNB/(PNB+SBI) as PercentPNB label='%PNB' format =percent10.,

SBI/(PNB+SBI) as PercentSBI label='%SBI' format = percent10.

from firstmaxrowtransposetab a

union all

select b.*,

PNB/(PNB+SBI) as PercentPNB label='%PNB' format =percent10.,

SBI/(PNB+SBI) as PercentSBI label='%SBI' format = percent10.

from secondmaxrowtransposetab b

order by location, PNB, SBI;

quit;

proc sql;

select * from finaltable;

quit;

Ask a Question
Discussion stats
  • 6 replies
  • 184 views
  • 1 like
  • 4 in conversation