Using PROC SQL to summarize the data

Sahil_Nanda · Posted 04-25-2017 09:51 AM

I have a dataset which needs to be summarized using PROC SQL.

Below is the sample data:

Location	Name	Bank	Amount
DEL	A0BC	PNB	5000
DEL	A1BC	SBI	6000
DEL	A2BC	PNB	4000
DEL	A3BC	SBI	2000
DEL	A4BC	PNB	6000
DEL	A5BC	SBI	7000
MUM	A4BC	SBI	5000
MUM	A5BC	SBI	7000
MUM	A6BC	PNB	8000
MUM	A7BC	PNB	9000
MUM	A8BC	SBI	8000
MUM	A9BC	PNB	5000
BAN	A8BC	SBI	4000
BAN	A9BC	PNB	3000
BAN	A10BC	PNB	7000
BAN	A11BC	SBI	6000
BAN	A12BC	PNB	5000
BAN	A13BC	SBI	8000

How can i summarize it to get the top two clients identified by location based on Total volume and bank share?

Kindly guide me to get my output as below:

Location	PNB	SBI	Total	%PNB	%SBI
DEL	6000	7000	13000	46%	54%
DEL	5000	6000	11000	45%	55%
MUM	9000	8000	17000	53%	47%
MUM	8000	7000	15000	53%	47%
BAN	7000	6000	13000	54%	46%
BAN	5000	5000	10000	50%	50%

Update: I have attached DS with the desired out sheet.

Airlines to be clubbed in 2 groups.

Group1. Jet Airways and Air India

Group2. Others

To identify top 15 clients location wise based upon the total volume and airline group share.

collinelliot · Posted 04-25-2017 10:53 AM

Is it necessary to do this in SQL?

PaigeMiller · Posted 04-25-2017 11:28 AM

Hello, @Sahil_Nanda. I am agreeing with @collinelliot, this is a difficult problem in SQL, but not too difficult in other PROCs.

By the way, you don't explain how your original 6 rows of DEL wind up being summarized into 2 rows of DEL in the output table, I doubt anyone will give attempt to explain any code to solve the problem until you explain that.

--
Paige Miller

Sahil_Nanda · Posted 04-25-2017 01:22 PM

I forgot to mention rank. The result which i need is based on total volume. My DS consists of 7 entities(2 banks mentioned here) and i have to club them in two groups. Top 15 clients are required for each airline (bank here) location wise. Ok i have attached DS for your reference. Sheet 2 contains desired output.

Rank	Location	PNB	SBI	Total Volume	%PNB	%SBI
1	DEL	6000	7000	13000	46%	54%
2	DEL	5000	6000	11000	45%	55%
1	MUM	9000	8000	17000	53%	47%
2	MUM	8000	7000	15000	53%	47%
1	BAN	7000	6000	13000	54%	46%
2	BAN	5000	5000	10000	50%	50%

Thanks for your help.

collinelliot · Posted 04-25-2017 11:44 AM

Below are two possibilities. One uses more SQL, since that's what you asked for. The other uses transpose and a data step. I did not identify the top two in SQL, which might be possible using 'monotonic' after getting the right sort order, but doing this entirely in SQL feels forced and unnecessary. I also did not replicate your final sort order, but you could do that in SQL with a boolean condition. I'll definitely be curious to see other solutions.

data have;
    input Location $ Name $ Bank $ Amount;
datalines;
DEL A0BC PNB 5000
DEL A1BC SBI 6000
DEL A2BC PNB 4000
DEL A3BC SBI 2000
DEL A4BC PNB 6000
DEL A5BC SBI 7000
MUM A4BC SBI 5000
MUM A5BC SBI 7000
MUM A6BC PNB 8000
MUM A7BC PNB 9000
MUM A8BC SBI 8000
MUM A9BC PNB 5000
BAN A8BC SBI 4000
BAN A9BC PNB 3000
BAN A10BC PNB 7000
BAN A11BC SBI 6000
BAN A12BC PNB 5000
BAN A13BC SBI 8000
;

proc sort data = have;
    by location bank descending amount ;
run;

data top2;
    set have;
    by location bank descending amount;
    if first.bank then rank = 1;
    else rank + 1;
    if rank < 3 then output;
run;

proc sql;
    CREATE TABLE want AS
    SELECT location, rank,
           sum(amount * (bank = 'PNB')) AS pnb,
           sum(amount * (bank = 'SBI')) AS sbi,
           CALCULATED pnb / sum(amount) AS pct_pnb format percent8.,
           CALCULATED sbi / sum(amount) AS pct_sbi format percent8.
    FROM top2
    GROUP BY location, rank;
quit;

proc sort data = top2;
    by location rank;
run;

proc transpose data = top2 out = top2trans;
    by location rank;
    id bank;
    var amount;
run;

data want_alt;
    set top2trans;
    pct_pnb = pnb / sum(pnb, sbi);
    pct_sbi = sbi / sum(pnb, sbi);
    format pct_pnb pct_sbi percent8.;
run;

Sahil_Nanda · Posted 04-25-2017 01:33 PM

Thanks for your early help. In the actual DS i have 7 entities and hence selecting the top 15 from each is a lengthy process.

I am unable to work on iterative method.

kiranv_ · Posted 04-25-2017 11:46 AM

Hi,

This is kind of long but basically it is repeated code. Someone might have more compact code.

data abc;

input

Location $ Name $ Bank $ Amount ;

datalines;

DEL A0BC PNB 5000

DEL A1BC SBI 6000

DEL A2BC PNB 4000

DEL A3BC SBI 2000

DEL A4BC PNB 6000

DEL A5BC SBI 7000

MUM A4BC SBI 5000

MUM A5BC SBI 7000

MUM A6BC PNB 8000

MUM A7BC PNB 9000

MUM A8BC SBI 8000

MUM A9BC PNB 5000

BAN A8BC SBI 4000

BAN A9BC PNB 3000

BAN A10BC PNB 7000

BAN A11BC SBI 6000

BAN A12BC PNB 5000

BAN A13BC SBI 8000

quit;

/* this brings first max row*/

proc sql;

create table firstmaxrowtable as

select location, bank,

max(case when bank ='PNB' or bank = 'SBI' then amount else 0 end ) as amount from abc

group by 1, 2;

quit;

/*transpose first max row data*/

proc sql;

create table firstmaxrowtransposetab as

select location,

max(case when bank ='PNB' then amount else 0 end ) as PNB,

max(case when bank ='SBI' then amount else 0 end ) as SBI from firstmaxrowtable

group by 1;

quit;

/*Second max row*/

proc sql;

create table secondmaxrowtable as

select a.location, a.bank,

max(case when a.bank ='PNB' or a.bank = 'SBI' then a.amount else 0 end ) as amount

from abc a

inner join

firstmaxrowtable b

on a.bank = b.bank

and a.location= b.location

and a.amount<>b.amount

group by a.location, a.bank;

quit;

/*transpose second max row data*/

proc sql;

create table secondmaxrowtransposetab as

select location,

max(case when bank ='PNB' then amount else 0 end ) as PNB,

max(case when bank ='SBI' then amount else 0 end ) as SBI from secondmaxrowtable

group by 1;

quit;

/*appending first and second max row tables and percentage calculation */

proc sql;

create table finaltable as

select a.*,

PNB/(PNB+SBI) as PercentPNB label='%PNB' format =percent10.,

SBI/(PNB+SBI) as PercentSBI label='%SBI' format = percent10.

from firstmaxrowtransposetab a

union all

select b.*,

PNB/(PNB+SBI) as PercentPNB label='%PNB' format =percent10.,

SBI/(PNB+SBI) as PercentSBI label='%SBI' format = percent10.

from secondmaxrowtransposetab b

order by location, PNB, SBI;

quit;

proc sql;

select * from finaltable;

quit;

Using PROC SQL to summarize the data

Re: Using PROC SQL to summarize the data

Re: Using PROC SQL to summarize the data

Re: Using PROC SQL to summarize the data

Re: Using PROC SQL to summarize the data

Re: Using PROC SQL to summarize the data

Re: Using PROC SQL to summarize the data

SAS Innovate 2025: Call for Content

Classroom Training Available!