BookmarkSubscribeRSS Feed
Ronein
Meteorite | Level 14

Hello

I work in a bank as a credit risk analyst.

I want to talk about  a monthly  loans data set .

In this table there are new loans that customers takes from the bank.

As you can see in the data it might happen that a customer takes more than one loan.

The target is to calculate 5 aggraegative fields:

"No_Loans"   is the number of loans (number of rows in the table)

"No_Customers"  is the number of customers who took a loan

"Sum_Loan"  is the sum of loans (in us dollars)

"Weighted_interest"  is the weighted interest average (Weighed by sum of loans)

"Pct_Sum_Loan"  is pct of sum of loans in each category

I know to calculate these fields in a long process.

My question is If someone have a suggestion how to calculate it in a shorter code?

 

Data tbl1;
input ID 1
Loan_ID 3
Sum_Loan 5-8
Interest
date_Loan :ddmmyy10.
date_Finish :ddmmyy10.
Loan_Model
grade;
format date_Loan date_Finish date9.;
Cards;
1 1 100 2.4 01/06/2018 01/06/2023 3312 4
1 2 200 2.6 02/06/2018 02/06/2021 3312 5
1 3 300 2.3 05/06/2018 05/06/2025 3313 5
2 4 400 3.1 05/06/2018 05/06/2020 3312 6
2 5 500 2.9 05/06/2018 05/06/2021 3313 5
;
run;
PROC SQL;
create table Output1 as
select grade,
count(*) as No_Loans,
count(distinct ID) as No_Customers,
sum(Sum_Loan) as Sum_Loan
from tbl1
group by grade
;
QUIT;
/*Calculate weighted average of interest by sum Loans*/
PROC SQL;
create table Output2 as
select grade,
sum(Interest*Sum_Loan)/sum(Sum_Loan) as Weighted_interest
from tbl1
group by grade
;
QUIT;
/*Calculate Percent Tamhil*/
PROC SQL;
create table Output3 as
select grade,
Sum_Loan/sum(Sum_Loan) as Pct_Sum_Loan format=percent9.1
from Output1
;
QUIT;
PROC SQL;
create table Output4 as
select a.*,b.Weighted_interest,c.Pct_Sum_Loan
from Output1 as a
left join Output2 as b
on a.grade=b.grade
left join Output3 as c
on a.grade=c.grade
;
QUIT;
PROC SQL;
create table Output5 as
select sum(No_Loans) as No_Loans,
sum(No_Customers) as No_Customers,
sum(Sum_Loan) as Sum_Loan,
sum(Weighted_interest*Pct_Sum_Loan) as Weighted_interest,
sum(Pct_Sum_Loan) as Pct_Sum_Loan format=percent9.1
from Output4
;
QUIT;
Data FinalOutput(drop=grade rename=(grade_Char=grade));
Retain grade_Char;
Set Output4 Output5;
length grade_Char $3.;
grade_Char=put(grade,2.);
If grade=. then grade_Char='All';
Run; 

 

 

 

18 REPLIES 18
PaigeMiller
Diamond | Level 26

All of these can be computed by PROC SUMMARY (with the WEIGHT statement if needed).

 

It is my experience that PROC SUMMARY will do it faster than PROC SQL, and I believe that all of your computations can be done in a single PROC SUMMARY call (although I didn't specifically verify this).

 

Furthermore, there are times when doing things in SQL produces the WRONG results, specifically, you have

 

sum(Interest*Sum_Loan)/sum(Sum_Loan) as Weighted_interest

If the value of interest is missing for an observation, then the above produces the wrong answer, where a different number of observations is used in the numerator than the denominator (which will not happen in PROC SUMMARY, which gives the right answer in this situation)

--
Paige Miller
Ronein
Meteorite | Level 14

Thanks for your reply.

I know Proc summary but it didn't work for me in order to get all calculated fields.

May you please send a code and show me that all the required fields are calculated?

I think that it is better to see a code that is working then say that something is working without show it

Joe

 

PaigeMiller
Diamond | Level 26

EXAMPLE:

 

proc summary data=tbl1;
    class grade;
    var id sum_loan;
    var interest/weight=sum_loan;
    output out=sums n(id)=no_loans sum(sum_loan)=sum_loan mean(interest)=weighted_interest;
run;
--
Paige Miller
PGStats
Opal | Level 21

Nice example @PaigeMiller! But counting IDs may give the wrong result. One should count the non missing sum_loan.

 

n(sum_loan)=no_loans

PG
Ronein
Meteorite | Level 14

Hello

You also didn't calculate "Pct_Sum_Loan".

And also didn't calculate "No_Customers"

As you can see you are not able to calculate all the required fields in one step.

So, the way that I calculated by using proc sql's was not less efficient 

 

FreelanceReinh
Jade | Level 19

@Ronein wrote (boldface added):

As you can see you are not able to calculate all the required fields in one step.

So, the way that I calculated by using proc sql's was not less efficient  


If you insist on doing all calculations in one step (which might indeed improve efficiency), you can write a single data step. Obviously, by going through all observations of TBL1 you can gather the information necessary to calculate the aggregate values. Having tools such as DOW loops (see many conference papers), arrays and hash objects at your disposal, it is possible to compute and buffer the final variable values until they are written to the output dataset.

 

For this approach it would be helpful, but not mandatory, to have TBL1 sorted or indexed by GRADE beforehand.

 

Are you familiar with the data step techniques mentioned above? If so, just give it a try. Feel free to come back with specific questions if you get stuck.

 

PaigeMiller
Diamond | Level 26

The problem with doing it in a data step is error checking. You have to make the code smart enough to handle missings properly (which the SQL shown does not do), and the code must be smart enough to handle different groups, and there may be other issues as well. SAS has already written code to do these calculations properly, with all the necessary error checking and handling of groups, and that code is PROC MEANS or PROC SUMMARY. I really don't see a need to write your own code to compute sums and means.

--
Paige Miller
FreelanceReinh
Jade | Level 19

@PaigeMiller,

 

Sure, I fully agree. I wouldn't have preferred that more complicated data step solution over the obvious and simple combination of PROC SUMMARY and PROC SQL results either. I just wanted to point out that it could be done if one insisted on putting it all into a single step. In terms of performance, the multi-threading capabilities of PROC SUMMARY and PROC SQL can be an advantage over the data step (if supported by hardware).

PaigeMiller
Diamond | Level 26

As far as I can tell, the original question did not insist on a single process step; and I certainly don't insist on that. It's hard for me to imagine a reason why anyone would insist on this, unless it was a homework assignment, which is not the case here.

--
Paige Miller
Ronein
Meteorite | Level 14
It is great but in your code you didn't calculate number of distinct customers in each category
PaigeMiller
Diamond | Level 26

@Ronein wrote:
It is great but in your code you didn't calculate number of distinct customers in each category

Yes, I leave that as an exercise for you, after all I did say it was an EXAMPLE.(Hint: run PROC SUMMARY twice)

--
Paige Miller
Ronein
Meteorite | Level 14

Hello

As you can see the code that you wrote didn't have some elements:

You also didn't calculate "Pct_Sum_Loan".

You didn't calculate "No_Customers".

In the total row instead of null value we need to have the word "All".

 

 

PaigeMiller
Diamond | Level 26

@Ronein wrote:

Hello

As you can see the code that you wrote didn't have some elements:

You also didn't calculate "Pct_Sum_Loan".

You didn't calculate "No_Customers".

In the total row instead of null value we need to have the word "All".

 

 


As you can see, I said I was giving you an EXAMPLE and never intended to do all your work for you. Use this as a learning opportunity and see if you can figure it out yourself.

--
Paige Miller
Ronein
Meteorite | Level 14
The task is to see if there is a more usefull way to create exactly same output as I created .
Until now nobody sent a usefull code that calculate all required fields in less steps

sas-innovate-2024.png

Join us for SAS Innovate April 16-19 at the Aria in Las Vegas. Bring the team and save big with our group pricing for a limited time only.

Pre-conference courses and tutorials are filling up fast and are always a sellout. Register today to reserve your seat.

 

Register now!

How to Concatenate Values

Learn how use the CAT functions in SAS to join values from multiple variables into a single value.

Find more tutorials on the SAS Users YouTube channel.

Click image to register for webinarClick image to register for webinar

Classroom Training Available!

Select SAS Training centers are offering in-person courses. View upcoming courses for:

View all other training opportunities.

Discussion stats
  • 18 replies
  • 958 views
  • 6 likes
  • 5 in conversation