## Facing Issue while calculating the Loan Default Percentage from the available data

Solved
Occasional Contributor
Posts: 6

# Facing Issue while calculating the Loan Default Percentage from the available data

Hi There,

I'm trying to calculate the default value from the available data. I have merged the data by using following code

Data Loan_1;
Set 36_Months 60 Months;
Run;

Then I have sorted the data using proc sort
After that i have created a variable named Loan_Status2 in which I have created to values 1st is Current and 2nd is default using following code

Data Loan_11;
Set Loan_1;
If loan_Status = 'Late (31-120 Days)' or Loan_Status = 'Default' then Loan_Status2 = 1;
Else If loan_Status = 'Current' or loan_Status = 'Late (16-30 Days)' or loan_Status = 'In Grace Period' or loan_Status = 'Charged Off' or loan_Status = 'Fully Paid'  then loan_Status2 = 0;

After this I'm trying to calculate the default % by using following code

Data Loan_111;
Set Loan_11;
DefaultLoan  = (Sum (1)/Sum(Loan_Status2));
Format DefaultLoan \$Char20.;
Proc Print;
Run;

But getting wrong output

please do the needful.

Regards,
Roshan Malewar

Accepted Solutions
Solution
‎02-16-2018 06:18 AM
Super User
Posts: 6,542

## Re: Facing Issue while calculating the Loan Default Percentage from the available data

First, let's assume that you have working code that accurately creates the data set LOAN_11.  There are several areas where the syntax is questionable, so it may have been copied incorrectly in the posted code.

The SUM function does not summarize across observations.  It sums up within a single observation only.  To get the rate that you are asking for, I would guess you are looking for:

proc means data=loan_11 mean;

var loan_status2;

run;

The mean value for LOAN_STATUS2 will be the percent of loans that are in default.

As a general rule, think about using a procedure, not a DATA step, to summarize across observations.

All Replies
Solution
‎02-16-2018 06:18 AM
Super User
Posts: 6,542

## Re: Facing Issue while calculating the Loan Default Percentage from the available data

First, let's assume that you have working code that accurately creates the data set LOAN_11.  There are several areas where the syntax is questionable, so it may have been copied incorrectly in the posted code.

The SUM function does not summarize across observations.  It sums up within a single observation only.  To get the rate that you are asking for, I would guess you are looking for:

proc means data=loan_11 mean;

var loan_status2;

run;

The mean value for LOAN_STATUS2 will be the percent of loans that are in default.

As a general rule, think about using a procedure, not a DATA step, to summarize across observations.

Occasional Contributor
Posts: 6

## Facing Issue while calculating the Loan Default Percentage from the available data

Posted in reply to Astounding

Thank you very much Super User. It worked fine.

Super User
Posts: 13,064

## Re: Facing Issue while calculating the Loan Default Percentage from the available data

If you have listed all of the values for loan_status in your code you might find this interesting.

```proc format;
value \$default
'Late (31-120 Days)', 'Default'  ='In Default'
'Current', 'Late (16-30 Days)', 'In Grace Period',
'Charged Off', 'Fully Paid'  = 'Not Default'
;
run;

proc freq data=Loan_1;
tables loan_status;
format loan_status \$default.;
run;
```
☑ This topic is solved.

Need further help from the community? Please ask a new question.

Discussion stats
• 3 replies
• 104 views
• 1 like
• 3 in conversation