I have a big dataset A. The main variables in Data A are fico_score, account_number, charge_off.
I need to analysis these three variables. In every 5 score points range from min to max, to calculate the frequency of account_number, total balance of charge_off and also to show the range in the output.
The Dataset A I have like this kind:
Fico_Score Account_number Charge_off
450 123 2.5
450 456 2.1
451 213 .
The result I expect to get likes this:
Score_band Freq_acct Total_CO
450-455 45 21.1
456- 460 23 31.1
I tried these codes, but it didn’t work.
retain Score_band Freq_amt Total_co;
do i = 450 to 830 by 5;
proc means data = a noprint;
var account_number charge_off;
where fico_score >= i and Fico_score < i+5;
output n = Freq_acct sum = Total_Co;
Sort your input data, then in a DATA step, assign a new SAS variable with an incrementing value to represent each unique "group-range", and each time the current observation has exceeded your previous group-range, increment the new variable. Finally use the new variable you created with a PROC MEANS BY group processing.
According to your help, I tried these codes and finally it out.
data new (drop=i diff);
retain i 400;
diff=fico_score - i;
if diff le 5 then band= i;
else do ;
if ( diff gt 5 and diff le 10) then i=i +5 ;
* retain band;
Scott's approach is much more elegant and efficient than %DO. His approach requires roughly three passes through the data (Data step to group fico, a sort, and a means). the %DO approach requires about 80 passes through the data (MEANS is passes the data once for each level of the index variable).