When SAS was first released, the name stood for Statistical Analysis System, because it offers a lot of statistical procedures. They are still there. Instead of hand-coding DATA steps and SQL steps, you should take advantage of them, especially for generating frequency proportions.
I looked up the definition of the Population Stability Index, which, according to http://www.stat.wmich.edu/naranjo/PSI.pdf , is defined as
PSI = sum_{i=1}^{B} (phat{i} - qhat{i}) * (ln(phat{i}) - ln(qhat{i}))
where each of the two populations has a variable with the same B non-empty bins, and phat{i} and qhat{i} are the estimated proportions in bin number i (p for population 1 and q for population 2).
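As a quick sanity check of that definition, here is a worked example with made-up numbers (B = 2; the proportions are chosen only for illustration and are not taken from sashelp.cars):

```latex
\begin{align*}
\hat{p} &= (0.5,\ 0.5), \qquad \hat{q} = (0.4,\ 0.6) \\
\mathrm{PSI} &= \sum_{i=1}^{B} (\hat{p}_i - \hat{q}_i)\,(\ln \hat{p}_i - \ln \hat{q}_i) \\
&= (0.1)\,\ln\frac{0.5}{0.4} + (-0.1)\,\ln\frac{0.5}{0.6} \\
&\approx (0.1)(0.2231) + (-0.1)(-0.1823) \\
&\approx 0.0405
\end{align*}
```

Both terms come out positive, because the two factors in each term always share the same sign, so every bin contributes a non-negative component to the total.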
Generating the phat{i} and qhat{i} is trivial with proc freq, which directly generates percentages, not only for a report but also for an output dataset. You can then follow with a merge (by bin category) to compute each of the B components of the PSI, followed by a total of those components. I've made an example below using the variable TYPE from sashelp.cars as motor_de_decision:
/* Split sashelp.cars into a base sample and a comparison sample */
data bbdd_modelo bbdd_oot;
  set sashelp.cars (keep=type rename=(type=motor_de_decision));
  /* TYPE has values like "SUV", "Sedan", "Truck", etc. */
  if 1<=mod(_n_,10)<=5 then output bbdd_modelo;
  else output bbdd_oot;
run;

/* Counts and percentages per bin for the base population */
proc freq data=bbdd_modelo noprint;
  tables motor_de_decision
    / out=base_pop (rename=(count=base_count percent=base_pct));
run;

/* Counts and percentages per bin for the comparison population */
proc freq data=bbdd_oot noprint;
  tables motor_de_decision
    / out=actual_pop (rename=(count=actual_count percent=actual_pct));
run;

data psi_results (label='One obs per bin, an extra obs for all bins');
  merge base_pop actual_pop end=end_of_merge;
  by motor_de_decision;

  /* PROC FREQ reports percentages, so convert them to proportions */
  base_proportion=base_pct/100;
  actual_proportion=actual_pct/100;

  /* PSI component for this bin */
  psi_index = (base_proportion-actual_proportion)
            * (log(base_proportion)-log(actual_proportion)) ;

  /* Temporary arrays are retained across observations, so they hold running totals */
  array total_count {2} _temporary_;
  total_count{1} + base_count;
  total_count{2} + actual_count;
  array total_psi {1} _temporary_;
  total_psi{1} + psi_index;
  output;

  /* Now output a final, extra obs with the overall PSI index value */
  if end_of_merge then do;
    call missing(of _all_);
    psi_index=total_psi{1};
    base_count=total_count{1};
    actual_count=total_count{2};
    output;
  end;
run;
Dataset PSI_RESULTS will have B+1 observations: one for each bin, followed by an observation containing the psi_index summed over all bins.
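The definition assumes both populations have the same B non-empty bins. If a bin value ended up in only one of the two PROC FREQ outputs, the merge would leave one of the proportions missing for that bin. A minimal sketch of a check for that, assuming the BASE_POP and ACTUAL_POP datasets from the example (the names psi_bin_check, in_base, in_actual and bin_status are my own, purely for illustration):

```sas
/* Sketch: flag bins that appear in only one of the two populations. */
data psi_bin_check;
  merge base_pop   (in=in_base)
        actual_pop (in=in_actual);
  by motor_de_decision;
  length bin_status $12;
  if in_base and in_actual then bin_status = 'both';
  else if in_base then bin_status = 'base only';
  else bin_status = 'actual only';
run;

/* Any value other than 'both' means the same-bins assumption is violated. */
proc freq data=psi_bin_check;
  tables bin_status;
run;
```

If every bin shows up as 'both', the PSI calculation above applies as written. Otherwise you would need to decide how to treat the unmatched bins (for example, by combining sparse categories) before computing the index.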