BookmarkSubscribeRSS Feed
jorgelobin
Fluorite | Level 6

Hello everyone,

I have written some code in SAS to calculate the Population Stability Index (PSI) in order to compare the distributions of a decision engine between two datasets (a base model and a current dataset). I would like to ask if the implementation is correct and if the PSI calculation is done appropriately. Any feedback or suggestions would be greatly appreciated!

Data description:

  • I am using two datasets: BBDD_MODELO (the base model) and BBDD_OOT (the current dataset).
  • Both datasets contain the variable MOTOR_DE_DECISION, which indicates different decision categories. I aim to compare the proportions of these categories between the two datasets using the PSI.

Steps already tried:

Here is the SAS code I’ve implemented so far. The script performs the following steps:

  1. Import the two Excel datasets (BBDD_MODELO and BBDD_OOT).
  2. Calculate counts and proportions for each category of the variable MOTOR_DE_DECISION in both datasets.
  3. Calculate the PSI by comparing the proportions between the base model and the current dataset.
  4. Create a summary table with the PSI values and classify the stability result as Green (G), Amber (A), or Red (R) based on the PSI threshold.
/*Lectura del archivo BBDD_MODELO.xlsx*/
PROC IMPORT OUT=BBDD_MODELO
    DATAFILE = "C:\Users\user\Desktop\BBDD_MODELO.xlsx"
    DBMS = XLSX;
    GETNAMES = YES;
RUN;

/*Lectura del archivo BBDD_OOT.xlsx*/
PROC IMPORT OUT=BBDD_OOT
    DATAFILE = "C:\Users\user\Desktop\BBDD_OOT.xlsx"
    DBMS = XLSX;
    GETNAMES = YES;
RUN;

/*Recuentos y Proporciones de la tabla base (BBDD_MODELO)*/
/*Obtenemos las frecuencias de los distintos valores de la variable MOTOR_DE_DECISION*/
PROC SQL;
	CREATE TABLE BASE_AUX AS SELECT
		MOTOR_DE_DECISION,
		COUNT(*) AS RECUENTO_BASE
	FROM BBDD_MODELO
	GROUP BY MOTOR_DE_DECISION;
QUIT;

/*Sumamos todas las frecuencias de cada categoría para obtener el total de observaciones*/
PROC SQL NOPRINT;
	SELECT SUM(RECUENTO_BASE) INTO :TOT_RECUENTO_BASE
	FROM BASE_AUX;
QUIT;

/*Calculamos la proporción de cada categoría*/
DATA BASE;
	SET BASE_AUX;
	PROPORCION_BASE = (RECUENTO_BASE/&TOT_RECUENTO_BASE);
RUN;

/*Sumamos todas las proporciones de cada categoría. Esta suma tiene que ser igual a 1*/
PROC SQL NOPRINT;
	SELECT SUM(PROPORCION_BASE) INTO :TOT_PROPORCION_BASE
	FROM BASE;
QUIT;
/*Recuentos y Proporciones de la tabla actual (BBDD_OOT)*/
/*Obtenemos las frecuencias de los distintos valores de la variable MOTOR_DE_DECISION*/
PROC SQL;
	CREATE TABLE ACTUAL_AUX AS SELECT
		MOTOR_DE_DECISION,
		COUNT(*) AS RECUENTO_ACTUAL
	FROM BBDD_OOT
	GROUP BY MOTOR_DE_DECISION;
QUIT;

/*Sumamos todas las frecuencias de cada categoría para obtener el total de observaciones*/
PROC SQL NOPRINT;
	SELECT SUM(RECUENTO_ACTUAL) INTO :TOT_RECUENTO_ACTUAL
	FROM ACTUAL_AUX;
QUIT;

/*Calculamos la proporción de cada categoría*/
DATA ACTUAL;
	SET ACTUAL_AUX;
	PROPORCION_ACTUAL = (RECUENTO_ACTUAL/&TOT_RECUENTO_ACTUAL);
RUN;

/*Sumamos todas las proporciones de cada categoría. Esta suma tiene que ser igual a 1*/
PROC SQL NOPRINT;
	SELECT SUM(PROPORCION_ACTUAL) INTO :TOT_PROPORCION_ACTUAL
	FROM ACTUAL;
QUIT;
/*Cálculo del PSI*/
DATA RESUMEN;
	MERGE BASE ACTUAL;
	BY MOTOR_DE_DECISION;
	IF RECUENTO_BASE = . THEN RECUENTO_BASE = 0;
	IF RECUENTO_ACTUAL = . THEN RECUENTO_ACTUAL = 0;
	IF RECUENTO_BASE = 0 THEN PROPORCION_BASE = (1/&TOT_RECUENTO_BASE);
	IF RECUENTO_ACTUAL = 0 THEN PROPORCION_ACTUAL = (1/&TOT_RECUENTO_ACTUAL);
	PSI = (PROPORCION_ACTUAL-PROPORCION_BASE)*LOG(PROPORCION_ACTUAL/PROPORCION_BASE);
RUN;
/*Guardamos el valor del PSI en la macro variable PSI*/
PROC SQL NOPRINT;
	SELECT SUM(PSI) INTO :PSI
	FROM RESUMEN;
QUIT;
/*Se crea la fila de totales*/
DATA TOTALES;
	MOTOR_DE_DECISION = "TOTAL";
	PSI = Ψ
	RECUENTO_BASE = &TOT_RECUENTO_BASE;
	RECUENTO_ACTUAL = &TOT_RECUENTO_ACTUAL;
	PROPORCION_BASE = &TOT_PROPORCION_BASE;
	PROPORCION_ACTUAL = &TOT_PROPORCION_ACTUAL;
	IF PSI <= 0.1 THEN RAG = "G";
		ELSE IF 0.1 < PSI <= 0.25 THEN RAG = "A";
		ELSE IF PSI > 0.25 THEN RAG = "R";
RUN;
/*Añadimos los totales a la tabla resumen*/
DATA RESULTADO;
	RETAIN MOTOR_DE_DECISION RECUENTO_BASE RECUENTO_ACTUAL PROPORCION_BASE PROPORCION_ACTUAL PSI RAG;
	SET RESUMEN TOTALES;
	FORMAT PROPORCION_BASE 6.4;
	FORMAT PROPORCION_ACTUAL 6.4;
RUN;

 

You can also check a more detailed explanation and implementation of the Population Stability Index (PSI) on my website, where you can download the datasets to test and verify the code:

PSI Implementation in SAS

Thank you in advance for your help!

Ready to join fellow brilliant minds for the SAS Hackathon?

Build your skills. Make connections. Enjoy creative freedom. Maybe change the world. Registration is now open through August 30th. Visit the SAS Hackathon homepage.

Register today!
How to Concatenate Values

Learn how use the CAT functions in SAS to join values from multiple variables into a single value.

Find more tutorials on the SAS Users YouTube channel.

Click image to register for webinarClick image to register for webinar

Classroom Training Available!

Select SAS Training centers are offering in-person courses. View upcoming courses for:

View all other training opportunities.

Discussion stats
  • 0 replies
  • 58 views
  • 0 likes
  • 1 in conversation