Hello everyone,
I am trying to analyze the PSI (Population Stability Index) and SSI (Stability Statistic Index) between two data sets. One of them is the Large Small & Medium Business segment; the other is the Commercial & Corporate segment. I am doing this analysis to understand whether the ranges of the two data sets are consistent.
First, is this a correct approach, or is there a better one?
Second, I have two sample data sets, shown below. The real data sets also contain the model variables, but I have not included all of them here. My question is: how can I pull just one YearMonth for every Year and then compare the two data sets?
Data DataSmall;
Length CustomerID 8 YearMonth $ 10 Year $ 10 Turnover 8 ;
Infile Datalines Missover;
Input CustomerID YearMonth Year Turnover;
Format ;
Datalines;
001 201001 2010 70000
001 201002 2010 70000
001 201003 2010 70000
001 201004 2010 70000
001 201005 2010 70000
001 201006 2010 70000
001 201007 2010 70000
001 201008 2010 70000
001 201009 2010 70000
001 201010 2010 70000
001 201011 2010 70000
001 201012 2010 70000
001 201101 2011 80000
001 201102 2011 80000
001 201103 2011 80000
001 201104 2011 80000
001 201105 2011 80000
001 201106 2011 80000
001 201107 2011 80000
001 201108 2011 80000
001 201109 2011 80000
001 201110 2011 80000
001 201111 2011 80000
001 201112 2011 80000
;
Run;
Data DataCommercial;
Length CustomerID 8 YearMonth $ 10 Year $ 10 Turnover 8 ;
Infile Datalines Missover;
Input CustomerID YearMonth Year Turnover;
Format ;
Datalines;
003 201001 2010 9000000
003 201002 2010 9000000
003 201003 2010 9000000
003 201004 2010 9000000
003 201005 2010 9000000
003 201006 2010 9000000
003 201007 2010 9000000
003 201008 2010 9000000
003 201009 2010 9000000
003 201010 2010 9000000
003 201011 2010 9000000
003 201012 2010 9000000
003 201101 2011 10000000
003 201102 2011 10000000
003 201103 2011 10000000
003 201104 2011 10000000
003 201105 2011 10000000
003 201106 2011 10000000
003 201107 2011 10000000
003 201108 2011 10000000
003 201109 2011 10000000
003 201110 2011 10000000
003 201111 2011 10000000
003 201112 2011 10000000
;
Run;
I just want to get one YearMonth for every Year and then run the PSI and SSI analysis. It is like taking a random sample based on CustomerID and Year.
Here is my desired output:
After that I will do the analysis. Is it possible to do this?
Thank you
Do you want to randomly pick one observation from each year?
proc surveyselect data=datasmall out=want1 sampsize=1 method=srs;
strata CustomerID Year ;
run;
proc surveyselect data=DataCommercial out=want2 sampsize=1 method=srs;
strata CustomerID Year ;
run;
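If you need the same rows on every run, or want both samples in one table before the PSI/SSI step, something like the following may help. It is only a sketch: the seed value 12345, the data set name Both, and the variable name Segment are illustrative assumptions, not part of the original post. The PROC SORT steps are there because the STRATA statement expects the input sorted by the strata variables (the sample data already is).

```sas
/* Sort by the strata variables; STRATA requires sorted input. */
proc sort data=DataSmall;      by CustomerID Year; run;
proc sort data=DataCommercial; by CustomerID Year; run;

/* SEED= fixes the random draw so the selection is repeatable.  */
/* The value 12345 is arbitrary (assumption).                   */
proc surveyselect data=DataSmall out=want1 sampsize=1 method=srs seed=12345;
  strata CustomerID Year;
run;

proc surveyselect data=DataCommercial out=want2 sampsize=1 method=srs seed=12345;
  strata CustomerID Year;
run;

/* Stack the two samples and flag the source, so the later      */
/* comparison can be done by Segment (hypothetical name).       */
data Both;
  length Segment $ 10;
  set want1(in=inSmall) want2;
  if inSmall then Segment = 'Small';
  else Segment = 'Commercial';
run;
```

Both should then hold one row per CustomerID and Year from each source, plus the SelectionProb and SamplingWeight columns that PROC SURVEYSELECT adds.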