I'll preface this by saying I am a complete novice to SAS, so I may need more in-depth explanations than other users.
I want to generate a cluster of data sets (say 50), each comprising 75 rolls of two dice. Then, from these 50 data sets, I want SAS to find the one that least resembles a normal distribution, and I want to view that data set's 75 rolls as a PROC FREQ table. I have the code to randomize 75 dice rolls (pasted below in case you want to make changes), but I have no clue where to start with coding a cluster of 50 data sets, or what function may exist to have SAS identify the data set that is furthest from a normal distribution.
Thanks for any help!
DATA DICE(KEEP=SUM) OUTCOMES(KEEP=OUTCOME);
  DO ROLL=1 TO 75;
    OUTCOME1=1+INT(6*RANUNI(123));
    OUTCOME2=1+INT(6*RANUNI(123));
    SUM=OUTCOME1+OUTCOME2;   /* total of the two dice */
    OUTPUT DICE;
    OUTCOME=OUTCOME1; OUTPUT OUTCOMES;
    OUTCOME=OUTCOME2; OUTPUT OUTCOMES;
  END;
RUN;
PROC FREQ DATA=DICE;
TABLE SUM;
RUN;
PROC FREQ DATA=OUTCOMES;
TABLE OUTCOME;
RUN;
Modify your code as follows:
DATA DICE(KEEP=SUM cluster) OUTCOMES(KEEP=OUTCOME cluster);
  do cluster=1 to 50;            /* one pass per simulated data set */
    DO ROLL=1 TO 75;
      OUTCOME1=1+INT(6*RANUNI(123));
      OUTCOME2=1+INT(6*RANUNI(123));
      SUM=OUTCOME1+OUTCOME2;     /* total of the two dice */
      OUTPUT DICE;
      OUTCOME=OUTCOME1; OUTPUT OUTCOMES;
      OUTCOME=OUTCOME2; OUTPUT OUTCOMES;
    END;
  end;
RUN;
PROC FREQ DATA=DICE;
TABLE SUM;
by cluster;
RUN;
PROC FREQ DATA=OUTCOMES;
TABLE OUTCOME;
by cluster;
RUN;
If you add a PROC UNIVARIATE step, you can run a test of normality for each of the 50 clusters. Keep in mind that OUTCOME doesn't follow a normal distribution, and SUM is only approximately normal; it would be better tested against its proper distribution using a chi-squared goodness-of-fit test, as shown in this example.
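Here is a rough sketch of how the PROC UNIVARIATE step could look, assuming the DICE data set created above. It ranks clusters by the Shapiro-Wilk W statistic (a smaller W means further from normal), which is my choice of criterion and not the only reasonable one. The data set names normtests and sw and the macro variable worst are made up for illustration.

```sas
/* Run the normality tests on SUM for each cluster and capture them. */
ods exclude all;                 /* hide the 50 printed reports */
proc univariate data=dice normal;
  by cluster;
  var sum;
  ods output TestsForNormality=normtests;
run;
ods exclude none;

/* Keep only the Shapiro-Wilk rows; sort so the smallest W comes first. */
proc sort data=normtests(where=(Test='Shapiro-Wilk')) out=sw;
  by Stat;
run;

/* Store the cluster number of the first (least normal) row. */
data _null_;
  set sw(obs=1);
  call symputx('worst', cluster);
run;

/* Frequency table for the 75 rolls in that cluster only. */
proc freq data=dice(where=(cluster=&worst));
  table sum;
run;
```

Since the sum of two dice is discrete, the normality tests will tend to reject for most clusters, so treat the ranking as a relative comparison between clusters and not as a formal test.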
SAS has added a distribution that is nicer to code for things like dice rolls:
roll = rand('integer',1,6);
This returns integer values with equal probability, i.e., a 6-sided die in the call above.
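As a sketch, the simulation step could be rewritten with this function as shown below. It assumes a SAS release recent enough to support the 'INTEGER' distribution in RAND (SAS 9.4M5 or later, as far as I know), and it uses CALL STREAMINIT to set the seed because RAND does not take a seed argument. The seed 123 is carried over from the original code; the generated rolls will differ from the RANUNI version.

```sas
data dice(keep=cluster sum);
  call streaminit(123);                  /* seed for the RAND stream */
  do cluster = 1 to 50;                  /* 50 simulated data sets */
    do roll = 1 to 75;                   /* 75 rolls of two dice each */
      outcome1 = rand('integer', 1, 6);  /* first die */
      outcome2 = rand('integer', 1, 6);  /* second die */
      sum = outcome1 + outcome2;
      output;
    end;
  end;
run;
```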