I'll preface this by saying I am a complete novice to SAS, so I may need more in-depth explanations than other users.
I want to generate a cluster of data sets (say 50), each consisting of 75 rolls of two dice. Then, from these 50 data sets, I want SAS to find the one that least resembles a normal distribution, and I want to view that seed's set of 75 rolls as a PROC FREQ table. I have the code to simulate 75 dice rolls (pasted below in case you want to make changes), but I have no idea where to start with coding a cluster of 50 data sets, or what function might exist to have SAS identify the data set that is furthest from a normal distribution.
Thanks for any help!
DATA DICE(KEEP=SUM) OUTCOMES(KEEP=OUTCOME);
  DO ROLL=1 TO 75;
    /* Roll two six-sided dice: each value is an integer from 1 to 6 */
    OUTCOME1=1+INT(6*RANUNI(123));
    OUTCOME2=1+INT(6*RANUNI(123));
    SUM=OUTCOME1+OUTCOME2;   /* total of the two dice */
    OUTPUT DICE;
    /* Write each individual die to OUTCOMES */
    OUTCOME=OUTCOME1; OUTPUT OUTCOMES;
    OUTCOME=OUTCOME2; OUTPUT OUTCOMES;
  END;
RUN;
PROC FREQ DATA=DICE;
  TABLE SUM;
RUN;
PROC FREQ DATA=OUTCOMES;
  TABLE OUTCOME;
RUN;
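Modify your code as follows:
DATA DICE(KEEP=SUM cluster) OUTCOMES(KEEP=OUTCOME cluster);
  do cluster=1 to 50;          /* 50 simulated data sets */
    DO ROLL=1 TO 75;           /* 75 rolls of two dice in each */
      OUTCOME1=1+INT(6*RANUNI(123));
      OUTCOME2=1+INT(6*RANUNI(123));
      SUM=OUTCOME1+OUTCOME2;
      OUTPUT DICE;
      OUTCOME=OUTCOME1; OUTPUT OUTCOMES;
      OUTCOME=OUTCOME2; OUTPUT OUTCOMES;
    END;
  end;
RUN;
/* BY cluster gives one frequency table per simulated data set; */
/* the rows are already written in cluster order, so no PROC SORT is needed */
PROC FREQ DATA=DICE;
  TABLE SUM;
  by cluster;
RUN;
PROC FREQ DATA=OUTCOMES;
  TABLE OUTCOME;
  by cluster;
RUN;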
If you add a PROC UNIVARIATE step with the NORMAL option and a BY cluster statement, you can run a test of normality for each of the 50 clusters. Bear in mind, though, that OUTCOME doesn't follow a normal distribution (each face of a die is equally likely), and SUM is only approximately normal; SUM would be better tested against its actual distribution using a chi-squared goodness-of-fit test, as shown in this example.
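A minimal sketch of the chi-squared route, assuming the DICE data set from the modified code (re-created at the top so the sketch runs on its own). Instead of PROC FREQ's TESTP= option, which as far as I know expects one proportion per level actually present in the table (with only 75 rolls, a 2 or a 12 is often missing from a cluster), it computes the Pearson chi-squared statistic directly in a DATA step, comparing each cluster's counts with the expected n × k/36, where k is the number of ways to roll that sum. The cluster with the largest statistic departs most from the two-dice distribution, and its 75 rolls are then shown with PROC FREQ. The names CHISQ, WAYS, OBS and the macro variable WORST are my own illustrative choices.

```sas
/* Re-create DICE: 50 clusters of 75 rolls of two dice, as in the code above */
data dice(keep=cluster sum);
  do cluster = 1 to 50;
    do roll = 1 to 75;
      outcome1 = 1 + int(6*ranuni(123));
      outcome2 = 1 + int(6*ranuni(123));
      sum = outcome1 + outcome2;
      output;
    end;
  end;
run;

/* One row per cluster: Pearson chi-squared statistic vs. the two-dice distribution */
data chisq(keep=cluster n chisq);
  array obs[2:12] _temporary_;                           /* observed count of each sum */
  array ways[2:12] _temporary_ (1 2 3 4 5 6 5 4 3 2 1);  /* ways to roll each sum, out of 36 */
  do s = 2 to 12;
    obs[s] = 0;
  end;
  n = 0;
  /* Read every row of one cluster in this iteration (DICE is in cluster order) */
  do until (last.cluster);
    set dice;
    by cluster;
    obs[sum] = obs[sum] + 1;
    n = n + 1;
  end;
  chisq = 0;
  do s = 2 to 12;
    expected = n * ways[s] / 36;
    chisq = chisq + (obs[s] - expected)**2 / expected;
  end;
  output;
run;

/* Largest statistic first: the cluster that least resembles the expected distribution */
proc sort data=chisq;
  by descending chisq;
run;

data _null_;
  set chisq(obs=1);
  call symputx('worst', cluster);
run;

/* Frequency table of the 75 rolls in that cluster */
title "Cluster &worst: largest chi-squared statistic";
proc freq data=dice;
  where cluster = &worst;
  table sum;
run;
title;
```

The expected counts for sums of 2 and 12 are only about 2.1 with 75 rolls, so p-values from the chi-squared distribution would be unreliable here; the statistic is used purely to rank the clusters. If you would rather follow the normality-test route, PROC UNIVARIATE with NORMAL and BY cluster gives a Shapiro-Wilk test per cluster, and the cluster with the smallest p-value would play the same role.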
SAS has added a distribution to the RAND function that is nicer to code for things like dice rolls:
roll = rand('integer',1,6);
This returns integers from 1 to 6 with equal probability, i.e. the six-sided die above.
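A sketch of the simulation from earlier in the thread rewritten with RAND('INTEGER'), assuming a SAS release that supports the 'INTEGER' distribution (SAS 9.4M5 or later, if I remember correctly). CALL STREAMINIT seeds the RAND stream once, so all 50 clusters are reproducible; the seed 123 is simply carried over from the original RANUNI code.

```sas
data dice(keep=cluster sum) outcomes(keep=cluster outcome);
  call streaminit(123);                  /* seed the RAND stream once */
  do cluster = 1 to 50;                  /* 50 simulated data sets */
    do roll = 1 to 75;                   /* 75 rolls of two dice each */
      outcome1 = rand('integer', 1, 6);  /* each face 1 to 6 equally likely */
      outcome2 = rand('integer', 1, 6);
      sum = outcome1 + outcome2;
      output dice;
      outcome = outcome1; output outcomes;
      outcome = outcome2; output outcomes;
    end;
  end;
run;

proc freq data=dice;
  table sum;
  by cluster;
run;
```

Apart from the random-number call, the structure is unchanged, so the BY cluster PROC FREQ steps work the same way on the result.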