culliso3
Fluorite | Level 6

I'll preface this by saying I am a complete novice with SAS, so I may need more in-depth explanations than other users.

 

I want to generate a group of data sets (say 50), each made up of 75 rolls of two dice. Then, from these 50 data sets, I want SAS to find the one that least resembles a normal distribution, and I want to view that data set's 75 rolls as a PROC FREQ table. I have the code to randomize 75 dice rolls (pasted below in case you want to make changes), but I have no clue where to start with coding for a group of 50 data sets, or what function may exist to have SAS identify the data set that is furthest from a normal distribution.

 

Thanks for any help!

 

 

DATA DICE(KEEP=SUM) OUTCOMES(KEEP=OUTCOME);
	DO ROLL=1 TO 75;
		OUTCOME1=1+INT(6*RANUNI(123));
		OUTCOME2=1+INT(6*RANUNI(123));
		SUM=OUTCOME1+OUTCOME2;
		OUTPUT DICE;
		OUTCOME=OUTCOME1; OUTPUT OUTCOMES;
		OUTCOME=OUTCOME2; OUTPUT OUTCOMES;
	END;
RUN;

PROC FREQ DATA=DICE;
	TABLE SUM;
RUN;
PROC FREQ DATA=OUTCOMES;
	TABLE OUTCOME;
RUN;
1 ACCEPTED SOLUTION

Accepted Solutions
PaigeMiller
Diamond | Level 26

Modify your code as follows:

 

DATA DICE(KEEP=SUM cluster) OUTCOMES(KEEP=OUTCOME cluster);
	/* Outer loop: 50 clusters, each holding 75 rolls of two dice */
	DO cluster=1 TO 50;
		DO ROLL=1 TO 75;
			OUTCOME1=1+INT(6*RANUNI(123));
			OUTCOME2=1+INT(6*RANUNI(123));
			SUM=OUTCOME1+OUTCOME2;
			OUTPUT DICE;
			OUTCOME=OUTCOME1; OUTPUT OUTCOMES;
			OUTCOME=OUTCOME2; OUTPUT OUTCOMES;
		END;
	END;
RUN;

/* One frequency table per cluster; the data are already in cluster order */
PROC FREQ DATA=DICE;
	BY cluster;
	TABLE SUM;
RUN;
PROC FREQ DATA=OUTCOMES;
	BY cluster;
	TABLE OUTCOME;
RUN;

If you add a PROC UNIVARIATE step, you can run a test of normality for each of the 50 clusters. Note, however, that OUTCOME doesn't follow a normal distribution (it is discrete uniform on 1 to 6), and SUM is only approximately normal; it would be better tested against its proper distribution using a chi-square goodness-of-fit test, as shown in this example.
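As a rough sketch of how both ideas could look in code, assuming the DICE data set above with variables cluster and SUM: the first part collects the PROC UNIVARIATE normality tests into a data set and sorts by the Shapiro-Wilk statistic, and the second part computes a chi-square goodness-of-fit statistic per cluster against the exact two-dice probabilities. The data set names normtests, shapiro, counts, and gof and the macro variable worst are made up for illustration, and the ODS column names Test and Stat are what I would expect in the TestsForNormality table; check them with PROC CONTENTS on your installation.

```sas
/* 1. Shapiro-Wilk test of SUM within each cluster */
ods exclude all;                            /* suppress 50 sets of printed output */
proc univariate data=dice normal;
   by cluster;
   var sum;
   ods output TestsForNormality=normtests;  /* one row per cluster per test */
run;
ods exclude none;

/* Smallest W statistic first = least normal-looking cluster */
proc sort data=normtests(where=(Test='Shapiro-Wilk')) out=shapiro;
   by Stat;
run;

/* 2. Chi-square goodness of fit against the exact two-dice distribution */
proc freq data=dice noprint;
   where not missing(sum);
   by cluster;
   tables sum / out=counts;                 /* COUNT for each observed total */
run;

data gof(keep=cluster n chisq pvalue);
   set counts;
   by cluster;
   if first.cluster then do; n = 0; s = 0; end;
   p = (6 - abs(sum - 7)) / 36;             /* P(total) for two fair dice */
   n + count;                               /* running number of rolls */
   s + count**2 / p;                        /* running sum of O**2 / p */
   if last.cluster then do;
      chisq  = s / n - n;                   /* equals sum((O-E)**2/E), E = n*p */
      pvalue = 1 - probchi(chisq, 10);      /* 11 possible totals, so 10 df */
      output;
   end;
run;

proc sort data=gof;
   by descending chisq;                     /* worst-fitting cluster first */
run;

/* 3. Frequency table for the cluster that fits worst */
data _null_;
   set gof(obs=1);
   call symputx('worst', cluster);
run;

proc freq data=dice;
   where cluster = &worst;
   tables sum;
run;
```

Two caveats: with only 75 rolls the expected counts for totals of 2 and 12 are about 2, so the chi-square p-values are only approximate, and because SUM is discrete with many ties, the Shapiro-Wilk test will tend to reject normality in most clusters. Either ranking is best read as "most unusual cluster" rather than as a formal conclusion.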

--
Paige Miller


2 REPLIES
ballardw
Super User

SAS has added a distribution that is nicer to code for things like dice rolls:

      roll = rand('integer',1,6);

This returns integer values with equal probability; the call above simulates a 6-sided die.
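For illustration, here is a minimal sketch of the 50-cluster simulation rewritten with RAND('INTEGER'), assuming a SAS release recent enough to support it (I believe 9.4M5 or later). The seed 123 is carried over from the original code; CALL STREAMINIT sets it once for all subsequent RAND calls.

```sas
data dice(keep=cluster sum) outcomes(keep=cluster outcome);
   call streaminit(123);                    /* seed the RAND stream once */
   do cluster = 1 to 50;
      do roll = 1 to 75;
         outcome1 = rand('integer', 1, 6);  /* first die */
         outcome2 = rand('integer', 1, 6);  /* second die */
         sum = outcome1 + outcome2;
         output dice;
         outcome = outcome1; output outcomes;
         outcome = outcome2; output outcomes;
      end;
   end;
run;
```

The random numbers will differ from those produced by RANUNI(123), since RAND uses a different generator, but the resulting data sets have the same structure.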
