It's difficult to discern what you want your end result to be. Your example data has 20 participants (I assume the last PAIR value is supposed to be 10, not 1). You want to randomly assign them to 2 groups (based on the PAIR ID, I think?), and then further eliminate participants so you have an equal number of participants in each category/pair combination? If I got that right, here is one approach. There are many different ways this could be solved, and I'm sure mine is not the simplest.

With this small dataset of 20 you can play with the RANUNI seed value: sometimes you get a sample size of 2, other times a sample size of 1, depending on how the random deck is shuffled. A larger dataset should generally yield larger sample sizes.

DATA FULLSET;
    INFILE CARDS;
    LABEL PART="part" CAT="cat" PAIR="pair";
    INPUT PART CAT PAIR;
    SORTVAR=RANUNI(13);
    *** THE SEED VALUE HERE IS ARBITRARY. CHANGING IT WILL SHUFFLE THE DECK DIFFERENTLY;
CARDS;
1 1 1
2 3 2
3 2 3
4 2 4
5 3 5
6 3 6
7 3 7
8 1 8
9 1 9
10 1 10
11 1 1
12 1 2
13 2 3
14 3 4
15 2 5
16 2 6
17 3 7
18 2 8
19 2 9
20 2 10
;
RUN;

PROC SORT DATA=FULLSET;
    BY PAIR SORTVAR;
RUN;

**** RANDOMLY SEPARATE INTO 2 GROUPS BASED ON PAIR;
DATA FULLSET;
    SET FULLSET;
    BY PAIR;
    IF FIRST.PAIR THEN PAIRGROUP=1;
    ELSE PAIRGROUP=2;
RUN;

*** SUMMARIZE DATASET TO DETERMINE LOWEST FREQUENCY OF CATEGORIES FOUND;
PROC SUMMARY NWAY DATA=FULLSET;
    CLASS PAIRGROUP CAT;
    OUTPUT OUT=FULLSET_SUM;
RUN;

*** SORT SO THE LOWEST NUMBER OF PARTICIPANTS IS ON TOP;
PROC SORT DATA=FULLSET_SUM;
    BY _FREQ_;
RUN;

*** NOW EXTRACT THAT FIRST OBSERVATION AND WRITE OUT TO A MACRO VARIABLE...THIS WILL BE THE NUMBER WE USE TO STRATIFY OUR SAMPLE;
*** SYMPUTX (RATHER THAN SYMPUT) STRIPS THE LEADING BLANKS AND AVOIDS THE NUMERIC-TO-CHARACTER CONVERSION NOTE IN THE LOG;
DATA _NULL_;
    SET FULLSET_SUM(OBS=1);
    CALL SYMPUTX("STRATSAMPSIZE",_FREQ_);
RUN;

%PUT USING A VALUE OF &STRATSAMPSIZE. FOR STRATIFIED SAMPLE SIZE;

**** RE-SORT FULLSET IN ORDER OF PAIRGROUP AND CATEGORY;
PROC SORT DATA=FULLSET;
    BY PAIRGROUP CAT SORTVAR;
RUN;

*** KEEP ONLY THE FIRST &STRATSAMPSIZE. PARTICIPANTS (IN RANDOM ORDER) WITHIN EACH PAIRGROUP/CAT COMBINATION;
DATA SAMPLE;
    SET FULLSET;
    BY PAIRGROUP CAT;
    SAMPCOUNT+1;
    IF FIRST.PAIRGROUP OR FIRST.CAT THEN SAMPCOUNT=1;
    IF SAMPCOUNT LE &STRATSAMPSIZE.;
    DROP SORTVAR SAMPCOUNT;
RUN;

*** VERIFY SAMPLE SIZES ARE WHAT WE WANT - EQUAL ACROSS ALL CATEGORIES/PAIR COMBINATIONS;
PROC FREQ DATA=SAMPLE;
    TABLES CAT*PAIRGROUP / NOROW NOCOL NOPERCENT;
RUN;
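
As one of those other ways: if SAS/STAT is licensed at your site (an assumption on my part), the final selection step could probably be handed to PROC SURVEYSELECT instead of the counter logic in the SAMPLE data step. This is only a sketch. It assumes FULLSET already has PAIRGROUP assigned and that the STRATSAMPSIZE macro variable was created as above. SAMPLE_ALT is a made-up output name, and the seed is as arbitrary as the RANUNI one.

```sas
*** SURVEYSELECT EXPECTS THE INPUT SORTED BY THE STRATA VARIABLES;
PROC SORT DATA=FULLSET;
    BY PAIRGROUP CAT;
RUN;

*** DRAW THE SAME NUMBER OF PARTICIPANTS FROM EVERY PAIRGROUP/CAT STRATUM;
PROC SURVEYSELECT DATA=FULLSET
                  METHOD=SRS                 /* SIMPLE RANDOM SAMPLING WITHOUT REPLACEMENT */
                  SAMPSIZE=&STRATSAMPSIZE.   /* APPLIED TO EACH STRATUM */
                  SEED=13                    /* ARBITRARY SEED */
                  OUT=SAMPLE_ALT;            /* HYPOTHETICAL OUTPUT DATASET NAME */
    STRATA PAIRGROUP CAT;
RUN;

*** SAME CHECK AS BEFORE - CELL COUNTS SHOULD ALL BE EQUAL;
PROC FREQ DATA=SAMPLE_ALT;
    TABLES CAT*PAIRGROUP / NOROW NOCOL NOPERCENT;
RUN;
```

Because STRATSAMPSIZE is the smallest cell count, the requested size never exceeds what any stratum contains. The draw will not match the RANUNI-based SAMPLE row for row, since the random streams differ, but the cell counts should come out the same.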