runningjay
Fluorite | Level 6

I would like to create a Random Sample using SAS EG. I have over 3,000 records split across 7 locations.

 

At these locations I have two different programs ('ALPHA' and 'BETA'), and I would like a random sample of 50 records for each location and each program. The EG wizard accomplishes this fairly easily with the following code, but I run into the issue described below the code:

 

/* -------------------------------------------------------------------
Code generated by SAS Task

Generated on: Tuesday, August 28, 2018 at 11:09:15 AM
By task: Random Sample

Input data: SASApp:WORK.PROG_BY_LOCATION
Server: SASApp
------------------------------------------------------------------- */
TITLE; FOOTNOTE;

%_eg_conditional_dropds(WORK.SORTTempTableSorted, "%STR(WORK.RANDRandomSamplePROG_BY_LOCAT)"n);

 

PROC SORT
DATA=WORK.PROG_BY_LOCATION()
OUT=WORK.SORTTempTableSorted;
BY LOCATION PROGRAM;
RUN;

 

PROC SURVEYSELECT DATA=WORK.SORTTempTableSorted
OUT=WORK.RANDRandomSamplePROG_BY_LOCAT
METHOD=SRS
N=50
NOPRINT
SELECTALL;
STRATA LOCATION PROGRAM;
RUN;

 

QUIT;

%_eg_conditional_dropds(WORK.SORTTempTableSorted);

 

Unfortunately when I look at my data I see that I have 5 locations that do not have 50 records associated with the 'BETA' program (see below). I would like to amend the code above to increase the number of records to 100 per location. I would like to keep all of the 'BETA' records in the sample, but at the five locations where there are not enough 'BETA' records I want to pull additional records from the 'ALPHA' group. I don't see a way to do this in the SAS EG Random Sample dialog box, but hoped there was a way to adjust the code above.

 

Any guidance is appreciated.

 

Thanks 

 

 

            PROGRAM
LOCATION    ALPHA        BETA         Total
            Frequency    Frequency    Frequency
A           50           50           100
B           50           50           100
C           50           46           96
D           50           24           74
E           50           19           69
F           50           6            56
G           50           5            55
Total       350          200          550
2 REPLIES
TomKari
Onyx | Level 15

It would be possible using a bit of a brute force programming approach: i) figure out how many are missing for a stratum, ii) generate the code to select from the required alternate stratum, iii) macrofy it to handle all of the strata.

 

I'm hoping that someone more familiar with the statistics side will come to the rescue with a smart idea, otherwise I'll try to mock something up for you.

 

Try posting this in one of the statistical forums, you might catch the eye of the right person.

 

Tom

ballardw
Super User

I am not going to suggest any "automagic" solution, because it would require macro coding, and the code is likely to be ugly, fragile, and possibly subject to abuse.

 

Use the information from your PROC FREQ output to modify the N= option so that it matches the data you have.

I think it would be:

n=(50 50 50 50 54 46 76 24 81 19 94 6 95 5)

Since you have two strata variables, the values are listed in sorted stratum order: the first n is for the first LOCATION and the first value of PROGRAM, the second is for the first LOCATION and the second PROGRAM, the third is for the second LOCATION and the first PROGRAM, the fourth is for the second LOCATION and the second PROGRAM, and so forth.

 

 

This will at least provide appropriate selection probabilities and weighting values for the different strata sizes.
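As a rough sketch of how that list would slot into the generated code: the block below replaces N=50 with the per-stratum list. It assumes the locations sort as A through G, that ALPHA sorts before BETA within each location, and that every location has enough ALPHA records to supply the larger counts. The output names WORK.SORTED and WORK.SAMPLE100 are placeholders of mine, not names from the original task.

```sas
/* Check the available records in each LOCATION x PROGRAM stratum */
proc freq data=work.prog_by_location;
  tables location*program / norow nocol nopercent;
run;

/* The input must be sorted by the STRATA variables */
proc sort data=work.prog_by_location out=work.sorted;
  by location program;
run;

/* One sample size per stratum, listed in sorted stratum order: */
/* A-ALPHA A-BETA B-ALPHA B-BETA ... G-ALPHA G-BETA             */
proc surveyselect data=work.sorted
    out=work.sample100
    method=srs
    n=(50 50 50 50 54 46 76 24 81 19 94 6 95 5)
    selectall
    noprint;
  strata location program;
run;
```

SELECTALL is kept from the generated code so that a stratum with fewer records than requested contributes all of its records instead of stopping the procedure with an error.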

Or use the information to create an appropriate SAMPSIZE data set which would have the desired sample sizes for each combination of strata variables. This set would look a lot like the n= list above.

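data work.sampsize;
   /* The STRATA variables must match the input data in type, length, and value */
   /* (the post shows PROGRAM as 'ALPHA' and 'BETA'). PROC SURVEYSELECT reads   */
   /* the sizes from a variable named _NSIZE_ (or SampleSize).                  */
   input location $ program $ _NSIZE_;
datalines;
A ALPHA 50
A BETA  50
B ALPHA 50
B BETA  50
C ALPHA 54
C BETA  46
D ALPHA 76
D BETA  24
E ALPHA 81
E BETA  19
F ALPHA 94
F BETA   6
G ALPHA 95
G BETA   5
;
run;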

You reference the data set name in the N= (or SAMPSIZE=) option on the PROC SURVEYSELECT statement.
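For illustration, the call might look something like the sketch below. It assumes the sizes data set is WORK.SAMPSIZE, that its size variable is named _NSIZE_ (or SampleSize), which is the name the procedure looks for, and that its LOCATION and PROGRAM variables match the input data in type and length. WORK.SORTED and WORK.SAMPLE100 are placeholder names.

```sas
/* Both the input data and the sizes data set should be in STRATA order */
proc sort data=work.prog_by_location out=work.sorted;
  by location program;
run;

proc sort data=work.sampsize;
  by location program;
run;

/* SAMPSIZE= names the data set of stratum sizes instead of a single number */
proc surveyselect data=work.sorted
    out=work.sample100
    method=srs
    sampsize=work.sampsize
    selectall
    noprint;
  strata location program;
run;
```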

 

The data set approach would be the easiest one to automate ("automagic"), and it could be adapted to adjust either the ALPHA or the BETA sample sizes.

But since you are using EG and the code you show looks to be pretty straight from the interface menus that might be just a bit past your experience level. This would be where one of my professors used to say "Left as an exercise for the interested reader."
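For anyone who does want to try that exercise, here is one possible sketch that avoids macro loops. It counts the records per stratum with PROC FREQ, takes up to 50 BETA records per location, and fills the remainder of the 100 from ALPHA. The macro variables PER_PROGRAM and PER_LOCATION, the data set names WORK.STRATA_COUNTS, WORK.SORTED, and WORK.SAMPLE100, and the exact values 'ALPHA' and 'BETA' are assumptions based on the original post; treat it as a starting point, not a tested solution.

```sas
%let per_program  = 50;   /* assumed cap on BETA records per location */
%let per_location = 100;  /* assumed total sample size per location   */

/* Count available records per stratum; the output is ordered by */
/* LOCATION and then PROGRAM                                     */
proc freq data=work.prog_by_location noprint;
  tables location*program / out=work.strata_counts(keep=location program count);
run;

data work.sampsize(keep=location program _NSIZE_);
  beta_take = 0;
  /* Pass 1: how many BETA records can this location supply? */
  do until (last.location);
    set work.strata_counts;
    by location;
    if program = 'BETA' then beta_take = min(count, &per_program);
  end;
  /* Pass 2: re-read the same location and assign each stratum its size; */
  /* ALPHA fills the remainder, capped at the ALPHA records available    */
  do until (last.location);
    set work.strata_counts;
    by location;
    if program = 'BETA' then _NSIZE_ = beta_take;
    else _NSIZE_ = min(count, &per_location - beta_take);
    output;
  end;
run;

proc sort data=work.prog_by_location out=work.sorted;
  by location program;
run;

proc surveyselect data=work.sorted out=work.sample100
    method=srs sampsize=work.sampsize selectall noprint;
  strata location program;
run;
```

Because the sizes data set is derived from the input data itself, LOCATION and PROGRAM automatically have the same type and length as in the input, which the SAMPSIZE= data set needs.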
