Programming the statistical procedures from SAS

How do I use seed number to expand my random sample?

Contributor
Posts: 28

Hi all,

I am a relatively new SAS Enterprise Guide user working with a random sample. I previously selected a simple random sample (no duplicates) of 25 line items. It turns out some line items are not applicable to our review, so I want to expand my sample to select replacement line items. In other software programs I've used, I could do this by drawing a new (larger) sample with the same seed number. I tried this with all 3 options (simple/restricted, duplicates allowed but removed, and all duplicates included), but SAS does not include the original 25 line items in the larger sample. Since we've already started our review and spent time accumulating documents, I don't want to start over with a brand new sample.

Help! Is there any way to save this situation?

Thanks in advance...

SAS Super FREQ
Posts: 3,753

Re: How do I use seed number to expand my random sample?

Posted in reply to brookeewhite1

Please post the code that EG generates. 

Contributor
Posts: 28

Re: How do I use seed number to expand my random sample?

Here it is, except that I replaced my file paths with <dummynames>:

My original sample of 25:

TITLE; FOOTNOTE;

PROC SURVEYSELECT DATA=<mylibrary>.<mysourcedataset>()
OUT=WORK.<myoutputdataset>
METHOD=SRS
N=25
SEED=4262017;
RUN;

QUIT;

My attempt to expand the sample to 100:

TITLE; FOOTNOTE;

PROC SURVEYSELECT DATA=<mylibrary>.<mysourcedataset>()
OUT=WORK.<myoutputdataset>
METHOD=SRS
N=100
SEED=4262017;
RUN;

QUIT;

Thanks for your help!

SAS Super FREQ
Posts: 3,753

Re: How do I use seed number to expand my random sample?

Posted in reply to brookeewhite1

I will briefly describe what you can do. Please study the code I include.

1. Add a row identifier to the original data.

2. Use PROC SQL to create a macro variable 'SelectedObs' that contains the ID values of the 25 rows that were previously selected.

3. Call SURVEYSELECT again, but use a WHERE clause:

  where ID not in (&SelectedObs);

4. Concatenate the two samples.

You will get a new sample that has the original sample as the first 25 rows, and 100 new observations as the remaining rows. Here is an example that uses the SasHelp.Cars data set:

data Have;
set sashelp.cars;
ID = _N_;
run;

PROC SURVEYSELECT DATA=Have OUT=sample
METHOD=SRS
N=25 SEED=4262017;
RUN;

proc sql noprint;
 select ID into :SelectedObs separated by ','
 from sample;
quit;

PROC SURVEYSELECT DATA=Have OUT=sample2
METHOD=SRS
N=100 SEED=4262017;
where ID not in (&SelectedObs);
RUN;

data All;
set Sample Sample2;
run;
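A variant of the same idea, sketched under the assumption that Have and sample are built as above: instead of storing the selected IDs in a macro variable, a PROC SQL subquery can exclude them directly. This sidesteps macro variable length limits (a macro variable holds at most 65,534 characters) if the first sample were ever large. The table name remainder is hypothetical, and this sketch is not guaranteed to pick exactly the same 100 rows as the WHERE-clause version.

```sas
/* Self-contained: repeats the setup above (row ID + original 25-row sample) */
data Have;
   set sashelp.cars;
   ID = _N_;
run;

proc surveyselect data=Have out=sample method=srs n=25 seed=4262017;
run;

/* Hypothetical table REMAINDER: rows whose ID was not in the first sample. */
/* ORDER BY ID keeps the rows in their original order.                      */
proc sql;
   create table remainder as
   select *
   from Have
   where ID not in (select ID from sample)
   order by ID;
quit;

/* Draw the 100 new rows from the remainder */
proc surveyselect data=remainder out=sample2 method=srs n=100 seed=4262017;
run;

/* Original 25 rows first, then the 100 new ones */
data All;
   set sample sample2;
run;
```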

Contributor
Posts: 28

Re: How do I use seed number to expand my random sample?

Thank you for your quick reply! I think I understand... basically I remove the ones I already selected and select 100 more at random from the remainder, right? I'm not a statistician so I guess I didn't realize I could do that, but it makes sense... we are continuing to allow each line item an equal probability of being selected.

Here is a follow-up question... We may not need the full 75 or 100 more, so I was wondering if it is also possible to get SAS to output the sample in the order lines were randomly selected (unsorted)? Then we can proceed through the list until we get at least 25 applicable lines - and may not necessarily have to look all of them up.

Thanks again,

SAS Super FREQ
Posts: 3,753

Re: How do I use seed number to expand my random sample?

Posted in reply to brookeewhite1

Yes, your statements in the first paragraph are correct.

It sounds like you think that PROC SURVEYSELECT generates 100 numbers between 1 and N and then outputs those rows. That is not what happens. It goes through the data set row by row. If you are selecting 100 obs, then the first row has a 100/N probability of being chosen.

Either the first row is selected (and written to the output data set) or it isn't.  If it is, then the next row has a 99/(N-1) chance of being chosen. If it isn't selected, then the next row has a 100/(N-1) probability. This process continues until 100 obs are selected.
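To make the row-by-row description concrete, here is a minimal DATA step sketch of that sequential method, assuming SASHELP.CARS as input and the seed from this thread. It is not PROC SURVEYSELECT's internal code and may differ in detail from the article's version; the output name seqsample is hypothetical. Note that the selected rows come out in the same order as the input, so there is no separate "order of selection" to recover.

```sas
/* Sequential (row-by-row) simple random sampling without replacement.   */
/* k = rows still to be selected, n = rows not yet examined.             */
data seqsample(drop=k n);
   retain k 100 n;                      /* k starts at the sample size   */
   if _n_ = 1 then n = total;           /* TOTAL is set at compile time by NOBS= */
   set sashelp.cars nobs=total;
   if ranuni(4262017) <= k / n then do; /* select this row with probability k/n */
      output;
      k = k - 1;
   end;
   n = n - 1;
   if k = 0 then stop;                  /* sample is complete            */
run;
```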

This same algorithm explains why N=25 and N=100 yield different rows. For a DATA step version of the algorithm, see "Method 3" in this SAS article: http://support.sas.com/kb/24/722.html. If you want randomly sorted observations, you can use "Method 2" or the method presented at http://support.sas.com/kb/24/802.html
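For a randomly ordered list, one common approach (not necessarily identical to "Method 2") is to attach a uniform random number to every candidate row and sort on it. Reading the sorted list from the top, the first k rows form a simple random sample of size k, so you can stop once you have enough applicable lines. This sketch assumes SASHELP.CARS as the input; in this thread the input would be the not-yet-selected rows. The names shuffled and u are hypothetical.

```sas
/* Give every candidate row a uniform random sort key */
data shuffled;
   set sashelp.cars;
   u = ranuni(4262017);
run;

/* Sorting by the key yields a random ordering of all rows */
proc sort data=shuffled;
   by u;
run;
```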

Contributor
Posts: 28

Re: How do I use seed number to expand my random sample?

Thank you again! The code you originally posted did in fact get me 125 observations including my original 25 lickety-split. (Thanks!)

Since I promised my colleague (prematurely?) that we only had to go through the sample until we hit "25 applicable observations", I would like to press on and figure out how to make the SAMPLE2 list unsorted, so that we can draw a line when we get to "25 applicable" and ignore the rest. You mentioned 3 options: 1) a DATA step version (I wasn't sure what the pros and cons of this were), 2) randomly sorted via Method 2 at http://support.sas.com/kb/24/722.html, and 3) randomly sorted via "the method" at http://support.sas.com/kb/24/802.html (I saw 2 methods at this link and didn't know which to choose). Can you please point me a little further in the right direction?

Thank you,

Solution
07-06-2017 04:32 PM
SAS Super FREQ
Posts: 3,753

Re: How do I use seed number to expand my random sample?

Posted in reply to brookeewhite1

Use the DATA step at http://support.sas.com/kb/24/802.html 

Contributor
Posts: 28

Re: How do I use seed number to expand my random sample?

Thank you so much! This appears to have worked! Here is the code I used with the RANUNI function. Does this look right? I wasn't sure whether I put the 4262017 seed number and the 7,996 line count in the right place. The 7,996 line count is what remains after the first 25 were removed from the original 8,021 using an Enterprise Guide join tables step, which produced a dataset named "notselected".

data <mylibrary>.sampleb(drop=i);
/* Random row number from 1 to 7996, the number of rows in NOTSELECTED */
choice=int(ranuni(4262017)*7996)+1;
set notselected point=choice nobs=n;
i+1;
/* Enter the desired sample size, 100 in this case */
if i>100 then stop;
run;

/* This combines the 2 samples to one data set with the original 25 lines on top.*/
data <mylibrary>.largercombinedsample;
set <myfirstsamplefile> <mylibrary>.sampleb;
run;
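One caution about the POINT= code above: each pass draws a row number independently, so rows are sampled with replacement and the same line can appear more than once (with 100 draws from 7,996 rows there is roughly a 46% chance of at least one repeat). Below is a minimal sketch of a without-replacement version that still keeps the draw order: it remembers which row numbers were already drawn in a temporary array and redraws on a repeat. It runs on SASHELP.CARS for illustration; the names have, drawn, and used are hypothetical, and the array size of 100000 is an assumption that must be at least the number of rows.

```sas
/* Hypothetical input HAVE with a row ID, for illustration only */
data have;
   set sashelp.cars;
   ID = _n_;
run;

/* Draw 100 distinct rows, written in the order they were drawn */
data drawn(drop=k);
   array used[100000] _temporary_;      /* 1 = row number already drawn; size must be >= row count */
   do k = 1 to 100;                     /* desired sample size; must not exceed the row count */
      do until (used[choice] ne 1);     /* redraw while the row number was already taken */
         choice = int(ranuni(4262017)*n) + 1;
      end;
      used[choice] = 1;
      set have point=choice nobs=n;     /* read the chosen row */
      output;
   end;
   stop;                                /* required with POINT= to end the step */
run;
```

Because each new row is drawn uniformly from the rows not yet taken, any leading block of this list is a simple random sample, so you can stop reading once you reach 25 applicable lines.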

☑ This topic is solved.
