- How do I use seed number to expand my random sampl...

07-06-2017 01:06 PM

Hi all,

I am a relatively new SAS Enterprise Guide user working with a random sample. I previously selected a simple random sample (no duplicates) of 25 line items. It turns out some line items are not applicable to our review. I want to expand my sample to select new replacement line items. In other software programs I've used, I was able to do this by running a new (larger) sample using the same seed number. I tried this with all 3 options (simple/restricted, duplicates allowed but removed, and all duplicates included) but SAS isn't selecting the first 25 line items as before to be included in the larger sample. Since we've already started our review and spent time accumulating documents, I don't want to start over with a brand new sample.

Help! Is there any saving this situation?

Thanks in advance...

Accepted Solutions

Solution

07-06-2017
04:32 PM

07-06-2017 03:55 PM

Use the DATA step at http://support.sas.com/kb/24/802.html

All Replies

07-06-2017 01:50 PM

Please post the code that EG generates.

07-06-2017 02:02 PM

Here it is, except for my filepaths which I substituted <dummynames> for:

**My original sample of 25:**

```
TITLE; FOOTNOTE;
PROC SURVEYSELECT DATA=<mylibrary>.<mysourcedataset>()
   OUT=WORK.<myoutputdataset>
   METHOD=SRS
   N=25
   SEED=4262017;
RUN;
QUIT;
```

**My attempt to expand the sample to 100:**

```
TITLE; FOOTNOTE;
PROC SURVEYSELECT DATA=<mylibrary>.<mysourcedataset>()
   OUT=WORK.<myoutputdataset>
   METHOD=SRS
   N=100
   SEED=4262017;
RUN;
QUIT;
```

Thanks for your help!

07-06-2017 02:43 PM

I will briefly describe what you can do. Please study the code I include.

1. Add a row identifier to the original data.

2. Use PROC SQL to create a macro variable 'SelectedObs' that contains the ID values of the 25 rows that were previously selected.

3. Call PROC SURVEYSELECT again, but use a WHERE clause:

where ID not in (&SelectedObs);

4. Concatenate the two samples.

You will get a new sample that has the original sample as the first 25 rows, and 100 new observations as the remaining rows. Here is an example that uses the SasHelp.Cars data set:

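```
/* 1. Add a row identifier to the original data. */
data Have;
   set sashelp.cars;
   ID = _N_;
run;

/* Reproduce the original sample of 25. */
PROC SURVEYSELECT DATA=Have OUT=sample
   METHOD=SRS
   N=25 SEED=4262017;
RUN;

/* 2. Store the IDs of the 25 selected rows in a macro variable. */
proc sql noprint;
   select ID into :SelectedObs separated by ','
   from sample;
quit;

/* 3. Sample 100 more rows from those not already selected. */
PROC SURVEYSELECT DATA=Have OUT=sample2
   METHOD=SRS
   N=100 SEED=4262017;
   where ID not in (&SelectedObs);
RUN;

/* 4. Concatenate: the original 25 rows first, then the 100 new ones. */
data All;
   set Sample Sample2;
run;
```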

07-06-2017 02:54 PM

Thank you for your quick reply! I think I understand... basically I remove the ones I already selected and select 100 more at random from the remainder, right? I'm not a statistician so I guess I didn't realize I could do that, but it makes sense... we are continuing to allow each line item an equal probability of being selected.

Here is a follow-up question... We may not need the full 75 or 100 more, so I was wondering if it is also possible to get SAS to output the sample in the order lines were randomly selected (unsorted)? Then we can proceed through the list until we get at least 25 applicable lines - and may not necessarily have to look all of them up.

Thanks again,

07-06-2017 03:11 PM - edited 07-06-2017 03:13 PM

Yes, your statements in the first paragraph are correct.

It sounds like you think that PROC SURVEYSELECT generates 100 numbers between 1 and N and then outputs those rows. That is not what happens. It goes through the data set row by row. If you are selecting 100 obs, then the first row has a 100/N probability of being chosen.

Either the first row is selected (and written to the output data set) or it isn't. If it is, then the next row has a 99/(N-1) chance of being chosen. If it isn't selected, then the next row has a 100/(N-1) probability. This process continues until 100 obs are selected.
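The row-by-row selection described above can be sketched in a DATA step. This is only an illustration of the idea, not the code that PROC SURVEYSELECT runs, and it should not be expected to reproduce the procedure's selections for the same seed. The data set name SeqSample and the variables k_left, n_left, and total are made up for the example.

```sas
/* Sequential selection sketch: one pass through the data, */
/* ending with exactly 100 distinct rows.                   */
data SeqSample(drop=k_left n_left);
   retain k_left 100 n_left;
   if _n_ = 1 then do;
      call streaminit(4262017);   /* seed for the RAND function */
      n_left = total;             /* rows not yet examined */
   end;
   set sashelp.cars nobs=total;
   /* Keep the current row with probability            */
   /* (rows still needed) / (rows not yet examined).   */
   if rand("Uniform") < k_left / n_left then do;
      output;
      k_left = k_left - 1;
   end;
   n_left = n_left - 1;
   if k_left = 0 then stop;       /* sample is complete */
run;
```

Because the selection probability depends on how many rows are still needed, changing the sample size changes the decision made at every row, which is consistent with N=25 and N=100 giving different selections.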

This same algorithm explains why N=25 and N=100 yield different rows. For a DATA step version of the algorithm, see "Method 3" in this SAS article: http://support.sas.com/kb/24/722.html. If you want randomly sorted observations, you can use "Method 2" or the method presented at http://support.sas.com/kb/24/802.html
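For the random-order question, one common approach is to attach a uniform random number to every row and sort by it. This is a sketch and not necessarily the same code as in the linked notes. The first k rows of the sorted data are a simple random sample without duplicates, listed in random order, so you can work down the list and stop once you have enough applicable rows. The names Keyed, Shuffled, RandomOrderSample, and sortkey are invented for this example, and in your situation the input would be the rows not already selected rather than sashelp.cars.

```sas
/* Attach a random sort key to every row. */
data Keyed;
   if _n_ = 1 then call streaminit(4262017);   /* seed for RAND */
   set sashelp.cars;
   ID = _N_;
   sortkey = rand("Uniform");
run;

/* Sorting by the key puts the rows in random order. */
proc sort data=Keyed out=Shuffled(drop=sortkey);
   by sortkey;
run;

/* Keep the first 100 rows of the shuffled data. */
data RandomOrderSample;
   set Shuffled(obs=100);
run;
```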

07-06-2017 03:33 PM

Thank you again! The code you originally posted did in fact get me 125 observations including my original 25 lickety-split. (Thanks!)

Since I promised my colleague (prematurely?) that we only had to go through the sample until we hit "25 applicable observations", I would like to press on and figure out how to make the SAMPLE2 list unsorted, so that we can draw a line when we get to "25 applicable" and ignore the rest. You mentioned 3 options: 1) the DATA step version (I wasn't sure what the pros and cons of this were), 2) randomly sorted via Method 2 at http://support.sas.com/kb/24/722.html, and 3) randomly sorted via "the method" at http://support.sas.com/kb/24/802.html (I saw 2 methods at this link and didn't know which to choose). Can you please help direct me a little further?

Thank you,

07-06-2017 03:55 PM

Use the DATA step at http://support.sas.com/kb/24/802.html

07-06-2017 04:31 PM

Thank you so much! This appears to have worked! Here is the code I used with the RANUNI function. Does this look right? I wasn't sure if I put the 4262017 seed number and the 7,996 line count in the right place. The 7,996 line count is what remains after the first 25 were removed from the original 8,021 using an Enterprise Guide join tables step, which produced a dataset named "notselected".

```
data <mylibrary>.sampleb(drop=i);
   choice=int(ranuni(4262017)*7996)+1;
   set notselected point=choice nobs=n;
   i+1;
   /* Enter the desired sample size, 100 in this case */
   if i>100 then stop;
run;

/* This combines the 2 samples into one data set with the original 25 lines on top. */
data <mylibrary>.largercombinedsample;
   set <myfirstsamplefile> <mylibrary>.sampleb;
run;
```