Random sampling with Constraints

09-30-2015 04:44 PM

Hello everyone,

I want to survey 100 people, chosen as a random sample without replacement, subject to two constraints:

- I want 10 subgroups of 10 people each, defined by the last digit of the zip code (0 through 9). The first subgroup's zip codes end with 0, the second subgroup's end with 1, and so on until 9. (I did this part with PROC SURVEYSELECT and STRATA.)
- Then here is the tricky part (second constraint): I don't want to choose more than 5 people from any state. In other words, I need a way to randomly select people based on their zip code AND their state. (This is the part that I need help with.)

Thanks in advance.

Posted in reply to Suki

09-30-2015 06:48 PM

How many states are in your sample frame?

Posted in reply to ballardw

10-01-2015 12:47 PM

I have 50 states in my sample frame. Also I have ANOTHER constraint that needs to be included. So just to be clear, let me repeat my problem once again (with the additional constraint).

1. My sample frame has 50 states and zip codes that end with 0 through 9. Firstly, I want to randomly sample 100 observations (10 subgroups of 10 observations each) based on their zip code. (First subgroup of 10 observations has zip code ending with 0, second subgroup has zip code ending with 1, etc.)

2. The second constraint (including the new one) is that I want at most 5 observations from state1, at most 3 observations from state2, and fewer than 2 observations from every other state. This is the stage where I am having a problem.

Thanks again for your help.

Posted in reply to Suki

09-30-2015 10:09 PM

Intuitively, that seems to violate the definition of a random sample. California has many more people living there than Montana. Why should California be limited to 5 observations? Or is my intuition just wrong here?

Posted in reply to Suki

09-30-2015 10:44 PM

Same question as @ballardw. If you choose 10 people from a population spread over 50 states, it is very unlikely that 5 of them will be from the same state. You could simply pick another sample if it ever occurred.

PG

Posted in reply to PGStats

10-01-2015 12:50 PM

Please see my reply above. It is unlikely, but I want my sampling method to be "systematically correct", so I am wondering if there is a way to do it with SAS functions.

Posted in reply to Suki

10-01-2015 01:02 PM

Run PROC FREQ (or your favorite summarization procedure) on the resulting sample. If the state is not already in the sample, add a state variable first with the ZIPSTATE, ZIPNAME or ZIPNAMEL function. Check the counts and, if they are not as desired, resample.

I have done something similar because of costs associated with a study, BUT I have a sneaking feeling that frequent resampling may not quite give the correct sample weights. A moderate amount of code could wrap this in a macro loop.

Or, if your source data is large enough, request a number of replicates (REP= option), examine each replicate for fit with your constraints, and select appropriate replicates.
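The first check described above might look roughly like the sketch below. It assumes the selected units are in a data set named SAMPLE with a 5-digit zip code variable called ZIPCODE; both names are hypothetical, as are SAMPLE_ST and STATE_ABBR. ZIPSTATE returns the two-letter postal abbreviation for a zip code.

```sas
/* Sketch only: SAMPLE and ZIPCODE are assumed names, not taken from the thread */
data sample_st;
    set sample;
    /* Two-letter state postal code derived from the zip code */
    state_abbr = zipstate(zipcode);
run;

/* One-way table of selected units per state; inspect it before deciding to resample */
proc freq data=sample_st;
    tables state_abbr / nocum nopercent;
run;
```

If any count exceeds its limit, rerun PROC SURVEYSELECT and check again.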

Posted in reply to Suki

10-01-2015 02:04 PM

Following @ballardw's suggestion, here is an example. The goal is that, for any given final zip code digit, no state has 3 or more selected units and at most two states have 2 selected units.

```
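/* Simulated sampling frame: 50 states x 1000 people, random last zip digit (0-9) */
data frame;
    call streamInit(17646);
    do state = 1 to 50;
        do id = 1 to 1000;
            zip = int(10*rand("UNIFORM"));
            output;
        end;
    end;
run;

/* PROC SURVEYSELECT expects the input sorted by the STRATA variable */
proc sort data=frame; by zip state; run;

%macro mySurvey;
    %local n3 n2;
    /* Redraw until, in every zip stratum, no state has 3+ units
       and at most two states have exactly 2 units */
    %do %until(&n3=0 AND &n2<=2);
        /* Draw 10 units per zip-digit stratum */
        proc surveyselect data=frame out=sample sampsize=10;
            strata zip;
        run;

        proc sql;
            /* Worst case over strata:
               n3 = number of states with 3+ units, n2 = number with exactly 2 */
            select
                max(n3s),
                max(n2s)
            into :n3, :n2
            from (
                select
                    sum(n >= 3) as n3s,
                    sum(n >= 2) - sum(n >= 3) as n2s
                from (
                    select
                        zip,
                        count(*) as n
                    from sample
                    group by zip, state)
                group by zip)
            ;
        quit;
    %end;
%mend mySurvey;
%mySurvey;

/* Per-state counts within each zip stratum of the accepted sample */
proc sql;
    select zip, state, count(*) as n from sample group by zip, state;
quit;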
```
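The replicate-based variant mentioned earlier in the thread could be sketched as follows, using the limits from the question (at most 5 from state 1, at most 3 from state 2, at most 1 from any other state). This is only a sketch under several assumptions: it reuses the simulated FRAME data set, treats the limits as applying within each zip-digit subgroup of 10 (applied to the whole sample of 100 they could never be met, since 5 + 3 + 48 = 56), and uses an arbitrary seed and replicate count. The data set names REPS, STATE_COUNTS, OK, PICK and FINAL_SAMPLE are made up for illustration.

```sas
/* Draw 50 independent replicates of 10 units per zip stratum */
proc surveyselect data=frame out=reps method=srs sampsize=10
                  reps=50 seed=20151001 noprint;
    strata zip;
run;

proc sql;
    /* Units per state within each replicate and zip stratum */
    create table state_counts as
    select Replicate, zip, state, count(*) as n
    from reps
    group by Replicate, zip, state;

    /* Replicate/stratum pairs that break no limit:
       state 1 <= 5, state 2 <= 3, every other state <= 1 */
    create table ok as
    select Replicate, zip
    from state_counts
    group by Replicate, zip
    having sum( (state = 1 and n > 5)
             or (state = 2 and n > 3)
             or (state not in (1, 2) and n > 1) ) = 0;

    /* For each stratum, keep the first acceptable replicate */
    create table pick as
    select zip, min(Replicate) as Replicate
    from ok
    group by zip;

    /* Assemble the final sample from the chosen replicate of each stratum */
    create table final_sample as
    select r.*
    from reps as r inner join pick as p
      on r.zip = p.zip and r.Replicate = p.Replicate;
quit;
```

Because the strata are sampled independently, an acceptable replicate can be chosen per stratum rather than for the whole sample, which needs far fewer replicates. Check that FINAL_SAMPLE covers all 10 zip digits; if a stratum has no acceptable replicate, increase REPS=. The selection probabilities and weights written by PROC SURVEYSELECT do not account for the rejected replicates, which is the concern about sample weights raised above.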

PG