Hi guys,
I have a data set with over 250,000 observations that belong to 1000 census block groups. Each individual chose exactly one block group. In other words, everyone faces 1000 choices. I want to randomly generate a sub-choice-set of only 20 block groups (instead of 1000) for every observation. The catch is that, for each individual, the block group that person actually chose must be included in the sub-choice-set. I have a "choice" variable ranging from 1 to 1009 indicating which block group a person chose.
The example data looks like:
ID year block_group choice
1 2011 410050201001 1
2 2014 410050201001 1
3 2005 415050204032 15
4 2012 415050215002 20
5 2009 410510006022 33
.
.
.
I have a vague idea of how to tackle this problem but am not sure how to implement it. The steps I have in mind are:
1. Create an array of 20 picks, pick1-pick20.
2. For each individual, set pick1=choice. Then draw a random number between 1 and 1009 using a random number function and compare it to the picks already assigned for that person. If it has not already been chosen, set pick2 to that number. Continue until all 20 picks hold unique numbers.
data test;
array _pick{20} pick1-pick20;/*Create an array of 20 picks for choice subsets*/
/*do obsnum=1 to last;*/
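pick1=choice; /*the chosen block group always goes first*/
x=ceil(1009*rand('uniform')); /*random integer from 1 to 1009; SAS has no randbetween function*/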
...
I'm having a hard time wrapping my head around this. Any input or suggestions would be much appreciated.
Thank you so much for your time.
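The two steps outlined in the question can be sketched as a single DATA step. This is only a minimal, untested sketch under some assumptions: the input data set is called have and the output want (both names are placeholders), choice is numeric, and every integer from 1 to 1009 is a valid block group code. If some codes in that range are unused, the draw would need to come from a list of valid codes instead.

```sas
/* Sketch only, not run against the real data.                       */
/* Assumes: data set HAVE (placeholder name) with a numeric CHOICE,  */
/* and every integer from 1 to 1009 is a valid block group code.     */
data want;
  set have;
  array _pick{20} pick1-pick20;

  /* Seed the random stream once; 12345 is an arbitrary value */
  if _n_ = 1 then call streaminit(12345);

  /* The block group the person chose is always in the set */
  _pick{1} = choice;
  n = 1;

  /* Keep drawing until 20 distinct block groups are assigned */
  do while (n < 20);
    x = ceil(1009 * rand('uniform'));          /* integer in 1..1009 */
    /* WHICHN returns 0 when x is not among the picks so far */
    if whichn(x, of _pick{*}) = 0 then do;
      n = n + 1;
      _pick{n} = x;
    end;
  end;

  drop n x;
run;
```

Because only 20 of 1009 values are needed, rejected draws should be rare, so the loop normally finishes in a little over 19 draws per person.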
Try this one. CODE NOT TESTED.
proc surveyselect data=have method=srs sampsize=20 out=want;
 cluster choice;
run;
Slight modification to @Ksharp's suggestion: since you absolutely want the choice to be included, use a sampsize of 19 and then append the choice.
Thank you Reeza. The problem is that if I simply append the chosen block group to the 19 randomly selected block groups, the block group chosen by that individual may already be among those 19. I figured out a way to do this this morning. Thank you for your help!
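The poster did not share the code they ended up with. One possible way around the duplicate problem described above is to draw the 19 other block groups from the candidates that exclude the chosen one, so the appended choice can never collide with them. The sketch below uses sequential selection sampling for that, which is a swapped-in technique and not necessarily what the poster did. It is untested and assumes a data set named have (placeholder), a numeric choice, and that every integer from 1 to 1009 is a valid block group code.

```sas
/* Sketch only, not run against the real data.                       */
/* Assumes: data set HAVE (placeholder name) with a numeric CHOICE,  */
/* and every integer from 1 to 1009 is a valid block group code.     */
data want_alt;
  set have;
  array _pick{20} pick1-pick20;

  /* Seed the random stream once; 2024 is an arbitrary value */
  if _n_ = 1 then call streaminit(2024);

  _pick{1} = choice;      /* the chosen block group comes first     */
  k = 1;                  /* number of picks filled so far          */
  needed = 19;            /* other block groups still to select     */
  remaining = 1008;       /* candidates left, excluding the choice  */

  /* Walk through the candidates once; take each one with           */
  /* probability needed/remaining, so exactly 19 are selected       */
  do j = 1 to 1009 while (needed > 0);
    if j ne choice then do;
      if rand('uniform') < needed / remaining then do;
        k = k + 1;
        _pick{k} = j;
        needed = needed - 1;
      end;
      remaining = remaining - 1;
    end;
  end;

  drop k needed remaining j;
run;
```

With this approach pick2-pick20 come out in ascending order, which should not matter for a choice set, and no draw is ever rejected.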
Thanks a lot. proc surveyselect will give me a random sample but I need to be sure the block group that was chosen by an individual was included as well.
I don't understand what you mean. Can you post an example to explain this? Post data and output.
Each individual has a chosen block group. By generating a random sample including 20 block groups, I cannot be sure that the block group that was chosen by each person is within the 20 randomly selected block groups. I have already figured out the code. Thank you anyway!
