04-01-2015 07:46 PM
Here is a brief description of the problem:
I have a dataset containing IDs and dates. I have assigned the cases to three groups based on year. I need to draw a random sample from this dataset so that 35%, 35%, and 30% of the cases are picked from the three groups respectively, and no ID is selected more than once. I intended to use simple random sampling in PROC SURVEYSELECT, but I don't know how to assign the percentages (rates) to the groups. Kindly advise.
04-01-2015 08:07 PM
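The basic approach would be to use YEAR as the STRATA variable and then specify SAMPRATE=(.35 .35 .30). The input must be sorted by YEAR, and the rates are matched to the strata in that sorted order.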
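A minimal sketch of that approach, assuming a hypothetical data set WORK.HAVE with variables ID, DATE and YEAR (generated below as demo data for the years 2012 to 2014; substitute your own data and group variable). Note that SAMPRATE= sets a separate rate inside each stratum, so this draws 35% of the first year's rows, 35% of the second's and 30% of the third's. If you instead want the final sample itself to be split 35/35/30, look at SAMPSIZE=(n1 n2 n3) with counts you compute yourself.

```sas
/* Hypothetical demo data: three years (groups) of 100 rows each.      */
/* Replace WORK.HAVE with your own data set containing ID, DATE, YEAR. */
data work.have;
  do year = 2012 to 2014;
    do i = 1 to 100;
      id   = (year - 2012) * 100 + i;   /* unique ID per row in this demo */
      date = mdy(1, 1, year) + i;
      output;
    end;
  end;
  format date date9.;
  drop i;
run;

/* SURVEYSELECT needs the input sorted by the STRATA variable. */
proc sort data=work.have;
  by year;
run;

/* Simple random sampling within each YEAR; the three rates are */
/* matched to the strata in sorted order (2012, 2013, 2014).     */
proc surveyselect data=work.have out=work.sample
                  method=srs
                  samprate=(0.35 0.35 0.30)
                  seed=20150401;   /* arbitrary seed for reproducibility */
  strata year;
run;

/* Check how many rows were drawn from each year. */
proc freq data=work.sample;
  tables year;
run;
```

The seed value is arbitrary; fixing it just makes the draw repeatable.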
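However, if an ID appears in more than one year, additional steps may be needed, because the same ID could be selected in two groups. Depending on how many IDs repeat across years, you may not get any duplicates selected, but it is worth checking the sample.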
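One possible way to do that check, sketched under the assumption of hypothetical demo data in which every ID appears in all three years (the worst case for overlap). It lists IDs selected more than once and then keeps a single row per ID with PROC SORT NODUPKEY; the data set names WORK.SAMPLE_UNIQUE and WORK.DROPPED_DUPS are illustrative only.

```sas
/* Hypothetical demo data in which every ID (1-100) appears in all three years. */
/* The data are generated in YEAR order, so no PROC SORT is needed here.       */
data work.have;
  do year = 2012 to 2014;
    do id = 1 to 100;
      date = mdy(1, 1, year) + id;
      output;
    end;
  end;
  format date date9.;
run;

proc surveyselect data=work.have out=work.sample
                  method=srs samprate=(0.35 0.35 0.30) seed=20150401;
  strata year;
run;

/* List IDs that were selected in more than one year. */
proc sql;
  select id, count(*) as n_selected
    from work.sample
    group by id
    having count(*) > 1;
quit;

/* One possible follow-up: keep a single row per ID. DUPOUT= keeps the */
/* dropped rows so you can see which groups lost cases.                 */
proc sort data=work.sample out=work.sample_unique
          nodupkey dupout=work.dropped_dups;
  by id;
run;
```

Dropping duplicates this way lowers the realized rate in the groups that lose rows. If the exact 35/35/30 split matters, an alternative is to sample the groups one at a time and remove already-selected IDs from the later groups before sampling them.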