I need to create a control group that has the same characteristics as the target population. I can use the following to get a random sample:
PROC SURVEYSELECT DATA=sel OUT=target METHOD=SRS SAMPRATE=.5 SEED=2;
  STRATA momage gravidity fplgr smoke med_rsk lowwt bmi marstat;
RUN;
However, this code came from a different programmer and I don't really know what it's doing. Its output is the Target dataset; a second dataset holds the universe of non-selected members, which is where the Control group has to come from.
Is there a better method? Is there anything I should adjust? Thanks
If you've already collected the data, you could use propensity score matching. There is a huge literature on it.
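For what it's worth, here is a rough sketch of what propensity score matching could look like with PROC PSMATCH (SAS/STAT 14.2 or later). It is untested against your data and makes several assumptions: the non-selected members sit in a dataset I am calling nonselected (a made-up name), the flag in_target and the output names both and matched are also invented, and all eight covariates are treated as categorical codes. Adjust the CLASS statement if any of them are continuous.

```sas
/* Stack the two groups and flag membership.                        */
/* "nonselected" is a hypothetical name for the non-selected pool.  */
data both;
   set target(in=t) nonselected;
   in_target = t;   /* 1 = target member, 0 = candidate control */
run;

/* Model the probability of being in the target group from the      */
/* covariates, then pick one control per target member by greedy    */
/* nearest-neighbor matching on the logit of the propensity score.  */
proc psmatch data=both region=cs;
   class in_target momage gravidity fplgr smoke med_rsk lowwt bmi marstat;
   psmodel in_target(Treated='1') = momage gravidity fplgr smoke
                                    med_rsk lowwt bmi marstat;
   match method=greedy(k=1) caliper=0.25;
   /* Keep matched observations only; _MatchID links each pair. */
   output out(obs=match)=matched matchid=_MatchID;
run;
```

The point of the design is that you match on a single score instead of on every combination of the eight variables, so sparse combinations are much less of a problem. You would still want to compare covariate distributions between the matched groups before trusting the result.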
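I've not used SURVEYSELECT, but my general concern is that matching exactly on that many variables is nearly impossible. That STRATA statement lists eight variables, so even if every one of them were binary you would have 2^8 = 256 distinct buckets, and more if any variable has more than two levels. Many of those buckets are likely to hold only a handful of members, or none at all.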
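One way to check how bad the bucket problem really is would be to count the members in each combination of the stratification variables. This is only a sketch, assuming sel is the input dataset from the original PROC SURVEYSELECT call; the names cells and n_members are made up for illustration.

```sas
/* One row per combination of the eight variables that actually     */
/* occurs in sel, with the number of members in that combination.   */
proc sql;
   create table cells as
   select momage, gravidity, fplgr, smoke, med_rsk, lowwt, bmi, marstat,
          count(*) as n_members
   from sel
   group by momage, gravidity, fplgr, smoke, med_rsk, lowwt, bmi, marstat;
quit;

/* Distribution of bucket sizes: how many buckets hold 1, 2, 3, ... members */
proc freq data=cells;
   tables n_members;
run;
```

If most buckets turn out to hold only one or two members, a 50% sample within each bucket will be quite unstable, and finding controls with the same exact combination will often fail.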