01-08-2015 02:16 PM
Hello!
I drew a sample data set (child-level data) from a "total population" data set (also child-level) based on certain criteria, using the CALL RANTBL and CALL RANUNI routines.
To check whether the survey sample was a good reflection of the total pop data set, I created weights for each child and produced frequency tables for the
-total pop
-survey sample, unweighted
-survey sample, weighted
The frequencies suggest that the differences between the sample and the total pop may be larger than sampling error would explain, or that something is wrong in the program.
Would you have suggestions for how to check the reliability of the survey sample compared to the total pop data set in other ways?
Would PROC SURVEYFREQ be a reasonable way to look at confidence intervals?
Thank you very much in advance
01-08-2015 05:22 PM
The first thing I check with weighted data is whether the WEIGHTS sum to the POPULATION total, or at least come within a small rounding error of it. If they do not, the weight creation process is suspect.
Could you provide some examples of the frequencies you are having problems with, along with the code used to generate them?
01-08-2015 06:35 PM
Thanks, ballardw! But it's not possible to tell whether the problem is in the sampling or the weighting just from comparing the sum of all the records' weight values in the survey sample file to the total number of records, is it?
01-08-2015 07:06 PM
If the weights are the inverse of the probability of selection, then the sum of the weights should be very close to the number of records in the base population.
Example:
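proc sort data=sashelp.class out=work.classsort;
   by sex;
run;
/* Select 2 females and 4 males; OUTSIZE adds population and sample sizes to the output */
proc surveyselect data=work.classsort out=work.example
   sampsize=(2 4) outsize;
   strata sex;
run;
/* Sum the weights: the result should match the record count of the base data set */
proc means data=work.example sum;
   var SamplingWeight;
run;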
Note that the sum of the SamplingWeight variable is very close to the number of records in the base data set (19 for sashelp.class).
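On the PROC SURVEYFREQ idea from the original question, here is a rough sketch of how the same example could be checked with confidence limits. It assumes the work.classsort and work.example data sets from the code above; the work.strata_totals data set and the choice of age as the comparison variable are only for illustration. With a sample of 6 the limits will be very wide, so treat this as a pattern rather than a meaningful result.

```sas
/* Stratum population counts, named _total_ as the TOTAL= option expects */
proc sql;
   create table work.strata_totals as
   select sex, count(*) as _total_
   from work.classsort
   group by sex;
quit;

/* Weighted percentages with confidence limits from the sample */
proc surveyfreq data=work.example total=work.strata_totals;
   strata sex;
   weight SamplingWeight;
   tables age / cl;
run;

/* Population percentages to compare against the limits above */
proc freq data=work.classsort;
   tables age;
run;
```

If most of the population percentages fall inside the sample's confidence limits, the sample is behaving as expected; many falling outside would point back at the selection or weighting code.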
Sampling can also be specified as a rate, such as samprate=(.3 .5) to select 30 percent of females and 50 percent of males.
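A minimal sketch of the rate-based version, assuming the sorted work.classsort data set from the example above. The output name work.example_rate and the seed value are my own choices for illustration; the seed is only there so the draw can be repeated.

```sas
/* Rates are applied in stratum order: .3 for F, .5 for M */
proc surveyselect data=work.classsort out=work.example_rate
   samprate=(.3 .5) seed=12345 outsize;
   strata sex;
run;

/* Same check as before: the weights should sum to roughly the population size */
proc means data=work.example_rate sum;
   var SamplingWeight;
run;
```

Because the number selected in each stratum has to be a whole record count, the realized rates may differ slightly from the requested ones, but the weight sum should still land at or very near the base data set's record count.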