How to compare total pop and survey sample data sets?

Frequent Contributor
Posts: 131

Hello!

I drew a sample data set (child-level data) from a "total population" data set (also child-level), based on certain criteria, using the CALL RANTBL and CALL RANUNI routines.

To check whether the survey sample was a good reflection of the total pop data set, I created weights for each child and produced frequency tables for the

-total pop

-survey sample, unweighted

-survey sample, weighted

The frequencies suggest that either the sample differs from the total population by more than sampling error, or something is wrong in the program.

Would you have suggestions for how to check the reliability of the survey sample compared to the total pop data set in other ways?

Would running PROC SURVEYFREQ to look at confidence intervals be a reasonable approach?

Thank you very much in advance

Grand Advisor
Posts: 10,251

Re: How to compare total pop and survey sample data sets?

The first thing I check with weighted data is whether the WEIGHTS sum to the POPULATION total, or at least come within a small rounding error of it. If they do not, the weight creation process is suspect.
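A minimal sketch of that check, assuming the population is in a data set called work.totalpop, the sample is in work.sample, and the weight variable is named weight (all three names are placeholders, not taken from the original poster's program):

```sas
/* Hypothetical names: work.totalpop, work.sample, weight */
proc sql noprint;
   /* Number of records in the base population */
   select count(*) into :pop_n trimmed
      from work.totalpop;
   /* Sum of the weights over the sampled records */
   select sum(weight) into :wt_sum trimmed
      from work.sample;
quit;

/* Write both numbers to the log so they can be compared */
%put NOTE: population N = &pop_n, sum of weights = &wt_sum;
```

If the two numbers differ by more than rounding error, the weights are the first place to look.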

Could you provide some examples of the frequencies you are having problems with, along with the code used to generate them?

Frequent Contributor
Posts: 131

Re: How to compare total pop and survey sample data sets?

Thanks, ballardw! But it's not possible to tell whether the problem is in the sampling or the weighting just from comparing the sum of all the records' weight values in the survey sample file to the total number of records, is it?

Grand Advisor
Posts: 10,251

Re: How to compare total pop and survey sample data sets?

If the weights are the inverse of the probability of selection, then the sum of the weights should be very close to the number of records in the base population.

Example:


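/* Strata variables must be sorted before PROC SURVEYSELECT */
proc sort data=sashelp.class out=work.classsort;
   by sex;
run;

/* Select 2 records from the first stratum (F) and 4 from the second (M) */
proc surveyselect data=work.classsort out=work.example
      sampsize=(2 4) outsize;
   strata sex;
run;

/* Sum the sampling weights over the selected records */
proc means data=work.example sum;
   var SamplingWeight;
run;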

Note that the sum of the SamplingWeight variable is very close to the number of records in the base data set.
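The same comparison can be made stratum by stratum, which may help narrow down where a mismatch comes from. A rough sketch using the example data sets above; the output data set names work.wtsum, work.popcount and work.check and the variables weight_sum and difference are made up for illustration:

```sas
/* Sum of the sampling weights within each stratum of the sample */
proc means data=work.example nway noprint;
   class sex;
   var SamplingWeight;
   output out=work.wtsum sum=weight_sum;
run;

/* Number of records in each stratum of the base data set */
proc freq data=work.classsort noprint;
   tables sex / out=work.popcount(keep=sex count);
run;

/* Put the two side by side; difference should be near zero per stratum */
data work.check;
   merge work.popcount work.wtsum(keep=sex weight_sum);
   by sex;
   difference = weight_sum - count;
run;

proc print data=work.check;
run;
```

A stratum where the difference is not close to zero points at the weights for that stratum rather than at the sample as a whole.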

Sampling can also be specified as a rate, for example samprate=(.3 .5) to select 30 percent of females and 50 percent of males.
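A sketch of the rate-based version, reusing work.classsort from the example above. The output name work.example_rate and the seed value are arbitrary choices for illustration; the seed is only there to make the selection repeatable:

```sas
/* Select 30 percent of the first stratum (F) and 50 percent of the second (M) */
proc surveyselect data=work.classsort out=work.example_rate
      samprate=(.3 .5) outsize seed=12345;
   strata sex;
run;

/* The weights should again sum to roughly the base data set size */
proc means data=work.example_rate sum;
   var SamplingWeight;
run;
```

As far as I know, the procedure rounds the per-stratum sample size up when the rate does not give a whole number, so the realized rates can be slightly higher than the ones requested.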
