AIUQ532

Recently, we conducted an online survey that asked a single, simple question: would you buy ABC company's products? For privacy reasons, we didn't capture any other customer information except the respondent's location. The data suggest sampling bias: responses are skewed toward certain locations, so the survey sample is not distributed like the actual population.

 
This could be because we deliberately didn't hand-pick the respondents and let them participate at will (self-selection bias), and because we couldn't reach the entire region where the product is sold, so the sample suffers from undercoverage.
 
My question is: how can this sample still be used to predict whether a group of the population would buy ABC company's products? If it can, how do I correct for the sampling bias caused by two things, selection bias and undercoverage bias?
 
I read about this on the web, and the problem seems quite common in survey analysis, but I couldn't work out how to adjust the survey population or weight the responses when computing the final outcome (buy vs. not buy).
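One common way to weight the responses is post-stratification. Each respondent gets a weight equal to their location's share of the population divided by its share of the sample, and the buy rate is then computed with those weights. Below is a minimal Python sketch. It assumes you know the population share of each location (e.g. from census figures); the function names and the data are hypothetical. It corrects the location imbalance only. It cannot correct self-selection within a location, and it cannot help locations with no respondents at all.

```python
from collections import Counter

def poststrat_weights(locations, pop_share):
    """Weight per location = population share / sample share (hypothetical helper)."""
    n = len(locations)
    counts = Counter(locations)
    return {loc: pop_share[loc] / (counts[loc] / n) for loc in counts}

def weighted_buy_rate(responses, pop_share):
    """responses: list of (location, bought) pairs, where bought is 1 or 0."""
    weights = poststrat_weights([loc for loc, _ in responses], pop_share)
    total_w = sum(weights[loc] for loc, _ in responses)
    buy_w = sum(weights[loc] * bought for loc, bought in responses)
    return buy_w / total_w

# Hypothetical data: North is over-represented (8 of 10 responses)
responses = [("North", 1)] * 6 + [("North", 0)] * 2 + [("South", 0)] * 2
pop_share = {"North": 0.5, "South": 0.5}
print(weighted_buy_rate(responses, pop_share))
```

With this made-up data the unweighted buy rate is 0.6. The weighted rate is about 0.375, because the over-represented North location is weighted down.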
 
Thanks, and I appreciate your help!
Sachin
Damien_Mather

I would recommend using PROC SURVEYSELECT with unrestricted random sampling (sampling with replacement) to draw, from your survey, the number of responses you would expect for each location. Use repeated draws for under-represented locations where needed, so that the resampled total approximately equals the original survey total. You will get some repeats in the under-represented locations, but that's how you rebalance the overall sample.
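For readers not using SAS, here is a rough Python analogue of the resampling idea. It is not PROC SURVEYSELECT itself. Within each location, it draws respondents with replacement until the location's count matches its expected share of the original total. The function name, the `pop_share` dictionary, and the data are hypothetical, and a fixed seed keeps the draw reproducible.

```python
import random
from collections import Counter, defaultdict

def rebalance_by_location(responses, pop_share, seed=0):
    """Resample (location, answer) pairs with replacement within each location
    so location counts follow pop_share and the total stays near len(responses).
    Hypothetical helper illustrating unrestricted random sampling by stratum."""
    rng = random.Random(seed)
    by_loc = defaultdict(list)
    for resp in responses:
        by_loc[resp[0]].append(resp)
    total = len(responses)
    resampled = []
    for loc, share in pop_share.items():
        pool = by_loc.get(loc)
        if not pool:
            continue  # undercoverage: nobody from this location to draw from
        k = round(share * total)  # expected count for this location
        resampled.extend(rng.choices(pool, k=k))  # draws with replacement
    return resampled

# Hypothetical data: North is over-represented (8 of 10 responses)
responses = [("North", 1)] * 6 + [("North", 0)] * 2 + [("South", 0)] * 2
pop_share = {"North": 0.5, "South": 0.5}
rebalanced = rebalance_by_location(responses, pop_share)
print(Counter(loc for loc, _ in rebalanced))
```

Locations with no respondents at all are skipped, so the rebalanced total can fall short of the original. Resampling cannot create answers from regions the survey never reached.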

 

Next time, just use quota sampling by location.
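Quota sampling fixes the location mix at collection time rather than after the fact. You set a target number of completed surveys per location from population shares and close each location once its quota is met. Below is a minimal Python sketch, assuming known population shares; the names and numbers are hypothetical. Quotas balance locations, but within each location they still rely on whoever chooses to respond, so self-selection is not removed.

```python
def quota_targets(pop_share, n_total):
    """Completed surveys to collect per location (rounded, so the
    sum may differ slightly from n_total)."""
    return {loc: round(share * n_total) for loc, share in pop_share.items()}

def accept(location, collected, targets):
    """Accept a respondent only while their location's quota is still open."""
    if collected.get(location, 0) < targets.get(location, 0):
        collected[location] = collected.get(location, 0) + 1
        return True
    return False

# Hypothetical population shares and a stream of arriving respondents
targets = quota_targets({"North": 0.3, "South": 0.7}, 100)
collected = {}
arrivals = ["North"] * 50 + ["South"] * 100
accepted = [loc for loc in arrivals if accept(loc, collected, targets)]
print(collected)
```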

AIUQ532

Thanks Damien, I will try your solution! However, because this survey didn't reach the entire region and is self-selected (it's offered on the web), the characteristics of the people who were left out are hard to measure. In that case, fitting a probability model, say a logit model, will be biased toward the characteristics of the people who chose to take the survey. I read a SAS paper on how to adjust for self-selection and undercoverage bias, but didn't quite understand how to apply that paper's techniques to my survey/business problem.

 

I'll do some more research and come back to this forum. Thanks again for your help! Sachin

Damien_Mather

The best you can probably do is to compare the distributions of any demographics you measured in your online survey with those from the national census or a similarly geographically comprehensive large survey. What remains after you do that is something I too have wondered about since 1992, and I would be interested in what you find. Undercoverage also comes up in a different context, predictive modelling with secondary internal data, but there the problems and solutions are different.
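As a first step toward that comparison, you could line up the sample's distribution of a measured variable against census shares. Here that variable is location, the only one collected. You can then compute a Pearson chi-square goodness-of-fit statistic. Below is a minimal Python sketch; the census shares and data are hypothetical.

```python
from collections import Counter

def compare_to_census(sample_values, census_share):
    """Return {category: (sample share, census share, ratio)} and the Pearson
    chi-square statistic. Assumes every census share is positive; categories
    present in the sample but missing from census_share are ignored."""
    n = len(sample_values)
    counts = Counter(sample_values)
    table = {}
    chi2 = 0.0
    for cat, share in census_share.items():
        obs = counts.get(cat, 0)
        exp = share * n  # expected count if the sample matched the census
        table[cat] = (obs / n, share, (obs / n) / share)
        chi2 += (obs - exp) ** 2 / exp
    return table, chi2

# Hypothetical data: 8 of 10 respondents from North, census says 50/50
sample = ["North"] * 8 + ["South"] * 2
census_share = {"North": 0.5, "South": 0.5}
table, chi2 = compare_to_census(sample, census_share)
print(table, chi2)
```

A ratio above 1 marks an over-represented location, and a ratio below 1 marks an under-represented one. With two categories (one degree of freedom), a statistic above about 3.84 indicates a mismatch at the 5% level. Here the statistic is 3.6, so with only ten responses the imbalance falls just short of that threshold.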
