12-05-2016 01:05 AM
Recently, we conducted an online survey asking people whether they would buy ABC company's products. The survey contained only that one simple question. We didn't capture any other customer information (for privacy reasons) except their location. From the data, it seems that we have a sampling bias: the responses are skewed toward specific locations, because the survey respondents are not spread across locations in proportion to the actual population.
12-08-2016 08:48 AM
I would recommend using PROC SURVEYSELECT with unrestricted random sampling (that is, sampling with replacement) to draw, for each location, the number of responses you would expect from that location given its share of the population, using reps if needed for under-represented locations, so that the resampled total approximately equals the original survey total. You will get some repeats in the under-represented locations, but that's how you rebalance the overall sample.
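For anyone following along outside SAS, the same rebalancing idea can be sketched in Python. This is only a minimal illustration under assumptions: the function name `resample_by_location`, the location labels, and the 50/50 population shares are all hypothetical, and it is not a reproduction of what PROC SURVEYSELECT does internally.

```python
import random
from collections import defaultdict


def resample_by_location(responses, population_share, seed=0):
    """Resample survey responses with replacement so that each location's
    count matches its (assumed) share of the population.

    responses: list of (location, answer) tuples.
    population_share: dict mapping location -> population proportion.
    """
    rng = random.Random(seed)

    # Group the original responses by location.
    by_location = defaultdict(list)
    for location, answer in responses:
        by_location[location].append((location, answer))

    total = len(responses)
    resampled = []
    for location, share in population_share.items():
        pool = by_location.get(location, [])
        if not pool:
            # No respondents at all: nothing to resample from.
            continue
        # Target count for this location; rounding means the grand total
        # may differ slightly from the original survey total.
        target = round(share * total)
        # Sampling WITH replacement, so small pools produce repeats.
        resampled.extend(rng.choices(pool, k=target))
    return resampled


# Hypothetical data: 80 responses from "North", 20 from "South",
# while the population is assumed to be split 50/50.
survey = (
    [("North", True)] * 50 + [("North", False)] * 30
    + [("South", True)] * 5 + [("South", False)] * 15
)
balanced = resample_by_location(survey, {"North": 0.5, "South": 0.5})
```

In this made-up example the 20 "South" responses are drawn 50 times, so repeats are unavoidable; the estimate for that location still rests on only 20 real answers.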
Next time just use quota sampling by location.
12-08-2016 01:04 PM
Thanks Damien - I will try your solution! However, because this survey hasn't reached the entire region and is self-selective (it was offered on the web), the characteristics of the people who were left out are hard to measure. In this case, a fitted probability model, say a logit model, will be biased toward the characteristics of the people who chose to take the survey. I read a SAS paper on how to adjust for self-selection and undercoverage bias, but didn't quite understand how to apply that paper's techniques to my survey/business problem.
Will do some more research and come back to this forum again. Thanks, and I appreciate your help - Sachin
12-08-2016 07:42 PM
The best you can probably do is to compare the distributions of any demographics you measured in your online survey to those obtained from a national census or a similarly large, geographically intensive survey. What is left over after you do that is something that I too have been wondering about since 1992, and I would be interested in what you find. Undercoverage also surfaces as an issue in another context, predictive modelling on secondary internal data, but with different problems and solutions.
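One common way to act on that comparison is post-stratification weighting, where each response is weighted by the ratio of its location's population share to its sample share. The Python sketch below is only an illustration under assumptions: the function names, the location labels, and the population shares are hypothetical, and in practice the shares would come from the census or reference survey. It corrects only for the measured variable (here, location) and does nothing about self-selection within a location.

```python
from collections import Counter


def poststratification_weights(sample_locations, population_share):
    """Weight per location = population share / sample share.

    Raises KeyError if a sampled location is missing from population_share.
    """
    counts = Counter(sample_locations)
    n = len(sample_locations)
    return {
        location: population_share[location] / (count / n)
        for location, count in counts.items()
    }


def weighted_yes_rate(responses, weights):
    """Weighted proportion of 'yes' answers.

    responses: list of (location, answer) tuples, answer is True for 'yes'.
    """
    yes_weight = sum(weights[loc] for loc, answer in responses if answer)
    total_weight = sum(weights[loc] for loc, _ in responses)
    return yes_weight / total_weight


# Hypothetical data: "North" is over-represented (80 of 100 responses),
# while the population is assumed to be split 50/50.
survey = (
    [("North", True)] * 50 + [("North", False)] * 30
    + [("South", True)] * 5 + [("South", False)] * 15
)
weights = poststratification_weights(
    [loc for loc, _ in survey], {"North": 0.5, "South": 0.5}
)
raw_rate = sum(1 for _, answer in survey if answer) / len(survey)
adjusted_rate = weighted_yes_rate(survey, weights)
```

With these made-up numbers the unweighted "yes" rate is 0.55, while the weighted rate drops to roughly 0.44 because the over-represented location was also the more favourable one.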