Solved: Re: Bootstrapping: to use or not to?

mszommer · Posted 02-16-2021 03:56 AM

Hello,

I need help deciding on an approach. I sent out a survey, and about 2000 of the 15000 recipients completed it.

Doubts have been raised about accepting the results based on the survey responses, because stakeholders claim that participants differ from non-participants.

I now have a list of 2000 participants and 13000 non-participants. I decided to draw two random samples from the non-participants list using the code below:

proc surveyselect data=know.input_non_participants_v1
        out=know.nonp_srs2 /*create two samples*/
        sampsize=2000
        method=srs
        seed=830
        stats;
run;

I had thought of comparing the three groups using ANOVA to test for differences among the means. However, I was asked whether I would use bootstrapping. I have never used it before. From my reading, I gather that it repeatedly resamples the existing data set and builds a sampling distribution of a statistic (for example, the means of the 'n' resamples). How would that compare the non-participant group with the participant group?

Would ANOVA not do the job? Is my approach correct: compare two randomly selected samples of non-participants, each about the same size as the participant group, and test for differences among the means? If not, what should my approach be?

I'm pressed for time and really look forward to your advice.

Regards,

MS

Rick_SAS · Posted 02-16-2021 06:39 AM

An ANOVA helps you decide whether the means of groups are different based on an observed sample. Since you have the complete list of 15000 recipients, you can use an ANOVA or a two-sample t test on the complete list to test whether the means of the responders and nonresponders are different for various variables.

As you know, ANOVA and t tests are based on certain distributional assumptions. If you do not want to make those assumptions, you can use the bootstrap instead. Bootstrapping can help you estimate how likely it is that a statistic as extreme as the one you observed would occur by random chance.

Suppose in your survey, the mean income of the 2000 responders is $100k. Suppose your colleague thinks that richer people are more likely to respond and poorer people are less likely to respond.

The null hypothesis for this case is that everyone who gets the mailing is equally likely to reply. Can we test that hypothesis? One way is to randomly choose 2000 subjects from the complete list of 15000. Compute the mean income for that sample (maybe it is $57k). Now randomly choose another set of 2000 from the 15000 and compute the mean income (maybe it is $62k). When you repeat this process over and over, you get a distribution of mean incomes under the hypothesis that everyone is equally likely to respond. If your observed statistic ($100k) is an extreme value for this distribution (much higher than most of the other values), then you reject the null hypothesis and conclude that the value you observed is unlikely to occur if everyone is equally likely to reply.
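The resampling procedure described above could be sketched roughly as follows. This is only an illustration under stated assumptions: the data set work.all_recipients, the 0/1 flag responded, and the variable total_purchases are hypothetical names, and 1000 replicates is an arbitrary choice.

```sas
/* Hypothetical names: work.all_recipients holds all 15000 recipients,
   with a numeric variable total_purchases and a 0/1 flag responded. */

/* 1. Observed statistic: mean for the actual responders */
proc means data=work.all_recipients noprint;
   where responded = 1;
   var total_purchases;
   output out=work.observed mean=obs_mean;
run;

/* 2. Draw 1000 simple random samples of 2000 recipients each */
proc surveyselect data=work.all_recipients out=work.resamples
                  method=srs sampsize=2000 reps=1000 seed=830 noprint;
run;

/* 3. Mean of each sample: the reference distribution under the null */
proc means data=work.resamples noprint;
   by Replicate;
   var total_purchases;
   output out=work.null_dist mean=sample_mean;
run;

/* 4. Flag the resampled means that are at least as large as the observed mean */
data work.compare;
   if _n_ = 1 then set work.observed(keep=obs_mean);
   set work.null_dist;
   extreme = (sample_mean >= obs_mean);
run;

/* The mean of the 0/1 flag is a one-sided empirical p-value */
proc means data=work.compare mean;
   var extreme;
run;
```

A small mean for extreme indicates that the responders' mean is unusually high relative to random subsets of the mailing list. To test in the other direction, reverse the comparison in step 4.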

For tons of SAS code related to bootstrapping, see The Essential Guide to Bootstrapping in SAS. In particular, see this introductory article first to make sure you understand the basics.

One way to approach your problem is to use PROC TTEST, which includes a BOOTSTRAP statement. You can use it to test whether the mean of the responders is different from the mean of the nonresponders for the variable(s) of interest.
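A minimal sketch of that approach might look like the following. The data set and variable names are hypothetical, the number of resamples is arbitrary, and the BOOTSTRAP statement is available only in more recent releases of SAS/STAT, so check the PROC TTEST documentation for your release to confirm which options it supports.

```sas
/* Hypothetical names: work.all_recipients holds all 15000 recipients,
   responded is a 0/1 flag, total_purchases is the variable of interest. */
proc ttest data=work.all_recipients;
   class responded;            /* two groups: nonresponders and responders */
   var total_purchases;
   /* Bootstrap standard error and confidence limits;
      the option values here are illustrative assumptions */
   bootstrap / nsamples=5000 seed=830 bootci=percentile;
run;
```

The procedure still prints the usual t test results, so you can compare them with the bootstrap confidence limits for the difference of means. If the bootstrap interval for the difference excludes 0, that is evidence that the two groups differ on that variable.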

mszommer · Posted 02-17-2021 05:07 AM

Hello @Rick_SAS,

Thank you for your reply. I followed the introductory article that you suggested in addition to this article and it was just the help that I needed. Really appreciate it!

Regards,

MS

ballardw · Posted 02-16-2021 10:38 AM

I might also be tempted to look at other known demographic characteristics of the survey sample and see if the response rate (none of the question responses, just whether they responded or not) is different between groups.

Characteristics might be almost anything.
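As a rough sketch of that kind of check, assuming a hypothetical data set work.all_recipients with a 0/1 flag responded and some known grouping variable (here called customer_segment, which is an invented name):

```sas
/* Hypothetical names: work.all_recipients, customer_segment, responded */
proc freq data=work.all_recipients;
   /* Row percentages give the response rate within each segment;
      CHISQ tests whether responding is associated with segment */
   tables customer_segment*responded / chisq nocol nopercent;
run;
```

Any characteristic that is known for both responders and nonresponders could take the place of customer_segment.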

Also, did you get "unable to contact" types of responses? You don't mention the methodology for contacting people. You say "recipient" but how are you sure that the person invited actually received the invitation?

I've worked on many surveys, and a completion rate that high is not bad. If the survey was done by email, it might be considered extremely high, as there are so many filters you have to get past.

mszommer · Posted 02-17-2021 12:24 AM

Hello @ballardw, thank you for your reply.

The survey was sent out via an e-newsletter to customers who have subscribed to receive communication from us. The questionnaire also had questions relating to demographics, so we have that information for respondents but not for non-respondents. What we do have is their purchase records, and that is what we are comparing, to see whether the two groups are inherently different (as my colleagues claim) or not.
