Hi all
Although I've been doing survey analysis for a while, I've never had to dig into the data to find possible duplicates. These surveys are public and semi-anonymous (the person can provide an email address if they want, but there is no connection between the responses and the email), so I have more things to think about than the usual face-to-face surveys I'm used to analysing.
I have toyed with the idea of using IP addresses, but they are unreliable: several genuine respondents may share one address, for example on public wifi. I also can't assume that a repeat respondent will answer every question the same way (they may be trying to get multiple gift cards). I've looked at something called the Hamming distance (SAS documentation), but I have no idea whether that's an appropriate method.
I apologise this is so vague, but I literally don't even know where to begin. Any suggestions would be appreciated!
Chris
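For anyone unfamiliar with the Hamming distance mentioned above: it is simply the number of positions at which two equal-length sequences differ. Applied to surveys, that means counting the questions on which two respondents gave different answers. Below is a minimal sketch in Python (not SAS), assuming each response is stored as a list of coded answers in the same question order; the response values are made up for illustration.

```python
from typing import Sequence


def hamming_distance(a: Sequence, b: Sequence) -> int:
    """Count the positions at which two equal-length answer vectors differ."""
    if len(a) != len(b):
        raise ValueError("responses must cover the same questions")
    return sum(x != y for x, y in zip(a, b))


# Hypothetical coded answers to the same six questions.
resp_1 = [3, 1, 4, 2, 5, 1]
resp_2 = [3, 1, 4, 2, 4, 1]  # differs from resp_1 on one question
resp_3 = [1, 5, 2, 4, 1, 3]  # differs from resp_1 on every question

print(hamming_distance(resp_1, resp_2))  # 1
print(hamming_distance(resp_1, resp_3))  # 6
```

A small distance only says that two responses look alike. On a short survey with few answer options, genuine respondents can easily match, so a low score is a reason to look closer rather than proof of a duplicate.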
Darn, I was hoping that multiple times was going to be easier... We're looking for people going in and answering the survey two or more times. The current proposal is that we'll review participants in batches of 15, using the entire database as the comparison (someone may answer again three weeks later).
I have the basics down (for example, data that is clearly gibberish or made up, inconsistent answers, etc.). I also recall reading about a way to detect response patterns in surveys, but I can't remember the specifics, and the material I find is far too complicated.
Appreciate your time!
Chris
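The batch review described above (each group of 15 new participants compared against the whole database) could be sketched roughly as follows. This is Python rather than SAS, and the function name `flag_near_duplicates`, the response IDs, and the `max_diff` threshold are all hypothetical choices for illustration; it assumes every response holds the same questions in the same order.

```python
from typing import Dict, List, Sequence, Tuple


def flag_near_duplicates(
    batch: Dict[str, Sequence],
    database: Dict[str, Sequence],
    max_diff: int = 1,
) -> List[Tuple[str, str, int]]:
    """Return (new_id, existing_id, differences) for pairs within max_diff."""
    flagged = []
    for new_id, new_answers in batch.items():
        for old_id, old_answers in database.items():
            if new_id == old_id:
                continue  # skip self-comparison if the batch is already stored
            # Number of questions answered differently (Hamming distance).
            diff = sum(x != y for x, y in zip(new_answers, old_answers))
            if diff <= max_diff:
                flagged.append((new_id, old_id, diff))
    return flagged


# Hypothetical data: two stored responses and a new batch of two.
database = {"r001": [3, 1, 4, 2, 5, 1], "r002": [2, 2, 1, 5, 3, 4]}
batch = {"r016": [3, 1, 4, 2, 4, 1], "r017": [5, 4, 3, 1, 2, 2]}

print(flag_near_duplicates(batch, database))  # [('r016', 'r001', 1)]
```

The sketch only compares new responses with stored ones. To catch someone who answers twice within the same batch, the batch would also need to be compared against itself, or added to the database before the check.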
That I could do! Didn't think of that, thanks so much 🙂
Since you mention "attempting to get multiple gift cards", perhaps one place to look is the "where to send the gift card" data.
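A rough sketch of that idea in Python (not SAS), assuming the gift card destination is an email address: normalise each address, then list the ones that appear more than once. The request IDs and addresses are invented, and dropping a "+tag" suffix is an assumption about how some mail providers treat addresses, not a universal rule.

```python
from collections import defaultdict
from typing import Dict, List


def normalize_email(email: str) -> str:
    """Lower-case, trim, and drop any '+tag' suffix from the local part."""
    local, _, domain = email.strip().lower().partition("@")
    local = local.split("+", 1)[0]  # assumption: '+tag' is an alias suffix
    return f"{local}@{domain}"


def repeated_destinations(requests: Dict[str, str]) -> Dict[str, List[str]]:
    """Map each normalised address to the request IDs that share it."""
    groups = defaultdict(list)
    for request_id, email in requests.items():
        groups[normalize_email(email)].append(request_id)
    # Keep only addresses that were used more than once.
    return {addr: ids for addr, ids in groups.items() if len(ids) > 1}


# Hypothetical gift card requests.
requests = {
    "g1": "Pat.Smith@example.com",
    "g2": " pat.smith+2@example.com ",
    "g3": "lee@example.org",
}

print(repeated_destinations(requests))
# {'pat.smith@example.com': ['g1', 'g2']}
```

Because the email is not linked to the responses, this shows that someone claimed more than once, not which responses were theirs.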