01-11-2017 12:06 PM
I need to validate a new questionnaire made up of 62 binary items. I have to perform an EFA, assess its reliability, and carry out a CFA.
Unfortunately, I have a lot of missing values (80% of subjects have at least one missing answer) and I don't know how to deal with them. I don't think a listwise approach (dropping every subject with a missing answer) can be a solution here, but should I impute those values before proceeding with the EFA? If so, do you have any suggestions on which method(s) I should use in SAS?
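A quick calculation shows why listwise deletion is so costly with 62 items. The sketch below assumes, purely for illustration, that every item is skipped independently with the same probability; real missingness is rarely that tidy, and the function names are hypothetical.

```python
def share_complete(p_missing: float, n_items: int = 62) -> float:
    """Probability that a respondent answers all n_items, ASSUMING every item is
    skipped independently with the same probability p_missing."""
    return (1.0 - p_missing) ** n_items


def per_item_rate_for(share_incomplete: float, n_items: int = 62) -> float:
    """Invert share_complete: the per-item missing rate that would leave
    share_incomplete of the respondents with at least one missing answer."""
    return 1.0 - (1.0 - share_incomplete) ** (1.0 / n_items)


if __name__ == "__main__":
    # 80% of subjects with at least one missing answer, as in the post above.
    p = per_item_rate_for(0.80)
    print(f"implied per-item missing rate: {p:.4f}")
    print(f"share kept by listwise deletion: {share_complete(p):.2f}")
```

Under that assumption, only about 2.6% of the answers to each item need to be missing for 80% of respondents to be incomplete, so listwise deletion would throw away four fifths of the sample to remove a small fraction of the cells.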
Many thanks in advance for your time and help,
01-11-2017 01:27 PM
EFA? CFA? TLAs (three-letter acronyms) are not always known to everyone and are often jargon specific to one field, so it may help to describe what these activities entail.
Are you sure that some, if not all, of the missing values aren't missing by design? Many surveys involve skip patterns, where the response to one question determines whether or not the respondent should be asked other questions. Example: male respondents should generally not be asked about female-targeted products or gender-specific health issues.
01-12-2017 06:27 AM
Sorry, I was referring to Exploratory Factor Analysis (EFA) and Confirmatory Factor Analysis (CFA).
Unfortunately, that isn't the case for this survey: all the questions should have been answered (i.e., there are no skip patterns).
01-12-2017 01:14 PM
Is there a pattern to the missing values, such that one or two questions account for the majority of them, or are they scattered pretty much across all of the variables?
If a couple of variables account for most of the missing values, you might consider running the analysis without them. There may also be a systemic reason for just a few items to be missing, such as a very poorly phrased question ("Have you stopped beating your wife yet?") or a question that asks for a yes/no answer when it (possibly by implication) allows more than two valid answers.
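One way to answer the pattern question is to count the missing answers per item and check what share of all missing answers the most-skipped one or two items carry. A minimal Python sketch follows; the 1/0/None coding and the function names are assumptions for illustration, not anything from the survey.

```python
from collections import Counter
from typing import Optional, Sequence

# Assumed coding: 1 = yes, 0 = no, None = missing answer.
Response = Optional[int]


def missing_by_item(rows: Sequence[Sequence[Response]]) -> Counter:
    """Count missing answers per item (column index) across all respondents."""
    counts: Counter = Counter()
    for row in rows:
        for item, value in enumerate(row):
            if value is None:
                counts[item] += 1
    return counts


def top_k_share(counts: Counter, k: int = 2) -> float:
    """Fraction of all missing answers that fall on the k most-skipped items."""
    total = sum(counts.values())
    if total == 0:
        return 0.0
    return sum(n for _, n in counts.most_common(k)) / total


if __name__ == "__main__":
    # Hypothetical toy data: 4 respondents, 3 items.
    rows = [
        [1, None, 0],
        [None, None, 1],
        [0, None, 1],
        [1, 0, None],
    ]
    counts = missing_by_item(rows)
    print(counts.most_common())
    print(f"share on top item: {top_k_share(counts, k=1):.2f}")
```

A share close to 1 for small k points to a few problem items; a share close to k/62 means the missing answers are spread evenly.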
And by any chance, are you looking at recoded data that collapsed a multiple-choice response into two categories? Perhaps the original question offered yes, no, don't know and refused to answer as responses, and only the yes/no answers are what you see. In that case you might consider a different coding/recoding scheme or a different analysis.
Do you have any characteristics of the respondents, such as age, race, gender, location or activity, that might make them similar enough to form groups? If so, one approach is to impute a missing value with the most frequent response to that item within the respondent's group. Another is to assign a value at random, with each response drawn with probability equal to its proportion among the non-missing answers in the group.
But, in very general terms, I would say that the questionnaire validation has failed at the first stage: getting all the questions answered.
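The two group-based options above can be sketched in plain Python for a single item. The grouping variable, the 1/0/None coding and the function name are hypothetical; in SAS the same logic would be written differently.

```python
import random
from collections import Counter, defaultdict
from typing import Hashable, List, Optional, Sequence


def impute_within_groups(
    values: Sequence[Optional[int]],
    groups: Sequence[Hashable],
    method: str = "mode",
    seed: Optional[int] = None,
) -> List[Optional[int]]:
    """Fill missing answers (None) to one item using respondents in the same group.

    method="mode":   most frequent observed answer in the group
                     (ties go to the answer encountered first).
    method="random": one observed answer from the group drawn at random, so each
                     answer is chosen with probability equal to its observed share.
    Groups with no observed answers are left missing.
    """
    if method not in ("mode", "random"):
        raise ValueError(f"unknown method: {method!r}")
    rng = random.Random(seed)

    # Collect the observed answers of each group.
    observed = defaultdict(list)
    for value, group in zip(values, groups):
        if value is not None:
            observed[group].append(value)

    filled: List[Optional[int]] = []
    for value, group in zip(values, groups):
        pool = observed.get(group)
        if value is not None or not pool:
            filled.append(value)  # keep observed answers; nothing to draw from
        elif method == "mode":
            filled.append(Counter(pool).most_common(1)[0][0])
        else:
            filled.append(rng.choice(pool))
    return filled


if __name__ == "__main__":
    # Hypothetical data: one item, respondents grouped by a made-up variable.
    item = [1, 1, 0, None, 0, 0, None, 1]
    sex = ["F", "F", "F", "F", "M", "M", "M", "M"]
    print(impute_within_groups(item, sex, method="mode"))
    print(impute_within_groups(item, sex, method="random", seed=1))
```

The mode rule always fills in the majority answer, which inflates it; the random draw preserves each group's observed proportions on average, which matters more for correlation-based analyses such as factor analysis.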
01-16-2017 11:45 AM
Thanks for your suggestions.
Unfortunately, I had already checked: there's no pattern to the missing values; they are scattered pretty much across every question.
Regarding your second question, there aren't actually different types of missing values: there is just one kind to take into account.
Today, while browsing, I found the paper attached here, and I was thinking of applying single imputation by stochastic logistic regression using PROC MI.
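The idea behind stochastic logistic regression imputation can be sketched in plain Python for one binary item. This is not PROC MI and not the method of the attached paper, which is not reproduced here. It assumes the predictor items are fully observed, whereas in this survey they are not; approaches such as fully conditional specification handle that by cycling through the items. It also keeps the fitted coefficients fixed, while proper multiple-imputation procedures usually draw them from their estimated sampling distribution as well. All function names are hypothetical.

```python
import math
import random
from typing import List, Optional, Sequence


def sigmoid(z: float) -> float:
    """Numerically stable logistic function."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def linear(beta: Sequence[float], x: Sequence[float]) -> float:
    """Intercept beta[0] plus the dot product of beta[1:] with x."""
    return beta[0] + sum(b * xi for b, xi in zip(beta[1:], x))


def fit_logistic(
    X: Sequence[Sequence[float]], y: Sequence[int], lr: float = 0.5, epochs: int = 2000
) -> List[float]:
    """Maximum-likelihood logistic regression by plain batch gradient descent."""
    n, p = len(X), len(X[0])
    beta = [0.0] * (p + 1)
    for _ in range(epochs):
        grad = [0.0] * (p + 1)
        for xi, yi in zip(X, y):
            err = sigmoid(linear(beta, xi)) - yi  # d(log-loss)/d(linear predictor)
            grad[0] += err
            for j, xij in enumerate(xi):
                grad[j + 1] += err * xij
        beta = [b - lr * g / n for b, g in zip(beta, grad)]
    return beta


def stochastic_logistic_impute(
    target: Sequence[Optional[int]],
    predictors: Sequence[Sequence[float]],
    seed: Optional[int] = None,
) -> List[Optional[int]]:
    """Single stochastic imputation of one binary item.

    ASSUMES the predictor items are fully observed for every respondent.
    Fits P(target = 1 | predictors) on the respondents who answered, then
    replaces each missing answer with a Bernoulli draw from that probability
    (a random draw, not a rounded prediction).
    """
    rng = random.Random(seed)
    answered = [i for i, v in enumerate(target) if v is not None]
    beta = fit_logistic([predictors[i] for i in answered], [target[i] for i in answered])
    filled = list(target)
    for i, value in enumerate(target):
        if value is None:
            prob = sigmoid(linear(beta, predictors[i]))
            filled[i] = 1 if rng.random() < prob else 0
    return filled


if __name__ == "__main__":
    # Hypothetical data: one target item regressed on one predictor item.
    x = [[1], [1], [1], [1], [0], [0], [0], [0], [1], [0]]
    y = [1, 1, 1, 0, 0, 0, 0, 1, None, None]
    print(stochastic_logistic_impute(y, x, seed=42))
```

Drawing instead of rounding keeps the imputed item's association with its predictors realistic rather than exaggerated. A single draw still understates uncertainty, which is why repeating the imputation with different seeds and pooling the results (multiple imputation) is the usual refinement.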