Hello,
This lies more along the realm of data management.
I have a dataset where I want to flag observations for errors. The variables are a set of questions/responses and the responses are either Yes (1) or No (0). As you'll see in the example dataset below, variables Q4a1 to Q4a9 are a group of responses from 'Question4.' If a respondent answers 'Yes' to question 4, then at least one entry (Yes) must be made in variables Q4a1 to Q4a9.
Example:
Question4-Do you have a favorite food? (Yes-1/No-0)
Check box
Q4a1- Pizza (Response is either 1 or 0)
Q4a2-Steak (1/0)
Q4a4- Pasta (1/0)
Q4a5- Other (1/0)
.....etc.
I want to be able to flag respondents where they answered yes to Question 4 but they made no selection for Q4a1 through Q4a9.
I created an array but i'm running into difficulty where it seems to be flagging respondents (variable Data1-Data12) that didn't make a selection for the last response (Q4a9) instead of flagging 'Data2' and 'Data4' where no entries were made in Q4a1 - Q4a9.
The frequency table should be:
Q4a | Freq |
0 | 9 |
1 | 2 |
Where 2 respondents that made no selections in Q4a1-Q4a9 are flagged.
But instead I'm getting this:
Q4a | Freq |
0 | 4 |
1 | 7 |
Where it seems to be flagging non-responses for Q4a9.
Have attached code below:
data DataInput;
input Data $ Question1:32. Question2:32. Question3:32. Question4:32. Q4a1:32. Q4a2:32. Q4a3:32. Q4a4:32. Q4a5:32. Q4a6:32. Q4a7:32. Q4a8:32. Q4a9:32.;
datalines;
DATA1 0 1 1 0 1 0 0 0 0 0 0 0 1 1
DATA2 1 0 0 1 0 0 0 0 0 0 0 0 0 0
DATA3 1 1 0 1 1 1 1 0 1 0 1 1 0 0
DATA4 1 0 0 1 0 0 0 0 0 0 0 0 0 0
DATA5 1 1 0 1 0 1 0 1 1 1 1 0 1 1
DATA6 1 1 0 1 0 1 0 1 1 1 1 0 1 1
DATA7 1 1 0 1 0 1 0 1 1 1 1 0 1 1
DATA8 1 1 0 1 0 1 0 1 1 1 1 0 1 1
DATA9 1 1 0 1 0 0 0 0 0 0 0 1 0 0
DATA10 1 1 0 1 0 0 0 0 0 0 0 1 0 0
DATA11 1 1 0 1 0 1 0 1 1 1 1 1 0 0
DATA12 1 1 0 1 0 1 0 1 1 1 1 1 0 0
;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;
data datainput_array;
set datainput;
array Q4a_ARRAY (9) Q4a1 - Q4a9;
do i= 1 to 9;
if Question1=1 then do;
if Question4=1 and
(Q4a_ARRAY(i) = 0) then Q4a=1; else Q4a=0;
end;
end;
run;
proc freq data=datainput_array;
TABLE Q4a;
RUN;
proc sort data=datainput_array out=Q4a_0;
Where Q4a=0;
by Data;
run;
proc sort data=datainput_array out=Q4a_1;
Where Q4a=1;
by Data;
run;
Hi @AO88
Welcome to the community!
Could this code answer your question?
data datainput_array;
set datainput;
array Q4a_ARRAY (*) Q4a:;
/*Q4a_flag = 0 if all values are set to 0*/
/*Q4a_flag = 1 if at least one value is set to 1*/
if sum(of Q4a_ARRAY(*)) = 0 then Q4a_flag = 0;
else if sum(of Q4a_ARRAY(*)) > 0 then Q4a_flag = 1;
run;
proc freq data=datainput_array;
TABLE Question4*Q4a_flag / nocum nopercent norow nocol out=Q4_stat (drop=percent) sparse;
run;
Best,
Hi @AO88
Welcome to the community!
Could this code answer your question?
data datainput_array;
set datainput;
array Q4a_ARRAY (*) Q4a:;
/*Q4a_flag = 0 if all values are set to 0*/
/*Q4a_flag = 1 if at least one value is set to 1*/
if sum(of Q4a_ARRAY(*)) = 0 then Q4a_flag = 0;
else if sum(of Q4a_ARRAY(*)) > 0 then Q4a_flag = 1;
run;
proc freq data=datainput_array;
TABLE Question4*Q4a_flag / nocum nopercent norow nocol out=Q4_stat (drop=percent) sparse;
run;
Best,
Awesome!
For sure, you can.
Please does not hesitate to share you code if you experience trouble to achieve this.
Best,
The issue with your code comes from the statement:
if Question4=1 and
(Q4a_ARRAY(i) = 0) then Q4a=1; else Q4a=0;
Indeed, the value of the next question will override the value of the previous one.
Ex. with data1:
Question1 | Question2 | Question3 | Question4 | Q4a1 | Q4a2 | Q4a3 | Q4a4 | Q4a5 | Q4a6 | Q4a7 | Q4a8 | Q4a9 | ||
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
DATA1 | 0 | 1 | 1 | 0 | 1 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 1 |
i=1 -> Question4=1 and(Q4a_ARRAY(1) = 0 => Q4a is set to 0;
i=2 -> Question4=1 and(Q4a_ARRAY(2) = 1 => Q4a is now set to 1;
i=3 -> Question4=1 and(Q4a_ARRAY(3) = 0 => Q4a is now set to 0;
...
-> the final value of Q4a corresponds to the last check for Q4a_ARRAY(9)
Hi @ed_sas_member (and anyone else as well)
I've got another question.
What if the outcome wasn't a binomial response and instead was free text. So it's the same set-up except the respondent has to type something in. How can I check to see that at least one response was typed out in the Q4a1-Q4a9? (Doesn't matter the content but just being able to note that something was or wasn't entered).
How would I handle the array if the variables are character and not numeric?
Before I would've done it like this (using the same array):
To flag respondents where all responses to Q4a1-Q4a9 are missing.
if Question4=1 and
(Q4a_ARRAY(i)= '') then Q4a=1; else Q4a=0;
end;
Hi @AO88
In this case, you could do this:
data datainput_array;
set datainput;
array Q4a_ARRAY (*) $ Q4a:;
/* Q4a_flag = 0 if at least one Q4ax is not null (including trailing blanks if any)*/
/* Q4a_flag = 1 if at all Q4ax are null (including trailing blanks if any)*/
do i=1 to dim(Q4a_ARRAY);
if strip(Q4a_ARRAY(i)) ne "" then do;
Q4a_flag = 0;
leave;
end;
else Q4a_flag = 1;
end;
run;
proc print;
proc freq data=datainput_array;
TABLE Question4*Q4a_flag / nocum nopercent norow nocol out=Q4_stat (drop=percent) sparse;
run;
Best,
Hi @AO88
I see two approaches:
- the one you suggest : put Q4a1-Q4a8 in the array to create a temporary flag. Then, compare this flag to Q4a9 -> e.g. if flag=0 and Q4a9=" " then Q4a_flag=0
- another approach could be to create a numeric dummy variable which is equal to 1 when Q4a9 is not null and à otherwise. -> and then put Q4a1-Q4a8 and this dummy variable in the array.
Hope this helps!
My best,
Available on demand!
Missed SAS Innovate Las Vegas? Watch all the action for free! View the keynotes, general sessions and 22 breakouts on demand.
Learn how use the CAT functions in SAS to join values from multiple variables into a single value.
Find more tutorials on the SAS Users YouTube channel.