Hello,
This lies more along the realm of data management.
I have a dataset where I want to flag observations for errors. The variables are a set of questions/responses and the responses are either Yes (1) or No (0). As you'll see in the example dataset below, variables Q4a1 to Q4a9 are a group of responses from 'Question4.' If a respondent answers 'Yes' to question 4, then at least one entry (Yes) must be made in variables Q4a1 to Q4a9.
Example:
Question4-Do you have a favorite food? (Yes-1/No-0)
Check box
Q4a1- Pizza (Response is either 1 or 0)
Q4a2-Steak (1/0)
Q4a4- Pasta (1/0)
Q4a5- Other (1/0)
.....etc.
I want to be able to flag respondents where they answered yes to Question 4 but they made no selection for Q4a1 through Q4a9.
I created an array but i'm running into difficulty where it seems to be flagging respondents (variable Data1-Data12) that didn't make a selection for the last response (Q4a9) instead of flagging 'Data2' and 'Data4' where no entries were made in Q4a1 - Q4a9.
The frequency table should be:
Q4a | Freq |
0 | 9 |
1 | 2 |
Where 2 respondents that made no selections in Q4a1-Q4a9 are flagged.
But instead I'm getting this:
Q4a | Freq |
0 | 4 |
1 | 7 |
Where it seems to be flagging non-responses for Q4a9.
Have attached code below:
data DataInput;
input Data $ Question1:32. Question2:32. Question3:32. Question4:32. Q4a1:32. Q4a2:32. Q4a3:32. Q4a4:32. Q4a5:32. Q4a6:32. Q4a7:32. Q4a8:32. Q4a9:32.;
datalines;
DATA1 0 1 1 0 1 0 0 0 0 0 0 0 1 1
DATA2 1 0 0 1 0 0 0 0 0 0 0 0 0 0
DATA3 1 1 0 1 1 1 1 0 1 0 1 1 0 0
DATA4 1 0 0 1 0 0 0 0 0 0 0 0 0 0
DATA5 1 1 0 1 0 1 0 1 1 1 1 0 1 1
DATA6 1 1 0 1 0 1 0 1 1 1 1 0 1 1
DATA7 1 1 0 1 0 1 0 1 1 1 1 0 1 1
DATA8 1 1 0 1 0 1 0 1 1 1 1 0 1 1
DATA9 1 1 0 1 0 0 0 0 0 0 0 1 0 0
DATA10 1 1 0 1 0 0 0 0 0 0 0 1 0 0
DATA11 1 1 0 1 0 1 0 1 1 1 1 1 0 0
DATA12 1 1 0 1 0 1 0 1 1 1 1 1 0 0
;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;
data datainput_array;
set datainput;
array Q4a_ARRAY (9) Q4a1 - Q4a9;
do i= 1 to 9;
if Question1=1 then do;
if Question4=1 and
(Q4a_ARRAY(i) = 0) then Q4a=1; else Q4a=0;
end;
end;
run;
proc freq data=datainput_array;
TABLE Q4a;
RUN;
proc sort data=datainput_array out=Q4a_0;
Where Q4a=0;
by Data;
run;
proc sort data=datainput_array out=Q4a_1;
Where Q4a=1;
by Data;
run;
Hi @AO88
Welcome to the community!
Could this code answer your question?
data datainput_array;
set datainput;
array Q4a_ARRAY (*) Q4a:;
/*Q4a_flag = 0 if all values are set to 0*/
/*Q4a_flag = 1 if at least one value is set to 1*/
if sum(of Q4a_ARRAY(*)) = 0 then Q4a_flag = 0;
else if sum(of Q4a_ARRAY(*)) > 0 then Q4a_flag = 1;
run;
proc freq data=datainput_array;
TABLE Question4*Q4a_flag / nocum nopercent norow nocol out=Q4_stat (drop=percent) sparse;
run;
Best,
Hi @AO88
Welcome to the community!
Could this code answer your question?
data datainput_array;
set datainput;
array Q4a_ARRAY (*) Q4a:;
/*Q4a_flag = 0 if all values are set to 0*/
/*Q4a_flag = 1 if at least one value is set to 1*/
if sum(of Q4a_ARRAY(*)) = 0 then Q4a_flag = 0;
else if sum(of Q4a_ARRAY(*)) > 0 then Q4a_flag = 1;
run;
proc freq data=datainput_array;
TABLE Question4*Q4a_flag / nocum nopercent norow nocol out=Q4_stat (drop=percent) sparse;
run;
Best,
Awesome!
For sure, you can.
Please does not hesitate to share you code if you experience trouble to achieve this.
Best,
The issue with your code comes from the statement:
if Question4=1 and
(Q4a_ARRAY(i) = 0) then Q4a=1; else Q4a=0;
Indeed, the value of the next question will override the value of the previous one.
Ex. with data1:
Question1 | Question2 | Question3 | Question4 | Q4a1 | Q4a2 | Q4a3 | Q4a4 | Q4a5 | Q4a6 | Q4a7 | Q4a8 | Q4a9 | ||
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
DATA1 | 0 | 1 | 1 | 0 | 1 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 1 |
i=1 -> Question4=1 and(Q4a_ARRAY(1) = 0 => Q4a is set to 0;
i=2 -> Question4=1 and(Q4a_ARRAY(2) = 1 => Q4a is now set to 1;
i=3 -> Question4=1 and(Q4a_ARRAY(3) = 0 => Q4a is now set to 0;
...
-> the final value of Q4a corresponds to the last check for Q4a_ARRAY(9)
Hi @ed_sas_member (and anyone else as well)
I've got another question.
What if the outcome wasn't a binomial response and instead was free text. So it's the same set-up except the respondent has to type something in. How can I check to see that at least one response was typed out in the Q4a1-Q4a9? (Doesn't matter the content but just being able to note that something was or wasn't entered).
How would I handle the array if the variables are character and not numeric?
Before I would've done it like this (using the same array):
To flag respondents where all responses to Q4a1-Q4a9 are missing.
if Question4=1 and
(Q4a_ARRAY(i)= '') then Q4a=1; else Q4a=0;
end;
Hi @AO88
In this case, you could do this:
data datainput_array;
set datainput;
array Q4a_ARRAY (*) $ Q4a:;
/* Q4a_flag = 0 if at least one Q4ax is not null (including trailing blanks if any)*/
/* Q4a_flag = 1 if at all Q4ax are null (including trailing blanks if any)*/
do i=1 to dim(Q4a_ARRAY);
if strip(Q4a_ARRAY(i)) ne "" then do;
Q4a_flag = 0;
leave;
end;
else Q4a_flag = 1;
end;
run;
proc print;
proc freq data=datainput_array;
TABLE Question4*Q4a_flag / nocum nopercent norow nocol out=Q4_stat (drop=percent) sparse;
run;
Best,
Hi @AO88
I see two approaches:
- the one you suggest : put Q4a1-Q4a8 in the array to create a temporary flag. Then, compare this flag to Q4a9 -> e.g. if flag=0 and Q4a9=" " then Q4a_flag=0
- another approach could be to create a numeric dummy variable which is equal to 1 when Q4a9 is not null and à otherwise. -> and then put Q4a1-Q4a8 and this dummy variable in the array.
Hope this helps!
My best,
Don't miss out on SAS Innovate - Register now for the FREE Livestream!
Can't make it to Vegas? No problem! Watch our general sessions LIVE or on-demand starting April 17th. Hear from SAS execs, best-selling author Adam Grant, Hot Ones host Sean Evans, top tech journalist Kara Swisher, AI expert Cassie Kozyrkov, and the mind-blowing dance crew iLuminate! Plus, get access to over 20 breakout sessions.
Learn how use the CAT functions in SAS to join values from multiple variables into a single value.
Find more tutorials on the SAS Users YouTube channel.