BookmarkSubscribeRSS Feed
🔒 This topic is solved and locked. Need further help from the community? Please sign in and ask a new question.
AO88
Fluorite | Level 6

Hello,

 

This lies more along the realm of data management.

I have a dataset where I want to flag observations for errors. The variables are a set of questions/responses and the responses are either Yes (1) or No (0). As you'll see in the example dataset below, variables Q4a1 to Q4a9 are a group of responses from  'Question4.' If a respondent answers 'Yes' to question 4, then at least one entry (Yes) must be made in variables Q4a1 to Q4a9.

 

Example:

 

Question4-Do you have a favorite food? (Yes-1/No-0)

 

Check box

Q4a1- Pizza (Response is either 1 or 0)

Q4a2-Steak (1/0)

Q4a4- Pasta (1/0)

Q4a5- Other (1/0)

.....etc.

 

I want to be able to flag respondents where they answered yes to Question 4 but they made no selection for Q4a1 through Q4a9.

 

I created an array but i'm running into difficulty where it seems to be flagging respondents (variable Data1-Data12) that didn't make a selection for the last response (Q4a9) instead of flagging 'Data2' and 'Data4' where no entries were made in Q4a1 - Q4a9.

 

The frequency table should be:

 

Q4aFreq
09
12

 

Where 2 respondents that made no selections in Q4a1-Q4a9 are flagged.

 

But instead I'm getting this:

Q4aFreq
04
17

 

Where it seems to be flagging non-responses for Q4a9.

 

 

 

Have attached code below:

 

 

 

data DataInput;

input Data $ Question1:32. Question2:32. Question3:32. Question4:32. Q4a1:32. Q4a2:32. Q4a3:32. Q4a4:32. Q4a5:32. Q4a6:32. Q4a7:32. Q4a8:32. Q4a9:32.;

datalines;

 

DATA1 0 1 1 0 1 0 0 0 0 0 0 0 1 1

DATA2 1 0 0 1 0 0 0 0 0 0 0 0 0 0

DATA3 1 1 0 1 1 1 1 0 1 0 1 1 0 0

DATA4 1 0 0 1 0 0 0 0 0 0 0 0 0 0

DATA5 1 1 0 1 0 1 0 1 1 1 1 0 1 1

DATA6 1 1 0 1 0 1 0 1 1 1 1 0 1 1

DATA7 1 1 0 1 0 1 0 1 1 1 1 0 1 1

DATA8 1 1 0 1 0 1 0 1 1 1 1 0 1 1

DATA9 1 1 0 1 0 0 0 0 0 0 0 1 0 0

DATA10 1 1 0 1 0 0 0 0 0 0 0 1 0 0

DATA11 1 1 0 1 0 1 0 1 1 1 1 1 0 0

DATA12 1 1 0 1 0 1 0 1 1 1 1 1 0 0

;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;

 

 

data datainput_array;

set datainput;

array Q4a_ARRAY (9) Q4a1 - Q4a9;

do i= 1 to 9;

if Question1=1 then do;

if Question4=1 and

(Q4a_ARRAY(i) = 0) then Q4a=1; else Q4a=0;

end;

end;

run;

 

proc freq data=datainput_array;

TABLE Q4a;

RUN;

 

proc sort data=datainput_array out=Q4a_0;

Where Q4a=0;

by Data;

run;

 

proc sort data=datainput_array out=Q4a_1;

Where Q4a=1;

by Data;

run;

 

 

1 ACCEPTED SOLUTION

Accepted Solutions
ed_sas_member
Meteorite | Level 14

Hi @AO88 

Welcome to the community!

Could this code answer your question?

data datainput_array;
	set datainput;
	array Q4a_ARRAY (*) Q4a:;
	
	/*Q4a_flag = 0 if all values are set to 0*/
	/*Q4a_flag = 1 if at least one value is set to 1*/
	if sum(of Q4a_ARRAY(*)) = 0 then Q4a_flag = 0;
	else if sum(of Q4a_ARRAY(*)) > 0 then Q4a_flag = 1;
run;

proc freq data=datainput_array;
	TABLE Question4*Q4a_flag / nocum nopercent norow nocol out=Q4_stat (drop=percent) sparse;
run;

Best,

View solution in original post

9 REPLIES 9
ed_sas_member
Meteorite | Level 14

Hi @AO88 

Welcome to the community!

Could this code answer your question?

data datainput_array;
	set datainput;
	array Q4a_ARRAY (*) Q4a:;
	
	/*Q4a_flag = 0 if all values are set to 0*/
	/*Q4a_flag = 1 if at least one value is set to 1*/
	if sum(of Q4a_ARRAY(*)) = 0 then Q4a_flag = 0;
	else if sum(of Q4a_ARRAY(*)) > 0 then Q4a_flag = 1;
run;

proc freq data=datainput_array;
	TABLE Question4*Q4a_flag / nocum nopercent norow nocol out=Q4_stat (drop=percent) sparse;
run;

Best,

AO88
Fluorite | Level 6
Thank you @ed_sas_member ! Yes this worked. Got an error when I did proc freq, but then I changed it from Q4a_all to Q4a_flag.

Can I still be able to include a "then do" statement to this?
ed_sas_member
Meteorite | Level 14

Awesome!

For sure, you can.

Please does not hesitate to share you code if you experience trouble to achieve this.

Best,

ed_sas_member
Meteorite | Level 14

The issue with your code comes from the statement:

if Question4=1 and
(Q4a_ARRAY(i) = 0) then Q4a=1; else Q4a=0;

Indeed, the value of the next question will override the value of the previous one.

Ex. with data1:

  Question1 Question2 Question3 Question4 Q4a1 Q4a2 Q4a3 Q4a4 Q4a5 Q4a6 Q4a7 Q4a8 Q4a9
  DATA1 0 1 1 0 1 0 0 0 0 0 0 0 1

 


i=1 -> Question4=1 and(Q4a_ARRAY(1) = 0 => Q4a is set to 0;

i=2 -> Question4=1 and(Q4a_ARRAY(2) = 1 => Q4a is now set to 1;

i=3 -> Question4=1 and(Q4a_ARRAY(3) = 0 => Q4a is now set to 0;

...

-> the final value of Q4a corresponds to the last check for  Q4a_ARRAY(9)

AO88
Fluorite | Level 6
Okay I see now, so that's why when doing a freq it only showed responses for Q4a9. Once again thank you!!! This really helped!
AO88
Fluorite | Level 6

Hi @ed_sas_member (and anyone else as well)

I've got another question.

 

What if the outcome wasn't a binomial response and instead was free text. So it's the same set-up except the respondent has to type something in. How can I check to see that at least one response was typed out in the Q4a1-Q4a9? (Doesn't matter the content but just being able to note that something was or wasn't entered).

 

How would I handle the array if the variables are character and not numeric?

 

Before I would've done it like this (using the same array):

 

To flag respondents where all responses to Q4a1-Q4a9  are missing.

 

if Question4=1 and

(Q4a_ARRAY(i)= '') then Q4a=1; else Q4a=0;

end;

 

 

ed_sas_member
Meteorite | Level 14

Hi @AO88 

 

In this case, you could do this:

data datainput_array;
	set datainput;
	array Q4a_ARRAY (*) $ Q4a:;
	/* Q4a_flag = 0 if at least one Q4ax is not null (including trailing blanks if any)*/
	/* Q4a_flag = 1 if at all Q4ax are null (including trailing blanks if any)*/
	do i=1 to dim(Q4a_ARRAY);
		if strip(Q4a_ARRAY(i)) ne "" then do;
				Q4a_flag = 0;
				leave;
			end;
		else Q4a_flag = 1;
	end;
run;
proc print;

proc freq data=datainput_array;
	TABLE Question4*Q4a_flag / nocum nopercent norow nocol out=Q4_stat (drop=percent) sparse;
run;

Best,

 

AO88
Fluorite | Level 6
Another question as well. What would also happen if Variable Q4a9 was a character variable. In that case would array be set up like:

array Q4a_ARRAY (8) Q4a1-Q4a8;
do i= 1 to 8;
/*Q4a_flag = 0 if all values are set to 0*/
/*Q4a_flag = 1 if at least one value is set to 1*/
if sum(of Q4a_ARRAY(i)) = 0 then Q4a_flag = 1;
else if sum(of Q4a_ARRAY(*)) > 0 then Q4a_flag = 0;
run;


ed_sas_member
Meteorite | Level 14

Hi @AO88 

 

I see two approaches:

- the one you suggest : put Q4a1-Q4a8 in the array to create a temporary flag. Then, compare this flag to Q4a9 -> e.g. if flag=0 and Q4a9=" " then Q4a_flag=0

- another approach could be to create a numeric dummy variable which is equal to 1 when Q4a9 is not null and à otherwise. -> and then put Q4a1-Q4a8 and this dummy variable in the array.

 

Hope this helps!

 

My best,

hackathon24-white-horiz.png

2025 SAS Hackathon: There is still time!

Good news: We've extended SAS Hackathon registration until Sept. 12, so you still have time to be part of our biggest event yet – our five-year anniversary!

Register Now

How to Concatenate Values

Learn how use the CAT functions in SAS to join values from multiple variables into a single value.

Find more tutorials on the SAS Users YouTube channel.

SAS Training: Just a Click Away

 Ready to level-up your skills? Choose your own adventure.

Browse our catalog!

Discussion stats
  • 9 replies
  • 1480 views
  • 3 likes
  • 2 in conversation