BookmarkSubscribeRSS Feed
🔒 This topic is solved and locked. Need further help from the community? Please sign in and ask a new question.
AO88
Fluorite | Level 6

Hello,

 

This lies more along the realm of data management.

I have a dataset where I want to flag observations for errors. The variables are a set of questions/responses and the responses are either Yes (1) or No (0). As you'll see in the example dataset below, variables Q4a1 to Q4a9 are a group of responses from  'Question4.' If a respondent answers 'Yes' to question 4, then at least one entry (Yes) must be made in variables Q4a1 to Q4a9.

 

Example:

 

Question4-Do you have a favorite food? (Yes-1/No-0)

 

Check box

Q4a1- Pizza (Response is either 1 or 0)

Q4a2-Steak (1/0)

Q4a4- Pasta (1/0)

Q4a5- Other (1/0)

.....etc.

 

I want to be able to flag respondents where they answered yes to Question 4 but they made no selection for Q4a1 through Q4a9.

 

I created an array but i'm running into difficulty where it seems to be flagging respondents (variable Data1-Data12) that didn't make a selection for the last response (Q4a9) instead of flagging 'Data2' and 'Data4' where no entries were made in Q4a1 - Q4a9.

 

The frequency table should be:

 

Q4aFreq
09
12

 

Where 2 respondents that made no selections in Q4a1-Q4a9 are flagged.

 

But instead I'm getting this:

Q4aFreq
04
17

 

Where it seems to be flagging non-responses for Q4a9.

 

 

 

Have attached code below:

 

 

 

data DataInput;

input Data $ Question1:32. Question2:32. Question3:32. Question4:32. Q4a1:32. Q4a2:32. Q4a3:32. Q4a4:32. Q4a5:32. Q4a6:32. Q4a7:32. Q4a8:32. Q4a9:32.;

datalines;

 

DATA1 0 1 1 0 1 0 0 0 0 0 0 0 1 1

DATA2 1 0 0 1 0 0 0 0 0 0 0 0 0 0

DATA3 1 1 0 1 1 1 1 0 1 0 1 1 0 0

DATA4 1 0 0 1 0 0 0 0 0 0 0 0 0 0

DATA5 1 1 0 1 0 1 0 1 1 1 1 0 1 1

DATA6 1 1 0 1 0 1 0 1 1 1 1 0 1 1

DATA7 1 1 0 1 0 1 0 1 1 1 1 0 1 1

DATA8 1 1 0 1 0 1 0 1 1 1 1 0 1 1

DATA9 1 1 0 1 0 0 0 0 0 0 0 1 0 0

DATA10 1 1 0 1 0 0 0 0 0 0 0 1 0 0

DATA11 1 1 0 1 0 1 0 1 1 1 1 1 0 0

DATA12 1 1 0 1 0 1 0 1 1 1 1 1 0 0

;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;

 

 

data datainput_array;

set datainput;

array Q4a_ARRAY (9) Q4a1 - Q4a9;

do i= 1 to 9;

if Question1=1 then do;

if Question4=1 and

(Q4a_ARRAY(i) = 0) then Q4a=1; else Q4a=0;

end;

end;

run;

 

proc freq data=datainput_array;

TABLE Q4a;

RUN;

 

proc sort data=datainput_array out=Q4a_0;

Where Q4a=0;

by Data;

run;

 

proc sort data=datainput_array out=Q4a_1;

Where Q4a=1;

by Data;

run;

 

 

1 ACCEPTED SOLUTION

Accepted Solutions
ed_sas_member
Meteorite | Level 14

Hi @AO88 

Welcome to the community!

Could this code answer your question?

data datainput_array;
	set datainput;
	array Q4a_ARRAY (*) Q4a:;
	
	/*Q4a_flag = 0 if all values are set to 0*/
	/*Q4a_flag = 1 if at least one value is set to 1*/
	if sum(of Q4a_ARRAY(*)) = 0 then Q4a_flag = 0;
	else if sum(of Q4a_ARRAY(*)) > 0 then Q4a_flag = 1;
run;

proc freq data=datainput_array;
	TABLE Question4*Q4a_flag / nocum nopercent norow nocol out=Q4_stat (drop=percent) sparse;
run;

Best,

View solution in original post

9 REPLIES 9
ed_sas_member
Meteorite | Level 14

Hi @AO88 

Welcome to the community!

Could this code answer your question?

data datainput_array;
	set datainput;
	array Q4a_ARRAY (*) Q4a:;
	
	/*Q4a_flag = 0 if all values are set to 0*/
	/*Q4a_flag = 1 if at least one value is set to 1*/
	if sum(of Q4a_ARRAY(*)) = 0 then Q4a_flag = 0;
	else if sum(of Q4a_ARRAY(*)) > 0 then Q4a_flag = 1;
run;

proc freq data=datainput_array;
	TABLE Question4*Q4a_flag / nocum nopercent norow nocol out=Q4_stat (drop=percent) sparse;
run;

Best,

AO88
Fluorite | Level 6
Thank you @ed_sas_member ! Yes this worked. Got an error when I did proc freq, but then I changed it from Q4a_all to Q4a_flag.

Can I still be able to include a "then do" statement to this?
ed_sas_member
Meteorite | Level 14

Awesome!

For sure, you can.

Please does not hesitate to share you code if you experience trouble to achieve this.

Best,

ed_sas_member
Meteorite | Level 14

The issue with your code comes from the statement:

if Question4=1 and
(Q4a_ARRAY(i) = 0) then Q4a=1; else Q4a=0;

Indeed, the value of the next question will override the value of the previous one.

Ex. with data1:

  Question1 Question2 Question3 Question4 Q4a1 Q4a2 Q4a3 Q4a4 Q4a5 Q4a6 Q4a7 Q4a8 Q4a9
  DATA1 0 1 1 0 1 0 0 0 0 0 0 0 1

 


i=1 -> Question4=1 and(Q4a_ARRAY(1) = 0 => Q4a is set to 0;

i=2 -> Question4=1 and(Q4a_ARRAY(2) = 1 => Q4a is now set to 1;

i=3 -> Question4=1 and(Q4a_ARRAY(3) = 0 => Q4a is now set to 0;

...

-> the final value of Q4a corresponds to the last check for  Q4a_ARRAY(9)

AO88
Fluorite | Level 6
Okay I see now, so that's why when doing a freq it only showed responses for Q4a9. Once again thank you!!! This really helped!
AO88
Fluorite | Level 6

Hi @ed_sas_member (and anyone else as well)

I've got another question.

 

What if the outcome wasn't a binomial response and instead was free text. So it's the same set-up except the respondent has to type something in. How can I check to see that at least one response was typed out in the Q4a1-Q4a9? (Doesn't matter the content but just being able to note that something was or wasn't entered).

 

How would I handle the array if the variables are character and not numeric?

 

Before I would've done it like this (using the same array):

 

To flag respondents where all responses to Q4a1-Q4a9  are missing.

 

if Question4=1 and

(Q4a_ARRAY(i)= '') then Q4a=1; else Q4a=0;

end;

 

 

ed_sas_member
Meteorite | Level 14

Hi @AO88 

 

In this case, you could do this:

data datainput_array;
	set datainput;
	array Q4a_ARRAY (*) $ Q4a:;
	/* Q4a_flag = 0 if at least one Q4ax is not null (including trailing blanks if any)*/
	/* Q4a_flag = 1 if at all Q4ax are null (including trailing blanks if any)*/
	do i=1 to dim(Q4a_ARRAY);
		if strip(Q4a_ARRAY(i)) ne "" then do;
				Q4a_flag = 0;
				leave;
			end;
		else Q4a_flag = 1;
	end;
run;
proc print;

proc freq data=datainput_array;
	TABLE Question4*Q4a_flag / nocum nopercent norow nocol out=Q4_stat (drop=percent) sparse;
run;

Best,

 

AO88
Fluorite | Level 6
Another question as well. What would also happen if Variable Q4a9 was a character variable. In that case would array be set up like:

array Q4a_ARRAY (8) Q4a1-Q4a8;
do i= 1 to 8;
/*Q4a_flag = 0 if all values are set to 0*/
/*Q4a_flag = 1 if at least one value is set to 1*/
if sum(of Q4a_ARRAY(i)) = 0 then Q4a_flag = 1;
else if sum(of Q4a_ARRAY(*)) > 0 then Q4a_flag = 0;
run;


ed_sas_member
Meteorite | Level 14

Hi @AO88 

 

I see two approaches:

- the one you suggest : put Q4a1-Q4a8 in the array to create a temporary flag. Then, compare this flag to Q4a9 -> e.g. if flag=0 and Q4a9=" " then Q4a_flag=0

- another approach could be to create a numeric dummy variable which is equal to 1 when Q4a9 is not null and à otherwise. -> and then put Q4a1-Q4a8 and this dummy variable in the array.

 

Hope this helps!

 

My best,

sas-innovate-2024.png

Don't miss out on SAS Innovate - Register now for the FREE Livestream!

Can't make it to Vegas? No problem! Watch our general sessions LIVE or on-demand starting April 17th. Hear from SAS execs, best-selling author Adam Grant, Hot Ones host Sean Evans, top tech journalist Kara Swisher, AI expert Cassie Kozyrkov, and the mind-blowing dance crew iLuminate! Plus, get access to over 20 breakout sessions.

 

Register now!

How to Concatenate Values

Learn how use the CAT functions in SAS to join values from multiple variables into a single value.

Find more tutorials on the SAS Users YouTube channel.

Click image to register for webinarClick image to register for webinar

Classroom Training Available!

Select SAS Training centers are offering in-person courses. View upcoming courses for:

View all other training opportunities.

Discussion stats
  • 9 replies
  • 505 views
  • 3 likes
  • 2 in conversation