BookmarkSubscribeRSS Feed
marylin5828
Calcite | Level 5

Hello, I am using a dataset on a large set of individuals. There are 6 questions regarding english proficiency ( each has their own column). I need to separate individuals by the categories speaks some English and no English. My issues is that there are 6 different columns containing yes or no, I need to have Sas combine those who answered yes to 3 or more questions as speaks some english and I need to have those who answered yes to 2 or less of the questions to be categorized as does not speak english. 

5 REPLIES 5
PaigeMiller
Diamond | Level 26

First, I point out that having numeric values of 0 and 1 instead of character values No and Yes as the possible answers makes the coding an awful lot easier.

 

Nevertheless, I point you to a similar problem which was discussed recently here in the forums, which applies to this situation as well:

https://communities.sas.com/t5/SAS-Programming/Sum-by-an-ID-variable-according-to-a-logical-test/m-p...

 

--
Paige Miller
indiaink1
Calcite | Level 5

Try using an array to convert the yes/no columns to numeric, then use the sum function.

 

data have;
	input id var1 $ var2 $ var3 $ var4 $ var5 $ var6 $;
datalines;
1 No No Yes Yes No Yes
2 Yes No Yes No Yes Yes
3 No Yes No Yes No No
4 Yes Yes No No Yes No
5 No No No No Yes No
6 Yes No No Yes No No
7 No No No No No No
8 Yes Yes Yes Yes Yes Yes
9 No Yes No Yes Yes No
10 No Yes No Yes Yes No
run;

data want;
	set have;
	array vars(6) $ var1-var6;
	array vars_n(6) var_n1-var_n6;
	do i=1 to 6;
		if vars(i) = "Yes" then vars_n(i) = 1;
		else if vars(i) = "No" then vars_n(i) = 0;
	end;
	sum = sum(of var_n1-var_n6);
	if sum >= 3 then English = 1;
	else if 0 <= sum < 3 then English = 0;
run;
marylin5828
Calcite | Level 5

Hello,
 I am stuck on the part where I convert it to numeric. They are already numeric but I combined the answers 2-9 as 0.

this is the code I used to filter the numeric variables.

 

data WORK.DATA01;
array vars(6) $ varB2AX--varB3CX;
array vars_n(6) var_nB2AX--var_nB3CX;
do i=1 to 6;
if vars (i)= "1" then vars_n(i) =1;
else if vars (i)= "2-9" then vars_n(i)=0;
end;
 
the error messages I am getting are
ERROR: Variable varB2AX cannot be found on the list of previously defined variables.
ERROR: Too few variables defined for the dimension(s) specified for the array vars.
ERROR: Variable var_nB2AX cannot be found on the list of previously defined variables.
ERROR: Too few variables defined for the dimension(s) specified for the array vars_n.
 
Not sure why it is saying it cannot be found if that is the name of the variable.
 
thanks!
PaigeMiller
Diamond | Level 26

The first problem is that you don't have a SET statement. There may be other problems as well.

 

Also, depending on your data, this may not work as you can't test if a number is between 2 and 9 via this statement:

if vars (i)= "2-9"
--
Paige Miller
Tom
Super User Tom
Super User

Just use the answer from this question.

https://communities.sas.com/t5/SAS-Procedures/Count-a-specific-value-across-columns/td-p/635511

data want;
set have;
  nyes=count(catx('#',of q1 q2 q3 q4 q5 q6 ),'Yes');
  english = (nyes > 2) ;
run;