BookmarkSubscribeRSS Feed
🔒 This topic is solved and locked. Need further help from the community? Please sign in and ask a new question.
Joshua017
Fluorite | Level 6

I want to create a variable that can count how many "Yes" for each person/record. Is there a better way to do it? My data set actually have more than 3000 records...... 

Thank you so much!

 

 NameQ1Q2Q3Q4Q5Yes_count
1DannyYesYesNoNoNo 
2MaryYesNoYesYesYes 
3JohnYesYesYesYesNo 
4LisaNoYesYesYesYes 
5MikeNoYesNoNoYes 
6PhilipYesYesYesYesNo 
7SuiYesYesNoNoYes 
8ZackNoYesNoYesNo 
9PamYesYesNoNoYes 
10WilliamNoNoYesNoYes 
1 ACCEPTED SOLUTION

Accepted Solutions
FreelanceReinh
Jade | Level 19

Okay, it seems that the Yes/No variables might be too sparse to select them conveniently using name range lists. But perhaps they can be characterized as the "Char 3" variables in the dataset? If so, the following approach should work:

proc sql;
select name into :ynvars separated by ' '
from dictionary.columns
where libname='WORK' & memname='HAVE' & type='char' & length=3;
quit;

data want;
set have;
array qs &ynvars;
yes_count=count(catx('01'x, of qs[*]), 'Yes');
run;

The PROC SQL step selects the names of the "Char 3" variables from metadata and stores them as a blank-separated list in macro variable YNVARS, which is then used in the array definition of the subsequent data step. (Of course, you'll have to replace "WORK" and "HAVE" by your libname and memname.) Obviously, other character variables with length 3, but different contents (e.g. STATE with values 'TX', 'NC' etc.) wouldn't be a problem for the 'Yes' count. I've made the count a bit more robust by using CATX with a "rare" delimiter (in place of CATS, which could "create" a 'Yes' by concatenating something like 'NY' and 'est').

 

EDIT: To exclude unwanted "Char 3" variables whose names don't start with "Q", you can extend the WHERE condition:

where libname='WORK' & memname='HAVE' & type='char' & length=3 & name eqt 'Q';

View solution in original post

7 REPLIES 7
novinosrin
Tourmaline | Level 20
data have;
input obs (Name 	Q1	Q2	Q3	Q4	Q5) ($)	;
cards;
1	Danny	Yes	Yes	No	No	No	 
2	Mary	Yes	No	Yes	Yes	Yes	 
3	John	Yes	Yes	Yes	Yes	No	 
4	Lisa	No	Yes	Yes	Yes	Yes	 
5	Mike	No	Yes	No	No	Yes	 
6	Philip	Yes	Yes	Yes	Yes	No	 
7	Sui	Yes	Yes	No	No	Yes	 
8	Zack	No	Yes	No	Yes	No	 
9	Pam	Yes	Yes	No	No	Yes	 
10	William	No	No	Yes	No	Yes	 
;
data want;
set have;
yes_count=count(cats(of q:),'Yes');
run;
ballardw
Super User

@Joshua017 wrote:

I want to create a variable that can count how many "Yes" for each person/record. Is there a better way to do it? My data set actually have more than 3000 records...... 

Thank you so much!

 

  Name Q1 Q2 Q3 Q4 Q5 Yes_count
1 Danny Yes Yes No No No  
2 Mary Yes No Yes Yes Yes  
3 John Yes Yes Yes Yes No  
4 Lisa No Yes Yes Yes Yes  
5 Mike No Yes No No Yes  
6 Philip Yes Yes Yes Yes No  
7 Sui Yes Yes No No Yes  
8 Zack No Yes No Yes No  
9 Pam Yes Yes No No Yes  
10 William No No Yes No Yes  

One thing is that coding with character values for Yes/No is ugly at best and cumbersome and confusing at worst.

data want;
   set have;
   array qs q: ; /* assumes all the variables you want that are coded Yes/no
                 are named starting with Q and that the variables that start with 
                 Q are coded yes/no if not list them out*/
   do i= 1 dim(qs);
      yes_count= sum(yes_count, (qs[i]='Yes'));
   end;
run;

The (qs[I]='Yes') takes advantage of the SAS behavior that a true comparison will return a numeric 1 and false a 0.

 

If the variables had been coded 1/0 for yes/no to begin with:

yes_count= sum(of q1-q5); (or however many are involved)

Joshua017
Fluorite | Level 6

Thank you so much for your reply! Actually I have about 150 variables and about 50 of them have Yes/No and rest of them with Name, date, and demographic information. Is it anyway that I don't have to list all 50 variables out? Thanks again!!

novinosrin
Tourmaline | Level 20

@Joshua017  You can use variable lists like i showed in my code.

 

or group the 50 in an array q:

 

array qs q1-q50 ; 
yes_count=count(cats(of qs(*)),'Yes');

Are you able to follow both of these solutions? 

FreelanceReinh
Jade | Level 19

@Joshua017 wrote:

I have about 150 variables and about 50 of them have Yes/No and rest of them with Name, date, and demographic information. Is it anyway that I don't have to list all 50 variables out? Thanks again!!


Chances are very good that you can abbreviate the list by using one or another type (or even combining different types) of SAS variable lists, as novinosrin mentioned already. If your Yes/No variables do not have a common name prefix, name range lists can be particularly useful. Just apply PROC CONTENTS with option VARNUM to your dataset and see if the (I assume: character) variables in question form one or a few blocks of consecutive rows in the output (possibly interrupted by numeric variables).

 

Example (focusing on numeric variables, though):

proc contents data=sashelp.heart varnum;
run;

Output (highlighting added):


proc_contents_varnum.png

Result: The eight numeric variables highlighted above can be referred to as

AgeCHDdiag-numeric-Weight MRW--Cholesterol

where appropriate.

 

Similarly, Chol_Status--Weight_Status (3 variables) or DeathCause-character-BP_Status (4 variables) are valid lists of character variables -- at least as long as the structure (variable order) of the dataset is not changed.

 

 

Joshua017
Fluorite | Level 6

sas.jpg

That's part of the dataset. For variables start with Q, some of them are Yes/No. I want to count how many Yes for each record. How could I put them in a array or I can just count all variables with "Yes"?

FreelanceReinh
Jade | Level 19

Okay, it seems that the Yes/No variables might be too sparse to select them conveniently using name range lists. But perhaps they can be characterized as the "Char 3" variables in the dataset? If so, the following approach should work:

proc sql;
select name into :ynvars separated by ' '
from dictionary.columns
where libname='WORK' & memname='HAVE' & type='char' & length=3;
quit;

data want;
set have;
array qs &ynvars;
yes_count=count(catx('01'x, of qs[*]), 'Yes');
run;

The PROC SQL step selects the names of the "Char 3" variables from metadata and stores them as a blank-separated list in macro variable YNVARS, which is then used in the array definition of the subsequent data step. (Of course, you'll have to replace "WORK" and "HAVE" by your libname and memname.) Obviously, other character variables with length 3, but different contents (e.g. STATE with values 'TX', 'NC' etc.) wouldn't be a problem for the 'Yes' count. I've made the count a bit more robust by using CATX with a "rare" delimiter (in place of CATS, which could "create" a 'Yes' by concatenating something like 'NY' and 'est').

 

EDIT: To exclude unwanted "Char 3" variables whose names don't start with "Q", you can extend the WHERE condition:

where libname='WORK' & memname='HAVE' & type='char' & length=3 & name eqt 'Q';

sas-innovate-2024.png

Join us for SAS Innovate April 16-19 at the Aria in Las Vegas. Bring the team and save big with our group pricing for a limited time only.

Pre-conference courses and tutorials are filling up fast and are always a sellout. Register today to reserve your seat.

 

Register now!

What is Bayesian Analysis?

Learn the difference between classical and Bayesian statistical approaches and see a few PROC examples to perform Bayesian analysis in this video.

Find more tutorials on the SAS Users YouTube channel.

Click image to register for webinarClick image to register for webinar

Classroom Training Available!

Select SAS Training centers are offering in-person courses. View upcoming courses for:

View all other training opportunities.

Discussion stats
  • 7 replies
  • 1859 views
  • 4 likes
  • 4 in conversation