I want to create a variable that can count how many "Yes" for each person/record. Is there a better way to do it? My data set actually have more than 3000 records......
Thank you so much!
Name | Q1 | Q2 | Q3 | Q4 | Q5 | Yes_count | |
1 | Danny | Yes | Yes | No | No | No | |
2 | Mary | Yes | No | Yes | Yes | Yes | |
3 | John | Yes | Yes | Yes | Yes | No | |
4 | Lisa | No | Yes | Yes | Yes | Yes | |
5 | Mike | No | Yes | No | No | Yes | |
6 | Philip | Yes | Yes | Yes | Yes | No | |
7 | Sui | Yes | Yes | No | No | Yes | |
8 | Zack | No | Yes | No | Yes | No | |
9 | Pam | Yes | Yes | No | No | Yes | |
10 | William | No | No | Yes | No | Yes |
Okay, it seems that the Yes/No variables might be too sparse to select them conveniently using name range lists. But perhaps they can be characterized as the "Char 3" variables in the dataset? If so, the following approach should work:
proc sql;
select name into :ynvars separated by ' '
from dictionary.columns
where libname='WORK' & memname='HAVE' & type='char' & length=3;
quit;
data want;
set have;
array qs &ynvars;
yes_count=count(catx('01'x, of qs[*]), 'Yes');
run;
The PROC SQL step selects the names of the "Char 3" variables from metadata and stores them as a blank-separated list in macro variable YNVARS, which is then used in the array definition of the subsequent data step. (Of course, you'll have to replace "WORK" and "HAVE" by your libname and memname.) Obviously, other character variables with length 3, but different contents (e.g. STATE with values 'TX', 'NC' etc.) wouldn't be a problem for the 'Yes' count. I've made the count a bit more robust by using CATX with a "rare" delimiter (in place of CATS, which could "create" a 'Yes' by concatenating something like 'NY' and 'est').
EDIT: To exclude unwanted "Char 3" variables whose names don't start with "Q", you can extend the WHERE condition:
where libname='WORK' & memname='HAVE' & type='char' & length=3 & name eqt 'Q';
data have;
input obs (Name Q1 Q2 Q3 Q4 Q5) ($) ;
cards;
1 Danny Yes Yes No No No
2 Mary Yes No Yes Yes Yes
3 John Yes Yes Yes Yes No
4 Lisa No Yes Yes Yes Yes
5 Mike No Yes No No Yes
6 Philip Yes Yes Yes Yes No
7 Sui Yes Yes No No Yes
8 Zack No Yes No Yes No
9 Pam Yes Yes No No Yes
10 William No No Yes No Yes
;
data want;
set have;
yes_count=count(cats(of q:),'Yes');
run;
@Joshua017 wrote:
I want to create a variable that can count how many "Yes" for each person/record. Is there a better way to do it? My data set actually have more than 3000 records......
Thank you so much!
Name Q1 Q2 Q3 Q4 Q5 Yes_count 1 Danny Yes Yes No No No 2 Mary Yes No Yes Yes Yes 3 John Yes Yes Yes Yes No 4 Lisa No Yes Yes Yes Yes 5 Mike No Yes No No Yes 6 Philip Yes Yes Yes Yes No 7 Sui Yes Yes No No Yes 8 Zack No Yes No Yes No 9 Pam Yes Yes No No Yes 10 William No No Yes No Yes
One thing is that coding with character values for Yes/No is ugly at best and cumbersome and confusing at worst.
data want; set have; array qs q: ; /* assumes all the variables you want that are coded Yes/no are named starting with Q and that the variables that start with Q are coded yes/no if not list them out*/ do i= 1 dim(qs); yes_count= sum(yes_count, (qs[i]='Yes')); end; run;
The (qs[I]='Yes') takes advantage of the SAS behavior that a true comparison will return a numeric 1 and false a 0.
If the variables had been coded 1/0 for yes/no to begin with:
yes_count= sum(of q1-q5); (or however many are involved)
Thank you so much for your reply! Actually I have about 150 variables and about 50 of them have Yes/No and rest of them with Name, date, and demographic information. Is it anyway that I don't have to list all 50 variables out? Thanks again!!
@Joshua017 You can use variable lists like i showed in my code.
or group the 50 in an array q:
array qs q1-q50 ;
yes_count=count(cats(of qs(*)),'Yes');
Are you able to follow both of these solutions?
@Joshua017 wrote:
I have about 150 variables and about 50 of them have Yes/No and rest of them with Name, date, and demographic information. Is it anyway that I don't have to list all 50 variables out? Thanks again!!
Chances are very good that you can abbreviate the list by using one or another type (or even combining different types) of SAS variable lists, as novinosrin mentioned already. If your Yes/No variables do not have a common name prefix, name range lists can be particularly useful. Just apply PROC CONTENTS with option VARNUM to your dataset and see if the (I assume: character) variables in question form one or a few blocks of consecutive rows in the output (possibly interrupted by numeric variables).
Example (focusing on numeric variables, though):
proc contents data=sashelp.heart varnum;
run;
Output (highlighting added):
Result: The eight numeric variables highlighted above can be referred to as
AgeCHDdiag-numeric-Weight MRW--Cholesterol
where appropriate.
Similarly, Chol_Status--Weight_Status (3 variables) or DeathCause-character-BP_Status (4 variables) are valid lists of character variables -- at least as long as the structure (variable order) of the dataset is not changed.
That's part of the dataset. For variables start with Q, some of them are Yes/No. I want to count how many Yes for each record. How could I put them in a array or I can just count all variables with "Yes"?
Okay, it seems that the Yes/No variables might be too sparse to select them conveniently using name range lists. But perhaps they can be characterized as the "Char 3" variables in the dataset? If so, the following approach should work:
proc sql;
select name into :ynvars separated by ' '
from dictionary.columns
where libname='WORK' & memname='HAVE' & type='char' & length=3;
quit;
data want;
set have;
array qs &ynvars;
yes_count=count(catx('01'x, of qs[*]), 'Yes');
run;
The PROC SQL step selects the names of the "Char 3" variables from metadata and stores them as a blank-separated list in macro variable YNVARS, which is then used in the array definition of the subsequent data step. (Of course, you'll have to replace "WORK" and "HAVE" by your libname and memname.) Obviously, other character variables with length 3, but different contents (e.g. STATE with values 'TX', 'NC' etc.) wouldn't be a problem for the 'Yes' count. I've made the count a bit more robust by using CATX with a "rare" delimiter (in place of CATS, which could "create" a 'Yes' by concatenating something like 'NY' and 'est').
EDIT: To exclude unwanted "Char 3" variables whose names don't start with "Q", you can extend the WHERE condition:
where libname='WORK' & memname='HAVE' & type='char' & length=3 & name eqt 'Q';
Registration is open! SAS is returning to Vegas for an AI and analytics experience like no other! Whether you're an executive, manager, end user or SAS partner, SAS Innovate is designed for everyone on your team. Register for just $495 by 12/31/2023.
If you are interested in speaking, there is still time to submit a session idea. More details are posted on the website.
Learn the difference between classical and Bayesian statistical approaches and see a few PROC examples to perform Bayesian analysis in this video.
Find more tutorials on the SAS Users YouTube channel.