Help using Base SAS procedures

Count distinct values by subjects

Accepted Solution Solved
Reply
New Contributor
Posts: 3
Accepted Solution

Count distinct values by subjects

I want to create a variable that can count how many "Yes" for each person/record. Is there a better way to do it? My data set actually have more than 3000 records...... 

Thank you so much!

 

 NameQ1Q2Q3Q4Q5Yes_count
1DannyYesYesNoNoNo 
2MaryYesNoYesYesYes 
3JohnYesYesYesYesNo 
4LisaNoYesYesYesYes 
5MikeNoYesNoNoYes 
6PhilipYesYesYesYesNo 
7SuiYesYesNoNoYes 
8ZackNoYesNoYesNo 
9PamYesYesNoNoYes 
10WilliamNoNoYesNoYes 

Accepted Solutions
Solution
3 weeks ago
Trusted Advisor
Posts: 1,256

Re: Count distinct values by subjects

[ Edited ]
Posted in reply to Joshua017

Okay, it seems that the Yes/No variables might be too sparse to select them conveniently using name range lists. But perhaps they can be characterized as the "Char 3" variables in the dataset? If so, the following approach should work:

proc sql;
select name into :ynvars separated by ' '
from dictionary.columns
where libname='WORK' & memname='HAVE' & type='char' & length=3;
quit;

data want;
set have;
array qs &ynvars;
yes_count=count(catx('01'x, of qs[*]), 'Yes');
run;

The PROC SQL step selects the names of the "Char 3" variables from metadata and stores them as a blank-separated list in macro variable YNVARS, which is then used in the array definition of the subsequent data step. (Of course, you'll have to replace "WORK" and "HAVE" by your libname and memname.) Obviously, other character variables with length 3, but different contents (e.g. STATE with values 'TX', 'NC' etc.) wouldn't be a problem for the 'Yes' count. I've made the count a bit more robust by using CATX with a "rare" delimiter (in place of CATS, which could "create" a 'Yes' by concatenating something like 'NY' and 'est').

 

EDIT: To exclude unwanted "Char 3" variables whose names don't start with "Q", you can extend the WHERE condition:

where libname='WORK' & memname='HAVE' & type='char' & length=3 & name eqt 'Q';

View solution in original post


All Replies
PROC Star
Posts: 1,833

Re: Count distinct values by subjects

Posted in reply to Joshua017
data have;
input obs (Name 	Q1	Q2	Q3	Q4	Q5) ($)	;
cards;
1	Danny	Yes	Yes	No	No	No	 
2	Mary	Yes	No	Yes	Yes	Yes	 
3	John	Yes	Yes	Yes	Yes	No	 
4	Lisa	No	Yes	Yes	Yes	Yes	 
5	Mike	No	Yes	No	No	Yes	 
6	Philip	Yes	Yes	Yes	Yes	No	 
7	Sui	Yes	Yes	No	No	Yes	 
8	Zack	No	Yes	No	Yes	No	 
9	Pam	Yes	Yes	No	No	Yes	 
10	William	No	No	Yes	No	Yes	 
;
data want;
set have;
yes_count=count(cats(of q:),'Yes');
run;
Super User
Posts: 13,583

Re: Count distinct values by subjects

Posted in reply to Joshua017

@Joshua017 wrote:

I want to create a variable that can count how many "Yes" for each person/record. Is there a better way to do it? My data set actually have more than 3000 records...... 

Thank you so much!

 

  Name Q1 Q2 Q3 Q4 Q5 Yes_count
1 Danny Yes Yes No No No  
2 Mary Yes No Yes Yes Yes  
3 John Yes Yes Yes Yes No  
4 Lisa No Yes Yes Yes Yes  
5 Mike No Yes No No Yes  
6 Philip Yes Yes Yes Yes No  
7 Sui Yes Yes No No Yes  
8 Zack No Yes No Yes No  
9 Pam Yes Yes No No Yes  
10 William No No Yes No Yes  

One thing is that coding with character values for Yes/No is ugly at best and cumbersome and confusing at worst.

data want;
   set have;
   array qs q: ; /* assumes all the variables you want that are coded Yes/no
                 are named starting with Q and that the variables that start with 
                 Q are coded yes/no if not list them out*/
   do i= 1 dim(qs);
      yes_count= sum(yes_count, (qs[i]='Yes'));
   end;
run;

The (qs[I]='Yes') takes advantage of the SAS behavior that a true comparison will return a numeric 1 and false a 0.

 

If the variables had been coded 1/0 for yes/no to begin with:

yes_count= sum(of q1-q5); (or however many are involved)

New Contributor
Posts: 3

Re: Count distinct values by subjects

Thank you so much for your reply! Actually I have about 150 variables and about 50 of them have Yes/No and rest of them with Name, date, and demographic information. Is it anyway that I don't have to list all 50 variables out? Thanks again!!

PROC Star
Posts: 1,833

Re: Count distinct values by subjects

Posted in reply to Joshua017

@Joshua017  You can use variable lists like i showed in my code.

 

or group the 50 in an array q:

 

array qs q1-q50 ; 
yes_count=count(cats(of qs(*)),'Yes');

Are you able to follow both of these solutions? 

Trusted Advisor
Posts: 1,256

Re: Count distinct values by subjects

Posted in reply to Joshua017

@Joshua017 wrote:

I have about 150 variables and about 50 of them have Yes/No and rest of them with Name, date, and demographic information. Is it anyway that I don't have to list all 50 variables out? Thanks again!!


Chances are very good that you can abbreviate the list by using one or another type (or even combining different types) of SAS variable lists, as novinosrin mentioned already. If your Yes/No variables do not have a common name prefix, name range lists can be particularly useful. Just apply PROC CONTENTS with option VARNUM to your dataset and see if the (I assume: character) variables in question form one or a few blocks of consecutive rows in the output (possibly interrupted by numeric variables).

 

Example (focusing on numeric variables, though):

proc contents data=sashelp.heart varnum;
run;

Output (highlighting added):


proc_contents_varnum.png

Result: The eight numeric variables highlighted above can be referred to as

AgeCHDdiag-numeric-Weight MRW--Cholesterol

where appropriate.

 

Similarly, Chol_Status--Weight_Status (3 variables) or DeathCause-character-BP_Status (4 variables) are valid lists of character variables -- at least as long as the structure (variable order) of the dataset is not changed.

 

 

New Contributor
Posts: 3

Re: Count distinct values by subjects

Posted in reply to FreelanceReinhard

sas.jpg

That's part of the dataset. For variables start with Q, some of them are Yes/No. I want to count how many Yes for each record. How could I put them in a array or I can just count all variables with "Yes"?

Solution
3 weeks ago
Trusted Advisor
Posts: 1,256

Re: Count distinct values by subjects

[ Edited ]
Posted in reply to Joshua017

Okay, it seems that the Yes/No variables might be too sparse to select them conveniently using name range lists. But perhaps they can be characterized as the "Char 3" variables in the dataset? If so, the following approach should work:

proc sql;
select name into :ynvars separated by ' '
from dictionary.columns
where libname='WORK' & memname='HAVE' & type='char' & length=3;
quit;

data want;
set have;
array qs &ynvars;
yes_count=count(catx('01'x, of qs[*]), 'Yes');
run;

The PROC SQL step selects the names of the "Char 3" variables from metadata and stores them as a blank-separated list in macro variable YNVARS, which is then used in the array definition of the subsequent data step. (Of course, you'll have to replace "WORK" and "HAVE" by your libname and memname.) Obviously, other character variables with length 3, but different contents (e.g. STATE with values 'TX', 'NC' etc.) wouldn't be a problem for the 'Yes' count. I've made the count a bit more robust by using CATX with a "rare" delimiter (in place of CATS, which could "create" a 'Yes' by concatenating something like 'NY' and 'est').

 

EDIT: To exclude unwanted "Char 3" variables whose names don't start with "Q", you can extend the WHERE condition:

where libname='WORK' & memname='HAVE' & type='char' & length=3 & name eqt 'Q';
☑ This topic is solved.

Need further help from the community? Please ask a new question.

Discussion stats
  • 7 replies
  • 256 views
  • 4 likes
  • 4 in conversation