BookmarkSubscribeRSS Feed
☑ This topic is solved. Need further help from the community? Please sign in and ask a new question.
SAS_Novice22
Quartz | Level 8

Hi SAS Community,

 

I have a dataset that allows individuals to select yes to multiple response variables for each event. I would like to evaluate how many subjects/events had one or more of these responses selected 'yes'.

SAS_Novice22_0-1654695998673.png

Each row represents an event. Subjects can have multiple events and therefore have multiple rows per subject

TITLE "Count of DISEASE attacks by TYPE of trigger - HOW TO EVALUATE MULTIPLE TRIGGERS"; 
PROC FREQ DATA=WORK.DISEASE_ERT; 
	TABLE DISEASE5A_CHEMICALS DISEASE5A_ESTROGEN DISEASE5A_DRUGS DISEASE5A_STRESS DISEASE5A_COLD DISEASE5A_FOOD DISEASE5A_INFECTION DISEASE5A_PHYSICAL DISEASE5A_PROLONGED DISEASE5A_COMPRESSION DISEASE5A_OTHER DISEASE5A_OTHERTEXT; 
RUN;

So my question is how would I get started? By generating a new variable? Combining response categories? a series of if then statements?

 

Thank you again for your support.

T.

1 ACCEPTED SOLUTION

Accepted Solutions
Tom
Super User Tom
Super User

The array definition is NOT needed just to use the OF keyword in a function like WHICH() that accepts a flexible number of arguments.  Just list the variables after the OF keyword.  It might same some typing if you re-use the same list multiple times in the same dataset.

 

If you want check for multiple YES values you might try a combination of one of the CAT...() functions to smush all of the values into one long string and the COUNT() function to count the number of times the string YES appears there. 

 

So to count how many of the DIESEASE5A_ variables have YES then you can do this:

n_yes = count(cats(of DISEASE5A_: ),'YES');

View solution in original post

16 REPLIES 16
PaigeMiller
Diamond | Level 26

UNTESTED CODE

 

data want;
    set have;
    yes=whichc('YES',of disease:)>0;
run;

The variable YES will be 1 if there is at least 1 'YES' in the row, and 0 otherwise.

 

Or

 

data want;
    set have;
    array d DISEASE5A_CHEMICALS DISEASE5A_ESTROGEN DISEASE5A_DRUGS DISEASE5A_STRESS DISEASE5A_COLD
        DISEASE5A_FOOD DISEASE5A_INFECTION DISEASE5A_PHYSICAL DISEASE5A_PROLONGED DISEASE5A_COMPRESSION 
        DISEASE5A_OTHER DISEASE5A_OTHERTEXT;
    yes=whichc('YES',of d[*])>0;
run;
--
Paige Miller
SAS_Novice22
Quartz | Level 8

Hi @PaigeMiller,


Thank you - this is really helpful!


I have tried out the code and both codes work so thank you again. The only thing I am wondering is if these codes can still be used if I only want to know if 'YES' was selected for 2 or more responses? Looking at 1 is not overly helpful since I have a variable that is an overall variable (yes/no) indicating that they have responded to at least one of the responses. I am interested in 2 or more responses (multiple responses). 

 

So this code gets really close ... but its still not exactly what I want, I need to modify the code slightly so that the YES=1 if two or more of the interested variables had a 'YES' response. Do you know what that additional code would look like?  


Here is my SAS code:

data DISEASE_ERT;
    set DISEASE_ERT;
    array d DISEASE5A_CHEMICALS DISEASE5A_ESTROGEN DISEASE5A_DRUGS 
	DISEASE5A_STRESS DISEASE5A_COLD DISEASE5A_FOOD DISEASE5A_INFECTION 
	DISEASE5A_PHYSICAL DISEASE5A_PROLONGED DISEASE5A_COMPRESSION DISEASE5A_OTHER;
    Trigger=whichc('YES',of d[*])>0;
run;

and...

data DISEASE_ERT;
    set DISEASE_ERT;
    yes=whichc('YES',of DISEASE5A_CHEMICALS DISEASE5A_ESTROGEN DISEASE5A_DRUGS 
	DISEASE5A_STRESS DISEASE5A_COLD DISEASE5A_FOOD DISEASE5A_INFECTION 
	DISEASE5A_PHYSICAL DISEASE5A_PROLONGED DISEASE5A_COMPRESSION DISEASE5A_OTHER:)>0;
run;

Also, is there a reason to use one code "array vs. yes" over another? For the purposes of my own learning.

 

Thanks for your patience while I learn SAS and coding generally.

Thank you,

T

Reeza
Super User
The code will differ significantly if you have it as you stated in your original post: "Subjects can have multiple events and therefore have multiple rows per subject" compared to looking at multiple columns.
SAS_Novice22
Quartz | Level 8
That makes sense, I think for the purposes of what I am looking for number of triggers by event (per row) will be fine. I think I need to use something basic like a sum feature and then evaluate by categories of total 2 or more triggers and 4 or more triggers etc.
PaigeMiller
Diamond | Level 26

To find out if two or more YES appear, you would have to do a loop to actually test all the variables one-by-one to see if there is a YES.

 

 

--
Paige Miller
Reeza
Super User

If in columns and you've only filled out the YES and the No's are blanks then use the dimension of the array as the total count and then subtract the number of blanks using the CMISS function.

 

NumDiags = dim(d) - cmiss(d);

Or concatenate all the columns and use COUNT() to count the number of occurrences of YES.

 

NumDiags = count(catx("|", of d(*)), "YES", 'it');
PaigeMiller
Diamond | Level 26

Brilliant answer, @Reeza !

--
Paige Miller
SAS_Novice22
Quartz | Level 8
Thanks Reeza, I am going to try this out. For my learning can you please tell me What these items stand for (e.g., d(*), |, dim(d), cmiss(d), catx, "it"?
If I were to use this would my code look something like:

NumofTriggers = count(caretx("|", of d(DISEASE5A_CHEMICALS DISEASE5A_ESTROGEN DISEASE5A_DRUGS
DISEASE5A_STRESS DISEASE5A_COLD DISEASE5A_FOOD DISEASE5A_INFECTION
DISEASE5A_PHYSICAL DISEASE5A_PROLONGED DISEASE5A_COMPRESSION DISEASE5A_OTHER)), "YES", 'it');
Tom
Super User Tom
Super User

The array definition is NOT needed just to use the OF keyword in a function like WHICH() that accepts a flexible number of arguments.  Just list the variables after the OF keyword.  It might same some typing if you re-use the same list multiple times in the same dataset.

 

If you want check for multiple YES values you might try a combination of one of the CAT...() functions to smush all of the values into one long string and the COUNT() function to count the number of times the string YES appears there. 

 

So to count how many of the DIESEASE5A_ variables have YES then you can do this:

n_yes = count(cats(of DISEASE5A_: ),'YES');
SAS_Novice22
Quartz | Level 8
Thanks @Tom, I am going to give this a try it may be exactly what I need, just simple counts. Thanks!
Reeza
Super User

You say multiple rows? How?

Show the input and expected output. 

 

Images are not helpful to write programs.

SAS_Novice22
Quartz | Level 8
Thanks Reeza, I will try to show my input and expected output going forward, I need to learn how to share data on this forum.
ballardw
Super User

I am going to make suggestion that you do not use character values of 'Yes' 'Y' 'True' 'T' 'No' 'N' 'False' or 'F' for variables that only take on two values when present. If you read such data as numeric with 1 for 'Yes' 'True' or similar and 0 for 'No' 'False' or similar you can accomplish many things easily.

 

This does require some manipulation of the data, either when read (easy when reading text files) or adding additional variables with the numeric values (routine and tedious but simple).

 

First, please do not provide data as pictures.l We can't code against pictures. At least provide text.

 

This is an example of using custom informats and formats to read and display values and create some variables from the read data that are similar to typical things done with a number of YES/NO type variables.

Proc format;
/* values to read are yes, y, no or n, 
   if other than those it should be missing
*/
invalue Yesnoonly (upcase)
'YES','Y'=1
'NO','N' =0
other =.
;
/* values are yes or Y and if not present
   actually mean no (or not selected)
   such as check box data on forms
*/
invalue YesOnly (upcase)
'YES','Y'=1
other=0
;
/* values are like yes, y, no, n
   and have a special value for not entered, 
   not applicable or similar
   which would depend on your data documentation
   this uses x
*/
invalue ynx (upcase)
'YES','Y'=1
'NO','N' =0
'X'      =.X
other= .
;
value yesnoonly
1='Yes'
0='No'
.='Missing'
;
value yesonly
1='Yes'
0=' '
;
value ynx
1='Yes'
0='No'
.x='N/A'
.='Missing'
;

run;

/* example reading data*/
data example;
   input x1 :yesnoonly.  x2 :Yesonly. x3 :ynx.;
datalines;
y  yes yES
n  .   no
Yes abc x
n  n  n 
yes YES YES
;

/* example using the numeric x1,x2, x3 numerics*/
data want;
  set example;
  NumberYes = sum(of x1-x3);
  NumberResponses = n(of x1-x3);
  MeanYes   = mean(of x1-x3);
  Anyyes  = (max(of x1-x3) = 1);
  Anymissing = (nmiss(of x1-x3)=1);
run;

Proc means data=want mean min max sum n;
   var Numberyes NumberResponses MeanYes Anyyes Anymissing;
run;

/* and display similar to original values*/
proc print data=example noobs;
   var x1 x2 x3;
   format  x1 yesnoonly.  x2 Yesonly. x3 ynx.;
run;

Proc format is used to create custom informat and formats. INVALUE is used to read values. The value lists have the text to be read on the left of the = and the result on the right. The option UPCASE means to make the value uppercase before comparing to the list values. This can be quite helpful with manually entered data or any source that is not consistent with the case of characters. The different invalue sets have a brief description of the kind of data to read and how it is treated. There are also custom formats, created with Value statement that will show the numeric values as text similar to what was read.

The first data step uses the different informats to read values. Note that for inline data in the datalines that period (.) would be read as missing value.

Then we create in the second data step per line summary values (this is the core of your question). SUMming is much simpler that a slew of If / then /else with a number of variables. It demonstrates one of the uses of varaible lists with the "of x1 - x3" syntax. If you have variables with a common name base, such as X or Disease, followed by sequential integers you can reference them in many places with the X1-X(n) where n is the last of the list you want. In some functions you precede the list with "of" to make sure the function understand you want the list and not the subtraction that "x1-x3" could also mean. Notice that if you have 50 variables named X1 to X50 the only change needed to get the sum or other summary is to change the 3 to 50. Cannot do anything similar with character variables easily.

 

The two ANY variables created demonstrate that SAS uses 1/0 as True/False (or yes/no) for results of logical comparisons. The Mean of a 1/0 coded variable is the decimal percentage of 1 values, 0.5 = 50%, and will exclude the missing values in the denominator.

 

The we use Proc Means as one want to get the summary for those added variables.

The mean of NumberYes of 1.4 is across all this example data that the average number of Yes response is 1.4, the Min of 0 means that at least one record had no yes and the max of 3 indicates that at least one record had 3 yes, sum of 7 means that across all the variables and responses there were 7 total yes entered.

The mean for NumberResponses of 2.8 means that the average number of non-missing values was 2.8. The sum of 14 is a number of non-missing responses (out of 15 the n*number of variables considered)

 

The mean for Anyyes of 0.6 means that 60% of the records had at least one Yes recorded.

The mean for Anymissing of 0.2 means that 20% of the records had at least on missing value.

 

The proc print demonstrates using the formats to display original values but a bit more consistent than the original spelling.

 

Since you did not provide text of the variables and I am too lazy to type that many names that long I am not going to show exact code to recode your existing data. Some like:

 

data need;
   set yourdatasetname;
   array d (*) <list your disease variable names of character values here>;
   array n (*) <matching list of numerical variables here>;
/*or Array n(15) ; the number is the number of variables and creates numeric variables n1 - n15*/ do i=1 to dim(d); n[i] = input(d[i], yesonly.); end; NumberYes = sum(of n(*)); run;

Arrays are shorthand to reference similar variables. The summary functions will treat "of arrayname(*)" as a list of all variables in the array.

 

You can use other lists.

ballardw
Super User

An addendum to my long post of creating numeric values:

 

If you use Proc Means with your Subject Id variable or other grouping variable(s) as CLASS variables you get these summaries per subject. So can get overall sums of "Yes" and similar. Plus if you need a per record item like "2 or more" yes in a single observation then

 

NumYes2ormore = (sum(of d(*)) ge 2);

creates an indicator for that. Similar breakpoints can be created fairly easily once the concept is digested. Then a procedure like Means would give you a number (sum) or percent(mean) of events, or what ever your record means, that had "2 or more yes".

Or exactly 2.

Or a sum of a subset if you only wanted to consider 3 or 4 of the variables for some reason, such as trauma or contagious diseases or chronic condition variables.

Ready to join fellow brilliant minds for the SAS Hackathon?

Build your skills. Make connections. Enjoy creative freedom. Maybe change the world. Registration is now open through August 30th. Visit the SAS Hackathon homepage.

Register today!
Mastering the WHERE Clause in PROC SQL

SAS' Charu Shankar shares her PROC SQL expertise by showing you how to master the WHERE clause using real winter weather data.

Find more tutorials on the SAS Users YouTube channel.

Discussion stats
  • 16 replies
  • 1869 views
  • 12 likes
  • 5 in conversation