wj2
Quartz | Level 8

Hello, 

 

I have a set of 21 different binary (yes vs. no) variables and I would ultimately like to determine the average number of total "Yes" responses among my sample. That is, what is the average number of "Yes" responses out of the total of 21 possible "Yes" responses among the sample? Can someone please suggest an efficient way of coding a variable to do this? Thanks!

6 REPLIES
Reeza
Super User

Are you doing this per line or for the whole data set?

In general, you can just take the mean for a whole line.

new_var = mean(of var1-var21);

If all of the variables need to be considered together across the whole data set, it becomes a bit harder. Please clarify; some sample input and output data would help as well.



wj2
Quartz | Level 8

Thank you both for the prompt reply. Basically, I am working with a large survey data set (>4,000 subjects). The 21 variables correspond to 21 different medications used (yes (1) vs. no (0)). Among the total sample, I would like to know the average number of different medications used. So far, I have tried something like this but I'm not sure if this is correct: 

 

new_var= (drug1=1)+(drug2=1)+(drug3=1)+(drug4=1)+(drug5=1)+(drug6=1)+(drug7=1)+(drug8=1)+ (drug9=1)+(drug10=1)+(drug11=1)+(drug12=1)+(drug13=1)+(drug14=1)+(drug15=1)+(drug16=1)+ (drug17=1)+(drug18=1)+(drug19=1)+(drug20+1)+(drug21=1);

 

To find the mean, I have just used the proc means procedure for the new variable:

proc means data=X;
  var new_var;
run;

 

However, I am not sure if this is correct. Is there a better or more efficient way of doing this?

Please let me know if I can clarify further. 

 

Thanks.

ballardw
Super User



Note that your code may create some 0 values for the sum that don't exist in your data if you have missing values for any of those drug variables. That may be your desire but be aware of the difference.
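
To make that difference concrete, here is a minimal sketch on a made-up three-drug data set (the data set names demo and compare, the variable names bool_sum and func_sum, and all values are invented for illustration). A comparison such as (drug1=1) evaluates to 0 when drug1 is missing, so the comparison-based sum returns 0 for a subject with no recorded answers, while the SUM function returns a missing value for that subject.

```sas
/* Hypothetical demo data: subject 3 has every answer missing */
data demo;
  input id drug1 drug2 drug3;
  datalines;
1 1 0 1
2 0 0 0
3 . . .
4 1 . 0
;
run;

data compare;
  set demo;
  /* Comparison-based sum: a missing value is "not equal to 1", so it adds 0 */
  bool_sum = (drug1=1) + (drug2=1) + (drug3=1);
  /* SUM function: skips missing values, returns missing if all are missing */
  func_sum = sum(of drug1-drug3);
run;

proc print data=compare;
run;
```

Subjects 1, 2 and 4 get the same result either way (2, 0 and 1). Subject 3 gets bool_sum = 0 but a missing func_sum, so only the first version counts that subject as "zero drugs" in a later PROC MEANS.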

 

If your drug variables are numeric and coded 0/1, then you can get that sum as:

new_var = sum(of drug:);

This works if those are ALL of the variables whose names start with "drug". If there are others, such as DRUG_date, they would also be picked up by the shorthand list created with the colon.

 

Or declare an array:

 

array d drug1-drug21;
new_var = sum(of d(*));

 

For added entertainment try adding:

new_var2 = mean(of d(*));

and then include new_var2 in your proc means.

 

Your current approach would have the overall mean from Proc Means as the "mean number of drugs per subject".
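
Putting those pieces together, a minimal sketch of the whole workflow might look like the following. It assumes the 21 variables are numeric, coded 0/1, and named drug1-drug21; the data set names have and want and the variable names n_drugs and pct_drugs are placeholders, not names from the original data.

```sas
data want;
  set have;                           /* 'have' is a placeholder data set name */
  n_drugs   = sum(of drug1-drug21);   /* number of medications per subject     */
  pct_drugs = mean(of drug1-drug21);  /* share of non-missing answers = "yes"  */
run;

/* Overall mean of n_drugs = average number of medications per subject. */
/* MIN and MAX double as a sanity check: n_drugs should stay within 0-21. */
proc means data=want n nmiss mean std min max;
  var n_drugs pct_drugs;
run;
```

Using the explicit list drug1-drug21 rather than drug: avoids picking up any other variable whose name happens to start with "drug".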

 

 

 

 

wj2
Quartz | Level 8

Hi ballardw,

 

Thank you for the suggestions. I ran the code I mentioned in my previous reply and got the output shown below. I'm not sure why 22 values are showing when there are only 21 variables. Any feedback on this would be much appreciated.

new_var   Frequency   Percent   Cumulative Frequency   Cumulative Percent
      1          52      1.18                     52                 1.18
      2         439      9.95                    491                11.13
      3         466     10.56                    957                21.69
      4         464     10.52                   1421                32.21
      5         510     11.56                   1931                43.77
      6         413      9.36                   2344                53.13
      7         463     10.49                   2807                63.62
      8         368      8.34                   3175                71.96
      9         352      7.98                   3527                79.94
     10         291      6.60                   3818                86.54
     11         225      5.10                   4043                91.64
     12         159      3.60                   4202                95.24
     13          92      2.09                   4294                97.33
     14          51      1.16                   4345                98.48
     15          24      0.54                   4369                99.03
     16          15      0.34                   4384                99.37
     17          11      0.25                   4395                99.61
     18           4      0.09                   4399                99.71
     19           4      0.09                   4403                99.80
     20           3      0.07                   4406                99.86
     21           1      0.02                   4407                99.89
     22           5      0.11                   4412               100.00

 
mkeintz
PROC Star

Well, this problem is begging you to look at the data!!!

 

data problem;
  set have;
  /* subsetting IF: the OF variable-list syntax is not accepted in a WHERE clause */
  if sum(of drug:) > 21;
run;

 

Or better yet (in case an observation has both a -1 and a +2 - which wouldn't produce a sum over 21):

 

data problem;
  set have;
  /* subsetting IF: the OF variable-list syntax is not accepted in a WHERE clause */
  if max(of drug:) > 1 or min(of drug:) < 0;
run;
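
One more thing worth checking alongside the data: in the expression posted earlier in the thread, the term for drug20 is written (drug20+1) rather than (drug20=1). If that is what was actually run, that term evaluates to 1 or 2 instead of 0 or 1, which would shift new_var into the range 1 to 22. That would be consistent with the frequency table, which has no 0 and a maximum of 22. A small sketch with invented data (the data set name typo_demo and the variables as_posted and intended are hypothetical):

```sas
/* Hypothetical two-row example using only three of the drug variables */
data typo_demo;
  input drug19 drug20 drug21;
  as_posted = (drug19=1) + (drug20+1) + (drug21=1);  /* arithmetic: drug20 plus 1 */
  intended  = (drug19=1) + (drug20=1) + (drug21=1);  /* comparison: 1 if drug20=1 */
  datalines;
0 0 0
1 1 1
;
run;

proc print data=typo_demo;
run;
```

In both rows as_posted comes out exactly one higher than intended (1 vs. 0, and 4 vs. 3). This is only a guess from the posted code, so checking the raw values as suggested above is still the way to confirm it.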

 

 

If Socrates were alive now, perhaps he would issue a corollary to his best-known dictum:

   Know Thy Data

PeterClemmensen
Tourmaline | Level 20

Please be more specific. If possible, provide some example data and what you want your desired outcome to look like 🙂
