BookmarkSubscribeRSS Feed
nikknock
Calcite | Level 5

I'm wondering how to sum 3 variables. Each variable either has a "yes", "no" or "maybe". I want to create a new dataset where I sum up those variables to find the total number of yes, no and maybe answers. How do you do that? Thanks so much

12 REPLIES 12
PaigeMiller
Diamond | Level 26

PROC FREQ

--
Paige Miller
nikknock
Calcite | Level 5

Shouldn't there be a sum command at some point?

Tom
Super User Tom
Super User

You aren't "summing" but instead are "counting".  PROC FREQ data=mytable; TABLES myvar ; RUN;  will produce a report of the number of observations for each value of MYVAR.  So if you variable has values yes/no/maybe then it will produce three rows of output.  The freq (or count) is what you are calling the sum.

nikknock
Calcite | Level 5

Thanks. That gave me 3 tables each giving me the count and percentage of the yes, no, maybe answers for each variable. But I want to add them up and get the total number of yes, no and maybes across all of them plus the percentage of each. There must be a way to do this w/o me doing it by hand?

PaigeMiller
Diamond | Level 26

Hi, Nikknock ... you are not adding or summing with text variables ... let's get those words out of the discussion ... we are counting.

You could use a SAS data step before PROC FREQ to convert your three variables into a single variable. Something like this:

data want;

     set have;

     value=x1; output;

     value=x2; output;

     value=x3; output;

run;

proc freq;

tables value;

run;

--
Paige Miller
Tom
Super User Tom
Super User

You need to clarify what you want to count then.

If you want to count the number of yes's in the 3 variables for this particular observation then there is not a good single command.

For three is probably easiest to just list them as in this response below from JohnW. 

You could use array processing to loop over the variables.

array resp var1-var3 ;

yes=0;

no=0;

maybe=0;

do over resp;

   if resp='yes' then yes=yes+1;

   else if resp='no' then no=no+1;

   else if resp='maybe' then maybe=maybe+1;

run;

PaigeMiller
Diamond | Level 26

nikknock wrote:

Shouldn't there be a sum command at some point?

What is the sum of five different observations that are in the following sequence: "Yes", "Yes", "No", "Yes", "Maybe"??

--
Paige Miller
JohnW_
Calcite | Level 5

Not sure if you're looking to sum across variables, but if you are, you could do something like:

 

DATA
INDATA;

INPUT VAR1 $ VAR2 $ VAR3 $;

DATALINES;

Yes No Maybe

No Yes Maybe

No No Yes

;

RUN;

DATA Counts;

SET INDATA;

Yes   = (VAR1='Yes')  +(VAR2='Yes')   +(VAR3='Yes');

No    = (VAR1='No')   +(VAR2='No')    +(VAR3='No');

Maybe = (VAR1='Maybe')+(VAR2='Maybe') +(VAR3='Maybe');

RUN;

PROC MEANS

DATA=Counts

NOPRINT;

VAR  Yes No Maybe;

OUTPUT     OUT=MEANS SUM(Yes No Maybe)=Yes No Maybe;

RUN;

JohnW_
Calcite | Level 5

Paige and Tom, those are much more elegant responses than my brute force technique!

The only other thing I will add is a data quality check...if you find (perhaps from an initial PROC FREQ as suggested above) that your data are like the following:

Yes YES yes Y

No NO no N

etc.,

it may be easiest to create a format to bin these together to cleaned up values with PROC FORMAT and apply the format in one of your steps.  You may also benefit from the TRIM function if there are extra spaces making otherwise similar looking responses show up as different rows.  If your data don't suffer from this problem, then be sure to thank the person who gave it to you!

Happy counting.

PaigeMiller
Diamond | Level 26

Mixed case shouldn't be a problem, just convert all characters to Upper (or Lower) case before you run PROC FREQ

If sometimes the word YES appears and other times the single letter Y appears (and similarly for NO and MAYBE), then you could easily truncate all strings to the first character, and count those.

The only real problem would be if there's even less consistency in the data, and YES can also appear as sí or DA or JA or OUI, then you might have to resort to formats

--
Paige Miller
Mit
Calcite | Level 5 Mit
Calcite | Level 5

Hi Nikknock

The following code works for me.

DATA INDATA;

INPUT VAR1 $ VAR2 $ VAR3 $;

DATALINES;
Yes No Maybe
No Yes Maybe
No No Yes
;

RUN;


data indata;
set indata;
id=_n_;
run;
proc sort data=indata;
by id;
run;
proc transpose data=indata out=outdata;
by id;
var VAR1   VAR2   VAR3  ;
run;
data outdata;
set outdata;
count=1;
drop id;
rename _name_=VAR;
run;
proc sort data=outdata;
by VAR col1;
;
run;
proc summary data=outdata ;
by VAR col1;
var count;
output out=want (drop=_:)sum=;
run;

and the output is


VAR


COL1


count


VAR1


No


2


VAR1


Yes


1


VAR2


No


2


VAR2


Yes


1


VAR3


Maybe


2


VAR3


Yes


1

GiridharGoutham
Calcite | Level 5

Hi Plz check this and reply back. If this is not what you want I can learn from this.

DATA a;

INPUT VAR1 $ VAR2 $ VAR3 $;

DATALINES;

Yes No Maybe

No Yes Maybe

No No Yes

;

RUN;

proc tabulate data=a out=b;

class var1 var2 var3;

table var1 var2 var3;

run;

data c;

set b;

if var1='' and var2 ne '' then var1=var2;

if var1='' and var3 ne '' then var1=var3;

drop var2 var3;

run;

proc sort data=c;by _type_;run;

options missing=0;

proc transpose data=c out=d(drop=_name_ _type_);

id var1;

by _TYPE_;

var n;

run;

hackathon24-white-horiz.png

The 2025 SAS Hackathon has begun!

It's finally time to hack! Remember to visit the SAS Hacker's Hub regularly for news and updates.

Latest Updates

What is Bayesian Analysis?

Learn the difference between classical and Bayesian statistical approaches and see a few PROC examples to perform Bayesian analysis in this video.

Find more tutorials on the SAS Users YouTube channel.

SAS Training: Just a Click Away

 Ready to level-up your skills? Choose your own adventure.

Browse our catalog!

Discussion stats
  • 12 replies
  • 7999 views
  • 2 likes
  • 6 in conversation