I'm wondering how to sum 3 variables. Each variable either has a "yes", "no" or "maybe". I want to create a new dataset where I sum up those variables to find the total number of yes, no and maybe answers. How do you do that? Thanks so much
PROC FREQ
Shouldn't there be a sum command at some point?
You aren't "summing" but instead are "counting". PROC FREQ data=mytable; TABLES myvar ; RUN; will produce a report of the number of observations for each value of MYVAR. So if you variable has values yes/no/maybe then it will produce three rows of output. The freq (or count) is what you are calling the sum.
Thanks. That gave me 3 tables each giving me the count and percentage of the yes, no, maybe answers for each variable. But I want to add them up and get the total number of yes, no and maybes across all of them plus the percentage of each. There must be a way to do this w/o me doing it by hand?
Hi, Nikknock ... you are not adding or summing with text variables ... let's get those words out of the discussion ... we are counting.
You could use a SAS data step before PROC FREQ to convert your three variables into a single variable. Something like this:
data want;
set have;
value=x1; output;
value=x2; output;
value=x3; output;
run;
proc freq;
tables value;
run;
You need to clarify what you want to count then.
If you want to count the number of yes's in the 3 variables for this particular observation then there is not a good single command.
For three is probably easiest to just list them as in this response below from JohnW.
You could use array processing to loop over the variables.
array resp var1-var3 ;
yes=0;
no=0;
maybe=0;
do over resp;
if resp='yes' then yes=yes+1;
else if resp='no' then no=no+1;
else if resp='maybe' then maybe=maybe+1;
run;
nikknock wrote:
Shouldn't there be a sum command at some point?
What is the sum of five different observations that are in the following sequence: "Yes", "Yes", "No", "Yes", "Maybe"??
Not sure if you're looking to sum across variables, but if you are, you could do something like:
DATA
INDATA;
INPUT VAR1 $ VAR2 $ VAR3 $;
DATALINES;
Yes No Maybe
No Yes Maybe
No No Yes
;
RUN;
DATA Counts;
SET INDATA;
Yes = (VAR1='Yes') +(VAR2='Yes') +(VAR3='Yes');
No = (VAR1='No') +(VAR2='No') +(VAR3='No');
Maybe = (VAR1='Maybe')+(VAR2='Maybe') +(VAR3='Maybe');
RUN;
PROC MEANS
DATA=Counts
NOPRINT;
VAR Yes No Maybe;
OUTPUT OUT=MEANS SUM(Yes No Maybe)=Yes No Maybe;
RUN;
Paige and Tom, those are much more elegant responses than my brute force technique!
The only other thing I will add is a data quality check...if you find (perhaps from an initial PROC FREQ as suggested above) that your data are like the following:
Yes YES yes Y
No NO no N
etc.,
it may be easiest to create a format to bin these together to cleaned up values with PROC FORMAT and apply the format in one of your steps. You may also benefit from the TRIM function if there are extra spaces making otherwise similar looking responses show up as different rows. If your data don't suffer from this problem, then be sure to thank the person who gave it to you!
Happy counting.
Mixed case shouldn't be a problem, just convert all characters to Upper (or Lower) case before you run PROC FREQ
If sometimes the word YES appears and other times the single letter Y appears (and similarly for NO and MAYBE), then you could easily truncate all strings to the first character, and count those.
The only real problem would be if there's even less consistency in the data, and YES can also appear as sí or DA or JA or OUI, then you might have to resort to formats
Hi Nikknock
The following code works for me.
DATA INDATA;
INPUT VAR1 $ VAR2 $ VAR3 $;
DATALINES;
Yes No Maybe
No Yes Maybe
No No Yes
;
RUN;
data indata;
set indata;
id=_n_;
run;
proc sort data=indata;
by id;
run;
proc transpose data=indata out=outdata;
by id;
var VAR1 VAR2 VAR3 ;
run;
data outdata;
set outdata;
count=1;
drop id;
rename _name_=VAR;
run;
proc sort data=outdata;
by VAR col1;
;
run;
proc summary data=outdata ;
by VAR col1;
var count;
output out=want (drop=_:)sum=;
run;
and the output is
VAR | COL1 | count |
VAR1 | No | 2 |
VAR1 | Yes | 1 |
VAR2 | No | 2 |
VAR2 | Yes | 1 |
VAR3 | Maybe | 2 |
VAR3 | Yes | 1 |
Hi Plz check this and reply back. If this is not what you want I can learn from this.
DATA a;
INPUT VAR1 $ VAR2 $ VAR3 $;
DATALINES;
Yes No Maybe
No Yes Maybe
No No Yes
;
RUN;
proc tabulate data=a out=b;
class var1 var2 var3;
table var1 var2 var3;
run;
data c;
set b;
if var1='' and var2 ne '' then var1=var2;
if var1='' and var3 ne '' then var1=var3;
drop var2 var3;
run;
proc sort data=c;by _type_;run;
options missing=0;
proc transpose data=c out=d(drop=_name_ _type_);
id var1;
by _TYPE_;
var n;
run;
Build your skills. Make connections. Enjoy creative freedom. Maybe change the world. Registration is now open through August 30th. Visit the SAS Hackathon homepage.
Register today!Learn the difference between classical and Bayesian statistical approaches and see a few PROC examples to perform Bayesian analysis in this video.
Find more tutorials on the SAS Users YouTube channel.