Hi all,
I have about 14 variables in total and if 2 or more of those variables are answered "Yes", I need to create a new variable to define a disorder.
In the past, I have used if then statements to define a new variable but because there are so many possibilities, with several combinations, I was wondering if there is a faster way of coding this?
Thank you in advance!
All the variables are stored in one data set
I am not familiar with this SAS code. Can you please elaborate on what should go inside the " "? Is this code used with PROC PRINT?
X = catx(" ", of var1-var14);
N_Yes = count(x, "Yes");
Nothing is to be added. This runs in a data step, not PROC PRINT.
The first statement concatenates all the responses into a single variable, separated by spaces:
CATX(" ", of var1-var14);
X will look like: No Yes No Yes No No No etc.
The second statement counts the number of times Yes appears in the string. It uses COUNT rather than COUNTW because the second argument of COUNTW is a list of delimiter characters, not a word to search for:
COUNT(X, "Yes");
For the example string above it should return 2.
Please check the documentation on the functions to see what they do in more detail.
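For anyone following along, here is a minimal, self-contained sketch of that approach. The data set names (have, want), the three variables var1-var3 and the disorder flag are made up for illustration, and it assumes the responses are character values spelled exactly "Yes" or "No". It uses COUNT, which counts occurrences of a substring.

```sas
/* Hypothetical test data: three character Yes/No responses per row */
data have;
   input id $ var1 $ var2 $ var3 $;
   datalines;
A Yes No Yes
B No No No
C Yes Yes Yes
;
run;

data want;
   set have;
   length x $ 200;
   /* Join the responses into one space-separated string */
   x = catx(' ', of var1-var3);
   /* COUNT returns how many times the substring 'Yes' occurs */
   n_yes = count(x, 'Yes');
   /* 1 when two or more responses are Yes, otherwise 0 */
   disorder = (n_yes >= 2);
run;
```

With these three rows, n_yes comes out as 2, 0 and 3, so disorder is 1, 0 and 1. COUNT is case-sensitive by default; adding the 'i' modifier as a third argument makes it ignore case.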
Actually, you should provide some example data. "Yes" could be coded in quite a few different ways. Also, are all of the variables involved the same type (all numeric or all character), and are they all coded the same way?
They are all numeric, I have attached a screenshot of the variables
If all of the numeric variables are to be included in the comparison, simply use an array with _numeric_.
data have;
   input ID $ ABC BCD CDE DEF;
   datalines;
A 1 0 0 0
B 0 1 1 1
C 0 0 0 0
D 0 1 1 1
;
run;
data want;
   set have;
   array num{*} _numeric_;
   if sum(of num(*))>2 then new_var="Yes";
   else new_var="No";
run;
If you have an extra numeric variable, such as ID, that needs to be ignored by the SUM function, then you need to list the variables in the array explicitly. The double dash (--) is useful here: it selects all variables between the first and last names you type, in the order they exist in the data set.
data have;
   input ID ABC BCD CDE DEF;
   datalines;
101 1 0 0 0
102 0 1 1 1
103 0 0 0 0
104 0 1 1 1
;
run;
data want;
   set have;
   array num{*} ABC--DEF /* Order in which they exist in dataset */ ;
   if sum(of num(*))>2 then new_var="Yes";
   else new_var="No";
run;
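Since the double dash depends on the stored order of the variables rather than on their names, it can help to check that order before relying on it. A small sketch, reusing the made-up variable names from the example above; the n_in_range variable is only there for illustration:

```sas
/* Same hypothetical layout: ID first, then the four indicators */
data have;
   input ID ABC BCD CDE DEF;
   datalines;
101 1 0 0 0
102 0 1 1 1
;
run;

/* VARNUM lists variables in their stored order, not alphabetically */
proc contents data=have varnum;
run;

/* Confirm how many variables the ABC--DEF range picks up */
data _null_;
   set have(obs=1);
   array num{*} ABC--DEF;
   n_in_range = dim(num);
   put n_in_range=;   /* writes n_in_range=4 to the log */
run;
```

If the count in the log is not what you expect, the variables are probably stored in a different order than you assumed, and listing them by name in the ARRAY statement is the safer choice.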
The OP is asking for "2 or more", so instead of > 2 you would use
if sum(of num(*)) GE 2 then new_var="Yes";
Though I would probably use:
newvar = ( sum(of num(*)) ge 2 );
to get a numeric dichotomous 1/0 coded variable. The data description provided indicates that 1/0 coding should not be unfamiliar in this project, and with it certain summary statistics are easy to get (the sum is the number of Yes, the mean is the proportion of Yes).
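To make that concrete, here is a small sketch of the whole thing end to end. The data and the names have, want, ABC through DEF are made up for illustration, and it assumes the indicators are coded 1 for Yes and 0 for No.

```sas
/* Hypothetical data: four 1/0 indicators, 1 = Yes */
data have;
   input ID ABC BCD CDE DEF;
   datalines;
101 1 0 0 0
102 0 1 1 1
103 0 0 0 0
104 1 1 0 0
;
run;

data want;
   set have;
   array num{*} ABC--DEF;
   /* A comparison evaluates to 1 (true) or 0 (false) */
   newvar = (sum(of num(*)) ge 2);
run;

/* SUM = number of Yes rows, MEAN = proportion of Yes rows */
proc means data=want n sum mean;
   var newvar;
run;
```

Here newvar is 0, 1, 0, 1 for the four rows, so PROC MEANS reports a sum of 2 and a mean of 0.5.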
Thank you so much!
I am trying to run this code, but for some reason "sum" is not in blue
data merged11; set merged10;
array num{*} CIINTERF--CIPROBLM;
if sum(of num(*)) >= 2 then tobdodr = 1;
else tobdodr = 0;run;
is something wrong with my code?
The syntax highlighter does not automatically highlight most numeric functions in a data step. Possibly this decision on the part of the SAS designers is because so many people routinely create variables named sum, count, min, max and who knows what else. So don't let non-highlighted functions worry you too much. If you misspell one, such as summ( of ...), then you may get a message like
ERROR 68-185: The function SUMM is unknown, or cannot be accessed.
with underscores indicating where you used the misspelled function.