kgrover
Calcite | Level 5

Hi all,

 

I have about 14 variables in total and if 2 or more of those variables are answered "Yes", I need to create a new variable to define a disorder. 

 

In the past, I have used IF-THEN statements to define a new variable, but because there are so many possible combinations, I was wondering whether there is a faster way to code this.

 

Thank you in advance!

 

Reeza
Super User
How are your variables stored? There are a few options. If they are character variables, one is to concatenate all the results and use COUNT to count the number of times Yes appears.

X = catx(' ', of var1-var14);
N_Yes = count(X, 'Yes');
kgrover
Calcite | Level 5

All the variables are stored in one data set.

I am not familiar with this SAS code. Can you please elaborate on what is to be indicated in the ' '? Does this code get used with PROC PRINT?

X = catx(' ', of var1-var14);
N_Yes = count(X, 'Yes');

Reeza
Super User

Nothing needs to be added; this runs in a DATA step, not PROC PRINT.

CATX concatenates all the responses into a single variable, separated by a space:

X = catx(' ', of var1-var14);

X will look like: No Yes No Yes No No No etc.

COUNT then counts the number of times Yes appears in the string:

N_Yes = count(X, 'Yes');

For the example above it should return 2.

Please check the documentation on the functions to see what they do in more detail.
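
As a minimal sketch of the concatenate-and-count idea in a complete DATA step: the data set names have and want, the variables var1-var3 (standing in for the 14 real variables), and the flag name disorder are all made up for illustration, and the sketch assumes the responses are character values spelled exactly Yes and No. COUNT is used because it counts occurrences of a substring, whereas the second argument of COUNTW is a list of delimiter characters.

```sas
/* Hypothetical test data: var1-var3 stand in for the 14 character variables */
data have;
   input id $ var1 $ var2 $ var3 $;
   datalines;
A Yes No Yes
B No No Yes
C Yes Yes Yes
;
run;

data want;
   set have;
   length x $200;
   x = catx(' ', of var1-var3);   /* e.g. "Yes No Yes" for id A */
   n_yes = count(x, 'Yes');       /* occurrences of the substring Yes (case-sensitive) */
   disorder = (n_yes >= 2);       /* 1 if two or more Yes, otherwise 0 */
   drop x;
run;
```

With this test data, n_yes is 2, 1 and 3 for ids A, B and C, so disorder is 1, 0 and 1. If the real values differ in case, the 'i' modifier of COUNT is one way to ignore it.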

 



ballardw
Super User

Actually you should provide some example data. "Yes" could be coded a moderately large number of ways. Also are all of your variables involved the same type: all numeric or all character? All coded the same?
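
To illustrate why the coding matters, here is a rough sketch for one possible scheme, assuming numeric items coded 1 = Yes and 2 = No (an assumption, not something stated in this thread). Summing the raw values would not work with that coding, so the sketch counts matches in a loop instead. The names have, want, q1-q4, n_yes and disorder are hypothetical.

```sas
/* Assumption: q1-q4 are hypothetical numeric items coded 1 = Yes, 2 = No */
data have;
   input id q1-q4;
   datalines;
101 1 2 2 1
102 2 2 2 1
103 2 2 . 2
;
run;

data want;
   set have;
   array q{*} q1-q4;
   n_yes = 0;
   do i = 1 to dim(q);
      /* the comparison evaluates to 1 when true and 0 when false or missing */
      n_yes = n_yes + (q{i} = 1);
   end;
   disorder = (n_yes >= 2);   /* 1 if two or more Yes, otherwise 0 */
   drop i;
run;
```

Here n_yes is 2, 1 and 0 for ids 101, 102 and 103, so only 101 is flagged. If the items are already coded 1/0, a plain SUM over the array does the same job.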

kgrover
Calcite | Level 5

They are all numeric. I have attached a screenshot of the variables.

SuryaKiran
Meteorite | Level 14

If all the numeric variables in the data set are to be included in the comparison, simply use an array with _numeric_.

 

data have;
input ID $ ABC BCD CDE DEF;
datalines ;
A 1 0 0 0
B 0 1 1 1
C 0 0 0 0
D 0 1 1 1
;
run;

data want;
set have;
array num{*} _numeric_;
if sum(of num(*))>2 then new_var="Yes";
else new_var="No";
run;

If the data set has an extra numeric variable, such as ID, that must be left out of the SUM function, list the variables explicitly in the array definition. A double dash (--) specifies all variables between the first and last names given, in the order they exist in the data set.

data have;
input ID ABC BCD CDE DEF;
datalines ;
101 1 0 0 0
102 0 1 1 1
103 0 0 0 0
104 0 1 1 1
;
run;

data want;
set have;
array num{*} ABC--DEF /* Order in which they exist in dataset*/ ;
if sum(of num(*))>2 then new_var="Yes";
else new_var="No";
run;

 

 

 

Thanks,
Suryakiran
ballardw
Super User

@SuryaKiran

The OP is asking for "2 or more", so instead of > 2 you would use

if sum(of num(*)) GE 2 then new_var="Yes";

Though I would probably use:

newvar= ( sum(of num(*)) ge 2 );

to get a numeric dichotomous 1/0 coded variable. The data description provided indicates that this coding should not be unfamiliar in this project, and it makes certain summary statistics easy to get (the sum is the number of Yes, the mean is the proportion of Yes).
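
A short sketch of that idea end to end, using made-up data in the same layout as the earlier example (the values and the names have and want are illustrative only): the comparison itself produces the 1/0 flag, and PROC MEANS then reports how many records are flagged and what share of the total they are.

```sas
/* Hypothetical data: four 1/0 items per ID */
data have;
   input id abc bcd cde def;
   datalines;
101 1 0 0 0
102 0 1 1 1
103 0 0 0 0
104 0 1 1 0
;
run;

data want;
   set have;
   array num{*} abc--def;            /* variables from abc through def, in data set order */
   newvar = (sum(of num(*)) ge 2);   /* 1 if two or more items are 1, otherwise 0 */
run;

/* SUM = number of records flagged, MEAN = proportion flagged */
proc means data=want n sum mean;
   var newvar;
run;
```

With these four records, newvar is 0, 1, 0 and 1, so the sum is 2 and the mean is 0.5. Note that SUM ignores missing items; a record where every item is missing gets newvar = 0, which may or may not be what you want.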

 

kgrover
Calcite | Level 5

Thank you so much! 

 

I am trying to run this code, but for some reason "sum" is not in blue 

 

data merged11;
set merged10;
array num{*} CIINTERF--CIPROBLM;
if sum(of num(*)) >= 2 then tobdodr = 1;
else tobdodr = 0;
run;

 

Is something wrong with my code?

ballardw
Super User


The syntax highlighter does not automatically highlight most actual numeric functions in a data step. Possibly this decision on the part of the SAS designers is because so many people routinely create variables named sum, count, min, max and who knows what else. So don't let non-highlighted functions worry you too much. If you misspell one, such as summ( of ...), then you may get a message

ERROR 68-185: The function SUMM is unknown, or cannot be accessed.

with underscores indicating where you used the misspelled function.

 
