Hi, I have a lots of variables, q1 to q20 let's say. And I want to create a new variable by several conditions based on the 20 variables:
1. if any one of the 20 variables > 0 then flag=1; (as long as anyone of them > 0)
2.else if any one of the 20 variables = -7 or -8 or -9 then flag=.; (for the rest of the values which <=0)
3. else if ALL of the 20 variables = 0 then flag=0; (for the rest of the values which =0)
For example
q1 q2 q3 ......... q20 flag
20 0 0 0 1
20 20 0 0 1
20 0 -7 0 1
0 0 0 0 0
0 -8 0 0 .
-8 -7 -7 0 .
Because there are too many variables, I tried array:
flag=0;
array q q1-q20;
do over q;
if q > 0 then flag=1;
else if q in ('7', '8', '9') then flag=.;
end;
run;
But the else if here didn't work. Does anyone know how to deal with many variables and multiple conditions? Thank you!
data want;
set have;
array q q1-20;
/* Condition 1 */
if max(of q:)>0 then flag=1;
/* Condition 2 */
else do i=1 to 20;
if q(i) in (-9 -8 -7) then flag=.;
leave;
end;
/* Condition 3 */
else if max(of q:)=0 and min(of q:)=0 then flag=0;
run;
I wonder if there is a flaw in your logic. If q1 = -1 and all the other q's are equal to zero, then none of the conditions execute. The code mirrors your logic. However, this gives a value of FLAG equal to missing, is that what you want?
Try this
data have;
q1=1; q2=0; q3= 2; q4=3; q5=4; output;
q1=0; q2=0; q3= 0; q4=0; q5=0; output;
q1=0; q2=0; q3=-9; q4=0; q5=-7; output;
run;
/* This should work with your data set of 20 vars */
data want;
set have;
if (sum(of q:)=0) then flag=0;
else if (max(of q:) > 0) then flag=1;
else if (sum(whichn(-9, of q:),whichn(-8, of q:),whichn(-7, of q:) ) > 0) then flag=-1;
run;
@SAS-questioner wrote:
Could you be more specific about the flaw? if q1= -7 or -8 or -9 and all the other q's are equal to 0, then flag should be missing. I tried your code, but error said "ERROR 160-185: No matching IF-THEN clause" for the last else there.
What happens if your logic encounters q1 = -1 (not negative seven) and the rest of the q values are zeros? It also is missing according to your stated logic.
When you get an error in the log, show us the entire log for this data step. Do not show us a partial log, and do not show us just the error messages.
Below option should work even if the sum of all q vars should result in zero (obs 7 in sample data).
data have;
length flag_want 8;
infile datalines truncover;
input (q1-q20 flag_want) (:best32.);
datalines;
20 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 1
20 20 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 1
20 0 -7 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 1
0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
0 -8 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 .
-8 -7 -7 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 .
-7 1 1 1 1 1 1 1 0 0 0 0 0 0 0 0 0 0 0 0 1
;
run;
data want;
length flag_derived 8;
set have;
array a_q (*) q:;
do i=1 to dim(a_q);
/* 1. if any one of the 20 variables > 0 then flag=1 */
if a_q[i]>0 then do; flag_derived=1; leave; end;
/* 2.else if any one of the 20 variables = -7 or -8 or -9 then flag=. */
else
if a_q[i] in (-7,-8,-9) then flag_derived=sum(flag_derived,2);
/* 3. else if ALL of the 20 variables = 0 then flag=0 */
else
if a_q[i]=0 then flag_derived=sum(flag_derived,0);
end;
if flag_derived>1 then call missing(flag_derived);
drop i;
run;
proc print data=want;
run;
Does not look like you need to use looping. Instead just use functions that can operate on multiple variables.
data have;
input q1-q20 flag_want;
datalines;
20 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 1
20 20 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 1
20 0 -7 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 1
0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
0 -8 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 .
-8 -7 -7 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 .
-7 1 1 1 1 1 1 1 0 0 0 0 0 0 0 0 0 0 0 0 1
;
data want;
set have;
if max(0,of q1-q20) > 0 then flag=1;
else if whichn(-7,of q1-q20)
or whichn(-8,of q1-q20)
or whichn(-9,of q1-q20)
then flag=.;
else if 0=min(0,of q1-q20)=max(0,of q1-q20) then flag=0;
run;
And perhaps if those special values, -7, -8 and -9 are the only valid values that are less then zero then the second IF can be simpler. For example you could use MIN() function.
else min(0,of q1-q20)<0 then flag=.;
Or if -7, -8 and -9 are really just different special missing codes if you actually code the data using special missing values (instead of valid numbers) you could use the NMISS() function. For example by using .A, .B, and .C instead of -7, -8 and -9.
missing abc ;
data have;
input q1-q20 flag_want;
datalines;
20 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 1
20 20 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 1
20 0 A 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 1
0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
0 B 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 .
B A A 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 .
A 1 1 1 1 1 1 1 0 0 0 0 0 0 0 0 0 0 0 0 1
;
data want;
set have;
if max(0,of q1-q20) > 0 then flag=1;
else if 0<nmiss(of q1-q20) then flag=.;
else if 0=min(0,of q1-q20)=max(0,of q1-q20) then flag=0;
run;
If you really want to use looping try these loops.
data want;
set have ;
array q q1-q20;
flag=0;
do index=1 to dim(q) while(flag=0);
if 0 < q[index] then flag=1;
end;
do index=1 to dim(q) while(flag=0);
if q[index] in (-7 -8 -9) then flag=.;
else if q[index] ne 0 then put 'WARNING: Unexpected value. ' q[index]=;
end;
drop index;
run;
data have;
input q1-q20 ;
datalines;
20 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 1
20 20 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 1
20 0 -7 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 1
0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
0 -8 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 .
-8 -7 -7 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 .
-7 1 1 1 1 1 1 1 0 0 0 0 0 0 0 0 0 0 0 0 1
;
data want;
set have;
array x{*} _numeric_ ;
if max(of x{*})>0 then flag=1;
else if ((-7 in x) or (-8 in x) or (-9 in x)) and max(of x{*})<=0 then flag=.;
else if max(of x{*})=0 and min(of x{*})=0 then flag=0;
run;
@Ksharp Neat. Not sure if I was ever aware of syntax like (-7 in x). That's certainly useful.
Hi @SAS-questioner , does this answer your question?
data have;
input q1 q2 q3 q4 q5 q6;
datalines;
20 0 0 4 5 0
20 20 0 -1 -2 0
20 0 -7 1 2 0
0 0 0 0 0 0
0 -8 0 -1 -3 0
-8 -7 -7 -1 -1 0
1 2 3 0 -1 -20
-9 20 1 2 3 1
-7 -1 0 0 -8 -9
;
run;
proc print data=have;run;
data want (drop=i);
set have;
array q[*] q:;
do i=1 to dim(q);
if q[i]>0 then flag=1;
else if q[i] in (-7,-8,-9)
then flag=.;
else if max(of q:)=0 and
min(of q:)=0
then flag=0;
end;
run;
proc print data=want;run;
And @SAS-questioner , according to your if/else condition, if any row has all values<=0, the flag of that row is missing, regardless there is -7, -8, and -9 in that row or not. I think this is also the question @PaigeMiller prompted. See the example:
data have;
input q1 q2 q3 q4 q5 q6;
datalines;
20 0 0 4 5 0
20 20 0 -1 -2 0
20 0 -7 1 2 0
0 0 0 0 0 0
0 -8 0 -1 -3 0
-8 -7 -7 -1 -1 0
1 2 3 0 -1 -20
-9 20 1 2 3 1
-7 -1 0 0 -8 -9
-1 -2 -3 -4 -5 -6
0 -1 -2 -3 0 -1
;
run;
proc print data=have;run;
data want (drop=i);
set have;
array q[*] q:;
do i=1 to dim(q);
if q[i]>0 then flag=1;
else if q[i] in (-7,-8,-9)
then flag=.;
else if max(of q:)=0 and
min(of q:)=0
then flag=0;
end;
run;
proc print data=want;run;
Calling all data scientists and open-source enthusiasts! Want to solve real problems that impact your company or the world? Register to hack by August 31st!
Learn how use the CAT functions in SAS to join values from multiple variables into a single value.
Find more tutorials on the SAS Users YouTube channel.
Ready to level-up your skills? Choose your own adventure.