I have a small data set like this on the bottom.
The logic is: if any of the variable (S1-S5) contains 0, the sex is defined as 'M', otherwise as 'F'.
My question is: is there any way we can replace the repeated if statements with a more efficient coding style?
It will be very helpful when the variable goes from S1 to S200 something.
Thanks.
*;
data one;
input S1-S5 var2;
datalines;
0 . . . . 11
. 0 . . . 22
. . 0 . . 33
1 . . . . 44
. . . . 0 55
;
data two;
set one;
if S1=0 then sex='M';
else if S2=0 then sex='M';
else if S3=0 then sex='M';
else if S4=0 then sex='M';
else if S5=0 then sex='M';
else sex='F';
run;
*;
data one;
input S1-S5 var2;
datalines;
0 . . . . 11
. 0 . . . 22
. . 0 . . 33
1 . . . . 44
. . . . 0 55
;
data want(drop=i);
set one;
sex = 'F';
array s(*) s1-s5;
do i=1 to dim(s);
if s(i) = 0 then sex='M';
end;
run;
If you only have 5 I wouldn't even bother with an array statement. By default the max/min/sum functions ignore missing values so something like the following can work.
I've included for the case where all may be missing with an else clause.
data two;
set one;
if max(s1,s2,s3,s4,s5)=0 then sex='M';
else if max(s1,s2,s3,s4,s5)=1 then sex='F';
else sex='U';
run;
data one;
input S1-S5 var2;
datalines;
0 . . . . 11
. 0 . . . 22
. . 0 . . 33
1 . . . . 44
. . . . 0 55
;
data want(drop=i);
set one;
sex = 'F';
array s(*) s1-s5;
do i=1 to dim(s);
if s(i) = 0 then sex='M';
end;
run;
how about
if s1 * s2 * s3 * s4 * s5 = 0 then set='M';
else sex='F';
This will generate missing values but on the same lines we can do something like this.
data want;
set one;
if sum(of s:) = 0 then sex='M';
else sex='F';
run;
Hi, ALL,
Thanks for you quick help on this topic.
I personally think the array one works the best for my situation and beyond (e.g. when you have character variables instead of numeric ones in this case). All other responses helped to solve my problem. Thank you all.
Joe
DBailey suggested:
how about
if s1 * s2 * s3 * s4 * s5 = 0 then sex='M';
else sex='F';
Sorry, but this doesn't work. The product of a missing value with anything is missing:
data one;
input S1-S5 var2;
datalines;
0 . . . . 11
. 0 . . . 22
. . 0 . . 33
1 . . . . 44
. . . . 0 55
;
data two;
set one;
if s1 * s2 * s3 * s4 * s5 = 0 then sex='M';
else sex='F';
run;
proc print data= two;
run;
Result:
Obs | S1 | S2 | S3 | S4 | S5 | var2 | sex |
1 | 0 | . | . | . | . | 11 | F |
2 | . | 0 | . | . | . | 22 | F |
3 | . | . | 0 | . | . | 33 | F |
4 | 1 | . | . | . | . | 44 | F |
5 | . | . | . | . | 0 | 55 | F |
If you think arrays would be your best choice then I'd suggest using something like:
data two;
set one;
array s(*) $ s1-s5;
if '0' in s then sex='M';
else sex='F';
run;
Thanks for Arthur's additional comments.
I really learned something from you guys.
Are you ready for the spotlight? We're accepting content ideas for SAS Innovate 2025 to be held May 6-9 in Orlando, FL. The call is open until September 25. Read more here about why you should contribute and what is in it for you!
Learn how use the CAT functions in SAS to join values from multiple variables into a single value.
Find more tutorials on the SAS Users YouTube channel.