Hi SAS users,
Could you please help me simplify this code? I need to create a vector of 21 variables that take a value of 1 if a subject dropped out of the study, and 0 otherwise. I currently use the code below, but it is silly and inefficient. The repetitive pattern suggests that I can use a loop.
data want;
set have;
cens_1 = 0;
if min(of cd4_2-cd4_21) < 0 then cens_2 = 1; /* If all values of cd4 from visit 2 to 21 are missing, then cens_2 takes a value of 1 */
else cens_2 = 0;
if min(of cd4_3-cd4_21) < 0 then cens_3 = 1;
else cens_3 = 0;
if min(of cd4_4-cd4_21) < 0 then cens_4 = 1;
else cens_4 = 0;
if min(of cd4_5-cd4_21) < 0 then cens_5 = 1;
else cens_5 = 0; etc...
run;
I tried a do loop:
data want;
set have;
array cd4(21) cd4_1-cd4_21;
array cens(21) cens_1-cens_21;
do i = 1 to 21;
if min(of cd4(i) - cd4_21) < 0 then cens(i) = 1;
else cens(i) = 0;
end;
run;
The problem is in the expression (of cd4(i) - cd4_21). SAS says: ERROR: Missing numeric suffix on a numbered variable list (NAME-cd4_21).
I would appreciate any help!
Thanks!
See if this does what you are looking for. I only used 5 values to make smaller example that could actually be manually inspected.
data junk;
array cd4(5);
do i=1 to 5;
cd4[i]= rand('uniform');
output;
end;
run;
data want;
set junk;
array cd4(5);
array cens(5);
array t(5) _temporary_;
do i= 1 to 5;
call missing(of t(*));
do j= i to 5;
t[j]= cd4[j];
end;
cens[i] = (min(of t(*)) < 0);
end;
drop i j;
run;
You have discovered one of the odd things about the SAS MIN, MAX and related functions when working with arrays: the OF arrayref doesn't like to be mixed with anything, much less potential operators like -.
The above code copies the CD4 values into a temporary array, which allows using (of array(*)) to inspect all values of the array.
CALL MISSING is used to reset the array. The temporary array is NOT written out to the data set. Care needs to be used with temporary arrays as the values will by default persist as if RETAINED unless reset.
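To illustrate the point about temporary arrays persisting, here is a minimal sketch. The data set name demo and the variables x and carried are made up for this illustration. The temporary array element is assigned only on the first observation, yet its value is still there on later observations because it is never reset.

```sas
/* Hypothetical demo: a _TEMPORARY_ array keeps its values across observations */
data demo;
  input x;
  array t(1) _temporary_;
  if _n_ = 1 then t[1] = x;   /* assigned on the first observation only */
  carried = t[1];             /* still holds the first x on every later row */
  datalines;
10
20
30
;
run;

proc print data=demo; run;    /* carried is 10 on all three rows */
```

This is why the code above calls CALL MISSING at the top of each pass of the outer loop.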
If you are searching for missing values, though, I would suggest using
cens[i] = missing(min(of t(*))) instead of "< 0", just in case.
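As a rough sketch of why the two tests can differ, assume a negative value somehow ends up in the data (CD4 counts should not be negative, so this is purely illustrative). The variable names lt0 and miss are made up for the example.

```sas
/* Hypothetical demo: "< 0" versus MISSING() when a negative value is present */
data _null_;
  array t(3) _temporary_;           /* elements start out missing */
  t[2] = -5;                        /* one nonmissing, negative value */
  lt0  = (min(of t(*)) < 0);        /* MIN ignores missings and returns -5, so this is 1 */
  miss = missing(min(of t(*)));     /* -5 is not missing, so this is 0 */
  put lt0= miss=;                   /* log shows: lt0=1 miss=0 */
run;
```

With "< 0" the row would be flagged even though not all values are missing; MISSING() flags it only when MIN has nothing nonmissing to return.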
Note that
(min(of t(*)) < 0)
is a logical comparison that returns 1 when true and 0 when false, which means you can get rid of the whole bunch of IF/THEN/ELSE statements.
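A small sketch of that equivalence, using made-up variable names (x, flag_if, flag_expr). It also relies on the fact that SAS sorts a missing value below any number, which is why a "< 0" test catches missing values at all.

```sas
/* Hypothetical demo: IF/THEN/ELSE versus assigning the comparison result directly */
data _null_;
  x = .;                            /* missing compares as less than any number */
  if x < 0 then flag_if = 1;
  else flag_if = 0;
  flag_expr = (x < 0);              /* same result in one statement */
  put flag_if= flag_expr=;          /* log shows: flag_if=1 flag_expr=1 */
run;
```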
I am trying to get the variable cens(t) to take a value of 1 at visit t if the value of cd4(t) was missing at this visit and onward, and if cens(t)=1 then all subsequent values of cens(t) should be 1. This is what my current code is doing. But the problem is, I have to write the same two lines 20 times:
if min(of cd4_2-cd4_21) < 0 then cens_2 = 1;
else cens_2 = 0; .......
I thought that a do loop would be a better choice, but cannot get mine working.
data example;
input cd4_1 cd4_2 cd4_3 cd4_4 cd4_5 ;
cards ;
340 . 440 320 278
1750 1600 1500 1600 1800
334 363 . 507 502
620 720 550 660 .
573 590 . . .
800 860 . . 900
390 . . . .
806 622 522 . .
1324 1060 1140 750 1500
700 730 830 880 750 220
;
run;
proc print data=example; run;
A more efficient approach would minimize looping and computations. Consider:
data want;
set have;
cens_1=0;
if min(of cd4_5 - cd4_21) < 0 then cens_5=1;
else cens_5=0;
/* A visit is censored only if it is missing AND every later visit is censored too */
if cens_5 = 1 and cd4_4 < 0 then cens_4=1;
else cens_4=0;
if cens_4 = 1 and cd4_3 < 0 then cens_3=1;
else cens_3=0;
if cens_3 = 1 and cd4_2 < 0 then cens_2=1;
else cens_2=0;
run;
I can't vouch for the accuracy of the logic, but these statements should replicate the values that your current program generates. If the logic looks right, and if you really need 21 CENS_ values computed, we can look at using arrays to shorten the coding burden.
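One way the array version might look is sketched below, cut down to 5 visits so the result can be checked by hand. It walks backwards from the last visit, so each CENS_ value needs only the current CD4_ value and the CENS_ value after it. This is a sketch under assumptions, not a tested drop-in: it uses MISSING() rather than "< 0", it forces cens_1 to 0 as the original program does, and the data set name example and its three rows are borrowed from the sample data posted in this thread.

```sas
/* Small self-contained input: three subjects, five visits */
data example;
  input cd4_1-cd4_5;
  datalines;
573 590 . . .
800 860 . . 900
390 . . . .
;
run;

data want;
  set example;
  array cd4_(5);                    /* refers to cd4_1-cd4_5 */
  array cens_(5);                   /* creates cens_1-cens_5 */
  /* Last visit: censored if its own value is missing */
  cens_[5] = missing(cd4_[5]);
  /* Earlier visits: censored only if missing here and censored at the next visit */
  do i = 4 to 2 by -1;
    cens_[i] = (missing(cd4_[i]) and cens_[i + 1] = 1);
  end;
  cens_[1] = 0;                     /* first visit is never censored */
  drop i;
run;

proc print data=want; run;
```

For 21 visits, the array dimensions would become 21 and the loop would run from 20 to 2. The second subject shows why the direction matters: visits 3 and 4 are missing but visit 5 is not, so no CENS_ value is set to 1.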
Thank you!
I just tried your code, and it produces only zero values in cens_1-cens_21 variables.
When I run my code on my data the result looks like:
Obs  cd41     cd42     cd43     cd44     cd45     cens1  cens2  cens3  cens4  cens5
1    0.16056  .        .        .        .        0      1      1      1      1
2    0.16056  0.38588  .        .        .        0      0      1      1      1
3    0.16056  0.38588  0.24446  .        .        0      0      0      1      1
4    0.16056  0.38588  0.24446  0.27629  .        0      0      0      0      1
5    0.16056  0.38588  0.24446  0.27629  0.14308  0      0      0      0      0
Which shows 1 when the corresponding values are all missing.
So perhaps your data is not as described or you haven't completely described your problem.
You should also show the actual code that you ran, taken from the log. Copy the code and any messages from the log and paste them into a code box.
I changed some variable names because I don't like that many _ characters. Did you forget to change my cd4 to cd4_, or my cens to cens_?
Yes, sorry. I made a mistake in your code before - that's why it didn't work.
It works perfectly now. Thank you so much! This is such a better solution than writing conditions for each of 21 variables!
data want;
set have;
array cd4_(21);
array cens(21);
array t(21) _temporary_;
do i= 1 to 21;
call missing(of t(*));
do j= i to 21;
t[j]= cd4_[j];
end;
cens[i] = (min(of t(*)) < 0);
end;
drop i j;
run;