I am trying to use an array to tag various diagnosis variables, but only need the first part of the value to do the tagging. All my variables are in CHAR format.
I am trying to use the substring function within the array, but it is running into some errors.
Some values of the diag code are V1234, 25011, or 41402. I only need to tag values based on the first three character values: V12, 250, 414.
data want;
set have;
Label diag_tag = "diagnosis tag";
array D_DIAG {8} diag_2 - diag_9;
do i = 1 to 8;
if substr(d_diag,1,3) = "V12" then diag_tag = 1 ;
end;
if diag_tag = "" then diag_tag = 0;
run;
@pbhatt wrote:
I am trying to use the substring function within the array, but it is running into some errors.
Whenever you get errors, you need to show us the entire log for this DATA step (if that's where the error is), or show us the incorrect output (if that's where the error is) and explain what you expect to see.
@pbhatt wrote:
I am trying to use an array to tag various diagnosis variables, but only need the first part of the value to do the tagging. All my variables are in CHAR format.
I am trying to use the substring function within the array, but it is running into some errors.
Some values of the diag code are V1234, 25011, or 41402. I only need to tag values based on the first three character values: V12, 250, 414.
data want;
set have;
Label diag_tag = "diagnosis tag";
array D_DIAG {8} diag_2 - diag_9;
do i = 1 to 8;
if substr(d_diag,1,3) = "V12" then diag_tag = 1 ;
end;
if diag_tag = "" then diag_tag = 0;
run;
You are not referencing an element of the array anywhere. You probably meant:
if substr(d_diag[i] ,1,3) = "V12" then diag_tag = 1 ;
if you were attempting to get the first 3 characters of an array element. You need to reference the index (the position in the array, or however you want to think of it) with the loop counter. That means placing the counter in parentheses after the array name. Square brackets also work. I use them so I can find array index values more easily.
If you want to see whether diag_tag was not assigned, use the MISSING function. Your comparison of the numeric variable diag_tag to a character string is what generates the type-conversion note in the log.
if missing( diag_tag) then diag_tag = 0;
The MISSING function works with both character and numeric values, so you don't have to worry about which specific missing value to compare against. The function returns 1/0 for missing/non-missing, and since SAS treats 1 as true and 0 as false you can use it directly as a condition.
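As a minimal sketch of that behaviour (the variable names num_var, char_var and flag are made up for illustration), the same MISSING call handles a numeric and a character variable:

```sas
data _null_;
  length char_var $5;
  num_var  = .;     /* numeric missing value */
  char_var = ' ';   /* character missing value (blank) */

  /* MISSING accepts either type, so no type-specific comparison is needed */
  if missing(num_var)  then put 'num_var is missing';
  if missing(char_var) then put 'char_var is missing';

  /* the 1/0 result can also be stored in a variable */
  flag = missing(num_var);
  put flag=;
run;
```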
You could avoid the possible confusion between index values of 1 to 8 and variable suffixes of 2 to 9 by using
array D_DIAG {2:9} diag_2 - diag_9;
do i = 2 to 9;
When you put a colon between two values, you are defining the lower and upper bounds of the array index. This can make the code much easier to understand and debug, because the index values match the variable suffixes.
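If you would rather not hard-code the bounds in the loop, the LBOUND and HBOUND functions return them. This is only a sketch, assuming a data set have with character variables diag_2 to diag_9 as in the original question:

```sas
data want;
  set have;
  array d_diag {2:9} diag_2 - diag_9;
  diag_tag = 0;
  /* lbound/hbound return 2 and 9 here, so d_diag[i] is diag_<i> */
  do i = lbound(d_diag) to hbound(d_diag);
    if substr(d_diag[i], 1, 3) = 'V12' then diag_tag = 1;
  end;
  drop i;
run;
```

Note that DIM still returns the number of elements (8), so a loop written as 1 to dim(d_diag) would not line up with an array defined as {2:9}.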
When you want to examine the beginning of a character field (such as the first three characters), you don't need to use SUBSTR. Looking at this code:
do i = 1 to 8;
if substr(d_diag,1,3) = "V12" then diag_tag = 1 ;
end;
if diag_tag = "" then diag_tag = 0;
run;
It could be corrected and rewritten in this fashion:
diag_tag=0;
do i=1 to 8;
if d_diag{i} in: ('V12', '250', '414') then diag_tag=1;
end;
run;
The colon after the IN operator is significant: it limits each comparison to the length of the shorter string (in this case three characters). The same modifier works with the equal sign, as in d_diag{i} =: 'V12'.
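To see the colon modifier in isolation, here is a small sketch using the sample codes from the question plus one non-matching value (the variable names code, starts_v12 and in_list are made up for illustration):

```sas
data _null_;
  length code $5;
  do code = 'V1234', '25011', '41402', 'X250';
    /* =: compares only the first 3 characters of code with 'V12' */
    starts_v12 = (code =: 'V12');
    /* in: applies the same prefix comparison to every value in the list */
    in_list = (code in: ('V12', '250', '414'));
    put code= starts_v12= in_list=;
  end;
run;
```

The last value, X250, contains 250 but does not start with it, so it should not be flagged.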
Note that you can short circuit the loop once one of the diagnosis codes is found.
diag_tag=0;
do i=1 to dim(d_diag) until(diag_tag);
if d_diag{i} in: ('V12', '250', '414') then diag_tag=1;
end;
Or even more concisely, assigning the 1/0 result of the comparison directly:
do i=1 to dim(d_diag) until(diag_tag);
diag_tag = d_diag{i} in: ('V12' '250' '414') ;
end;
Below I am just "consolidating" what others have already posted, plus a few more tweaks to make your usage of arrays a bit more dynamic.
data have;
array diag_ {2:9} $4 (8*'xxxx');
output;
diag_5='2509';
output;
diag_5='X250';
output;
run;
data want;
set have;
label diag_tag = "diagnosis tag";
array d_diag {*} diag_2 - diag_9;
diag_tag = 0;
do i = 1 to dim(d_diag);
/* if substrn(d_diag[i],1,3) in ('V12', '250', '414') then */
if d_diag[i] in: ('V12', '250', '414') then
do;
diag_tag=1;
/* no further looping required */
leave;
end;
end;
run;
In the 2nd row, variable i only got to 4: diag_5 is the fourth element of the array, and LEAVE ends the loop as soon as a match is found. This saves some processing time.
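One way to check this, assuming the want data set created by the step above (i is still in the data set because it was not dropped):

```sas
/* print the tested variable, the tag and the loop counter at exit */
proc print data=want;
  var diag_5 diag_tag i;
run;
```

For rows with no match the loop runs to completion, so i ends at 9, one past the last index.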