pbhatt
Calcite | Level 5

I am trying to use an array to tag various diagnosis variables, but only need the first part of the value to do the tagging. All my variables are in CHAR format. 

I am trying to use the substring function within the array, but it is running into some errors. 

Some values of the diag code are V1234, 25011, or 41402. I only need to tag values based on the first three character values: V12, 250, 414. 

 

data want;
set have;

Label diag_tag =  "diagnosis tag";

array D_DIAG {8} diag_2 - diag_9;

do i = 1 to 8;

       if substr(d_diag,1,3) = "V12" then diag_tag = 1 ;

end;

if diag_tag = "" then diag_tag = 0;

run;

Accepted Solution
Patrick
Opal | Level 21

Below I'm just consolidating what others have already posted, plus a few more tweaks to make your use of arrays a bit more dynamic.

 

data have;
  array diag_ {2:9} $4 (8*'xxxx');
  output;
  diag_5='2509';
  output;
  diag_5='X250';
  output;
run;

data want;
  set have;
  label diag_tag =  "diagnosis tag";
  array d_diag {*} diag_2 - diag_9;
  diag_tag = 0;
  do i = 1 to dim(d_diag);
/*    if substrn(d_diag[i],1,3) in ('V12', '250', '414') then */
    if d_diag[i] in: ('V12', '250', '414') then 
      do;
        diag_tag=1;
        /* no further looping required */
        leave;
      end;
  end;
run;

 

[Screenshot: the resulting WANT data set]

 

In the second row, variable i only got to 4 because diag_5 is the 4th element of d_diag, so that is where the LEAVE statement ended the loop. Skipping the remaining iterations saves some processing time.

 


9 Replies
PaigeMiller
Diamond | Level 26

@pbhatt wrote:

 

I am trying to use the substring function within the array, but it is running into some errors. 


Whenever you get errors, you need to show us the entire log for this DATA step (if the error is in the log), or show us the incorrect output and explain what you expect to see (if the problem is in the output).

--
Paige Miller
ballardw
Super User


You never reference an individual element of the array. You probably meant

 if substr(d_diag[i] ,1,3) = "V12" then diag_tag = 1 ;

if you were attempting to get the first 3 characters of an array element. You need to reference the element by its index (its position in the array) using the loop counter, which means placing the counter in parentheses after the array name. Square brackets also work; I use them because they make array index values easier to find.
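To see that the subscript delimiters are interchangeable, here is a minimal sketch using a hypothetical three-element array (dx, built from made-up variables dx1-dx3). Parentheses, square brackets, and curly braces (the form Astounding and Tom use later in this thread) each pick out the same element when the loop counter is the subscript:

```sas
/* Hypothetical sample data: three character diagnosis codes from the question */
data demo;
  length dx1-dx3 $5;
  dx1 = 'V1234'; dx2 = '25011'; dx3 = '41402';
  array dx {3} dx1-dx3;
  length p b c $3;
  do i = 1 to dim(dx);
    p = substr(dx(i), 1, 3);   /* parentheses     */
    b = substr(dx[i], 1, 3);   /* square brackets */
    c = substr(dx{i}, 1, 3);   /* curly braces    */
    put i= p= b= c=;           /* p, b and c print the same value on each line */
  end;
run;
```

All three forms compile to the same element reference; which one you use is purely a readability choice.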

 

If you want to check whether diag_tag was never assigned, use the MISSING function. Comparing a numeric variable to a character string, as your code does, generates a type-conversion note in the log.

if missing( diag_tag)  then diag_tag = 0;

The MISSING function works with both character and numeric values, so you don't have to worry about which missing value to compare against. It returns 1 for missing and 0 for non-missing, and since SAS treats 1 as true and 0 as false, you can use it directly as a condition.
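A small sketch of MISSING on both variable types, assuming hypothetical variables flag (numeric) and code (character) read from inline data:

```sas
/* Hypothetical sample: a numeric flag and a character code, each sometimes missing */
data demo_missing;
  input flag code $;
  /* MISSING() returns 1 (missing) or 0 (not missing) for either type */
  flag_missing = missing(flag);
  code_missing = missing(code);
  /* Because the result is 1/0, it works directly as an IF condition */
  if missing(flag) then flag = 0;
  datalines;
1 V12
. 250
. .
;
run;
```

The first record keeps flag = 1; the other two have flag reset to 0 because MISSING returned 1. No quoted blank or period is needed in the comparison.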

 

You could avoid the possible confusion between index values of 1 to 8 and variable suffixes of 2 to 9 by using

array D_DIAG {2:9} diag_2 - diag_9;

do i = 2 to 9;

A colon between two values defines the lower and upper bounds of the array index. When the index values match the variable suffixes, some code becomes much more understandable and easier to debug.
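Putting the bounded array into a complete step, here is a sketch assuming hypothetical data sets have_bounds and want_bounds with one record where only diag_5 holds a code. The LBOUND and HBOUND functions return the array's declared bounds (2 and 9 here), so the loop does not need to hard-code them:

```sas
/* Hypothetical sample: diag_2-diag_9 as $5 character codes, only diag_5 filled */
data have_bounds;
  length diag_2-diag_9 $5;
  call missing(of diag_2-diag_9);
  diag_5 = '25011';
run;

data want_bounds;
  set have_bounds;
  /* Index values 2-9 now line up with the variable suffixes: d_diag[5] is diag_5 */
  array d_diag {2:9} diag_2-diag_9;
  diag_tag = 0;
  /* Loop range follows the array definition instead of literal 2 and 9 */
  do i = lbound(d_diag) to hbound(d_diag);
    if substr(d_diag[i], 1, 3) = '250' then diag_tag = 1;
  end;
run;
```

If the array definition later changes, the loop picks up the new bounds automatically.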

 

pbhatt
Calcite | Level 5
Thanks so much for your detailed response. I went through and made sure to refer to the position of the variable within the array.
Astounding
PROC Star

When you want to examine the beginning of a character field (such as the first three characters), you don't need to use SUBSTR.  Looking at this code:

do i = 1 to 8;
   if substr(d_diag,1,3) = "V12" then diag_tag = 1 ;
end;
if diag_tag = "" then diag_tag = 0;
run;

It could be corrected and rewritten in this fashion:

diag_tag=0;
do i=1 to 8;
   if d_diag{i} in: ('V12', '250', '414') then diag_tag=1;
end;
run;

The colon after IN is a significant character: it limits each comparison to the length of the shorter string (in this case, three characters). The same modifier also works after an equal sign (=:).
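A quick way to see the colon modifier in action is to compare =:, IN:, and an explicit SUBSTR check side by side. This is a sketch on hypothetical data: V1234, 25011, and 41402 come from the question, plus a made-up non-matching X2501:

```sas
/* Hypothetical codes; the colon truncates each comparison to the shorter operand */
data colon_demo;
  input code $5.;
  /* =: compares only as many characters as the shorter operand (here 3) */
  eq_colon   = (code =: '250');
  /* IN: applies the same prefix rule to each value in the list */
  in_colon   = (code in: ('V12', '250', '414'));
  /* The same test spelled out with SUBSTR */
  via_substr = (substr(code, 1, 3) in ('V12', '250', '414'));
  datalines;
V1234
25011
41402
X2501
;
run;
```

For these codes, in_colon and via_substr agree on every row, and eq_colon is 1 only for 25011.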

Tom
Super User

Note that you can short-circuit the loop once one of the diagnosis codes is found.

diag_tag=0;
do i=1 to dim(d_diag) until(diag_tag);
  if d_diag{i} in: ('V12', '250', '414') then diag_tag=1;
end;

Or, even more concisely:

do i=1 to dim(d_diag) until(diag_tag);
  diag_tag = d_diag{i} in: ('V12' '250' '414') ;
end;
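A self-contained version of the concise loop, assuming hypothetical data where every code is a non-matching filler except diag_3 in the first record (the filler mirrors the 'xxxx' values in the accepted solution):

```sas
/* Hypothetical sample data: two records, eight character codes each */
data have_tom;
  length diag_2-diag_9 $5;
  array dx {*} diag_2-diag_9;
  do j = 1 to dim(dx);
    dx{j} = 'xxxxx';           /* non-matching filler */
  end;
  diag_3 = 'V1234'; output;    /* first match is the 2nd array element */
  diag_3 = 'xxxxx'; output;    /* no match anywhere */
  drop j;
run;

data want_tom;
  set have_tom;
  array d_diag {*} diag_2-diag_9;
  /* The IN: expression yields 1 or 0; UNTIL is checked at the bottom of each
     pass, so the loop stops right after the first match */
  do i = 1 to dim(d_diag) until(diag_tag);
    diag_tag = d_diag{i} in: ('V12' '250' '414');
  end;
run;
```

The first record gets diag_tag = 1 after only two passes; the second runs all eight passes and ends with diag_tag = 0.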
pbhatt
Calcite | Level 5
Thanks again. This was really helpful as it sped up the processing time for large datasets.
pbhatt
Calcite | Level 5
Thanks for the reply. I looked up how to use the colon after the equal sign and that is an absolute game-changer in identifying a broad set of codes.
pbhatt
Calcite | Level 5
Thanks for consolidating previous replies and providing a visual example. I also read up on arrays to get a better understanding.
This article was very helpful as well:
https://support.sas.com/resources/papers/proceedings/proceedings/sugi30/242-30.pdf
