BookmarkSubscribeRSS Feed
Antarimoon
Calcite | Level 5

Hello.

 

I have a data set with 57 character variables that need to be searched for over 100 distinct ICD 9 codes that are character strings.  I need to create a new variable=1 if person has at least 1 of the desired codes in any of the 57 variables or variable=0 if none of the codes are in any of the variables.  I could use some advice. All I know is that I need to use an array(s) to create the new variable.  I have tried the code below, but it gives me a 0 for every observation which I know is wrong.  I'm not sure where I am going wrong. Any advice would be appreciated.

 

 

data test;
set qft_int;
length diagnosis9 $1379 ;
diagnosis9=catx(" ", "ICD9:042", "ICD9:136.3", "ICD9:199.2", "ICD9:238.73", "ICD9:238.76", "ICD9:238.77", "ICD9:238.79", "ICD9:260", "ICD9:261", "ICD9:262", "ICD9:279.00", "ICD9:279.01", "ICD9:279.02",
"ICD9:279.03", "ICD9:279.04", "ICD9:279.05", "ICD9:279.06", "ICD9:279.09", "ICD9:279.10", "ICD9:279.11", "ICD9:279.12", "ICD9:279.13", "ICD9:279.19", "ICD9:279.2", "ICD9:279.3",
"ICD9:279.4", "ICD9:279.41", "ICD9:279.49", "ICD9:279.50", "ICD9:279.51", "ICD9:279.52", "ICD9:279.53", "ICD9:279.8", "ICD9:279.9", "ICD9:284.09", "ICD9:284.1", "ICD9:284.11",
"ICD9:284.12", "ICD9:284.19", "ICD9:288.0", "ICD9:288.00", "ICD9:288.01", "ICD9:288.02", "ICD9:288.03", "ICD9:288.09", "ICD9:288.1", "ICD9:288.2", "ICD9:288.4", "ICD9:288.50",
"ICD9:288.51", "ICD9:288.59", "ICD9:289.53", "ICD9:289.83", "ICD9:403.01", "ICD9:403.11", "ICD9:403.91", "ICD9:404.02", "ICD9:404.03", "ICD9:404.12", "ICD9:404.13", "ICD9:404.92",
"ICD9:404.93", "ICD9:579.3", "ICD9:585", "ICD9:585.5", "ICD9:585.6", "ICD9:996.8", "ICD9:996.80", "ICD9:996.81", "ICD9:996.81", "ICD9:996.82", "ICD9:996.83", "ICD9:996.84", "ICD9:996.85",
"ICD9:996.86", "ICD9:996.87", "ICD9:996.88", "ICD9:996.89", "ICD9:V42.0", "ICD9:V42.1", "ICD9:V42.6", "ICD9:V42.7", "ICD9:V42.8", "ICD9:V42.81", "ICD9:V42.82", "ICD9:V42.83", "ICD9:V42.84",
"ICD9:V42.89", "ICD9:V45.1", "ICD9:V45.11", "ICD9:V56.0", "ICD9:V56.1", "ICD9:V56.2", "ICD9:00.18", "ICD9:33.5", "ICD9:33.50", "ICD9:33.51", "ICD9:33.52", "ICD9:33.6", "ICD9:37.5",
"ICD9:37.51", "ICD9:41.0", "ICD9:41.00", "ICD9:41.01", "ICD9:41.02", "ICD9:41.03", "ICD9:41.04", "ICD9:41.05", "ICD9:41.06", "ICD9:41.07", "ICD9:41.08", "ICD9:41.09", "ICD9:50.51",
"ICD9:50.59", "ICD9:52.80", "ICD9:52.81", "ICD9:52.82", "ICD9:52.83", "ICD9:52.85", "ICD9:52.86", "ICD9:55.69");


immune9=0;
retain immune9;
array diagnosisimmune{57} PrimaryDiag SecondaryDiag_1-SecondaryDiag_56 ;

do i= 1 to dim(diagnosisimmune);
if index(diagnosisimmune(i), diagnosis9) then immune9 =1;
end;

drop i;
run;

16 REPLIES 16
Astounding
PROC Star

The biggest thing to correct is your use of INDEX.  You have the parameters reversed.  The string to be searched comes first, and the string you are searching for comes second.

 

Secondarily, INDEX does not search for words.  It searches for strings.  So if one of your variables had a value of "28" or "CD9" it would be found.  That sounds like the wrong result.  There are functions that search for words instead of strings ... I think INDEXW and FINDW would be on the list but you might have to doublecheck that part.

Antarimoon
Calcite | Level 5

Thank you for your suggestion. I don't believe that the parameters are reversed in the index statement.  I want to search the 57 variables that I have entered in the array diagnosisimmune for the strings entered in the diagnosis9 variable. Each string is seperated by a space. I tried INDEXW but I still get all observations of immune9=0.

ballardw
Super User

Please look at this somewhat reduced version of what you are attempting with index and why it isn't going to work the way your code is structured. You are using the INDEXRESULTW form in the code below:

 

 data junk;
   x='one code';
   y='one code, two code';
   indexresultw= index(x,y);
   indexresultc= index(y,x);
   put indexresultw= indexresultc=;
run;

the LONG string goes first with Index and the short string second (if you expect to find the short string in the long string ever)

 

Astounding
PROC Star

You'll still need to switch the parameters.  Here's the documentation:

 

http://support.sas.com/documentation/cdl/en/lrdict/64316/HTML/default/viewer.htm#a000212235.htm

 

After switching, note that your 57 elements must contain the full word.  This will match:

 

ICD9:136.3

 

Neither of these will match:

 

ICD9:136

136.3

Antarimoon
Calcite | Level 5

Thank you.  You were right.  Switching the parameters and a few other small changes to my code were all it took to get this to work.

sjarvis847
Fluorite | Level 6

It would be great if you could post the code that worked! I am running into the same issue and trying to start with your not working code then revise per the other peoples instructions has been a puzzle in itself. Any chance you'd be willing to share what worked for you?

sjarvis847
Fluorite | Level 6

Hello! I am attempting to search through 44 procedure codes (proc1-proc44) using an array to find specific codes related to a hip fracture (hipproc). I am using the following code and getting and error that the numeric variables are being converted to characters. I see that character variables can be used if you type in a $ after the subscript, but I still get the same error with that. Any suggestions: 

 

data gfp.main;
set gfp.main;
array procedure {43} $ proc1 - proc43;
do i = 1 to 43;
if procedure[i] = ('0QB4' or '0QB5' or '0QB6' or '0QB7' or '0QB8' or '0QB9' or '0QBB' or '0QBC' or '0SB9' or '0SBB') then hipproc= 1;
if hipproc NE 1 then hipproc= 0;
END;
RUN;

 

thanks so much for your help! 

Reeza
Super User

Please post this as a new question.

When you post it on an old thread, only people who previously replied to this thread will see the response. 

 

A hint for your answer use IN operator not =

https://www.tutorialspoint.com/sas/sas_operators.htm

ballewnick
Fluorite | Level 6

Hello,

 

I have a similar task that I am trying to perform that I was hoping you could also provide some insight on. I want to state up front that I am new to SAS (and this community) so I apologize if my question is not clear or if I am not asking it in the correct forum.

 

I have a data set with ~10 million records/rows. I want to check for the presence of any one of multiple strings across 9 of the variables in the data set. I then want to create a new variable (dis2) in my data set that will take the value of either 0 or 1. Dis2 should have a value of 1 if at least one of the 9 variables has any one of the strings and 0 if none of the 9 variables have any of the strings.

 

I created a macro statement to hold the strings I want to seach for:

%let dc2 = %str('28730', '28739', '2875','280', '281', '282', '283');

 

I then want to check each of these 9 varibales to see if any one of the strings are in any one of the variables:

c_dx1, c_dx2, c_dx3, c_dx4, c_dx5, c_dx6, c_dx7, c_dx8, c_hcpc

 

I was thinking that I could put the 9 variables in an array but I am not sure how to put everything together and create dis2. Any help is greatly appreciated. If you can provide example code of how to solve this problem that would be ideal. Something like:

 

%let dc2 = %str('28730', '28739', '2875','280', '281', '282', '283');

 

data example_w_dis2;

     set example;

     .......;

     .......;

 

Thanks!

 

Astounding
PROC Star

It's important to clarify what you are looking for.  If "280" is in c_dx1 does that mean that c_dx1 equals "280" (with maybe a few blanks at the end), or could either of these also be a match:

 

28055

12802

ballewnick
Fluorite | Level 6

Astounding,

 

Thanks for pointing this out. "280" should match strings that start with "280" that may then also have additional numbers in the fourth and fifth position. For example, "280", "2801", 2802" and "28041" should all be recognized as matches. Conversely, "12802" should not be a match.

Astounding
PROC Star

Thanks for clarifying.  Here's a straightforward way to approach the problem:

 

data want;

set have;

array diags {9} c_dx1-c_dx8 c_hcpc;

found = 0;

do _n_=1 to 9 until (found=1);

   if diags{_n_} in : (&dc2) then found=1;

end;

run;

 

Obviously, this depends on having set up a macro variable &DC2.  But you could just as easily hard-code the list of values within the program if necessary.

 

The key is the colon.  The comparison operator in : will find values that begin with the list of values within &DC2.

ballewnick
Fluorite | Level 6

Astounding,

 

Thanks! I created a macro with the strings in it I want to search for and then tried to include it in the data step. Can you please tell me if this looks right to you:

 

%macro check_gastro_hemg(dx);
(substr(&dx,1,4) in ('5310', '5312', '5314',
'5316', '5320', '5322', '5324', '5326', '5330',
'5332', '5334', '5336', '5340', '5342', '5344', '5346', '5693')) or
(&dx in ('53082', '53783', '53784', '56985', '56986')) or
(substr(&dx,1,3)='578') or
(substr(&dx,1,3)='535' and substr(&dx,5,1)='1')

%mend;

 

 

data want;

set have;

array diags {9} c_dx1-c_dx8 c_hcpc;

found = 0;

do _n_=1 to 9 until (found=1);

  if %check_gastro_hemg(diags) then then found=1;

end;

run;

Astounding
PROC Star

Just a couple of comments ... you'll have to do the testing because I can't (at least not this week).

 

You're trying to use "diags" to refer to one of the elements in the array ... rather than using, for example, diags{1} to refer to the first element.  That's the older style of arrays and can be done, but might require some changes to the program.  Things you might have to change ...

 

The array statement might have to remove {9} and just list the elements.  

 

The index for the array in the DO loop might have to use _i_ instead of _n_.

 

That should take care of the syntax.  However, if your data set is large, note that you are using SUBSTR a number of times.  Since you are examining the beginning of a string, it would be faster to use in : as in my original examples.  Here's what your macro could look like if speed is an issue:

 

%macro check_gastro_hemg(dx);

 

&dx in : ('5310', '5312', '5314',

'5316', '5320', '5322', '5324', '5326', '5330',
'5332', '5334', '5336', '5340', '5342', '5344', '5346', '5693',
'53082', '53783', '53784', '56985', '56986',
'578') or
(&dx =: '535' and substr(&dx,5,1)='1')

 

%mend;

 

The colon modifies character comparisons, by basing the comparison on whichever string is shorter.

sas-innovate-2024.png

Join us for SAS Innovate April 16-19 at the Aria in Las Vegas. Bring the team and save big with our group pricing for a limited time only.

Pre-conference courses and tutorials are filling up fast and are always a sellout. Register today to reserve your seat.

 

Register now!

How to Concatenate Values

Learn how use the CAT functions in SAS to join values from multiple variables into a single value.

Find more tutorials on the SAS Users YouTube channel.

Click image to register for webinarClick image to register for webinar

Classroom Training Available!

Select SAS Training centers are offering in-person courses. View upcoming courses for:

View all other training opportunities.

Discussion stats
  • 16 replies
  • 12042 views
  • 6 likes
  • 6 in conversation