BookmarkSubscribeRSS Feed
☑ This topic is solved. Need further help from the community? Please sign in and ask a new question.
Wolverine
Quartz | Level 8

I was given an Excel file with 23 tabs.  Each of these tabs includes a list of diagnosis and/or medical procedure codes that represent a given medical condition.  Many of these codes are stems, so a code such as 0WJG may in fact represent 0WJG1, 0WJG2, 0WJGA, etc. I need to match these codes to patient data files to flag patients who have these conditions.  These data files have 37 diagnosis variables and 26 procedure variables. 

 

I've done similar work in the past, but with 2 important differences -- all the codes were completely listed (ie, no code stems), and there were only a few DX/proc variables in the patient data.  So I would import the Excel tabs, create a flag variable for each list of codes from each tab, and then combine those tabs into a single SAS file. Then I used PROC SQL to merge the data files with the flag file on DX_code OR proc_code.  Then I took the max of each flag variable, grouped by patient, to identify all patients who had at least 1 record with that flag.

 

The current situation is more complicated.  I have syntax that works, but it's tedious to program.  I will be using this data quite a bit in the future, so I'd like to find a better way.  I manually copied the codes from each tab and created list variables.  I then used arrays to match across multiple variables, using the in: operator to match codes to code stems.  Below is a portion of this syntax for 2 of the code lists.

 

%let Surg_mgmt_other_list = ("49320","49000","58578","0WJG","0WJG","0UQ9");/*DX and proc*/	
%let Cervical_inj_list = ("57720","0UQC");/*DX and proc*/

DATA want; SET have;
 array dx414 PRINCIPAL_DIAGNOSIS_CODE DIAGNOSIS_CODE_1-DIAGNOSIS_CODE_36;
  do index=1 to dim(dx414) until(Surg_mgmt_other_indc);
    Surg_mgmt_other_indc= dx414[index] in: &Surg_mgmt_other_list. ; 
end;
 array pr414 PROCEDURE_CODE ICD_PROCEDURE_CODE_1-ICD_PROCEDURE_CODE_25;
  do index=1 to dim(pr414) until(Surg_mgmt_other_indc);
    Surg_mgmt_other_indc= pr414[index] in: &Surg_mgmt_other_list. ; 
end;
IF Surg_mgmt_other_indc ^= 1 THEN Surg_mgmt_other_indc = 0;

 array dx415 PRINCIPAL_DIAGNOSIS_CODE DIAGNOSIS_CODE_1-DIAGNOSIS_CODE_36;
  do index=1 to dim(dx415) until(Cervical_inj_indc);
    Cervical_inj_indc= dx415[index] in: &Cervical_inj_list. ; 
end;
 array pr415 PROCEDURE_CODE ICD_PROCEDURE_CODE_1-ICD_PROCEDURE_CODE_25;
  do index=1 to dim(pr415) until(Cervical_inj_indc);
    Cervical_inj_indc= pr415[index] in: &Cervical_inj_list. ; 
end;
IF Cervical_inj_indc ^= 1 THEN Cervical_inj_indc = 0;
1 ACCEPTED SOLUTION

Accepted Solutions
Reeza
Super User
proc sql noprint;
select quote(dx) into :surg_mgmt_other_list separated by ", "
from excel_import1;
quit;

proc sql noprint;
select quote(dx) into :cervical_inj_list separated by ", "
from excel_import2;
quit;


DATA want; 
SET have;
 
array dx_codes PRINCIPAL_DIAGNOSIS_CODE DIAGNOSIS_CODE_1-DIAGNOSIS_CODE_36;
 array proc_codes PROCEDURE_CODE ICD_PROCEDURE_CODE_1-ICD_PROCEDURE_CODE_25;

Surg_mgmt_other_indc=0;
Cervical_inj_indc=0;

  do index=1 to dim(dx_codes );
    if dx_codes[index] in: (&Surg_mgmt_other_list. ) then Surg_mgmt_other_indc=1;
    if dx_codes[index] in: (&Cervical_inj_list.) then Cervical_inj_indc=1;
end;



  do index=1 to dim(proc_codes);
    if proc_codes[index] in: (&Surg_mgmt_other_list.) then Surg_mgmt_other_indc= 1 ; 
    if  proc_codes[index] in: (&Cervical_inj_list.) then Cervical_inj_indc=1; 
end;

run;

 

You're correct that would have coded the variables incorrectly as it's an assignment, try this approach instead.

View solution in original post

9 REPLIES 9
Reeza
Super User

The current situation is more complicated.  I have syntax that works, but it's tedious to program.  I will be using this data quite a bit in the future, so I'd like to find a better way.  I manually copied the codes from each tab and created list variables.  I then used arrays to match across multiple variables, using the in: operator to match codes to code stems.  Below is a portion of this syntax for 2 of the code lists.

 

I would replace this step with a similar process to what you had before. Import the data and create the macro lists using a SQL step. 

 

I don't think this will scale well, but another option is to flip your diag list to a long format and merge with the other lists, identify the flags and merge back the flags with the main data set.

You would need to replicate the process with the proc list. Since both of these lists usually have a lot of empty fields this isn't too bad of a process.

 

It's a somewhat easier process but I can understand the desire to keep it all in a single data step using the codes above. 

You can simplify your code a bit as follows, declaring arrays only once, two loops rather than 4. You could add back a more complex UNTIL statement checking both or just let it loop through the full list - I'd probably let it loop through all. Untested as no data. 

 

proc sql noprint;
select quote(dx) into :surg_mgmt_other_list separated by ", "
from excel_import1;
quit;

proc sql noprint;
select quote(dx) into :cervical_inj_list separated by ", "
from excel_import2;
quit;


DATA want; 
SET have;
 
array dx_codes PRINCIPAL_DIAGNOSIS_CODE DIAGNOSIS_CODE_1-DIAGNOSIS_CODE_36;
 array proc_codes PROCEDURE_CODE ICD_PROCEDURE_CODE_1-ICD_PROCEDURE_CODE_25;

Surg_mgmt_other_indc=0;
Cervical_inj_indc=0;

  do index=1 to dim(dx_codes );
    Surg_mgmt_other_indc= dx_codes[index] in: (&Surg_mgmt_other_list. ); 
    Cervical_inj_indc= dx_codes[index] in: (&Cervical_inj_list.) ; 
end;



  do index=1 to dim(proc_codes) until(Surg_mgmt_other_indc=1 and Cervical_inj_indc=1);
    Surg_mgmt_other_indc= proc_codes[index] in: (&Surg_mgmt_other_list.) ; 
    Cervical_inj_indc= proc_codes[index] in: (&Cervical_inj_list.); 
end;

HTH

 

pink_poodle
Barite | Level 11
This makes me think of hash objects. I think this can be done nicely by loading the xcel worksheets via datasets into hash lookup tables. The only hurdle is stem matching. I think it can be overcome by creating a stem variable for the patients that is a substring of the original longer ICD code. This stem on the subjects data set can be matched to diagnosis using hashes. How is a hash better than merging? It is an alternative that seems more fun and flexible: https://documentation.sas.com/doc/en/pgmsascdc/9.4_3.5/lrcon/n1b4cbtmb049xtn1vh9x4waiioz4.htm
Wolverine
Quartz | Level 8

@Reeza wrote:

I would replace this step with a similar process to what you had before. Import the data and create the macro lists using a SQL step. 

 

You can simplify your code a bit as follows, declaring arrays only once, two loops rather than 4. You could add back a more complex UNTIL statement checking both or just let it loop through the full list - I'd probably let it loop through all. Untested as no data. 


This looks promising!  It would eliminate the most tedious part of the job -- manually copying the codes and formatting them into lists.  I use a macro to import the tabs from the Excel file, so I would only have to add the list-creating proc sql once. Something like this:

proc sql noprint;
select quote(dx) into :&sheet._list separated by ", "
from &sheet.;
quit;

And simplifying the do loops will save a lot of lines of code.  I also like that it's similar to the existing syntax that I already understand.  I will start implementing these changes and let u know how it works.  Thanks!😁

 

Wolverine
Quartz | Level 8

Some other tasks came up, but I finally got a chance to get back to this, and unfortunately, the array part of the syntax doesn't work properly.  All but one of the _indc variables = 0 for every record in the data file.  The only exception came up with 54 records that = 1 (out of ~3.5M records).  In any event, the frequencies don't match the frequencies from my original approach.

 

Could the proc code array be overwriting the DX array?  I just don't understand arrays well enough to know what it's doing at each step.

 

 

Reeza
Super User

Check that the macro variables are being created the same. The arrays have no change except you're not declaring them multiple points of time.

Wolverine
Quartz | Level 8

I haven't updated that part of the syntax yet, so the _list variables are exactly the same.  I'm still using the manually-created hardcoded lists

Reeza
Super User
proc sql noprint;
select quote(dx) into :surg_mgmt_other_list separated by ", "
from excel_import1;
quit;

proc sql noprint;
select quote(dx) into :cervical_inj_list separated by ", "
from excel_import2;
quit;


DATA want; 
SET have;
 
array dx_codes PRINCIPAL_DIAGNOSIS_CODE DIAGNOSIS_CODE_1-DIAGNOSIS_CODE_36;
 array proc_codes PROCEDURE_CODE ICD_PROCEDURE_CODE_1-ICD_PROCEDURE_CODE_25;

Surg_mgmt_other_indc=0;
Cervical_inj_indc=0;

  do index=1 to dim(dx_codes );
    if dx_codes[index] in: (&Surg_mgmt_other_list. ) then Surg_mgmt_other_indc=1;
    if dx_codes[index] in: (&Cervical_inj_list.) then Cervical_inj_indc=1;
end;



  do index=1 to dim(proc_codes);
    if proc_codes[index] in: (&Surg_mgmt_other_list.) then Surg_mgmt_other_indc= 1 ; 
    if  proc_codes[index] in: (&Cervical_inj_list.) then Cervical_inj_indc=1; 
end;

run;

 

You're correct that would have coded the variables incorrectly as it's an assignment, try this approach instead.

Wolverine
Quartz | Level 8
This works and provides the same exact frequencies as the original programming. Thanks!
Tom
Super User Tom
Super User

There is no need to define multiple arrays that contain the exact same set of variables.

array dx414 PRINCIPAL_DIAGNOSIS_CODE DIAGNOSIS_CODE_1-DIAGNOSIS_CODE_36;
array dx415 PRINCIPAL_DIAGNOSIS_CODE DIAGNOSIS_CODE_1-DIAGNOSIS_CODE_36;

array pr414 PROCEDURE_CODE ICD_PROCEDURE_CODE_1-ICD_PROCEDURE_CODE_25;
array pr415 PROCEDURE_CODE ICD_PROCEDURE_CODE_1-ICD_PROCEDURE_CODE_25;

Just make one array for each type and then re-use those same arrays when searching for other indications.

array dx_codes PRINCIPAL_DIAGNOSIS_CODE DIAGNOSIS_CODE_1-DIAGNOSIS_CODE_36;
array pr_codes PROCEDURE_CODE ICD_PROCEDURE_CODE_1-ICD_PROCEDURE_CODE_25;

sas-innovate-2024.png

Available on demand!

Missed SAS Innovate Las Vegas? Watch all the action for free! View the keynotes, general sessions and 22 breakouts on demand.

 

Register now!

How to Concatenate Values

Learn how use the CAT functions in SAS to join values from multiple variables into a single value.

Find more tutorials on the SAS Users YouTube channel.

Click image to register for webinarClick image to register for webinar

Classroom Training Available!

Select SAS Training centers are offering in-person courses. View upcoming courses for:

View all other training opportunities.

Discussion stats
  • 9 replies
  • 1066 views
  • 1 like
  • 4 in conversation