07-08-2015 11:35 AM
I have files that include a variable that lists diagnosis codes associated with various medical specialties. I need to compare values from another file to determine if a given record included a diagnosis that matched the medical specialty. I've been copying and pasting the list directly into the syntax:
IF DX in ("03282","03640"....) THEN cardiology_flag = 1;
This works but is unwieldy and difficult to update. There are 10 specialties with 800-2500 codes each!
The list is stored in a file named as such: DXcodes_cardiology.sas7bdat. There are 2 variables in this file that contain valid codes, named DX_5A and DX_5B.
So how do I get SAS to look at a record in a file called DXcardiology._spec_cat.sas7bdat, and compare the the value stored in the variable DX to the list of codes in the variables DX_5A and DX_5B in DXcodes_cardiology.sas7bdat, and then give the variable cardiology_flag a value 1 if the code is present in the list, and a value of 0 if it's not present in the list?
07-08-2015 11:54 AM
You can use proc format, or proc sql;
proc sql noprint;
separated by ','
This will give you a macro variable &prefixlist
07-13-2015 02:02 PM
Mark, I took what you provided and rewrote it a bit to work with my variable names and file names.
Select DX_5A, DX_5B
Into X_5Alist, X_5Blist
Separated by ","
This works and gives me an output that looks like a crosstab frequency for the 2 variables. So as I'm trying to understand this, it looks like this is the 1st step of the process -- the values present in the variables assigned into a pair of macro variables. But then how do I compare the values in the macro variables to the values in the medical records in DXcardiology._spec_cat?
07-09-2015 01:59 PM
Mark, I certainly like your approach, but I'm not sure what you mean by providing an example of the datasets. I don't see any way to upload them here, and even if I could, they have sensitive info and cannot be shared publicly. I tried to post some screenshots (with the sensitive variables removed) but the forum won't allow me to upload them. They're well below the 2MB filesize limit.
Ok, I was able to upload some screenshots. The "card list" file shows the list of diagnosis codes (variables DX_5A and DX_5B) that are associated with Cardiology. The "card claims" file shows actual patient records, including the diagnosis codes (variable DX) associated with each record.
The output file should show each record as it exists in the "card claims" file, except that there should be a flag variable (cardiology_flag) indicating whether the diagnosis code for that record is considered to be cardiology-related in the "card list" file.
07-09-2015 03:11 PM
Dummy data that mimics the behavior of your data is sufficient and only needs enough of the variables involved to test all the basic rules you need. A data step with an input and datalines is preferred.
For example in the post I had previously to make and use an informat from data I had to start with a dummy data as you hadn't really provided much detail. That should provide a skeleton of how to post code to create a dataset.
07-13-2015 02:27 PM
So let's assume that you want to assign the variable C_CODE from CARD_LIST to every record in CARD_CLAIMS.
(Let's also assume that you have cleaned up C_CODE so that is does not contain both "7" and "'7'" as two different codes for the same thing.)
proc sql ;
create table want as
from card_claim a
left join card_list b
on (a.dx is not null)
and ((a.dx = b.dx_5b) or (a.dx = b.dx_5a))
07-14-2015 12:11 PM
That's not quite what I'm trying to get it to do, but I think it's close. Let me try to explain it again, just so we're all on the same page...
The variables DX_5A and DX_5B in "card_list" contain a list of all cardiology-related diagnosis codes.
So what I want SAS to do is look at the first record in "card_claims", find the value that is stored in the DX variable for that record, and then determine is that value is present ANYWHERE in the DX_5A and DX_5B variables in "card_list". If so, the record in "card_claims" should be retained, and if not, then it should be deleted. And then I need to repeat that process for all records in "card_claims".
So I think what you have above would work, but don't I want to use RIGHT JOIN? I only want to retain records that have a DX value that matches any of the values stored in the DX_5A and DX_5B variables in "card_list".
07-14-2015 01:28 PM
I just realized there's a complication. One of the things we want to know is what diagnosis codes cardiologists frequently use that AREN'T cardiology diagnoses. Using the above approach, those non-cardiology DX codes would be lost.
07-08-2015 01:01 PM
I highly recommend formats, especially if you already have it in a table. You can use the data set to create the format and then apply it to the dx field.
07-08-2015 01:16 PM
This is how I understand your issue and a possibly solution:
/* dummy data set as I think your look up table looks like*/
informat DX_5A DX_5B $10.;
input DX_5A DX_5B;
/* reshape and prepare for use to create an informat*/
set DXcodes_cardiology end=last;
start= DX_5A; output;
start= DX_5B; output;
if last then do;
start='**OTHER**'; /* special indicator that values not listed treated this way*/
label='0'; /* this is zero, value for informat not listed valeus*/
HLO = 'O'; /* this is capital o and is instruction on how to treat others */
/* create the format, best would be to place it in a permanent library and point the format search path to include that library*/
proc format library=work cntlin=cariocntl;
/* and a quick example of how to use*/
input code $;
CariologyFlag = input(code,CariologyFlag.);
07-08-2015 02:43 PM
Let's make up some data a try it.
data specialties ;
input dx $5. specialty $32. ;
input id dx $5. ;
proc sql ;
create table want as
from cases a left join specialties b
on a.dx = b.dx