- Mark as New
- Bookmark
- Subscribe
- Mute
- RSS Feed
- Permalink
- Report Inappropriate Content
I have files that include a variable that lists diagnosis codes associated with various medical specialties. I need to compare values from another file to determine if a given record included a diagnosis that matched the medical specialty. I've been copying and pasting the list directly into the syntax:
IF DX in ("03282","03640"....) THEN cardiology_flag = 1;
This works but is unwieldy and difficult to update. There are 10 specialties with 800-2500 codes each!
The list is stored in a file named as such: DXcodes_cardiology.sas7bdat. There are 2 variables in this file that contain valid codes, named DX_5A and DX_5B.
So how do I get SAS to look at a record in a file called DXcardiology._spec_cat.sas7bdat, and compare the the value stored in the variable DX to the list of codes in the variables DX_5A and DX_5B in DXcodes_cardiology.sas7bdat, and then give the variable cardiology_flag a value 1 if the code is present in the list, and a value of 0 if it's not present in the list?
- Mark as New
- Bookmark
- Subscribe
- Mute
- RSS Feed
- Permalink
- Report Inappropriate Content
You can use proc format, or proc sql;
proc sql noprint;
select 'variable'
into :prefixlist
separated by ','
from table;
This will give you a macro variable &prefixlist
- Mark as New
- Bookmark
- Subscribe
- Mute
- RSS Feed
- Permalink
- Report Inappropriate Content
Mark, I took what you provided and rewrote it a bit to work with my variable names and file names.
Proc SQL;
Select DX_5A, DX_5B
Into :DX_5Alist, :DX_5Blist
Separated by ","
From medicaid.DXcodes_cardiology;
Quit;
This works and gives me an output that looks like a crosstab frequency for the 2 variables. So as I'm trying to understand this, it looks like this is the 1st step of the process -- the values present in the variables assigned into a pair of macro variables. But then how do I compare the values in the macro variables to the values in the medical records in DXcardiology._spec_cat?
- Mark as New
- Bookmark
- Subscribe
- Mute
- RSS Feed
- Permalink
- Report Inappropriate Content
If this isn't helpful provide an example, two datasets, of what you have and what you want.
- Mark as New
- Bookmark
- Subscribe
- Mute
- RSS Feed
- Permalink
- Report Inappropriate Content
Mark, I certainly like your approach, but I'm not sure what you mean by providing an example of the datasets. I don't see any way to upload them here, and even if I could, they have sensitive info and cannot be shared publicly. I tried to post some screenshots (with the sensitive variables removed) but the forum won't allow me to upload them. They're well below the 2MB filesize limit.
Ok, I was able to upload some screenshots. The "card list" file shows the list of diagnosis codes (variables DX_5A and DX_5B) that are associated with Cardiology. The "card claims" file shows actual patient records, including the diagnosis codes (variable DX) associated with each record.
The output file should show each record as it exists in the "card claims" file, except that there should be a flag variable (cardiology_flag) indicating whether the diagnosis code for that record is considered to be cardiology-related in the "card list" file.
- Mark as New
- Bookmark
- Subscribe
- Mute
- RSS Feed
- Permalink
- Report Inappropriate Content
Dummy data that mimics the behavior of your data is sufficient and only needs enough of the variables involved to test all the basic rules you need. A data step with an input and datalines is preferred.
For example in the post I had previously to make and use an informat from data I had to start with a dummy data as you hadn't really provided much detail. That should provide a skeleton of how to post code to create a dataset.
- Mark as New
- Bookmark
- Subscribe
- Mute
- RSS Feed
- Permalink
- Report Inappropriate Content
So let's assume that you want to assign the variable C_CODE from CARD_LIST to every record in CARD_CLAIMS.
(Let's also assume that you have cleaned up C_CODE so that is does not contain both "7" and "'7'" as two different codes for the same thing.)
proc sql ;
create table want as
select a.*,b.c_code
from card_claim a
left join card_list b
on (a.dx is not null)
and ((a.dx = b.dx_5b) or (a.dx = b.dx_5a))
;
quit;
- Mark as New
- Bookmark
- Subscribe
- Mute
- RSS Feed
- Permalink
- Report Inappropriate Content
That's not quite what I'm trying to get it to do, but I think it's close. Let me try to explain it again, just so we're all on the same page...
The variables DX_5A and DX_5B in "card_list" contain a list of all cardiology-related diagnosis codes.
So what I want SAS to do is look at the first record in "card_claims", find the value that is stored in the DX variable for that record, and then determine is that value is present ANYWHERE in the DX_5A and DX_5B variables in "card_list". If so, the record in "card_claims" should be retained, and if not, then it should be deleted. And then I need to repeat that process for all records in "card_claims".
So I think what you have above would work, but don't I want to use RIGHT JOIN? I only want to retain records that have a DX value that matches any of the values stored in the DX_5A and DX_5B variables in "card_list".
- Mark as New
- Bookmark
- Subscribe
- Mute
- RSS Feed
- Permalink
- Report Inappropriate Content
Yes. So you have a special case of the general problem of putting the DX codes into categories.
- Mark as New
- Bookmark
- Subscribe
- Mute
- RSS Feed
- Permalink
- Report Inappropriate Content
I just realized there's a complication. One of the things we want to know is what diagnosis codes cardiologists frequently use that AREN'T cardiology diagnoses. Using the above approach, those non-cardiology DX codes would be lost.
- Mark as New
- Bookmark
- Subscribe
- Mute
- RSS Feed
- Permalink
- Report Inappropriate Content
So you DO want the LEFT JOIN .
Sounds like CARD_CLAIM is the list of DX codes actually used and CARD_LIST is the list of "card" DX codes.
- Mark as New
- Bookmark
- Subscribe
- Mute
- RSS Feed
- Permalink
- Report Inappropriate Content
I highly recommend formats, especially if you already have it in a table. You can use the data set to create the format and then apply it to the dx field.
- Mark as New
- Bookmark
- Subscribe
- Mute
- RSS Feed
- Permalink
- Report Inappropriate Content
This is how I understand your issue and a possibly solution:
/* dummy data set as I think your look up table looks like*/
data DXcodes_cardiology;
informat DX_5A DX_5B $10.;
input DX_5A DX_5B;
datalines;
10023 00456
156 34567
abc def
;
run;
/* reshape and prepare for use to create an informat*/
data cariocntl;
set DXcodes_cardiology end=last;
fmtname='CariologyFlag';
type='I';
label='1';
start= DX_5A; output;
start= DX_5B; output;
if last then do;
start='**OTHER**'; /* special indicator that values not listed treated this way*/
label='0'; /* this is zero, value for informat not listed valeus*/
HLO = 'O'; /* this is capital o and is instruction on how to treat others */
output;
end;
run;
/* create the format, best would be to place it in a permanent library and point the format search path to include that library*/
proc format library=work cntlin=cariocntl;
run;
/* and a quick example of how to use*/
data test;
input code $;
CariologyFlag = input(code,CariologyFlag.);
datalines;
10023
156
abc
00456
34567
def
211111
19045
pdq
;
run;
- Mark as New
- Bookmark
- Subscribe
- Mute
- RSS Feed
- Permalink
- Report Inappropriate Content
Let's make up some data a try it.
data specialties ;
input dx $5. specialty $32. ;
cards;
03282Cardiology
03640Cardiology
run;
data cases;
input id dx $5. ;
cards;
1 03282
2 04567
;;;;
proc sql ;
create table want as
select a.*,b.specialty
from cases a left join specialties b
on a.dx = b.dx
;
quit;