03-12-2016 10:36 PM
I'm working with a health insurance claims database and am trying to identify instances in which there was a drug claim for an antibiotic that occurred either on the same day as a claim with a diagnosis code of bronchitis or within the 3 days after of a claim with a diagnsois code of bronchitis. I am working with two tables in PROC SQL, the table antibiotics (which has all drug claims for antibiotics for people in my sample) and the table bronchitis, which has all claims with a diagnosis of bronchitis for people in my sample. If there was more than one claim with a diagnosis of bronchitis on the same day for a particular person, I consider that to be one instance of a bronchitis diagnosis, and I only want to allow a maximum of one antibiotic per instance of a bronchitis diagnosis.
Here is my code:
create table antibiotics_bronchitis as
select L.claimid, L.servicedate, L.drugname, R.claimid as claimid_bronchitis, R.servicedate as svcdate_bronchitis,
from work.antibiotics as L
work.bronchitis as R
where L.servicedate-3 <= servicedate_bronchitis <= L.servicedate
order by L.claimid;
Using this code, I get many observations with duplicate claimid when a particular drug claim (say one that occurred January 4, 2015) is matched with more than one claim with a diagnosis of bronchitis with a service date of January 1, January 2, January 3, or January 4 (e.g.,this would happen if there were two claims with a diagnosis of bronchitis of January 1, one claim with a diagnosis of bronchitis on January 1 and one claim with a diagnosis of bronchitis on January 2, etc.). Using the "distinct" term in the select statement doesn't do anything because observations with duplicate claimid will have different claimid_bronchitis. I could just use proc sort NODUPKEY by claimid after this code, but I was wondering if there is an alternative method within SQL. Thanks.
03-13-2016 12:13 AM
Untested, but what about having clause to equal the minimum difference? You probably will also need a group by.
L.service_date - r.bronchitis_date as dif,
Group by a.claimid
03-13-2016 01:31 PM
Thanks for the suggestion. This doesn't seem to achieve what I want to do. Perhaps this is not easy to achieve in PROC SQL and I just need to use an additional step. Are there are concerns with using
proc sort data = antibiotics_bronchitis nodupkey; by claimid; run;
This should eliminate any duplicate drug claims that happened to be associated with more than one claim with a diagnosis of bronchitis on the same day or in the previous three days.
03-13-2016 03:29 PM
03-14-2016 05:29 AM
With any solution here, you will need to work out a rule as to what observation from the joining table should be kept. Its difficult to say, maybe post some test data - a few data elements of table as a datastep - and what the output should look like - and specify why a certain rwo should be pulled out, this is the key part here. Once you know hwich record you want to pull out then you should be able to simply sub-query the input dataset to filter for this record.