I'm working with a health insurance claims database and am trying to identify instances in which there was a drug claim for an antibiotic that occurred either on the same day as a claim with a diagnosis code of bronchitis or within the 3 days after of a claim with a diagnsois code of bronchitis. I am working with two tables in PROC SQL, the table antibiotics (which has all drug claims for antibiotics for people in my sample) and the table bronchitis, which has all claims with a diagnosis of bronchitis for people in my sample. If there was more than one claim with a diagnosis of bronchitis on the same day for a particular person, I consider that to be one instance of a bronchitis diagnosis, and I only want to allow a maximum of one antibiotic per instance of a bronchitis diagnosis.
Here is my code:
proc sql;
create table antibiotics_bronchitis as
select L.claimid, L.servicedate, L.drugname, R.claimid as claimid_bronchitis, R.servicedate as svcdate_bronchitis,
from work.antibiotics as L
inner join
work.bronchitis as R
on L.enrolid=R.enrolid
where L.servicedate-3 <= servicedate_bronchitis <= L.servicedate
order by L.claimid;
quit;
Using this code, I get many observations with duplicate claimid when a particular drug claim (say one that occurred January 4, 2015) is matched with more than one claim with a diagnosis of bronchitis with a service date of January 1, January 2, January 3, or January 4 (e.g.,this would happen if there were two claims with a diagnosis of bronchitis of January 1, one claim with a diagnosis of bronchitis on January 1 and one claim with a diagnosis of bronchitis on January 2, etc.). Using the "distinct" term in the select statement doesn't do anything because observations with duplicate claimid will have different claimid_bronchitis. I could just use proc sort NODUPKEY by claimid after this code, but I was wondering if there is an alternative method within SQL. Thanks.
Untested, but what about having clause to equal the minimum difference? You probably will also need a group by.
L.service_date - r.bronchitis_date as dif,
...
...
Group by a.claimid
...
...
having dif=min(dif)
Thanks for the suggestion. This doesn't seem to achieve what I want to do. Perhaps this is not easy to achieve in PROC SQL and I just need to use an additional step. Are there are concerns with using
proc sort data = antibiotics_bronchitis nodupkey; by claimid; run;
This should eliminate any duplicate drug claims that happened to be associated with more than one claim with a diagnosis of bronchitis on the same day or in the previous three days.
Hi,
With any solution here, you will need to work out a rule as to what observation from the joining table should be kept. Its difficult to say, maybe post some test data - a few data elements of table as a datastep - and what the output should look like - and specify why a certain rwo should be pulled out, this is the key part here. Once you know hwich record you want to pull out then you should be able to simply sub-query the input dataset to filter for this record.
Are you ready for the spotlight? We're accepting content ideas for SAS Innovate 2025 to be held May 6-9 in Orlando, FL. The call is open until September 25. Read more here about why you should contribute and what is in it for you!
Learn how use the CAT functions in SAS to join values from multiple variables into a single value.
Find more tutorials on the SAS Users YouTube channel.