How to compare two datasets and exclude IDs that are not included in o...

hkim30 · Posted 10-09-2017 05:51 PM

Hi. I am working on Medicare Claims data and want to ask a question about how to compare two datasets and exclude IDs that are not included in one of the file. There are some tricky components that I need to take into account, so I am so confused and don't know how to tackle this. Could anyone make any suggestions to approach this task or demonstrate codes to program this?

The details for my question are as follow:

I have a master file that has a list of my study sample. My sample is a group of patients (var: bene_id)
who died of cancer in 2013-2014. This dataset has a variable of death date (var: deat_dt).

AND additionally

I have three datasets for each year 2012 (dataset name: Lst_pcs12), 2013 (Lst_pcs13), and 2014 (Lst_pcs14) and each dataset has the list of those who have received at least one primary care service by primary care physicians, along with
service date (service_dt) for each line of primary care service claim.

What I have to do is to identify and exclude from my master file anyone who does not have received primary care service by
PCP. I think I will need to apply a rule that everyone should have at least one primary care service during the last year (12 months)
prior to death. Could anyone help me out? What should I try?

I am using SAS 9.3. I sincerely appreciate your help and attention to this inquiry! I am stuck after I created file for a subgroup so I apologize for not being able add my attempted code. Thank you so much!

Patrick · Posted 10-09-2017 11:01 PM

@hkim30

Append your 3 datasets into one. Keep only the last service date per patient.

Use a SQL inner join to combine your master with the service date DS and with a join condition along the line of below:

master.id=service.id and  0<=intck('month',service.dt_service,master.dt_death)<=12

Changed pseudo code as suggested by @mkeintz

mkeintz · Posted 10-10-2017 12:18 AM

@Patrick

I think you mean "0<=intck ...", not "0>=intck ...".

--------------------------
The hash OUTPUT method will overwrite a SAS data set, but not append. That can be costly. Consider voting for Add a HASH object method which would append a hash object to an existing SAS data set

Would enabling PROC SORT to simultaneously output multiple datasets be useful? Then vote for
Allow PROC SORT to output multiple datasets

--------------------------

PGStats · Posted 10-10-2017 12:28 AM

@Patrick's suggestion could be implemented as:

data lst_pcs;
set lst_pcs12 lst_pcs13 lst_pcs14;
run;

proc sql;
create table want as
select 
	a.*,
	b.*
from
	master as a inner join
	(	select * 
		from lst_pcs 
		group by bene_id 
		having service_dt = max(service_dt)) as b 
			on 	a.bene_id = b.bene_id and 
				intck("MONTH", b.service_dt, a.deat_dt) between 0 and 12;
quit;

(untested)

PG

How to compare two datasets and exclude IDs that are not included in one of the file

Re: How to compare two datasets and exclude IDs that are not included in one of the file

Re: How to compare two datasets and exclude IDs that are not included in one of the file

Re: How to compare two datasets and exclude IDs that are not included in one of the file

How to compare two datasets and exclude IDs that are not included in one of the file

Re: How to compare two datasets and exclude IDs that are not included in one of the file

Re: How to compare two datasets and exclude IDs that are not included in one of the file

Re: How to compare two datasets and exclude IDs that are not included in one of the file

Click image to register for webinar

Classroom Training Available!