BookmarkSubscribeRSS Feed
🔒 This topic is solved and locked. Need further help from the community? Please sign in and ask a new question.
JJ27
Fluorite | Level 6

Hi everyone!

 

I encountered a problem of excluding patients from claims data. My data is something like this:

IDdxfst_dtprior_dxinpatientstart_dt
112017-1-1112016-8-1
212018-1-1102016-1-1
212019-1-1102017-1-1
312017-2-1012015-2-1
312018-3-1102018-1-1
312019-1-2102018-12-1

 

My exclusion criteria is performing a 12 month lookback from fst_dt, if patients have at least one inpatient diagnosis (inpatient=1 and prior_dx=1) or patients have at least two outpatient diagnosis (inpatient=0 and priod_dx=1) then they will be excluded. Start_dt is the date of diagnosis of prior_dx. For example, patient ID 1 should be excluded, because this patient had an inpatient diagnosis within 12 months. Patient ID 2 should be included because the two outpatient diagnosis (inpatient=0) were before the 12 months interval. Similarly, patient 3 should be excluded.

 

So my output should be something like this:

IDdxfst_dtprior_dxinpatientstart_dt
212018-1-1102016-1-1
212019-1-1102017-1-1

 

I have tried multiple ways, but didn't work. Specifically, I need help with at least two outpatient diagnosis. 

Any help is greatly appreciated!!!!

 

JJ

1 ACCEPTED SOLUTION

Accepted Solutions
maguiremq
SAS Super FREQ

This is a really hacky way of doing it, in my opinion. I've tried doing some PROC SQL queries, but they end up being too complex, and that's probably due to my lack of understanding. Nonetheless, this provides the output that you want, but there's probably a more concise way out there.

 

data have;
infile datalines delimiter = "	";
input ID	dx	fst_dt :yymmdd10.	prior_dx	inpatient	start_dt :yymmdd10.;
format fst_dt start_dt yymmdd10.; 
datalines;
1	1	2017-1-1	1	1	2016-8-1
2	1	2018-1-1	1	0	2016-1-1
2	1	2019-1-1	1	0	2017-1-1
3	1	2017-2-1	0	1	2015-2-1
3	1	2018-3-1	1	0	2018-1-1
3	1	2019-1-2	1	0	2018-12-1
;
run;

data have_2;
	set have;
		month_diff = intck("month", start_dt, fst_dt, "c");
		if prior_dx = 1 and inpatient = 0 then op_flag = 1;
run;

proc means
	data = have_2 noprint nway;
		class id; /* by id */
		var op_flag; /* measure we want summed is op_flag */
		where month_diff < 12 and op_flag ~= .; /* records where month < 12 and op_flag not missing */
		output out = have_3 (drop = _:) /* output it into dataset have_3, drop automatic variables. */
			sum = ; /* sum the var op_flag */
run;

data 	want 
		do_not_want;
	merge 
			have_2 (in = a)
			have_3 (in = b where = (op_flag >= 2)); /* subsetting in case sum of op_flag not greater than 2. */
	by id;

		if a and b then output do_not_want; /* These are those records where month_diff < 12 and op_flag >= 2 */
			else if inpatient = 1 and month_diff < 12 then output do_not_want; /* This is our inpatient record with month_diff < 12 */
				else output want; /* These are our want records. */
run;
ID dx fst_dt prior_dx inpatient start_dt month_diff op_flag 
2 1 2018-01-01 1 0 2016-01-01 24 1 
2 1 2019-01-01 1 0 2017-01-01 24 1 

View solution in original post

8 REPLIES 8
ChrisNZ
Tourmaline | Level 20

patient ID 1 should be excluded, because this patient had an inpatient diagnosis within 12 months

Where? There is only one record for ID=1

 

Hi everyone!

 

I encountered a problem of excluding patients from claims data. My data is something like this:

ID dx fst_dt prior_dx inpatient start_dt
1 1 2017-1-1 1 1 2016-8-1
2 1 2018-1-1 1 0 2016-1-1
2 1 2019-1-1 1 0 2017-1-1
3 1 2017-2-1 0 1 2015-2-1
3 1 2018-3-1 1 0 2018-1-1
3 1 2019-1-2 1 0 2018-12-1

 

My exclusion criteria is performing a 12 month lookback from fst_dt, if patients have at least one inpatient diagnosis (inpatient=1 and prior_dx=1) or patients have at least two outpatient diagnosis (inpatient=0 and priod_dx=1) then they will be excluded. Start_dt is the date of diagnosis of prior_dx. For example, patient ID 1 should be excluded, because this patient had an inpatient diagnosis within 12 months. Patient ID 2 should be included because the two outpatient diagnosis (inpatient=0) were before the 12 months interval. Similarly, patient 3 should be excluded.

 

So my output should be something like this:

ID dx fst_dt prior_dx inpatient start_dt
2 1 2018-1-1 1 0 2016-1-1
2 1 2019-1-1 1 0 2017-1-1

 

I have tried multiple ways, but didn't work. Specifically, I need help with at least two outpatient diagnosis. 

Any help is greatly appreciated!!!!

 

JJ

JJ27
Fluorite | Level 6

Hey ChrisNZ, 

 

Patient ID 1 only had one record. I'm comparing the start_dt with fst_dt to see if the difference <12 months, if yes, then exclude, otherwise keep. start_dt is the diagnosis date for prioir_dx. fst_dt is the diagnosis date for dx. In this case, 2016-8-1 is within the 12-month window of 2017-1-1, so patient ID 1 should be dropped.

 

Hope it makes sense!

 

Thank you

 

maguiremq
SAS Super FREQ

Hey @JJ27,

 

A couple things on top of what @ChrisNZ said:

 

(1) How are your dates formatted? Is that year-month-day, or is it year-day-month?

(2) Please post it in reproducible form so that we can answer the question quicker than having to make the example ourselves. This is what I'm talking about:

 

data have;
infile datalines delimiter = "	";
input ID	dx	fst_dt :yymmdd10.	prior_dx	inpatient	start_dt :yymmdd10.;
format fst_dt start_dt yymmdd10.; 
datalines;
1	1	2017-1-1	1	1	2016-8-1
2	1	2018-1-1	1	0	2016-1-1
2	1	2019-1-1	1	0	2017-1-1
3	1	2017-2-1	0	1	2015-2-1
3	1	2018-3-1	1	0	2018-1-1
3	1	2019-1-2	1	0	2018-12-1
;
run;

This allows us to work on it immediately instead of having to create it ourselves. 

 

It's also good practice to provide us with your "want" dataset in the same way:

 

data want;
infile datalines delimiter = "	";
input ID	dx	fst_dt :yymmdd10.	prior_dx	inpatient	start_dt :yymmdd10.;
format fst_dt start_dt yymmdd10.; 
datalines;
2	1	2018-1-1	1	0	2016-1-1
2	1	2019-1-1	1	0	2017-1-1
;
run;

Until you answer @ChrisNZ 's question, I don't know how to move forward. 

 

After that, we should be able to figure it out pretty quickly. Thanks!

JJ27
Fluorite | Level 6

Hey @maguiremq 

 

Thank you so much for doing this! i'm new to the community, will definitely make the reproducible form myself next time. And my date is in year-month-day format, you are correct!

 

So as I replied to @ChrisNZ , Patient ID 1 only had one record. I'm comparing the start_dt with fst_dt to see if the difference <12 months, if yes, then exclude, otherwise keep. start_dt is the diagnosis date for prioir_dx. fst_dt is the diagnosis date for dx. In this case, 2016-8-1 is within the 12-month window of 2017-1-1, so patient ID 1 should be dropped. 

 

Patient ID 1 only has inpatient diagnosis (inpatient=1) so the exclusion criteria is at least one prior diagnosis within 12 months. But for patient ID 2, he/she had outpatient diagnosis (inpatient=0) only, the exclusion criteria is at least two prior diagnosis within 12 months. So it's keeped. 

 

Thank you! Looking forward to your response!

 

 

 

maguiremq
SAS Super FREQ

This is a really hacky way of doing it, in my opinion. I've tried doing some PROC SQL queries, but they end up being too complex, and that's probably due to my lack of understanding. Nonetheless, this provides the output that you want, but there's probably a more concise way out there.

 

data have;
infile datalines delimiter = "	";
input ID	dx	fst_dt :yymmdd10.	prior_dx	inpatient	start_dt :yymmdd10.;
format fst_dt start_dt yymmdd10.; 
datalines;
1	1	2017-1-1	1	1	2016-8-1
2	1	2018-1-1	1	0	2016-1-1
2	1	2019-1-1	1	0	2017-1-1
3	1	2017-2-1	0	1	2015-2-1
3	1	2018-3-1	1	0	2018-1-1
3	1	2019-1-2	1	0	2018-12-1
;
run;

data have_2;
	set have;
		month_diff = intck("month", start_dt, fst_dt, "c");
		if prior_dx = 1 and inpatient = 0 then op_flag = 1;
run;

proc means
	data = have_2 noprint nway;
		class id; /* by id */
		var op_flag; /* measure we want summed is op_flag */
		where month_diff < 12 and op_flag ~= .; /* records where month < 12 and op_flag not missing */
		output out = have_3 (drop = _:) /* output it into dataset have_3, drop automatic variables. */
			sum = ; /* sum the var op_flag */
run;

data 	want 
		do_not_want;
	merge 
			have_2 (in = a)
			have_3 (in = b where = (op_flag >= 2)); /* subsetting in case sum of op_flag not greater than 2. */
	by id;

		if a and b then output do_not_want; /* These are those records where month_diff < 12 and op_flag >= 2 */
			else if inpatient = 1 and month_diff < 12 then output do_not_want; /* This is our inpatient record with month_diff < 12 */
				else output want; /* These are our want records. */
run;
ID dx fst_dt prior_dx inpatient start_dt month_diff op_flag 
2 1 2018-01-01 1 0 2016-01-01 24 1 
2 1 2019-01-01 1 0 2017-01-01 24 1 
JJ27
Fluorite | Level 6

THANK YOU!

tarheel13
Rhodochrosite | Level 12

if you want a SQL solution, here is one way. 

data want;
	set have;
	where fst_dt >= start_dt + 365;
	IP=inpatient=1 and prior_dx=1;
	OP=inpatient=0 and prior_dx=1;
run;

proc sql;
	create table want2 as 
	select id, sum(IP) as IPcount, sum(OP) as OPcount
		from want
		group by ID
		having IPcount ge 1 or OPcount ge 2;
		
	create table want3 as 
	select h.*
		from have as h
		right join want2 as w2
		on h.id=w2.id;
quit;
Tom
Super User Tom
Super User

Your data structure is confusing.

Shouldn't you have one dataset with the diagnosis dates and a second dataset with the dates you want the test whether or not there were prior diagnoses?

data have;
  input ID dx fst_dt :yymmdd. prior_dx inpatient start_dt :yymmdd.;
  format fst_dt start_dt yymmdd10.;
cards;
1 1 2017-1-1 1 1 2016-8-1
2 1 2018-1-1 1 0 2016-1-1
2 1 2019-1-1 1 0 2017-1-1
3 1 2017-2-1 0 1 2015-2-1
3 1 2018-3-1 1 0 2018-1-1
3 1 2019-1-2 1 0 2018-12-1
;

data diag ;
  set have;
  where dx=1;
  keep id start_dt inpatient;
run;

data visits; 
  set have;
  keep id fst_dt;
run;

Now the question is to join the two and calculate the count of diagnoses and inpatient diagnoses within the time window.  Not sure what measure you are planning for 12 months.  Here is code that uses 365 days.  Some people standardize to 30 day months so would use 360 days instead.  You could also use INTCK() to test for one year or 12 month intervals.

proc sql ;
  create table want as 
    select distinct 
           a.id
         , a.fst_dt
         , count(inpatient) as n_dx
         , sum( inpatient) as n_ip_dx
         , calculated n_dx>1 or calculated n_ip_dx>0 as exclude
    from visits a
    left join diag b
      on a.id = b.id
      and (0 < (fst_dt - start_dt) <= 365)
    group by a.id, s.fst_dt
  ;
quit;

So the EXCLUDE flag is what you could use to decide whether or not to include this subject.

Obs    ID        fst_dt    n_dx    n_ip_dx    exclude

 1      1    2017-01-01      1        1          1
 2      2    2018-01-01      1        0          0
 3      2    2019-01-01      1        0          0
 4      3    2017-02-01      2        0          1
 5      3    2018-03-01      2        0          1
 6      3    2019-01-02      2        0          1

You probably will want to then further reduce this to just the minimum date  for each ID.

data cases;
  set want;
  by id;
  where not exclude ;
  if first.id;
run;

SAS Innovate 2025: Call for Content

Are you ready for the spotlight? We're accepting content ideas for SAS Innovate 2025 to be held May 6-9 in Orlando, FL. The call is open until September 25. Read more here about why you should contribute and what is in it for you!

Submit your idea!

How to Concatenate Values

Learn how use the CAT functions in SAS to join values from multiple variables into a single value.

Find more tutorials on the SAS Users YouTube channel.

Click image to register for webinarClick image to register for webinar

Classroom Training Available!

Select SAS Training centers are offering in-person courses. View upcoming courses for:

View all other training opportunities.

Discussion stats
  • 8 replies
  • 1165 views
  • 0 likes
  • 5 in conversation