Solved: Re: How do To keep only observations within 4 months of less of each o...

kbanders · Posted 02-01-2021 03:20 PM

Hello everyone!
I am working with a large dataset of medical claims trying to see how many of my patients are getting skin cancer treated within 4 months of diagnosis. So for each patient, I only want to keep observations that contain biopsied lesion code (1) and skin cancer code (2) if they are within 4 months of each other. Here’s an example of the data set:

Patient ID	Diagnosis	Date
1	1	2012/3
1	2	2012/5
1	2	2012/5
1	2	2012/5
1	1	2013/4
1	1	2014/5
1	1	2014/6
1	2	2014/10
2	1	2012/6
2	1	2013/6
2	1	2014/6
2	1	2014/8
2	1	2015/6
3	1	2013/5
3	1	2013/6
3	2	2013/8
3	1	2015/2
3	2	2015/8

The final data set would look something like this:

Patient ID	Diagnosis	Date
1	1	2012/3
1	2	2012/5
1	2	2012/5
1	2	2012/5
1	1	2014/6
1	2	2014/10
3	1	2013/5
3	1	2013/6
3	2	2013/8

Not all biopsied lesions are connected to a skin cancer, which is why I am having difficulty dropping unwanted observations.

If anyone could help write this code, I would really appreciate it! Thank you.

PGStats · Posted 02-01-2021 05:50 PM

This problem is more tractable if your claims have some sort of ID (i.e. there are no duplicates) Then you can use simple SQL, like this:

data have;
input ID  Diag   Date:anydtdte.;
claimId + 1; /* Make the claims distinct */
format date yymmdd10.;
datalines; 
1   1   2012/3
1   2   2012/5
1   2   2012/5
1   2   2012/5
1   1   2013/4
1   1   2014/5
1   1   2014/6
1   2   2014/10
2   1   2012/6
2   1   2013/6
2   1   2014/6
2   1   2014/8
2   1   2015/6
3   1   2013/5
3   1   2013/6
3   2   2013/8
3   1   2015/2
3   2   2015/8
;

proc sql;
create table want as
(select * from have as a where diag=1 and 
    exists (select * from have where diag=2 and id=a.id and intck("month", a.date, date) between 0 and 4))
union
(select * from have as b where diag=2 and 
    exists (select * from have where diag=1 and id=b.id and intck("month", date, b.date) between 0 and 4))
order by id, date, diag;
quit;

PG

View solution in original post

mkeintz · Posted 02-01-2021 05:30 PM

This is a good example for:

Reading each patient_id's data twice. The first time to record use the diagnosis/year/month in hand to record a list of acceptable combinations. If the current diagnosis is 1, then step forward to record acceptable combinations for diagnosis 2. If it is a 2, the step backward to record acceptable combinations of diagnosis 1.

The second pass through the patient_id compares the diagnosis/year/month in-hand to the recorded list of acceptable combinations.
Using a 3-dimensional array (called KEEP below) to record the keep dummies from #1 above. The dimensions would be diagnosis {1 through 2}, year {2012 through 2019} and month {1 through 12}.

data want (drop=_:);
  array keep {1:2,2012:2019,1:12} _temporary_;
  call missing(of keep{*});

  do until (last.patient_id);
    set have;
    by patient_id;

    /*if diagnosis=1 then the month increment is +1, otherwise it is -1 */
    _increment=ifn(diagnosis=1,1,-1); 
    /*if diagnosis=1 then set keep dummies for plane=2 (diagnosis 2), otherwise set for plane=1 */
    _plane=3-diagnosis;

    /* For each incoming record, set dummies for the complementary diagnosis */
    do _i=0 to 4;
      keep{_plane,year,month}=1;
      month=month+_increment;
      /* If month is out of range, the re-set it, and also increment the year */
      if not (1 <= month <= 12)  then do;
        year=year + _increment;
        if month=13 then month=1;
        else month=12;
      end;
    end;
  end;

  /* Now re-read the patient_id and output records with corresponding keep dummies */
  do until (last.patient_id);
    set have;
    by patient_id;
    if keep{diagnosis,year,month}=1 then output;
  end;
run;

You didn't provide the starting data in the format of a working data step, so I don't offer validation of this program.

Notice I set the lower and upper bounds of the 2nd dimension of the array to accommodate a year range from 2012 to 2019.

Edited addition:

I assume the data are sorted by ID (though not necessarily date year/month within id).

Although the data step reads each patient_id twice, it is NOT twice the amount of input disk activity, because the recently read observations will have been cached in memory when retrieved for the second pass.

--------------------------
The hash OUTPUT method will overwrite a SAS data set, but not append. That can be costly. Consider voting for Add a HASH object method which would append a hash object to an existing SAS data set

Would enabling PROC SORT to simultaneously output multiple datasets be useful? Then vote for
Allow PROC SORT to output multiple datasets

--------------------------

kbanders · Posted 02-11-2021 02:33 PM

Thank you both for your help!!

PGStats · Posted 02-01-2021 05:50 PM

This problem is more tractable if your claims have some sort of ID (i.e. there are no duplicates) Then you can use simple SQL, like this:

data have;
input ID  Diag   Date:anydtdte.;
claimId + 1; /* Make the claims distinct */
format date yymmdd10.;
datalines; 
1   1   2012/3
1   2   2012/5
1   2   2012/5
1   2   2012/5
1   1   2013/4
1   1   2014/5
1   1   2014/6
1   2   2014/10
2   1   2012/6
2   1   2013/6
2   1   2014/6
2   1   2014/8
2   1   2015/6
3   1   2013/5
3   1   2013/6
3   2   2013/8
3   1   2015/2
3   2   2015/8
;

proc sql;
create table want as
(select * from have as a where diag=1 and 
    exists (select * from have where diag=2 and id=a.id and intck("month", a.date, date) between 0 and 4))
union
(select * from have as b where diag=2 and 
    exists (select * from have where diag=1 and id=b.id and intck("month", date, b.date) between 0 and 4))
order by id, date, diag;
quit;

PG

How do To keep only observations within 4 months of less of each other

Re: How do To keep only observations within 4 months of less of each other

Re: How do To keep only observations within 4 months of less of each other

Re: How do To keep only observations within 4 months of less of each other

Re: How do To keep only observations within 4 months of less of each other

How do To keep only observations within 4 months of less of each other

Re: How do To keep only observations within 4 months of less of each other

Re: How do To keep only observations within 4 months of less of each other

Re: How do To keep only observations within 4 months of less of each other

Re: How do To keep only observations within 4 months of less of each other

SAS Innovate 2025: Save the Date

SAS Training: Just a Click Away