BookmarkSubscribeRSS Feed
🔒 This topic is solved and locked. Need further help from the community? Please sign in and ask a new question.
kbanders
Calcite | Level 5

Hello everyone!
I am working with a large dataset of medical claims trying to see how many of my patients are getting skin cancer treated within 4 months of diagnosis. So for each patient, I only want to keep observations that contain biopsied lesion code (1) and skin cancer code (2) if they are within 4 months of each other. Here’s an example of the data set:

 

Patient IDDiagnosis Date 
112012/3
122012/5
122012/5
122012/5
112013/4
112014/5
112014/6
122014/10
212012/6
212013/6
212014/6
212014/8
212015/6
312013/5
312013/6
322013/8
312015/2
322015/8

 

The final data set would look something like this:

Patient IDDiagnosis Date 
112012/3
122012/5
122012/5
122012/5
112014/6
122014/10
312013/5
312013/6
322013/8

 

Not all biopsied lesions are connected to a skin cancer, which is why I am having difficulty dropping unwanted observations. 

If anyone could help write this code, I would really appreciate it! Thank you. 

1 ACCEPTED SOLUTION

Accepted Solutions
PGStats
Opal | Level 21

This problem is more tractable if your claims have some sort of ID (i.e. there are no duplicates) Then you can use simple SQL, like this:

 

data have;
input ID  Diag   Date:anydtdte.;
claimId + 1; /* Make the claims distinct */
format date yymmdd10.;
datalines; 
1   1   2012/3
1   2   2012/5
1   2   2012/5
1   2   2012/5
1   1   2013/4
1   1   2014/5
1   1   2014/6
1   2   2014/10
2   1   2012/6
2   1   2013/6
2   1   2014/6
2   1   2014/8
2   1   2015/6
3   1   2013/5
3   1   2013/6
3   2   2013/8
3   1   2015/2
3   2   2015/8
;

proc sql;
create table want as
(select * from have as a where diag=1 and 
    exists (select * from have where diag=2 and id=a.id and intck("month", a.date, date) between 0 and 4))
union
(select * from have as b where diag=2 and 
    exists (select * from have where diag=1 and id=b.id and intck("month", date, b.date) between 0 and 4))
order by id, date, diag;
quit;
PG

View solution in original post

3 REPLIES 3
mkeintz
PROC Star

This is a good example for:

  1. Reading each patient_id's data twice.  The first time to record use the diagnosis/year/month in hand to record a list of acceptable combinations.  If the current diagnosis is 1, then step forward to record acceptable combinations for diagnosis 2.  If it is a 2, the step backward to record acceptable combinations of diagnosis 1.

    The second pass through the patient_id compares the diagnosis/year/month in-hand to the recorded list of acceptable combinations.

  2. Using a 3-dimensional array (called KEEP below) to record the keep dummies from #1 above.  The dimensions would be diagnosis {1 through 2}, year {2012 through 2019} and month {1 through 12}.

 

data want (drop=_:);
  array keep {1:2,2012:2019,1:12} _temporary_;
  call missing(of keep{*});

  do until (last.patient_id);
    set have;
    by patient_id;

    /*if diagnosis=1 then the month increment is +1, otherwise it is -1 */
    _increment=ifn(diagnosis=1,1,-1); 
    /*if diagnosis=1 then set keep dummies for plane=2 (diagnosis 2), otherwise set for plane=1 */
    _plane=3-diagnosis;

    /* For each incoming record, set dummies for the complementary diagnosis */
    do _i=0 to 4;
      keep{_plane,year,month}=1;
      month=month+_increment;
/* If month is out of range, the re-set it, and also increment the year */ if not (1 <= month <= 12) then do; year=year + _increment; if month=13 then month=1; else month=12; end; end; end; /* Now re-read the patient_id and output records with corresponding keep dummies */ do until (last.patient_id); set have; by patient_id; if keep{diagnosis,year,month}=1 then output; end; run;

You didn't provide the starting data in the format of a working data step, so I don't offer validation of this program.

 

Notice I set the lower and upper bounds of the 2nd dimension of the array to accommodate a year range from 2012 to 2019.

 

Edited addition:

I assume the data are sorted by ID  (though not necessarily date year/month within id).

 

Although the data step reads each patient_id twice, it is NOT twice the amount of input disk activity, because the recently read observations will have been cached in memory when retrieved for the second pass.

--------------------------
The hash OUTPUT method will overwrite a SAS data set, but not append. That can be costly. Consider voting for Add a HASH object method which would append a hash object to an existing SAS data set

Would enabling PROC SORT to simultaneously output multiple datasets be useful? Then vote for
Allow PROC SORT to output multiple datasets

--------------------------
kbanders
Calcite | Level 5

Thank you both for your help!!

PGStats
Opal | Level 21

This problem is more tractable if your claims have some sort of ID (i.e. there are no duplicates) Then you can use simple SQL, like this:

 

data have;
input ID  Diag   Date:anydtdte.;
claimId + 1; /* Make the claims distinct */
format date yymmdd10.;
datalines; 
1   1   2012/3
1   2   2012/5
1   2   2012/5
1   2   2012/5
1   1   2013/4
1   1   2014/5
1   1   2014/6
1   2   2014/10
2   1   2012/6
2   1   2013/6
2   1   2014/6
2   1   2014/8
2   1   2015/6
3   1   2013/5
3   1   2013/6
3   2   2013/8
3   1   2015/2
3   2   2015/8
;

proc sql;
create table want as
(select * from have as a where diag=1 and 
    exists (select * from have where diag=2 and id=a.id and intck("month", a.date, date) between 0 and 4))
union
(select * from have as b where diag=2 and 
    exists (select * from have where diag=1 and id=b.id and intck("month", date, b.date) between 0 and 4))
order by id, date, diag;
quit;
PG

SAS Innovate 2025: Save the Date

 SAS Innovate 2025 is scheduled for May 6-9 in Orlando, FL. Sign up to be first to learn about the agenda and registration!

Save the date!

How to Concatenate Values

Learn how use the CAT functions in SAS to join values from multiple variables into a single value.

Find more tutorials on the SAS Users YouTube channel.

SAS Training: Just a Click Away

 Ready to level-up your skills? Choose your own adventure.

Browse our catalog!

Discussion stats
  • 3 replies
  • 725 views
  • 2 likes
  • 3 in conversation