Solved: Data structure - Group allocation in long format for survival analyses

dwhitney · Posted 04-04-2019 08:05 AM

I am wondering how to code to allocate groups based on my data described below.

I have administrative claims data and am interested in comparing the incidence of an outcome between 2 groups: adults with CP (CP=1) and adults without CP (CP=0; controls). My data structure is in the long format and grouped by patid, date of encounter, many other variables, and an indicator for the CP group variable. Each row below represents a single claim. I already identified a CP diagnosis or not for each claim (1=claim for CP; 0=no claim for CP). The algorithm to identify individuals with CP is at least 2 claims for CP not on the same day. Since a CP diagnosis does not necessarily get flagged on the first date of encounter, I am unable to use the FIRST and LAST function. How do I code to get a single group variable with 3 levels: (1) 2+ claims for CP on different days; (2) no claim for CP (controls); and (3) only 1 claim for CP (will be used as a sensitivity analysis)?

Here is an example of what the data looks like. Patient 1 (patid) has CP; patient 2 does not have CP; and patient 3 has only 1 claim for CP.

patid fst_dt CP

1 01/01/2016 0

1 02/03/2016 1

1 02/05/2016 0

1 02/23/2016 0

1 07/5/2016 1

1 07/17/2016 1

1 09/02/2016 0

2 02/04/2016 0

2 05/21/2016 0

2 10/25/2016 0

2 11/11/2016 0

3 01/04/2016 0

3 04/26/2016 1

3 04/28/2016 0

3 06/12/2016 0

Thank you!!

andreas_lds · Posted 04-04-2019 08:36 AM

Not sure, what you expect as result, this seems to solve the issue. I have added a fourth patient, please verify the result:

data work.have;
   length patid fst_dt cp 8;
   informat fst_dt mmddyy10.;
   format fst_dt date9.;
   input patid fst_dt CP;
datalines;
1 01/01/2016 0
1 02/03/2016 1
1 02/05/2016 0
1 02/23/2016 0
1 07/5/2016 1
1 07/17/2016 1
1 09/02/2016 0
4 01/01/2016 0
4 02/03/2016 1
4 02/03/2016 1
4 02/23/2016 0
4 07/5/2016 0
4 07/17/2016 1
4 09/02/2016 0
2 02/04/2016 0
2 05/21/2016 0
2 10/25/2016 0
2 11/11/2016 0
3 01/04/2016 0
3 04/26/2016 1
3 04/28/2016 0
3 06/12/2016 0
;
run;


data work.want;
   set work.have;
   by patid notsorted;
   
   length group 8 lastCPDate 8 cpCounter 8;
   retain lastCPDate cpCounter;
   format lastCPDate date9.;

   if first.patid then do;
      cpCounter = 0;
   end;

   if CP then do;
      cpCounter = cpCounter + (lastCPDate ^= fst_dt);
      lastCPDate = fst_dt;
   end;

   if last.patid then do;
      select;
         when (cpCounter = 1) group = 3;
         when (cpCounter > 1) group = 1;
         otherwise group = 2;
      end;
      output;
   end;

   keep patid group;
run;

View solution in original post

andreas_lds · Posted 04-04-2019 08:13 AM

Is the variable fst_dt a date, dateime or string?

dwhitney · Posted 04-04-2019 08:29 AM

fst_dt is a date.

andreas_lds · Posted 04-04-2019 08:36 AM

Not sure, what you expect as result, this seems to solve the issue. I have added a fourth patient, please verify the result:

data work.have;
   length patid fst_dt cp 8;
   informat fst_dt mmddyy10.;
   format fst_dt date9.;
   input patid fst_dt CP;
datalines;
1 01/01/2016 0
1 02/03/2016 1
1 02/05/2016 0
1 02/23/2016 0
1 07/5/2016 1
1 07/17/2016 1
1 09/02/2016 0
4 01/01/2016 0
4 02/03/2016 1
4 02/03/2016 1
4 02/23/2016 0
4 07/5/2016 0
4 07/17/2016 1
4 09/02/2016 0
2 02/04/2016 0
2 05/21/2016 0
2 10/25/2016 0
2 11/11/2016 0
3 01/04/2016 0
3 04/26/2016 1
3 04/28/2016 0
3 06/12/2016 0
;
run;


data work.want;
   set work.have;
   by patid notsorted;
   
   length group 8 lastCPDate 8 cpCounter 8;
   retain lastCPDate cpCounter;
   format lastCPDate date9.;

   if first.patid then do;
      cpCounter = 0;
   end;

   if CP then do;
      cpCounter = cpCounter + (lastCPDate ^= fst_dt);
      lastCPDate = fst_dt;
   end;

   if last.patid then do;
      select;
         when (cpCounter = 1) group = 3;
         when (cpCounter > 1) group = 1;
         otherwise group = 2;
      end;
      output;
   end;

   keep patid group;
run;

dwhitney · Posted 04-04-2019 09:38 AM

This works great! However, this results in a single line for each patid. This is fine, because I then merged this dataset with my other dataset in the long format, and the group variable populated correctly.

This is awesome! Thanks so much!

Data structure - Group allocation in long format for survival analyses

Re: Data structure - Group allocation in long format for survival analyses

Re: Data structure - Group allocation in long format for survival analyses

Re: Data structure - Group allocation in long format for survival analyses

Re: Data structure - Group allocation in long format for survival analyses

Re: Data structure - Group allocation in long format for survival analyses

Data structure - Group allocation in long format for survival analyses

Re: Data structure - Group allocation in long format for survival analyses

Re: Data structure - Group allocation in long format for survival analyses

Re: Data structure - Group allocation in long format for survival analyses

Re: Data structure - Group allocation in long format for survival analyses

Re: Data structure - Group allocation in long format for survival analyses

Ready to join fellow brilliant minds for the SAS Hackathon?