BookmarkSubscribeRSS Feed
🔒 This topic is solved and locked. Need further help from the community? Please sign in and ask a new question.
dwhitney
Calcite | Level 5

I am wondering how to code to allocate groups based on my data described below.

 

I have administrative claims data and am interested in comparing the incidence of an outcome between 2 groups: adults with CP (CP=1) and adults without CP (CP=0; controls). My data structure is in the long format and grouped by patid, date of encounter, many other variables, and an indicator for the CP group variable. Each row below represents a single claim. I already identified a CP diagnosis or not for each claim (1=claim for CP; 0=no claim for CP). The algorithm to identify individuals with CP is at least 2 claims for CP not on the same day. Since a CP diagnosis does not necessarily get flagged on the first date of encounter, I am unable to use the FIRST and LAST function. How do I code to get a single group variable with 3 levels: (1) 2+ claims for CP on different days; (2) no claim for CP (controls); and (3) only 1 claim for CP (will be used as a sensitivity analysis)?

 

Here is an example of what the data looks like. Patient 1 (patid) has CP; patient 2 does not have CP; and patient 3 has only 1 claim for CP.

 

patid    fst_dt                 CP

1          01/01/2016        0

1          02/03/2016        1

1          02/05/2016        0

1          02/23/2016        0

1          07/5/2016          1

1          07/17/2016        1

1          09/02/2016        0

2          02/04/2016        0

2          05/21/2016        0

2          10/25/2016        0

2          11/11/2016         0

3           01/04/2016       0

3          04/26/2016        1

3          04/28/2016        0

3          06/12/2016        0

 

Thank you!!

1 ACCEPTED SOLUTION

Accepted Solutions
andreas_lds
Jade | Level 19

Not sure, what you expect as result, this seems to solve the issue. I have added a fourth patient, please verify the result:

data work.have;
   length patid fst_dt cp 8;
   informat fst_dt mmddyy10.;
   format fst_dt date9.;
   input patid fst_dt CP;
datalines;
1 01/01/2016 0
1 02/03/2016 1
1 02/05/2016 0
1 02/23/2016 0
1 07/5/2016 1
1 07/17/2016 1
1 09/02/2016 0
4 01/01/2016 0
4 02/03/2016 1
4 02/03/2016 1
4 02/23/2016 0
4 07/5/2016 0
4 07/17/2016 1
4 09/02/2016 0
2 02/04/2016 0
2 05/21/2016 0
2 10/25/2016 0
2 11/11/2016 0
3 01/04/2016 0
3 04/26/2016 1
3 04/28/2016 0
3 06/12/2016 0
;
run;


data work.want;
   set work.have;
   by patid notsorted;
   
   length group 8 lastCPDate 8 cpCounter 8;
   retain lastCPDate cpCounter;
   format lastCPDate date9.;

   if first.patid then do;
      cpCounter = 0;
   end;

   if CP then do;
      cpCounter = cpCounter + (lastCPDate ^= fst_dt);
      lastCPDate = fst_dt;
   end;

   if last.patid then do;
      select;
         when (cpCounter = 1) group = 3;
         when (cpCounter > 1) group = 1;
         otherwise group = 2;
      end;
      output;
   end;

   keep patid group;
run;

View solution in original post

4 REPLIES 4
andreas_lds
Jade | Level 19

Is the variable fst_dt a date, dateime or string?

andreas_lds
Jade | Level 19

Not sure, what you expect as result, this seems to solve the issue. I have added a fourth patient, please verify the result:

data work.have;
   length patid fst_dt cp 8;
   informat fst_dt mmddyy10.;
   format fst_dt date9.;
   input patid fst_dt CP;
datalines;
1 01/01/2016 0
1 02/03/2016 1
1 02/05/2016 0
1 02/23/2016 0
1 07/5/2016 1
1 07/17/2016 1
1 09/02/2016 0
4 01/01/2016 0
4 02/03/2016 1
4 02/03/2016 1
4 02/23/2016 0
4 07/5/2016 0
4 07/17/2016 1
4 09/02/2016 0
2 02/04/2016 0
2 05/21/2016 0
2 10/25/2016 0
2 11/11/2016 0
3 01/04/2016 0
3 04/26/2016 1
3 04/28/2016 0
3 06/12/2016 0
;
run;


data work.want;
   set work.have;
   by patid notsorted;
   
   length group 8 lastCPDate 8 cpCounter 8;
   retain lastCPDate cpCounter;
   format lastCPDate date9.;

   if first.patid then do;
      cpCounter = 0;
   end;

   if CP then do;
      cpCounter = cpCounter + (lastCPDate ^= fst_dt);
      lastCPDate = fst_dt;
   end;

   if last.patid then do;
      select;
         when (cpCounter = 1) group = 3;
         when (cpCounter > 1) group = 1;
         otherwise group = 2;
      end;
      output;
   end;

   keep patid group;
run;
dwhitney
Calcite | Level 5

This works great! However, this results in a single line for each patid. This is fine, because I then merged this dataset with my other dataset in the long format, and the group variable populated correctly.

 

This is awesome! Thanks so much!