BookmarkSubscribeRSS Feed
🔒 This topic is solved and locked. Need further help from the community? Please sign in and ask a new question.
dwhitney
Calcite | Level 5

I am wondering how to code to allocate groups based on my data described below.

 

I have administrative claims data and am interested in comparing the incidence of an outcome between 2 groups: adults with CP (CP=1) and adults without CP (CP=0; controls). My data structure is in the long format and grouped by patid, date of encounter, many other variables, and an indicator for the CP group variable. Each row below represents a single claim. I already identified a CP diagnosis or not for each claim (1=claim for CP; 0=no claim for CP). The algorithm to identify individuals with CP is at least 2 claims for CP not on the same day. Since a CP diagnosis does not necessarily get flagged on the first date of encounter, I am unable to use the FIRST and LAST function. How do I code to get a single group variable with 3 levels: (1) 2+ claims for CP on different days; (2) no claim for CP (controls); and (3) only 1 claim for CP (will be used as a sensitivity analysis)?

 

Here is an example of what the data looks like. Patient 1 (patid) has CP; patient 2 does not have CP; and patient 3 has only 1 claim for CP.

 

patid    fst_dt                 CP

1          01/01/2016        0

1          02/03/2016        1

1          02/05/2016        0

1          02/23/2016        0

1          07/5/2016          1

1          07/17/2016        1

1          09/02/2016        0

2          02/04/2016        0

2          05/21/2016        0

2          10/25/2016        0

2          11/11/2016         0

3           01/04/2016       0

3          04/26/2016        1

3          04/28/2016        0

3          06/12/2016        0

 

Thank you!!

1 ACCEPTED SOLUTION

Accepted Solutions
andreas_lds
Jade | Level 19

Not sure, what you expect as result, this seems to solve the issue. I have added a fourth patient, please verify the result:

data work.have;
   length patid fst_dt cp 8;
   informat fst_dt mmddyy10.;
   format fst_dt date9.;
   input patid fst_dt CP;
datalines;
1 01/01/2016 0
1 02/03/2016 1
1 02/05/2016 0
1 02/23/2016 0
1 07/5/2016 1
1 07/17/2016 1
1 09/02/2016 0
4 01/01/2016 0
4 02/03/2016 1
4 02/03/2016 1
4 02/23/2016 0
4 07/5/2016 0
4 07/17/2016 1
4 09/02/2016 0
2 02/04/2016 0
2 05/21/2016 0
2 10/25/2016 0
2 11/11/2016 0
3 01/04/2016 0
3 04/26/2016 1
3 04/28/2016 0
3 06/12/2016 0
;
run;


data work.want;
   set work.have;
   by patid notsorted;
   
   length group 8 lastCPDate 8 cpCounter 8;
   retain lastCPDate cpCounter;
   format lastCPDate date9.;

   if first.patid then do;
      cpCounter = 0;
   end;

   if CP then do;
      cpCounter = cpCounter + (lastCPDate ^= fst_dt);
      lastCPDate = fst_dt;
   end;

   if last.patid then do;
      select;
         when (cpCounter = 1) group = 3;
         when (cpCounter > 1) group = 1;
         otherwise group = 2;
      end;
      output;
   end;

   keep patid group;
run;

View solution in original post

4 REPLIES 4
andreas_lds
Jade | Level 19

Is the variable fst_dt a date, dateime or string?

andreas_lds
Jade | Level 19

Not sure, what you expect as result, this seems to solve the issue. I have added a fourth patient, please verify the result:

data work.have;
   length patid fst_dt cp 8;
   informat fst_dt mmddyy10.;
   format fst_dt date9.;
   input patid fst_dt CP;
datalines;
1 01/01/2016 0
1 02/03/2016 1
1 02/05/2016 0
1 02/23/2016 0
1 07/5/2016 1
1 07/17/2016 1
1 09/02/2016 0
4 01/01/2016 0
4 02/03/2016 1
4 02/03/2016 1
4 02/23/2016 0
4 07/5/2016 0
4 07/17/2016 1
4 09/02/2016 0
2 02/04/2016 0
2 05/21/2016 0
2 10/25/2016 0
2 11/11/2016 0
3 01/04/2016 0
3 04/26/2016 1
3 04/28/2016 0
3 06/12/2016 0
;
run;


data work.want;
   set work.have;
   by patid notsorted;
   
   length group 8 lastCPDate 8 cpCounter 8;
   retain lastCPDate cpCounter;
   format lastCPDate date9.;

   if first.patid then do;
      cpCounter = 0;
   end;

   if CP then do;
      cpCounter = cpCounter + (lastCPDate ^= fst_dt);
      lastCPDate = fst_dt;
   end;

   if last.patid then do;
      select;
         when (cpCounter = 1) group = 3;
         when (cpCounter > 1) group = 1;
         otherwise group = 2;
      end;
      output;
   end;

   keep patid group;
run;
dwhitney
Calcite | Level 5

This works great! However, this results in a single line for each patid. This is fine, because I then merged this dataset with my other dataset in the long format, and the group variable populated correctly.

 

This is awesome! Thanks so much!

Ready to join fellow brilliant minds for the SAS Hackathon?

Build your skills. Make connections. Enjoy creative freedom. Maybe change the world. Registration is now open through August 30th. Visit the SAS Hackathon homepage.

Register today!
Mastering the WHERE Clause in PROC SQL

SAS' Charu Shankar shares her PROC SQL expertise by showing you how to master the WHERE clause using real winter weather data.

Find more tutorials on the SAS Users YouTube channel.

Discussion stats
  • 4 replies
  • 688 views
  • 0 likes
  • 2 in conversation