Help using Base SAS procedures

summarizing repeated data

Accepted Solution Solved
Reply
Occasional Contributor
Posts: 10
Accepted Solution

summarizing repeated data

Hi,

I'm working on a database describing medication reimbursement. My database looks like this :

idrefdateATCreimb date
1ABC2011-01-0112011-02-01
1ABC2011-01-0112011-04-05
1ABC2011-01-0112011-06-17
2KHG2011-04-0352011-04-05
2KHG2011-04-0352011-05-03
2KHG2011-04-0352011-06-05

"id" stands for the identifier of my patients, "refdate" is a reference date, "ATC" is a medication class, "reimb date" stands for reimbursement dates.

I'm trying to determine for each patient in my base, for periods of 60 days starting from their reference date, whether they did or did not receive a reimbursement of each "ATC" medication class in a given period of time.

For example for the patient with the id "1ABC" I would like to obtain an array ATC1 with the values {1,1,1,0 ....}, an array ATC2 with the values {0,0,0,0 ....) etc. for each different ATC value I have.

For the patient with the id "2KHG" I would like to obtain an array ATC1 with the values {0,0,0,0 ....}, an array ATC2 with the values {0,0,0,0 ....}, .... an array ATC5 with the values {1,1,1,0 ....) etc.

For every patient I have an array containing the dates used to determine reimbursement status: I created an array "dates" with the values {refdate, refdate + 60, refdate + 120 ...}

I have difficulties to loop on my patients ids, my "dates" array, the different ATC etc. I'm a beginner in SAS and more used to R "way of thinking".

Thank you very much for your help.

Yohann


Accepted Solutions
Solution
‎05-13-2015 11:47 AM
New Contributor
Posts: 4

Re: summarizing repeated data

Hi Yohann,

I used a brute force method so there is a little repetition.  With more values of ATC it might become worth it to look into macros, but this should give you the table you are after.

Note I assumed three periods of 60 days, but you can add more where noted below.

* prepare sample data;
data med1;
  input id $
  ref_date YYMMDD10.
  ATC
  reimb_date YYMMDD10.;

datalines;
1ABC 2011-01-01 1 2011-02-01
1ABC 2011-01-01 1 2011-04-05
1ABC 2011-01-01 1 2011-06-17
1ABC 2011-01-01 2 2011-02-01
1ABC 2011-01-01 2 2011-04-05
1ABC 2011-01-01 2 2011-06-17
2KHG 2011-04-03 5 2011-04-05
2KHG 2011-04-03 5 2011-05-03
2KHG 2011-04-03 5 2011-06-05
;
run;

* sort the data to ensure reimbursements are tested correctly in next step;

proc sort data = med1; by id ATC reimb_date; run;


* check if reimbursement occurred in 60 day periods;

data med2;
  set med1; 
  by ATC;
  retain reimb_period;

  if first.ATC then reimb_period = 1;
  else reimb_period + 1;

  reimb_datdif = datdif(ref_date,reimb_date,'ACT/ACT');
  reimb_test = reimb_period * 60;
 
  if reimb_datdif le reimb_test then success = 1;
  else if reimb_datdif gt reimb_test then success = 0;

run;

* reshape table;
data med3;
  set med2;
  by id;

  * I assume three reimbursement periods, but changing the _3 to another number will add more;

  retain ATC1_1-ATC1_3;
  retain ATC2_1-ATC2_3;
  retain ATC3_1-ATC3_3;
  retain ATC4_1-ATC4_3;
  retain ATC5_1-ATC5_3;

  array ATC1_array

  • ATC1_1-ATC1_3;
      array ATC2_array
  • ATC2_1-ATC2_3;
      array ATC3_array
  • ATC3_1-ATC3_3;
      array ATC4_array
  • ATC4_1-ATC4_3;
      array ATC5_array
  • ATC5_1-ATC5_3;
  •   if first.id then do;
        call missing(of ATC1_array{*});
        call missing(of ATC2_array{*});
        call missing(of ATC3_array{*});
        call missing(of ATC4_array{*});
        call missing(of ATC5_array{*});
      end;

      if ATC = 1 then ATC1_array[reimb_period] = success;
      else if ATC = 2 then ATC2_array[reimb_period] = success;
      else if ATC = 3 then ATC3_array[reimb_period] = success;
      else if ATC = 4 then ATC4_array[reimb_period] = success;
      else if ATC = 5 then ATC5_array[reimb_period] = success;

      * try commenting out this last conditional if you have never used this before, it works together with the RETAIN statement above to reshape the table one row at a time;
      if last.id;

      drop ATC reimb_date reimb_period reimb_datdif reimb_test success;
    run;

    View solution in original post


    All Replies
    Super User
    Posts: 11,343

    Re: summarizing repeated data

    Since you have a within 60 days requirement my first question: are the dates actual SAS date values or character variables? If the later likely the first thing will be to get actual date valued variables so time intervals can be determined.

    Second, if you want a data set as output you will need to provide a better description of that output. Since your example data has only single ATC types it is bit difficult to determine how you intend to apply the rule "for each different ATC value".

    Something that may affect your 60 day rules: Is it possible for any patient to have a second reference date? If so, could other reference dates be within 60 days of a previous?

    Your statement "I would like to obtain an array" may not be possible in the manner you are thinking as SAS arrays are temporary constructs for manipulating groups of variables and do not exist outside of data step code. Each value in a SAS dataset is a single variable. You could well have 60 similarly named variable var1-var60 for instance though. Which takes us back to an output description.

    Also, how would the resulting data set be used? The SAS report procedures sometimes may not require as much preprocessing of data as you are attempting.

    Occasional Contributor
    Posts: 10

    Re: summarizing repeated data

    Hi,

    My dates are SAS date values so I dont have any problem to determine my time intervals. Moreover each patient has a unique reference date, which makes the task easier.

    Sorry I'm probably misusing the term "array", I would like to obtain an output like this (I have 5 different ATC types (1,2,3,4,5)) :

    idrefdateATC1_1ATC1_2ATC1_3.....ATC2_1.....ATC5_1ATC5_2ATC5_3...
    1ABC2011-01-01111...0...000...
    2KHG2011-04-03000...0...111....

    The goal is to obtain "medication use trajectories" over the time for each of my patients.

    Thanks for your help.

    Super User
    Posts: 10,046

    Re: summarizing repeated data

    Use the MERGE skill me,Matt and Arthur.T proposed .

    http://support.sas.com/resources/papers/proceedings15/2785-2015.pdf

    data have;
    input id $ ref_date : yymmdd12. atc reimb_date : yymmdd12. ;
    format ref_date  reimb_date yymmdd10. ;
    cards;
    1ABC     2011-01-01     1     2011-02-01
    1ABC     2011-01-01     1     2011-04-05
    1ABC     2011-01-01     1     2011-06-17
    2KHG     2011-04-03     5     2011-04-05
    2KHG     2011-04-03     5     2011-05-03
    2KHG     2011-04-03     5     2011-06-05
    ;
    run;
    data have(drop=reimb_date);
     set have;
     by id  ref_date atc ;
     retain dummy 1;
     if first.atc then n=0;
     n+1;
    run;
    proc sql;
     select distinct catt('have(where=(id="',id,'" and ref_date=',ref_date,' and atc=',atc,' and n=',n,') rename=(dummy=atc',atc,'_',n,'))') into : list separated by ' '
      from have;
    quit;
    data temp;
     merge &list ;
     by id ref_date atc;
     drop n atc;
    run;
    proc stdize data=temp out=want missing=0 reponly;run;
    
    

    Xia Keshan

    Solution
    ‎05-13-2015 11:47 AM
    New Contributor
    Posts: 4

    Re: summarizing repeated data

    Hi Yohann,

    I used a brute force method so there is a little repetition.  With more values of ATC it might become worth it to look into macros, but this should give you the table you are after.

    Note I assumed three periods of 60 days, but you can add more where noted below.

    * prepare sample data;
    data med1;
      input id $
      ref_date YYMMDD10.
      ATC
      reimb_date YYMMDD10.;

    datalines;
    1ABC 2011-01-01 1 2011-02-01
    1ABC 2011-01-01 1 2011-04-05
    1ABC 2011-01-01 1 2011-06-17
    1ABC 2011-01-01 2 2011-02-01
    1ABC 2011-01-01 2 2011-04-05
    1ABC 2011-01-01 2 2011-06-17
    2KHG 2011-04-03 5 2011-04-05
    2KHG 2011-04-03 5 2011-05-03
    2KHG 2011-04-03 5 2011-06-05
    ;
    run;

    * sort the data to ensure reimbursements are tested correctly in next step;

    proc sort data = med1; by id ATC reimb_date; run;


    * check if reimbursement occurred in 60 day periods;

    data med2;
      set med1; 
      by ATC;
      retain reimb_period;

      if first.ATC then reimb_period = 1;
      else reimb_period + 1;

      reimb_datdif = datdif(ref_date,reimb_date,'ACT/ACT');
      reimb_test = reimb_period * 60;
     
      if reimb_datdif le reimb_test then success = 1;
      else if reimb_datdif gt reimb_test then success = 0;

    run;

    * reshape table;
    data med3;
      set med2;
      by id;

      * I assume three reimbursement periods, but changing the _3 to another number will add more;

      retain ATC1_1-ATC1_3;
      retain ATC2_1-ATC2_3;
      retain ATC3_1-ATC3_3;
      retain ATC4_1-ATC4_3;
      retain ATC5_1-ATC5_3;

      array ATC1_array

  • ATC1_1-ATC1_3;
      array ATC2_array
  • ATC2_1-ATC2_3;
      array ATC3_array
  • ATC3_1-ATC3_3;
      array ATC4_array
  • ATC4_1-ATC4_3;
      array ATC5_array
  • ATC5_1-ATC5_3;
  •   if first.id then do;
        call missing(of ATC1_array{*});
        call missing(of ATC2_array{*});
        call missing(of ATC3_array{*});
        call missing(of ATC4_array{*});
        call missing(of ATC5_array{*});
      end;

      if ATC = 1 then ATC1_array[reimb_period] = success;
      else if ATC = 2 then ATC2_array[reimb_period] = success;
      else if ATC = 3 then ATC3_array[reimb_period] = success;
      else if ATC = 4 then ATC4_array[reimb_period] = success;
      else if ATC = 5 then ATC5_array[reimb_period] = success;

      * try commenting out this last conditional if you have never used this before, it works together with the RETAIN statement above to reshape the table one row at a time;
      if last.id;

      drop ATC reimb_date reimb_period reimb_datdif reimb_test success;
    run;

    Occasional Contributor
    Posts: 10

    Re: summarizing repeated data

    Posted in reply to seeLowGreen

    Thanks very much for your help it works perfectly. I managed to obtain the output I wanted in R but the execution time was very long so your solution is very helpful !

    Thanks

    Yohann

    🔒 This topic is solved and locked.

    Need further help from the community? Please ask a new question.

    Discussion stats
    • 5 replies
    • 352 views
    • 0 likes
    • 4 in conversation