BookmarkSubscribeRSS Feed
🔒 This topic is solved and locked. Need further help from the community? Please sign in and ask a new question.
ammarhm
Lapis Lazuli | Level 10

Hi everyone, 

i am trying to prepare a dataset for a Cox regression analysis...

Below is an example dataset to give you an idea of what I am trying to do. Let us say I am interested in studying the effect of age, diabetes and medication on survival. I am using the following dataset:

 

data Data1;
attrib
id label = "ID"
born label = "Born" informat=ddmmyy8. format=date7.
entered label = "Entered" informat=ddmmyy8. format=date7.
exited label = "Exited" informat=ddmmyy8. format=date7.
diabetes label="Diabetes"
diabetes_date label = "Diabetes Date" informat=ddmmyy8. format=date7.
medication label="Medication"
medication_start label = "Med Start" informat=ddmmyy8. format=date7.
medication_end label = "Med End" informat=ddmmyy8. format=date7.
death label = "Death"
;
input id born entered exited diabetes diabetes_date medication medication_start medication_end  death;

cards ;
1 14/12/74 30/07/04 22/01/10 1 01/04/02 1 12/05/06 . 0
2 20/02/34 03/09/84 30/03/12 0 . 1 07/07/99 02/05/06 1
3 17/08/51 22/09/89 09/04/06 1 23/05/02 1 23/05/87 11/06/04 0
4 17/09/74 12/07/92 05/08/07 0 . 0 . . 0
;
run ;

As "diabetes" and "medication" are time varying covariate, I will need to split follow-up time to reflect that exposure. The code below illustrates how to do that for the variable diabetes:

 

 

Data Data2;
set Data1 (rename=(entered=_entered exited=_exited));
by id;
if diabetes=1 and _entered lt diabetes_date le _exited then do;
entered=_entered; exited=diabetes_date; diabetes=0; Age=(_entered-born)/365;output;
entered=diabetes_date; exited=_exited; diabetes=1; Age=(diabetes_date-born)/365; output;
end;
else do;
entered=_entered; exited=_exited; diabetes=0; Age=(_entered-born)/365; output;
end;
format entered exited date7.;
drop _entered _exited;
run;

And then the analysis:


proc phreg data=Data2;
class diabetes;
model (entered exited)*death(0)=age diabetes/ ties=exact risklimits;
run;

 

You get the idea, I hope.

However, I have several time-varying covariates, some of which have start and end date like the variable "medication" in the dataset above, other exposures start before entry date etc, and while one way to do this would be to repeat the steps above for each variable, I do wonder if there is an easier way?

I would appreciate your help.

Thank you

1 ACCEPTED SOLUTION

Accepted Solutions
ChrisNZ
Tourmaline | Level 20

> Any thoughts about how to modify the macro to get it to work within one data step?

I don't understand all the needs, but the complexity involved would outweigh the benefit imho.

Keeping legible code is always a high priority. 

Unless the table is massive and reading it several times is a liability, keep it simple :).

 

It doesn't have to be wordy, even without a macro (so it's simple), it can still be compact:

 

data DATA2;
  set DATA1(drop=_: rename=(ENTERED =_ENTERED  EXITED =_EXITED ));
  %reassign(DIABETES);
data DATA3;
  set DATA2(drop=_: rename=(ENTERED =_ENTERED  EXITED =_EXITED ));
  %reassign(MEDICATION);
data FINAL;
  set DATA3(drop=_: rename=(ENTERED =_ENTERED  EXITED =_EXITED ));
  %reassign(OTHER);
  format ENTERED EXITED date7.;
run;

 

 

View solution in original post

5 REPLIES 5
Reeza
Super User
Have you considered an array?
ammarhm
Lapis Lazuli | Level 10
Thanks Reza,
No not really .. I cannot say arrays are my strong side, would very much appreciate a sample code if you don't mind and have the time?
Kind regards
AM
ChrisNZ
Tourmaline | Level 20

You could write a short macro, like:

%let variables = DIABETES ;
%macro reassign;
  %local var;
  %let var=%scan(&variables ,1);
  if &var.=1 and _ENTERED lt &var._DATE le _EXITED then do;
    ENTERED=_ENTERED  ; EXITED=&var._DATE; &var.=0; AGE=(_ENTERED  -BORN)/365; output;
    ENTERED=&var._date; EXITED=_EXITED   ; &var.=1; AGE=(&var._DATE-BORN)/365; output;
  end;
  else do;
    ENTERED =_ENTERED ; EXITED =_EXITED ; &var.=0; AGE=(_ENTERED - BORN)/365;  output;
  end;
%mend;

data DATA2;
  set DATA1(rename=(ENTERED =_ENTERED  EXITED =_EXITED ));
  by ID;
  %reassign;
  format ENTERED EXITED date7.;
  drop _: ;
run;

If this suits, you can add a loop to the macro to go through a list of variable names

 

 

ammarhm
Lapis Lazuli | Level 10

Thanks so much @ChrisNZ 

This is a great solution if you are splitting one variable, but things become complicated when you are splitting by more than one, as there will be an overlap between the two variables timelines

So for a certain person there will be say a start follow up period where diabetes=0 and medication=0, then the person becomes diabetic but without medication diabetes=1 and medication=0. Then when medication is started the variables become diabetes=1 medication=1. The current code however treats the two variables completely independently from each other, see this example with some modification:


%macro reassign (variables);
  %local var;
  %let var=%scan(&variables ,1);
  if &var.=1 and _ENTERED lt &var._DATE le _EXITED then do;
    ENTERED=_ENTERED  ; EXITED=&var._DATE; &var.=0; AGE=(_ENTERED  -BORN)/365; output;
    ENTERED=&var._date; EXITED=_EXITED   ; &var.=1; AGE=(&var._DATE-BORN)/365; output;
  end;

  
  if &var.=1 and &var._DATE le _ENTERED then do;
    ENTERED=_ENTERED  ; EXITED=_EXITED; &var.=1; AGE=(_ENTERED  -BORN)/365; output;
  end;


  if &var.=0 OR (&var.=1 and &var._DATE gt _EXITED) then do;
    ENTERED =_ENTERED ; EXITED =_EXITED ; &var.=0; AGE=(_ENTERED - BORN)/365;  output;
  end;
%mend;

Then the data step


data DATA2;
  set DATA1(rename=(ENTERED =_ENTERED  EXITED =_EXITED ));
  by ID;
  %reassign(diabetes);
  %reassign(medication);

  format ENTERED EXITED date7.;
  *drop _: ;
run;

So instead, the code will split the follow up time into 4 rows

1. Diabetes=0 medication=0

2. Diabetes=1, medication=0.

3. Diabetes=0, medication=1

4. Diabetes=1, medication=1

 

However, splitting the data step into two will generate the right output:

 

data DATA2;
  set DATA1(rename=(ENTERED =_ENTERED  EXITED =_EXITED ));
  %reassign(diabetes);
  format ENTERED EXITED date7.;
  drop _: ;
run;


data DATA3;
  set DATA2(rename=(ENTERED =_ENTERED  EXITED =_EXITED ));
  %reassign(medication);
  format ENTERED EXITED date7.;
  drop _: ;
run;

Any thoughts about how to modify the macro to get it to work within one data step?

ChrisNZ
Tourmaline | Level 20

> Any thoughts about how to modify the macro to get it to work within one data step?

I don't understand all the needs, but the complexity involved would outweigh the benefit imho.

Keeping legible code is always a high priority. 

Unless the table is massive and reading it several times is a liability, keep it simple :).

 

It doesn't have to be wordy, even without a macro (so it's simple), it can still be compact:

 

data DATA2;
  set DATA1(drop=_: rename=(ENTERED =_ENTERED  EXITED =_EXITED ));
  %reassign(DIABETES);
data DATA3;
  set DATA2(drop=_: rename=(ENTERED =_ENTERED  EXITED =_EXITED ));
  %reassign(MEDICATION);
data FINAL;
  set DATA3(drop=_: rename=(ENTERED =_ENTERED  EXITED =_EXITED ));
  %reassign(OTHER);
  format ENTERED EXITED date7.;
run;

 

 

SAS Innovate 2025: Call for Content

Are you ready for the spotlight? We're accepting content ideas for SAS Innovate 2025 to be held May 6-9 in Orlando, FL. The call is open until September 16. Read more here about why you should contribute and what is in it for you!

Submit your idea!

How to Concatenate Values

Learn how use the CAT functions in SAS to join values from multiple variables into a single value.

Find more tutorials on the SAS Users YouTube channel.

Click image to register for webinarClick image to register for webinar

Classroom Training Available!

Select SAS Training centers are offering in-person courses. View upcoming courses for:

View all other training opportunities.

Discussion stats
  • 5 replies
  • 938 views
  • 2 likes
  • 3 in conversation