Hi everyone,
i am trying to prepare a dataset for a Cox regression analysis...
Below is an example dataset to give you an idea of what I am trying to do. Let us say I am interested in studying the effect of age, diabetes and medication on survival. I am using the following dataset:
data Data1;
attrib
id label = "ID"
born label = "Born" informat=ddmmyy8. format=date7.
entered label = "Entered" informat=ddmmyy8. format=date7.
exited label = "Exited" informat=ddmmyy8. format=date7.
diabetes label="Diabetes"
diabetes_date label = "Diabetes Date" informat=ddmmyy8. format=date7.
medication label="Medication"
medication_start label = "Med Start" informat=ddmmyy8. format=date7.
medication_end label = "Med End" informat=ddmmyy8. format=date7.
death label = "Death"
;
input id born entered exited diabetes diabetes_date medication medication_start medication_end death;
cards ;
1 14/12/74 30/07/04 22/01/10 1 01/04/02 1 12/05/06 . 0
2 20/02/34 03/09/84 30/03/12 0 . 1 07/07/99 02/05/06 1
3 17/08/51 22/09/89 09/04/06 1 23/05/02 1 23/05/87 11/06/04 0
4 17/09/74 12/07/92 05/08/07 0 . 0 . . 0
;
run ;
As "diabetes" and "medication" are time varying covariate, I will need to split follow-up time to reflect that exposure. The code below illustrates how to do that for the variable diabetes:
Data Data2;
set Data1 (rename=(entered=_entered exited=_exited));
by id;
if diabetes=1 and _entered lt diabetes_date le _exited then do;
entered=_entered; exited=diabetes_date; diabetes=0; Age=(_entered-born)/365;output;
entered=diabetes_date; exited=_exited; diabetes=1; Age=(diabetes_date-born)/365; output;
end;
else do;
entered=_entered; exited=_exited; diabetes=0; Age=(_entered-born)/365; output;
end;
format entered exited date7.;
drop _entered _exited;
run;
And then the analysis:
proc phreg data=Data2;
class diabetes;
model (entered exited)*death(0)=age diabetes/ ties=exact risklimits;
run;
You get the idea, I hope.
However, I have several time-varying covariates, some of which have start and end date like the variable "medication" in the dataset above, other exposures start before entry date etc, and while one way to do this would be to repeat the steps above for each variable, I do wonder if there is an easier way?
I would appreciate your help.
Thank you
> Any thoughts about how to modify the macro to get it to work within one data step?
I don't understand all the needs, but the complexity involved would outweigh the benefit imho.
Keeping legible code is always a high priority.
Unless the table is massive and reading it several times is a liability, keep it simple :).
It doesn't have to be wordy, even without a macro (so it's simple), it can still be compact:
data DATA2;
set DATA1(drop=_: rename=(ENTERED =_ENTERED EXITED =_EXITED ));
%reassign(DIABETES);
data DATA3;
set DATA2(drop=_: rename=(ENTERED =_ENTERED EXITED =_EXITED ));
%reassign(MEDICATION);
data FINAL;
set DATA3(drop=_: rename=(ENTERED =_ENTERED EXITED =_EXITED ));
%reassign(OTHER);
format ENTERED EXITED date7.;
run;
You could write a short macro, like:
%let variables = DIABETES ;
%macro reassign;
%local var;
%let var=%scan(&variables ,1);
if &var.=1 and _ENTERED lt &var._DATE le _EXITED then do;
ENTERED=_ENTERED ; EXITED=&var._DATE; &var.=0; AGE=(_ENTERED -BORN)/365; output;
ENTERED=&var._date; EXITED=_EXITED ; &var.=1; AGE=(&var._DATE-BORN)/365; output;
end;
else do;
ENTERED =_ENTERED ; EXITED =_EXITED ; &var.=0; AGE=(_ENTERED - BORN)/365; output;
end;
%mend;
data DATA2;
set DATA1(rename=(ENTERED =_ENTERED EXITED =_EXITED ));
by ID;
%reassign;
format ENTERED EXITED date7.;
drop _: ;
run;
If this suits, you can add a loop to the macro to go through a list of variable names
Thanks so much @ChrisNZ
This is a great solution if you are splitting one variable, but things become complicated when you are splitting by more than one, as there will be an overlap between the two variables timelines
So for a certain person there will be say a start follow up period where diabetes=0 and medication=0, then the person becomes diabetic but without medication diabetes=1 and medication=0. Then when medication is started the variables become diabetes=1 medication=1. The current code however treats the two variables completely independently from each other, see this example with some modification:
%macro reassign (variables);
%local var;
%let var=%scan(&variables ,1);
if &var.=1 and _ENTERED lt &var._DATE le _EXITED then do;
ENTERED=_ENTERED ; EXITED=&var._DATE; &var.=0; AGE=(_ENTERED -BORN)/365; output;
ENTERED=&var._date; EXITED=_EXITED ; &var.=1; AGE=(&var._DATE-BORN)/365; output;
end;
if &var.=1 and &var._DATE le _ENTERED then do;
ENTERED=_ENTERED ; EXITED=_EXITED; &var.=1; AGE=(_ENTERED -BORN)/365; output;
end;
if &var.=0 OR (&var.=1 and &var._DATE gt _EXITED) then do;
ENTERED =_ENTERED ; EXITED =_EXITED ; &var.=0; AGE=(_ENTERED - BORN)/365; output;
end;
%mend;
Then the data step
data DATA2;
set DATA1(rename=(ENTERED =_ENTERED EXITED =_EXITED ));
by ID;
%reassign(diabetes);
%reassign(medication);
format ENTERED EXITED date7.;
*drop _: ;
run;
So instead, the code will split the follow up time into 4 rows
1. Diabetes=0 medication=0
2. Diabetes=1, medication=0.
3. Diabetes=0, medication=1
4. Diabetes=1, medication=1
However, splitting the data step into two will generate the right output:
data DATA2;
set DATA1(rename=(ENTERED =_ENTERED EXITED =_EXITED ));
%reassign(diabetes);
format ENTERED EXITED date7.;
drop _: ;
run;
data DATA3;
set DATA2(rename=(ENTERED =_ENTERED EXITED =_EXITED ));
%reassign(medication);
format ENTERED EXITED date7.;
drop _: ;
run;
Any thoughts about how to modify the macro to get it to work within one data step?
> Any thoughts about how to modify the macro to get it to work within one data step?
I don't understand all the needs, but the complexity involved would outweigh the benefit imho.
Keeping legible code is always a high priority.
Unless the table is massive and reading it several times is a liability, keep it simple :).
It doesn't have to be wordy, even without a macro (so it's simple), it can still be compact:
data DATA2;
set DATA1(drop=_: rename=(ENTERED =_ENTERED EXITED =_EXITED ));
%reassign(DIABETES);
data DATA3;
set DATA2(drop=_: rename=(ENTERED =_ENTERED EXITED =_EXITED ));
%reassign(MEDICATION);
data FINAL;
set DATA3(drop=_: rename=(ENTERED =_ENTERED EXITED =_EXITED ));
%reassign(OTHER);
format ENTERED EXITED date7.;
run;
Registration is now open for SAS Innovate 2025 , our biggest and most exciting global event of the year! Join us in Orlando, FL, May 6-9.
Sign up by Dec. 31 to get the 2024 rate of just $495.
Register now!
Learn how use the CAT functions in SAS to join values from multiple variables into a single value.
Find more tutorials on the SAS Users YouTube channel.
Ready to level-up your skills? Choose your own adventure.