Cruise
Ammonite | Level 13

Dear SAS experts:

 

I'd like to run an analysis using Dr. Dickman's SAS programs, which he kindly shared on his blog: http://www.pauldickman.com/survival/sas/relative_survival_using_sas.pdf

The datasets are here: http://www.pauldickman.com/survival/?dir=sas

They are attached to this post as well.

To do that, I have to understand the SAS program step by step. I have prepared my outcome and population data to be compatible with the input datasets. Alas, this is not as easy as I hoped 🙂.

 

I more or less understand the 'lexis' macro, and I fully understand its objectives and the contents of the resulting output dataset. But I don't understand how the variables 'w' and 'year8594' were created. These two variables were not part of the two input datasets, so they must have been created from the other variables inside the "lexis" macro.

 

output.png

w - 'Indicator for censoring during the interval'
year8594 - 'Year of diagnosis 1985-94'

 

Would you mind helping me understand, in plain SAS terms, how the macro creates 'w' and 'year8594'?

 

Which variables are these two variables based on? I found "cint = w" in the %lexis call, and a temporary variable _cint_ is created inside the macro. What happens after that?

 

I'll greatly appreciate your time and efforts.

 

Thanks zillions in advance.

 

/****************************************************************
SURVIVAL.SAS

Estimate relative survival and produce output data files that
can be used for fitting relative survival models.
http://www.pauldickman.com/rsmodel/

Paul Dickman (paul.dickman@ki.se)
Version 1.0 - June 2004
****************************************************************/

title; footnote;
title1 'Colon carcinoma diagnosed in Finland 1975-1994 (follow-up to 1995)';
libname colon 'D:\...\dickman';
options fmtsearch=(colon work library) orientation=landscape pageno=1;

/****************************************************************
Define the input and output files.
****************************************************************/
/* Population mortality file */
%let popmort=colon.popmort ;

/* Patient data file */
%let patdata=colon.colon ;

/* Output data file containing individual records */
%let individ=colon.individ ;

/* Output data file containing collapsed data */
%let grouped=colon.grouped ;

/********************************************************
This program creates SAS formats.
********************************************************/

proc format;

value fu
1='1 '
2=' 2'
3=' 3'
4=' 4'
5=' 5'
6=' 6'
7=' 7'
8=' 8'
9=' 9'
10=' 10'
;

value sex
1='Male'
2=' Female'
;

value yydx
1975-1984='1975-84'
1985-1994=' 1985-94'
;

value age
0-44='0-44'
45-59=' 45-59'
60-74=' 60-74'
75-high=' 75+'
;

value status
0='Alive'
1='Dead: cancer'
2='Dead: other'
4='Lost to follow-up'
;

value stage
0='Unknown'
1='Localised'
2='Regional'
3='Distant'
;

/* colon subsite */
value colonsub
1='Coecum and ascending'
2='Transverse'
3='Descending and sigmoid'
4='Other and NOS'
;

run;

/****************************************************************
The macro variable VARS carries the variables over which the
life tables are stratified. 
For example:
%let vars = sex yydx age;
Will result in a lifetable being estimated for each combination
of sex, yydx (year of diagnosis), and age. If, for example, the
variable age contains age at diagnosis in years then categories
can be constructed using a format. 
****************************************************************/
%let vars = sex yydx age;
%let formats = sex sex. age age. yydx yydx. ; 

data &individ;
length id 5;
set &patdata;

/* Restrict to localised stage*/
if stage=1;

/* Create a unique ID for each individual */
id+1;

/****************************************************************
The variable SURV_MM contains survival time in completed months.
We will add 0.5 to all survival times, both to avoid problems 
with individuals with time=0 (who are theoretically never at risk
and may be excluded from some analyses) and because this provides
a more accurate estimate of person-time at risk (for Poisson 
regression analyses).
****************************************************************/
surv_mm = surv_mm + 0.5;

/* The lexis macro requires a variable containing the time at entry */
entry=0;

/* Create an indicator variable for death due to any cause */
if status in (1,2) then d=1;
else d=0;

drop stage subsite;
label id='Unique subject ID';
run;

/*****************************************************************
It is preferable to put the lexis macro in the autocall library
rather than including it as is done here
****************************************************************/
/**************************************************************************
Author: Bendix Carstensen, 1999-2002
Update: Paul Dickman, BxC, November 2003
Bug-fix: BxC, December 2007:
         If the origin= argument had missing values erroneous output would
         be generated (too much risk time). Now remedied so that
         observations with missing values of origin are excluded.
This macro is in: http://www.biostat.ku.dk/~bxc/Lexis/Lexis.sas
Example program:  http://www.biostat.ku.dk/~bxc/Lexis/xLexis.sas 
***************************************************************************/

%macro Lexis ( data = ,       /* Data set with original data,             */
                              /*    defaults to _last_                    */
                out = ,       /* Where to put the result,                 */
                              /*    defaults to &data.                    */
              entry = entry,  /* Variable holding the entry date          */
               exit = exit,   /* Variable holding the exit date           */
               fail = fail,   /* Variable holding the exit status         */
                              /* If any of the entry, exit or fail        */
                              /*    variables are missing the person is   */
                              /*    discarded from the computations.      */
             breaks = ,       /* Specification of the cutpoints on        */
                              /*    the transformed scale.                */
                              /*    Syntax as for a do statement.         */
                              /*    The ONLY Mandatory argument.          */
               cens = 0,      /* Code for censoring (may be a variable)   */
              scale = 1,      /* Factor to transform from the scale       */
                              /*    of entry and exit to the scale        */
                              /*    where breaks and risk are given       */
             origin = 0,      /* Origin of the transformed scale          */
               risk = risk,   /* Variable receiving the risk time         */
              lrisk = lrisk,  /* Variable receiving the log(risk time)    */
               left = left,   /* Variable receiving left  endpoint of int */
              other = ,       /* Other dataset statements to be used such */
                              /*     as: %str( format var ddmmyy10. ; )   */
                              /*     or: %str( label risk ="P-years" ; )  */
               disc = discrd, /* Dataset holding discarded observations   */
           /*-------------------------------------------------------------*/
           /* Variables for making life-tables and other housekeeping:    */
           /* These will only appear in the output dataset if given here  */
           /* The existence of these arguments are tested in the macro so */
           /* they cannot have names that are also logical operators such */
           /* as: or, and, eq, ne, le, lt, gt.                            */
           /*-------------------------------------------------------------*/
              right = ,       /* Variable receiving right endpoint of int */
               lint = ,       /* Variable receiving interval length       */
            os_left = ,       /* Variable receiving left  endpoint of int */
           os_right = ,       /* Variable receiving right endpoint of int */
            os_lint = ,       /* Variable receiving interval length       */
                              /*    - the latter three on original scale  */
               cint = ,       /* Variable receiving censoring indicator   */
                              /*    for the current input record          */
               nint =         /* Variable receiving index of follow-up    */
                              /*       interval;                          */
              );

%if &breaks.= %then %put ERROR: breaks MUST be specified. ;
%if &data.  = %then %let data = &syslast. ;
%if &out.   = %then %do ;
                    %let out=&data. ;
                    %put
NOTE: Output dataset not specified, input dataset %upcase(&data.) will be overwritten. ;
                  %end ;

data &disc. &out. ;
  set &data. ;
  if ( nmiss ( &entry., &exit., &fail., &origin. ) gt 0 ) 
     then do ; output &disc. ;
               goto next ;
          end ;
  * Labelling of variables ;
  label &entry.  = 'Entry into interval' ;
  label &exit.   = 'Exit from interval' ;
  label &fail.   = 'Failure indicator for interval' ;
  label &risk.   = 'Risktime in interval' ;
  label &lrisk.  = 'Natural log of risktime in interval' ;
  label &left.   = 'Left endpoint of interval (transformed scale)' ;
%if    &right.^= %then  label &right. = 'Right endpoint of interval (transformed scale)' ; ;
%if     &lint.^= %then  label &lint. = 'Interval width (transformed scale)' ; ;
%if  &os_left.^= %then  label &os_left. = 'Left endpoint of interval (original scale)' ; ;
%if &os_right.^= %then  label &os_right. = 'Right endpoint of interval (original scale)' ; ; 
%if  &os_lint.^= %then  label &os_lint. = 'Interval width (original scale)' ; ;
%if     &cint.^= %then  label &cint. = 'Indicator for censoring during the interval' ; ;
%if     &nint.^= %then  label &nint. = 'Sequential index for follow-up interval' ; ;
  &other. ;
  drop _entry_ _exit_ _fail_
       _origin_ _break_
       _cur_r _cur_l _int_r _int_l
       _first_ _cint_ _nint_;

/*
Temporary variables in this macro:

  _entry_  holds entry date on the transformed timescale
  _exit_   holds exit  date on the transformed timescale
  _fail_   holds exit  status
  _break_  current cut-point
  _origin_ origin of the time scale
  _cur_l   left  endpoint of current risk interval
  _cur_r   right endpoint of current risk interval
  _int_l   left  endpoint of current break interval
  _int_r   right endpoint of current break interval
  _first_  indicator for processing of the first break interval
  _cint_   indicator for censoring during the interval
  _nint_   sequential index of interval
   
If a variable with any of these names appear in the input dataset it will
not be present in the output dataset.
*/

  _origin_ = &origin. ;
  _entry_  = ( &entry. - _origin_ ) / &scale. ;
  _exit_   = ( &exit.  - _origin_ ) / &scale. ;
  _fail_   = &fail. ;
  _cur_l   = _entry_ ;
  _first_  = 1 ;

  do _break_ = &breaks. ;
     if _first_ then do ;
        _nint_=-1;
        _cur_l = max ( _break_, _entry_ ) ;
        _int_l = _break_ ;
     end ;
     _nint_ + 1;
     _first_ = 0 ;
     _int_r = _break_ ;
     _cur_r = min ( _exit_, _break_ ) ;
     if _cur_r gt _cur_l then do ;
/*
Endpoints of risk interval are put back on original scale.
If any of left or right are specified the corresponding endpoint
of the break-interval are output.
*/
        &entry.  = _cur_l * &scale. + _origin_ ;
        &exit.   = _cur_r * &scale. + _origin_ ;
        &risk.   = _cur_r - _cur_l ;
        &lrisk.  = log ( &risk. ) ;
        &fail.   = _fail_ * ( _exit_ eq _cur_r ) +
                   &cens. * ( _exit_ gt _cur_r ) ;
        _cint_   = not( _fail_ ) * ( _exit_ eq _cur_r ) ;            
        %if     &left.^= %then &left.     = _int_l ; ;
        %if    &right.^= %then &right.    = _int_r ; ;
        %if     &lint.^= %then &lint.     = _int_r - _int_l ; ;
        %if  &os_left.^= %then &os_left.  = _int_l * &scale. + _origin_ ; ;
        %if &os_right.^= %then &os_right. = _int_r * &scale. + _origin_ ; ; 
        %if  &os_lint.^= %then &os_lint.  = ( _int_r - _int_l ) * &scale. ; ;
        %if     &cint.^= %then &cint.     = _cint_ ; ;
        %if     &nint.^= %then &nint.     = _nint_ ; ;
        output &out. ;
     end ;
     _cur_l = max ( _entry_, _break_ ) ;
     _int_l = _break_ ;
  end ;
  next: ;
run ;

%mend Lexis ;

/*****************************************************************
Split the data to obtain one observation for each life table interval
for each individual. The scale must be transformed to years.
****************************************************************/
%lexis (
data=&individ.,
out=&individ.,
breaks = %str( 0 to 10 by 1 ),
origin = 0,
entry = entry,
exit = surv_mm,
fail = d,
scale = 12,
right = right,
risk = y,
lrisk = ln_y,
lint = length,
cint = w,
nint = fu
)
;
proc contents data=colon.Individ; run; 

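/* Cross-tabulate the censoring indicator against vital status.
   The split data are in &individ; no dataset named A is created above. */
PROC FREQ DATA=&individ;
TABLES w*status/LIST;
RUN;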

/****************************************************************
Create variables for attained age and calendar 
year which are 'updated' for each observation for a single
individual. These are the variables by which we will merge in
the expected probabilities of death, so they must have the
same names and same format as the variables indexing the 
POPMORT file (sex, _age, _year in this example). 
****************************************************************/
data &individ;
set &individ;

/*********************************************************************
Create a variable for attained age at the start of the interval.
This variable must have the same name and have the same format as the 
corresponding variable in the popmort file.
*********************************************************************/
_age=floor(age+left);

/*******************************************************************
Create a variable for calendar period at the start of the interval.
This variable must have the same name and have the same format as the 
corresponding variable in the popmort file.
*******************************************************************/
_year=floor(yydx+left);

/* A variable to label the life table output */
range=put(left,4.1) || ' - ' || left(put(right,4.1));

drop entry left right;
run;

/****************************************************************
Now merge in the expected probabilities of death.
****************************************************************/
proc sort data=&individ;
by sex _year _age;
run;

proc sort data=&popmort;
by sex _year _age;
run;

data &individ;
length d w fu 4 y ln_y length 5;
merge &individ(in=a) &popmort(in=b);
by sex _year _age;
if a;
/* Need to adjust for interval lengths other than 1 year */
p_star=prob**length;
/* Expected number of deaths */
d_star=-log(p_star)*(y/length);
*keep &vars fu range length d w p_star y ln_y d_star;
label
d_star='Expected number of deaths'
d='Indicator for death during interval'
w='Indicator for censored during interval'
y='Person-time (years) at risk during the interval'
length='Interval length (potential not actual)'
ln_y='ln(person-time at risk)'
p_star='Expected survival probability'
_age='Attained age'
_year='Attained calendar year'
range='Life table interval'
fu='Follow-up interval'
sex='Sex'
;
run;

/****************************************************************
Collapse the data to produce the life table.
****************************************************************/
proc summary data=&individ nway;
var d w p_star y d_star;
id range length;
class &vars fu; /* Follow-up must be the last variable in this list */
output out=&grouped(drop=_type_ rename=(_freq_=l)) sum(d w y d_star)=d w y d_star mean(p_star)=p_star;
format &formats ; 
run;

/****************************************************************
Calculate life table quantities. 
****************************************************************/
data &grouped;
retain cp cp_star cr 1;
set &grouped;
if fu=1 then do;
  cp=1; cp_star=1; cr=1; se_temp=0;
  end;
l_prime=l-w/2;
ns=l_prime-d;
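/* Two alternative approaches to estimating interval-specific survival */
/* Must use the hazard approach for period analysis */
/* As written, the second assignment overwrites the first, so the actuarial */
/* estimate is the one used; comment out the line you do not want */
p=exp(-(d/y)*length); /* transforming the hazard */ 
p=1-d/l_prime; /* actuarial approach */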
r=p/p_star;
cp=cp*p;
cp_star=cp_star*p_star;
cr=cp/cp_star;
ln_y_group=log(l_prime-d/2);
ln_y=log(y);
d_star_group=l_prime*(1-p_star);
excess=(d-d_star)/y;
se_p=sqrt(p*(1-p)/l_prime);
se_r=se_p/p_star;
se_temp+d/(l_prime*(l_prime-d)); /* Component of the SE of the cumulative survival */
se_cp=cp*sqrt(se_temp);
se_cr=se_cp/cp_star;

/* Calculate confidence intervals on the log-hazard scale and back transform */
/* First for the interval-specific estimates */
if se_p ne 0 then do;  
  /* SE on the log-hazard scale using Taylor series approximation */
  se_lh_p=sqrt( se_p**2/(p*log(p))**2 );
  /* Confidence limits on the log-hazard scale */
  lo_lh_p=log(-log(p))+1.96*se_lh_p;
  hi_lh_p=log(-log(p))-1.96*se_lh_p;
  /* Confidence limits on the survival scale (observed survival) */
  lo_p=exp(-exp(lo_lh_p));
  hi_p=exp(-exp(hi_lh_p));
  /* Confidence limits for the corresponding relative survival rate */
  lo_r=lo_p/p_star;
  hi_r=hi_p/p_star;
  /* Drop temporary variables */
  drop se_lh_p lo_lh_p hi_lh_p;
  /* Formats and labels */
  format lo_p hi_p lo_r hi_r 8.5;
  label 
  lo_p='Lower 95% CI for P'
  hi_p='Upper 95% CI for P'
  lo_r='Lower 95% CI for R'
  hi_r='Upper 95% CI for R'
  ;
end;

/* Now for the cumulative estimates */
if se_cp ne 0 then do;  
  /* SE on the log-hazard scale using Taylor series approximation */
  se_lh_cp=sqrt( se_cp**2/(cp*log(cp))**2 );
  /* Confidence limits on the log-hazard scale */
  lo_lh_cp=log(-log(cp))+1.96*se_lh_cp;
  hi_lh_cp=log(-log(cp))-1.96*se_lh_cp;
  /* Confidence limits on the survival scale (observed survival) */
  lo_cp=exp(-exp(lo_lh_cp));
  hi_cp=exp(-exp(hi_lh_cp));
  /* Confidence limits for the corresponding relative survival rate */
  lo_cr=lo_cp/cp_star;
  hi_cr=hi_cp/cp_star;
  /* Drop temporary variables */
  drop se_lh_cp lo_lh_cp hi_lh_cp;
  /* Formats and labels */
  format lo_cp hi_cp lo_cr hi_cr 8.5;
  label 
  lo_cp='Lower 95% CI for CP'
  hi_cp='Upper 95% CI for CP'
  lo_cr='Lower 95% CI for CR'
  hi_cr='Upper 95% CI for CR'
  ;
end;

drop se_temp;
label
range='Interval'
fu='Interval'
l='Alive at start'
l_prime='Effective number at risk'
ns='Number surviving the interval'
d='Deaths'
w='Withdrawals'
p='Interval-specific observed survival'
cp='Cumulative observed survival'
r='Interval-specific relative survival'
cr='Cumulative relative survival'
p_star='Interval-specific expected survival'
cp_star='Cumulative expected survival'
ln_y_group='ln(l_prime-d/2)'
ln_y='ln(person-time) (using exact times)'
y='Person-time at risk (using exact times)'
d_star='Expected deaths (using exact times)'
d_star_group='Expected deaths (approximate)'
excess='Empirical excess hazard'
se_p='Standard error of P'
se_r='Standard error of R'
se_cp='Standard error of CP'
se_cr='Standard error of CR'
;
run;

/****************************************************************
Print the lifetables. We first need to extract the last variable
in the varlist to use as the argument in the pageby command.
****************************************************************/
%let lastvar = %scan(&vars,-1);

proc print data=&grouped noobs label;
title2 'Life table estimates of patient survival';
title3 'The Ederer II method is used to estimate expected survival'; 
by &vars;
pageby &lastvar;
var range l d w l_prime p cp p_star cp_star r cr;
format fu 3.0 l d w 4.0 l_prime 8.1 p cp p_star cp_star r cr se_p se_r se_cp se_cr 8.5;
label l='L' d='D' w='W';
run;

 

Accepted Solutions
Patrick
Opal | Level 21

year8594

Comes from source table colon.colon
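
A quick way to confirm this yourself is sketched below. It assumes the COLON libref from the posted program is already assigned and that colon.colon contains both yydx and year8594, as described above; nothing here is taken from Dr. Dickman's own code.

```sas
/* List the variables stored in the patient file; year8594 should */
/* appear here with its label if it comes from the source table.  */
proc contents data=colon.colon;
run;

/* Cross-tabulate year of diagnosis against the indicator to see  */
/* which values of yydx correspond to year8594 = 0 and = 1.       */
proc freq data=colon.colon;
  tables yydx*year8594 / list missing;
run;
```

If year8594 shows up in the PROC CONTENTS listing, the %lexis macro simply carries it through unchanged along with the other patient-level variables.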

 

W

Gets created and populated within macro %lexis()

%lexis (
....
cint = w,
....
)
;

 

And the relevant code within the macro:

....
_cint_ = not( _fail_ ) * ( _exit_ eq _cur_r );
....
%if &cint.^= %then
&cint. = _cint_;
....
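
To make that expression concrete, here is a minimal sketch that restates the splitting logic outside the macro for the settings used in the posted call (breaks = 0 to 10 by 1, scale = 12, origin = 0, entry = 0). The three subjects are invented for illustration, and the data step is a simplification of %lexis, not a replacement for it.

```sas
/* Toy input: survival time in months (0.5 already added) and the  */
/* all-cause death indicator d. These values are made up.          */
data toy;
  input id surv_mm d;
  datalines;
1 30.5 1
2 18.5 0
3 6.5 0
;
run;

/* One output row per subject per yearly interval at risk. */
data split;
  set toy;
  _exit_ = surv_mm / 12;              /* exit time in years            */
  do fu = 1 to 10;
    left   = fu - 1;                  /* left endpoint of the interval */
    right  = fu;                      /* right endpoint                */
    _cur_r = min(_exit_, right);      /* end of time at risk in it     */
    if _cur_r > left then do;
      y    = _cur_r - left;           /* person-time in the interval   */
      /* Follow-up ends in this interval when _exit_ eq _cur_r */
      fail = d * (_exit_ = _cur_r);         /* ended by death          */
      w    = (not d) * (_exit_ = _cur_r);   /* ended without death     */
      output;
    end;
  end;
  keep id fu left right y fail w;
run;

proc print data=split noobs;
run;
```

Subject 1 dies in the third year, so it gets three rows with w = 0 throughout and fail = 1 only in the last one. Subject 2 is censored in the second year, so w = 1 on its second row only. Subject 3 is censored in the first year and contributes a single row with w = 1.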

 


Cruise
Ammonite | Level 13
Thanks a lot, Patrick. So w flags the interval in which a case is censored, that is, follow-up ends without death (alive at the end of follow-up or lost to follow-up), while deaths are counted by d. I got it. Thanks again.
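
For anyone following along: in the posted program, PROC SUMMARY then sums w within each stratum and follow-up interval, and the life table step uses that sum as the number of withdrawals in l_prime = l - w/2. A minimal sketch with invented counts (not results from the colon data) shows the arithmetic:

```sas
/* Invented interval-level counts, for illustration only:          */
/* l = alive at start, d = deaths, w = withdrawals (summed w).     */
data lifetable_demo;
  input fu l d w;
  l_prime = l - w/2;        /* effective number at risk            */
  p       = 1 - d/l_prime;  /* actuarial interval-specific survival */
  datalines;
1 100 10 4
2 86 8 6
;
run;

proc print data=lifetable_demo noobs;
run;
```

Here l_prime is 98 in the first interval and 83 in the second, so each withdrawal is treated as being at risk for half the interval.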
