BookmarkSubscribeRSS Feed
🔒 This topic is solved and locked. Need further help from the community? Please sign in and ask a new question.
yoyong555
Obsidian | Level 7

Hi everyone.

I hope you are all doing fine.

I have a large data set which requires calculations. For a small data set (probably less than 30), I can patiently use MS Excel. 

So, anyone's help would greatly be appreciated.

Thank you in advance.

And keep safe!

------------------------------------

I have a reference data (Table 1) and two tables (Tables 2 and 3) to produce.

Table 1 (reference data) contains two variables - Age (0 to 100) and Age specific death rates. I use this to populate Tables 2 and 3.

Below is a subset.

Age Age_spec_dth_rates
71 0.023
72 0.023
73 0.023
74 0.023
75 0.037
76 0.037
77 0.037
78 0.037
79 0.037
80 0.063
81 0.063
82 0.063
83 0.063
84 0.063
85 0.147
86 0.147
87 0.147
88

0.147

89 0.147
90 0.147

 

Table 2 has three variables - age of the patients, age of the patients at last follow up and cumulative death rate.

I need to calculate the cumulative death rates for each patient using the reference data.

 

Table 2. 

Patient_age Patient_age_last_follow_up Cum_dth_rate
74 80  
78 84  
80 83  

 

Patient 1 joined the study at 74 years old and lived until 80 years old.

Patient 2 joined the study at 78 years old and lived until 84 years old.

Patient 2 joined the study at 80 years old and lived until 83 years old.

 

The cumulative death rate (cum_dth_rate) is obtained by adding the death rates at each year to the cumulative death rate for the previous year.

Table 2a shows the results. Table 2b shows how the values are generated.

 

Table 2a.

Patient_age Patient_age_last_follow_up Cum_dth_rate
74 80 0.208
78 84 0.326
80 83 0.189

 

Table 2b.

Age Age_spec_dth_rates Cum_dth_rate_pt1 Cum_dth_rate_pt2 Cum_dth_rate_pt3
71 0.023      
72 0.023      
73 0.023      
74 0.023      
75 0.037 0.023    
76 0.037 0.06     
77 0.037 0.097    
78 0.037 0.134    
79 0.037 0.171 0.037  
80 0.063 0.208 0.074  
81 0.063   0.137 0.063
82 0.063   0.2 0.126
83 0.063   0.263 0.189
84 0.063   0.326  
85 0.147      
86 0.147      
87 0.147      
88

0.147

 

 

 

89 0.147      
90 0.147      

 

Table 3 is different. Using again the reference data, I need to populate each patient's cumulative death rate until the end of the longest follow - up time of the study. For example, if the study is conducted for 10 years, the table will look like this:

 

Table 3.

Study Year Cum_dth_rate_pt1 Cum_dth_rate_pt2 Cum_dth_rate_pt3
1      
2      
3      
4      
5      
6      
7      
8      
9      
10      

 

Table 3a shows the results. Table 3b shows how the values are generated.

 

Table 3a.

Study Year Cum_dth_rate_pt1 Cum_dth_rate_pt2 Cum_dth_rate_pt3
1 0.023 0.037 0.063
2 0.06 0.074 0.126
3 0.097 0.137 0.189
4 0.134 0.2 0.252
5 0.171 0.263 0.315
6 0.208 0.326 0.462
7 0.271 0.389 0.609
8 0.334 0.536 0.756
9 0.397 0.683 0.903
10 0.46 0.83 1.05

 

Table 3b.

Age Age_spec_dth_rates Cum_dth_rate_pt1 Cum_dth_rate_pt2 Cum_dth_rate_pt3
71 0.023      
72 0.023      
73 0.023      
74 0.023      
75 0.037 0.023    
76 0.037 0.06     
77 0.037 0.097    
78 0.037 0.134    
79 0.037 0.171 0.037  
80 0.063 0.208 0.074  
81 0.063 0.271 0.137 0.063
82 0.063 0.334 0.2 0.126
83 0.063 0.397 0.263 0.189
84 0.063 0.46 0.326 0.252
85 0.147   0.389 0.315
86 0.147   0.536 0.462
87 0.147   0.683 0.609
88

0.147

 

0.83 0.756
89 0.147     0.903
90 0.147     1.05

 

The generated values are an extension of Table 2b where the cumulative death rates are calculated until the end of the entire study period (in this case, 10th year).

-----

Again, thank you very much for your help!

Yoyong.

 

 

1 ACCEPTED SOLUTION

Accepted Solutions
PGStats
Opal | Level 21

I would use this code:

 

data rates;
input Age   Age_spec_dth_rate;
datalines;
71    0.023
72    0.023
73    0.023
74    0.023
75    0.037
76    0.037
77    0.037
78    0.037
79    0.037
80    0.063
81    0.063
82    0.063
83    0.063
84    0.063
85    0.147
86    0.147
87    0.147
88    0.147
89    0.147
90    0.147
;

data patients;
input Patient_age Patient_age_last_follow_up;
patient_Id = _n_;
datalines;
74    80   
78    84   
80    83
;

data 
    t2a(keep=patient_id patient_age Patient_age_last_follow_up cum_rate) 
    t3a(keep=patient_id year cum_rate);
array r {0:100} _temporary_;
if _n_ = 1 then do until(done);
    set rates end=done;
    r{age} = Age_spec_dth_rate;
    end;
set patients;
cum_rate = 0;
do year = 1 to 10;
    age = patient_age + year - 1;
    cum_rate = cum_rate + r{min(age, hbound(r))};
    if age = Patient_age_last_follow_up - 1 then output t2a;
    output t3a;
    end;
run;
PG

View solution in original post

11 REPLIES 11
smantha
Lapis Lazuli | Level 10
data table_2a(keep=patient_age patient_age_last_follow_up Cum_dth_rate) table_2b(keep=patient age Cum_dth_rate)
table_3a(keep=year patient Cum_dth_rate) table_3b (keep=age patient Cum_dth_rate);
length patient $32.;
array age_death_rates{100} (0.023,
0.023,
0.023,
0.023,
0.037,
0.037,
0.037,
0.037,
0.037,
0.063,
0.063,
0.063,
0.063,
0.063,
0.147,
0.147,
0.147,
0.147,
0.147,
0.147,
0.023,
0.023,
0.023,
0.023,
0.037,
0.037,
0.037,
0.037,
0.037,
0.063,
0.063,
0.063,
0.063,
0.063,
0.147,
0.147,
0.147,
0.147,
0.147,
0.147,
0.023,
0.023,
0.023,
0.023,
0.037,
0.037,
0.037,
0.037,
0.037,
0.063,
0.063,
0.063,
0.063,
0.063,
0.147,
0.147,
0.147,
0.147,
0.147,
0.147,
0.023,
0.023,
0.023,
0.023,
0.037,
0.037,
0.037,
0.037,
0.037,
0.063,
0.063,
0.063,
0.063,
0.063,
0.147,
0.147,
0.147,
0.147,
0.147,
0.147,
0.023,
0.023,
0.023,
0.023,
0.037,
0.037,
0.037,
0.037,
0.037,
0.063,
0.063,
0.063,
0.063,
0.063,
0.147,
0.147,
0.147,
0.147,
0.147,
0.147
);
input patient_age patient_age_last_follow_up;
retain count 1;
Cum_dth_rate=0;
do age=patient_age to patient_age_last_follow_up-1;
Cum_dth_rate = age_death_rates[age] + Cum_dth_rate ;
put age_death_rates[age]=;
end;
output table_2a;
Cum_dth_rate = 0;
do age=patient_age to patient_age_last_follow_up;
patient='patient_cnt'||strip(put(count,12.));
output table_2b;
Cum_dth_rate = age_death_rates[age] + Cum_dth_rate ;
end;
Cum_dth_rate = 0;
do age=patient_age+1 to patient_age+10;
patient='patient_cnt'||strip(put(count,12.));
Cum_dth_rate = age_death_rates[age-1] + Cum_dth_rate ;
year=age-patient_age;
output table_3a;
output table_3b;
end;

count=count+1;
cards;
74 80
78 84
80 83
;;;
run;

proc sort data=table_3b;
by age;

proc transpose data=table_3b out=table_3b_trans(drop=_name_);
by age;
id patient;
var Cum_dth_rate;
;
run;

proc sort data=table_3a;
by year;

proc transpose data=table_3a out=table_3a_trans(drop=_name_);
by year;
id patient;
var Cum_dth_rate;
;
run;

proc sort data=table_2b;
by age;

proc transpose data=table_2b out=table_2b_trans(drop=_name_);
by age;
id patient;
var Cum_dth_rate;
;
run;

I created a random array of cumulative death rates by year. Please verify before you use it.

yoyong555
Obsidian | Level 7

Hi again @ballardw.

 

I have additional questions.

a. Instead of using cards, how do I change the code if I want to use a SAS data that contains the two columns with the same variable names? 

b. How do I change the code to produce this output ? Cum_dth_rateYno refers to cumulative death rate from Year 1 to Y 10. It is merging the Table 2 and 3 (but transposing the results).

 

Patient_age Patient_age_last_follow_up Cum_dth_rate  Cum_dth_rate_Y1  Cum_dth_rate_Y2  Cum_dth_rate_Y3  Cum_dth_rate_Y4  Cum_dth_rate_Y5
               
               
               
ballardw
Super User

Are the variables Cum_dth_rate_pt1, Cum_dth_rate_pt2 and Cum_dth_rate_pt3 supposed to be for Patient 1, Patient 2 and Patient 3?

If you are going to do further work in SAS that is a very cumbersome file layout. to work with.

 

 

yoyong555
Obsidian | Level 7

Hi @ballardw.

 

Yes, the Cum_dth_rate_pt1, Cum_dth_rate_pt2 and Cum_dth_rate_pt3 are supposed to be for Patient 1, Patient 2 and Patient 3.

Thank you very much for your reply and advice.

 

PGStats
Opal | Level 21

I would use this code:

 

data rates;
input Age   Age_spec_dth_rate;
datalines;
71    0.023
72    0.023
73    0.023
74    0.023
75    0.037
76    0.037
77    0.037
78    0.037
79    0.037
80    0.063
81    0.063
82    0.063
83    0.063
84    0.063
85    0.147
86    0.147
87    0.147
88    0.147
89    0.147
90    0.147
;

data patients;
input Patient_age Patient_age_last_follow_up;
patient_Id = _n_;
datalines;
74    80   
78    84   
80    83
;

data 
    t2a(keep=patient_id patient_age Patient_age_last_follow_up cum_rate) 
    t3a(keep=patient_id year cum_rate);
array r {0:100} _temporary_;
if _n_ = 1 then do until(done);
    set rates end=done;
    r{age} = Age_spec_dth_rate;
    end;
set patients;
cum_rate = 0;
do year = 1 to 10;
    age = patient_age + year - 1;
    cum_rate = cum_rate + r{min(age, hbound(r))};
    if age = Patient_age_last_follow_up - 1 then output t2a;
    output t3a;
    end;
run;
PG
yoyong555
Obsidian | Level 7

Thank you @PGStats. for your reply.

I have additional questions.

a. Instead of using cards, how do I change the code if I want to use a SAS data that contains the two columns with the same variable names? 

b. How do I change the code to produce this output ? Cum_dth_rateYno refers to cumulative death rate from Year 1 to Y 10. It is merging the Table 2 and 3 (but transposing the results).

 

Patient_age Patient_age_last_follow_up Cum_dth_rate  Cum_dth_rate_Y1  Cum_dth_rate_Y2  Cum_dth_rate_Y3  Cum_dth_rate_Y4  Cum_dth_rate_Y5
               
               
               
PGStats
Opal | Level 21

a) The code contains 3 data steps. The first two create the datasets RATES and PATIENTS. If you already have those, just skip those data steps.

b) Unless all you want is to return play with Excel, DON'T do this! SAS does not expect wide data. But if you realy have to, do this:

 

proc sql;
create table t3Long as
select 
    a.patient_id,
    patient_age,
    Patient_age_last_follow_up,
    a.cum_rate,
    year,
    b.cum_rate as cum_rate_y
from t2a as a, t3a as b
where a.patient_id=b.patient_id
order by patient_id, year;
quit;

proc transpose data=t3Long prefix=cum_rate_ out=t3aWide(drop=_name_);
by patient_id patient_age Patient_age_last_follow_up cum_rate;
id year;
var cum_rate_y;
run;For

 For SAS work, you would do better with the dataset t3Long.

PG
yoyong555
Obsidian | Level 7

I am really learning a lot. Thank you @PGStats !

In the RATES data set, a Gender column is added where each gender will have different sets of death rates for patients from 1 to 100 years old.

How will the code change if the cumulative death rate is both gender and age specific?

 

Gender Age Patient_age_last_follow_up cum_rate cum_rate_1 cum_rate2 cum_rate3
M 74 80        
M 78 84        
F 80 83        

 

 

PGStats
Opal | Level 21

Make separate datasets for each gender, then concatenate them. You could also work gender through the data step code, which would also be a good way to make sure you understand what the code is doing.

PG
yoyong555
Obsidian | Level 7

Thank you @PGStats .

 

I am already analyzing the data by gender and learning the code.

 

Here is the result of the last syntax.

 

yoyong555_0-1591932155410.png

 

I have two more questions.

 

I want to calculate the survival rate for of each patient for each cum_rate_year.

 

This means 10 more columns will be added to the table.

patient_id surv_rate_1 surv_rate_2 surv_rate3 .....
1        
2        
3        

 

surv_rate_n = exp -(cum_rate_n)

 

For patient 1, the surv_rate_1 for cum_rate_1 is exp(-0.023) = 0.977.

For patient 2, the surv_rate_1 for cum_rate_1 is exp(-0.037) = 0.963.

For patient 3, the surv_rate_1 for cum_rate_1 is exp(-0.063) = 0.939.

 

I should do this for each patient for every cum_rate. How do I include this in the syntax?

 

And lastly, I would like to produce a table that shows the average of surv_rates across all patients for each year.

 

year average_surv_rate
1  
2  
3  
4  
5  
6  
7  
8  
9  
10  

 

 

 

PGStats
Opal | Level 21

This should do it.

 

proc sql;
create table t3Long as
select 
    a.patient_id,
    patient_age,
    Patient_age_last_follow_up,
    a.cum_rate,
    year,
    b.cum_rate as cum_rate_y,
    exp(-b.cum_rate) as surv_rate_y
from t2a as a, t3a as b
where a.patient_id=b.patient_id
order by patient_id, year;
create table t3AvgSurv as
select 
    year,
    mean(surv_rate_y) as average_surv_rate
from t3Long
group by year;
quit;

proc transpose data=t3Long prefix=cum_rate_ out=t3aWide(drop=_name_);
by patient_id patient_age Patient_age_last_follow_up cum_rate;
id year;
var cum_rate_y;
run;

proc transpose data=t3Long prefix=surv_rate_ out=t3aSurvWide(drop=_name_);
by patient_id patient_age Patient_age_last_follow_up;
id year;
var surv_rate_y;
run;

Cheers!

PG

SAS Innovate 2025: Register Now

Registration is now open for SAS Innovate 2025 , our biggest and most exciting global event of the year! Join us in Orlando, FL, May 6-9.
Sign up by Dec. 31 to get the 2024 rate of just $495.
Register now!

How to Concatenate Values

Learn how use the CAT functions in SAS to join values from multiple variables into a single value.

Find more tutorials on the SAS Users YouTube channel.

SAS Training: Just a Click Away

 Ready to level-up your skills? Choose your own adventure.

Browse our catalog!

Discussion stats
  • 11 replies
  • 1495 views
  • 5 likes
  • 4 in conversation