Hi everyone.
I hope you are all doing fine.
I have a large data set which requires calculations. For a small data set (probably less than 30), I can patiently use MS Excel.
So, anyone's help would greatly be appreciated.
Thank you in advance.
And keep safe!
------------------------------------
I have a reference data (Table 1) and two tables (Tables 2 and 3) to produce.
Table 1 (reference data) contains two variables - Age (0 to 100) and Age specific death rates. I use this to populate Tables 2 and 3.
Below is a subset.
Age | Age_spec_dth_rates |
71 | 0.023 |
72 | 0.023 |
73 | 0.023 |
74 | 0.023 |
75 | 0.037 |
76 | 0.037 |
77 | 0.037 |
78 | 0.037 |
79 | 0.037 |
80 | 0.063 |
81 | 0.063 |
82 | 0.063 |
83 | 0.063 |
84 | 0.063 |
85 | 0.147 |
86 | 0.147 |
87 | 0.147 |
88 |
0.147 |
89 | 0.147 |
90 | 0.147 |
Table 2 has three variables - age of the patients, age of the patients at last follow up and cumulative death rate.
I need to calculate the cumulative death rates for each patient using the reference data.
Table 2.
Patient_age | Patient_age_last_follow_up | Cum_dth_rate |
74 | 80 | |
78 | 84 | |
80 | 83 |
Patient 1 joined the study at 74 years old and lived until 80 years old.
Patient 2 joined the study at 78 years old and lived until 84 years old.
Patient 2 joined the study at 80 years old and lived until 83 years old.
The cumulative death rate (cum_dth_rate) is obtained by adding the death rates at each year to the cumulative death rate for the previous year.
Table 2a shows the results. Table 2b shows how the values are generated.
Table 2a.
Patient_age | Patient_age_last_follow_up | Cum_dth_rate |
74 | 80 | 0.208 |
78 | 84 | 0.326 |
80 | 83 | 0.189 |
Table 2b.
Age | Age_spec_dth_rates | Cum_dth_rate_pt1 | Cum_dth_rate_pt2 | Cum_dth_rate_pt3 |
71 | 0.023 | |||
72 | 0.023 | |||
73 | 0.023 | |||
74 | 0.023 | |||
75 | 0.037 | 0.023 | ||
76 | 0.037 | 0.06 | ||
77 | 0.037 | 0.097 | ||
78 | 0.037 | 0.134 | ||
79 | 0.037 | 0.171 | 0.037 | |
80 | 0.063 | 0.208 | 0.074 | |
81 | 0.063 | 0.137 | 0.063 | |
82 | 0.063 | 0.2 | 0.126 | |
83 | 0.063 | 0.263 | 0.189 | |
84 | 0.063 | 0.326 | ||
85 | 0.147 | |||
86 | 0.147 | |||
87 | 0.147 | |||
88 |
0.147 |
|
|
|
89 | 0.147 | |||
90 | 0.147 |
Table 3 is different. Using again the reference data, I need to populate each patient's cumulative death rate until the end of the longest follow - up time of the study. For example, if the study is conducted for 10 years, the table will look like this:
Table 3.
Study Year | Cum_dth_rate_pt1 | Cum_dth_rate_pt2 | Cum_dth_rate_pt3 |
1 | |||
2 | |||
3 | |||
4 | |||
5 | |||
6 | |||
7 | |||
8 | |||
9 | |||
10 |
Table 3a shows the results. Table 3b shows how the values are generated.
Table 3a.
Study Year | Cum_dth_rate_pt1 | Cum_dth_rate_pt2 | Cum_dth_rate_pt3 |
1 | 0.023 | 0.037 | 0.063 |
2 | 0.06 | 0.074 | 0.126 |
3 | 0.097 | 0.137 | 0.189 |
4 | 0.134 | 0.2 | 0.252 |
5 | 0.171 | 0.263 | 0.315 |
6 | 0.208 | 0.326 | 0.462 |
7 | 0.271 | 0.389 | 0.609 |
8 | 0.334 | 0.536 | 0.756 |
9 | 0.397 | 0.683 | 0.903 |
10 | 0.46 | 0.83 | 1.05 |
Table 3b.
Age | Age_spec_dth_rates | Cum_dth_rate_pt1 | Cum_dth_rate_pt2 | Cum_dth_rate_pt3 |
71 | 0.023 | |||
72 | 0.023 | |||
73 | 0.023 | |||
74 | 0.023 | |||
75 | 0.037 | 0.023 | ||
76 | 0.037 | 0.06 | ||
77 | 0.037 | 0.097 | ||
78 | 0.037 | 0.134 | ||
79 | 0.037 | 0.171 | 0.037 | |
80 | 0.063 | 0.208 | 0.074 | |
81 | 0.063 | 0.271 | 0.137 | 0.063 |
82 | 0.063 | 0.334 | 0.2 | 0.126 |
83 | 0.063 | 0.397 | 0.263 | 0.189 |
84 | 0.063 | 0.46 | 0.326 | 0.252 |
85 | 0.147 | 0.389 | 0.315 | |
86 | 0.147 | 0.536 | 0.462 | |
87 | 0.147 | 0.683 | 0.609 | |
88 |
0.147 |
|
0.83 | 0.756 |
89 | 0.147 | 0.903 | ||
90 | 0.147 | 1.05 |
The generated values are an extension of Table 2b where the cumulative death rates are calculated until the end of the entire study period (in this case, 10th year).
-----
Again, thank you very much for your help!
Yoyong.
I would use this code:
data rates;
input Age Age_spec_dth_rate;
datalines;
71 0.023
72 0.023
73 0.023
74 0.023
75 0.037
76 0.037
77 0.037
78 0.037
79 0.037
80 0.063
81 0.063
82 0.063
83 0.063
84 0.063
85 0.147
86 0.147
87 0.147
88 0.147
89 0.147
90 0.147
;
data patients;
input Patient_age Patient_age_last_follow_up;
patient_Id = _n_;
datalines;
74 80
78 84
80 83
;
data
t2a(keep=patient_id patient_age Patient_age_last_follow_up cum_rate)
t3a(keep=patient_id year cum_rate);
array r {0:100} _temporary_;
if _n_ = 1 then do until(done);
set rates end=done;
r{age} = Age_spec_dth_rate;
end;
set patients;
cum_rate = 0;
do year = 1 to 10;
age = patient_age + year - 1;
cum_rate = cum_rate + r{min(age, hbound(r))};
if age = Patient_age_last_follow_up - 1 then output t2a;
output t3a;
end;
run;
data table_2a(keep=patient_age patient_age_last_follow_up Cum_dth_rate) table_2b(keep=patient age Cum_dth_rate)
table_3a(keep=year patient Cum_dth_rate) table_3b (keep=age patient Cum_dth_rate);
length patient $32.;
array age_death_rates{100} (0.023,
0.023,
0.023,
0.023,
0.037,
0.037,
0.037,
0.037,
0.037,
0.063,
0.063,
0.063,
0.063,
0.063,
0.147,
0.147,
0.147,
0.147,
0.147,
0.147,
0.023,
0.023,
0.023,
0.023,
0.037,
0.037,
0.037,
0.037,
0.037,
0.063,
0.063,
0.063,
0.063,
0.063,
0.147,
0.147,
0.147,
0.147,
0.147,
0.147,
0.023,
0.023,
0.023,
0.023,
0.037,
0.037,
0.037,
0.037,
0.037,
0.063,
0.063,
0.063,
0.063,
0.063,
0.147,
0.147,
0.147,
0.147,
0.147,
0.147,
0.023,
0.023,
0.023,
0.023,
0.037,
0.037,
0.037,
0.037,
0.037,
0.063,
0.063,
0.063,
0.063,
0.063,
0.147,
0.147,
0.147,
0.147,
0.147,
0.147,
0.023,
0.023,
0.023,
0.023,
0.037,
0.037,
0.037,
0.037,
0.037,
0.063,
0.063,
0.063,
0.063,
0.063,
0.147,
0.147,
0.147,
0.147,
0.147,
0.147
);
input patient_age patient_age_last_follow_up;
retain count 1;
Cum_dth_rate=0;
do age=patient_age to patient_age_last_follow_up-1;
Cum_dth_rate = age_death_rates[age] + Cum_dth_rate ;
put age_death_rates[age]=;
end;
output table_2a;
Cum_dth_rate = 0;
do age=patient_age to patient_age_last_follow_up;
patient='patient_cnt'||strip(put(count,12.));
output table_2b;
Cum_dth_rate = age_death_rates[age] + Cum_dth_rate ;
end;
Cum_dth_rate = 0;
do age=patient_age+1 to patient_age+10;
patient='patient_cnt'||strip(put(count,12.));
Cum_dth_rate = age_death_rates[age-1] + Cum_dth_rate ;
year=age-patient_age;
output table_3a;
output table_3b;
end;
count=count+1;
cards;
74 80
78 84
80 83
;;;
run;
proc sort data=table_3b;
by age;
proc transpose data=table_3b out=table_3b_trans(drop=_name_);
by age;
id patient;
var Cum_dth_rate;
;
run;
proc sort data=table_3a;
by year;
proc transpose data=table_3a out=table_3a_trans(drop=_name_);
by year;
id patient;
var Cum_dth_rate;
;
run;
proc sort data=table_2b;
by age;
proc transpose data=table_2b out=table_2b_trans(drop=_name_);
by age;
id patient;
var Cum_dth_rate;
;
run;
I created a random array of cumulative death rates by year. Please verify before you use it.
Hi again @ballardw.
I have additional questions.
a. Instead of using cards, how do I change the code if I want to use a SAS data that contains the two columns with the same variable names?
b. How do I change the code to produce this output ? Cum_dth_rateYno refers to cumulative death rate from Year 1 to Y 10. It is merging the Table 2 and 3 (but transposing the results).
Patient_age | Patient_age_last_follow_up | Cum_dth_rate | Cum_dth_rate_Y1 | Cum_dth_rate_Y2 | Cum_dth_rate_Y3 | Cum_dth_rate_Y4 | Cum_dth_rate_Y5 |
Are the variables Cum_dth_rate_pt1, Cum_dth_rate_pt2 and Cum_dth_rate_pt3 supposed to be for Patient 1, Patient 2 and Patient 3?
If you are going to do further work in SAS that is a very cumbersome file layout. to work with.
Hi @ballardw.
Yes, the Cum_dth_rate_pt1, Cum_dth_rate_pt2 and Cum_dth_rate_pt3 are supposed to be for Patient 1, Patient 2 and Patient 3.
Thank you very much for your reply and advice.
I would use this code:
data rates;
input Age Age_spec_dth_rate;
datalines;
71 0.023
72 0.023
73 0.023
74 0.023
75 0.037
76 0.037
77 0.037
78 0.037
79 0.037
80 0.063
81 0.063
82 0.063
83 0.063
84 0.063
85 0.147
86 0.147
87 0.147
88 0.147
89 0.147
90 0.147
;
data patients;
input Patient_age Patient_age_last_follow_up;
patient_Id = _n_;
datalines;
74 80
78 84
80 83
;
data
t2a(keep=patient_id patient_age Patient_age_last_follow_up cum_rate)
t3a(keep=patient_id year cum_rate);
array r {0:100} _temporary_;
if _n_ = 1 then do until(done);
set rates end=done;
r{age} = Age_spec_dth_rate;
end;
set patients;
cum_rate = 0;
do year = 1 to 10;
age = patient_age + year - 1;
cum_rate = cum_rate + r{min(age, hbound(r))};
if age = Patient_age_last_follow_up - 1 then output t2a;
output t3a;
end;
run;
Thank you @PGStats. for your reply.
I have additional questions.
a. Instead of using cards, how do I change the code if I want to use a SAS data that contains the two columns with the same variable names?
b. How do I change the code to produce this output ? Cum_dth_rateYno refers to cumulative death rate from Year 1 to Y 10. It is merging the Table 2 and 3 (but transposing the results).
Patient_age | Patient_age_last_follow_up | Cum_dth_rate | Cum_dth_rate_Y1 | Cum_dth_rate_Y2 | Cum_dth_rate_Y3 | Cum_dth_rate_Y4 | Cum_dth_rate_Y5 |
a) The code contains 3 data steps. The first two create the datasets RATES and PATIENTS. If you already have those, just skip those data steps.
b) Unless all you want is to return play with Excel, DON'T do this! SAS does not expect wide data. But if you realy have to, do this:
proc sql;
create table t3Long as
select
a.patient_id,
patient_age,
Patient_age_last_follow_up,
a.cum_rate,
year,
b.cum_rate as cum_rate_y
from t2a as a, t3a as b
where a.patient_id=b.patient_id
order by patient_id, year;
quit;
proc transpose data=t3Long prefix=cum_rate_ out=t3aWide(drop=_name_);
by patient_id patient_age Patient_age_last_follow_up cum_rate;
id year;
var cum_rate_y;
run;For
For SAS work, you would do better with the dataset t3Long.
I am really learning a lot. Thank you @PGStats !
In the RATES data set, a Gender column is added where each gender will have different sets of death rates for patients from 1 to 100 years old.
How will the code change if the cumulative death rate is both gender and age specific?
Gender | Age | Patient_age_last_follow_up | cum_rate | cum_rate_1 | cum_rate2 | cum_rate3 |
M | 74 | 80 | ||||
M | 78 | 84 | ||||
F | 80 | 83 |
Make separate datasets for each gender, then concatenate them. You could also work gender through the data step code, which would also be a good way to make sure you understand what the code is doing.
Thank you @PGStats .
I am already analyzing the data by gender and learning the code.
Here is the result of the last syntax.
I have two more questions.
I want to calculate the survival rate for of each patient for each cum_rate_year.
This means 10 more columns will be added to the table.
patient_id | surv_rate_1 | surv_rate_2 | surv_rate3 | ..... |
1 | ||||
2 | ||||
3 |
surv_rate_n = exp -(cum_rate_n)
For patient 1, the surv_rate_1 for cum_rate_1 is exp(-0.023) = 0.977.
For patient 2, the surv_rate_1 for cum_rate_1 is exp(-0.037) = 0.963.
For patient 3, the surv_rate_1 for cum_rate_1 is exp(-0.063) = 0.939.
I should do this for each patient for every cum_rate. How do I include this in the syntax?
And lastly, I would like to produce a table that shows the average of surv_rates across all patients for each year.
year | average_surv_rate |
1 | |
2 | |
3 | |
4 | |
5 | |
6 | |
7 | |
8 | |
9 | |
10 |
This should do it.
proc sql;
create table t3Long as
select
a.patient_id,
patient_age,
Patient_age_last_follow_up,
a.cum_rate,
year,
b.cum_rate as cum_rate_y,
exp(-b.cum_rate) as surv_rate_y
from t2a as a, t3a as b
where a.patient_id=b.patient_id
order by patient_id, year;
create table t3AvgSurv as
select
year,
mean(surv_rate_y) as average_surv_rate
from t3Long
group by year;
quit;
proc transpose data=t3Long prefix=cum_rate_ out=t3aWide(drop=_name_);
by patient_id patient_age Patient_age_last_follow_up cum_rate;
id year;
var cum_rate_y;
run;
proc transpose data=t3Long prefix=surv_rate_ out=t3aSurvWide(drop=_name_);
by patient_id patient_age Patient_age_last_follow_up;
id year;
var surv_rate_y;
run;
Cheers!
Registration is now open for SAS Innovate 2025 , our biggest and most exciting global event of the year! Join us in Orlando, FL, May 6-9.
Sign up by Dec. 31 to get the 2024 rate of just $495.
Register now!
Learn how use the CAT functions in SAS to join values from multiple variables into a single value.
Find more tutorials on the SAS Users YouTube channel.
Ready to level-up your skills? Choose your own adventure.