04-04-2016 05:54 AM
Hi all,
I'm working with longitudinal data structured as below:
ID | Hbp1 | Hbp2 | Hbp3 | Hbp4 | Hbp5
1 | 0 | 0 | 0 | 0 | 1
2 | 0 | 0 | 0 | . | .
3 | 0 | . | . | 0 | .
4 | 0 | 0 | 1 | 1 | 1
5 | 0 | . | 0 | . | 1
Hbp1-Hbp5: hypertension (0: no, 1: yes) from year 1 to year 5.
I want to calculate person-years (survival time) and the incidence rate for this data, but I have not managed to write the SAS code.
It's very easy to calculate by hand. An incident case is defined as the first time the event appears during the follow-up period.
For example, ID 1 has 4+0.5=4.5 person-years, ID 2 has 3+0.5=3.5 person-years, ID 3 has 4 person-years (missing values before the last observation are considered 0)...
But that is impossible to do by hand with a large sample.
Does anybody have experience with calculating survival time using this data structure in SAS?
04-04-2016 06:02 AM
Sorry, I have one correction:
Missing values before an observation with value 0 are considered 0.
Missing values before an observation with value 1 stay missing.
04-04-2016 06:15 AM
Why does ID 3 have 4 years and not 4.5 or 5.5? And why does ID 2 get 3.5 years when only the last value may be considered 1 when missing?
Nonetheless:
data want;
set have;
array hbp {*} hbp1-hbp5;
i = 1;
do while (i <= dim(hbp) and hbp{i} ne 1);
i + 1;
end;
person_years = i - .5;
run;
04-04-2016 06:34 AM
Dear Kurt,
Thank you very much for your quick reply.
You're right, some of my calculations are wrong.
I ran your code and got an error.
04-04-2016 06:51 AM
Yes, the loop needs a safeguard: SAS does not stop evaluating a boolean "and" when the first false condition is encountered, so hbp{i} is still evaluated once i exceeds dim(hbp).
Improved code with test data:
data have;
input ID Hbp1 Hbp2 Hbp3 Hbp4 Hbp5;
cards;
1 0 0 0 0 1
2 0 0 0 . .
3 0 . . 0 .
4 0 0 1 1 1
5 0 . 0 . 1
6 0 . 1 . .
;
run;
data want;
set have;
array hbp {*} hbp1-hbp5;
i = 1;
do while (i <= dim(hbp) and person_years = .);
if hbp{i} = 1 or i = dim(hbp) then person_years = i - .5;
i + 1;
end;
drop i;
run;
proc print;
run;
Result:
Obs  ID  Hbp1  Hbp2  Hbp3  Hbp4  Hbp5  person_years
1    1   0     0     0     0     1     4.5
2    2   0     0     0     .     .     4.5
3    3   0     .     .     0     .     4.5
4    4   0     0     1     1     1     2.5
5    5   0     .     0     .     1     4.5
6    6   0     .     1     .     .     2.5
04-04-2016 07:48 AM
Dear Kurt,
I think the results should be like this:
Obs  ID  Hbp1  Hbp2  Hbp3  Hbp4  Hbp5  expect_results  your_results
1    1   0     0     0     0     1     4.5             4.5
2    2   0     0     0     .     .     3.5             4.5
3    3   0     .     .     0     .     4.5             4.5
4    4   0     0     1     1     1     2.5             2.5
5    5   0     .     0     .     1     4               4.5
6    6   0     .     1     .     .     2               2.5
ID 2 cannot have the same result as ID 1. It should be 3.5.
ID 5 has 1 missing value before the last observation (Hbp5) and this last obs=1, so person_years = 3 years (from Hbp1 to Hbp3) + 1 year (half of the 2 years from Hbp3 to Hbp5, because Hbp4 in year 4 is missing).
ID 6 has 1 missing value before the last observation (Hbp3) and this last obs=1, so person_years = 1 year (Hbp1) + 1 year (half of the 2 years from Hbp1 to Hbp3).
Perhaps I should clarify my rule again:
- Missing values before an observation with value 0 are considered 0 (so that year counts as 1).
- Missing values before an observation with value 1 stay missing (so person_years for that stretch is half the length of time from the last observation before the missing value to the first observation with value 1).
Can you help me adjust the code to produce the results for the above conditions? I could not work it out.
04-04-2016 08:01 AM
Minhtrang wrote:
Dear Kurt,
I think the results should be like this:
Obs  ID  Hbp1  Hbp2  Hbp3  Hbp4  Hbp5  expect_results  your_results
1    1   0     0     0     0     1     4.5             4.5
2    2   0     0     0     .     .     3.5             4.5
3    3   0     .     .     0     .     4.5             4.5
4    4   0     0     1     1     1     2.5             2.5
5    5   0     .     0     .     1     4               4.5
6    6   0     .     1     .     .     2               2.5
ID 2 cannot have the same result as ID 1. It should be 3.5.
ID 5 has 1 missing value before the last observation (Hbp5) and this last obs=1, so person_years = 3 years (from Hbp1 to Hbp3) + 1 year (half of the 2 years from Hbp3 to Hbp5, because Hbp4 in year 4 is missing).
ID 6 has 1 missing value before the last observation (Hbp3) and this last obs=1, so person_years = 1 year (Hbp1) + 1 year (half of the 2 years from Hbp1 to Hbp3).
Perhaps I should clarify my rule again:
- Missing values before an observation with value 0 are considered 0 (so that year counts as 1).
- Missing values before an observation with value 1 stay missing (so person_years for that stretch is half the length of time from the last observation before the missing value to the first observation with value 1).
Can you help me adjust the code to produce the results for the above conditions? I could not work it out.
Check your rules again. Per your rules, ID 2 would get a virtual "1" in hbp5 and a missing value in hbp4, so the time between hbp3 and hbp5 would be two years; half of that is 1 year, and adding it to 3 gives 4 years.
04-04-2016 08:57 AM
Dear Kurt,
I'm sorry, my English is not good enough, so my explanation was confusing.
For ID 2, he was followed up for 3 years (from hbp1 to hbp3; if hbp2 were missing, I would still treat it as hbp2=0). Information on hbp4 and hbp5 is missing due to dropout or loss to follow-up, so I just add 0.5 year. The person_years for this ID is therefore 3.5.
Maybe I should adjust my rule as:
Any missing value between 0...0 is considered 0.
Any missing value between 0..1 stays missing, and the person_years for this period = 1/2 the length of time from 0 to 1 (e.g. hbp2...hbp5: half the length of time is 1.5 years).
I hope you can understand my explanation this time and help me adjust the code.
Thank you very much.
04-04-2016 09:29 AM
But for ID2, there are TWO years from hbp3 to hbp5 (5 minus 3), and half of that is ONE year. 3 + 1 = 4.
Your value of 3.5 for ID 2 contradicts your example in
"Any missing value between 0..1 stays the same as missing, and the person_years for this period=1/2 length of time from 0..to..1 (ex: hbp2...hbp5: 1/2 length of time is 1.5 years)"
04-04-2016 09:59 AM
Dear Kurt,
ID 2 has Hbp3 (value 0) Hbp4 (missing value) Hbp5 (missing value): 0 . .
So it's not the case "any missing value between 0..1" but between "0....until the last observation which is also a missing value"
Hbp5 has a missing value, not 1.
Anyway, thank you very much for your help. I will try to write the code, maybe simple but very long code.
I still hope to hear your solution.
Best,
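The rules as they stand after this clarification can be written out explicitly. The following is only a sketch of one reading of them, not code posted by anyone in the thread: the names lastzero and incvisit are chosen for illustration, the data are assumed to contain only 0, 1 and missing, and a subject whose last visit is 0 is assumed to contribute the full number of visits (the thread does not state this case).

```sas
data have;
  input ID Hbp1-Hbp5;
  datalines;
1 0 0 0 0 1
2 0 0 0 . .
3 0 . . 0 .
4 0 0 1 1 1
5 0 . 0 . 1
6 0 . 1 . .
;
run;

data want;
  set have;
  array hbp {*} Hbp1-Hbp5;
  lastzero = 0;   /* last visit with Hbp = 0 before the event (0 if none) */
  incvisit = .;   /* first visit with Hbp = 1 (stays missing if no event) */
  /* scan forward and stop after the first 1 is found */
  do i = 1 to dim(hbp) while (incvisit = .);
    if hbp{i} = 1 then incvisit = i;
    else if hbp{i} = 0 then lastzero = i;
    /* a missing value changes nothing: gaps before a 0 are absorbed by lastzero */
  end;
  if incvisit ne . then
    /* event: half of the interval between the last 0 and the first 1 */
    person_years = lastzero + (incvisit - lastzero) / 2;
  else if lastzero < dim(hbp) then
    /* dropout: trailing missing values after the last 0 add half a year */
    person_years = lastzero + 0.5;
  else
    /* assumption: observed event-free to the end */
    person_years = lastzero;
  drop i;
run;

proc print data=want;
run;
```

For the six test records this gives 4.5, 3.5, 4.5, 2.5, 4 and 2, the same values as the expect_results column earlier in the thread. Because the array reference sits in the loop body rather than in the WHILE condition, hbp{i} is never evaluated with i beyond dim(hbp).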
04-04-2016 11:49 PM
Dear Kurt,
I worked it out, thanks to some hints from your code (array and loop). My code is like this:
*hbp1-hbp5: hypertension from visit 1 to 5;
*hp: hypertension after 5 year follow-up;
*year1: last visit with hbp=0 if hp=0;
*incvisit: visit with incidence hypertension;
*lastzero: last visit with hbp=0 before incvisit;
*personyear: person-years;
data hbp;
input id hbp1 hbp2 hbp3 hbp4 hbp5 hp;
datalines;
1 0 0 0 0 1 1
2 0 0 0 . . 0
3 0 . . 0 . 0
4 0 0 1 1 1 1
5 0 . 0 . 1 1
;
run;
data want;
set hbp;
array hbp {*} hbp1-hbp5;
i = 1;
do while (hp=0 and i<=dim(hbp));
if missing(hbp{i})=0 then year1=i;
i+1;
end;
do while (hp=1 and i<=dim(hbp) and incvisit=.);
if hbp{i}=1 then incvisit=i;
i+1;
end;
drop i;
run;
data want1;
set want;
array hbp {*} hbp1-hbp5;
i = 1;
do while (hp=1 and i<=incvisit);
if hbp{i}=0 then lastzero=i;
i+1;
end;
drop i;
run;
data want2;
set want1;
if hp=0 then personyear=year1;
else personyear=lastzero+(incvisit-lastzero)/2;
run;
proc print;
run;
The result with variable "personyear":
Obs | id | hbp1 | hbp2 | hbp3 | hbp4 | hbp5 | hp | year1 | incvisit | lastzero | personyear
1 | 1 | 0 | 0 | 0 | 0 | 1 | 1 | . | 5 | 4 | 4.5
2 | 2 | 0 | 0 | 0 | . | . | 0 | 3 | . | . | 3.0
3 | 3 | 0 | . | . | 0 | . | 0 | 4 | . | . | 4.0
4 | 4 | 0 | 0 | 1 | 1 | 1 | 1 | . | 3 | 2 | 2.5
5 | 5 | 0 | . | 0 | . | 1 | 1 | . | 5 | 3 | 4.0
04-05-2016 01:09 AM
It is kind of hard to understand what you expect. Assuming I know what you mean:
data have;
input ID Hbp1 Hbp2 Hbp3 Hbp4 Hbp5;
cards;
1 0 0 0 0 1
2 0 0 0 . .
3 0 . . 0 .
4 0 0 1 1 1
5 0 . 0 . 1
6 0 . 1 . .
;
run;
data want;
  /* create the hash once; it collects the distinct values after the last 0 */
  if _n_=1 then do;
    declare hash h();
    h.definekey('k');
    h.definedone();
  end;
  set have;
  array x{*} Hbp:;
  /* scan backward from the last visit and stop at the last 0 */
  do i=dim(x) to 1 by -1;
    if x{i}=0 then leave;
    k=x{i};h.replace();
  end;
  /* i = position of the last 0; add half a year per distinct trailing value */
  person_year=i+0.5*h.num_items;
  h.clear();
  drop k i;
run;
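To see why the hash approach adds 0.5 or 1.0, it may help to write the same backward scan with two plain flags. This is a sketch for illustration rather than a replacement: seen_one and seen_missing are made-up names, and it assumes the Hbp variables hold only 0, 1 or missing, in which case the hash can contain at most those two distinct non-zero keys.

```sas
data have;
  input ID Hbp1-Hbp5;
  datalines;
1 0 0 0 0 1
2 0 0 0 . .
3 0 . . 0 .
4 0 0 1 1 1
5 0 . 0 . 1
6 0 . 1 . .
;
run;

data want_flags;
  set have;
  array x {*} Hbp1-Hbp5;
  seen_one = 0;      /* a 1 occurs after the last 0 */
  seen_missing = 0;  /* a missing value occurs after the last 0 */
  /* scan backward from the last visit and stop at the last 0 */
  do i = dim(x) to 1 by -1;
    if x{i} = 0 then leave;
    if x{i} = 1 then seen_one = 1;
    else seen_missing = 1;
  end;
  /* i is the position of the last 0 (0 if there is none) */
  person_year = i + 0.5 * (seen_one + seen_missing);
  drop i seen_one seen_missing;
run;

proc print data=want_flags;
run;
```

ID 5 (0 . 0 . 1) has both a missing value and a 1 after its last 0, so it gets 3 + 0.5*2 = 4, while ID 2 (0 0 0 . .) has only missing values and gets 3 + 0.5 = 3.5. Note that both versions anchor on the last 0 in the record, so a record with a 0 after a 1 would be counted from that later 0 rather than from the first event.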
04-05-2016 05:38 AM
Dear Xia,
I'm really surprised by your code! It's so short, and it produces the same results I expected.
I've more experience now.
Thank you very much.
Best,
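The original question also asked for the incidence rate, which the thread does not get to. A minimal sketch, assuming the usual definition (number of incident cases divided by total person-years): the person_year values are the expected results for the six test records in the thread, and the event flag (1 if any Hbp value is 1) and the dataset names py and rate are assumptions made for this example.

```sas
/* person_year: expected values from the thread's test data;
   event: 1 if hypertension was ever observed (assumed definition) */
data py;
  input ID person_year event;
  datalines;
1 4.5 1
2 3.5 0
3 4.5 0
4 2.5 1
5 4.0 1
6 2.0 1
;
run;

proc sql;
  create table rate as
  select sum(event)                    as cases,
         sum(person_year)              as total_person_years,
         sum(event) / sum(person_year) as incidence_rate
  from py;
quit;

proc print data=rate;
run;
```

With these six records there are 4 cases over 21 person-years, so the rate is 4/21, roughly 0.19 per person-year. On real data, py would come from whichever person-years step is adopted, with the event flag derived in the same data step.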