Minhtrang
Obsidian | Level 7

Hi all,

I'm working with longitudinal data structured as below:

ID  Hbp1  Hbp2  Hbp3  Hbp4  Hbp5
 1     0     0     0     0     1
 2     0     0     0     .     .
 3     0     .     .     0     .
 4     0     0     1     1     1
 5     0     .     0     .     1

Hbp1-Hbp5: hypertension (0: no, 1: yes) from year 1 to year 5.

I want to calculate person-years (survival time) and the incidence rate for these data, but I have not managed to write the SAS code.

It's very easy to calculate by hand. An incident case is defined as the first time the event appears during the follow-up period.

For example, ID 1 has 4 + 0.5 = 4.5 person-years, ID 2 has 3 + 0.5 = 3.5 person-years, ID 3 has 4 person-years (missing values before the last observation are considered 0), and so on.

But that is impossible to do by hand with a large sample.

Does anybody have experience with calculating survival time from this data structure in SAS?

1 ACCEPTED SOLUTION

Ksharp
Super User
It is kind of hard to understand what you expect.
Assuming I understand what you mean:



data have;
input ID Hbp1 Hbp2 Hbp3 Hbp4 Hbp5;
cards;
1 0 0 0 0 1
2 0 0 0 . .
3 0 . . 0 .
4 0 0 1 1 1
5 0 . 0 . 1
6 0 . 1 . .
;
run;


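data want;
 /* Create the hash once; it only collects the distinct non-zero values seen */
 if _n_=1 then do;
  declare hash h();
  h.definekey('k');
  h.definedone();
 end;
set have;
array x{*} Hbp:;
/* Scan backward from the last visit and stop at the most recent 0 */
do i=dim(x) to 1 by -1;
 if x{i}=0 then leave;
 k=x{i};h.replace(); /* k is either missing (.) or 1 */
end;
/* i = last visit with value 0; add 0.5 for each distinct value (missing, 1) found after it */
person_year=i+0.5*h.num_items;
h.clear();
drop k i;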
run;
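
The original question also asks for an incidence rate. A minimal sketch of that step follows, assuming the rate is the number of incident cases divided by total person-years, and that a case is any ID with at least one Hbp value of 1. The data set names events and rate and the variable event are made up for illustration, and the small want data set only stands in for the output of the step above.

```sas
/* Stand-in for the WANT data set produced above (person_year per ID) */
data want;
  input ID Hbp1-Hbp5 person_year;
  cards;
1 0 0 0 0 1 4.5
2 0 0 0 . . 3.5
3 0 . . 0 . 4.5
4 0 0 1 1 1 2.5
5 0 . 0 . 1 4
6 0 . 1 . . 2
;
run;

data events;
  set want;
  /* Assumption: an incident case is any ID with at least one Hbp value of 1 */
  event = (max(of Hbp1-Hbp5) = 1);
run;

proc sql;
  /* Incidence rate = incident cases / total person-years */
  create table rate as
  select sum(event)                    as cases,
         sum(person_year)              as person_years,
         sum(event) / sum(person_year) as incidence_rate
  from events;
quit;

proc print data=rate;
run;
```

With the six test IDs this gives 4 cases over 21 person-years, or about 0.19 cases per person-year.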



12 REPLIES
Minhtrang
Obsidian | Level 7

Sorry, I have one correction:

Missing values before an observation with value 0 are considered 0.

Missing values before an observation with value 1 remain missing.

Kurt_Bremser
Super User

Why does ID 3 have 4 years and not 4.5 or 5.5? And why does ID 2 get 3.5 years when only the last value may be considered 1 when missing?

Nonetheless:

data want;
set have;
array hbp {*} hbp1-hbp5;
i = 1;
do while (i <= dim(hbp) and hbp{i} ne 1);
  i + 1;
end;
person_years = i - .5;
run;
Minhtrang
Obsidian | Level 7

Dear Kurt,

Thank you very much for your quick reply.

You're right, some of my calculations are wrong.

I ran your code and got the error below:

 

1 OPTIONS NONOTES NOSTIMER NOSOURCE NOSYNTAXCHECK;
55
56 data want;
57 set hbp;
58 array hbp {*} hbp1-hbp5;
59 i = 1;
60 do while (i <= dim(hbp) and hbp{i} ne 1);
61 i + 1;
62 end;
63 person_years = i - .5;
64 run;
 
ERROR: Array subscript out of range at line 60 column 29.
id=2 hbp1=0 hbp2=0 hbp3=0 hbp4=. hbp5=. i=6 person_years=. _ERROR_=1 _N_=2
NOTE: The SAS System stopped processing this step because of errors.
NOTE: There were 2 observations read from the data set WORK.HBP.
WARNING: The data set WORK.WANT may be incomplete. When this step was stopped there were 1 observations and 8 variables.
NOTE: DATA statement used (Total process time):
real time 0.00 seconds
cpu time 0.00 seconds
 
Do you have the same error?
Kurt_Bremser
Super User

Yeah, a safeguard is needed because SAS does not stop evaluating a boolean "and" when the first false condition is encountered.

Improved code with test data:

data have;
input ID Hbp1 Hbp2 Hbp3 Hbp4 Hbp5;
cards;
1 0 0 0 0 1
2 0 0 0 . .
3 0 . . 0 .
4 0 0 1 1 1
5 0 . 0 . 1
6 0 . 1 . .
;
run;

data want;
set have;
array hbp {*} hbp1-hbp5;
i = 1;
do while (i <= dim(hbp) and person_years = .);
  if hbp{i} = 1 or i = dim(hbp) then person_years = i - .5;
  i + 1;
end;
drop i;
run;

proc print;
run;

Result:

                                                                                         person_
                                    Obs    ID    Hbp1    Hbp2    Hbp3    Hbp4    Hbp5     years

                                     1      1      0       0       0       0       1       4.5  
                                     2      2      0       0       0       .       .       4.5  
                                     3      3      0       .       .       0       .       4.5  
                                     4      4      0       0       1       1       1       2.5  
                                     5      5      0       .       0       .       1       4.5  
                                     6      6      0       .       1       .       .       2.5  
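
The safeguard discussed above can also be sketched with an iterative DO loop and LEAVE, so that the array subscript is only used inside the loop body. This is an alternative formulation, not code posted in this thread. The data set name want_leave is made up for illustration, and the step is meant to return the same person_years as the DO WHILE version.

```sas
data have;
  input ID Hbp1 Hbp2 Hbp3 Hbp4 Hbp5;
  cards;
1 0 0 0 0 1
2 0 0 0 . .
3 0 . . 0 .
4 0 0 1 1 1
5 0 . 0 . 1
6 0 . 1 . .
;
run;

data want_leave;
  set have;
  array hbp {*} hbp1-hbp5;
  /* hbp{i} is only referenced for i between 1 and dim(hbp) */
  do i = 1 to dim(hbp);
    if hbp{i} = 1 then leave;
  end;
  /* i is the first visit with value 1, or dim(hbp)+1 if there is none */
  person_years = min(i, dim(hbp)) - .5;
  drop i;
run;

proc print data=want_leave;
run;
```

Because the loop bound is fixed by the DO statement, no compound condition has to be evaluated with an out-of-range subscript.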
Minhtrang
Obsidian | Level 7

Dear Kurt,

I think the results should be like this:

                                    Obs    ID    Hbp1    Hbp2    Hbp3    Hbp4    Hbp5     expect_results   your_results

                                     1      1      0       0       0       0       1       4.5                 4.5
                                     2      2      0       0       0       .       .       3.5                 4.5
                                     3      3      0       .       .       0       .       4.5                 4.5
                                     4      4      0       0       1       1       1       2.5                 2.5
                                     5      5      0       .       0       .       1       4                   4.5
                                     6      6      0       .       1       .       .       2                   2.5

ID 2 cannot have the same result as ID 1. It should be 3.5.

ID 5 has 1 missing value before the last observation (Hbp5), and this last observation is 1, so person_years = 3 years (from Hbp1 to Hbp3) + 1 year (half of the 2 years from Hbp3 to Hbp5, because of the missing value in year 4, Hbp4).

ID 6 has 1 missing value before the last observation (Hbp3), and this last observation is 1, so person_years for that period = 1 year (half of the 2 years from Hbp1 to Hbp3).

Perhaps I should clarify my rules again:

  • Missing values before an observation with value 0 are considered 0 (so that year counts as 1).
  • Missing values before an observation with value 1 remain missing (so person_years for that period is half the length of time from the last observation before the missing value to the first observation with value 1).

Can you help me adjust the code to produce the results for the above conditions? I could not work it out.

Kurt_Bremser
Super User

Check your rules again. Per your rules, ID 2 would get a virtual "1" in hbp5 and a missing value in hbp4, so the time between hbp3 and hbp5 would be two years; half of that is 1 year, which added to 3 gives 4 years.

Minhtrang
Obsidian | Level 7

Dear Kurt,

I'm sorry that my English is not good enough and my explanation was confusing.

ID 2 was followed up for 3 years (from hbp1 to hbp3; if hbp2 were missing, I would still consider it as hbp2=0). Information on hbp4 and hbp5 is missing due to dropout or loss to follow-up, so I just add 0.5 year. Finally, the person_years for this ID is 3.5.

Maybe I should adjust my rules as follows:

Any missing value between 0...0 is considered 0.

Any missing value between 0...1 stays missing, and the person_years for this period = 1/2 the length of time from 0 to 1 (e.g. hbp2...hbp5: 1/2 the length of time is 1.5 years).

I hope you can understand my explanation this time and help me adjust the code.

Thank you very much.
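
The adjusted rules above can be sketched in a single DATA step. This is one illustrative reading of the rules, not code posted by anyone in the thread. The data set name want_rules and the helper variables lastzero and incvisit are chosen for illustration, values are assumed to be only 0, 1 or missing, and trailing missing visits without an event are assumed to add 0.5 year, as in the ID 2 example.

```sas
data have;
  input ID Hbp1 Hbp2 Hbp3 Hbp4 Hbp5;
  cards;
1 0 0 0 0 1
2 0 0 0 . .
3 0 . . 0 .
4 0 0 1 1 1
5 0 . 0 . 1
6 0 . 1 . .
;
run;

data want_rules;
  set have;
  array hbp {*} Hbp1-Hbp5;
  lastzero = 0;   /* last visit with value 0 before any event */
  incvisit = .;   /* first visit with value 1, if any */
  do i = 1 to dim(hbp);
    if hbp{i} = 1 then do;
      incvisit = i;
      leave;
    end;
    if hbp{i} = 0 then lastzero = i;
  end;
  if incvisit ne . then
    /* Event: full years up to the last 0, plus half the interval from that 0 to the first 1 */
    person_years = lastzero + (incvisit - lastzero) / 2;
  else
    /* No event: years up to the last 0, plus 0.5 if later visits are missing (assumption) */
    person_years = lastzero + 0.5 * (lastzero < dim(hbp));
  drop i;
run;

proc print data=want_rules;
run;
```

For a longer gap such as 0 . . 1 . the half-interval rule gives 1 + 3/2 = 2.5 person-years under this reading.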

Kurt_Bremser
Super User

But for ID2, there are TWO years from hbp3 to hbp5 (5 minus 3), and half of that is ONE year. 3 + 1 = 4.

Your value of 3.5 for ID 2 contradicts your example in

"Any missing value between 0..1 stays the same as missing, and the person_years for this period=1/2 length of time from 0..to..1 (ex: hbp2...hbp5: 1/2 length of time is 1.5 years)"

Minhtrang
Obsidian | Level 7

Dear Kurt,

ID 2 has Hbp3 (value 0) Hbp4 (missing value) Hbp5 (missing value): 0 . .

So it's not the case "any missing value between 0..1" but between "0....until the last observation which is also a missing value"

Hbp5 has a missing value, not 1.

Anyway, thank you very much for your help. I will try to write the code, maybe simple but very long code.

I still hope to hear your solution.

Best,

 

Minhtrang
Obsidian | Level 7

Dear Kurt,

I worked it out, thanks to some hint from your code (array and loop). My code is like this:

*hbp1-hbp5: hypertension from visit 1 to 5;

*hp: hypertension after 5 year follow-up;

*year1: last visit with hbp=0 if hp=0;

*incvisit: visit with incidence hypertension;

*lastzero: last visit with hbp=0 before incvisit;

*personyear: person-years;

data hbp;
input id hbp1 hbp2 hbp3 hbp4 hbp5 hp;
datalines;
1 0 0 0 0 1 1
2 0 0 0 . . 0
3 0 . . 0 . 0
4 0 0 1 1 1 1
5 0 . 0 . 1 1
;
run;

data want;
set hbp;
array hbp {*} hbp1-hbp5;
i = 1;
do while (hp=0 and i<=dim(hbp));
if missing(hbp{i})=0 then year1=i;
i+1;
end;
do while (hp=1 and i<=dim(hbp) and incvisit=.);
if hbp{i}=1 then incvisit=i;
i+1;
end;
drop i;
run;

data want1;
set want;
array hbp {*} hbp1-hbp5;
i = 1;
do while (hp=1 and i<=incvisit);
if hbp{i}=0 then lastzero=i;
i+1;
end;
drop i;
run;

data want2;
set want1;
if hp=0 then personyear=year1;
else personyear=lastzero+(incvisit-lastzero)/2;
run;

proc print;
run;


The result with variable "personyear":

Obs  id  hbp1  hbp2  hbp3  hbp4  hbp5  hp  year1  incvisit  lastzero  personyear
  1   1     0     0     0     0     1   1      .         5         4         4.5
  2   2     0     0     0     .     .   0      3         .         .         3.0
  3   3     0     .     .     0     .   0      4         .         .         4.0
  4   4     0     0     1     1     1   1      .         3         2         2.5
  5   5     0     .     0     .     1   1      .         5         3         4.0

Minhtrang
Obsidian | Level 7

Dear Xia,

I'm really surprised by your code! It's so short, and it produces exactly the results I expected.

I've more experience now.

Thank you very much. 

Best,

 
