I am trying to calculate person time for an analysis where I have individual study IDs spanning multiple observations.
I have the data in a format where there is essentially a 0/1 indicator for if they existed during a month (mon1, mon2... mon50 etc.), as well as exposure status and group type they were a part of during that month.
I want to confirm this was formatted correctly and that there is not any overlap for any given subject across their observations. In other words, the max they can contribute is 1 month for a given month across any observations.
Is there any easy way to test this? I'm thinking of creating a flag variable to indicate if an overlap occurs, but can't figure out how to do it.
Notes: they do not need to have a 1 for every month pending sample entry/exit.
data have;
input study_id exp group mon1 mon2 mon3 mon4 mon5 mon6;
datalines;
1 exposed a 1 1 0 0 0 0
1 exposed a 0 0 1 1 0 0
1 exposed a 0 0 0 0 1 0
2 exposed b 1 0 0 0 0 0
2 exposed b 0 1 1 0 0 0
2 unexposed c 0 0 0 1 1 1
3 unexposed c 1 1 1 1 1 1
4 unexposed c 1 1 1 0 0 0
4 unexposed d 0 0 1 1 1 1 /*here is an instance of overlap for study id 4 I want to flag*/
;
RUN;
data want;
input study_id exp group mon1 mon2 mon3 mon4 mon5 mon6 flag_overlap;
datalines;
1 exposed a 1 1 0 0 0 0 0
1 exposed a 0 0 1 1 0 0 0
1 exposed a 0 0 0 0 1 0 0
2 exposed b 1 0 0 0 0 0 0
2 exposed b 0 1 1 0 0 0 0
2 unexposed c 0 0 0 1 1 1 0
3 unexposed c 1 1 1 1 1 1 0
4 unexposed c 1 1 1 0 0 0 0
4 unexposed d 0 0 1 1 1 1 1
;
run;
Here is one way - note I have corrected your HAVE step so it works.
data have;
input study_id exp $ group $ mon1 mon2 mon3 mon4 mon5 mon6;
datalines;
1 exposed a 1 1 0 0 0 0
1 exposed a 0 0 1 1 0 0
1 exposed a 0 0 0 0 1 0
2 exposed b 1 0 0 0 0 0
2 exposed b 0 1 1 0 0 0
2 unexposed c 0 0 0 1 1 1
3 unexposed c 1 1 1 1 1 1
4 unexposed c 1 1 1 0 0 0
4 unexposed d 0 0 1 1 1 1
;
RUN;
data want;
drop i mon_sum1 - mon_sum6;
set have;
by study_id;
array mons (*) mon1 - mon6;
retain mon_sum1 - mon_sum6;
array mons_sum (6) mon_sum1 - mon_sum6;
if first.study_id then call missing (of mon_sum1 - mon_sum6);
do i = 1 to dim(mons);
mons_sum(i) = sum(mons_sum(i), mons(i));
end;
if last.study_id then do i = 1 to dim(mons);
if mons_sum(i) >= 2 then flag_overlap = 1;
end;
run;
Here is one way - note I have corrected your HAVE step so it works.
data have;
input study_id exp $ group $ mon1 mon2 mon3 mon4 mon5 mon6;
datalines;
1 exposed a 1 1 0 0 0 0
1 exposed a 0 0 1 1 0 0
1 exposed a 0 0 0 0 1 0
2 exposed b 1 0 0 0 0 0
2 exposed b 0 1 1 0 0 0
2 unexposed c 0 0 0 1 1 1
3 unexposed c 1 1 1 1 1 1
4 unexposed c 1 1 1 0 0 0
4 unexposed d 0 0 1 1 1 1
;
RUN;
data want;
drop i mon_sum1 - mon_sum6;
set have;
by study_id;
array mons (*) mon1 - mon6;
retain mon_sum1 - mon_sum6;
array mons_sum (6) mon_sum1 - mon_sum6;
if first.study_id then call missing (of mon_sum1 - mon_sum6);
do i = 1 to dim(mons);
mons_sum(i) = sum(mons_sum(i), mons(i));
end;
if last.study_id then do i = 1 to dim(mons);
if mons_sum(i) >= 2 then flag_overlap = 1;
end;
run;
@SASKiwi wrote:
Here is one way - note I have corrected your HAVE step so it works.
data have; input study_id exp $ group $ mon1 mon2 mon3 mon4 mon5 mon6; datalines; 1 exposed a 1 1 0 0 0 0 1 exposed a 0 0 1 1 0 0 1 exposed a 0 0 0 0 1 0 2 exposed b 1 0 0 0 0 0 2 exposed b 0 1 1 0 0 0 2 unexposed c 0 0 0 1 1 1 3 unexposed c 1 1 1 1 1 1 4 unexposed c 1 1 1 0 0 0 4 unexposed d 0 0 1 1 1 1 ; RUN; data want; drop i mon_sum1 - mon_sum6; set have; by study_id; array mons (*) mon1 - mon6; retain mon_sum1 - mon_sum6; array mons_sum (6) mon_sum1 - mon_sum6; if first.study_id then call missing (of mon_sum1 - mon_sum6); do i = 1 to dim(mons); mons_sum(i) = sum(mons_sum(i), mons(i)); end; if last.study_id then do i = 1 to dim(mons); if mons_sum(i) >= 2 then flag_overlap = 1; end; run;
Yes, your correction was needed to my original "have" data, thank you.
As for the data step you provided, I was able to utilize to determine there were no overlapping time periods. Thank you!
Just looking at the first few records of your input data:
data have;
input study_id exp group mon1 mon2 mon3 mon4 mon5 mon6;
datalines;
1 exposed a 1 1 0 0 0 0
1 exposed a 0 0 1 1 0 0
1 exposed a 0 0 0 0 1 0
;
run;
Can you tell us why the data are split up this way? I assume there are other variables in your real data? The reason I ask is, I don't see what you're achieving by having this as 3 rows instead of one (with mon1-mon5 all having 1s, and mon6=0).
Are you aware of the "counting process" data structure? I think you might want to consider that - it is pretty hard to beat for situations where exposure changes over time.
Yes, the received data was organized this way because a new row was created anytime there was any demographic data change over time, or any of the outcomes of interest. So there are basically segments of time in each observation, thus in the full data set each subject has many rows.
I might ask another question about how that could have been better organized, but I will do that in another thread.
data have;
input study_id exp :$20. group $ mon1 mon2 mon3 mon4 mon5 mon6;
datalines;
1 exposed a 1 1 0 0 0 0
1 exposed a 0 0 1 1 0 0
1 exposed a 0 0 0 0 1 0
2 exposed b 1 0 0 0 0 0
2 exposed b 0 1 1 0 0 0
2 unexposed c 0 0 0 1 1 1
3 unexposed c 1 1 1 1 1 1
4 unexposed c 1 1 1 0 0 0
4 unexposed d 0 0 1 1 1 1 /*here is an instance of overlap for study id 4 I want to flag*/
;
RUN;
data want;
set have;
by study_id;
array m{6} _temporary_;
array mon{6};
if first.study_id then call missing(of m{*});
flag_overlap=0;
do i=1 to dim(m);
if m{i}=1 and mon{i}=1 then flag_overlap=1;
if mon{i}=1 then m{i}=mon{i};
end;
drop i;
run;
@Ksharp wrote:
data have; input study_id exp :$20. group $ mon1 mon2 mon3 mon4 mon5 mon6; datalines; 1 exposed a 1 1 0 0 0 0 1 exposed a 0 0 1 1 0 0 1 exposed a 0 0 0 0 1 0 2 exposed b 1 0 0 0 0 0 2 exposed b 0 1 1 0 0 0 2 unexposed c 0 0 0 1 1 1 3 unexposed c 1 1 1 1 1 1 4 unexposed c 1 1 1 0 0 0 4 unexposed d 0 0 1 1 1 1 /*here is an instance of overlap for study id 4 I want to flag*/ ; RUN; data want; set have; by study_id; array m{6} _temporary_; array mon{6}; if first.study_id then call missing(of m{*}); flag_overlap=0; do i=1 to dim(m); if m{i}=1 and mon{i}=1 then flag_overlap=1; if mon{i}=1 then m{i}=mon{i}; end; drop i; run;
@Ksharp , I tested your method against my data as well and this also worked and demonstrated there was no overlapping person-time across observations for each individual. Thank you!
It's finally time to hack! Remember to visit the SAS Hacker's Hub regularly for news and updates.
Learn how use the CAT functions in SAS to join values from multiple variables into a single value.
Find more tutorials on the SAS Users YouTube channel.
Ready to level-up your skills? Choose your own adventure.