Solved: Creating a flag variable for person-time overlap across multiple obser...

sasgorilla · Posted 01-21-2025 03:21 PM

I am trying to calculate person time for an analysis where I have individual study IDs spanning multiple observations.

I have the data in a format where there is essentially a 0/1 indicator for if they existed during a month (mon1, mon2... mon50 etc.), as well as exposure status and group type they were a part of during that month.

I want to confirm this was formatted correctly and that there is not any overlap for any given subject across their observations. In other words, the max they can contribute is 1 month for a given month across any observations.

Is there any easy way to test this? I'm thinking of creating a flag variable to indicate if an overlap occurs, but can't figure out how to do it.

Notes: they do not need to have a 1 for every month pending sample entry/exit.

data have;
input study_id exp group mon1 mon2 mon3 mon4 mon5 mon6;
datalines;
1 exposed a 1 1 0 0 0 0
1 exposed a 0 0 1 1 0 0
1 exposed a 0 0 0 0 1 0
2 exposed b 1 0 0 0 0 0
2 exposed b 0 1 1 0 0 0
2 unexposed c 0 0 0 1 1 1
3 unexposed c 1 1 1 1 1 1
4 unexposed c 1 1 1 0 0 0 
4 unexposed d 0 0 1 1 1 1 /*here is an instance of overlap for study id 4 I want to flag*/
;
RUN;

data want;
input study_id exp group mon1 mon2 mon3 mon4 mon5 mon6 flag_overlap;
datalines;
1 exposed a 1 1 0 0 0 0 0
1 exposed a 0 0 1 1 0 0 0
1 exposed a 0 0 0 0 1 0 0
2 exposed b 1 0 0 0 0 0 0
2 exposed b 0 1 1 0 0 0 0
2 unexposed c 0 0 0 1 1 1 0
3 unexposed c 1 1 1 1 1 1 0
4 unexposed c 1 1 1 0 0 0 0
4 unexposed d 0 0 1 1 1 1 1
;
run;

SASKiwi · Posted 01-21-2025 07:41 PM

Here is one way - note I have corrected your HAVE step so it works.

data have;
input study_id exp $ group $ mon1 mon2 mon3 mon4 mon5 mon6;
datalines;
1 exposed a 1 1 0 0 0 0
1 exposed a 0 0 1 1 0 0
1 exposed a 0 0 0 0 1 0
2 exposed b 1 0 0 0 0 0
2 exposed b 0 1 1 0 0 0
2 unexposed c 0 0 0 1 1 1
3 unexposed c 1 1 1 1 1 1
4 unexposed c 1 1 1 0 0 0 
4 unexposed d 0 0 1 1 1 1 
;
RUN;

data want;
  drop i mon_sum1 - mon_sum6;
  set have;
  by study_id;
  array mons (*) mon1 - mon6;
  retain mon_sum1 - mon_sum6;
  array mons_sum (6) mon_sum1 - mon_sum6;
  if first.study_id then call missing (of mon_sum1 - mon_sum6);
  do i = 1 to dim(mons);
    mons_sum(i) = sum(mons_sum(i), mons(i));
  end;
  if last.study_id then do i = 1 to dim(mons);
    if mons_sum(i) >= 2 then flag_overlap = 1;
  end;
run;

View solution in original post

coder1234 · Posted 01-21-2025 05:39 PM

SQL Left JOIN ON,
"if" won't work on SQL.

SASKiwi · Posted 01-21-2025 07:41 PM

Here is one way - note I have corrected your HAVE step so it works.

data have;
input study_id exp $ group $ mon1 mon2 mon3 mon4 mon5 mon6;
datalines;
1 exposed a 1 1 0 0 0 0
1 exposed a 0 0 1 1 0 0
1 exposed a 0 0 0 0 1 0
2 exposed b 1 0 0 0 0 0
2 exposed b 0 1 1 0 0 0
2 unexposed c 0 0 0 1 1 1
3 unexposed c 1 1 1 1 1 1
4 unexposed c 1 1 1 0 0 0 
4 unexposed d 0 0 1 1 1 1 
;
RUN;

data want;
  drop i mon_sum1 - mon_sum6;
  set have;
  by study_id;
  array mons (*) mon1 - mon6;
  retain mon_sum1 - mon_sum6;
  array mons_sum (6) mon_sum1 - mon_sum6;
  if first.study_id then call missing (of mon_sum1 - mon_sum6);
  do i = 1 to dim(mons);
    mons_sum(i) = sum(mons_sum(i), mons(i));
  end;
  if last.study_id then do i = 1 to dim(mons);
    if mons_sum(i) >= 2 then flag_overlap = 1;
  end;
run;

sasgorilla · Posted 01-21-2025 09:19 PM

@SASKiwi wrote:

Here is one way - note I have corrected your HAVE step so it works.

data have;
input study_id exp $ group $ mon1 mon2 mon3 mon4 mon5 mon6;
datalines;
1 exposed a 1 1 0 0 0 0
1 exposed a 0 0 1 1 0 0
1 exposed a 0 0 0 0 1 0
2 exposed b 1 0 0 0 0 0
2 exposed b 0 1 1 0 0 0
2 unexposed c 0 0 0 1 1 1
3 unexposed c 1 1 1 1 1 1
4 unexposed c 1 1 1 0 0 0 
4 unexposed d 0 0 1 1 1 1 
;
RUN;

data want;
  drop i mon_sum1 - mon_sum6;
  set have;
  by study_id;
  array mons (*) mon1 - mon6;
  retain mon_sum1 - mon_sum6;
  array mons_sum (6) mon_sum1 - mon_sum6;
  if first.study_id then call missing (of mon_sum1 - mon_sum6);
  do i = 1 to dim(mons);
    mons_sum(i) = sum(mons_sum(i), mons(i));
  end;
  if last.study_id then do i = 1 to dim(mons);
    if mons_sum(i) >= 2 then flag_overlap = 1;
  end;
run;

Yes, your correction was needed to my original "have" data, thank you.

As for the data step you provided, I was able to utilize to determine there were no overlapping time periods. Thank you!

quickbluefish · Posted 01-21-2025 08:03 PM

Just looking at the first few records of your input data:

data have;
input study_id exp group mon1 mon2 mon3 mon4 mon5 mon6;
datalines;
1 exposed a 1 1 0 0 0 0
1 exposed a 0 0 1 1 0 0
1 exposed a 0 0 0 0 1 0
;
run;

Can you tell us why the data are split up this way? I assume there are other variables in your real data? The reason I ask is, I don't see what you're achieving by having this as 3 rows instead of one (with mon1-mon5 all having 1s, and mon6=0).

Are you aware of the "counting process" data structure? I think you might want to consider that - it is pretty hard to beat for situations where exposure changes over time.

sasgorilla · Posted 01-21-2025 09:02 PM

Yes, the received data was organized this way because a new row was created anytime there was any demographic data change over time, or any of the outcomes of interest. So there are basically segments of time in each observation, thus in the full data set each subject has many rows.

I might ask another question about how that could have been better organized, but I will do that in another thread.

Ksharp · Posted 01-21-2025 08:12 PM

data have;
input study_id exp :$20. group $ mon1 mon2 mon3 mon4 mon5 mon6;
datalines;
1 exposed a 1 1 0 0 0 0
1 exposed a 0 0 1 1 0 0
1 exposed a 0 0 0 0 1 0
2 exposed b 1 0 0 0 0 0
2 exposed b 0 1 1 0 0 0
2 unexposed c 0 0 0 1 1 1
3 unexposed c 1 1 1 1 1 1
4 unexposed c 1 1 1 0 0 0 
4 unexposed d 0 0 1 1 1 1 /*here is an instance of overlap for study id 4 I want to flag*/
;
RUN;
data want;
 set have;
 by study_id;
 array m{6} _temporary_;
 array mon{6};
 if first.study_id then call missing(of m{*});
 flag_overlap=0;
 do i=1 to dim(m);
  if m{i}=1 and mon{i}=1 then  flag_overlap=1;
  if mon{i}=1 then m{i}=mon{i};
 end;
 drop i;
 run;

sasgorilla · Posted 01-21-2025 09:20 PM

@Ksharp wrote:

data have;
input study_id exp :$20. group $ mon1 mon2 mon3 mon4 mon5 mon6;
datalines;
1 exposed a 1 1 0 0 0 0
1 exposed a 0 0 1 1 0 0
1 exposed a 0 0 0 0 1 0
2 exposed b 1 0 0 0 0 0
2 exposed b 0 1 1 0 0 0
2 unexposed c 0 0 0 1 1 1
3 unexposed c 1 1 1 1 1 1
4 unexposed c 1 1 1 0 0 0 
4 unexposed d 0 0 1 1 1 1 /*here is an instance of overlap for study id 4 I want to flag*/
;
RUN;
data want;
 set have;
 by study_id;
 array m{6} _temporary_;
 array mon{6};
 if first.study_id then call missing(of m{*});
 flag_overlap=0;
 do i=1 to dim(m);
  if m{i}=1 and mon{i}=1 then  flag_overlap=1;
  if mon{i}=1 then m{i}=mon{i};
 end;
 drop i;
 run;

@Ksharp , I tested your method against my data as well and this also worked and demonstrated there was no overlapping person-time across observations for each individual. Thank you!

Creating a flag variable for person-time overlap across multiple observations

Re: Creating a flag variable for person-time overlap across multiple observations

Re: Creating a flag variable for person-time overlap across multiple observations

Re: Creating a flag variable for person-time overlap across multiple observations

Re: Creating a flag variable for person-time overlap across multiple observations

Re: Creating a flag variable for person-time overlap across multiple observations

Re: Creating a flag variable for person-time overlap across multiple observations

Re: Creating a flag variable for person-time overlap across multiple observations

Re: Creating a flag variable for person-time overlap across multiple observations

Registration is open

SAS Training: Just a Click Away