BookmarkSubscribeRSS Feed
☑ This topic is solved. Need further help from the community? Please sign in and ask a new question.
sasgorilla
Pyrite | Level 9

I am trying to calculate person time for an analysis where I have individual study IDs spanning multiple observations. 

 

I have the data in a format where there is essentially a 0/1 indicator for if they existed during a month (mon1, mon2... mon50 etc.), as well as exposure status and group type they were a part of during that month.

 

I want to confirm this was formatted correctly and that there is not any overlap for any given subject across their observations. In other words, the max they can contribute is 1 month for a given month across any observations. 

 

Is there any easy way to test this? I'm thinking of creating a flag variable to indicate if an overlap occurs, but can't figure out how to do it. 

 

Notes: they do not need to have a 1 for every month pending sample entry/exit. 

 

 

data have;
input study_id exp group mon1 mon2 mon3 mon4 mon5 mon6;
datalines;
1 exposed a 1 1 0 0 0 0
1 exposed a 0 0 1 1 0 0
1 exposed a 0 0 0 0 1 0
2 exposed b 1 0 0 0 0 0
2 exposed b 0 1 1 0 0 0
2 unexposed c 0 0 0 1 1 1
3 unexposed c 1 1 1 1 1 1
4 unexposed c 1 1 1 0 0 0 
4 unexposed d 0 0 1 1 1 1 /*here is an instance of overlap for study id 4 I want to flag*/
;
RUN;

data want;
input study_id exp group mon1 mon2 mon3 mon4 mon5 mon6 flag_overlap;
datalines;
1 exposed a 1 1 0 0 0 0 0
1 exposed a 0 0 1 1 0 0 0
1 exposed a 0 0 0 0 1 0 0
2 exposed b 1 0 0 0 0 0 0
2 exposed b 0 1 1 0 0 0 0
2 unexposed c 0 0 0 1 1 1 0
3 unexposed c 1 1 1 1 1 1 0
4 unexposed c 1 1 1 0 0 0 0
4 unexposed d 0 0 1 1 1 1 1
;
run;
1 ACCEPTED SOLUTION

Accepted Solutions
SASKiwi
PROC Star

Here is one way - note I have corrected your HAVE step so it works.

data have;
input study_id exp $ group $ mon1 mon2 mon3 mon4 mon5 mon6;
datalines;
1 exposed a 1 1 0 0 0 0
1 exposed a 0 0 1 1 0 0
1 exposed a 0 0 0 0 1 0
2 exposed b 1 0 0 0 0 0
2 exposed b 0 1 1 0 0 0
2 unexposed c 0 0 0 1 1 1
3 unexposed c 1 1 1 1 1 1
4 unexposed c 1 1 1 0 0 0 
4 unexposed d 0 0 1 1 1 1 
;
RUN;

data want;
  drop i mon_sum1 - mon_sum6;
  set have;
  by study_id;
  array mons (*) mon1 - mon6;
  retain mon_sum1 - mon_sum6;
  array mons_sum (6) mon_sum1 - mon_sum6;
  if first.study_id then call missing (of mon_sum1 - mon_sum6);
  do i = 1 to dim(mons);
    mons_sum(i) = sum(mons_sum(i), mons(i));
  end;
  if last.study_id then do i = 1 to dim(mons);
    if mons_sum(i) >= 2 then flag_overlap = 1;
  end;
run;

View solution in original post

7 REPLIES 7
coder1234
Obsidian | Level 7
SQL Left JOIN ON,
"if" won't work on SQL.
SASKiwi
PROC Star

Here is one way - note I have corrected your HAVE step so it works.

data have;
input study_id exp $ group $ mon1 mon2 mon3 mon4 mon5 mon6;
datalines;
1 exposed a 1 1 0 0 0 0
1 exposed a 0 0 1 1 0 0
1 exposed a 0 0 0 0 1 0
2 exposed b 1 0 0 0 0 0
2 exposed b 0 1 1 0 0 0
2 unexposed c 0 0 0 1 1 1
3 unexposed c 1 1 1 1 1 1
4 unexposed c 1 1 1 0 0 0 
4 unexposed d 0 0 1 1 1 1 
;
RUN;

data want;
  drop i mon_sum1 - mon_sum6;
  set have;
  by study_id;
  array mons (*) mon1 - mon6;
  retain mon_sum1 - mon_sum6;
  array mons_sum (6) mon_sum1 - mon_sum6;
  if first.study_id then call missing (of mon_sum1 - mon_sum6);
  do i = 1 to dim(mons);
    mons_sum(i) = sum(mons_sum(i), mons(i));
  end;
  if last.study_id then do i = 1 to dim(mons);
    if mons_sum(i) >= 2 then flag_overlap = 1;
  end;
run;
sasgorilla
Pyrite | Level 9

@SASKiwi wrote:

Here is one way - note I have corrected your HAVE step so it works.

data have;
input study_id exp $ group $ mon1 mon2 mon3 mon4 mon5 mon6;
datalines;
1 exposed a 1 1 0 0 0 0
1 exposed a 0 0 1 1 0 0
1 exposed a 0 0 0 0 1 0
2 exposed b 1 0 0 0 0 0
2 exposed b 0 1 1 0 0 0
2 unexposed c 0 0 0 1 1 1
3 unexposed c 1 1 1 1 1 1
4 unexposed c 1 1 1 0 0 0 
4 unexposed d 0 0 1 1 1 1 
;
RUN;

data want;
  drop i mon_sum1 - mon_sum6;
  set have;
  by study_id;
  array mons (*) mon1 - mon6;
  retain mon_sum1 - mon_sum6;
  array mons_sum (6) mon_sum1 - mon_sum6;
  if first.study_id then call missing (of mon_sum1 - mon_sum6);
  do i = 1 to dim(mons);
    mons_sum(i) = sum(mons_sum(i), mons(i));
  end;
  if last.study_id then do i = 1 to dim(mons);
    if mons_sum(i) >= 2 then flag_overlap = 1;
  end;
run;

Yes, your correction was needed to my original "have" data, thank you. 

As for the data step you provided, I was able to utilize to determine there were no overlapping time periods. Thank you!


quickbluefish
Barite | Level 11

Just looking at the first few records of your input data:

data have;
input study_id exp group mon1 mon2 mon3 mon4 mon5 mon6;
datalines;
1 exposed a 1 1 0 0 0 0
1 exposed a 0 0 1 1 0 0
1 exposed a 0 0 0 0 1 0
;
run;

Can you tell us why the data are split up this way?  I assume there are other variables in your real data?  The reason I ask is, I don't see what you're achieving by having this as 3 rows instead of one (with mon1-mon5 all having 1s, and mon6=0).  

 

Are you aware of the "counting process" data structure?  I think you might want to consider that - it is pretty hard to beat for situations where exposure changes over time.  

sasgorilla
Pyrite | Level 9

Yes, the received data was organized this way because a new row was created anytime there was any demographic data change over time, or any of the outcomes of interest. So there are basically segments of time in each observation, thus in the full data set each subject has many rows.

 

I might ask another question about how that could have been better organized, but I will do that in another thread. 

Ksharp
Super User
data have;
input study_id exp :$20. group $ mon1 mon2 mon3 mon4 mon5 mon6;
datalines;
1 exposed a 1 1 0 0 0 0
1 exposed a 0 0 1 1 0 0
1 exposed a 0 0 0 0 1 0
2 exposed b 1 0 0 0 0 0
2 exposed b 0 1 1 0 0 0
2 unexposed c 0 0 0 1 1 1
3 unexposed c 1 1 1 1 1 1
4 unexposed c 1 1 1 0 0 0 
4 unexposed d 0 0 1 1 1 1 /*here is an instance of overlap for study id 4 I want to flag*/
;
RUN;
data want;
 set have;
 by study_id;
 array m{6} _temporary_;
 array mon{6};
 if first.study_id then call missing(of m{*});
 flag_overlap=0;
 do i=1 to dim(m);
  if m{i}=1 and mon{i}=1 then  flag_overlap=1;
  if mon{i}=1 then m{i}=mon{i};
 end;
 drop i;
 run;
sasgorilla
Pyrite | Level 9

@Ksharp wrote:
data have;
input study_id exp :$20. group $ mon1 mon2 mon3 mon4 mon5 mon6;
datalines;
1 exposed a 1 1 0 0 0 0
1 exposed a 0 0 1 1 0 0
1 exposed a 0 0 0 0 1 0
2 exposed b 1 0 0 0 0 0
2 exposed b 0 1 1 0 0 0
2 unexposed c 0 0 0 1 1 1
3 unexposed c 1 1 1 1 1 1
4 unexposed c 1 1 1 0 0 0 
4 unexposed d 0 0 1 1 1 1 /*here is an instance of overlap for study id 4 I want to flag*/
;
RUN;
data want;
 set have;
 by study_id;
 array m{6} _temporary_;
 array mon{6};
 if first.study_id then call missing(of m{*});
 flag_overlap=0;
 do i=1 to dim(m);
  if m{i}=1 and mon{i}=1 then  flag_overlap=1;
  if mon{i}=1 then m{i}=mon{i};
 end;
 drop i;
 run;

@Ksharp , I tested your method against my data as well and this also worked and demonstrated there was no overlapping person-time across observations for each individual. Thank you!

hackathon24-white-horiz.png

The 2025 SAS Hackathon has begun!

It's finally time to hack! Remember to visit the SAS Hacker's Hub regularly for news and updates.

Latest Updates

How to Concatenate Values

Learn how use the CAT functions in SAS to join values from multiple variables into a single value.

Find more tutorials on the SAS Users YouTube channel.

SAS Training: Just a Click Away

 Ready to level-up your skills? Choose your own adventure.

Browse our catalog!

Discussion stats
  • 7 replies
  • 1497 views
  • 4 likes
  • 5 in conversation