I'm trying to create a new variable that shows what proportion of follow-up time for each participant was during a rainy season, but I'm not sure how to do it.
Below is an example of my data.
FUStart= Start of follow-up
FUEnd=End of follow-up
The next 4 columns show the dates for the start and end of the rainy seasons that occurred over the course of follow-up. I know how to calculate the length of follow-up, but I don't know how to structure an if/then statement to determine how much of the rainy season fell in that period.
In short, how do I use these dates to count the number of follow-up days that were in a rainy season?
Thanks in advance for any thoughts!
PID | FUStart | FUEnd | Rain1Start | Rain1End | Rain2Start | Rain2End | RainyDaysFU |
1 | 01/27/18 | 05/10/19 | 11/01/17 | 04/30/18 | 11/01/18 | 04/30/19 | ? |
2 | 01/12/18 | 05/08/19 | 11/01/17 | 04/30/18 | 11/01/18 | 04/30/19 | ? |
3 | 09/26/17 | 01/30/18 | 11/01/17 | 04/30/18 | 11/01/18 | 04/30/19 | ? |
4 | 12/10/17 | 01/23/18 | 11/01/17 | 04/30/18 | 11/01/18 | 04/30/19 | ? |
5 | 07/16/17 | 01/30/18 | 11/01/17 | 04/30/18 | 11/01/18 | 04/30/19 | ? |
6 | 05/03/18 | 05/17/19 | 11/01/17 | 04/30/18 | 11/01/18 | 04/30/19 | ? |
7 | 08/11/18 | 05/13/19 | 11/01/17 | 04/30/18 | 11/01/18 | 04/30/19 | ? |
A crude way:
data WANT;
retain FUSTART '27JAN2018'D FUEND '10MAY2019'D RAIN1START '01NOV2017'D RAIN1END '30APR2018'D RAIN2START '01NOV2018'D RAIN2END '30APR2019'D;
do DT=min(FUSTART, RAIN1START, RAIN2START) to max(FUEND, RAIN1END, RAIN2END);
TALLY=0;
if DT>=FUSTART then TALLY+ 1;
if DT>=RAIN1START then TALLY+ 1;
if DT>=RAIN2START then TALLY+ 1;
if DT> FUEND then TALLY+-1;
if DT> RAIN1END then TALLY+-1;
if DT> RAIN2END then TALLY+-1;
if TALLY=2 then RAINYDAYS+1;
end;
put RAINYDAYS=;
run;
RAINYDAYS=275
Because the rainy seasons appear to be constant over the entire dataset, you can make an array, indexed by date, with the lower bound of the array set to at least one day prior to the earliest date in any of your date fields (rainy season or fustart) and set the upper bound is at least equivalent to the latest date in any of your date fields (i.e. rain1end rain2end or the highest encountered fuend).
In that array, you can assign a value of 0 to all dates prior to 01nov2017. Then assign 1 to the element corresponding to 01nov2017, 2 to 02nov2017, etc. incrementing by one for each rainy day, but not for other days. Of course this means that the array values for 30apr2018 through 30oct2018 all have the same value, because only 30apr2018 is a rainy day.
Once you've done that, then simply subtract the array value to the FUEND minus the array value for the day prior to FUSTART:
data want (drop=_:);
set have;
array rainy_day_index {%sysevalf('01jan2017'd):%sysevalf('30dec2019'd)} _temporary_;
if _n_=1 then do;
do _rdate='01nov2017'd to '30apr2018'd,'01nov2018'd to '30apr2019'd;
_index+1;
rainy_day_index{_rdate}=_index;
end;
do _date=lbound(rainy_day_index) to hbound(rainy_day_index);
if _date<'01nov2017'd then rainy_day_index{_date}=0;
else if rainy_day_index{_date}=. then rainy_day_index{_date}=rainy_day_index{_date-1};
end;
end;
/** Run this once to see what the rainy_day_index array looks like
if _n_=1 then do _date=lbound(rainy_day_index) to hbound(rainy_day_index);
put _date=date9. rainy_day_index{_date} z4.;
end;
**/
n_rainy_days=rainy_day_index{fuend} - rainy_day_index{fustart-1};
run;
The "if _N_=1 do group just sets up the array rainy_day_index at the start of the data step. As a "_temporary_" array its value will be retained for all subsequent observations.
To get a better idea of the array contents, de-comment the indicated section once.
Good news: We've extended SAS Hackathon registration until Sept. 12, so you still have time to be part of our biggest event yet – our five-year anniversary!
Learn how use the CAT functions in SAS to join values from multiple variables into a single value.
Find more tutorials on the SAS Users YouTube channel.
Ready to level-up your skills? Choose your own adventure.