Create a variable using date ranges within date ranges

EpiNovice · Posted 11-11-2020 08:52 PM

I'm trying to create a new variable that shows what proportion of follow-up time for each participant was during a rainy season, but I'm not sure how to do it.

Below is an example of my data.

FUStart= Start of follow-up

FUEnd=End of follow-up

The next 4 columns show the dates for the start and end of the rainy seasons that occurred over the course of follow-up. I know how to calculate the length of follow-up, but I don't know how to structure an if/then statement to determine how much of the rainy season fell in that period.

In short, how do I use these dates to count the number of follow-up days that were in a rainy season?

Thanks in advance for any thoughts!

PID	FUStart	FUEnd	Rain1Start	Rain1End	Rain2Start	Rain2End	RainyDaysFU
1	01/27/18	05/10/19	11/01/17	04/30/18	11/01/18	04/30/19	?
2	01/12/18	05/08/19	11/01/17	04/30/18	11/01/18	04/30/19	?
3	09/26/17	01/30/18	11/01/17	04/30/18	11/01/18	04/30/19	?
4	12/10/17	01/23/18	11/01/17	04/30/18	11/01/18	04/30/19	?
5	07/16/17	01/30/18	11/01/17	04/30/18	11/01/18	04/30/19	?
6	05/03/18	05/17/19	11/01/17	04/30/18	11/01/18	04/30/19	?
7	08/11/18	05/13/19	11/01/17	04/30/18	11/01/18	04/30/19	?

ChrisNZ · Posted 11-11-2020 10:52 PM

A crude way:

data WANT;
  retain FUSTART '27JAN2018'D FUEND '10MAY2019'D RAIN1START '01NOV2017'D RAIN1END '30APR2018'D RAIN2START '01NOV2018'D RAIN2END '30APR2019'D;
  do DT=min(FUSTART, RAIN1START, RAIN2START) to max(FUEND, RAIN1END, RAIN2END);
    TALLY=0;
    if DT>=FUSTART    then TALLY+ 1;
    if DT>=RAIN1START then TALLY+ 1;
    if DT>=RAIN2START then TALLY+ 1;
    if DT> FUEND      then TALLY+-1;
    if DT> RAIN1END   then TALLY+-1;
    if DT> RAIN2END   then TALLY+-1;
    if TALLY=2 then RAINYDAYS+1;           
  end;
  put RAINYDAYS=;
run;

RAINYDAYS=275

High-Performance SAS Coding - Third Edition

mkeintz · Posted 11-11-2020 11:18 PM

Because the rainy seasons appear to be constant over the entire dataset, you can make an array, indexed by date, with the lower bound of the array set to at least one day prior to the earliest date in any of your date fields (rainy season or fustart) and set the upper bound is at least equivalent to the latest date in any of your date fields (i.e. rain1end rain2end or the highest encountered fuend).

In that array, you can assign a value of 0 to all dates prior to 01nov2017. Then assign 1 to the element corresponding to 01nov2017, 2 to 02nov2017, etc. incrementing by one for each rainy day, but not for other days. Of course this means that the array values for 30apr2018 through 30oct2018 all have the same value, because only 30apr2018 is a rainy day.

Once you've done that, then simply subtract the array value to the FUEND minus the array value for the day prior to FUSTART:

data want (drop=_:);
  set have;

  array rainy_day_index {%sysevalf('01jan2017'd):%sysevalf('30dec2019'd)} _temporary_;

  if _n_=1 then do;
    do _rdate='01nov2017'd to '30apr2018'd,'01nov2018'd to '30apr2019'd;
      _index+1;
      rainy_day_index{_rdate}=_index;
	end;
	do _date=lbound(rainy_day_index) to hbound(rainy_day_index);
	  if _date<'01nov2017'd then rainy_day_index{_date}=0;
	  else if rainy_day_index{_date}=. then rainy_day_index{_date}=rainy_day_index{_date-1};
	end;
  end;

  /** Run this once to see what the rainy_day_index array looks like 
  if _n_=1 then do _date=lbound(rainy_day_index) to hbound(rainy_day_index);
    put _date=date9.  rainy_day_index{_date} z4.;
  end;
  **/

  n_rainy_days=rainy_day_index{fuend} - rainy_day_index{fustart-1};
run;

The "if _N_=1 do group just sets up the array rainy_day_index at the start of the data step. As a "_temporary_" array its value will be retained for all subsequent observations.

To get a better idea of the array contents, de-comment the indicated section once.

--------------------------
The hash OUTPUT method will overwrite a SAS data set, but not append. That can be costly. Consider voting for Add a HASH object method which would append a hash object to an existing SAS data set

Would enabling PROC SORT to simultaneously output multiple datasets be useful? Then vote for
Allow PROC SORT to output multiple datasets

--------------------------

Create a variable using date ranges within date ranges

Re: Create a variable using date ranges within date ranges

Re: Create a variable using date ranges within date ranges

Create a variable using date ranges within date ranges

Re: Create a variable using date ranges within date ranges

Re: Create a variable using date ranges within date ranges

Registration is open

SAS Training: Just a Click Away