BookmarkSubscribeRSS Feed
EpiNovice
Calcite | Level 5

I'm trying to create a new variable that shows what proportion of follow-up time for each participant was during a rainy season, but I'm not sure how to do it.

 

Below is an example of my data. 

FUStart= Start of follow-up

FUEnd=End of follow-up

The next 4 columns show the dates for the start and end of the rainy seasons that occurred over the course of follow-up. I know how to calculate the length of follow-up, but I don't know how to structure an if/then statement to determine how much of the rainy season fell in that period. 

 

In short, how do I use these dates to count the number of follow-up days that were in a rainy season?

 

Thanks in advance for any thoughts!

 

PIDFUStartFUEndRain1StartRain1EndRain2StartRain2EndRainyDaysFU
101/27/1805/10/1911/01/1704/30/1811/01/1804/30/19?
201/12/1805/08/1911/01/1704/30/1811/01/1804/30/19?
309/26/1701/30/1811/01/1704/30/1811/01/1804/30/19?
412/10/1701/23/1811/01/1704/30/1811/01/1804/30/19?
507/16/1701/30/1811/01/1704/30/1811/01/1804/30/19?
605/03/1805/17/1911/01/1704/30/1811/01/1804/30/19?
708/11/1805/13/1911/01/1704/30/1811/01/1804/30/19?
2 REPLIES 2
ChrisNZ
Tourmaline | Level 20

A crude way:

data WANT;
  retain FUSTART '27JAN2018'D FUEND '10MAY2019'D RAIN1START '01NOV2017'D RAIN1END '30APR2018'D RAIN2START '01NOV2018'D RAIN2END '30APR2019'D;
  do DT=min(FUSTART, RAIN1START, RAIN2START) to max(FUEND, RAIN1END, RAIN2END);
    TALLY=0;
    if DT>=FUSTART    then TALLY+ 1;
    if DT>=RAIN1START then TALLY+ 1;
    if DT>=RAIN2START then TALLY+ 1;
    if DT> FUEND      then TALLY+-1;
    if DT> RAIN1END   then TALLY+-1;
    if DT> RAIN2END   then TALLY+-1;
    if TALLY=2 then RAINYDAYS+1;           
  end;
  put RAINYDAYS=;
run;

RAINYDAYS=275

 

mkeintz
PROC Star

Because the rainy seasons appear to be constant over the entire dataset, you can make an array, indexed by date, with the lower bound of the array set to at least one day prior to the earliest date in any of your date fields (rainy season or fustart) and set the upper bound is at least equivalent to the latest date in any of your date fields (i.e. rain1end rain2end or the highest encountered fuend).

 

In that array, you can assign a value of  0 to all dates prior to 01nov2017.  Then assign 1 to the element corresponding to 01nov2017, 2 to 02nov2017, etc. incrementing by one for each rainy day, but not for other days.  Of course this means that the array values for 30apr2018 through 30oct2018 all have the same value, because only 30apr2018 is a rainy day.

 

Once you've done that, then simply subtract the array value to the FUEND minus the array value for the day prior to FUSTART:

 

data want (drop=_:);
  set have;

  array rainy_day_index {%sysevalf('01jan2017'd):%sysevalf('30dec2019'd)} _temporary_;

  if _n_=1 then do;
    do _rdate='01nov2017'd to '30apr2018'd,'01nov2018'd to '30apr2019'd;
      _index+1;
      rainy_day_index{_rdate}=_index;
	end;
	do _date=lbound(rainy_day_index) to hbound(rainy_day_index);
	  if _date<'01nov2017'd then rainy_day_index{_date}=0;
	  else if rainy_day_index{_date}=. then rainy_day_index{_date}=rainy_day_index{_date-1};
	end;
  end;

  /** Run this once to see what the rainy_day_index array looks like 
  if _n_=1 then do _date=lbound(rainy_day_index) to hbound(rainy_day_index);
    put _date=date9.  rainy_day_index{_date} z4.;
  end;
  **/

  n_rainy_days=rainy_day_index{fuend} - rainy_day_index{fustart-1};
run;

The "if _N_=1 do group just sets up the array rainy_day_index at the start of the data step.  As a "_temporary_" array its value will be retained for all subsequent observations.

 

To get a better idea of the array contents, de-comment the indicated section once.

--------------------------
The hash OUTPUT method will overwrite a SAS data set, but not append. That can be costly. Consider voting for Add a HASH object method which would append a hash object to an existing SAS data set

Would enabling PROC SORT to simultaneously output multiple datasets be useful? Then vote for
Allow PROC SORT to output multiple datasets

--------------------------

hackathon24-white-horiz.png

2025 SAS Hackathon: There is still time!

Good news: We've extended SAS Hackathon registration until Sept. 12, so you still have time to be part of our biggest event yet – our five-year anniversary!

Register Now

How to Concatenate Values

Learn how use the CAT functions in SAS to join values from multiple variables into a single value.

Find more tutorials on the SAS Users YouTube channel.

SAS Training: Just a Click Away

 Ready to level-up your skills? Choose your own adventure.

Browse our catalog!

Discussion stats
  • 2 replies
  • 804 views
  • 0 likes
  • 3 in conversation