Help using Base SAS procedures

Calculations using dates on two separate data rows...

Accepted Solution Solved
Reply
New Contributor
Posts: 2
Accepted Solution

Calculations using dates on two separate data rows...

Hi,

 

I'm new to the forum, so I'm looking forward to not only getting some support from the community forums but also hoping to also provide some where possible.

 

I can't seem to figure out the following (without considering transposing the data in some way).  I'm trying to do some calculations using dates but the dates I'm working with are on two separate records for the same individual.   I suspect there is some lag function, but this is an area that I am unfamiliar with.

 

Here is what the data looks like:

 

ID     DATE1                   DATE2

1      2009-06-14       2009-08-11

1      2009-10-29

1      200912-31

2      2009-09-05       2009-11-10

2      2009-10-09

2      2010-05-23

and so on.....

 

The output I'm looking for is the following:

ID     DATE1                 DATE2            DAYS

1     2009-06-14       2009-08-11         79  

1     2009-10-29

1     2009-12-31

2     2009-09-05       2009-11-10        194

2     2009-10-09

2     2010-05-23

 

and so on.....

 

So, for the first person (ID #1)  (LAG)DATE1 - DATE2 = 20091029 - 20090811 = 79 days.   So, no further calculations are required.

 

For the 2nd person (ID #2)  (LAG)DATE1 - DATE2 =  20091009 - 20091110 = -32 days.  When a negative "DAYS" is encountered, I would like to be able to check the next record for that person to see if there is a non-negative count.  So, it would loop to the 3rd record whereby (LAG)DATE1 - DATE2 = 20100523 - 20091110 = 194 days.

 

For illustrative purposes, I've included the actual dates, but I did convert the dates into numeric values using  NEWDATE1 = input (DATE1, yymmdd8.) to allow for the calculations.

 

Essentially, I would like one record per person which tells me when did the next event (DATE1) occur after DATE2.

 

Any help with this would be GREATLY appreciated.

 

Thanks in advance for your time,

M. Anthony


Accepted Solutions
Solution
‎11-20-2015 08:01 PM
Respected Advisor
Posts: 4,649

Re: Calculations using dates on two separate data rows...

This will do it:

 


proc sql;
create table want as
select 
    a.*, 
    case 
        when date2 is missing then .
        else intck( "DAY", date2,
            (select min(date1) from have where id=a.id and date1 > a.date2) )
        end as days
from 
    have as a
order by id, date1;
quit;
PG

View solution in original post


All Replies
Super User
Posts: 5,257

Re: Calculations using dates on two separate data rows...

Since you need to access the information on the next row, I think you need to re-sort your data in descending order on date1.
The you can use lag () or RETAIN (with BY and first./last.-logic). Which to use is a matter of taste, and for me I find RETAIN easier to follow. There a tons of lag () and RETAIN examples on support.Sas.com.
Data never sleeps
Valued Guide
Posts: 858

Re: Calculations using dates on two separate data rows...

I agree, you can't look down and then pull up.  If you want the presentation you have I think you need to sort:

 

data have;
infile cards dsd;
informat d 5. date1 date2 yymmdd10.;
format d 5. date1 date2 yymmdd10.;
input D DATE1 DATE2;
cards;
1,2009-06-14,2009-08-11
1,2009-10-29,
1,2009-12-31,
2,2009-09-05,2009-11-10
2,2009-10-09,
2,2010-05-23,
;

proc sort data=have;by d descending date1;

data want;
set have;
by d descending date1;
days2 = lag3(date1);
DAYS = lag(date1)-date2;
if days < 0 and not missing(days) then days = date2-days2;
drop days2;
run;

proc sort data=want;by d date1;

Solution
‎11-20-2015 08:01 PM
Respected Advisor
Posts: 4,649

Re: Calculations using dates on two separate data rows...

This will do it:

 


proc sql;
create table want as
select 
    a.*, 
    case 
        when date2 is missing then .
        else intck( "DAY", date2,
            (select min(date1) from have where id=a.id and date1 > a.date2) )
        end as days
from 
    have as a
order by id, date1;
quit;
PG
Super Contributor
Posts: 275

Re: Calculations using dates on two separate data rows...

[ Edited ]

If date1 of 3 or more later record in each ID is larger than date2, in such condition, you could try hash.

 

data want;
   if 0 then set have(keep=id date1);
   if _n_=1 then do;
    declare hash h(dataset:'have(keep=id date1 rename=(date1=_date1))',multidata&colon;'y');
    declare hiter hi('h');
      h.definekey('id');
      h.definedata('_date1');
      h.definedone();
    end;
    set have;
    by id notsorted;
    if first.id then do;
       rc=hi.setcur();
       do while (rc=0);
         if _date1>date2 then do;
           temp=_date1;      
           goto day;
         end;
         rc=hi.next();
    end;
    day: day=temp-date2;
    end;
    drop temp _date1 rc;
run;

 

New Contributor
Posts: 2

Re: Calculations using dates on two separate data rows...

Hi,

 

Thank you for all the quick replies and for the various suggestions.  For my current purposes, the solution by PGStats seemed to be relatively straightfoward and did the trick.   The other solutions are certain to come in hand in the future, so they are definitely worth adding to my syntax list.

 

Thanks again for the help.

Trusted Advisor
Posts: 1,115

Re: Calculations using dates on two separate data rows...

Just for the records:

Solutions to this kind of problem using the RETAIN statement or (especially) the LAGn functions, as have been suggested, are somewhat prone to unwanted carry-over effects from one BY group to another. Care has to be taken to avoid these. Neither RETAIN nor LAG/LAGn respect BY groups by default.

☑ This topic is SOLVED.

Need further help from the community? Please ask a new question.

Discussion stats
  • 6 replies
  • 467 views
  • 3 likes
  • 6 in conversation