DATA Step, Macro, Functions and more

LAG function looking at previous dates: how to tackle missing dates?

Accepted Solution Solved
Reply
Contributor
Posts: 20
Accepted Solution

LAG function looking at previous dates: how to tackle missing dates?

Dear all,

 

Here is below an example of dataset I am using and the calculation of the variable IND using a LAG4 function.

The LAG4 function looks at 4 quarters before. In particular I am looking at end of year comparison: IND is calculated by comparing (in this example summing) 31DEC2014 and 31DEC2015.

 

The issue is that the number of reporting period (date) for each BANK can vary: a BANK might not have reported the variables VAR at some reporting period. Therefore the LAG4 function provides an inconsistent result.

Does anyone knows how I can enrich the code to:

-  look for the previous end of year value?

- apply the calculation of IND only at end of year?

 

I have provided below the expected dataset I am trying to produce.

I have some ideas but they are quite burdensome ( (1)create an intermediary dataset with only end of year data, (2) use the proc freq to select the bank who does not report either end 2014 or end 15, (3) then make the calculatin of IND while excluding manually the BANKs found in step2)

 

I suppose it exist a quick way to do this.

 

Best,

 

 

DATA example; 
   INPUT ID $ date DATE9. VAR; 
   DATALINES; 
BANK1 31DEC2014	3
BANK1 31MAR2015	5
BANK1 30JUN2015	3
BANK1 30SEP2015	8
BANK1 31DEC2015	5
BANK1 31MAR2016	4
BANK2 31MAR2015	9
BANK2 30JUN2015	4
BANK2 30SEP2015	7
BANK2 31MAR2016	4
BANK3 31DEC2014	8
BANK3 31DEC2015	3
BANK3 31MAR2016	2
; 

DATA example;
SET example;
ATTRIB date
	FORMAT=DATE9.;
IND = SUM(VAR,LAG4(VAR));
RUN;
/*
Results:
ID		Date	    VAR IND
BANK1	31DEC2014	3	3
BANK1	31MAR2015	5	5
BANK1	30JUN2015	3	3
BANK1	30SEP2015	8	8
BANK1	31DEC2015	5	8
BANK1	31MAR2016	4	9
BANK2	31MAR2015	9	12
BANK2	30JUN2015	4	12
BANK2	30SEP2015	7	12
BANK2	31MAR2016	4	8
BANK3	31DEC2014	8	17
BANK3	31DEC2015	3	7
BANK3	31MAR2016	2	9
*/

/*
Expected Results:
ID		Date	    VAR IND
BANK1	31DEC2014	3	.
BANK1	31MAR2015	5	.
BANK1	30JUN2015	3	.
BANK1	30SEP2015	8	.
BANK1	31DEC2015	5	8
BANK1	31MAR2016	4	.
BANK2	31MAR2015	9	.
BANK2	30JUN2015	4	.
BANK2	30SEP2015	7	.
BANK2	31MAR2016	4	.
BANK3	31DEC2014	8	.
BANK3	31DEC2015	3	.
BANK3	31MAR2016	2	.
*/

 


Accepted Solutions
Solution
‎07-25-2016 09:55 AM
Super User
Super User
Posts: 7,060

Re: LAG function looking at previous dates: how to tackle missing dates?

One way is to just expand your data to have one record per quarter per ID.  You should be able to find many examples here on how to do that. In fact you might want to look at PROC EXPAND as it might do the whole problem.  Here is one way to create a skelton dataset with one record per year and quarter by getting the start/stop date for each ID and then expanding it.

proc means data=example noprint nway;
  class id;
  var date ;
  output out=summary min=mindate max=maxdate;
run;
data timeline;
  set summary;
  do recno=1 to intck('qtr',mindate,maxdate)+1;
    date=intnx('qtr',mindate,recno-1,'e');
    output;
  end;
  format date date9.;
  keep id recno date;
run;

You can then merge that back with your source data to and apply your rule directly.

data want ;
  merge timeline example;
  by id date;
  lastyr=lag4(var);
  if recno > 4 and qtr(date)=4 then ind=var+lastyr;
  drop lastyr;
run;

Capture.PNG

View solution in original post


All Replies
Super User
Posts: 7,815

Re: LAG function looking at previous dates: how to tackle missing dates?

Well, my solution needs an intermediate dataset, but avoids other steps:

data example;
input id $ date date9. var;
format date date9.;
cards;
BANK1 31DEC2014	3
BANK1 31MAR2015	5
BANK1 30JUN2015	3
BANK1 30SEP2015	8
BANK1 31DEC2015	5
BANK1 31MAR2016	4
BANK2 31MAR2015	9
BANK2 30JUN2015	4
BANK2 30SEP2015	7
BANK2 31MAR2016	4
BANK3 31DEC2014	8
BANK3 31DEC2015	3
BANK3 31MAR2016	2
;
run;

data oldyear;
set example;
where month(date) = 12;
date= intnx('year',date,1,'end');
rename var=oldvar;
run;

data want;
merge
  example (in=a)
  oldyear (in=b)
;
by id date;
if a;
if b then ind = var + oldvar;
drop oldvar;
run;
---------------------------------------------------------------------------------------------
Maxims of Maximally Efficient SAS Programmers
Contributor
Posts: 20

Re: LAG function looking at previous dates: how to tackle missing dates?

Posted in reply to KurtBremser

Thanks a lot for your quick answer, really appreciated.

This solutions suits

Solution
‎07-25-2016 09:55 AM
Super User
Super User
Posts: 7,060

Re: LAG function looking at previous dates: how to tackle missing dates?

One way is to just expand your data to have one record per quarter per ID.  You should be able to find many examples here on how to do that. In fact you might want to look at PROC EXPAND as it might do the whole problem.  Here is one way to create a skelton dataset with one record per year and quarter by getting the start/stop date for each ID and then expanding it.

proc means data=example noprint nway;
  class id;
  var date ;
  output out=summary min=mindate max=maxdate;
run;
data timeline;
  set summary;
  do recno=1 to intck('qtr',mindate,maxdate)+1;
    date=intnx('qtr',mindate,recno-1,'e');
    output;
  end;
  format date date9.;
  keep id recno date;
run;

You can then merge that back with your source data to and apply your rule directly.

data want ;
  merge timeline example;
  by id date;
  lastyr=lag4(var);
  if recno > 4 and qtr(date)=4 then ind=var+lastyr;
  drop lastyr;
run;

Capture.PNG

Contributor
Posts: 20

Re: LAG function looking at previous dates: how to tackle missing dates?

[ Edited ]

Thank you very much for this,

Is is very useful indeed to insert rows per reporting period and prepare the data for time series analysis,

Nevertheless the 31DEC2014 is missing in the WANT dataset for BANK2, do you know why?

best

 

Occasional Contributor
Posts: 9

Re: LAG function looking at previous dates: how to tackle missing dates?

This works for me if you wanted to do it in one step... data has to be sorted first though as it assumes max of lag4.

 

DATA example; 
INPUT ID $
date DATE9.
VAR;
DATALINES;
BANK1 31DEC2014 3
BANK1 31MAR2015 5
BANK1 30JUN2015 3
BANK1 30SEP2015 8
BANK1 31DEC2015 5
BANK1 31MAR2016 4
BANK2 31MAR2015 9
BANK2 30JUN2015 4
BANK2 30SEP2015 7
BANK2 31MAR2016 4
BANK3 31DEC2014 8
BANK3 31DEC2015 3
BANK3 31MAR2016 2
;


DATA example_new (keep=id date var ind);
attrib date year_end format=date9.;
attrib store_val1 store_val2 store_val3 store_val4 format=date9.;
attrib val1-val4 format=8.;

set example;

ind = .;
store_val1 = lag1(date);
store_val2 = lag2(date);
store_val3 = lag3(date);
store_val4 = lag4(date);

val1 = LAG1(VAR);
val2 = LAG2(VAR);
val3 = LAG3(VAR);
val4 = LAG4(VAR);

year_end = intnx('year', date, 0, 'end');

if (intck('year', date , store_val1) = -1 and year_end = date) then IND = SUM(VAR,val1);
else if intck('year', date , store_val2) = -1 and year_end = date then IND = SUM(VAR,val2);
else if intck('year', date , store_val3) = -1 and year_end = date then IND = SUM(VAR,val3);
else if intck('year', date , store_val4) = -1 and year_end = date then IND = SUM(VAR,val4);
else ind =.;

RUN;

 

 

Contributor
Posts: 20

Re: LAG function looking at previous dates: how to tackle missing dates?

I mean between obs 6 and 7 a record is missing right? as we want to have one record per quarter even if a BANK has not reported it

☑ This topic is solved.

Need further help from the community? Please ask a new question.

Discussion stats
  • 6 replies
  • 334 views
  • 0 likes
  • 4 in conversation