BookmarkSubscribeRSS Feed
🔒 This topic is solved and locked. Need further help from the community? Please sign in and ask a new question.
Tanguy
Calcite | Level 5

Dear all,

 

Here is below an example of dataset I am using and the calculation of the variable IND using a LAG4 function.

The LAG4 function looks at 4 quarters before. In particular I am looking at end of year comparison: IND is calculated by comparing (in this example summing) 31DEC2014 and 31DEC2015.

 

The issue is that the number of reporting period (date) for each BANK can vary: a BANK might not have reported the variables VAR at some reporting period. Therefore the LAG4 function provides an inconsistent result.

Does anyone knows how I can enrich the code to:

-  look for the previous end of year value?

- apply the calculation of IND only at end of year?

 

I have provided below the expected dataset I am trying to produce.

I have some ideas but they are quite burdensome ( (1)create an intermediary dataset with only end of year data, (2) use the proc freq to select the bank who does not report either end 2014 or end 15, (3) then make the calculatin of IND while excluding manually the BANKs found in step2)

 

I suppose it exist a quick way to do this.

 

Best,

 

 

DATA example; 
   INPUT ID $ date DATE9. VAR; 
   DATALINES; 
BANK1 31DEC2014	3
BANK1 31MAR2015	5
BANK1 30JUN2015	3
BANK1 30SEP2015	8
BANK1 31DEC2015	5
BANK1 31MAR2016	4
BANK2 31MAR2015	9
BANK2 30JUN2015	4
BANK2 30SEP2015	7
BANK2 31MAR2016	4
BANK3 31DEC2014	8
BANK3 31DEC2015	3
BANK3 31MAR2016	2
; 

DATA example;
SET example;
ATTRIB date
	FORMAT=DATE9.;
IND = SUM(VAR,LAG4(VAR));
RUN;
/*
Results:
ID		Date	    VAR IND
BANK1	31DEC2014	3	3
BANK1	31MAR2015	5	5
BANK1	30JUN2015	3	3
BANK1	30SEP2015	8	8
BANK1	31DEC2015	5	8
BANK1	31MAR2016	4	9
BANK2	31MAR2015	9	12
BANK2	30JUN2015	4	12
BANK2	30SEP2015	7	12
BANK2	31MAR2016	4	8
BANK3	31DEC2014	8	17
BANK3	31DEC2015	3	7
BANK3	31MAR2016	2	9
*/

/*
Expected Results:
ID		Date	    VAR IND
BANK1	31DEC2014	3	.
BANK1	31MAR2015	5	.
BANK1	30JUN2015	3	.
BANK1	30SEP2015	8	.
BANK1	31DEC2015	5	8
BANK1	31MAR2016	4	.
BANK2	31MAR2015	9	.
BANK2	30JUN2015	4	.
BANK2	30SEP2015	7	.
BANK2	31MAR2016	4	.
BANK3	31DEC2014	8	.
BANK3	31DEC2015	3	.
BANK3	31MAR2016	2	.
*/

 

1 ACCEPTED SOLUTION

Accepted Solutions
Tom
Super User Tom
Super User

One way is to just expand your data to have one record per quarter per ID.  You should be able to find many examples here on how to do that. In fact you might want to look at PROC EXPAND as it might do the whole problem.  Here is one way to create a skelton dataset with one record per year and quarter by getting the start/stop date for each ID and then expanding it.

proc means data=example noprint nway;
  class id;
  var date ;
  output out=summary min=mindate max=maxdate;
run;
data timeline;
  set summary;
  do recno=1 to intck('qtr',mindate,maxdate)+1;
    date=intnx('qtr',mindate,recno-1,'e');
    output;
  end;
  format date date9.;
  keep id recno date;
run;

You can then merge that back with your source data to and apply your rule directly.

data want ;
  merge timeline example;
  by id date;
  lastyr=lag4(var);
  if recno > 4 and qtr(date)=4 then ind=var+lastyr;
  drop lastyr;
run;

Capture.PNG

View solution in original post

6 REPLIES 6
Kurt_Bremser
Super User

Well, my solution needs an intermediate dataset, but avoids other steps:

data example;
input id $ date date9. var;
format date date9.;
cards;
BANK1 31DEC2014	3
BANK1 31MAR2015	5
BANK1 30JUN2015	3
BANK1 30SEP2015	8
BANK1 31DEC2015	5
BANK1 31MAR2016	4
BANK2 31MAR2015	9
BANK2 30JUN2015	4
BANK2 30SEP2015	7
BANK2 31MAR2016	4
BANK3 31DEC2014	8
BANK3 31DEC2015	3
BANK3 31MAR2016	2
;
run;

data oldyear;
set example;
where month(date) = 12;
date= intnx('year',date,1,'end');
rename var=oldvar;
run;

data want;
merge
  example (in=a)
  oldyear (in=b)
;
by id date;
if a;
if b then ind = var + oldvar;
drop oldvar;
run;
Tanguy
Calcite | Level 5

Thanks a lot for your quick answer, really appreciated.

This solutions suits

Tom
Super User Tom
Super User

One way is to just expand your data to have one record per quarter per ID.  You should be able to find many examples here on how to do that. In fact you might want to look at PROC EXPAND as it might do the whole problem.  Here is one way to create a skelton dataset with one record per year and quarter by getting the start/stop date for each ID and then expanding it.

proc means data=example noprint nway;
  class id;
  var date ;
  output out=summary min=mindate max=maxdate;
run;
data timeline;
  set summary;
  do recno=1 to intck('qtr',mindate,maxdate)+1;
    date=intnx('qtr',mindate,recno-1,'e');
    output;
  end;
  format date date9.;
  keep id recno date;
run;

You can then merge that back with your source data to and apply your rule directly.

data want ;
  merge timeline example;
  by id date;
  lastyr=lag4(var);
  if recno > 4 and qtr(date)=4 then ind=var+lastyr;
  drop lastyr;
run;

Capture.PNG

Tanguy
Calcite | Level 5

Thank you very much for this,

Is is very useful indeed to insert rows per reporting period and prepare the data for time series analysis,

Nevertheless the 31DEC2014 is missing in the WANT dataset for BANK2, do you know why?

best

 

PaulLee
Fluorite | Level 6

This works for me if you wanted to do it in one step... data has to be sorted first though as it assumes max of lag4.

 

DATA example; 
INPUT ID $
date DATE9.
VAR;
DATALINES;
BANK1 31DEC2014 3
BANK1 31MAR2015 5
BANK1 30JUN2015 3
BANK1 30SEP2015 8
BANK1 31DEC2015 5
BANK1 31MAR2016 4
BANK2 31MAR2015 9
BANK2 30JUN2015 4
BANK2 30SEP2015 7
BANK2 31MAR2016 4
BANK3 31DEC2014 8
BANK3 31DEC2015 3
BANK3 31MAR2016 2
;


DATA example_new (keep=id date var ind);
attrib date year_end format=date9.;
attrib store_val1 store_val2 store_val3 store_val4 format=date9.;
attrib val1-val4 format=8.;

set example;

ind = .;
store_val1 = lag1(date);
store_val2 = lag2(date);
store_val3 = lag3(date);
store_val4 = lag4(date);

val1 = LAG1(VAR);
val2 = LAG2(VAR);
val3 = LAG3(VAR);
val4 = LAG4(VAR);

year_end = intnx('year', date, 0, 'end');

if (intck('year', date , store_val1) = -1 and year_end = date) then IND = SUM(VAR,val1);
else if intck('year', date , store_val2) = -1 and year_end = date then IND = SUM(VAR,val2);
else if intck('year', date , store_val3) = -1 and year_end = date then IND = SUM(VAR,val3);
else if intck('year', date , store_val4) = -1 and year_end = date then IND = SUM(VAR,val4);
else ind =.;

RUN;

 

 

Tanguy
Calcite | Level 5

I mean between obs 6 and 7 a record is missing right? as we want to have one record per quarter even if a BANK has not reported it

sas-innovate-2024.png

Join us for SAS Innovate April 16-19 at the Aria in Las Vegas. Bring the team and save big with our group pricing for a limited time only.

Pre-conference courses and tutorials are filling up fast and are always a sellout. Register today to reserve your seat.

 

Register now!

How to Concatenate Values

Learn how use the CAT functions in SAS to join values from multiple variables into a single value.

Find more tutorials on the SAS Users YouTube channel.

Click image to register for webinarClick image to register for webinar

Classroom Training Available!

Select SAS Training centers are offering in-person courses. View upcoming courses for:

View all other training opportunities.

Discussion stats
  • 6 replies
  • 6407 views
  • 0 likes
  • 4 in conversation