Hi everyone,
I have a dataset with variable "meancost". Now I want to create new variable "lagmeancost" which equal to meancost in previous 4 quarters (4 lag). For example, the lagmeancost of GVKEY (firmid) 1001 in 2002Q1 = meancost in 2001Q1 (=11)
The problem is that I have to ignore the first 4 observations in each group to avoid the lagmeancost to be contaminated by data of other company.
The output data should look like this
GVKEY | Quarter | Meancost | lagmeancost |
1001 | 2001Q1 | 11 | . |
1001 | 2001Q2 | 11 | . |
1001 | 2001Q3 | 11 | . |
1001 | 2001Q4 | 11 | . |
1001 | 2002Q1 | 12 | 11 |
1001 | 2002Q2 | 12 | 11 |
1001 | 2002Q3 | 12 | 11 |
1001 | 2002Q4 | 12 | 11 |
1002 | 2006Q1 | 65 | . |
1002 | 2006Q2 | 65 | . |
1002 | 2006Q3 | 65 | . |
1002 | 2006Q4 | 65 | . |
1002 | 2007Q1 | 70 | 65 |
1002 | 2007Q2 | 70 | 65 |
1002 | 2007Q3 | 70 | 65 |
1002 | 2007Q4 | 70 | 65 |
Any idea how to do that? Thank you very much
NVM, you're just lagging which is easier.
data want;
set sashelp.stocks;
by stock;
lagged=lag4(open);
if first.stock then counter=1;
else counter+1;
if counter<=4 then lagged=.;
run;
PROC EXPAND
Could you be more specific?Thanks
data have;
input (GVKEY Quarter Meancost) ($);
datalines;
1001 2001Q1 11 .
1001 2001Q2 11 .
1001 2001Q3 11 .
1001 2001Q4 11 .
1001 2002Q1 12 11
1001 2002Q2 12 11
1001 2002Q3 12 11
1001 2002Q4 12 11
1002 2006Q1 65 .
1002 2006Q2 65 .
1002 2006Q3 65 .
1002 2006Q4 65 .
1002 2007Q1 70 65
1002 2007Q2 70 65
1002 2007Q3 70 65
1002 2007Q4 70 65
run;
data want;
set have;
by gvkey;
if first.gvkey then call missing(_t,_k);
retain _k _t;
if lag(meancost) ne meancost then _t+1;
if mod(_t,2) ne 0 then _k=meancost;
else lagmeancost=_k ;
drop _:;
run;
I tried your code but it gives me missing value of lagmeancost where it should have a value.
The output of this code is like this. This code does not give me lagmeancost of the year 2003. Basically, it gives lagmeancost of 1 year, then missing value for the next year, and then gives value for the year after that ....
GVKEY | Quarter | Meancost | lagmeancost |
1001 | 2001Q1 | 11 | . |
1001 | 2001Q2 | 11 | . |
1001 | 2001Q3 | 11 | . |
1001 | 2001Q4 | 11 | . |
1001 | 2002Q1 | 12 | 11 |
1001 | 2002Q2 | 12 | 11 |
1001 | 2002Q3 | 12 | 11 |
1001 | 2002Q4 | 12 | 11 |
1001 | 2003Q1 | 13 | . |
1001 | 2003Q2 | 13 | . |
1001 | 2003Q3 | 13 | . |
1001 | 2003Q4 | 13 | . |
1001 | 2004Q1 | 14 | 13 |
1001 | 2004Q2 | 14 | 13 |
1001 | 2004Q3 | 14 | 13 |
1001 | 2004Q4 | 14 | 13 |
…. | …. | …. | …. |
Your output sample in your original post skips , so i coded accordingly. Do you want lag values for all quarters?
Yes, I would like to compute lagmeancost for all quarter, except the first 4 quarters of each GVKEY (firmid) because obviously there is no lag year for the first 4 quarters.
The dataset I posted is just a small sample, my full data is like this:
GVKEY | Quarter | Meancost | lagmeancost |
1001 | 2001Q1 | 11 | . |
1001 | 2001Q2 | 11 | . |
1001 | 2001Q3 | 11 | . |
1001 | 2001Q4 | 11 | . |
1001 | 2002Q1 | 12 | 11 |
1001 | 2002Q2 | 12 | 11 |
1001 | 2002Q3 | 12 | 11 |
1001 | 2002Q4 | 12 | 11 |
1001 | ……. | ……. | ……. |
1001 | ……. | ……. | ……. |
1002 | 2006Q3 | 65 | . |
1002 | 2006Q4 | 65 | . |
1002 | 2007Q1 | 70 | . |
1002 | 2007Q2 | 70 | . |
1002 | 2007Q3 | 70 | 65 |
1002 | 2007Q4 | 70 | 65 |
1002 | 2008Q1 | 75 | 70 |
1002 | 2008Q2 | 75 | 70 |
1002 | 2008Q3 | 75 | 70 |
1002 | 2008Q4 | 75 | 70 |
1002 | ……. | ……. | ……. |
1002 | ……. | ……. | ……. |
You want trailing 4-quarter moving averages, where the window moves once per year instead of once per quarter:
data want;
set have;
by gvkey;
retain lagmeancost;
lagmeancost=ifn(mod(_n_,4)=1,mean(lag1(cost),lag2(cost),lag3(cost),lag4(cost)),lagmeancost);
if first.gvkey then lagmeancost=.;
run;
The above assumes that every gvkey has exactly 4 records per year.
If that's not the case, then instead of looking for every 4th record you need to determine if the current record belongs to a different year than the prior record. Assuming your QUARTER variable is actually a SAS date variable, you could:
data want;
set have;
by gvkey;
retain lagmeancost;
array c{4} _temporary_;
if intck('year',lag(quarter),quarter)=1 then do;
lagmeancost=mean(of c{*});
call missing(of c{*});
end;
if first.gvkey then call missing(lagmeancost,of c{*});
c{qtr(quarter)}=cost;
run;
Please accept my apologies, I think this is what you are after:
data want;
set have;
by gvkey;
if first.gvkey then call missing(_t,lagmeancost);
retain _t lagmeancost;
if lag(meancost) ne meancost then
do;
lagmeancost=_t;
_t=meancost;
end;
drop _:;
run;
EDIT: Typo edited
RTM
http://support.sas.com/resources/papers/proceedings10/093-2010.pdf
Try MOVAVE3 TRIMLEFT3
Otherwise there's the good old temporary array trick, I've left more in than I usually would so you can see the flow of data. For comparison, PROC EXPAND would be 4 lines of code.
data have;
set sashelp.stocks;
by stock;
retain counter p0-p3;
array p(0:3) p0-p3;
if first.stock then
do;
call missing(of p(*));
counter=1;
end;
else
counter=counter+1;
p{mod(counter, 4)}=open;
if counter>4 then
moving_avg4=mean(of p(*));
keep stock date open moving_: p: counter;
run;
@trungcva112 wrote:
I tried your code but it gives me missing value of lagmeancost where it should have a value.
The output of this code is like this. This code does not give me lagmeancost of the year 2003. Basically, it gives lagmeancost of 1 year, then missing value for the next year, and then gives value for the year after that ....
GVKEY Quarter Meancost lagmeancost 1001 2001Q1 11 . 1001 2001Q2 11 . 1001 2001Q3 11 . 1001 2001Q4 11 . 1001 2002Q1 12 11 1001 2002Q2 12 11 1001 2002Q3 12 11 1001 2002Q4 12 11 1001 2003Q1 13 . 1001 2003Q2 13 . 1001 2003Q3 13 . 1001 2003Q4 13 . 1001 2004Q1 14 13 1001 2004Q2 14 13 1001 2004Q3 14 13 1001 2004Q4 14 13 …. …. …. ….
NVM, you're just lagging which is easier.
data want;
set sashelp.stocks;
by stock;
lagged=lag4(open);
if first.stock then counter=1;
else counter+1;
if counter<=4 then lagged=.;
run;
data have;
input GVKEY Quarter : yyq6. Meancost;
format Quarter yyq6.;
datalines;
1001 2001Q1 11 .
1001 2001Q2 11 .
1001 2001Q3 11 .
1001 2001Q4 11 .
1001 2002Q1 12 11
1001 2002Q2 12 11
1001 2002Q3 12 11
1001 2002Q4 12 11
1002 2006Q1 65 .
1002 2006Q2 65 .
1002 2006Q3 65 .
1002 2006Q4 65 .
1002 2007Q1 70 65
1002 2007Q2 70 65
1002 2007Q3 70 65
1002 2007Q4 70 65
;
run;
data temp;
set have(rename=(Quarter=_Quarter));
Quarter=intnx('qtr',_Quarter,4);
format Quarter yyq6.;
drop _Quarter;
run;
data want;
merge have(in=ina) temp(keep=GVKEY Quarter Meancost
rename=( Meancost=_Meancost));
by GVKEY Quarter;
if ina;
run;
Available on demand!
Missed SAS Innovate Las Vegas? Watch all the action for free! View the keynotes, general sessions and 22 breakouts on demand.
Learn how use the CAT functions in SAS to join values from multiple variables into a single value.
Find more tutorials on the SAS Users YouTube channel.