Hi everyone,
I have a dataset with variable "meancost". Now I want to create new variable "lagmeancost" which equal to meancost in previous 4 quarters (4 lag). For example, the lagmeancost of GVKEY (firmid) 1001 in 2002Q1 = meancost in 2001Q1 (=11)
The problem is that I have to ignore the first 4 observations in each group to avoid the lagmeancost to be contaminated by data of other company.
The output data should look like this
GVKEY | Quarter | Meancost | lagmeancost |
1001 | 2001Q1 | 11 | . |
1001 | 2001Q2 | 11 | . |
1001 | 2001Q3 | 11 | . |
1001 | 2001Q4 | 11 | . |
1001 | 2002Q1 | 12 | 11 |
1001 | 2002Q2 | 12 | 11 |
1001 | 2002Q3 | 12 | 11 |
1001 | 2002Q4 | 12 | 11 |
1002 | 2006Q1 | 65 | . |
1002 | 2006Q2 | 65 | . |
1002 | 2006Q3 | 65 | . |
1002 | 2006Q4 | 65 | . |
1002 | 2007Q1 | 70 | 65 |
1002 | 2007Q2 | 70 | 65 |
1002 | 2007Q3 | 70 | 65 |
1002 | 2007Q4 | 70 | 65 |
Any idea how to do that? Thank you very much
NVM, you're just lagging which is easier.
data want;
set sashelp.stocks;
by stock;
lagged=lag4(open);
if first.stock then counter=1;
else counter+1;
if counter<=4 then lagged=.;
run;
PROC EXPAND
Could you be more specific?Thanks
data have;
input (GVKEY Quarter Meancost) ($);
datalines;
1001 2001Q1 11 .
1001 2001Q2 11 .
1001 2001Q3 11 .
1001 2001Q4 11 .
1001 2002Q1 12 11
1001 2002Q2 12 11
1001 2002Q3 12 11
1001 2002Q4 12 11
1002 2006Q1 65 .
1002 2006Q2 65 .
1002 2006Q3 65 .
1002 2006Q4 65 .
1002 2007Q1 70 65
1002 2007Q2 70 65
1002 2007Q3 70 65
1002 2007Q4 70 65
run;
data want;
set have;
by gvkey;
if first.gvkey then call missing(_t,_k);
retain _k _t;
if lag(meancost) ne meancost then _t+1;
if mod(_t,2) ne 0 then _k=meancost;
else lagmeancost=_k ;
drop _:;
run;
I tried your code but it gives me missing value of lagmeancost where it should have a value.
The output of this code is like this. This code does not give me lagmeancost of the year 2003. Basically, it gives lagmeancost of 1 year, then missing value for the next year, and then gives value for the year after that ....
GVKEY | Quarter | Meancost | lagmeancost |
1001 | 2001Q1 | 11 | . |
1001 | 2001Q2 | 11 | . |
1001 | 2001Q3 | 11 | . |
1001 | 2001Q4 | 11 | . |
1001 | 2002Q1 | 12 | 11 |
1001 | 2002Q2 | 12 | 11 |
1001 | 2002Q3 | 12 | 11 |
1001 | 2002Q4 | 12 | 11 |
1001 | 2003Q1 | 13 | . |
1001 | 2003Q2 | 13 | . |
1001 | 2003Q3 | 13 | . |
1001 | 2003Q4 | 13 | . |
1001 | 2004Q1 | 14 | 13 |
1001 | 2004Q2 | 14 | 13 |
1001 | 2004Q3 | 14 | 13 |
1001 | 2004Q4 | 14 | 13 |
…. | …. | …. | …. |
Your output sample in your original post skips , so i coded accordingly. Do you want lag values for all quarters?
Yes, I would like to compute lagmeancost for all quarter, except the first 4 quarters of each GVKEY (firmid) because obviously there is no lag year for the first 4 quarters.
The dataset I posted is just a small sample, my full data is like this:
GVKEY | Quarter | Meancost | lagmeancost |
1001 | 2001Q1 | 11 | . |
1001 | 2001Q2 | 11 | . |
1001 | 2001Q3 | 11 | . |
1001 | 2001Q4 | 11 | . |
1001 | 2002Q1 | 12 | 11 |
1001 | 2002Q2 | 12 | 11 |
1001 | 2002Q3 | 12 | 11 |
1001 | 2002Q4 | 12 | 11 |
1001 | ……. | ……. | ……. |
1001 | ……. | ……. | ……. |
1002 | 2006Q3 | 65 | . |
1002 | 2006Q4 | 65 | . |
1002 | 2007Q1 | 70 | . |
1002 | 2007Q2 | 70 | . |
1002 | 2007Q3 | 70 | 65 |
1002 | 2007Q4 | 70 | 65 |
1002 | 2008Q1 | 75 | 70 |
1002 | 2008Q2 | 75 | 70 |
1002 | 2008Q3 | 75 | 70 |
1002 | 2008Q4 | 75 | 70 |
1002 | ……. | ……. | ……. |
1002 | ……. | ……. | ……. |
You want trailing 4-quarter moving averages, where the window moves once per year instead of once per quarter:
data want;
set have;
by gvkey;
retain lagmeancost;
lagmeancost=ifn(mod(_n_,4)=1,mean(lag1(cost),lag2(cost),lag3(cost),lag4(cost)),lagmeancost);
if first.gvkey then lagmeancost=.;
run;
The above assumes that every gvkey has exactly 4 records per year.
If that's not the case, then instead of looking for every 4th record you need to determine if the current record belongs to a different year than the prior record. Assuming your QUARTER variable is actually a SAS date variable, you could:
data want;
set have;
by gvkey;
retain lagmeancost;
array c{4} _temporary_;
if intck('year',lag(quarter),quarter)=1 then do;
lagmeancost=mean(of c{*});
call missing(of c{*});
end;
if first.gvkey then call missing(lagmeancost,of c{*});
c{qtr(quarter)}=cost;
run;
Please accept my apologies, I think this is what you are after:
data want;
set have;
by gvkey;
if first.gvkey then call missing(_t,lagmeancost);
retain _t lagmeancost;
if lag(meancost) ne meancost then
do;
lagmeancost=_t;
_t=meancost;
end;
drop _:;
run;
EDIT: Typo edited
RTM
http://support.sas.com/resources/papers/proceedings10/093-2010.pdf
Try MOVAVE3 TRIMLEFT3
Otherwise there's the good old temporary array trick, I've left more in than I usually would so you can see the flow of data. For comparison, PROC EXPAND would be 4 lines of code.
data have;
set sashelp.stocks;
by stock;
retain counter p0-p3;
array p(0:3) p0-p3;
if first.stock then
do;
call missing(of p(*));
counter=1;
end;
else
counter=counter+1;
p{mod(counter, 4)}=open;
if counter>4 then
moving_avg4=mean(of p(*));
keep stock date open moving_: p: counter;
run;
@trungcva112 wrote:
I tried your code but it gives me missing value of lagmeancost where it should have a value.
The output of this code is like this. This code does not give me lagmeancost of the year 2003. Basically, it gives lagmeancost of 1 year, then missing value for the next year, and then gives value for the year after that ....
GVKEY Quarter Meancost lagmeancost 1001 2001Q1 11 . 1001 2001Q2 11 . 1001 2001Q3 11 . 1001 2001Q4 11 . 1001 2002Q1 12 11 1001 2002Q2 12 11 1001 2002Q3 12 11 1001 2002Q4 12 11 1001 2003Q1 13 . 1001 2003Q2 13 . 1001 2003Q3 13 . 1001 2003Q4 13 . 1001 2004Q1 14 13 1001 2004Q2 14 13 1001 2004Q3 14 13 1001 2004Q4 14 13 …. …. …. ….
NVM, you're just lagging which is easier.
data want;
set sashelp.stocks;
by stock;
lagged=lag4(open);
if first.stock then counter=1;
else counter+1;
if counter<=4 then lagged=.;
run;
data have;
input GVKEY Quarter : yyq6. Meancost;
format Quarter yyq6.;
datalines;
1001 2001Q1 11 .
1001 2001Q2 11 .
1001 2001Q3 11 .
1001 2001Q4 11 .
1001 2002Q1 12 11
1001 2002Q2 12 11
1001 2002Q3 12 11
1001 2002Q4 12 11
1002 2006Q1 65 .
1002 2006Q2 65 .
1002 2006Q3 65 .
1002 2006Q4 65 .
1002 2007Q1 70 65
1002 2007Q2 70 65
1002 2007Q3 70 65
1002 2007Q4 70 65
;
run;
data temp;
set have(rename=(Quarter=_Quarter));
Quarter=intnx('qtr',_Quarter,4);
format Quarter yyq6.;
drop _Quarter;
run;
data want;
merge have(in=ina) temp(keep=GVKEY Quarter Meancost
rename=( Meancost=_Meancost));
by GVKEY Quarter;
if ina;
run;
Join us for SAS Innovate April 16-19 at the Aria in Las Vegas. Bring the team and save big with our group pricing for a limited time only.
Pre-conference courses and tutorials are filling up fast and are always a sellout. Register today to reserve your seat.
Learn how use the CAT functions in SAS to join values from multiple variables into a single value.
Find more tutorials on the SAS Users YouTube channel.