BookmarkSubscribeRSS Feed
🔒 This topic is solved and locked. Need further help from the community? Please sign in and ask a new question.
trungcva112
Obsidian | Level 7

Hi everyone,

 

I have a dataset with variable "meancost". Now I want to create new variable "lagmeancost" which equal to meancost in previous 4 quarters (4 lag). For example, the lagmeancost of GVKEY (firmid) 1001 in 2002Q1 = meancost in 2001Q1 (=11)

 

The problem is that I have to ignore the first 4 observations in each group to avoid the lagmeancost to be contaminated by data of other company.

 

The output data should look like this

GVKEYQuarterMeancostlagmeancost
10012001Q111.
10012001Q211.
10012001Q311.
10012001Q411.
10012002Q11211
10012002Q21211
10012002Q31211
10012002Q41211
10022006Q165.
10022006Q265.
10022006Q365.
10022006Q465.
10022007Q17065
10022007Q27065
10022007Q37065
10022007Q47065

 

Any idea how to do that? Thank you very much

1 ACCEPTED SOLUTION

Accepted Solutions
Reeza
Super User

NVM, you're just lagging which is easier. 

 

data want;
set sashelp.stocks;

by stock;

lagged=lag4(open);

if first.stock then counter=1;
else counter+1;

if counter<=4 then lagged=.;

run;

View solution in original post

12 REPLIES 12
Reeza
Super User

PROC EXPAND

trungcva112
Obsidian | Level 7

Could you be more specific?Thanks

novinosrin
Tourmaline | Level 20

data have;
input (GVKEY Quarter Meancost) ($);
datalines;
1001 2001Q1 11 .
1001 2001Q2 11 .
1001 2001Q3 11 .
1001 2001Q4 11 .
1001 2002Q1 12 11
1001 2002Q2 12 11
1001 2002Q3 12 11
1001 2002Q4 12 11
1002 2006Q1 65 .
1002 2006Q2 65 .
1002 2006Q3 65 .
1002 2006Q4 65 .
1002 2007Q1 70 65
1002 2007Q2 70 65
1002 2007Q3 70 65
1002 2007Q4 70 65
run;

data want;
set have;
by gvkey;
if first.gvkey then call missing(_t,_k);
retain _k _t;
if lag(meancost) ne meancost then _t+1;
if mod(_t,2) ne 0 then _k=meancost;
else lagmeancost=_k ;
drop _:;
run;

trungcva112
Obsidian | Level 7

I tried your code but it gives me missing value of lagmeancost where it should have a value.

The output of this code is like this. This code does not give me lagmeancost of the year 2003. Basically, it gives lagmeancost of 1 year, then missing value for the next year, and then gives value for the year after that ....

GVKEYQuarterMeancostlagmeancost
10012001Q111.
10012001Q211.
10012001Q311.
10012001Q411.
10012002Q11211
10012002Q21211
10012002Q31211
10012002Q41211
10012003Q113.
10012003Q213.
10012003Q313.
10012003Q413.
10012004Q11413
10012004Q21413
10012004Q31413
10012004Q41413
….….….….
novinosrin
Tourmaline | Level 20

Your output sample in your original post skips , so i coded accordingly. Do you want lag values for all quarters?

trungcva112
Obsidian | Level 7

Yes, I would like to compute lagmeancost for all quarter, except the first 4 quarters of each GVKEY (firmid) because obviously there is no lag year for the first 4 quarters.

 

The dataset I posted is just a small sample, my full data is like this:

GVKEYQuarterMeancostlagmeancost
10012001Q111.
10012001Q211.
10012001Q311.
10012001Q411.
10012002Q11211
10012002Q21211
10012002Q31211
10012002Q41211
1001…….…….…….
1001…….…….…….
10022006Q365.
10022006Q465.
10022007Q170.
10022007Q270.
10022007Q37065
10022007Q47065
10022008Q17570
10022008Q27570
10022008Q37570
10022008Q47570
1002…….…….…….
1002…….…….…….
mkeintz
PROC Star

You want trailing 4-quarter moving averages, where the window moves once per year instead of once per quarter:

 

data want;
  set have;
  by gvkey;
  retain lagmeancost;

  lagmeancost=ifn(mod(_n_,4)=1,mean(lag1(cost),lag2(cost),lag3(cost),lag4(cost)),lagmeancost);
  if first.gvkey then lagmeancost=.;
run;

 

The above  assumes that every gvkey has exactly 4 records per year.

 

If that's not the case, then instead of looking for every 4th record you need to determine if the current record belongs to a different year than the prior record. Assuming your QUARTER variable is actually a SAS date variable, you could:

 

data want;
  set have;
  by gvkey;
  retain lagmeancost;

  array c{4} _temporary_;

  if intck('year',lag(quarter),quarter)=1 then do;
    lagmeancost=mean(of c{*});
    call missing(of c{*});
  end;
  if first.gvkey then call missing(lagmeancost,of c{*});
  c{qtr(quarter)}=cost;
run;

 

 

 

 

--------------------------
The hash OUTPUT method will overwrite a SAS data set, but not append. That can be costly. Consider voting for Add a HASH object method which would append a hash object to an existing SAS data set

Would enabling PROC SORT to simultaneously output multiple datasets be useful? Then vote for
Allow PROC SORT to output multiple datasets

--------------------------
novinosrin
Tourmaline | Level 20

Please accept my apologies, I think this is what you are after:

 

data want;
set have;
by gvkey;
if first.gvkey then call missing(_t,lagmeancost);
retain _t  lagmeancost;
if lag(meancost) ne meancost then

do;
lagmeancost=_t;

_t=meancost;
end;
drop _:;
run;

 

EDIT: Typo edited

Reeza
Super User

 

 

RTM

http://documentation.sas.com/?docsetId=etsug&docsetTarget=etsug_expand_examples04.htm&docsetVersion=...

 

http://support.sas.com/resources/papers/proceedings10/093-2010.pdf

 

Try MOVAVE3 TRIMLEFT3

 

Otherwise there's the good old temporary array trick, I've left more in than I usually would so you can see the flow of data. For comparison, PROC EXPAND would be 4 lines of code. 

 

 

data have;
    set sashelp.stocks;
    by stock;
    retain counter p0-p3;
    array p(0:3) p0-p3;

    if first.stock then
        do;
            call missing(of p(*));
            counter=1;
        end;
    else
        counter=counter+1;
    p{mod(counter, 4)}=open;

    if counter>4 then
        moving_avg4=mean(of p(*));
    keep stock date open moving_: p: counter;
run;

 


@trungcva112 wrote:

I tried your code but it gives me missing value of lagmeancost where it should have a value.

The output of this code is like this. This code does not give me lagmeancost of the year 2003. Basically, it gives lagmeancost of 1 year, then missing value for the next year, and then gives value for the year after that ....

GVKEY Quarter Meancost lagmeancost
1001 2001Q1 11 .
1001 2001Q2 11 .
1001 2001Q3 11 .
1001 2001Q4 11 .
1001 2002Q1 12 11
1001 2002Q2 12 11
1001 2002Q3 12 11
1001 2002Q4 12 11
1001 2003Q1 13 .
1001 2003Q2 13 .
1001 2003Q3 13 .
1001 2003Q4 13 .
1001 2004Q1 14 13
1001 2004Q2 14 13
1001 2004Q3 14 13
1001 2004Q4 14 13
…. …. …. ….

 

 

Reeza
Super User

NVM, you're just lagging which is easier. 

 

data want;
set sashelp.stocks;

by stock;

lagged=lag4(open);

if first.stock then counter=1;
else counter+1;

if counter<=4 then lagged=.;

run;
Ksharp
Super User
data have;
input GVKEY Quarter : yyq6. Meancost;
format Quarter   yyq6.;
datalines;
1001 2001Q1 11 .
1001 2001Q2 11 .
1001 2001Q3 11 .
1001 2001Q4 11 .
1001 2002Q1 12 11
1001 2002Q2 12 11
1001 2002Q3 12 11
1001 2002Q4 12 11
1002 2006Q1 65 .
1002 2006Q2 65 .
1002 2006Q3 65 .
1002 2006Q4 65 .
1002 2007Q1 70 65
1002 2007Q2 70 65
1002 2007Q3 70 65
1002 2007Q4 70 65
;
run;
data temp; 
 set have(rename=(Quarter=_Quarter));
 Quarter=intnx('qtr',_Quarter,4);
format Quarter   yyq6.;
drop _Quarter;
run;

data want;
 merge have(in=ina) temp(keep=GVKEY Quarter Meancost 
 rename=( Meancost=_Meancost));
 by GVKEY Quarter;
 if ina;
run;
trungcva112
Obsidian | Level 7
Thanks everyone. Reeza's and Ksharp's code work

sas-innovate-2024.png

Join us for SAS Innovate April 16-19 at the Aria in Las Vegas. Bring the team and save big with our group pricing for a limited time only.

Pre-conference courses and tutorials are filling up fast and are always a sellout. Register today to reserve your seat.

 

Register now!

How to Concatenate Values

Learn how use the CAT functions in SAS to join values from multiple variables into a single value.

Find more tutorials on the SAS Users YouTube channel.

Click image to register for webinarClick image to register for webinar

Classroom Training Available!

Select SAS Training centers are offering in-person courses. View upcoming courses for:

View all other training opportunities.

Discussion stats
  • 12 replies
  • 1211 views
  • 1 like
  • 5 in conversation