ignore first n observations by group

Accepted Solution Solved
Reply
Contributor
Posts: 42
Accepted Solution

ignore first n observations by group

Hi everyone,

 

I have a dataset with variable "meancost". Now I want to create new variable "lagmeancost" which equal to meancost in previous 4 quarters (4 lag). For example, the lagmeancost of GVKEY (firmid) 1001 in 2002Q1 = meancost in 2001Q1 (=11)

 

The problem is that I have to ignore the first 4 observations in each group to avoid the lagmeancost to be contaminated by data of other company.

 

The output data should look like this

GVKEYQuarterMeancostlagmeancost
10012001Q111.
10012001Q211.
10012001Q311.
10012001Q411.
10012002Q11211
10012002Q21211
10012002Q31211
10012002Q41211
10022006Q165.
10022006Q265.
10022006Q365.
10022006Q465.
10022007Q17065
10022007Q27065
10022007Q37065
10022007Q47065

 

Any idea how to do that? Thank you very much


Accepted Solutions
Solution
‎01-06-2018 08:42 PM
Super User
Posts: 23,343

Re: ignore first n observations by group

NVM, you're just lagging which is easier. 

 

data want;
set sashelp.stocks;

by stock;

lagged=lag4(open);

if first.stock then counter=1;
else counter+1;

if counter<=4 then lagged=.;

run;

View solution in original post


All Replies
Super User
Posts: 23,343

Re: ignore first n observations by group

Posted in reply to trungcva112

PROC EXPAND

Contributor
Posts: 42

Re: ignore first n observations by group

Could you be more specific?Thanks

PROC Star
Posts: 1,603

Re: ignore first n observations by group

Posted in reply to trungcva112

data have;
input (GVKEY Quarter Meancost) ($);
datalines;
1001 2001Q1 11 .
1001 2001Q2 11 .
1001 2001Q3 11 .
1001 2001Q4 11 .
1001 2002Q1 12 11
1001 2002Q2 12 11
1001 2002Q3 12 11
1001 2002Q4 12 11
1002 2006Q1 65 .
1002 2006Q2 65 .
1002 2006Q3 65 .
1002 2006Q4 65 .
1002 2007Q1 70 65
1002 2007Q2 70 65
1002 2007Q3 70 65
1002 2007Q4 70 65
run;

data want;
set have;
by gvkey;
if first.gvkey then call missing(_t,_k);
retain _k _t;
if lag(meancost) ne meancost then _t+1;
if mod(_t,2) ne 0 then _k=meancost;
else lagmeancost=_k ;
drop _:;
run;

Contributor
Posts: 42

Re: ignore first n observations by group

Posted in reply to novinosrin

I tried your code but it gives me missing value of lagmeancost where it should have a value.

The output of this code is like this. This code does not give me lagmeancost of the year 2003. Basically, it gives lagmeancost of 1 year, then missing value for the next year, and then gives value for the year after that ....

GVKEYQuarterMeancostlagmeancost
10012001Q111.
10012001Q211.
10012001Q311.
10012001Q411.
10012002Q11211
10012002Q21211
10012002Q31211
10012002Q41211
10012003Q113.
10012003Q213.
10012003Q313.
10012003Q413.
10012004Q11413
10012004Q21413
10012004Q31413
10012004Q41413
….….….….
PROC Star
Posts: 1,603

Re: ignore first n observations by group

Posted in reply to trungcva112

Your output sample in your original post skips , so i coded accordingly. Do you want lag values for all quarters?

Contributor
Posts: 42

Re: ignore first n observations by group

Posted in reply to novinosrin

Yes, I would like to compute lagmeancost for all quarter, except the first 4 quarters of each GVKEY (firmid) because obviously there is no lag year for the first 4 quarters.

 

The dataset I posted is just a small sample, my full data is like this:

GVKEYQuarterMeancostlagmeancost
10012001Q111.
10012001Q211.
10012001Q311.
10012001Q411.
10012002Q11211
10012002Q21211
10012002Q31211
10012002Q41211
1001…….…….…….
1001…….…….…….
10022006Q365.
10022006Q465.
10022007Q170.
10022007Q270.
10022007Q37065
10022007Q47065
10022008Q17570
10022008Q27570
10022008Q37570
10022008Q47570
1002…….…….…….
1002…….…….…….
Trusted Advisor
Posts: 1,312

Re: ignore first n observations by group

Posted in reply to trungcva112

You want trailing 4-quarter moving averages, where the window moves once per year instead of once per quarter:

 

data want;
  set have;
  by gvkey;
  retain lagmeancost;

  lagmeancost=ifn(mod(_n_,4)=1,mean(lag1(cost),lag2(cost),lag3(cost),lag4(cost)),lagmeancost);
  if first.gvkey then lagmeancost=.;
run;

 

The above  assumes that every gvkey has exactly 4 records per year.

 

If that's not the case, then instead of looking for every 4th record you need to determine if the current record belongs to a different year than the prior record. Assuming your QUARTER variable is actually a SAS date variable, you could:

 

data want;
  set have;
  by gvkey;
  retain lagmeancost;

  array c{4} _temporary_;

  if intck('year',lag(quarter),quarter)=1 then do;
    lagmeancost=mean(of c{*});
    call missing(of c{*});
  end;
  if first.gvkey then call missing(lagmeancost,of c{*});
  c{qtr(quarter)}=cost;
run;

 

 

 

 

PROC Star
Posts: 1,603

Re: ignore first n observations by group

[ Edited ]
Posted in reply to trungcva112

Please accept my apologies, I think this is what you are after:

 

data want;
set have;
by gvkey;
if first.gvkey then call missing(_t,lagmeancost);
retain _t  lagmeancost;
if lag(meancost) ne meancost then

do;
lagmeancost=_t;

_t=meancost;
end;
drop _:;
run;

 

EDIT: Typo edited

Super User
Posts: 23,343

Re: ignore first n observations by group

Posted in reply to trungcva112

 

 

RTM

http://documentation.sas.com/?docsetId=etsug&docsetTarget=etsug_expand_examples04.htm&docsetVersion=...

 

http://support.sas.com/resources/papers/proceedings10/093-2010.pdf

 

Try MOVAVE3 TRIMLEFT3

 

Otherwise there's the good old temporary array trick, I've left more in than I usually would so you can see the flow of data. For comparison, PROC EXPAND would be 4 lines of code. 

 

 

data have;
    set sashelp.stocks;
    by stock;
    retain counter p0-p3;
    array p(0:3) p0-p3;

    if first.stock then
        do;
            call missing(of p(*));
            counter=1;
        end;
    else
        counter=counter+1;
    p{mod(counter, 4)}=open;

    if counter>4 then
        moving_avg4=mean(of p(*));
    keep stock date open moving_: p: counter;
run;

 


trungcva112 wrote:

I tried your code but it gives me missing value of lagmeancost where it should have a value.

The output of this code is like this. This code does not give me lagmeancost of the year 2003. Basically, it gives lagmeancost of 1 year, then missing value for the next year, and then gives value for the year after that ....

GVKEY Quarter Meancost lagmeancost
1001 2001Q1 11 .
1001 2001Q2 11 .
1001 2001Q3 11 .
1001 2001Q4 11 .
1001 2002Q1 12 11
1001 2002Q2 12 11
1001 2002Q3 12 11
1001 2002Q4 12 11
1001 2003Q1 13 .
1001 2003Q2 13 .
1001 2003Q3 13 .
1001 2003Q4 13 .
1001 2004Q1 14 13
1001 2004Q2 14 13
1001 2004Q3 14 13
1001 2004Q4 14 13
…. …. …. ….

 

 

Solution
‎01-06-2018 08:42 PM
Super User
Posts: 23,343

Re: ignore first n observations by group

NVM, you're just lagging which is easier. 

 

data want;
set sashelp.stocks;

by stock;

lagged=lag4(open);

if first.stock then counter=1;
else counter+1;

if counter<=4 then lagged=.;

run;
Super User
Posts: 10,698

Re: ignore first n observations by group

Posted in reply to trungcva112
data have;
input GVKEY Quarter : yyq6. Meancost;
format Quarter   yyq6.;
datalines;
1001 2001Q1 11 .
1001 2001Q2 11 .
1001 2001Q3 11 .
1001 2001Q4 11 .
1001 2002Q1 12 11
1001 2002Q2 12 11
1001 2002Q3 12 11
1001 2002Q4 12 11
1002 2006Q1 65 .
1002 2006Q2 65 .
1002 2006Q3 65 .
1002 2006Q4 65 .
1002 2007Q1 70 65
1002 2007Q2 70 65
1002 2007Q3 70 65
1002 2007Q4 70 65
;
run;
data temp; 
 set have(rename=(Quarter=_Quarter));
 Quarter=intnx('qtr',_Quarter,4);
format Quarter   yyq6.;
drop _Quarter;
run;

data want;
 merge have(in=ina) temp(keep=GVKEY Quarter Meancost 
 rename=( Meancost=_Meancost));
 by GVKEY Quarter;
 if ina;
run;
Contributor
Posts: 42

Re: ignore first n observations by group

Thanks everyone. Reeza's and Ksharp's code work
☑ This topic is solved.

Need further help from the community? Please ask a new question.

Discussion stats
  • 12 replies
  • 344 views
  • 1 like
  • 5 in conversation