Solved
Contributor
Posts: 42

# ignore first n observations by group

Hi everyone,

I have a dataset with variable "meancost". Now I want to create new variable "lagmeancost" which equal to meancost in previous 4 quarters (4 lag). For example, the lagmeancost of GVKEY (firmid) 1001 in 2002Q1 = meancost in 2001Q1 (=11)

The problem is that I have to ignore the first 4 observations in each group to avoid the lagmeancost to be contaminated by data of other company.

The output data should look like this

 GVKEY Quarter Meancost lagmeancost 1001 2001Q1 11 . 1001 2001Q2 11 . 1001 2001Q3 11 . 1001 2001Q4 11 . 1001 2002Q1 12 11 1001 2002Q2 12 11 1001 2002Q3 12 11 1001 2002Q4 12 11 1002 2006Q1 65 . 1002 2006Q2 65 . 1002 2006Q3 65 . 1002 2006Q4 65 . 1002 2007Q1 70 65 1002 2007Q2 70 65 1002 2007Q3 70 65 1002 2007Q4 70 65

Any idea how to do that? Thank you very much

Accepted Solutions
Solution
‎01-06-2018 08:42 PM
Super User
Posts: 23,343

## Re: ignore first n observations by group

NVM, you're just lagging which is easier.

``````data want;
set sashelp.stocks;

by stock;

lagged=lag4(open);

if first.stock then counter=1;
else counter+1;

if counter<=4 then lagged=.;

run;``````

All Replies
Super User
Posts: 23,343

PROC EXPAND

Contributor
Posts: 42

## Re: ignore first n observations by group

Could you be more specific?Thanks

PROC Star
Posts: 1,603

## Re: ignore first n observations by group

data have;
input (GVKEY Quarter Meancost) (\$);
datalines;
1001 2001Q1 11 .
1001 2001Q2 11 .
1001 2001Q3 11 .
1001 2001Q4 11 .
1001 2002Q1 12 11
1001 2002Q2 12 11
1001 2002Q3 12 11
1001 2002Q4 12 11
1002 2006Q1 65 .
1002 2006Q2 65 .
1002 2006Q3 65 .
1002 2006Q4 65 .
1002 2007Q1 70 65
1002 2007Q2 70 65
1002 2007Q3 70 65
1002 2007Q4 70 65
run;

data want;
set have;
by gvkey;
if first.gvkey then call missing(_t,_k);
retain _k _t;
if lag(meancost) ne meancost then _t+1;
if mod(_t,2) ne 0 then _k=meancost;
else lagmeancost=_k ;
drop _:;
run;

Contributor
Posts: 42

## Re: ignore first n observations by group

I tried your code but it gives me missing value of lagmeancost where it should have a value.

The output of this code is like this. This code does not give me lagmeancost of the year 2003. Basically, it gives lagmeancost of 1 year, then missing value for the next year, and then gives value for the year after that ....

 GVKEY Quarter Meancost lagmeancost 1001 2001Q1 11 . 1001 2001Q2 11 . 1001 2001Q3 11 . 1001 2001Q4 11 . 1001 2002Q1 12 11 1001 2002Q2 12 11 1001 2002Q3 12 11 1001 2002Q4 12 11 1001 2003Q1 13 . 1001 2003Q2 13 . 1001 2003Q3 13 . 1001 2003Q4 13 . 1001 2004Q1 14 13 1001 2004Q2 14 13 1001 2004Q3 14 13 1001 2004Q4 14 13 …. …. …. ….
PROC Star
Posts: 1,603

## Re: ignore first n observations by group

Your output sample in your original post skips , so i coded accordingly. Do you want lag values for all quarters?

Contributor
Posts: 42

## Re: ignore first n observations by group

Yes, I would like to compute lagmeancost for all quarter, except the first 4 quarters of each GVKEY (firmid) because obviously there is no lag year for the first 4 quarters.

The dataset I posted is just a small sample, my full data is like this:

 GVKEY Quarter Meancost lagmeancost 1001 2001Q1 11 . 1001 2001Q2 11 . 1001 2001Q3 11 . 1001 2001Q4 11 . 1001 2002Q1 12 11 1001 2002Q2 12 11 1001 2002Q3 12 11 1001 2002Q4 12 11 1001 ……. ……. ……. 1001 ……. ……. ……. 1002 2006Q3 65 . 1002 2006Q4 65 . 1002 2007Q1 70 . 1002 2007Q2 70 . 1002 2007Q3 70 65 1002 2007Q4 70 65 1002 2008Q1 75 70 1002 2008Q2 75 70 1002 2008Q3 75 70 1002 2008Q4 75 70 1002 ……. ……. ……. 1002 ……. ……. …….
Posts: 1,312

## Re: ignore first n observations by group

You want trailing 4-quarter moving averages, where the window moves once per year instead of once per quarter:

``````data want;
set have;
by gvkey;
retain lagmeancost;

lagmeancost=ifn(mod(_n_,4)=1,mean(lag1(cost),lag2(cost),lag3(cost),lag4(cost)),lagmeancost);
if first.gvkey then lagmeancost=.;
run;
``````

The above  assumes that every gvkey has exactly 4 records per year.

If that's not the case, then instead of looking for every 4th record you need to determine if the current record belongs to a different year than the prior record. Assuming your QUARTER variable is actually a SAS date variable, you could:

``````data want;
set have;
by gvkey;
retain lagmeancost;

array c{4} _temporary_;

if intck('year',lag(quarter),quarter)=1 then do;
lagmeancost=mean(of c{*});
call missing(of c{*});
end;
if first.gvkey then call missing(lagmeancost,of c{*});
c{qtr(quarter)}=cost;
run;
``````

PROC Star
Posts: 1,603

## Re: ignore first n observations by group

[ Edited ]

Please accept my apologies, I think this is what you are after:

data want;
set have;
by gvkey;
if first.gvkey then call missing(_t,lagmeancost);
retain _t  lagmeancost;
if lag(meancost) ne meancost then

do;
lagmeancost=_t;

_t=meancost;
end;
drop _:;
run;

EDIT: Typo edited

Super User
Posts: 23,343

## Re: ignore first n observations by group

RTM

http://documentation.sas.com/?docsetId=etsug&docsetTarget=etsug_expand_examples04.htm&docsetVersion=...

http://support.sas.com/resources/papers/proceedings10/093-2010.pdf

Try MOVAVE3 TRIMLEFT3

Otherwise there's the good old temporary array trick, I've left more in than I usually would so you can see the flow of data. For comparison, PROC EXPAND would be 4 lines of code.

``````data have;
set sashelp.stocks;
by stock;
retain counter p0-p3;
array p(0:3) p0-p3;

if first.stock then
do;
call missing(of p(*));
counter=1;
end;
else
counter=counter+1;
p{mod(counter, 4)}=open;

if counter>4 then
moving_avg4=mean(of p(*));
keep stock date open moving_: p: counter;
run;``````

trungcva112 wrote:

I tried your code but it gives me missing value of lagmeancost where it should have a value.

The output of this code is like this. This code does not give me lagmeancost of the year 2003. Basically, it gives lagmeancost of 1 year, then missing value for the next year, and then gives value for the year after that ....

 GVKEY Quarter Meancost lagmeancost 1001 2001Q1 11 . 1001 2001Q2 11 . 1001 2001Q3 11 . 1001 2001Q4 11 . 1001 2002Q1 12 11 1001 2002Q2 12 11 1001 2002Q3 12 11 1001 2002Q4 12 11 1001 2003Q1 13 . 1001 2003Q2 13 . 1001 2003Q3 13 . 1001 2003Q4 13 . 1001 2004Q1 14 13 1001 2004Q2 14 13 1001 2004Q3 14 13 1001 2004Q4 14 13 …. …. …. ….

Solution
‎01-06-2018 08:42 PM
Super User
Posts: 23,343

## Re: ignore first n observations by group

NVM, you're just lagging which is easier.

``````data want;
set sashelp.stocks;

by stock;

lagged=lag4(open);

if first.stock then counter=1;
else counter+1;

if counter<=4 then lagged=.;

run;``````
Super User
Posts: 10,698

## Re: ignore first n observations by group

``````data have;
input GVKEY Quarter : yyq6. Meancost;
format Quarter   yyq6.;
datalines;
1001 2001Q1 11 .
1001 2001Q2 11 .
1001 2001Q3 11 .
1001 2001Q4 11 .
1001 2002Q1 12 11
1001 2002Q2 12 11
1001 2002Q3 12 11
1001 2002Q4 12 11
1002 2006Q1 65 .
1002 2006Q2 65 .
1002 2006Q3 65 .
1002 2006Q4 65 .
1002 2007Q1 70 65
1002 2007Q2 70 65
1002 2007Q3 70 65
1002 2007Q4 70 65
;
run;
data temp;
set have(rename=(Quarter=_Quarter));
Quarter=intnx('qtr',_Quarter,4);
format Quarter   yyq6.;
drop _Quarter;
run;

data want;
merge have(in=ina) temp(keep=GVKEY Quarter Meancost
rename=( Meancost=_Meancost));
by GVKEY Quarter;
if ina;
run;``````
Contributor
Posts: 42

## Re: ignore first n observations by group

Thanks everyone. Reeza's and Ksharp's code work
☑ This topic is solved.

Discussion stats
• 12 replies
• 344 views
• 1 like
• 5 in conversation