
Rolling average_another question

Contributor
Posts: 26


Dear PGStats,
I mistakenly set the condition to require at least three available observations of the variable 'rds'. The condition should instead be that rds5y (the average variable) is available for at least three consecutive years per company (gvkey). How can I change the program below?
Thank you very much
 
 
data have;
input gvkey fyear rds;
datalines;
0001    1980   0.05
0001    1981   0.02
0001    1982   0.03
0001    1983   0.04
0001    1984   0.04
0001    1985   0.05
0001    1986   0.06
;

proc sql;
create table want as
select 
    a.gvkey, 
    a.fyear, 
    a.rds, 
    case 
        when count(b.rds) >= 3 then mean(b.rds) 
        else . 
        end as rds5y
from 
    have as a left join
    have as b 
        on  a.gvkey=b.gvkey and 
            b.fyear between a.fyear-5 and a.fyear-1
group by a.gvkey, a.fyear, a.rds;
select * from want;
quit;
 
 
Respected Advisor
Posts: 4,930

Re: Rolling average_another question

So what should be the result for the above example?

PG
Contributor
Posts: 26

Re: Rolling average_another question

The following line should be deleted without affecting the other program lines, and a new program line is needed to impose the condition that rds5y (the average) is available for at least three consecutive years per company (gvkey).

when count(b.rds) >= 3 then mean(b.rds)

Thank you

Respected Advisor
Posts: 4,930

Re: Rolling average_another question

The above code returns table want as

                          gvkey     fyear       rds     rds5y
                              1      1980      0.05         .
                              1      1981      0.02         .
                              1      1982      0.03         .
                              1      1983      0.04  0.033333
                              1      1984      0.04     0.035
                              1      1985      0.05     0.036
                              1      1986      0.06     0.036

What would you want instead?

PG
Contributor
Posts: 26

Re: Rolling average_another question

The following observations should be dropped because rds5y is available for fewer than 3 observations (only 2).

                          gvkey     fyear       rds     rds5y
                              1      1980      0.05         .
                              1      1981      0.02         .
                              1      1982      0.03         .
                              1      1983      0.04         .
                              1      1984      0.04         .
                              1      1985      0.05     0.036
                                     1986         .     0.036

The following observations should remain.

                          gvkey     fyear       rds     rds5y
                              1      1980      0.05         .
                              1      1981      0.02         .
                              1      1982      0.03         .
                              1      1983      0.04         .
                              1      1984      0.04         .
                              1      1985      0.05     0.036
                              1      1986      0.06     0.036
                              1      1987      0.03     0.044

Contributor
Posts: 26

Re: Rolling average_another question

Hi,

Could you please advise how to change the program so that it just removes the following condition? Thank you

 

when count(b.rds) >= 3 then mean(b.rds)

 

 


Contributor
Posts: 26

Re: Rolling average_another question

Hi PGStats,

Could you please advise how to change the program so that it just removes the following condition?

when count(b.rds) >= 3 then mean(b.rds)

I would like to have the following result.

gvkey   fyear  rds     rds5y
0001    1980   0.05    .
0001    1981   0.02    .
0001    1982   0.03    .
0001    1983   0.04    .
0001    1984   0.04    .
0001    1985   0.05    0.036
0001    1986   0.06    0.036

 

data have;
input gvkey fyear rds;
datalines;
0001    1980   0.05
0001    1981   0.02
0001    1982   0.03
0001    1983   0.04
0001    1984   0.04
0001    1985   0.05
0001    1986   0.06
;

proc sql;
create table want as
select
    a.gvkey,
    a.fyear,
    a.rds,
    case
        when count(b.rds) >= 3 then mean(b.rds)
        else .
        end as rds5y
from
    have as a left join
    have as b
        on  a.gvkey=b.gvkey and
            b.fyear between a.fyear-5 and a.fyear-1
group by a.gvkey, a.fyear, a.rds;
select * from want;
quit;

Super User
Posts: 19,855

Re: Rolling average_another question

It's not as easy as that. If you want to count consecutive years, you may need to use a DATA step rather than PROC SQL. If you search this forum for rolling statistics, you will find a number of methods that use a temporary array and would work here.
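
As a rough illustration of the DATA step idea, here is a minimal sketch rather than a definitive answer. It rests on two assumptions read off the desired output posted above: rds5y should be set only when all five prior years are available (so the test becomes count(b.rds) = 5), and a company is kept only if rds5y is non-missing for at least three consecutive fiscal years. The names avg5, runs, streak, maxrun and prev_year are made up for this example. It keeps the PROC SQL self-join for the average and uses a plain retained counter, not a temporary array, to count consecutive years.

```sas
/* Step 1: trailing 5-year mean, set only when all five prior years exist
   (assumption inferred from the desired output in this thread) */
proc sql;
create table avg5 as
select
    a.gvkey,
    a.fyear,
    a.rds,
    case
        when count(b.rds) = 5 then mean(b.rds)
        else .
        end as rds5y
from
    have as a left join
    have as b
        on  a.gvkey=b.gvkey and
            b.fyear between a.fyear-5 and a.fyear-1
group by a.gvkey, a.fyear, a.rds;
quit;

/* BY-group processing below needs the rows in gvkey/fyear order */
proc sort data=avg5;
    by gvkey fyear;
run;

/* Step 2: per company, longest streak of consecutive fiscal years
   with a non-missing rds5y (one output row per gvkey) */
data runs;
    set avg5;
    by gvkey;
    retain streak maxrun prev_year;
    if first.gvkey then do;
        streak = 0;
        maxrun = 0;
        prev_year = .;
    end;
    if missing(rds5y) then streak = 0;
    else if streak > 0 and fyear = prev_year + 1 then streak = streak + 1;
    else streak = 1;
    maxrun = max(maxrun, streak);
    prev_year = fyear;
    if last.gvkey then output;
    keep gvkey maxrun;
run;

/* Step 3: keep only companies whose longest streak is at least 3 years */
data want;
    merge avg5 runs;
    by gvkey;
    if maxrun >= 3;
    drop maxrun;
run;
```

With the seven-year sample above, rds5y would be non-missing only for 1985 and 1986, so the company would be dropped. Adding the 1987 row would give a streak of three and the company would remain, which seems to match the two tables posted earlier. If the intent is different (for example, dropping only the short streaks rather than the whole company), Step 3 is the part to adjust.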