BookmarkSubscribeRSS Feed
kimx0961
Obsidian | Level 7
Dear PGStats,
I mistakenly set the condition that at least three observations available for variable 'rds'. Condition should be that rds5y(average varible) should be available for at least continuous three years per company(gvkey). How can I change the below program?
Thank you very much
 
 
data have;
input gvkey fyear rds;
datalines;
0001    1980   0.05
0001    1981   0.02
0001    1982   0.03
0001    1983   0.04
0001    1984   0.04
0001    1985   0.05
0001    1986   0.06
;

proc sql;
create table want as
select 
    a.gvkey, 
    a.fyear, 
    a.rds, 
    case 
        when count(b.rds) >= 3 then mean(b.rds) 
        else . 
        end as rds5y
from 
    have as a left join
    have as b 
        on  a.gvkey=b.gvkey and 
            b.fyear between a.fyear-5 and a.fyear-1
group by a.gvkey, a.fyear, a.rds;
select * from want;
quit;
 
 
7 REPLIES 7
PGStats
Opal | Level 21

So what should be the result for the above example?

PG
kimx0961
Obsidian | Level 7

The following line should be deleted without affecting other program lines and new program line is needed to put the condition that at least continuous three rds5y(averge) should be avaiable per company(gvkey).

when count(b.rds) >= 3 then mean(b.rds)

Thank you

PGStats
Opal | Level 21

The above code returns table want as

                          gvkey     fyear       rds     rds5y
                              1      1980      0.05         .
                              1      1981      0.02         .
                              1      1982      0.03         .
                              1      1983      0.04  0.033333
                              1      1984      0.04     0.035
                              1      1985      0.05     0.036
                              1      1986      0.06     0.036

what would you want instead?

PG
kimx0961
Obsidian | Level 7

                     The following observations should be dropped because the number of observations for rds5y is less than 3(2 obs).

                          gvkey     fyear       rds     rds5y
                              1      1980      0.05         .
                              1      1981      0.02         .
                              1      1982      0.03         .
                              1      1983      0.04         .
                              1      1984      0.04         .    
                              1      1985      0.05     0.036
                                       1986        .         0.036

 

 

                        The following observations should remain. 

 

 

                        gvkey     fyear       rds     rds5y
                              1      1980      0.05         .
                              1      1981      0.02         .
                              1      1982      0.03         .
                              1      1983      0.04         .
                              1      1984      0.04         .    
                              1      1985      0.05     0.036
                              1      1986      0.06     0.036

                              1      1987      0.03      0.044

kimx0961
Obsidian | Level 7

Hi,

Could you please advise the program that just removes the following condition? Thank you

 

when count(b.rds) >= 3 then mean(b.rds)

 

 

data have;
input gvkey fyear rds;
datalines;
0001    1980   0.05
0001    1981   0.02
0001    1982   0.03
0001    1983   0.04
0001    1984   0.04
0001    1985   0.05
0001    1986   0.06
;

proc sql;
create table want as
select
    a.gvkey,
    a.fyear,
    a.rds,
    case
        when count(b.rds) >= 3 then mean(b.rds)
        else .
        end as rds5y
from
    have as a left join
    have as b
        on  a.gvkey=b.gvkey and
            b.fyear between a.fyear-5 and a.fyear-1
group by a.gvkey, a.fyear, a.rds;
select * from want;
quit;

kimx0961
Obsidian | Level 7

Hi PGStats,

 

  

Could you please advise the program that just removes the following condition? 

 

when count(b.rds) >= 3 then mean(b.rds)

 

I would like to have the following result.

                                  rds5y

0001    1980   0.05    .
0001    1981   0.02    .
0001    1982   0.03    .
0001    1983   0.04    .
0001    1984   0.04    .
0001    1985   0.05    0.036
0001    1986   0.06    0.036

 

data have;
input gvkey fyear rds;
datalines;
0001    1980   0.05
0001    1981   0.02
0001    1982   0.03
0001    1983   0.04
0001    1984   0.04
0001    1985   0.05
0001    1986   0.06
;

proc sql;
create table want as
select
    a.gvkey,
    a.fyear,
    a.rds,
    case
        when count(b.rds) >= 3 then mean(b.rds)
        else .
        end as rds5y
from
    have as a left join
    have as b
        on  a.gvkey=b.gvkey and
            b.fyear between a.fyear-5 and a.fyear-1
group by a.gvkey, a.fyear, a.rds;
select * from want;
quit;

Reeza
Super User
Its not as easy as that. If you want to count continuous years you may need to use a data step rather than proc sql. If you search in here for rolling stats there's a bunch of methods that use a temporary array that would work

sas-innovate-2024.png

Don't miss out on SAS Innovate - Register now for the FREE Livestream!

Can't make it to Vegas? No problem! Watch our general sessions LIVE or on-demand starting April 17th. Hear from SAS execs, best-selling author Adam Grant, Hot Ones host Sean Evans, top tech journalist Kara Swisher, AI expert Cassie Kozyrkov, and the mind-blowing dance crew iLuminate! Plus, get access to over 20 breakout sessions.

 

Register now!

How to Concatenate Values

Learn how use the CAT functions in SAS to join values from multiple variables into a single value.

Find more tutorials on the SAS Users YouTube channel.

Click image to register for webinarClick image to register for webinar

Classroom Training Available!

Select SAS Training centers are offering in-person courses. View upcoming courses for:

View all other training opportunities.

Discussion stats
  • 7 replies
  • 1134 views
  • 0 likes
  • 3 in conversation