BookmarkSubscribeRSS Feed
🔒 This topic is solved and locked. Need further help from the community? Please sign in and ask a new question.
logann2
Fluorite | Level 6

The relevant column in my dataset are "gvkey" (a group id), datadate (ex. YYYYMMDD = 20100531), and revt.

 

The observation frequency is annual.

I want to to construct a rolling max of "revt" for the PREVIOUS 4 years for some "gvkey".

I don't want to hard code a window size of 5 because this would not handle missing values appropriately.

 

I'm very new the SAS so I'm not sure how this should be handled but I think SQL is the way to go.

 

For sake of clarity, I need something that's a little like

```data['5yr_revt_max'= data.groupby("gvkey")['revt'].shift(1).rolling(5).max()```

Except this doesn't account for the years and is in Python

1 ACCEPTED SOLUTION

Accepted Solutions
PGStats
Opal | Level 21

Then, assuming there are never more than one revt value per year for a given gvkey, you would want something like this:

 

proc sql;
create table want as
select
	a.gvkey,
	a.datadate,
	max(b.revt) as maxRevt
from
	have as a inner join
	have as b on a.gvkey=b.gvkey and 
		b.datadate between intnx("YEAR", a.datadate, -4) and a.datadate
group by a.gvkey,	a.datadate;
quit;

(untested)

PG

View solution in original post

12 REPLIES 12
PGStats
Opal | Level 21

So how do you want to handle missing years for a given gvkey? Return a missing max or the max of the non-missing years?

PG
logann2
Fluorite | Level 6

max from the set of remaining valid years. does that make sense?

PGStats
Opal | Level 21

Then, assuming there are never more than one revt value per year for a given gvkey, you would want something like this:

 

proc sql;
create table want as
select
	a.gvkey,
	a.datadate,
	max(b.revt) as maxRevt
from
	have as a inner join
	have as b on a.gvkey=b.gvkey and 
		b.datadate between intnx("YEAR", a.datadate, -4) and a.datadate
group by a.gvkey,	a.datadate;
quit;

(untested)

PG
logann2
Fluorite | Level 6

So I'd like to only look at the previous 5 years and exclude the current observation.

 

I thought that something like this might work 

 

```

b.datadate between intnx("YEAR", a.datadate, -4) and intnx("YEAR", a.datadate, -1)

````

 

^^^ All I did was modify a.datadate to intnx("YEAR", a.datadate, -1) which it doesn't like... 

mkeintz
PROC Star

What do you mean by "doesn't like".  Does SAS object, or do you get unexpected results.  If the latter (which I suspect), provide an example of unexpected results vs the data from which they were derived.

 

 

--------------------------
The hash OUTPUT method will overwrite a SAS data set, but not append. That can be costly. Consider voting for Add a HASH object method which would append a hash object to an existing SAS data set

Would enabling PROC SORT to simultaneously output multiple datasets be useful? Then vote for
Allow PROC SORT to output multiple datasets

--------------------------
logann2
Fluorite | Level 6
My table ends up empty, maybe because current date can never equal yesterday's date?
mkeintz
PROC Star

Be aware that the expression

b.datadate between intnx("YEAR", a.datadate, -4) and intnx("YEAR", a.datadate, -1)

generates time limits of 01jan 4 calendar years before and 01jan of last year.  INTNX has to be told to align things, otherwise it defaults to the start of the resulting calendar year.   This pushes back the upper limit of your time range.  So to use this properly, you should use the "S" (for "same") parameter, as in:

 

b.datadate between intnx("YEAR", a.datadate, -4,'s') and intnx("YEAR", a.datadate, -1,'s')

Of course this changes the lower limit of your time range too, but maybe it didn't have any impact.

 

FINALLY, just don't bother with using INTNX for the upper range limit.  Just use

b.datadate between intnx("YEAR", a.datadate, -4) and datadate-1;

which will produce the same results (as long as you don't have two records in less than 365 (or 366) days), and will avoid unnecessary use of the INTNX function.

--------------------------
The hash OUTPUT method will overwrite a SAS data set, but not append. That can be costly. Consider voting for Add a HASH object method which would append a hash object to an existing SAS data set

Would enabling PROC SORT to simultaneously output multiple datasets be useful? Then vote for
Allow PROC SORT to output multiple datasets

--------------------------
logann2
Fluorite | Level 6

The last option is the only one that worked. The ones with the `ntnx` command produced 0 observation tables. Why would that be?

 

Also for the last one, how does it know to subtract from the years? As far as I understand it views 20100531 as a big integer

Proc sql;
	create table final as
	select
		a.gvkey,
		a.datadate,
		max(b.revt) as maxRevt
	from
		data as a inner join
		data as b on a.gvkey=b.gvkey and
			b.datadate between intnx("YEAR", a.datadate, -4) and a.datadate-1
	group by a.gvkey, a.datadate;
quit;

 

PGStats
Opal | Level 21

It should return something... The problem may be elsewhere, please post more of your code.

PG
logann2
Fluorite | Level 6
Proc sql;
	create table final as
	select
		a.gvkey,
		a.datadate,
		max(b.revt) as maxRevt
	from
		data as a inner join
		data as b on a.gvkey=b.gvkey and
			b.datadate between intnx("YEAR", a.datadate, -4) and intnx("YEAR", a.datadate, -1)
	group by a.gvkey, a.datadate;
quit;

Figured out how to insert code haha.

PGStats
Opal | Level 21

If you want to avoid the intricacies of intnx and intck functions, you could simply do:

 

Proc sql;
	create table final as
	select
		a.gvkey,
		a.datadate,
		max(b.revt) as maxRevt
	from
		data as a inner join
		data as b on a.gvkey=b.gvkey and
			year(b.datadate) between year( a.datadate) -4 and year( a.datadate) -1
	group by a.gvkey, a.datadate;
quit;
PG
mkeintz
PROC Star

Here's a data step solution that assumes your data are sorted by gvkey/datadate (no data, untested):

 

data want (drop=_:);
  set have;
  by gvkey;

  array rev_array {0:4} _temporary_;  /* for 0mod5 through 4mod5*/

  cur_yyyy=year(datadate);

  if first.gvkey then call missing(of rev_array{*});

  /* For any missing year (and current year), set rev_array element to missing */
  do _yyyy=lag(_cur_yyyy)+1 to _cur_yyyy;
    if first.gvkey=0 then  rev_array{mod(_yyyy-1,5}}=.;
  end;

  max_revt_prior4=max(of rev_array{*});

  rev_array{mod(_cur_yyyy-1,5}=revt;

run;

The idea here is to:

  1. Find any holes following the year of the previous record.  Set the correspond array elements in rev_array (and also the array element for the CURRENT record) to missing.

  2. Get the max of the array, which will be the max of the non-missing values among the preceding 4 years.

  3. After the max has been established, put the current year REVT into the array, making it available for the next 4 years.

 

Note the array rev_array is indexed from 0 through 4, to conform to the result of the mod(_yyyy,5) function.

 

Also heed the warning of @PGStats .  If a company changes its fiscal year, it is possible to have two fiscal year reports (one of these "annual" reports won't cover 12 actual months) in the same calendar year.  You will have two identical maximums for that calendar year.  And all subsequent records will have a maximum ignoring the earlier datadate of the two.

--------------------------
The hash OUTPUT method will overwrite a SAS data set, but not append. That can be costly. Consider voting for Add a HASH object method which would append a hash object to an existing SAS data set

Would enabling PROC SORT to simultaneously output multiple datasets be useful? Then vote for
Allow PROC SORT to output multiple datasets

--------------------------

sas-innovate-2024.png

Available on demand!

Missed SAS Innovate Las Vegas? Watch all the action for free! View the keynotes, general sessions and 22 breakouts on demand.

 

Register now!

How to Concatenate Values

Learn how use the CAT functions in SAS to join values from multiple variables into a single value.

Find more tutorials on the SAS Users YouTube channel.

Click image to register for webinarClick image to register for webinar

Classroom Training Available!

Select SAS Training centers are offering in-person courses. View upcoming courses for:

View all other training opportunities.

Discussion stats
  • 12 replies
  • 1081 views
  • 2 likes
  • 3 in conversation