hi, i have a table in which I need to calculate the max among the previous 30 days for every record , is there a way in hive with sub queries?, see below for sample data
FIELD1 | FIELD2 | FIELD3 | DATE | VAR | MAX VAR |
USA | TX | RTB | 20160930 | 1 | |
USA | TX | RTB | 20161001 | 1 | |
USA | TX | RTB | 20161002 | 1 | |
USA | TX | RTB | 20161003 | 1 | |
USA | TX | RTB | 20161004 | 1 | |
USA | TX | RTB | 20161005 | 2 | |
USA | TX | RTB | 20161006 | 2 | |
USA | TX | RTB | 20161007 | 2 | |
USA | TX | RTB | 20161008 | 2 | |
USA | TX | RTB | 20161009 | 1 | |
USA | TX | RTB | 20161010 | 2 | |
USA | TX | RTB | 20161011 | 2 | |
USA | TX | RTB | 20161012 | 2 | |
USA | TX | RTB | 20161013 | 2 | |
USA | TX | RTB | 20161014 | 2 | |
USA | TX | RTB | 20161015 | 2 | |
USA | TX | RTB | 20161016 | 2 | |
USA | TX | RTB | 20161017 | 2 | |
USA | TX | RTB | 20161018 | 2 | |
USA | TX | RTB | 20161019 | 2 | |
USA | TX | RTB | 20161020 | 2 | |
USA | TX | RTB | 20161021 | 2 | |
USA | TX | RTB | 20161022 | 2 | |
USA | TX | RTB | 20161023 | 2 | |
USA | TX | RTB | 20161024 | 2 | |
USA | TX | RTB | 20161025 | 3 | |
USA | TX | RTB | 20161026 | 2 | |
USA | TX | RTB | 20161027 | 2 | |
USA | TX | RTB | 20161028 | 2 | |
USA | TX | RTB | 20161029 | 2 | |
USA | TX | RTB | 20161030 | 2 | 3 |
USA | TX | RTB | 20161031 | 2 | 3 |
USA | TX | RTB | 20161101 | 2 | 3 |
USA | TX | RTB | 20161102 | 2 | 3 |
USA | TX | RTB | 20161103 | 2 | 3 |
USA | TX | RTB | 20161104 | 2 | 3 |
USA | TX | RTB | 20161105 | 2 | 3 |
USA | TX | RTB | 20161106 | 2 | 3 |
USA | TX | RTB | 20161107 | 2 | 3 |
USA | TX | RTB | 20161108 | 2 | 3 |
USA | TX | RTB | 20161109 | 2 | 3 |
USA | TX | RTB | 20161110 | 2 | 3 |
USA | TX | RTB | 20161111 | 2 | 3 |
USA | TX | RTB | 20161112 | 2 | 3 |
USA | TX | RTB | 20161113 | 2 | 3 |
USA | TX | RTB | 20161114 | 2 | 3 |
USA | TX | RTB | 20161115 | 2 | 3 |
USA | TX | RTB | 20161116 | 2 | 3 |
USA | TX | RTB | 20161117 | 2 | 3 |
USA | TX | RTB | 20161118 | 2 | 3 |
USA | TX | RTB | 20161119 | 2 | 3 |
USA | TX | RTB | 20161120 | 2 | 3 |
USA | TX | RTB | 20161121 | 2 | 3 |
USA | TX | RTB | 20161122 | 2 | 3 |
USA | TX | RTB | 20161123 | 2 | 3 |
USA | TX | RTB | 20161124 | 2 | 3 |
USA | TX | RTB | 20161125 | 2 | 2 |
USA | TX | RTB | 20161126 | 2 | 2 |
USA | TX | RTB | 20161127 | 2 | 2 |
USA | TX | RTB | 20161128 | 2 | 2 |
USA | TX | RTB | 20161129 | 2 | 2 |
UK | LONDON | RTB | 20160930 | 1 | |
UK | LONDON | RTB | 20161001 | 1 | |
UK | LONDON | RTB | 20161002 | 1 | |
UK | LONDON | RTB | 20161003 | 1 | |
UK | LONDON | RTB | 20161004 | 1 | |
UK | LONDON | RTB | 20161005 | 1 | |
UK | LONDON | RTB | 20161006 | 1 | |
UK | LONDON | RTB | 20161007 | 1 | |
UK | LONDON | RTB | 20161008 | 1 | |
UK | LONDON | RTB | 20161009 | 1 | |
UK | LONDON | RTB | 20161010 | 1 | |
UK | LONDON | RTB | 20161011 | 1 | |
UK | LONDON | RTB | 20161012 | 1 | |
UK | LONDON | RTB | 20161013 | 1 | |
UK | LONDON | RTB | 20161014 | 1 | |
UK | LONDON | RTB | 20161015 | 1 | |
UK | LONDON | RTB | 20161016 | 1 | |
UK | LONDON | RTB | 20161017 | 1 | |
UK | LONDON | RTB | 20161018 | 1 | |
UK | LONDON | RTB | 20161019 | 1 | |
UK | LONDON | RTB | 20161020 | 1 | |
UK | LONDON | RTB | 20161021 | 1 | |
UK | LONDON | RTB | 20161022 | 1 | |
UK | LONDON | RTB | 20161023 | 1 | |
UK | LONDON | RTB | 20161024 | 1 | |
UK | LONDON | RTB | 20161025 | 4 | |
UK | LONDON | RTB | 20161026 | 2 | |
UK | LONDON | RTB | 20161027 | 2 | |
UK | LONDON | RTB | 20161028 | 2 | |
UK | LONDON | RTB | 20161029 | 2 | |
UK | LONDON | RTB | 20161030 | 2 | 4 |
UK | LONDON | RTB | 20161031 | 2 | 4 |
UK | LONDON | RTB | 20161101 | 2 | 4 |
UK | LONDON | RTB | 20161102 | 2 | 4 |
UK | LONDON | RTB | 20161103 | 2 | 4 |
UK | LONDON | RTB | 20161104 | 2 | 4 |
UK | LONDON | RTB | 20161105 | 2 | 4 |
UK | LONDON | RTB | 20161106 | 2 | 4 |
UK | LONDON | RTB | 20161107 | 2 | 4 |
UK | LONDON | RTB | 20161108 | 2 | 4 |
UK | LONDON | RTB | 20161109 | 2 | 4 |
UK | LONDON | RTB | 20161110 | 2 | 4 |
UK | LONDON | RTB | 20161111 | 2 | 4 |
UK | LONDON | RTB | 20161112 | 2 | 4 |
UK | LONDON | RTB | 20161113 | 2 | 4 |
UK | LONDON | RTB | 20161114 | 2 | 4 |
UK | LONDON | RTB | 20161115 | 2 | 4 |
UK | LONDON | RTB | 20161116 | 2 | 4 |
UK | LONDON | RTB | 20161117 | 2 | 4 |
UK | LONDON | RTB | 20161118 | 2 | 4 |
UK | LONDON | RTB | 20161119 | 2 | 4 |
UK | LONDON | RTB | 20161120 | 2 | 4 |
UK | LONDON | RTB | 20161121 | 2 | 4 |
UK | LONDON | RTB | 20161122 | 2 | 4 |
UK | LONDON | RTB | 20161123 | 2 | 4 |
UK | LONDON | RTB | 20161124 | 2 | 4 |
UK | LONDON | RTB | 20161125 | 2 | 2 |
UK | LONDON | RTB | 20161126 | 2 | 2 |
UK | LONDON | RTB | 20161127 | 2 | 2 |
UK | LONDON | RTB | 20161128 | 2 | 2 |
UK | LONDON | RTB | 20161129 | 2 | 2 |
Are you grouping by the first three variables? Also, is your date a SAS date, numeric with a date format?
A self join is what's typically used to compute these types of calculations. If you have SAS/ETS licensed, PROC EXPAND would be a straightforward option.
Reeza, thanks for the reply.
yes i am grouping by first three variables and my sasdate is in dateformat. i dont have sas ets, can you guide me in writing the self join.
Proc SQL;
create table want as select a.*, max(b.var) as maxvar
from have as a
join have as b
on a.var1=b.var1
..../*first thee vars go here*/
AND b.date between a.date and a.date+30 /*plus or minus depending on looking forward or back*/
group by a.var1, a.var2, a.var3, a.date, a.var /*all variables from table a*/;
quit;
Is your source table already ordered by date within the field1/field2/field3 groups? You mention "hive" so maybe not. But if it is, a DATA step is often more efficient than PROC SQL:
Starting with the 31st record per group the process gets maximum value of the PRIOR_30 array.
The PRIOR_30 array has these properties:
Untested program:
data want ;
array prior_30{0:29} _temporary_ ;
set have;
by field1 field2 field3 notsorted;
I+1;
if first.field3 then do;
call missing(of prior_30{*});
I=1;
end;
if I>30 then maxvar=max(of prior_30{*});
prior_30{mod(I,30)}=var;
run;
@mkeintz solution Also assumes that you have no gaps in your data so you need a record for each date. There's nothing wrong with this, just something to consider.
I agree on the issue of "holes" in the data series. Here's a modificatioon that accomodates gaps in the date sequence. In effect it acts as if the missing dates were present but VAR1 had a missing value:
data want (drop=_:);
array prior_30{0:29} _temporary_ ;
set have;
by field1 field2 field3 notsorted;
if first.field3 then do;
call missing(of prior_30{*});
_first_date=date;
end;
retain _first_date;
/* See if there are holes between current and preceding dates */
/* If so, assign missing values to corresponding array elements */
_date_increment=dif(date);
if _date_increment>1 then do _d=1 to _date_increment-1;
prior_30{mod(date-_d,30)}=.;
end;
if date>=_first_date+30 then maxvar=max(of prior_30{*});
prior_30{mod(date,30)}=var;
run;
If you don't have big table, try SQL. data have; infile cards truncover expandtabs; input FIELD1 $ FIELD2 $ FIELD3 $ DATE :yymmdd10. VAR; format date yymmdd10.; cards; USA TX RTB 20160930 1 USA TX RTB 20161001 1 USA TX RTB 20161002 1 USA TX RTB 20161003 1 USA TX RTB 20161004 1 USA TX RTB 20161005 2 USA TX RTB 20161006 2 USA TX RTB 20161007 2 USA TX RTB 20161008 2 USA TX RTB 20161009 1 ; run; proc sql; select *,(select max(var) from have where field1=a.field1 and field2=a.field2 and field3=field3 and date between a.date-30 and a.date) as max_val from have as a; quit;
Don't miss out on SAS Innovate - Register now for the FREE Livestream!
Can't make it to Vegas? No problem! Watch our general sessions LIVE or on-demand starting April 17th. Hear from SAS execs, best-selling author Adam Grant, Hot Ones host Sean Evans, top tech journalist Kara Swisher, AI expert Cassie Kozyrkov, and the mind-blowing dance crew iLuminate! Plus, get access to over 20 breakout sessions.
Learn how use the CAT functions in SAS to join values from multiple variables into a single value.
Find more tutorials on the SAS Users YouTube channel.