hi, i have a table in which I need to calculate the max among the previous 30 days for every record , is there a way in hive with sub queries?, see below for sample data
FIELD1 | FIELD2 | FIELD3 | DATE | VAR | MAX VAR |
USA | TX | RTB | 20160930 | 1 | |
USA | TX | RTB | 20161001 | 1 | |
USA | TX | RTB | 20161002 | 1 | |
USA | TX | RTB | 20161003 | 1 | |
USA | TX | RTB | 20161004 | 1 | |
USA | TX | RTB | 20161005 | 2 | |
USA | TX | RTB | 20161006 | 2 | |
USA | TX | RTB | 20161007 | 2 | |
USA | TX | RTB | 20161008 | 2 | |
USA | TX | RTB | 20161009 | 1 | |
USA | TX | RTB | 20161010 | 2 | |
USA | TX | RTB | 20161011 | 2 | |
USA | TX | RTB | 20161012 | 2 | |
USA | TX | RTB | 20161013 | 2 | |
USA | TX | RTB | 20161014 | 2 | |
USA | TX | RTB | 20161015 | 2 | |
USA | TX | RTB | 20161016 | 2 | |
USA | TX | RTB | 20161017 | 2 | |
USA | TX | RTB | 20161018 | 2 | |
USA | TX | RTB | 20161019 | 2 | |
USA | TX | RTB | 20161020 | 2 | |
USA | TX | RTB | 20161021 | 2 | |
USA | TX | RTB | 20161022 | 2 | |
USA | TX | RTB | 20161023 | 2 | |
USA | TX | RTB | 20161024 | 2 | |
USA | TX | RTB | 20161025 | 3 | |
USA | TX | RTB | 20161026 | 2 | |
USA | TX | RTB | 20161027 | 2 | |
USA | TX | RTB | 20161028 | 2 | |
USA | TX | RTB | 20161029 | 2 | |
USA | TX | RTB | 20161030 | 2 | 3 |
USA | TX | RTB | 20161031 | 2 | 3 |
USA | TX | RTB | 20161101 | 2 | 3 |
USA | TX | RTB | 20161102 | 2 | 3 |
USA | TX | RTB | 20161103 | 2 | 3 |
USA | TX | RTB | 20161104 | 2 | 3 |
USA | TX | RTB | 20161105 | 2 | 3 |
USA | TX | RTB | 20161106 | 2 | 3 |
USA | TX | RTB | 20161107 | 2 | 3 |
USA | TX | RTB | 20161108 | 2 | 3 |
USA | TX | RTB | 20161109 | 2 | 3 |
USA | TX | RTB | 20161110 | 2 | 3 |
USA | TX | RTB | 20161111 | 2 | 3 |
USA | TX | RTB | 20161112 | 2 | 3 |
USA | TX | RTB | 20161113 | 2 | 3 |
USA | TX | RTB | 20161114 | 2 | 3 |
USA | TX | RTB | 20161115 | 2 | 3 |
USA | TX | RTB | 20161116 | 2 | 3 |
USA | TX | RTB | 20161117 | 2 | 3 |
USA | TX | RTB | 20161118 | 2 | 3 |
USA | TX | RTB | 20161119 | 2 | 3 |
USA | TX | RTB | 20161120 | 2 | 3 |
USA | TX | RTB | 20161121 | 2 | 3 |
USA | TX | RTB | 20161122 | 2 | 3 |
USA | TX | RTB | 20161123 | 2 | 3 |
USA | TX | RTB | 20161124 | 2 | 3 |
USA | TX | RTB | 20161125 | 2 | 2 |
USA | TX | RTB | 20161126 | 2 | 2 |
USA | TX | RTB | 20161127 | 2 | 2 |
USA | TX | RTB | 20161128 | 2 | 2 |
USA | TX | RTB | 20161129 | 2 | 2 |
UK | LONDON | RTB | 20160930 | 1 | |
UK | LONDON | RTB | 20161001 | 1 | |
UK | LONDON | RTB | 20161002 | 1 | |
UK | LONDON | RTB | 20161003 | 1 | |
UK | LONDON | RTB | 20161004 | 1 | |
UK | LONDON | RTB | 20161005 | 1 | |
UK | LONDON | RTB | 20161006 | 1 | |
UK | LONDON | RTB | 20161007 | 1 | |
UK | LONDON | RTB | 20161008 | 1 | |
UK | LONDON | RTB | 20161009 | 1 | |
UK | LONDON | RTB | 20161010 | 1 | |
UK | LONDON | RTB | 20161011 | 1 | |
UK | LONDON | RTB | 20161012 | 1 | |
UK | LONDON | RTB | 20161013 | 1 | |
UK | LONDON | RTB | 20161014 | 1 | |
UK | LONDON | RTB | 20161015 | 1 | |
UK | LONDON | RTB | 20161016 | 1 | |
UK | LONDON | RTB | 20161017 | 1 | |
UK | LONDON | RTB | 20161018 | 1 | |
UK | LONDON | RTB | 20161019 | 1 | |
UK | LONDON | RTB | 20161020 | 1 | |
UK | LONDON | RTB | 20161021 | 1 | |
UK | LONDON | RTB | 20161022 | 1 | |
UK | LONDON | RTB | 20161023 | 1 | |
UK | LONDON | RTB | 20161024 | 1 | |
UK | LONDON | RTB | 20161025 | 4 | |
UK | LONDON | RTB | 20161026 | 2 | |
UK | LONDON | RTB | 20161027 | 2 | |
UK | LONDON | RTB | 20161028 | 2 | |
UK | LONDON | RTB | 20161029 | 2 | |
UK | LONDON | RTB | 20161030 | 2 | 4 |
UK | LONDON | RTB | 20161031 | 2 | 4 |
UK | LONDON | RTB | 20161101 | 2 | 4 |
UK | LONDON | RTB | 20161102 | 2 | 4 |
UK | LONDON | RTB | 20161103 | 2 | 4 |
UK | LONDON | RTB | 20161104 | 2 | 4 |
UK | LONDON | RTB | 20161105 | 2 | 4 |
UK | LONDON | RTB | 20161106 | 2 | 4 |
UK | LONDON | RTB | 20161107 | 2 | 4 |
UK | LONDON | RTB | 20161108 | 2 | 4 |
UK | LONDON | RTB | 20161109 | 2 | 4 |
UK | LONDON | RTB | 20161110 | 2 | 4 |
UK | LONDON | RTB | 20161111 | 2 | 4 |
UK | LONDON | RTB | 20161112 | 2 | 4 |
UK | LONDON | RTB | 20161113 | 2 | 4 |
UK | LONDON | RTB | 20161114 | 2 | 4 |
UK | LONDON | RTB | 20161115 | 2 | 4 |
UK | LONDON | RTB | 20161116 | 2 | 4 |
UK | LONDON | RTB | 20161117 | 2 | 4 |
UK | LONDON | RTB | 20161118 | 2 | 4 |
UK | LONDON | RTB | 20161119 | 2 | 4 |
UK | LONDON | RTB | 20161120 | 2 | 4 |
UK | LONDON | RTB | 20161121 | 2 | 4 |
UK | LONDON | RTB | 20161122 | 2 | 4 |
UK | LONDON | RTB | 20161123 | 2 | 4 |
UK | LONDON | RTB | 20161124 | 2 | 4 |
UK | LONDON | RTB | 20161125 | 2 | 2 |
UK | LONDON | RTB | 20161126 | 2 | 2 |
UK | LONDON | RTB | 20161127 | 2 | 2 |
UK | LONDON | RTB | 20161128 | 2 | 2 |
UK | LONDON | RTB | 20161129 | 2 | 2 |
Are you grouping by the first three variables? Also, is your date a SAS date, numeric with a date format?
A self join is what's typically used to compute these types of calculations. If you have SAS/ETS licensed, PROC EXPAND would be a straightforward option.
Reeza, thanks for the reply.
yes i am grouping by first three variables and my sasdate is in dateformat. i dont have sas ets, can you guide me in writing the self join.
Proc SQL;
create table want as select a.*, max(b.var) as maxvar
from have as a
join have as b
on a.var1=b.var1
..../*first thee vars go here*/
AND b.date between a.date and a.date+30 /*plus or minus depending on looking forward or back*/
group by a.var1, a.var2, a.var3, a.date, a.var /*all variables from table a*/;
quit;
Is your source table already ordered by date within the field1/field2/field3 groups? You mention "hive" so maybe not. But if it is, a DATA step is often more efficient than PROC SQL:
Starting with the 31st record per group the process gets maximum value of the PRIOR_30 array.
The PRIOR_30 array has these properties:
Untested program:
data want ;
array prior_30{0:29} _temporary_ ;
set have;
by field1 field2 field3 notsorted;
I+1;
if first.field3 then do;
call missing(of prior_30{*});
I=1;
end;
if I>30 then maxvar=max(of prior_30{*});
prior_30{mod(I,30)}=var;
run;
@mkeintz solution Also assumes that you have no gaps in your data so you need a record for each date. There's nothing wrong with this, just something to consider.
I agree on the issue of "holes" in the data series. Here's a modificatioon that accomodates gaps in the date sequence. In effect it acts as if the missing dates were present but VAR1 had a missing value:
data want (drop=_:);
array prior_30{0:29} _temporary_ ;
set have;
by field1 field2 field3 notsorted;
if first.field3 then do;
call missing(of prior_30{*});
_first_date=date;
end;
retain _first_date;
/* See if there are holes between current and preceding dates */
/* If so, assign missing values to corresponding array elements */
_date_increment=dif(date);
if _date_increment>1 then do _d=1 to _date_increment-1;
prior_30{mod(date-_d,30)}=.;
end;
if date>=_first_date+30 then maxvar=max(of prior_30{*});
prior_30{mod(date,30)}=var;
run;
If you don't have big table, try SQL. data have; infile cards truncover expandtabs; input FIELD1 $ FIELD2 $ FIELD3 $ DATE :yymmdd10. VAR; format date yymmdd10.; cards; USA TX RTB 20160930 1 USA TX RTB 20161001 1 USA TX RTB 20161002 1 USA TX RTB 20161003 1 USA TX RTB 20161004 1 USA TX RTB 20161005 2 USA TX RTB 20161006 2 USA TX RTB 20161007 2 USA TX RTB 20161008 2 USA TX RTB 20161009 1 ; run; proc sql; select *,(select max(var) from have where field1=a.field1 and field2=a.field2 and field3=field3 and date between a.date-30 and a.date) as max_val from have as a; quit;
Available on demand!
Missed SAS Innovate Las Vegas? Watch all the action for free! View the keynotes, general sessions and 22 breakouts on demand.
Learn how use the CAT functions in SAS to join values from multiple variables into a single value.
Find more tutorials on the SAS Users YouTube channel.