I would like to count the occurrences of item sales within 60 seconds by geographic location, as well as the total price of those items sold.
Note: Items 1 and 2 have the same DateTime.
Data:
GeoLoc ItemNum DateTime Price
Miami 1 2019-05-24 09:21:55.000 2.00
Miami 2 2019-05-24 09:21:55.000 2.49
Miami 3 2019-05-24 09:22:30.000 .99
Miami 4 2019-05-24 09:28:55.000 .49
Dallas 5 2019-05-24 10:05:13.000 2.50
Dallas 6 2019-05-24 10:08:42.000 4.00
Dallas 7 2019-05-24 10:10:05.000 7.00
Dallas 8 2019-05-24 10:10:56.000 3.99
Dallas 9 2019-05-24 10:11:02.000 .89
Desired Result
GeoLoc DateTime Volume TotalPrice
Miami 2019-05-24 09:21:55.000 1 2.00
Miami 2019-05-24 09:21:55.000 2 4.49
Miami 2019-05-24 09:22:30.000 3 5.48
Miami 2019-05-24 09:23:27.000 2 1.48
Dallas 2019-05-24 09:23:13.000 1 2.50
Dallas 2019-05-24 09:23:42.000 2 6.50
Dallas 2019-05-24 09:26:05.000 1 7.00
Dallas 2019-05-24 09:26:56.000 2 10.99
Dallas 2019-05-24 09:27:02.000 3 11.88
I tried using INTNX as well as a slew of other methods, including inner joins and date diffs, but I can't seem to get the results I'm looking for.
Thank you in advance!
Hi @Red781
If my understanding is correct, the problem can be described as moving a 60-second time window over all sales within a given location, and making a count plus a price summation for the current observation together with any previous observations where the sale occurred 60 seconds or less before the current sale.
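To make the window concrete, here is the arithmetic behind the last Dallas row of your desired output (using the timestamps as they appear in the output table): the 60-second window ending at 09:27:02 reaches back to 09:26:02 and therefore covers three sales.

```
09:26:05  7.00   (57 seconds before 09:27:02 -> inside the window)
09:26:56  3.99   ( 6 seconds before 09:27:02 -> inside the window)
09:27:02  0.89   (the current sale)
-----------------------------------------------------------------
Volume = 3,  TotalPrice = 7.00 + 3.99 + 0.89 = 11.88
```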
The first idea that springs to mind is to use two arrays to collect DateTime and Price for all sales at a given location and then process the arrays. It seems to work with your test data, if my understanding is correct, but it is difficult to compare against your desired output: you have different DateTimes for Dallas in your input and output, so they are not comparable, and I cannot understand why the last Miami observation has the second-to-last one added, given the 6:25-minute gap.
This solution seems to be what you want, but it scales only with limitations, because the arrays become very long, and your test data suggests that a full data set might be huge. I don't know how it would perform with, say, 10,000 sales per location. It might be optimized by keeping the Date variable and including it in the sort and the BY processing, together with an index count.
So I would suggest another solution that yields a different output. The idea is to drop the seconds, so that all sales occurring within the same minute get the same value, and then use PROC SQL to do the counting and summation.
It will not give the same output, because it does not use a rolling window, so the counts/summations will not cover sales occurring in different minutes, even if they are only a few seconds apart, but it is fully scalable and easier to analyze.
Below are the test input from your post and the two different solutions.
* Get input;
data have;
input GeoLoc $ ItemNum Date :yymmdd10. Time :time12. Price;
format Date Date9. Time time8. Price 8.2;
cards;
Miami 1 2019-05-24 09:21:55.000 2.00
Miami 2 2019-05-24 09:21:55.000 2.49
Miami 3 2019-05-24 09:22:30.000 .99
Miami 4 2019-05-24 09:28:55.000 .49
Dallas 5 2019-05-24 10:05:13.000 2.50
Dallas 6 2019-05-24 10:08:42.000 4.00
Dallas 7 2019-05-24 10:10:05.000 7.00
Dallas 8 2019-05-24 10:10:56.000 3.99
Dallas 9 2019-05-24 10:11:02.000 .89
;
run;
/* First solution - rolling 60-second time window - scalable with limitations */
* Transform to date-time value;
data tmp1 (drop=Date Time);
set have;
format DateTime datetime26.3;
DateTime = dhms(Date, hour(Time), minute(Time), second(Time));
run;
* sort;
proc sort data=tmp1; by GeoLoc DateTime;
run;
* find max. array index;
proc sql noprint;
select max(maxi) into :maxi
from (
select count(*) as maxi from tmp1
group by GeoLoc
);
quit;
%put &=maxi;
* counting and summing;
data want1 (keep=GeoLoc DateTime Volume TotalPrice);
set tmp1 (rename=(DateTime=wdt));
by GeoLoc;
retain c;
format wdt DateTime datetime22. TotalPrice 12.2;
array ts {&maxi} _temporary_;
array pr {&maxi} _temporary_;
if first.GeoLoc then do;
c = 0;
do i = 1 to dim(ts); ts{i} = .; pr{i} = .; end;
end;
c + 1;
ts{c} = wdt;
pr{c} = Price;
if last.GeoLoc then do i = 1 to c;
DateTime = ts{i};
Volume = 1;
TotalPrice = pr{i};
do j = i-1 to 1 by -1;
if (DateTime - ts{j}) <= 60 then do;
Volume = Volume + 1;
TotalPrice = TotalPrice + pr{j};
end;
else leave;
end;
output;
end;
run;
/* second solution - per minute-count/summation - fully scalable */
* transform to datetime value - minutes only;
data tmp2 (drop=Date Time);
set have;
format DateTime datetime26.3;
DateTime = dhms(Date, hour(Time), minute(Time), 0);
run;
* counting and summing;
proc sql;
create table want2 as
select distinct
GeoLoc,
DateTime format datetime18.,
count(*) as Volume,
sum(Price) as TotalPrice format=10.2
from tmp2
group by
GeoLoc,
DateTime;
quit;
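Since you mentioned trying inner joins, the rolling 60-second window can also be expressed as a PROC SQL self-join on the tmp1 table from the first solution. This is just a sketch: the want3 table name and the ItemNum tie-break (so that, of two sales with the same DateTime, only the earlier-numbered one falls into the other's window) are my own additions, and the pairwise join carries the same scalability caveat as the array approach.

```sas
/* Rolling window as a self-join: for each sale a, count and sum all
   sales b at the same location that occurred at or up to 60 seconds
   before it (ties on DateTime broken by ItemNum) */
proc sql;
create table want3 as
select a.GeoLoc,
a.DateTime format=datetime22.,
count(*) as Volume,
sum(b.Price) as TotalPrice format=12.2
from tmp1 a inner join tmp1 b
on a.GeoLoc = b.GeoLoc
and b.DateTime >= a.DateTime - 60
and (b.DateTime < a.DateTime
or (b.DateTime = a.DateTime and b.ItemNum <= a.ItemNum))
group by a.GeoLoc, a.ItemNum, a.DateTime;
quit;
```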
First, note that your sample input doesn't match your sample output in terms of DateTime; I assume that the figures shown in the output are correct. Second, you need some kind of memory table to store your DateTime and Price values for backtracking while applying your DateTime range criterion. Below, I've chosen a hash table, but it could be an array as well.
data have ;
input geoloc $ itemnum datetime :anydtdtm19. price ;
format datetime e8601dt. ;
cards;
Miami 1 2019-05-24:09:21:55.000 2.00
Miami 2 2019-05-24:09:21:55.000 2.49
Miami 3 2019-05-24:09:22:30.000 0.99
Miami 4 2019-05-24:09:23:27.000 0.49
Dallas 5 2019-05-24:09:23:13.000 2.50
Dallas 6 2019-05-24:09:23:42.000 4.00
Dallas 7 2019-05-24:09:26:05.000 7.00
Dallas 8 2019-05-24:09:26:56.000 3.99
Dallas 9 2019-05-24:09:27:02.000 0.89
;
run ;
data want (drop = _:) ;
if _n_ = 1 then do ;
/* load all sales into a hash ordered descending by (geoloc, itemnum),
   so the iterator can step from the current sale back to earlier ones */
dcl hash h (dataset:"have(rename=(datetime=_d price=_p))", ordered:"d") ;
h.definekey ("geoloc", "itemnum") ;
h.definedata ("geoloc", "itemnum", "_d", "_p") ;
h.definedone () ;
dcl hiter hi ("h") ;
call missing (_d, _p) ;
end ;
set have ;
hi.setcur() ;  /* position the iterator at the current sale */
_g = geoloc ;
volume = 1 ;
/* start from the current sale's price, then add each earlier sale at the
   same location that occurred no more than 60 seconds before it */
do totalprice = _p by 0 while (hi.next() = 0 and _g = geoloc and datetime < _d + 60) ;
totalprice + _p ;
volume + 1 ;
end ;
run ;
Kind regards
Paul D.