BookmarkSubscribeRSS Feed
Red781
Calcite | Level 5

I would like the occurrences of an item sale within 60 seconds by a Geographic location as well as the Total of those items sold.

Note:  Item 1 and 2 do have the same DateTime.

 

Data:

GeoLoc          ItemNum             DateTime                                  Price

Miami             1                         2019-05-24 09:21:55.000          2.00 

Miami             2                         2019-05-24 09:21:55.000          2.49

Miami             3                         2019-05-24 09:22:30.000             .99 

Miami             4                         2019-05-24 09:28:55.000             .49

Dallas             5                        2019-05-24 10:05:13.000           2.50

Dallas             6                        2019-05-24 10:08:42.000           4.00

Dallas             7                        2019-05-24 10:10:05.000           7.00

Dallas             8                        2019-05-24 10:10:56.000           3.99

Dallas             9                        2019-05-24 10:11:02.000             .89

 

Desired Result

GeoLoc            DateTime                                  Volume             TotalPrice

Miami               2019-05-24 09:21:55.000            1                          2.00

Miami               2019-05-24 09:21:55.000            2                          4.49

Miami               2019-05-24 09:22:30.000            3                          5.48

Miami               2019-05-24 09:23:27.000            2                          1.48

Dallas              2019-05-24 09:23:13.000            1                           2.50

Dallas              2019-05-24 09:23:42.000            2                           6.50

Dallas              2019-05-24 09:26:05.000            1                           7.00

Dallas              2019-05-24 09:26:56.000            2                           10.99

Dallas              2019-05-24 09:27:02.000            3                           11.88

 

I tried using intnx as well as a slew of other methods, including inner joins and datediffs, but I can't seem to get the results that I'm looking for.

 

Thank you in advance!

2 REPLIES 2
ErikLund_Jensen
Rhodochrosite | Level 12

Hi @Red781 

 

If my understanding is correct, the problem can be described as moving a 60-second time window over all sales within a given location, and make a count + a sales summation for the current observation + any previous observations where the sale occured <= 60 seconds before the current sale.

 

The first idea that springs to mind is to use two arrays to collect DateTime and Price for all sales at a given location, and then process the arrays. It seems to work with your test data, if my understanding is correct, but it is difficult to compare with your wanted output. You have different DateTimes for Dallas in your input and output, so it is not comparable, and I cannot understand why the last Miami-observation has the second-last added, because there is 6:25 minutes gap.

 

This solution seems to be what you want, but it is scalable only with limitations, because the arrays become very long, and your test data looks like a full data set might be huge. I don't know how it would perform with - say - 10.000 sales pr. location. It might be optimized by keeping the Date variable and include it in the sort and by processing + index count.

 

So I would suggest another solution that yields a different output. The idea is to drop the seconds, so all sales occuring within the same minute gets the same value, and then use a proc SQL to do the counting/summation.

 

It will not give the same output, because it doesn't work with a rolling window, so the count/summations will not cover sales occuring in different minutes, even if they are only a few seconds apart, but it is fully scalable and easier to analyze.

 

Below test input from your post and the two different solutions.

 

* Get input;
data have;
	input GeoLoc$ ItemNum Date yymmdd10. Time time9. @34 Price ;
  	format Date Date9. Time time8. Price 8.2;
cards;
Miami 1 2019-05-24 09:21:55.001  2.00 
Miami 2 2019-05-24 09:21:55.000  2.49
Miami 3 2019-05-24 09:22:30.000  .99 
Miami 4 2019-05-24 09:28:55.000  .49
Dallas 5 2019-05-24 10:05:13.000 2.50
Dallas 6 2019-05-24 10:08:42.000 4.00
Dallas 7 2019-05-24 10:10:05.000 7.00
Dallas 8 2019-05-24 10:10:56.000 3.99
Dallas 9 2019-05-24 10:11:02.000 .89
;
run;

/* First solution - rolling 60-second time window - scalable with limitations */
* Transform to date-time value;
data tmp1 (drop=Date Time h m s);
	set have;
  	format DateTime datetime26.3;
	h = hour(time);
	m = minute(time);
	s = second(time);
	DateTime = dhms(date,hour(time),minute(time),second(time));
run;

* sort;
proc sort data=tmp1; by GeoLoc DateTime;
run;

* find max. array index;
proc sql noprint;
	select max(maxi) into :maxi 
	from (
	 	select count(*) as maxi from tmp1
		group by GeoLoc
	);
quit;
%put &=maxi;

* counting and summing;
data want1 (keep=GeoLoc DateTime Volume TotalPrice); 
	set tmp1 (rename=(DateTime=wdt));
	by GeoLoc;
	retain c;
	format wdt DateTime datetime22. TotalPrice 12.2;
	array ts {%eval(&maxi)} _temporary_;
	array pr {%eval(&maxi)} _temporary_;

	if first.GeoLoc then do;
		c = 0;
		do i = 1 to dim(ts); ts{i} = .; end;
		do i = 1 to dim(pr); pr{i} = .; end;
	end;

	c + 1;
	ts{c} = wdt;
	pr{c} = Price;
	if last.GeoLoc then do i = 1 to c;
		DateTime = ts{i};
		Volume = 1;
		TotalPrice = pr{i};
		do j = i-1 to 1 by -1;
			if (DateTime - ts{j}) <= 60 then do;
				Volume = Volume + 1;
				TotalPrice = TotalPrice + pr{j};
			end;
			else leave;
		end;
		output;
	end;
run;

/* second solution - per minute-count/summation - fully scalable */

* transform to datetime value - minutes only;
data tmp2 (drop=Date Time h m);
	set have;
  	format DateTime datetime26.3;
	h = hour(time);
	m = minute(time);
	s = second(time);
	DateTime = dhms(date,hour(time),minute(time),0);
run;

* counting and summing;
proc sql;
	create table want2 as 
		select distinct
			GeoLoc,
			DateTime format datetime18.,
			count(*) as Volume,
			sum(Price) as TotalPrice format=10.2
		from tmp2 
		group by
			GeoLoc,
			DateTime;
quit;

 

hashman
Ammonite | Level 13

@Red781:

First, note that you sample input doesn't match your sample output in terms of Datetime; I assume that the figures shown in the output are correct. Second, you need some kind of memory table to store your datetime and price values for backtracking while applying your datetime range criteria. Below, I've chosen a hash table; but it can be an array as well.

data have ;                                                                                                                             
  input geoloc $ itemnum datetime :anydtdtm19. price ;                                                                                  
  format datetime e8601dt. ;                                                                                                            
cards;                                                                                                                                  
Miami   1  2019-05-24:09:21:55.000  2.00                                                                                                
Miami   2  2019-05-24:09:21:55.000  2.49                                                                                                
Miami   3  2019-05-24:09:22:30.000  0.99                                                                                                
Miami   4  2019-05-24:09:23:27.000  0.49                                                                                                
Dallas  5  2019-05-24:09:23:13.000  2.50                                                                                                
Dallas  6  2019-05-24:09:23:42.000  4.00                                                                                                
Dallas  7  2019-05-24:09:26:05.000  7.00                                                                                                
Dallas  8  2019-05-24:09:26:56.000  3.99                                                                                                
Dallas  9  2019-05-24:09:27:02.000  0.89                                                                                                
;                                                                                                                                       
run ;                                                                                                                                   
                                                                                                                                        
data v (drop = _:) / view = v ;                                                                                                         
  if _n_ = 1 then do ;                                                                                                                  
    dcl hash h (dataset:"have(rename=(datetime=_d price=_p))", ordered:"d") ;                                                           
    h.definekey  ("geoloc", "itemnum") ;                                                                                                
    h.definedata ("geoloc", "itemnum", "_d", "_p") ;                                                                                    
    h.definedone () ;                                                                                                                   
    dcl hiter hi ("h") ;                                                                                                                
    call missing (_d, _p) ;                                                                                                             
  end ;                                                                                                                                 
  set have ;                                                                                                                            
  hi.setcur() ;                                                                                                                         
  _g = geoloc ;                                                                                                                         
  do totalprice = _p by 0 while (hi.next() = 0 and _g = geoloc and datetime < _d + 60 ) ;                                               
    totalprice + _p ;                                                                                                                   
  end ;                                                                                                                                 
run ;                                                                                                                                   
                                                                                                                                        
proc rank data = v out = want ;                                                                                                         
  var totalprice ;                                                                                                                      
  ranks Volume ;                                                                                                                        
run ;                             

Kind regards

Paul D. 

sas-innovate-2024.png

Available on demand!

Missed SAS Innovate Las Vegas? Watch all the action for free! View the keynotes, general sessions and 22 breakouts on demand.

 

Register now!

How to Concatenate Values

Learn how use the CAT functions in SAS to join values from multiple variables into a single value.

Find more tutorials on the SAS Users YouTube channel.

Click image to register for webinarClick image to register for webinar

Classroom Training Available!

Select SAS Training centers are offering in-person courses. View upcoming courses for:

View all other training opportunities.

Discussion stats
  • 2 replies
  • 504 views
  • 0 likes
  • 3 in conversation