BookmarkSubscribeRSS Feed
ChrisNZ
Tourmaline | Level 20
Using this method, you'll read the 100 million row table over and over, once for each subset. That's wasteful.
If for some reason you do want to do this, I'd just use the macro loop to create a where clause to be applied when using the data. At least you save creating an intermediate table.
But for large tables, a one time read is to be preferred imho.
DonH
Lapis Lazuli | Level 10

Nope. The code I posted used a WHERE clause and I explicitly said the DATA step was simply a surrogate for the code to process that data - the implication being that the WHERE clause should be used in either the hash or your proposed SQL approach. And that point was reinforced by the inclusion of a sample PROC APPEND step to create the cumulative result set.

I was not suggesting that a local copy be made.

ChrisNZ
Tourmaline | Level 20
Good. We're on the same page then. Just clarifying in case someone is tempted to create subset tables. 🙂
mkeintz
PROC Star

@DonH, @ChrisNZ :

 

Yes, the where clause prevents the data step from seeing the entire dataset multiple times - a non-trivial savings.  But it still exports the burden of reviewing the entire data multiple times to the input engine.

 

If grouping is needed, then I suggest taking advantage of the pre-sorted order by DATE/CUSTOMERID to create date groups, which can take advantage of pre-setting the FIRSTOBS= parameter to alleviate the need for filtering by the data engine.

 

But unlike partitioning by account, which are simple disjoint subsets, partitioning by date ranges requires some overlap.

 

For example if your data ranges from 01jan2018 through 31dec2019, you could generate windows which terminate in the following four date ranges - where "[" and ")" mean closed and open end-points, respectively:

  • [01jan2018,01jul2018)
  • [01jul2018,01jan2019)
  • [01jan2019,01jul2019)
  • [01jul2019,01jan2020)

The problem is that while the first group is easy (just start at observation 1 and stop when 01jul2018 is reached), the rest aren't.  The second group, which "starts" at 01jul2018, actually requires data reading to start one day earlier at 31jun2018.  It's required because any 24 hour window ending in 01jul2018 needs data for 31jun2018.

 

The upshot is that you need 24 hour overlaps between consecutive groups.

 

This can be done with the firstobs= parameter.  For the first group firsobs=1.  For the second date group, firstobs is the observation number of the earliest record for 31jun2018 - which fortunately can be determined when processing the first group.  Each group can specify the needed firstobs for the next group.

 

Here's a construct that is about as efficient in data input processing as possible.  A few twenty-four hour intervals will be read twice, but almost all of the data is read only once.  And all without filtering by the data engine:

 

%macro process_window_groups;
  %let upper_limit_date_list  =   01jul2018 01jan2019 01jul2019 01jan2020 ;
  %let window_length=24:00:00 ;
  
  %do i=1 %to %sysfunc(countw(&upper_limit_date_list));
    %let upper_limit_datetime = %scan(&upper_limit_date_list,&i):00:00:00  ;

    %let stale_datetime = %eval(%sysevalf("&upper_limit_datetime"dt)-%sysevalf("&window_length"t);
    %let stale_datetime = %sysfunc(putn(&stale_datetime,datetime20.));

    %if &i=1 %then %do;  /* Set initial values for subsetting the data */
      %let fobs=1;
      %let prior_upper_limit_datetime=15oct1582:00:00:00 ;
    %end;

	%put For Group &i., &=upper_limit_datetime &=stale_datetime &=fobs &=prior_upper_limit_datetime;

    data want&i (drop=_:);
      set have (firstobs=&fobs);
      by date ;
      retain _nextfobs . /*firstobs to be used for the next group */

	  /***************************************************/
	  /*                                                 */
	  /*    Code for generating windows results here     */
	  /*                                                 */
	  /***************************************************/

      if date>="&stale_datetime"dt then do;
        if _next_fobs=. then do;    /* Write macrovar FOBS for next group */
          _next_fobs= _n_ + &fobs;  
          call symputx('fobs',cats(_next_fobs));
        end;
        if date>="&upper_limit_datetime"dt then stop;
      end;
      if date>="&prior_upper_limit_datetime"dt then output;
    run;

    %let prior_upper_limit_datetime=&upper_limit_datetime;
  %end;
%mend;

%process_window_groups;

I didn't replicate the actual code for producing windows - but you can see where it would go in this program.

 

@Array_Mover .  If interested, I suggest you test by processing the first couple weeks of data in your dataset - by specifying 3 or 4 dates in that period in the UPPER_LIMIT_DATE_LIST macrovar.  Just make sure the dates are more than 1 day apart. 

 

 

--------------------------
The hash OUTPUT method will overwrite a SAS data set, but not append. That can be costly. Consider voting for Add a HASH object method which would append a hash object to an existing SAS data set

Would enabling PROC SORT to simultaneously output multiple datasets be useful? Then vote for
Allow PROC SORT to output multiple datasets

--------------------------
Array_Mover
Obsidian | Level 7

Yes, thank you for the suggestion.  Cards are binned based on region, so I am working through this logic taking that into account.

Meaning, 8432100000000000 <= Account < 8432200000000000 will have no cards at all because that bin doesnt exist.

 

Since bins are known, I am creating an array of bins to cycle through and use the macro to create subsets of each bin, instead of every possible number.

 

Ive been working hard processing everyone's answers and appreciate all of the insight and thoughts.  I will choose a solution once I am satisfied with my outcome.  I am humbled that you would take the time to help me with my question.

Array_Mover
Obsidian | Level 7

Ignoring that I do not know what a SPDE library is, and google hasn't helped the information enter my dense skull, there is one issue I am having trouble doing with your code.  (I am using SAS Enterprise Guide, by the way)

 

The card numbers we are using are all 16 digits and (Im going to use a random digit) are between 8000000000000000 and 9000000000000000.

 

I have tried using your code to loop through I= 8 to 9 by 0.05 and have the number resolve to e15

(Meaning 8.05e15 or 8050000000000000)

But that screwed up the nice naming convention of the tables you organized.

 

Next, I tried to code the 16-digit numbers in the cat statement:

 

data _null_;
  call execute('data ');
  do I=800 to 900;
   call execute (cat('TMP',I));
  end;
  do I=800 to 900;
    call execute(ifc(I=1,';set My_Data_Set;','else '));
    call execute (cat('if ',I-1,'0000000000000 <= Account < ',I,'0000000000000 then output TMP',I,';')); 
 end; 
run;  

But I get an error on the numbers: ERROR 22-322: Syntax error, expecting one of the following: a name, a quoted string, (, /, ;, _DATA_, _LAST_, _NULL_. 

 

If I use double quotes to put single quotes around the numbers, I get another error.

I am thinking it has something to do with the call execute statement and the order it processes items in quotes.

ChrisNZ
Tourmaline | Level 20

The culprit is you test for I=1, which of course never happens.

data _null_;
  call execute('data ');
  do I=800 to 900 by 10;
   call execute (cat('TMP',I));
  end;
  do I=800 to 900 by 10;
    call execute(ifc(I=800,';set My_Data_Set;','else '));
    call execute (cat('if "',I-1,'0000000000000" <= Account < "',I,'0000000000000" then output TMP',I,';')); 
 end; 
 call execute ('run;'); 
run; 

 

ChrisNZ
Tourmaline | Level 20

The full run:

Spoiler

29 data W.HAVE;
30 VALUE=1;
31 do DATE='01jan2020:0:0'dt to '31aug2020:23:59:59'dt by '06:00:00't;
32 do ACCT =1 to 2e6;
33 if ranuni(1) > .9 then output;
34 end;
35 end;
36 run;

NOTE: The data set W.HAVE has 195196814 observations and 3 variables.
NOTE: Compressing data set W.HAVE decreased size by 81.74 percent.
NOTE: DATA statement used (Total process time):
real time 1:22.70
user cpu time 1:03.18
system cpu time 1.04 seconds
memory 8538.03k
OS Memory 29552.00k
Timestamp 15/09/2020 02:41:39 PM
Step Count 54 Switch Count 4339

37
38 data _null_;
39 call execute('data ');
40 do I=1 to 20;
41 call execute (cat('W.TMP',I));
42 end;
43 do I=1 to 20;
44 call execute(ifc(I=1,';set W.HAVE;','else '));
45 call execute (cat('if ',I-1,'e5 <= ACCT < ',I,'e5 then output W.TMP',I,';'));
46 end;
47 run;

NOTE: DATA statement used (Total process time):
real time 0.01 seconds
user cpu time 0.00 seconds
system cpu time 0.00 seconds
memory 328.46k
OS Memory 21600.00k
Timestamp 15/09/2020 02:41:39 PM
Step Count 55 Switch Count 54

NOTE: CALL EXECUTE generated line.
1 + data
2 + W.TMP1
3 + W.TMP2
4 + W.TMP3
5 + W.TMP4
6 + W.TMP5
7 + W.TMP6
8 + W.TMP7
9 + W.TMP8
10 + W.TMP9
11 + W.TMP10
12 + W.TMP11
13 + W.TMP12
14 + W.TMP13
15 + W.TMP14
16 + W.TMP15
17 + W.TMP16
18 + W.TMP17
19 + W.TMP18
20 + W.TMP19
21 + W.TMP20
22 + ;set W.HAVE;
23 + if 0e5 <= ACCT < 1e5 then output W.TMP1;
24 + else
25 + if 1e5 <= ACCT < 2e5 then output W.TMP2;
26 + else
27 + if 2e5 <= ACCT < 3e5 then output W.TMP3;
28 + else
29 + if 3e5 <= ACCT < 4e5 then output W.TMP4;
30 + else
31 + if 4e5 <= ACCT < 5e5 then output W.TMP5;
32 + else
33 + if 5e5 <= ACCT < 6e5 then output W.TMP6;
34 + else
35 + if 6e5 <= ACCT < 7e5 then output W.TMP7;
36 + else
37 + if 7e5 <= ACCT < 8e5 then output W.TMP8;
38 + else
39 + if 8e5 <= ACCT < 9e5 then output W.TMP9;
40 + else
41 + if 9e5 <= ACCT < 10e5 then output W.TMP10;
42 + else
43 + if 10e5 <= ACCT < 11e5 then output W.TMP11;
44 + else
45 + if 11e5 <= ACCT < 12e5 then output W.TMP12;
46 + else
47 + if 12e5 <= ACCT < 13e5 then output W.TMP13;
48 + else
49 + if 13e5 <= ACCT < 14e5 then output W.TMP14;
50 + else
51 + if 14e5 <= ACCT < 15e5 then output W.TMP15;
52 + else
53 + if 15e5 <= ACCT < 16e5 then output W.TMP16;
54 + else
55 + if 16e5 <= ACCT < 17e5 then output W.TMP17;
56 + else
57 + if 17e5 <= ACCT < 18e5 then output W.TMP18;
58 + else
59 + if 18e5 <= ACCT < 19e5 then output W.TMP19;
60 + else
61 + if 19e5 <= ACCT < 20e5 then output W.TMP20;
48

NOTE: There were 195196814 observations read from the data set W.HAVE.
NOTE: The data set W.TMP1 has 9757439 observations and 3 variables.
NOTE: Compressing data set W.TMP1 decreased size by 79.71 percent.
NOTE: The data set W.TMP2 has 9761847 observations and 3 variables.
NOTE: Compressing data set W.TMP2 decreased size by 80.57 percent.
NOTE: The data set W.TMP3 has 9755925 observations and 3 variables.
NOTE: Compressing data set W.TMP3 decreased size by 81.11 percent.
NOTE: The data set W.TMP4 has 9755351 observations and 3 variables.
NOTE: Compressing data set W.TMP4 decreased size by 81.54 percent.
NOTE: The data set W.TMP5 has 9763311 observations and 3 variables.
NOTE: Compressing data set W.TMP5 decreased size by 81.53 percent.
NOTE: The data set W.TMP6 has 9762058 observations and 3 variables.
NOTE: Compressing data set W.TMP6 decreased size by 81.81 percent.
NOTE: The data set W.TMP7 has 9762421 observations and 3 variables.
NOTE: Compressing data set W.TMP7 decreased size by 81.90 percent.
NOTE: The data set W.TMP8 has 9756555 observations and 3 variables.
NOTE: Compressing data set W.TMP8 decreased size by 81.90 percent.
NOTE: The data set W.TMP9 has 9758082 observations and 3 variables.
NOTE: Compressing data set W.TMP9 decreased size by 81.90 percent.
NOTE: The data set W.TMP10 has 9762843 observations and 3 variables.
NOTE: Compressing data set W.TMP10 decreased size by 81.90 percent.
NOTE: The data set W.TMP11 has 9756001 observations and 3 variables.
NOTE: Compressing data set W.TMP11 decreased size by 81.99 percent.
NOTE: The data set W.TMP12 has 9758699 observations and 3 variables.
NOTE: Compressing data set W.TMP12 decreased size by 82.09 percent.
NOTE: The data set W.TMP13 has 9761100 observations and 3 variables.
NOTE: Compressing data set W.TMP13 decreased size by 82.09 percent.
NOTE: The data set W.TMP14 has 9757221 observations and 3 variables.
NOTE: Compressing data set W.TMP14 decreased size by 82.08 percent.
NOTE: The data set W.TMP15 has 9760533 observations and 3 variables.
NOTE: Compressing data set W.TMP15 decreased size by 82.09 percent.
NOTE: The data set W.TMP16 has 9757133 observations and 3 variables.
NOTE: Compressing data set W.TMP16 decreased size by 82.09 percent.
NOTE: The data set W.TMP17 has 9760740 observations and 3 variables.
NOTE: Compressing data set W.TMP17 decreased size by 82.09 percent.
NOTE: The data set W.TMP18 has 9763384 observations and 3 variables.
NOTE: Compressing data set W.TMP18 decreased size by 82.09 percent.
NOTE: The data set W.TMP19 has 9762164 observations and 3 variables.
NOTE: Compressing data set W.TMP19 decreased size by 82.09 percent.
NOTE: The data set W.TMP20 has 9763895 observations and 3 variables.
NOTE: Compressing data set W.TMP20 decreased size by 82.09 percent.
NOTE: DATA statement used (Total process time):
real time 39.35 seconds
user cpu time 40.04 seconds
system cpu time 2.17 seconds
memory 171106.57k
OS Memory 191232.00k
Timestamp 15/09/2020 02:42:18 PM
Step Count 56 Switch Count 4351

49 data _null_;

50 do I=1 to 20;
51 call execute (cat('proc sort data=W.TMP',I,'; by ACCT DATE; run; '));
52 call execute ('proc sql; create table W.OUT as select a.*, mean(b.VALUE) as AVG ');
53 call execute (cat('from W.TMP', I, ' a left join W.TMP', I, ' b '));
54 call execute ('on a.ACCT=b.ACCT and b.DATE between a.DATE and a.DATE-3600*24 ');
55 call execute ('group by a.ACCT, a.DATE, a.VALUE; quit; ');
56 call execute ('proc append base=W.WANT data=W.OUT; run; ');
57 end;
58 run;

NOTE: DATA statement used (Total process time):
real time 0.01 seconds
user cpu time 0.01 seconds
system cpu time 0.00 seconds
memory 330.12k
OS Memory 23648.00k
Timestamp 15/09/2020 02:42:18 PM
Step Count 57 Switch Count 52

NOTE: CALL EXECUTE generated line.
1 + proc sort data=W.TMP1; by ACCT DATE; run;

NOTE: Sorting was performed by the data source.
NOTE: There were 9757439 observations read from the data set W.TMP1.
NOTE: The data set W.TMP1 has 9757439 observations and 3 variables.
NOTE: Compressing data set W.TMP1 decreased size by 75.26 percent.
NOTE: PROCEDURE SORT used (Total process time):
real time 3.04 seconds
user cpu time 7.67 seconds
system cpu time 1.45 seconds
memory 65420.46k
OS Memory 88932.00k
Timestamp 15/09/2020 02:42:21 PM
Step Count 58 Switch Count 256

2 + proc sql;
2 + create table W.OUT as select a.*, mean(b.VALUE) as AVG
3 + from W.TMP1 a left join W.TMP1 b
4 + on a.ACCT=b.ACCT and b.DATE between a.DATE and a.DATE-3600*24
5 + group by a.ACCT, a.DATE, a.VALUE;
NOTE: SAS threaded sort was used.
NOTE: Compressing data set W.OUT decreased size by 79.97 percent.
NOTE: Table W.OUT created, with 9757439 rows and 4 columns.

5 + quit;
NOTE: PROCEDURE SQL used (Total process time):
real time 1:36.96
user cpu time 1:17.06
system cpu time 1.18 seconds
memory 982567.39k
OS Memory 1006868.00k
Timestamp 15/09/2020 02:43:58 PM
Step Count 59 Switch Count 316

6 + proc append base=W.WANT data=W.OUT; run;

NOTE: Appending W.OUT to W.WANT.
NOTE: BASE data set does not exist. DATA file is being copied to BASE file.
NOTE: There were 9757439 observations read from the data set W.OUT.
NOTE: The data set W.WANT has 9757439 observations and 4 variables.
NOTE: Compressing data set W.WANT decreased size by 79.97 percent.
NOTE: PROCEDURE APPEND used (Total process time):
real time 2.03 seconds
user cpu time 1.93 seconds
system cpu time 0.18 seconds
memory 11641.37k
OS Memory 37248.00k
Timestamp 15/09/2020 02:44:00 PM
Step Count 60 Switch Count 325

7 + proc sort data=W.TMP2; by ACCT DATE; run;

NOTE: Sorting was performed by the data source.
NOTE: There were 9761847 observations read from the data set W.TMP2.
NOTE: The data set W.TMP2 has 9761847 observations and 3 variables.
NOTE: Compressing data set W.TMP2 decreased size by 75.30 percent.
NOTE: PROCEDURE SORT used (Total process time):
real time 3.90 seconds
user cpu time 8.00 seconds
system cpu time 1.37 seconds
memory 64787.68k
OS Memory 89700.00k
Timestamp 15/09/2020 02:44:04 PM
Step Count 61 Switch Count 258

8 + proc sql;
8 + create table W.OUT as select a.*, mean(b.VALUE) as AVG
9 + from W.TMP2 a left join W.TMP2 b
10 + on a.ACCT=b.ACCT and b.DATE between a.DATE and a.DATE-3600*24
11 + group by a.ACCT, a.DATE, a.VALUE;
NOTE: SAS threaded sort was used.
NOTE: Compressing data set W.OUT decreased size by 80.00 percent.
NOTE: Table W.OUT created, with 9761847 rows and 4 columns.

11 + quit;
NOTE: PROCEDURE SQL used (Total process time):
real time 1:37.21
user cpu time 1:15.84
system cpu time 1.60 seconds
memory 982711.54k
OS Memory 1007124.00k
Timestamp 15/09/2020 02:45:41 PM
Step Count 62 Switch Count 322

12 + proc append base=W.WANT data=W.OUT; run;

NOTE: Appending W.OUT to W.WANT.
NOTE: There were 9761847 observations read from the data set W.OUT.
NOTE: 9761847 observations added.
NOTE: The data set W.WANT has 19519286 observations and 4 variables.
NOTE: Compressing data set W.WANT decreased size by 79.99 percent.
NOTE: PROCEDURE APPEND used (Total process time):
real time 2.40 seconds
user cpu time 1.76 seconds
system cpu time 0.18 seconds
memory 11361.12k
OS Memory 37248.00k
Timestamp 15/09/2020 02:45:44 PM
Step Count 63 Switch Count 326

13 + proc sort data=W.TMP3; by ACCT DATE; run;

NOTE: Sorting was performed by the data source.
NOTE: There were 9755925 observations read from the data set W.TMP3.
NOTE: The data set W.TMP3 has 9755925 observations and 3 variables.
NOTE: Compressing data set W.TMP3 decreased size by 75.30 percent.
NOTE: PROCEDURE SORT used (Total process time):
real time 3.67 seconds
user cpu time 7.93 seconds
system cpu time 1.20 seconds
memory 64353.18k
OS Memory 89188.00k
Timestamp 15/09/2020 02:45:48 PM
Step Count 64 Switch Count 264

14 + proc sql;
14 + create table W.OUT as select a.*, mean(b.VALUE) as AVG
15 + from W.TMP3 a left join W.TMP3 b
16 + on a.ACCT=b.ACCT and b.DATE between a.DATE and a.DATE-3600*24
17 + group by a.ACCT, a.DATE, a.VALUE;
NOTE: SAS threaded sort was used.
NOTE: Compressing data set W.OUT decreased size by 80.01 percent.
NOTE: Table W.OUT created, with 9755925 rows and 4 columns.

17 + quit;
NOTE: PROCEDURE SQL used (Total process time):
real time 1:31.93
user cpu time 1:18.28
system cpu time 1.46 seconds
memory 982717.39k
OS Memory 1006868.00k
Timestamp 15/09/2020 02:47:20 PM
Step Count 65 Switch Count 328

18 + proc append base=W.WANT data=W.OUT; run;

NOTE: Appending W.OUT to W.WANT.
NOTE: There were 9755925 observations read from the data set W.OUT.
NOTE: 9755925 observations added.
NOTE: The data set W.WANT has 29275211 observations and 4 variables.
NOTE: Compressing data set W.WANT decreased size by 79.99 percent.
NOTE: PROCEDURE APPEND used (Total process time):
real time 2.31 seconds
user cpu time 1.79 seconds
system cpu time 0.17 seconds
memory 11360.96k
OS Memory 36480.00k
Timestamp 15/09/2020 02:47:22 PM
Step Count 66 Switch Count 326

19 + proc sort data=W.TMP4; by ACCT DATE; run;

NOTE: Sorting was performed by the data source.
NOTE: There were 9755351 observations read from the data set W.TMP4.
NOTE: The data set W.TMP4 has 9755351 observations and 3 variables.
NOTE: Compressing data set W.TMP4 decreased size by 75.31 percent.
NOTE: PROCEDURE SORT used (Total process time):
real time 3.35 seconds
user cpu time 7.61 seconds
system cpu time 1.43 seconds
memory 64691.18k
OS Memory 89700.00k
Timestamp 15/09/2020 02:47:25 PM
Step Count 67 Switch Count 262

20 + proc sql;
20 + create table W.OUT as select a.*, mean(b.VALUE) as AVG
21 + from W.TMP4 a left join W.TMP4 b
22 + on a.ACCT=b.ACCT and b.DATE between a.DATE and a.DATE-3600*24
23 + group by a.ACCT, a.DATE, a.VALUE;
NOTE: SAS threaded sort was used.
NOTE: Compressing data set W.OUT decreased size by 80.01 percent.
NOTE: Table W.OUT created, with 9755351 rows and 4 columns.

23 + quit;
NOTE: PROCEDURE SQL used (Total process time):
real time 1:30.07
user cpu time 1:16.65
system cpu time 1.26 seconds
memory 982720.70k
OS Memory 1006868.00k
Timestamp 15/09/2020 02:48:55 PM
Step Count 68 Switch Count 323

24 + proc append base=W.WANT data=W.OUT; run;

NOTE: Appending W.OUT to W.WANT.
NOTE: There were 9755351 observations read from the data set W.OUT.
NOTE: 9755351 observations added.
NOTE: The data set W.WANT has 39030562 observations and 4 variables.
NOTE: Compressing data set W.WANT decreased size by 80.00 percent.
NOTE: PROCEDURE APPEND used (Total process time):
real time 1.95 seconds
user cpu time 1.82 seconds
system cpu time 0.14 seconds
memory 11361.43k
OS Memory 36224.00k
Timestamp 15/09/2020 02:48:57 PM
Step Count 69 Switch Count 326

25 + proc sort data=W.TMP5; by ACCT DATE; run;

NOTE: Sorting was performed by the data source.
NOTE: There were 9763311 observations read from the data set W.TMP5.
NOTE: The data set W.TMP5 has 9763311 observations and 3 variables.
NOTE: Compressing data set W.TMP5 decreased size by 75.31 percent.
NOTE: PROCEDURE SORT used (Total process time):
real time 4.07 seconds
user cpu time 7.75 seconds
system cpu time 1.43 seconds
memory 64580.65k
OS Memory 89188.00k
Timestamp 15/09/2020 02:49:01 PM
Step Count 70 Switch Count 289

26 + proc sql;
26 + create table W.OUT as select a.*, mean(b.VALUE) as AVG
27 + from W.TMP5 a left join W.TMP5 b
28 + on a.ACCT=b.ACCT and b.DATE between a.DATE and a.DATE-3600*24
29 + group by a.ACCT, a.DATE, a.VALUE;
NOTE: SAS threaded sort was used.
NOTE: Compressing data set W.OUT decreased size by 80.01 percent.
NOTE: Table W.OUT created, with 9763311 rows and 4 columns.

29 + quit;
NOTE: PROCEDURE SQL used (Total process time):
real time 1:40.38
user cpu time 1:17.93
system cpu time 1.62 seconds
memory 983408.57k
OS Memory 1007896.00k
Timestamp 15/09/2020 02:50:42 PM
Step Count 71 Switch Count 310

30 + proc append base=W.WANT data=W.OUT; run;

NOTE: Appending W.OUT to W.WANT.
NOTE: There were 9763311 observations read from the data set W.OUT.
NOTE: 9763311 observations added.
NOTE: The data set W.WANT has 48793873 observations and 4 variables.
NOTE: Compressing data set W.WANT decreased size by 80.00 percent.
NOTE: PROCEDURE APPEND used (Total process time):
real time 1.87 seconds
user cpu time 1.92 seconds
system cpu time 0.21 seconds
memory 11137.18k
OS Memory 36480.00k
Timestamp 15/09/2020 02:50:44 PM
Step Count 72 Switch Count 323

31 + proc sort data=W.TMP6; by ACCT DATE; run;

NOTE: Sorting was performed by the data source.
NOTE: There were 9762058 observations read from the data set W.TMP6.
NOTE: The data set W.TMP6 has 9762058 observations and 3 variables.
NOTE: Compressing data set W.TMP6 decreased size by 75.31 percent.
NOTE: PROCEDURE SORT used (Total process time):
real time 3.98 seconds
user cpu time 7.64 seconds
system cpu time 1.31 seconds
memory 65252.59k
OS Memory 89700.00k
Timestamp 15/09/2020 02:50:48 PM
Step Count 73 Switch Count 272

32 + proc sql;
32 + create table W.OUT as select a.*, mean(b.VALUE) as AVG
33 + from W.TMP6 a left join W.TMP6 b
34 + on a.ACCT=b.ACCT and b.DATE between a.DATE and a.DATE-3600*24
35 + group by a.ACCT, a.DATE, a.VALUE;
NOTE: SAS threaded sort was used.
NOTE: Compressing data set W.OUT decreased size by 80.01 percent.
NOTE: Table W.OUT created, with 9762058 rows and 4 columns.

35 + quit;
NOTE: PROCEDURE SQL used (Total process time):
real time 1:46.23
user cpu time 1:18.07
system cpu time 1.43 seconds
memory 982492.60k
OS Memory 1007380.00k
Timestamp 15/09/2020 02:52:34 PM
Step Count 74 Switch Count 328

36 + proc append base=W.WANT data=W.OUT; run;

NOTE: Appending W.OUT to W.WANT.
NOTE: There were 9762058 observations read from the data set W.OUT.
NOTE: 9762058 observations added.
NOTE: The data set W.WANT has 58555931 observations and 4 variables.
NOTE: Compressing data set W.WANT decreased size by 80.00 percent.
NOTE: PROCEDURE APPEND used (Total process time):
real time 2.65 seconds
user cpu time 2.07 seconds
system cpu time 0.17 seconds
memory 11251.56k
OS Memory 36992.00k
Timestamp 15/09/2020 02:52:37 PM
Step Count 75 Switch Count 328

37 + proc sort data=W.TMP7; by ACCT DATE; run;

NOTE: Sorting was performed by the data source.
NOTE: There were 9762421 observations read from the data set W.TMP7.
NOTE: The data set W.TMP7 has 9762421 observations and 3 variables.
NOTE: Compressing data set W.TMP7 decreased size by 75.31 percent.
NOTE: PROCEDURE SORT used (Total process time):
real time 6.04 seconds
user cpu time 7.70 seconds
system cpu time 1.21 seconds
memory 65253.25k
OS Memory 90212.00k
Timestamp 15/09/2020 02:52:43 PM
Step Count 76 Switch Count 273

38 + proc sql;
38 + create table W.OUT as select a.*, mean(b.VALUE) as AVG
39 + from W.TMP7 a left join W.TMP7 b
40 + on a.ACCT=b.ACCT and b.DATE between a.DATE and a.DATE-3600*24
41 + group by a.ACCT, a.DATE, a.VALUE;
NOTE: SAS threaded sort was used.
NOTE: Compressing data set W.OUT decreased size by 80.01 percent.
NOTE: Table W.OUT created, with 9762421 rows and 4 columns.

41 + quit;
NOTE: PROCEDURE SQL used (Total process time):
real time 2:10.78
user cpu time 1:13.96
system cpu time 1.60 seconds
memory 982607.32k
OS Memory 1007892.00k
Timestamp 15/09/2020 02:54:54 PM
Step Count 77 Switch Count 330

42 + proc append base=W.WANT data=W.OUT; run;

NOTE: Appending W.OUT to W.WANT.
NOTE: There were 9762421 observations read from the data set W.OUT.
NOTE: 9762421 observations added.
NOTE: The data set W.WANT has 68318352 observations and 4 variables.
NOTE: Compressing data set W.WANT decreased size by 80.00 percent.
NOTE: PROCEDURE APPEND used (Total process time):
real time 4.18 seconds
user cpu time 1.75 seconds
system cpu time 0.14 seconds
memory 11362.28k
OS Memory 37248.00k
Timestamp 15/09/2020 02:54:58 PM
Step Count 78 Switch Count 327

43 + proc sort data=W.TMP8; by ACCT DATE; run;

NOTE: Sorting was performed by the data source.
NOTE: There were 9756555 observations read from the data set W.TMP8.
NOTE: The data set W.TMP8 has 9756555 observations and 3 variables.
NOTE: Compressing data set W.TMP8 decreased size by 75.31 percent.
NOTE: PROCEDURE SORT used (Total process time):
real time 5.81 seconds
user cpu time 8.37 seconds
system cpu time 1.35 seconds
memory 64577.84k
OS Memory 89700.00k
Timestamp 15/09/2020 02:55:04 PM
Step Count 79 Switch Count 260

44 + proc sql;
44 + create table W.OUT as select a.*, mean(b.VALUE) as AVG
45 + from W.TMP8 a left join W.TMP8 b
46 + on a.ACCT=b.ACCT and b.DATE between a.DATE and a.DATE-3600*24
47 + group by a.ACCT, a.DATE, a.VALUE;
NOTE: SAS threaded sort was used.
NOTE: Compressing data set W.OUT decreased size by 80.01 percent.
NOTE: Table W.OUT created, with 9756555 rows and 4 columns.

47 + quit;
NOTE: PROCEDURE SQL used (Total process time):
real time 2:12.95
user cpu time 1:18.17
system cpu time 1.59 seconds
memory 982613.87k
OS Memory 1006356.00k
Timestamp 15/09/2020 02:57:17 PM
Step Count 80 Switch Count 328

48 + proc append base=W.WANT data=W.OUT; run;

NOTE: Appending W.OUT to W.WANT.
NOTE: There were 9756555 observations read from the data set W.OUT.
NOTE: 9756555 observations added.
NOTE: The data set W.WANT has 78074907 observations and 4 variables.
NOTE: Compressing data set W.WANT decreased size by 80.00 percent.
NOTE: PROCEDURE APPEND used (Total process time):
real time 6.59 seconds
user cpu time 1.73 seconds
system cpu time 0.12 seconds
memory 11363.43k
OS Memory 36224.00k
Timestamp 15/09/2020 02:57:24 PM
Step Count 81 Switch Count 326

49 + proc sort data=W.TMP9; by ACCT DATE; run;

NOTE: Sorting was performed by the data source.
NOTE: There were 9758082 observations read from the data set W.TMP9.
NOTE: The data set W.TMP9 has 9758082 observations and 3 variables.
NOTE: Compressing data set W.TMP9 decreased size by 75.31 percent.
NOTE: PROCEDURE SORT used (Total process time):
real time 6.29 seconds
user cpu time 7.17 seconds
system cpu time 1.29 seconds
memory 64596.15k
OS Memory 89188.00k
Timestamp 15/09/2020 02:57:30 PM
Step Count 82 Switch Count 295

50 + proc sql;
50 + create table W.OUT as select a.*, mean(b.VALUE) as AVG
51 + from W.TMP9 a left join W.TMP9 b
52 + on a.ACCT=b.ACCT and b.DATE between a.DATE and a.DATE-3600*24
53 + group by a.ACCT, a.DATE, a.VALUE;
NOTE: SAS threaded sort was used.
NOTE: Compressing data set W.OUT decreased size by 80.01 percent.
NOTE: Table W.OUT created, with 9758082 rows and 4 columns.

53 + quit;
NOTE: PROCEDURE SQL used (Total process time):
real time 1:32.81
user cpu time 1:16.87
system cpu time 2.00 seconds
memory 982736.64k
OS Memory 1006868.00k
Timestamp 15/09/2020 02:59:03 PM
Step Count 83 Switch Count 327

54 + proc append base=W.WANT data=W.OUT; run;

NOTE: Appending W.OUT to W.WANT.
NOTE: There were 9758082 observations read from the data set W.OUT.
NOTE: 9758082 observations added.
NOTE: The data set W.WANT has 87832989 observations and 4 variables.
NOTE: Compressing data set W.WANT decreased size by 80.00 percent.
NOTE: PROCEDURE APPEND used (Total process time):
real time 2.00 seconds
user cpu time 1.81 seconds
system cpu time 0.17 seconds
memory 11362.28k
OS Memory 36736.00k
Timestamp 15/09/2020 02:59:05 PM
Step Count 84 Switch Count 326

55 + proc sort data=W.TMP10; by ACCT DATE; run;

NOTE: Sorting was performed by the data source.
NOTE: There were 9762843 observations read from the data set W.TMP10.
NOTE: The data set W.TMP10 has 9762843 observations and 3 variables.
NOTE: Compressing data set W.TMP10 decreased size by 75.31 percent.
NOTE: PROCEDURE SORT used (Total process time):
real time 2.57 seconds
user cpu time 7.45 seconds
system cpu time 1.28 seconds
memory 64241.87k
OS Memory 88676.00k
Timestamp 15/09/2020 02:59:08 PM
Step Count 85 Switch Count 262

56 + proc sql;
56 + create table W.OUT as select a.*, mean(b.VALUE) as AVG
57 + from W.TMP10 a left join W.TMP10 b
58 + on a.ACCT=b.ACCT and b.DATE between a.DATE and a.DATE-3600*24
59 + group by a.ACCT, a.DATE, a.VALUE;
NOTE: SAS threaded sort was used.
NOTE: Compressing data set W.OUT decreased size by 80.01 percent.
NOTE: Table W.OUT created, with 9762843 rows and 4 columns.

59 + quit;
NOTE: PROCEDURE SQL used (Total process time):
real time 1:46.00
user cpu time 1:22.04
system cpu time 1.59 seconds
memory 983749.64k
OS Memory 1008152.00k
Timestamp 15/09/2020 03:00:54 PM
Step Count 86 Switch Count 292

60 + proc append base=W.WANT data=W.OUT; run;

NOTE: Appending W.OUT to W.WANT.
NOTE: There were 9762843 observations read from the data set W.OUT.
NOTE: 9762843 observations added.
NOTE: The data set W.WANT has 97595832 observations and 4 variables.
NOTE: Compressing data set W.WANT decreased size by 80.01 percent.
NOTE: PROCEDURE APPEND used (Total process time):
real time 2.79 seconds
user cpu time 1.89 seconds
system cpu time 0.07 seconds
memory 11362.50k
OS Memory 37248.00k
Timestamp 15/09/2020 03:00:57 PM
Step Count 87 Switch Count 332

61 + proc sort data=W.TMP11; by ACCT DATE; run;

NOTE: Sorting was performed by the data source.
NOTE: There were 9756001 observations read from the data set W.TMP11.
NOTE: The data set W.TMP11 has 9756001 observations and 3 variables.
NOTE: Compressing data set W.TMP11 decreased size by 75.31 percent.
NOTE: PROCEDURE SORT used (Total process time):
real time 4.82 seconds
user cpu time 7.37 seconds
system cpu time 1.51 seconds
memory 64484.43k
OS Memory 89956.00k
Timestamp 15/09/2020 03:01:02 PM
Step Count 88 Switch Count 352

62 + proc sql;
62 + create table W.OUT as select a.*, mean(b.VALUE) as AVG
63 + from W.TMP11 a left join W.TMP11 b
64 + on a.ACCT=b.ACCT and b.DATE between a.DATE and a.DATE-3600*24
65 + group by a.ACCT, a.DATE, a.VALUE;
NOTE: SAS threaded sort was used.
NOTE: Compressing data set W.OUT decreased size by 80.01 percent.
NOTE: Table W.OUT created, with 9756001 rows and 4 columns.

65 + quit;
NOTE: PROCEDURE SQL used (Total process time):
real time 1:31.79
user cpu time 1:16.17
system cpu time 1.62 seconds
memory 982736.67k
OS Memory 1007124.00k
Timestamp 15/09/2020 03:02:34 PM
Step Count 89 Switch Count 327

66 + proc append base=W.WANT data=W.OUT; run;

NOTE: Appending W.OUT to W.WANT.
NOTE: There were 9756001 observations read from the data set W.OUT.
NOTE: 9756001 observations added.
NOTE: The data set W.WANT has 107351833 observations and 4 variables.
NOTE: Compressing data set W.WANT decreased size by 80.01 percent.
NOTE: PROCEDURE APPEND used (Total process time):
real time 1.17 seconds
user cpu time 1.73 seconds
system cpu time 0.18 seconds
memory 11361.15k
OS Memory 36736.00k
Timestamp 15/09/2020 03:02:35 PM
Step Count 90 Switch Count 332

67 + proc sort data=W.TMP12; by ACCT DATE; run;

NOTE: Sorting was performed by the data source.
NOTE: There were 9758699 observations read from the data set W.TMP12.
NOTE: The data set W.TMP12 has 9758699 observations and 3 variables.
NOTE: Compressing data set W.TMP12 decreased size by 75.31 percent.
NOTE: PROCEDURE SORT used (Total process time):
real time 2.90 seconds
user cpu time 7.48 seconds
system cpu time 1.43 seconds
memory 64465.90k
OS Memory 89700.00k
Timestamp 15/09/2020 03:02:38 PM
Step Count 91 Switch Count 265

68 + proc sql;
68 + create table W.OUT as select a.*, mean(b.VALUE) as AVG
69 + from W.TMP12 a left join W.TMP12 b
70 + on a.ACCT=b.ACCT and b.DATE between a.DATE and a.DATE-3600*24
71 + group by a.ACCT, a.DATE, a.VALUE;
NOTE: SAS threaded sort was used.
NOTE: Compressing data set W.OUT decreased size by 80.01 percent.
NOTE: Table W.OUT created, with 9758699 rows and 4 columns.

71 + quit;
NOTE: PROCEDURE SQL used (Total process time):
real time 1:16.48
user cpu time 1:15.57
system cpu time 1.64 seconds
memory 982604.29k
OS Memory 1006868.00k
Timestamp 15/09/2020 03:03:55 PM
Step Count 92 Switch Count 316

72 + proc append base=W.WANT data=W.OUT; run;

NOTE: Appending W.OUT to W.WANT.
NOTE: There were 9758699 observations read from the data set W.OUT.
NOTE: 9758699 observations added.
NOTE: The data set W.WANT has 117110532 observations and 4 variables.
NOTE: Compressing data set W.WANT decreased size by 80.01 percent.
NOTE: PROCEDURE APPEND used (Total process time):
real time 2.06 seconds
user cpu time 1.85 seconds
system cpu time 0.10 seconds
memory 11361.87k
OS Memory 36736.00k
Timestamp 15/09/2020 03:03:57 PM
Step Count 93 Switch Count 322

73 + proc sort data=W.TMP13; by ACCT DATE; run;

NOTE: Sorting was performed by the data source.
NOTE: There were 9761100 observations read from the data set W.TMP13.
NOTE: The data set W.TMP13 has 9761100 observations and 3 variables.
NOTE: Compressing data set W.TMP13 decreased size by 75.31 percent.
NOTE: PROCEDURE SORT used (Total process time):
real time 3.48 seconds
user cpu time 7.21 seconds
system cpu time 1.42 seconds
memory 64370.68k
OS Memory 89188.00k
Timestamp 15/09/2020 03:04:00 PM
Step Count 94 Switch Count 262

74 + proc sql;
74 + create table W.OUT as select a.*, mean(b.VALUE) as AVG
75 + from W.TMP13 a left join W.TMP13 b
76 + on a.ACCT=b.ACCT and b.DATE between a.DATE and a.DATE-3600*24
77 + group by a.ACCT, a.DATE, a.VALUE;
NOTE: SAS threaded sort was used.
NOTE: Compressing data set W.OUT decreased size by 80.01 percent.
NOTE: Table W.OUT created, with 9761100 rows and 4 columns.

77 + quit;
NOTE: PROCEDURE SQL used (Total process time):
real time 1:48.70
user cpu time 1:18.07
system cpu time 1.85 seconds
memory 982606.70k
OS Memory 1007124.00k
Timestamp 15/09/2020 03:05:49 PM
Step Count 95 Switch Count 326

78 + proc append base=W.WANT data=W.OUT; run;

NOTE: Appending W.OUT to W.WANT.
NOTE: There were 9761100 observations read from the data set W.OUT.
NOTE: 9761100 observations added.
NOTE: The data set W.WANT has 126871632 observations and 4 variables.
NOTE: Compressing data set W.WANT decreased size by 80.01 percent.
NOTE: PROCEDURE APPEND used (Total process time):
real time 2.62 seconds
user cpu time 1.76 seconds
system cpu time 0.17 seconds
memory 11248.59k
OS Memory 36992.00k
Timestamp 15/09/2020 03:05:52 PM
Step Count 96 Switch Count 325

79 + proc sort data=W.TMP14; by ACCT DATE; run;

NOTE: Sorting was performed by the data source.
NOTE: There were 9757221 observations read from the data set W.TMP14.
NOTE: The data set W.TMP14 has 9757221 observations and 3 variables.
NOTE: Compressing data set W.TMP14 decreased size by 75.31 percent.
NOTE: PROCEDURE SORT used (Total process time):
real time 4.78 seconds
user cpu time 7.36 seconds
system cpu time 1.42 seconds
memory 64915.62k
OS Memory 89956.00k
Timestamp 15/09/2020 03:05:57 PM
Step Count 97 Switch Count 259

80 + proc sql;
80 + create table W.OUT as select a.*, mean(b.VALUE) as AVG
81 + from W.TMP14 a left join W.TMP14 b
82 + on a.ACCT=b.ACCT and b.DATE between a.DATE and a.DATE-3600*24
83 + group by a.ACCT, a.DATE, a.VALUE;
NOTE: SAS threaded sort was used.
NOTE: Compressing data set W.OUT decreased size by 80.01 percent.
NOTE: Table W.OUT created, with 9757221 rows and 4 columns.

83 + quit;
NOTE: PROCEDURE SQL used (Total process time):
real time 1:42.58
user cpu time 1:18.12
system cpu time 1.62 seconds
memory 982604.17k
OS Memory 1006868.00k
Timestamp 15/09/2020 03:07:39 PM
Step Count 98 Switch Count 322

84 + proc append base=W.WANT data=W.OUT; run;

NOTE: Appending W.OUT to W.WANT.
NOTE: There were 9757221 observations read from the data set W.OUT.
NOTE: 9757221 observations added.
NOTE: The data set W.WANT has 136628853 observations and 4 variables.
NOTE: Compressing data set W.WANT decreased size by 80.01 percent.
NOTE: PROCEDURE APPEND used (Total process time):
real time 3.28 seconds
user cpu time 1.75 seconds
system cpu time 0.21 seconds
memory 11250.18k
OS Memory 36480.00k
Timestamp 15/09/2020 03:07:43 PM
Step Count 99 Switch Count 326

85 + proc sort data=W.TMP15; by ACCT DATE; run;

NOTE: Sorting was performed by the data source.
NOTE: There were 9760533 observations read from the data set W.TMP15.
NOTE: The data set W.TMP15 has 9760533 observations and 3 variables.
NOTE: Compressing data set W.TMP15 decreased size by 75.31 percent.
NOTE: PROCEDURE SORT used (Total process time):
real time 5.00 seconds
user cpu time 8.04 seconds
system cpu time 1.14 seconds
memory 64594.12k
OS Memory 88932.00k
Timestamp 15/09/2020 03:07:48 PM
Step Count 100 Switch Count 257

86 + proc sql;
86 + create table W.OUT as select a.*, mean(b.VALUE) as AVG
87 + from W.TMP15 a left join W.TMP15 b
88 + on a.ACCT=b.ACCT and b.DATE between a.DATE and a.DATE-3600*24
89 + group by a.ACCT, a.DATE, a.VALUE;
NOTE: SAS threaded sort was used.
NOTE: Compressing data set W.OUT decreased size by 80.02 percent.
NOTE: Table W.OUT created, with 9760533 rows and 4 columns.

89 + quit;
NOTE: PROCEDURE SQL used (Total process time):
real time 1:52.97
user cpu time 1:18.12
system cpu time 1.34 seconds
memory 983635.42k
OS Memory 1007640.00k
Timestamp 15/09/2020 03:09:41 PM
Step Count 101 Switch Count 319

90 + proc append base=W.WANT data=W.OUT; run;

NOTE: Appending W.OUT to W.WANT.
NOTE: There were 9760533 observations read from the data set W.OUT.
NOTE: 9760533 observations added.
NOTE: The data set W.WANT has 146389386 observations and 4 variables.
NOTE: Compressing data set W.WANT decreased size by 80.01 percent.
NOTE: PROCEDURE APPEND used (Total process time):
real time 2.54 seconds
user cpu time 1.85 seconds
system cpu time 0.21 seconds
memory 11251.18k
OS Memory 36224.00k
Timestamp 15/09/2020 03:09:43 PM
Step Count 102 Switch Count 326

91 + proc sort data=W.TMP16; by ACCT DATE; run;

NOTE: Sorting was performed by the data source.
NOTE: There were 9757133 observations read from the data set W.TMP16.
NOTE: The data set W.TMP16 has 9757133 observations and 3 variables.
NOTE: Compressing data set W.TMP16 decreased size by 75.31 percent.
NOTE: PROCEDURE SORT used (Total process time):
real time 3.78 seconds
user cpu time 7.40 seconds
system cpu time 1.40 seconds
memory 64466.78k
OS Memory 88932.00k
Timestamp 15/09/2020 03:09:47 PM
Step Count 103 Switch Count 277

92 + proc sql;
92 + create table W.OUT as select a.*, mean(b.VALUE) as AVG
93 + from W.TMP16 a left join W.TMP16 b
94 + on a.ACCT=b.ACCT and b.DATE between a.DATE and a.DATE-3600*24
95 + group by a.ACCT, a.DATE, a.VALUE;
NOTE: SAS threaded sort was used.
NOTE: Compressing data set W.OUT decreased size by 80.01 percent.
NOTE: Table W.OUT created, with 9757133 rows and 4 columns.

95 + quit;
NOTE: PROCEDURE SQL used (Total process time):
real time 1:53.46
user cpu time 1:20.43
system cpu time 1.60 seconds
memory 982719.57k
OS Memory 1006868.00k
Timestamp 15/09/2020 03:11:41 PM
Step Count 104 Switch Count 319

96 + proc append base=W.WANT data=W.OUT; run;

NOTE: Appending W.OUT to W.WANT.
NOTE: There were 9757133 observations read from the data set W.OUT.
NOTE: 9757133 observations added.
NOTE: The data set W.WANT has 156146519 observations and 4 variables.
NOTE: Compressing data set W.WANT decreased size by 80.01 percent.
NOTE: PROCEDURE APPEND used (Total process time):
real time 1.18 seconds
user cpu time 1.79 seconds
system cpu time 0.12 seconds
memory 11249.75k
OS Memory 36992.00k
Timestamp 15/09/2020 03:11:42 PM
Step Count 105 Switch Count 332

97 + proc sort data=W.TMP17; by ACCT DATE; run;

NOTE: Sorting was performed by the data source.
NOTE: There were 9760740 observations read from the data set W.TMP17.
NOTE: The data set W.TMP17 has 9760740 observations and 3 variables.
NOTE: Compressing data set W.TMP17 decreased size by 75.31 percent.
NOTE: PROCEDURE SORT used (Total process time):
real time 5.68 seconds
user cpu time 8.01 seconds
system cpu time 1.39 seconds
memory 64707.06k
OS Memory 89956.00k
Timestamp 15/09/2020 03:11:48 PM
Step Count 106 Switch Count 288

98 + proc sql;
98 + create table W.OUT as select a.*, mean(b.VALUE) as AVG
99 + from W.TMP17 a left join W.TMP17 b
100 + on a.ACCT=b.ACCT and b.DATE between a.DATE and a.DATE-3600*24
101 + group by a.ACCT, a.DATE, a.VALUE;
NOTE: SAS threaded sort was used.
NOTE: Compressing data set W.OUT decreased size by 80.01 percent.
NOTE: Table W.OUT created, with 9760740 rows and 4 columns.

101 + quit;
NOTE: PROCEDURE SQL used (Total process time):
real time 1:15.79
user cpu time 1:15.35
system cpu time 1.15 seconds
memory 982718.82k
OS Memory 1007380.00k
Timestamp 15/09/2020 03:13:04 PM
Step Count 107 Switch Count 322

102 + proc append base=W.WANT data=W.OUT; run;

NOTE: Appending W.OUT to W.WANT.
NOTE: There were 9760740 observations read from the data set W.OUT.
NOTE: 9760740 observations added.
NOTE: The data set W.WANT has 165907259 observations and 4 variables.
NOTE: Compressing data set W.WANT decreased size by 80.01 percent.
NOTE: PROCEDURE APPEND used (Total process time):
real time 1.17 seconds
user cpu time 1.73 seconds
system cpu time 0.12 seconds
memory 11362.37k
OS Memory 37248.00k
Timestamp 15/09/2020 03:13:05 PM
Step Count 108 Switch Count 338

103 + proc sort data=W.TMP18; by ACCT DATE; run;

NOTE: Sorting was performed by the data source.
NOTE: There were 9763384 observations read from the data set W.TMP18.
NOTE: The data set W.TMP18 has 9763384 observations and 3 variables.
NOTE: Compressing data set W.TMP18 decreased size by 75.31 percent.
NOTE: PROCEDURE SORT used (Total process time):
real time 2.96 seconds
user cpu time 7.54 seconds
system cpu time 1.34 seconds
memory 64579.59k
OS Memory 89700.00k
Timestamp 15/09/2020 03:13:08 PM
Step Count 109 Switch Count 271

104 + proc sql;
104 + create table W.OUT as select a.*, mean(b.VALUE) as AVG
105 + from W.TMP18 a left join W.TMP18 b
106 + on a.ACCT=b.ACCT and b.DATE between a.DATE and a.DATE-3600*24
107 + group by a.ACCT, a.DATE, a.VALUE;
NOTE: SAS threaded sort was used.
NOTE: Compressing data set W.OUT decreased size by 80.01 percent.
NOTE: Table W.OUT created, with 9763384 rows and 4 columns.

107 + quit;
NOTE: PROCEDURE SQL used (Total process time):
real time 1:35.59
user cpu time 1:16.67
system cpu time 1.60 seconds
memory 983760.70k
OS Memory 1008152.00k
Timestamp 15/09/2020 03:14:43 PM
Step Count 110 Switch Count 322

108 + proc append base=W.WANT data=W.OUT; run;

NOTE: Appending W.OUT to W.WANT.
NOTE: There were 9763384 observations read from the data set W.OUT.
NOTE: 9763384 observations added.
NOTE: The data set W.WANT has 175670643 observations and 4 variables.
NOTE: Compressing data set W.WANT decreased size by 80.01 percent.
NOTE: PROCEDURE APPEND used (Total process time):
real time 3.76 seconds
user cpu time 1.59 seconds
system cpu time 0.20 seconds
memory 11361.00k
OS Memory 36992.00k
Timestamp 15/09/2020 03:14:47 PM
Step Count 111 Switch Count 330

109 + proc sort data=W.TMP19; by ACCT DATE; run;

NOTE: Sorting was performed by the data source.
NOTE: There were 9762164 observations read from the data set W.TMP19.
NOTE: The data set W.TMP19 has 9762164 observations and 3 variables.
NOTE: Compressing data set W.TMP19 decreased size by 75.31 percent.
NOTE: PROCEDURE SORT used (Total process time):
real time 6.73 seconds
user cpu time 7.20 seconds
system cpu time 1.20 seconds
memory 64466.46k
OS Memory 89188.00k
Timestamp 15/09/2020 03:14:54 PM
Step Count 112 Switch Count 271

110 + proc sql;
110 + create table W.OUT as select a.*, mean(b.VALUE) as AVG
111 + from W.TMP19 a left join W.TMP19 b
112 + on a.ACCT=b.ACCT and b.DATE between a.DATE and a.DATE-3600*24
113 + group by a.ACCT, a.DATE, a.VALUE;
NOTE: SAS threaded sort was used.
NOTE: Compressing data set W.OUT decreased size by 80.01 percent.
NOTE: Table W.OUT created, with 9762164 rows and 4 columns.

113 + quit;
NOTE: PROCEDURE SQL used (Total process time):
real time 2:10.28
user cpu time 1:13.26
system cpu time 1.45 seconds
memory 982720.64k
OS Memory 1007124.00k
Timestamp 15/09/2020 03:17:05 PM
Step Count 113 Switch Count 317

114 + proc append base=W.WANT data=W.OUT; run;

NOTE: Appending W.OUT to W.WANT.
NOTE: There were 9762164 observations read from the data set W.OUT.
NOTE: 9762164 observations added.
NOTE: The data set W.WANT has 185432807 observations and 4 variables.
NOTE: Compressing data set W.WANT decreased size by 80.01 percent.
NOTE: PROCEDURE APPEND used (Total process time):
real time 2.78 seconds
user cpu time 1.70 seconds
system cpu time 0.07 seconds
memory 11360.56k
OS Memory 36992.00k
Timestamp 15/09/2020 03:17:08 PM
Step Count 114 Switch Count 316

115 + proc sort data=W.TMP20; by ACCT DATE; run;

NOTE: Sorting was performed by the data source.
NOTE: There were 9763895 observations read from the data set W.TMP20.
NOTE: The data set W.TMP20 has 9763895 observations and 3 variables.
NOTE: Compressing data set W.TMP20 decreased size by 75.31 percent.
NOTE: PROCEDURE SORT used (Total process time):
real time 5.15 seconds
user cpu time 7.07 seconds
system cpu time 1.65 seconds
memory 64594.12k
OS Memory 89700.00k
Timestamp 15/09/2020 03:17:13 PM
Step Count 115 Switch Count 257

116 + proc sql;
116 + create table W.OUT as select a.*, mean(b.VALUE) as AVG
117 + from W.TMP20 a left join W.TMP20 b
118 + on a.ACCT=b.ACCT and b.DATE between a.DATE and a.DATE-3600*24
119 + group by a.ACCT, a.DATE, a.VALUE;
NOTE: SAS threaded sort was used.
NOTE: Compressing data set W.OUT decreased size by 80.01 percent.
NOTE: Table W.OUT created, with 9763895 rows and 4 columns.

119 + quit;
NOTE: PROCEDURE SQL used (Total process time):
real time 1:25.20
user cpu time 1:17.31
system cpu time 1.64 seconds
memory 983744.17k
OS Memory 1008408.00k
Timestamp 15/09/2020 03:18:38 PM
Step Count 116 Switch Count 306

120 + proc append base=W.WANT data=W.OUT; run;

NOTE: Appending W.OUT to W.WANT.
NOTE: There were 9763895 observations read from the data set W.OUT.
NOTE: 9763895 observations added.
NOTE: The data set W.WANT has 195196702 observations and 4 variables.
NOTE: Compressing data set W.WANT decreased size by 80.01 percent.
NOTE: PROCEDURE APPEND used (Total process time):
real time 1.98 seconds
user cpu time 1.73 seconds
system cpu time 0.15 seconds
memory 11362.09k
OS Memory 37504.00k
Timestamp 15/09/2020 03:18:40 PM
Step Count 117 Switch Count 326

59
60 /* Insert custom code after submitted code here */
61 run;%z_eg_post_run;
###################################################
# Elapse time: 00h38m24s
###################################################
# SYSCC=0 SYSERR=0
###################################################

37 minutes to split and match the 200 million records in 20 chunks.

I have no idea why proc sql sorts the data again, as indicated by the message

NOTE: SAS threaded sort was used.

There is room for gains here, if one can figure out what on earth is going on.

Ksharp
Super User
data sample;
    input customerid : 7.
	transactionid: $13.
        date : datetime26.
        value : 3.
    ;
    format date datetime26.;
datalines;
2133780 2133780_00355 15JUN2020:12:47:17.000000   53
2133780 2133780_00687 15JUN2020:12:48:35.000000   72
2133780 2133780_00181 15JUN2020:12:59:44.000000   88
2133780 2133780_00273 15JUN2020:16:43:46.000000   96
2133780 2133780_00803 16JUN2020:22:30:15.000000   25
2133780 2133780_00411 17JUN2020:07:22:18.000000   75
2133780 2133780_00191 17JUN2020:23:16:39.000000   59
2133780 2133780_00461 14JUL2020:09:35:32.000000   22
2133780 2133780_00532 21JUL2020:10:40:54.000000   46
2133780 2133780_00609 28JUL2020:09:58:30.000000   36
2133780 2133780_00504 28JUL2020:10:12:55.000000   87
2133780 2133780_00488 31JUL2020:15:30:44.000000   36
2689999 2689999_00284 02JUN2020:09:03:55.000000   44
2689999 2689999_00734 06JUN2020:11:20:23.000000   95
2689999 2689999_00061 06JUN2020:13:40:22.000000   29
2689999 2689999_00540 07JUN2020:14:00:19.000000   14
2689999 2689999_00616 08JUN2020:13:36:24.000000   14
2689999 2689999_00002 08JUN2020:13:59:32.000000   82
2689999 2689999_00966 04JUL2020:11:29:49.000000   21
2689999 2689999_00859 10JUL2020:14:46:37.000000   34
2689999 2689999_00816 29JUL2020:13:56:45.000000   95
2689999 2689999_00045 29JUL2020:15:12:19.000000   42
2689999 2689999_00481 02AUG2020:11:02:21.000000   38
2689999 2689999_00227 02AUG2020:11:03:41.000000   42
2689999 2689999_00354 02AUG2020:12:26:34.000000   42
2689999 2689999_00794 03AUG2020:09:10:43.000000   73
2689999 2689999_00857 20AUG2020:15:03:19.000000   16
;
run;
data sample;
 set sample;
 date=int(date);
run;
/*Assuming there are not duplicated date within a customerid */
data want;
 if _n_=1 then do;
   if 0 then set sample;
   declare hash h();
   h.definekey('date');
   h.definedata('value');
   h.definedone();
 end;

h.clear();
do until(last.customerid);
 set sample;
 by customerid;
 h.add();
end;

array x{99999} _temporary_;
do until(last.customerid);
 set sample;
 by customerid;
 i=0;call missing(of x{*});
 do temp=intnx('hour',date,-24,'s') to date;
   	if h.find(key:temp)=0 then do;i+1;x{i}=value;end;
 end;
 rolling_mean=mean(of x{*});
 output;
end;

drop i temp;
run;
Array_Mover
Obsidian | Level 7

Wow, this works flawlessly on the table I provided.

 

There could be duplicates in the date field, so I provided a 'transactionid' to show unique transactions. Fraudsters can really hammer merchants with quick transactions and if some transactions are processed in batches, the dates might match.

 

I can sort of wrap my head around your code.  It is leveraging the hash table to do a lookup.  If there were a duplicate date, would it just stop at the first instance of that date occurring?

Ksharp
Super User
. If there were a duplicate date, Just sum value as

data sample;
set sample;
date=int(date);
run;
proc sql;
create table temp as
select customerid,date,sum(value) as value
from sample
group by customerid,date;
quit;

data sample;
set temp;
run;
mkeintz
PROC Star

If the data were sorted by timestamp/customerid, then I would go to the hash approach - in fact you would need a hash-of-hashes to enable tracking of a varying number of qualifying records over a varying number of customerid's.

 

But this data is sorted by customerid/timestamp  (where "date" is the misleading name for timestamp).  So let's assume it is impossible for there to be more than 400 transactions per customerid.  Then:

 

data sample;
    input customerid : 7.
	transactionid: $13.
        date : datetime26.
        value : 3.
    ;
    format date datetime26.;

datalines;
2133780 2133780_00355 15JUN2020:12:47:17.000000   53
2133780 2133780_00687 15JUN2020:12:48:35.000000   72
2133780 2133780_00181 15JUN2020:12:59:44.000000   88
2133780 2133780_00273 15JUN2020:16:43:46.000000   96
2133780 2133780_00803 16JUN2020:22:30:15.000000   25
2133780 2133780_00411 17JUN2020:07:22:18.000000   75
2133780 2133780_00191 17JUN2020:23:16:39.000000   59
2133780 2133780_00461 14JUL2020:09:35:32.000000   22
2133780 2133780_00532 21JUL2020:10:40:54.000000   46
2133780 2133780_00609 28JUL2020:09:58:30.000000   36
2133780 2133780_00504 28JUL2020:10:12:55.000000   87
2133780 2133780_00488 31JUL2020:15:30:44.000000   36
2689999 2689999_00284 02JUN2020:09:03:55.000000   44
2689999 2689999_00734 06JUN2020:11:20:23.000000   95
2689999 2689999_00061 06JUN2020:13:40:22.000000   29
2689999 2689999_00540 07JUN2020:14:00:19.000000   14
2689999 2689999_00616 08JUN2020:13:36:24.000000   14
2689999 2689999_00002 08JUN2020:13:59:32.000000   82
2689999 2689999_00966 04JUL2020:11:29:49.000000   21
2689999 2689999_00859 10JUL2020:14:46:37.000000   34
2689999 2689999_00816 29JUL2020:13:56:45.000000   95
2689999 2689999_00045 29JUL2020:15:12:19.000000   42
2689999 2689999_00481 02AUG2020:11:02:21.000000   38
2689999 2689999_00227 02AUG2020:11:03:41.000000   42
2689999 2689999_00354 02AUG2020:12:26:34.000000   42
2689999 2689999_00794 03AUG2020:09:10:43.000000   73
2689999 2689999_00857 20AUG2020:15:03:19.000000   16

;
run;

%let interval=24:00:00;

data rolling_recent_counts (drop= j _:);
  set sample;
  by customerid;
  array _dats{400} _temporary_;
  array _vals{400} _temporary_;
  if first.customerid then call missing(of _n,_value_sum,_jlast);

  _n+1;
  _value_sum+value;

  _dats{_n}=date;
  _vals{_n}=value;

  _jlast+0;  /*Lazy way to retain, and reset missing at first.customerid above to zero */
  do j=_jlast+1 by 1 while (_dats{j} < date - "&interval"t); 
    _value_sum = _value_sum - _vals{j};
	call missing(_dats{j},_vals{j});
	_jlast=j;
  end;
  n_values=_n - _jlast;
  avg_value=_value_sum/n_values;
run;

The variable _N is the record number for the current customerid, initialized to 1 for each customer.  And _JLAST is the record number for the most recent non-qualifying record, so it is initialized to zero for each customer.

 

If a customer can have more than 400 records, then just change the size of the _VALS and _DATS arrays to accomodate the largest possible record count.

 

Edit comment:  I took out an extraneous statement with a PUT statement I was using to track the program logic. 

 

Also note you can set the interval however you want in the %let statement.

 

--------------------------
The hash OUTPUT method will overwrite a SAS data set, but not append. That can be costly. Consider voting for Add a HASH object method which would append a hash object to an existing SAS data set

Would enabling PROC SORT to simultaneously output multiple datasets be useful? Then vote for
Allow PROC SORT to output multiple datasets

--------------------------
Array_Mover
Obsidian | Level 7

@mkeintz wrote:

If the data were sorted by timestamp/customerid, then I would go to the hash approach - in fact you would need a hash-of-hashes to enable tracking of a varying number of qualifying records over a varying number of customerid's.

 


Actually the transaction data is sorted by timestamp/customerid in the database where I actually need to do the work.  I thought the sample data sorted by timestamp would be the best way to illustrate my problem.

 

I was introduced to the concept on Hash of Hashes by watching a SAS training video on youtube Paul Dorfman and Don Henderson presented. I then checked the book out from a university library.  I have no formal training in software engineering or programming, so I find the concepts difficult to grasp. I feel like I have a very basic understanding, but am having trouble applying the concepts in the book to my own real world scenario.

 

Would you mind tackling the problem as if the data were sorted by timestamp/customerid? 

DonH
Lapis Lazuli | Level 10

Mark makes a good point about this being a problem that can be done with the Hash of Hash approach. However, that approach requires that you can fit all the transactions for all the customers into memory. If you can't, you will have to do some sort of looping. Given that you have to implement looping thru customers (I'm assuming that each customer is processed independently), the benefit of the Hash of Hash approach (a separate hash object for each customer) may not be worth it (unless you also have lots of customers.

There are multiple ways to do this looping. Before I suggest anything can you provide any details about how many customers you could have in your input file and a rough guesstimate as to the maximum number of transactions a customer may ?

SAS Innovate 2025: Call for Content

Are you ready for the spotlight? We're accepting content ideas for SAS Innovate 2025 to be held May 6-9 in Orlando, FL. The call is open until September 25. Read more here about why you should contribute and what is in it for you!

Submit your idea!

How to Concatenate Values

Learn how use the CAT functions in SAS to join values from multiple variables into a single value.

Find more tutorials on the SAS Users YouTube channel.

Click image to register for webinarClick image to register for webinar

Classroom Training Available!

Select SAS Training centers are offering in-person courses. View upcoming courses for:

View all other training opportunities.

Discussion stats
  • 31 replies
  • 2904 views
  • 16 likes
  • 6 in conversation