BookmarkSubscribeRSS Feed
JLW192
Calcite | Level 5

I have the following code that is being used generate running totals of features for the past 1 day, 7 days, 1 month, 3 months, and 6 months. 

 

LIBNAME A "C:\Users\James\Desktop\data\Base Data";
LIBNAME DATA "C:\Users\James\Desktop\data\Data1";

%MACRO HELPER(P);

data a1;
set data.final_master_&P. ;
QUERY = '%TEST('||STRIP(DATETIME)||','||STRIP(PARTICIPANT)||');';
CALL EXECUTE(QUERY);
run;

%MEND;

%MACRO TEST(TIME,PAR);
proc sql; select SUM(APP_1), SUM(APP_2), sum(APP_3), SUM(APP_4), SUM(APP_5) INTO :APP_1_24, :APP_2_24, :APP_3_24, :APP_4_24, :APP_5_24 FROM A1 WHERE DATETIME BETWEEN INTNX('SECONDS',&TIME.,-60*60*24) AND &TIME.; /* 7 Days */ select SUM(APP_1), SUM(APP_2), sum(APP_3), SUM(APP_4), SUM(APP_5) INTO :APP_1_7DAY, :APP_2_7DAY, :APP_3_7DAY, :APP_4_7DAY, :APP_5_7DAY FROM A1 WHERE DATETIME BETWEEN INTNX('SECONDS',&TIME.,-60*60*24*7) AND &TIME.; /* One Month */ select SUM(APP_1), SUM(APP_2), sum(APP_3), SUM(APP_4), SUM(APP_5) INTO :APP_1_1MONTH, :APP_2_1MONTH, :APP_3_1MONTH, :APP_4_1MONTH, :APP_5_1MONTH FROM A1 WHERE DATETIME BETWEEN INTNX('SECONDS',&TIME.,-60*60*24*7*4) AND &TIME.; /* Three Months */ select SUM(APP_1), SUM(APP_2), sum(APP_3), SUM(APP_4), SUM(APP_5) INTO :APP_1_3MONTH, :APP_2_3MONTH, :APP_3_3MONTH, :APP_4_3MONTH, :APP_5_3MONTH FROM A1 WHERE DATETIME BETWEEN INTNX('SECONDS',&TIME.,-60*60*24*7*4*3) AND &TIME.; /* Six Months */ select SUM(APP_1), SUM(APP_2), sum(APP_3), SUM(APP_4), SUM(APP_5) INTO :APP_1_6MONTH, :APP_2_6MONTH, :APP_3_6MONTH, :APP_4_6MONTH, :APP_5_6MONTH FROM A1 WHERE DATETIME BETWEEN INTNX('SECONDS',&TIME.,-60*60*24*7*4*6) AND &TIME.; quit; DATA T; PARTICIPANT = &PAR.; DATETIME = &TIME; APP_1_24 = &APP_1_24.; APP_2_24 = &APP_2_24.; APP_3_24 = &APP_3_24.; APP_4_24 = &APP_4_24.; APP_5_24 = &APP_5_24.; APP_1_7DAY = &APP_1_7DAY.; APP_2_7DAY = &APP_2_7DAY.; APP_3_7DAY = &APP_3_7DAY.; APP_4_7DAY = &APP_4_7DAY.; APP_5_7DAY = &APP_5_7DAY.; APP_1_1MONTH = &APP_1_1MONTH.; APP_2_1MONTH = &APP_2_1MONTH.; APP_3_1MONTH = &APP_3_1MONTH.; APP_4_1MONTH = &APP_4_1MONTH.; APP_5_1MONTH = &APP_5_1MONTH.; APP_1_3MONTH = &APP_1_3MONTH.; APP_2_3MONTH = &APP_2_3MONTH.; APP_3_3MONTH = &APP_3_3MONTH.; APP_4_3MONTH = &APP_4_3MONTH.; APP_5_3MONTH = &APP_5_3MONTH.; APP_1_6MONTH = &APP_1_6MONTH.; APP_2_6MONTH = &APP_2_6MONTH.; APP_3_6MONTH = &APP_3_6MONTH.; APP_4_6MONTH = &APP_4_6MONTH.; APP_5_6MONTH = &APP_5_6MONTH.; FORMAT DATETIME DATETIME.; RUN; PROC APPEND BASE=DATA.FLAGS_&par. DATA=T; RUN; %MEND; %helper(1);

This code runs perfectly if I limit the number of observations in the %helper macro, using an (obs=) in the creation of the a1 dataset. However, when I put no limit on the obs number, i.e. execute the %test macro for every row in the dataset a1, I get errors. In SAS EG, I get a "server disconnected" popup after the status bar hangs at "running data step", and on Base SAS 9.4 I get the error that none of the macro variables have been resolved that are created in the proc sql into. 

 

I'm confused as the code works fine for a limited amount of observations, but when trying on the whole dataset it hangs or gives errors. The dataset I'm doing this for has around 130,000 observations.

 

Any ideas?

3 REPLIES 3
Kurt_Bremser
Super User

If

data.final_master_&P.

actually contains 130000 observations, you're trying to load your test macro (which contains the proc sql and the data step) 130000 times into the execution queue before SAS has the opportunity to run anything off that. That probably cracks the internal structures and causes a memory-related crash.

 

If you supplied some example data from

data.final_master_&P.

(data step, please!) we might be able to optimize what you are trying to do.

JLW192
Calcite | Level 5

Hi Kurt,

 

Thanks for getting back to me, heres a sample of the dataset:

 

DATA final_master_sample;
infile datalines dsd truncover;
INPUT PARTICIPANT DATETIME APP_1 APP_2 APP_3 APP_4 APP_5 ;
DATALINES;
1,1615939200,21,1,96,3,94
1,1615939501,64,23,97,7,11
1,1615939802,58,42,42,50,93
1,1615940103,77,74,82,21,44
1,1615940404,46,47,42,32,96
1,1615940705,45,55,7,42,71
1,1615941006,49,41,21,100,39
1,1615941307,82,85,46,86,5
1,1615941608,11,33,86,40,43
1,1615941909,19,21,84,3,29
1,1615942210,45,100,11,55,49
1,1615942511,58,67,13,15,47
1,1615942812,47,56,55,13,50
1,1615943113,76,14,3,10,41
1,1615943414,27,56,31,79,17
1,1615943715,32,85,94,40,23
1,1615944016,84,23,80,94,36
1,1615944317,37,13,17,49,54
1,1615944618,45,78,45,78,96
1,1615944919,2,43,24,55,57
1,1615945220,37,70,3,61,10
1,1615945521,78,95,33,30,6
1,1615945822,94,18,58,86,73
1,1615946123,79,70,94,32,8
1,1615946424,40,95,4,60,50
1,1615946725,98,8,54,55,44
1,1615947026,32,54,4,89,47
1,1615947327,17,5,31,80,39
1,1615947628,18,52,84,32,62
1,1615947929,59,27,2,45,50
1,1615948230,17,55,100,74,6
1,1615948531,18,84,9,8,14
1,1615948832,5,34,63,96,75
1,1615949133,66,71,18,50,70
1,1615949434,4,16,76,53,93
1,1615949735,3,58,30,27,5
1,1615950036,49,50,28,78,44
1,1615950337,68,66,58,13,7
;

run;

I had a feeling it would be something to do with memory, is there a way to execute the macro without loading it in every time? The rest of the data has this structure, with (roughly) 5 minute intervals between each row, some are missing however. Thanks!

Kurt_Bremser
Super User

So this looks like a rolling sum problem.

Try this to create just one of the sums:

proc sql;
create table a1 as
select
  a.participant,
  a.datetime,
  (
    select sum(app_1)
    from final_master_sample b
    where
      a.participant = b.participant and
      b.DATETIME between intnx('seconds',a.datetime,-60*60*24) and a.datetime
  ) as app_1_24
from final_master_sample a;
quit;

to see how much time this takes. If runtime is OK, expand for further values step-by-step.

I have the suspicion that this method will start to perform very badly quite quickly.

 

My preferred method for building rolling sums would be to create a correct sequence (ie your 5 minutes interval) first, inserting records with 0 values for every missing time point.

Then you can build an array that contains the rolling values (a FIFO queue) for your given time frame.

data want;
set have;
by participant;
array min_5 {288} app_1_1-app_1_288;
if first.participant
then do i = 1 to 288;
  min_5{i} = 0;
end;
do i = 1 to 287;
  min_5{i} = min_5{i+1};
end;
min_5{288} = app_1;
sum_app_1_1 = sum(of app_1_1-app_1_288);
drop i app_1_1-app_1_288;
run;

This is much better performancewise, as it only requires 1 sequential pass through the dataset (and probably a sort before that).

For longer time frames I would reduce the time-resolution (ie for months I would sum up the days first) to keep the array small enough so it fits into memory.

sas-innovate-2024.png

Join us for SAS Innovate April 16-19 at the Aria in Las Vegas. Bring the team and save big with our group pricing for a limited time only.

Pre-conference courses and tutorials are filling up fast and are always a sellout. Register today to reserve your seat.

 

Register now!

How to Concatenate Values

Learn how use the CAT functions in SAS to join values from multiple variables into a single value.

Find more tutorials on the SAS Users YouTube channel.

Click image to register for webinarClick image to register for webinar

Classroom Training Available!

Select SAS Training centers are offering in-person courses. View upcoming courses for:

View all other training opportunities.

Discussion stats
  • 3 replies
  • 616 views
  • 0 likes
  • 2 in conversation