BookmarkSubscribeRSS Feed
tush974
Calcite | Level 5

I want to perform linear regression after every 7 days upto the total days for different serial id's. What I have done is, I used the macro function and iterated upto total days by 7... and preformed the regression... but this process is taking lot of time. Is there any better way to do it?

 

 

%MACRO LR_WEEK(TABLE);
PROC DATASETS LIBRARY=WORK NODETAILS NOLIST;
DELETE LR_PARTS;
RUN;

PROC SQL NOPRINT;
SELECT DISTINCT SERIAL_id
INTO :SERIAL_ARRAY SEPARATED BY ' '
FROM &TABLE
QUIT;

PROC SQL NOPRINT;
SELECT COUNT (DISTINCT SERIAL_id) AS TOT_DISTINCT_SERIAL
INTO :TOT_DISTINCT_SERIAL_1
FROM &TABLE
QUIT;

%DO i = 1 %TO &TOT_DISTINCT_SERIAL_1.;

PROC SQL NOPRINT;
SELECT DISTINCT CYCLE_COUNTER
INTO :CYCLE_COUNTER_ARRAY SEPARATED BY ' '
FROM &TABLE
WHERE SERIAL_NR_N = %SCAN(&SERIAL_ARRAY.,&i.);
QUIT;

PROC SQL NOPRINT;
SELECT COUNT (DISTINCT CYCLE_COUNTER) AS TOT_DISTINCT_CYCLE_COUNTER
INTO :TOT_DISTINCT_CYCLE_COUNTER_1
FROM &TABLE
WHERE SERIAL_id = %SCAN(&SERIAL_ARRAY.,&i.);
QUIT;

%DO j = 1 %TO &TOT_DISTINCT_CYCLE_COUNTER_1.;

PROC SQL NOPRINT;
SELECT MAX(TOTAL_DAYS)
INTO :TOTAL_DAYS
FROM &TABLE
WHERE SERIAL_id = %SCAN(&SERIAL_ARRAY.,&i.) AND CYCLE_COUNTER = %SCAN(&CYCLE_COUNTER_ARRAY.,&j.) ;
QUIT;

%IF &TOTAL_DAYS. > 30 %THEN %DO;
%DO k=7 %TO &TOTAL_DAYS. %BY 7;

DATA LR_INP;
SET &TABLE;
BY MATERIAL_NR_N SERIAL_id CYCLE_COUNTER;
WHERE SERIAL_id=%SCAN(&SERIAL_ARRAY.,&i.) AND CYCLE_COUNTER = %SCAN(&CYCLE_COUNTER_ARRAY.,&j.) AND DAYS_ELAPSED<&k AND TIME > 5;
RUN;

PROC REG DATA=LR_INP OUTEST=SIO_LR_VAR NOPRINT;
BY MATERIAL_NR_N SERIAL_id CYCLE_COUNTER;
MODEL SIO_MA_5 = DAYS_ELAPSED;
RUN;

DATA LR_ADD;
SET SIO_LR_VAR;
BY MATERIAL_NR_N SERIAL_id CYCLE_COUNTER;
DAYS=&k;
RUN;

PROC APPEND BASE=WORK.LR_PARTS DATA=LR_ADD;
RUN;

PROC SORT DATA=LR_PARTS;
BY MATERIAL_NR_N SERIAL_id CYCLE_COUNTER;
RUN;

PROC SQL;
CREATE TABLE WORK.LR_JOIN AS
SELECT A.*,B.INTERCEPT,B.DAYS_ELAPSED AS LR_SLOPE
FROM &TABLE A LEFT JOIN LR_PARTS B
ON A.SERIAL_id=B.SERIALid AND A.CYCLE_COUNTER=B.CYCLE_COUNTER AND A.DAYS_ELAPSED=B.DAYS
ORDER BY SERIAL_id,CYCLE_COUNTER,DATE_N;
QUIT;
%END;
%END;
%END;
%END;
%MEND;

3 REPLIES 3
Kurt_Bremser
Super User

SAS does not need shouting, so using lowercase is possible and preferred.

Use the "little running man" icon for posting code, so that the formatting and code is preserved as-is.

But if that is really the way you write code, see Maxim 12.

mkeintz
PROC Star

You have

  1. An outer loop:
    1. %DO i = 1 %TO &TOT_DISTINCT_SERIAL_1.;
  2. A middle loop:
    1. %DO j = 1 %TO &TOT_DISTINCT_CYCLE_COUNTER_1.;
  3. And a conditional inner loop"
    1. %IF &TOTAL_DAYS. > 30 %THEN %DO;
    2. %DO k=7 %TO &TOTAL_DAYS. %BY 7;

You can probably eliminate the inner loop which iteratively makes a series dataset and runs a regression on each one.  Instead you can simultaneously make all the datasets, concatenate them into a single data set view (with new variable DAYS=macrovar &K, and run a regression adding the new by-variable DAYS.

 

That is you can replace this:

 

 

%IF &TOTAL_DAYS. > 30 %THEN %DO;
  %DO k=7 %TO &TOTAL_DAYS. %BY 7;
    DATA LR_INP;
      SET &TABLE;
      BY MATERIAL_NR_N SERIAL_id CYCLE_COUNTER;
      WHERE SERIAL_id=%SCAN(&SERIAL_ARRAY.,&i.) AND CYCLE_COUNTER = %SCAN(&CYCLE_COUNTER_ARRAY.,&j.) AND DAYS_ELAPSED<&k AND TIME > 5;
    RUN;

    PROC REG DATA=LR_INP OUTEST=SIO_LR_VAR NOPRINT;
      BY MATERIAL_NR_N SERIAL_id CYCLE_COUNTER;
      MODEL SIO_MA_5 = DAYS_ELAPSED;
    RUN;

    DATA LR_ADD;
      SET SIO_LR_VAR;
      BY MATERIAL_NR_N SERIAL_id CYCLE_COUNTER;
      DAYS=&k;
    RUN;

    PROC APPEND BASE=WORK.LR_PARTS DATA=LR_ADD;
    RUN;

    PROC SORT DATA=LR_PARTS;
      BY MATERIAL_NR_N SERIAL_id CYCLE_COUNTER;
    RUN;

    PROC SQL;
      CREATE TABLE WORK.LR_JOIN AS
      SELECT A.*,B.INTERCEPT,B.DAYS_ELAPSED AS LR_SLOPE
      FROM &TABLE A LEFT JOIN LR_PARTS B
      ON A.SERIAL_id=B.SERIALid AND A.CYCLE_COUNTER=B.CYCLE_COUNTER AND A.DAYS_ELAPSED=B.DAYS
      ORDER BY SERIAL_id,CYCLE_COUNTER,DATE_N;
    QUIT;
  %END;
%END;

 

 

with this:

 

 

%if &total_days. > 30 %then %do; 
  data
    %do k=7 %to %total_days. %by 7;
      lr_inp&k  (where=(days_elapsed<&k)) 
    %end; ;
    set &table;
    by material_nr_n serial_id cycle_counter;
    where serial_id=%scan(&serial_array.,&i)
    and cycle_counter = %scan(&cycle_counter_array.,&j.) 
    and time > 5 ;
  run;

  data vtemp / view=vtemp;
    set
    %do k=7 %to %total_days. %by 7;
     lr_inp&k (in=in&k)
    %end; ;
    %do k=7 %to %total_days. %by 7;
      if in&k then days=&k; else
    %end; ;
  run;

  proc reg data=vtemp  outest=lr_add NOPRINT;
    by days material_nr_n serial_id cycle_counter;
    model sio_ma_5 = days_elapsed;
  run;

  proc append base=work.lr_parts data=lr_add;
  run;

  proc sort data=lr_parts;
    by material_nr_n serial_id cycle_counter days;
  run;

  proc sql;
    create table work.lr_join as
    select a.*,b.intercept,b.days_elapsed as lr_slope
    from &table a left join lr_parts b
    on a.serial_id=b.serialid and a.cycle_counter=b.cycle_counter and a.days_elapsed=b.days
    order by serial_id,cycle_counter,date_n;
  quit;
%end;

 

Note I have a number of loop statements
      %do k=7 %to %total_days. %by 7;
But they are not looping over proc steps and data steps.  Instead they just do things like

 

replace DATA LR_INP   
with   data lr_inp7 (where=(days_elapsed<7))   lr_inp14 (where=days_elapsed<14)) ...

 

So the loop produces much more efficient code.

 

You can also replace

PROC SQL NOPRINT;
SELECT DISTINCT SERIAL_id
INTO :SERIAL_ARRAY SEPARATED BY ' '
FROM &TABLE
QUIT;

PROC SQL NOPRINT;
SELECT COUNT (DISTINCT SERIAL_id) AS TOT_DISTINCT_SERIAL
INTO :TOT_DISTINCT_SERIAL_1
FROM &TABLE
QUIT;

with  (no need to run the second proc sql, just use the automatic variable &sqlobs to get the count of rows generated.

PROC SQL NOPRINT;
SELECT DISTINCT SERIAL_id
INTO :SERIAL_ARRAY SEPARATED BY ' '
FROM &TABLE
QUIT;
%let tot_distinct_serial=&sqlobs;

 

--------------------------
The hash OUTPUT method will overwrite a SAS data set, but not append. That can be costly. Consider voting for Add a HASH object method which would append a hash object to an existing SAS data set

Would enabling PROC SORT to simultaneously output multiple datasets be useful? Then vote for
Allow PROC SORT to output multiple datasets

--------------------------
PaigeMiller
Diamond | Level 26

INTO Smiley FrustratedERIAL_ARRAY SEPARATED BY ' '

 

Please, click on the running man icon and paste your code into the box that appears. Not only will this format the code properly, but it will avoid these unintended smiley faces.

 

If it is taking a "lot of time", can you give us some idea of what a "lot of time" means? Is it 10 minutes? 48.3 minutes?

 

You are performing lots and lots of iterations, can you tell us how many iterations the code goes through?

--
Paige Miller

sas-innovate-white.png

Register Today!

Join us for SAS Innovate 2025, our biggest and most exciting global event of the year, in Orlando, FL, from May 6-9.

 

Early bird rate extended! Save $200 when you sign up by March 31.

Register now!

Mastering the WHERE Clause in PROC SQL

SAS' Charu Shankar shares her PROC SQL expertise by showing you how to master the WHERE clause using real winter weather data.

Find more tutorials on the SAS Users YouTube channel.

Discussion stats
  • 3 replies
  • 613 views
  • 1 like
  • 4 in conversation