Solved: Re: Why Is INTCK Slower Than INTNX in SQL?

Junyong · Posted 09-06-2020 02:55 AM

I have the following panel data of 500 individuals for 60 months.

data have;
do i=1 to 500;
do j="1jan2016"d to "31dec2020"d;
x=rannor(1);
if j=intnx("month",j,0,"end") then output;
end;
end;
run;

And suppose I want to add lag(x) with SQL as follows.

proc sql;
create table want as
select a.*,b.x as x1
from have a join have b on a.i=b.i and intnx("month",a.j,0)=intnx("month",b.j,1);
quit;

And I found that altering intnx("month",a.j,0)=intnx("month",b.j,1) by intck("month",a.j,b.j)=-1 makes the proc sql slower—why is this the case?

FreelanceReinh · Posted 09-06-2020 05:40 AM

Hi @Junyong,

You can see the difference that @Kurt_Bremser has pointed out by modifying the queries as shown in the log below:

128  data have2;
129  do j="1jan2016"d to "31dec2020"d;
130    x=rannor(1);
131    output;
132  end;
133  run;

NOTE: The data set WORK.HAVE2 has 1827 observations and 2 variables.
NOTE: DATA statement used (Total process time):
      real time           0.07 seconds
      cpu time            0.07 seconds


134
135  proc sql;
136  create table want2 as
137  select a.*, b.x as x1
138  from have2 a join have2 b on intnx("month",a.j,0)=intnx("month",b.j,1);
NOTE: Table WORK.WANT2 created, with 54653 rows and 3 columns.

139  quit;
NOTE: PROCEDURE SQL used (Total process time):
      real time           0.06 seconds
      cpu time            0.06 seconds


140
141  proc sql;
142  create table want2_ as
143  select a.*, b.x as x1
144  from have2 a join have2 b on intck("month",a.j,b.j)=-1;
NOTE: The execution of this query involves performing one or more Cartesian product joins that can not be optimized.
NOTE: Table WORK.WANT2_ created, with 54653 rows and 3 columns.

145  quit;
NOTE: PROCEDURE SQL used (Total process time):
      real time           0.59 seconds
      cpu time            0.59 seconds

View solution in original post

Kurt_Bremser · Posted 09-06-2020 05:29 AM

I guess it is because the INTNX functions are applied once for every observation in the source table, but for the INTCK function, SQL needs to do a cartesian join and then apply the function to every result of that cartesian join.

Maxims of Maximally Efficient SAS Programmers
How to convert datasets to data steps
The macro for direct download as ZIP
How to post code
Please vote for Provide Sequential Search Capability for Hash Objects
How to deal with locked files on UNIX

FreelanceReinh · Posted 09-06-2020 05:40 AM

Hi @Junyong,

You can see the difference that @Kurt_Bremser has pointed out by modifying the queries as shown in the log below:

128  data have2;
129  do j="1jan2016"d to "31dec2020"d;
130    x=rannor(1);
131    output;
132  end;
133  run;

NOTE: The data set WORK.HAVE2 has 1827 observations and 2 variables.
NOTE: DATA statement used (Total process time):
      real time           0.07 seconds
      cpu time            0.07 seconds


134
135  proc sql;
136  create table want2 as
137  select a.*, b.x as x1
138  from have2 a join have2 b on intnx("month",a.j,0)=intnx("month",b.j,1);
NOTE: Table WORK.WANT2 created, with 54653 rows and 3 columns.

139  quit;
NOTE: PROCEDURE SQL used (Total process time):
      real time           0.06 seconds
      cpu time            0.06 seconds


140
141  proc sql;
142  create table want2_ as
143  select a.*, b.x as x1
144  from have2 a join have2 b on intck("month",a.j,b.j)=-1;
NOTE: The execution of this query involves performing one or more Cartesian product joins that can not be optimized.
NOTE: Table WORK.WANT2_ created, with 54653 rows and 3 columns.

145  quit;
NOTE: PROCEDURE SQL used (Total process time):
      real time           0.59 seconds
      cpu time            0.59 seconds

Kurt_Bremser · Posted 09-06-2020 05:46 AM

PS the fastest method to get your result is, of course, a DATA step:

proc sql stimer;

create table want1 as
select a.*,b.x as x1
from have a , have b where a.i=b.i and intnx("month",a.j,0)=intnx("month",b.j,1);

create table want2 as
select a.*,b.x as x1
from have a, have b where a.i=b.i and intck("month",a.j,b.j)=-1;

quit;

data want3;
set have;
by i;
x1 = lag(x);
if not first.i;
run;

Log:

 84         proc sql stimer;
 NOTE:  Verwendet wurde: SQL Statement - (Gesamtverarbeitungszeit):
       real time           0.00 seconds
       cpu time            0.00 seconds
       
 85         
 86         create table want1 as
 87         select a.*,b.x as x1
 88         from have a , have b where a.i=b.i and intnx("month",a.j,0)=intnx("month",b.j,1);
 NOTE: Table WORK.WANT1 created, with 29500 rows and 4 columns.
 
 NOTE:  Verwendet wurde: SQL Statement - (Gesamtverarbeitungszeit):
       real time           0.04 seconds
       cpu time            0.04 seconds
       
 89         
 90         create table want2 as
 91         select a.*,b.x as x1
 92         from have a, have b where a.i=b.i and intck("month",a.j,b.j)=-1;
 NOTE: Table WORK.WANT2 created, with 29500 rows and 4 columns.
 
 NOTE:  Verwendet wurde: SQL Statement - (Gesamtverarbeitungszeit):
       real time           0.46 seconds
       cpu time            0.45 seconds
       
 93         
 94         quit;
 NOTE:  Verwendet wurde: PROZEDUR SQL - (Gesamtverarbeitungszeit):
       real time           0.00 seconds
       cpu time            0.01 seconds
       
 
 95         
 96         data want3;
 97         set have;
 98         by i;
 99         x1 = lag(x);
 100        if not first.i;
 101        run;
 
 NOTE: There were 30000 observations read from the data set WORK.HAVE.
 NOTE: The data set WORK.WANT3 has 29500 observations and 4 variables.
 NOTE:  Verwendet wurde: DATA statement - (Gesamtverarbeitungszeit):
       real time           0.01 seconds
       cpu time            0.01 seconds

Maxims of Maximally Efficient SAS Programmers
How to convert datasets to data steps
The macro for direct download as ZIP
How to post code
Please vote for Provide Sequential Search Capability for Hash Objects
How to deal with locked files on UNIX

Junyong · Posted 09-06-2020 02:41 PM

My real data rather than this example are not monthly continuous with some skipped months, which is why I am avoiding data, but thanks a lot for your clarification—I think I need to be careful in using intck next time.

Kurt_Bremser · Posted 09-06-2020 02:49 PM

Even with skipped months, a data step is the right tool for lagged values. To see if a lagged value can be used, you just need to look at the lagged date to see if it is the previous month or older.

Maxims of Maximally Efficient SAS Programmers
How to convert datasets to data steps
The macro for direct download as ZIP
How to post code
Please vote for Provide Sequential Search Capability for Hash Objects
How to deal with locked files on UNIX

Junyong · Posted 09-06-2020 03:32 PM

I think I need to clarify in detail. In my case,

data have;
input firm date yymmdd8. assets;
cards;
1 20200131 110
1 20200331 120
1 20200430 130
2 20200131 210
2 20200229 220
2 20200331 230
;

If I define assets1 as the assets a month behind, then the observation for firm 1 on March 31 would be incorrect (110 is the value from January) with data as follows.

data no;
set have;
assets1=lag(assets);
run;

sql will have no such issue.

proc sql;
create table yes as
select a.*,b.assets as assets1
from have a join have b on a.firm=b.firm and a.date=intnx("month",b.date,1,"end")
order by firm,date;
quit;

Though this does not produce any value for firm 1 on February 29, I needed this outcome—which is why I am using sql over data currently.

+ I think the following is possible with data instead.

data yes;
set have;
assets1=lag(assets);
if firm>lag(firm) or date>intnx("month",lag(date),1,"end") then assets1=.;
run;

Thanks.

Kurt_Bremser · Posted 09-06-2020 07:22 AM

PPS you can create your have dataset much faster by avoiding unnecessary looping:

data have;
do i = 1 to 500;
  j = "31jan2016"d;
  do while (j le "31dec2020"d);
    output;
    j = intnx("month",j,1,"e");
  end;
end;
run;

Look at this log:

 73         data have1;
 74         do i = 1 to 500;
 75           do j = "1jan2016"d to "31dec2020"d;
 76             x = rannor(1);
 77             if j = intnx("month",j,0,"end") then output;
 78           end;
 79         end;
 80         run;
 
 NOTE: The data set WORK.HAVE1 has 30000 observations and 3 variables.
 NOTE:  Verwendet wurde: DATA statement - (Gesamtverarbeitungszeit):
       real time           0.43 seconds
       cpu time            0.43 seconds
       
 
 81         
 82         data have2;
 83         do i = 1 to 500;
 84           j = "31jan2016"d;
 85           do while (j le "31dec2020"d);
 86             output;
 87             j = intnx("month",j,1,"e");
 88           end;
 89         end;
 90         run;
 
 NOTE: The data set WORK.HAVE2 has 30000 observations and 2 variables.
 NOTE:  Verwendet wurde: DATA statement - (Gesamtverarbeitungszeit):
       real time           0.02 seconds
       cpu time            0.01 seconds

It creates the same amount of observations 20 times as fast.

Maxims of Maximally Efficient SAS Programmers
How to convert datasets to data steps
The macro for direct download as ZIP
How to post code
Please vote for Provide Sequential Search Capability for Hash Objects
How to deal with locked files on UNIX

RichardDeVen · Posted 09-06-2020 08:14 AM

Use the undocumented _TREE option and you will see @Kurt_Bremser supposition is correct.

For

intnx("month",a.j,0)=intnx("month",b.j,1)

INTNX is called for rows selected and results are cross joined.

For

intck("month",a.j,b.j)=-1

Rows are cross joined and INTCK is called for row in cross join and result applied to evaluation criteria

The trees:

203  proc sql _tree;
204  create table want as select a.*,b.x as x1
205  from have a join have b on a.i = b.i and
206       intnx("month",a.j,0)=intnx("month",b.j,1) ;

Tree as planned.
                               /-SYM-V-(a.i:1 flag=00000001)
                     /-OBJ----|
                    |         |--SYM-V-(a.j:2 flag=00000001)
                    |         |--SYM-V-(a.x:3 flag=00000001)
                    |          \-SYM-M-(x1:1 flag=00000005)
           /-JOIN---|
          |         |                              /-SYM-V-(a.i:1 flag=00000001)
          |         |                    /-OBJ----|
          |         |                   |         |--SYM-V-(a.j:2 flag=00000001)
          |         |                   |         |--SYM-V-(a.x:3 flag=00000001)
          |         |                   |          \-SYM-A-(#TEMJ001:2 flag=00000004)
          |         |          /-FIL----|
          |         |         |         |                    /-SYM-V-(a.i:1 flag=00000001)
          |         |         |         |          /-OBJ----|
          |         |         |         |         |         |--SYM-V-(a.j:2 flag=00000001)
          |         |         |         |         |          \-SYM-V-(a.x:3 flag=00000001)
          |         |         |         |--SRC----|
          |         |         |         |          \-TABL[WORK].have opt=''
          |         |         |         |--empty-
          |         |         |         |--empty-
          |         |         |         |--empty-
          |         |         |         |--empty-
          |         |         |         |                    /-SYM-A-(#TEMJ001:2 flag=00000004)
          |         |         |         |          /-ASGN---|
          |         |         |         |         |         |          /-SYM-F-(INTNX:1)
          |         |         |         |         |          \-FLST---|
          |         |         |         |         |                   |--LITC('month')
          |         |         |         |         |                   |--SYM-V-(a.j:2)
          |         |         |         |         |                    \-LITN(0)
          |         |         |          \-OBJE---|
          |         |--FROM---|
          |         |         |                    /-SYM-M-(x1:1 flag=00000001)
          |         |         |          /-OBJ----|
          |         |         |         |         |--SYM-V-(b.i:1 flag=00000001)
          |         |         |         |         |--SYM-V-(b.j:2 flag=00000001)
          |         |         |         |          \-SYM-A-(#TEMJ002:3 flag=00000004)
          |         |          \-FIL----|
          |         |                   |                    /-SYM-M-(x1:1 flag=00000001)
          |         |                   |          /-OBJ----|
          |         |                   |         |         |--SYM-V-(b.i:1 flag=00000001)
          |         |                   |         |          \-SYM-V-(b.j:2 flag=00000001)
          |         |                   |--SRC----|
          |         |                   |          \-TABL[WORK].have opt=''
          |         |                   |--empty-
          |         |                   |--empty-
          |         |                   |--empty-
          |         |                   |--empty-
          |         |                   |                    /-SYM-A-(#TEMJ002:3 flag=00000004)
          |         |                   |          /-ASGN---|
          |         |                   |         |         |          /-SYM-F-(INTNX:2)
          |         |                   |         |          \-FLST---|
          |         |                   |         |                   |--LITC('month')
          |         |                   |         |                   |--SYM-V-(b.j:2)
          |         |                   |         |                    \-LITN(1)
          |         |                    \-OBJE---|
          |         |--empty-
          |         |                    /-SYM-V-(a.i:1)
          |         |          /-CEQ----|
          |         |         |          \-SYM-V-(b.i:1)
          |          \-LAND---|
          |                   |          /-SYM-A-(#TEMJ001:2)
          |                    \-CEQ----|
          |                              \-SYM-A-(#TEMJ002:3)
 --SSEL---|


NOTE: Table WORK.WANT created, with 29500 rows and 4 columns.

207
208  create table want as select a.*,b.x as x1
209  from have a join have b on a.i = b.i and
210       intck("month",a.j,b.j)=-1 ;

Tree as planned.
                               /-SYM-V-(a.i:1 flag=00000001)
                     /-OBJ----|
                    |         |--SYM-V-(a.j:2 flag=00000001)
                    |         |--SYM-V-(a.x:3 flag=00000001)
                    |          \-SYM-M-(x1:1 flag=00000005)
           /-JOIN---|
          |         |                              /-SYM-V-(a.i:1 flag=00000001)
          |         |                    /-OBJ----|
          |         |                   |         |--SYM-V-(a.j:2 flag=00000001)
          |         |                   |          \-SYM-V-(a.x:3 flag=00000001)
          |         |          /-SRC----|
          |         |         |          \-TABL[WORK].have opt=''
          |         |--FROM---|
          |         |         |                    /-SYM-M-(x1:1 flag=00000001)
          |         |         |          /-OBJ----|
          |         |         |         |         |--SYM-V-(b.i:1 flag=00000001)
          |         |         |         |          \-SYM-V-(b.j:2 flag=00000001)
          |         |          \-SRC----|
          |         |                    \-TABL[WORK].have opt=''
          |         |                    /-SYM-F-(INTCK:1)
          |         |          /-FLST---|
          |         |         |         |--LITC('month')
          |         |         |         |--SYM-V-(a.j:2)
          |         |         |          \-SYM-V-(b.j:2)
          |         |--CEQ----|
          |         |          \-LITN(-1)
          |         |          /-SYM-V-(a.i:1)
          |          \-CEQ----|
          |                    \-SYM-V-(b.i:1)
 --SSEL---|


NOTE: Table WORK.WANT created, with 29500 rows and 4 columns.

211  quit;
NOTE: PROCEDURE SQL used (Total process time):
      real time           0.47 seconds
      cpu time            0.46 seconds

Junyong · Posted 09-06-2020 03:39 PM

intnx subsets and then joins, while intck joins and then subsets, which is why intnx was faster than intck—thanks for this clarification. Probably functions requiring multiple variables from different data sets cause bottlenecks.

The 2025 SAS Hackathon has begun!

SAS Training: Just a Click Away