There is a join that runs out of space by inserting many variables. So I tried to go step by step to see where the problem began and I noticed that by selecting up to 17 variables there are no problems, but by entering the 18th variable the processing time goes from 3'50" to 32'57" and this seems inexplicable to me,
because in this case the total memory of the final table does not increase much, ie it goes from 0.58Gb to 0.59Gb.
So I can't figure out what could be happening.
I report the code of the join and the 2 resulting logs respectively with 17 variables in the select and 18
proc sql;
create table TAB1 (COMPRESS = YES) as
select TAB_BASE.ANNO_EFFETTO length = 10
format = $10.
informat = $10.,
TAB_BASE.cod_edwh length = 35
format = $35.
informat = $35.
label = 'COD_EDWH',
TAB_BASE.CODICE_BANCA length = 8,
TAB_BASE.CONVENZIONE_ULTIMO_MOV length = 3
format = $3.
informat = $3.
label = 'CONVENZIONE_ULTIMO_MOV',
TAB_BASE.CONVENZLIGHT length = 37
format = $255.
informat = $255.,
CAL.DATE_ID as DATA_REGOLARIZZAZIONE_ID length = 8
format = 11.
informat = 11.
label = 'DATE_ID',
CAL2.DATE_ID as DATA_RINNOVO_ID length = 8
format = 11.
informat = 11.
label = 'DATE_ID',
CAL3.DATE_ID as DATAEFFMOV_ID length = 8
format = 11.
informat = 11.
label = 'DATE_ID',
CAL4.DATE_ID as DATAEMISSMOV_ID length = 8
format = 11.
informat = 11.
label = 'DATE_ID',
CAL5.DATE_ID as DATAPRIMAEMISSIONE_ID length = 8
format = 11.
informat = 11.
label = 'DATE_ID',
CAL6.DATE_ID as DATASCADENZA_ID length = 8
format = 11.
informat = 11.
label = 'DATE_ID',
CAL7.DATE_ID as DATAULTIMOMOVIMENTO_ID length = 8
format = 11.
informat = 11.
label = 'DATE_ID',
CAL8.DATE_ID as DT_FOTOGRAFIA_ID length = 8
format = 11.
informat = 11.
label = 'DATE_ID',
TAB_BASE.GARANZIA length = 1
format = $1.
informat = $1.,
TAB_BASE.LINEA length = 16
format = $16.
informat = $16.,
TAB_BASE.MENSILIZZAZIONE length = 10
format = $10.
informat = $10.,
TAB_BASE.MESE_EFFETTO length = 10
format = $10.
informat = $10.,
/*-----Adding variables:-----*/
TAB_BASE.POLIZZA length = 10
format = $10.
informat = $10.
from DSS_DN.TAB_BASE as TAB_BASE
left join EDWHDSST.CAL as CAL
on ( TAB_BASE.DATA_REGOLARIZZAZIONE = CAL.DATE_DES)
left join EDWHDSST.CAL as CAL2
on (TAB_BASE.DATA_RINNOVO = CAL2.DATE_DES)
left join EDWHDSST.CAL as CAL3
on (TAB_BASE.DATAEFFMOV = CAL3.DATE_DES )
left join EDWHDSST.CAL as CAL4
on (TAB_BASE.DATAEMISSMOV = CAL4.DATE_DES)
left join EDWHDSST.CAL as CAL5
on (TAB_BASE.DATAPRIMAEMISSIONE = CAL5.DATE_DES)
left join EDWHDSST.CAL as CAL6
on (TAB_BASE.DATASCADENZA = CAL6.DATE_DES)
left join EDWHDSST.CAL as CAL7
on (TAB_BASE.DATAULTIMOMOVIMENTO = CAL7.DATE_DES)
left join EDWHDSST.CAL as CAL8
on (TAB_BASE.DT_FOTOGRAFIA = CAL8.DATE_DES)
;
quit;
Result LOG with 17 variables:
NOTE: Compressing data set WORK.TAB1 decreased size by 58.80 percent.
Compressed is 9520 pages; un-compressed would require 23104 pages.
NOTE: Table WORK.TAB1 created, with 7554789 rows and 17 columns.
NOTE: PROCEDURE SQL ha utilizzato (tempo totale di elaborazione):
real time 3:50.02
user cpu time 1:47.79
system cpu time 2:16.57
memory 11239934.57k
OS Memory 11257540.00k
Timestamp 22/06/2023 03:15:48 p.
Step Count 3 Switch Count 28
#------------------------------------------------------------------------------------------------------------------#
Result LOG with 18 variables:
NOTE: Compressing data set WORK.TAB1 decreased size by 59.52 percent.
Compressed is 9739 pages; un-compressed would require 24060 pages.
NOTE: Table WORK.TAB1 created, with 7554789 rows and 18 columns.
NOTE: PROCEDURE SQL ha utilizzato (tempo totale di elaborazione):
real time 32:57.52
user cpu time 3:45.15
system cpu time 6:08.00
memory 26985262.70k
OS Memory 27004260.00k
Timestamp 23/06/2023 09:06:10 m.
Step Count 5 Switch Count 53
You probably crossed over the point where it could do it in memory or without making extra utility files. That would explain the big change in run times.
What the heck is the code trying to do? If kind of looks like you are converting descriptions into ids. Encoding/Decoding data is what FORMATS and INFORMATS are for.
Looks like you want to create numeric ids, so you probably need an INFORMAT.
So assuming that DATE_DES is character you could do that easily enough using a CNTLIN dataset to PROC FORMAT.
data formats;
set EDWHDSST.CAL;
fmtname='CAL_ID';
type='I';
start=date_des;
label=date_id;
keep fmtname type start label;
run;
proc format cntlin=formats;
run;
Then you can use the INFORMAT to convert the descriptions into numbers.
data TAB1 (COMPRESS = YES);
set DSS_DN.TAB_BASE;
DATA_REGOLARIZZAZIONE_ID = input(DATA_REGOLARIZZAZIONE,CAL_ID.);
DATA_RINNOVO_ID = input(DATA_RINNOVO,cal_id.);
...
run;
If DATE_DES is numeric then you will need to create a FORMAT (type='N') instead and then use the PUT() function to convert the number into a string and if you want the result to also be a number add an INPUT() function to convert the string back into a number.
DATA_REGOLARIZZAZIONE_ID = input(put(DATA_REGOLARIZZAZIONE,CAL_ID.),32.);
This looks like a multiple lookup into one table, based on dates.
Don't use SQL for this. Load the edwhdsst.cal table into a hash object in a DATA step, and retrieve the date_id from there with multiple FIND method calls.
SQL will do a sort for every key to do the join, and at one point Maxim 10 kicks in.
Something like this:
data tab1 (compress=yes);
set DSS_DN.TAB_BAS;
if _n_ = 1
then do;
length date_des 8 date_id 8;
declare hash cal (dataset:"EDWHDSST.CAL");
cal.definekey ("date_des");
cal.definedata("date_id");
cal.definedone();
call missing(date_des,date_id);
end;
rc = cal.find(key:DATA_REGOLARIZZAZIONE);
DATA_REGOLARIZZAZIONE_ID = date_id;
/* and so on for the other dates */
drop date_des date_id;
run;
I don't quite understand how to bind the keys to the following variables:
rc = cal.find(key:DATA_REGOLARIZZAZIONE);
DATA_REGOLARIZZAZIONE_ID = date_id;
rc = cal.find(key:DATA_RINNOVO_ID);
DATA_RINNOVO_ID = date_id;
/* and so on for the other dates */
...
and so on?
You probably crossed over the point where it could do it in memory or without making extra utility files. That would explain the big change in run times.
What the heck is the code trying to do? If kind of looks like you are converting descriptions into ids. Encoding/Decoding data is what FORMATS and INFORMATS are for.
Looks like you want to create numeric ids, so you probably need an INFORMAT.
So assuming that DATE_DES is character you could do that easily enough using a CNTLIN dataset to PROC FORMAT.
data formats;
set EDWHDSST.CAL;
fmtname='CAL_ID';
type='I';
start=date_des;
label=date_id;
keep fmtname type start label;
run;
proc format cntlin=formats;
run;
Then you can use the INFORMAT to convert the descriptions into numbers.
data TAB1 (COMPRESS = YES);
set DSS_DN.TAB_BASE;
DATA_REGOLARIZZAZIONE_ID = input(DATA_REGOLARIZZAZIONE,CAL_ID.);
DATA_RINNOVO_ID = input(DATA_RINNOVO,cal_id.);
...
run;
If DATE_DES is numeric then you will need to create a FORMAT (type='N') instead and then use the PUT() function to convert the number into a string and if you want the result to also be a number add an INPUT() function to convert the string back into a number.
DATA_REGOLARIZZAZIONE_ID = input(put(DATA_REGOLARIZZAZIONE,CAL_ID.),32.);
SAS Innovate 2025 is scheduled for May 6-9 in Orlando, FL. Sign up to be first to learn about the agenda and registration!
Learn how use the CAT functions in SAS to join values from multiple variables into a single value.
Find more tutorials on the SAS Users YouTube channel.
Ready to level-up your skills? Choose your own adventure.