BookmarkSubscribeRSS Feed
☑ This topic is solved. Need further help from the community? Please sign in and ask a new question.

 There is a join that runs out of space by inserting many variables. So I tried to go step by step to see where the problem began and I noticed that by selecting up to 17 variables there are no problems, but by entering the 18th variable the processing time goes from 3'50" to 32'57" and this seems inexplicable to me,
because in this case the total memory of the final table does not increase much, ie it goes from 0.58Gb to 0.59Gb.
So I can't figure out what could be happening.
I report the code of the join and the 2 resulting logs respectively with 17 variables in the select and 18

 

proc sql;
create table TAB1 (COMPRESS = YES) as
select TAB_BASE.ANNO_EFFETTO length = 10   
         format = $10.
         informat = $10.,
      TAB_BASE.cod_edwh length = 35   
         format = $35.
         informat = $35.
         label = 'COD_EDWH',
      TAB_BASE.CODICE_BANCA length = 8,
      TAB_BASE.CONVENZIONE_ULTIMO_MOV length = 3  
         format = $3.
         informat = $3.
         label = 'CONVENZIONE_ULTIMO_MOV',
      TAB_BASE.CONVENZLIGHT length = 37   
         format = $255.
         informat = $255.,
      CAL.DATE_ID as DATA_REGOLARIZZAZIONE_ID length = 8   
         format = 11.
         informat = 11.
         label = 'DATE_ID',
      CAL2.DATE_ID as DATA_RINNOVO_ID length = 8   
         format = 11.
         informat = 11.
         label = 'DATE_ID',
      CAL3.DATE_ID as DATAEFFMOV_ID length = 8   
         format = 11.
         informat = 11.
         label = 'DATE_ID',
      CAL4.DATE_ID as DATAEMISSMOV_ID length = 8   
         format = 11.
         informat = 11.
         label = 'DATE_ID',
      CAL5.DATE_ID as DATAPRIMAEMISSIONE_ID length = 8   
         format = 11.
         informat = 11.
         label = 'DATE_ID',
      CAL6.DATE_ID as DATASCADENZA_ID length = 8   
         format = 11.
         informat = 11.
         label = 'DATE_ID',
      CAL7.DATE_ID as DATAULTIMOMOVIMENTO_ID length = 8   
         format = 11.
         informat = 11.
         label = 'DATE_ID',
      CAL8.DATE_ID as DT_FOTOGRAFIA_ID length = 8   
         format = 11.
         informat = 11.
         label = 'DATE_ID',
      TAB_BASE.GARANZIA length = 1  
         format = $1.
         informat = $1.,
      TAB_BASE.LINEA length = 16  
         format = $16.
         informat = $16.,
      TAB_BASE.MENSILIZZAZIONE length = 10   
         format = $10.
         informat = $10.,
      TAB_BASE.MESE_EFFETTO length = 10   
         format = $10.
         informat = $10.,

		 /*-----Adding variables:-----*/
	TAB_BASE.POLIZZA length = 10    
         format = $10.
         informat = $10.

from DSS_DN.TAB_BASE as TAB_BASE 
left join EDWHDSST.CAL as CAL
         on  ( TAB_BASE.DATA_REGOLARIZZAZIONE = CAL.DATE_DES) 
left join EDWHDSST.CAL as CAL2
         on (TAB_BASE.DATA_RINNOVO = CAL2.DATE_DES) 
left join EDWHDSST.CAL as CAL3
         on (TAB_BASE.DATAEFFMOV = CAL3.DATE_DES ) 
left join EDWHDSST.CAL as CAL4
         on (TAB_BASE.DATAEMISSMOV = CAL4.DATE_DES)
left join EDWHDSST.CAL as CAL5
         on (TAB_BASE.DATAPRIMAEMISSIONE = CAL5.DATE_DES)
left join EDWHDSST.CAL as CAL6
         on (TAB_BASE.DATASCADENZA = CAL6.DATE_DES) 
left join  EDWHDSST.CAL as CAL7
         on (TAB_BASE.DATAULTIMOMOVIMENTO = CAL7.DATE_DES) 
left join EDWHDSST.CAL as CAL8
         on (TAB_BASE.DT_FOTOGRAFIA = CAL8.DATE_DES)
   ;
quit;

 

Result LOG with 17 variables:

 

NOTE: Compressing data set WORK.TAB1 decreased size by 58.80 percent.
Compressed is 9520 pages; un-compressed would require 23104 pages.
NOTE: Table WORK.TAB1 created, with 7554789 rows and 17 columns.
NOTE: PROCEDURE SQL ha utilizzato (tempo totale di elaborazione):

real time 3:50.02
user cpu time 1:47.79
system cpu time 2:16.57
memory 11239934.57k
OS Memory 11257540.00k
Timestamp 22/06/2023 03:15:48 p.
Step Count 3 Switch Count 28


#------------------------------------------------------------------------------------------------------------------#

 

Result LOG with 18 variables:

 

NOTE: Compressing data set WORK.TAB1 decreased size by 59.52 percent.
Compressed is 9739 pages; un-compressed would require 24060 pages.
NOTE: Table WORK.TAB1 created, with 7554789 rows and 18 columns.
NOTE: PROCEDURE SQL ha utilizzato (tempo totale di elaborazione):
real time 32:57.52
user cpu time 3:45.15
system cpu time 6:08.00
memory 26985262.70k
OS Memory 27004260.00k
Timestamp 23/06/2023 09:06:10 m.
Step Count 5 Switch Count 53

1 ACCEPTED SOLUTION

Accepted Solutions
Tom
Super User Tom
Super User

You probably crossed over the point where it could do it in memory or without making extra utility files.  That would explain the big change in run times.

 

What the heck is the code trying to do?  If kind of looks like you are converting descriptions into ids.  Encoding/Decoding data is what FORMATS and INFORMATS are for.

 

Looks like you want to create numeric ids, so you probably need an INFORMAT.

 

So assuming that DATE_DES is character you could do that easily enough using a CNTLIN dataset to PROC FORMAT.

data formats;
  set EDWHDSST.CAL;
  fmtname='CAL_ID';
  type='I';
  start=date_des;
  label=date_id;
  keep fmtname type start label;
run;

proc format cntlin=formats;
run;

Then you can use the INFORMAT to convert the descriptions into numbers.

data TAB1 (COMPRESS = YES);
  set DSS_DN.TAB_BASE;
  DATA_REGOLARIZZAZIONE_ID = input(DATA_REGOLARIZZAZIONE,CAL_ID.);
  DATA_RINNOVO_ID = input(DATA_RINNOVO,cal_id.);
  ...
run;

If DATE_DES is numeric then you will need to create a FORMAT (type='N') instead and then use the PUT() function to convert the number into a string and if you want the result to also be a number add an INPUT() function to convert the string back into a number.

  DATA_REGOLARIZZAZIONE_ID = input(put(DATA_REGOLARIZZAZIONE,CAL_ID.),32.);

 

View solution in original post

5 REPLIES 5
Kurt_Bremser
Super User

This looks like a multiple lookup into one table, based on dates.

Don't use SQL for this. Load the edwhdsst.cal table into a hash object in a DATA step, and retrieve the date_id from there with multiple FIND method calls.

 

SQL will do a sort for every key to do the join, and at one point Maxim 10 kicks in.

Kurt_Bremser
Super User

Something like this:

 

data tab1 (compress=yes);
set DSS_DN.TAB_BAS;
if _n_ = 1
then do;
  length date_des 8 date_id 8;
  declare hash cal (dataset:"EDWHDSST.CAL");
  cal.definekey ("date_des");
  cal.definedata("date_id");
  cal.definedone();
  call missing(date_des,date_id);
end;
rc = cal.find(key:DATA_REGOLARIZZAZIONE);
DATA_REGOLARIZZAZIONE_ID = date_id;
/* and so on for the other dates */
drop date_des date_id;
run;

 

 

mariopellegrini
Pyrite | Level 9

I don't quite understand how to bind the keys to the following variables:

 

rc = cal.find(key:DATA_REGOLARIZZAZIONE);
DATA_REGOLARIZZAZIONE_ID = date_id;


rc = cal.find(key:DATA_RINNOVO_ID);
DATA_RINNOVO_ID = date_id;

/* and so on for the other dates */

 

...

 

and so on?

Tom
Super User Tom
Super User

You probably crossed over the point where it could do it in memory or without making extra utility files.  That would explain the big change in run times.

 

What the heck is the code trying to do?  If kind of looks like you are converting descriptions into ids.  Encoding/Decoding data is what FORMATS and INFORMATS are for.

 

Looks like you want to create numeric ids, so you probably need an INFORMAT.

 

So assuming that DATE_DES is character you could do that easily enough using a CNTLIN dataset to PROC FORMAT.

data formats;
  set EDWHDSST.CAL;
  fmtname='CAL_ID';
  type='I';
  start=date_des;
  label=date_id;
  keep fmtname type start label;
run;

proc format cntlin=formats;
run;

Then you can use the INFORMAT to convert the descriptions into numbers.

data TAB1 (COMPRESS = YES);
  set DSS_DN.TAB_BASE;
  DATA_REGOLARIZZAZIONE_ID = input(DATA_REGOLARIZZAZIONE,CAL_ID.);
  DATA_RINNOVO_ID = input(DATA_RINNOVO,cal_id.);
  ...
run;

If DATE_DES is numeric then you will need to create a FORMAT (type='N') instead and then use the PUT() function to convert the number into a string and if you want the result to also be a number add an INPUT() function to convert the string back into a number.

  DATA_REGOLARIZZAZIONE_ID = input(put(DATA_REGOLARIZZAZIONE,CAL_ID.),32.);

 

sas-innovate-2024.png

Available on demand!

Missed SAS Innovate Las Vegas? Watch all the action for free! View the keynotes, general sessions and 22 breakouts on demand.

 

Register now!

How to Concatenate Values

Learn how use the CAT functions in SAS to join values from multiple variables into a single value.

Find more tutorials on the SAS Users YouTube channel.

Click image to register for webinarClick image to register for webinar

Classroom Training Available!

Select SAS Training centers are offering in-person courses. View upcoming courses for:

View all other training opportunities.

Discussion stats
  • 5 replies
  • 353 views
  • 3 likes
  • 3 in conversation