Solved: defineKey in hash object join

mariopellegrini · Posted 04-26-2023 02:00 AM

I should transform a proc sql join into a hash, but I'm having trouble because I don't know how to define the "DefineKey" since there are various operations, not a single key to define.
this is the example:

data RUOLO_DANNI;
input cod_source:$2. cod_anagrafico:$9. ts_inizio_validita:datetime20.  ts_fine_validita:datetime20.;
datalines;
1 A 02JAN2015:00:00:00 01JAN5999:00:00:00
5 B 17MAY2017:00:00:00 01JAN5999:00:00:00
5 C 09JAN2022:00:00:00 01JAN5999:00:00:00
;



data T_A_SOGGETTI_DN;
input cod_source:$2. cod_anagrafico:$9. cod_anagrafico_edwh:$20. ts_inizio_validita:datetime20.  ts_fine_validita:datetime20.;
datalines;
1 A AAA 02JAN2015:00:00:00 01JAN5999:00:00:00
;


   proc sql noprint;
   create table ruolo_danni_NEW
   as select RU.*
			,SO.cod_anagrafico_EDWH
  from      RUOLO_DANNI RU

   left join   T_A_SOGGETTI_DN SO
   on        
             RU.cod_anagrafico = SO.cod_anagrafico
             and ru.ts_inizio_validita between so.ts_inizio_validita and so.ts_fine_validita
             and compress(SO.cod_source)='1';
quit;

Kurt_Bremser · Posted 04-26-2023 05:51 AM

Then it should be

data ruolo_danni_NEW;
set ruolo_danni;
if _n_ = 1
then do;
  length ts_inizio_validita ts_fine_validita 8;
  declare hash so (dataset:"ta_soggetti_dn (where=(compress(cod_source)='1'))");
  so.definekey("cod_anagrafico");
  so.definedata("ts_inizio_validita","ts_fine_validita");
  so.definedone();
  call missing(ts_inizio_validita,ts_fine_validita);
end;
rc = so.find();
if rc = 0 and ts_inizio_validita le  ts_inizio_validita le ts_fine_validita;
drop rc ts_inizio_validita ts_fine_validita;
run;

Spoiler

Maxims of Maximally Efficient SAS Programmers
How to convert datasets to data steps
The macro for direct download as ZIP
How to post code
Please vote for Provide Sequential Search Capability for Hash Objects
How to deal with locked files on UNIX

View solution in original post

Kurt_Bremser · Posted 04-26-2023 02:07 AM

Is cod_anagrafico unique in dataset in t_a_soggetti_dn?

Maxims of Maximally Efficient SAS Programmers
How to convert datasets to data steps
The macro for direct download as ZIP
How to post code
Please vote for Provide Sequential Search Capability for Hash Objects
How to deal with locked files on UNIX

mariopellegrini · Posted 04-26-2023 02:13 AM

no because there is also cod_source

Kurt_Bremser · Posted 04-26-2023 02:19 AM

But you don't use that in your ON condition in the SQL? This would mean that you may get multiple observations in your SQL for a single obs in the "left" dataset.

Maxims of Maximally Efficient SAS Programmers
How to convert datasets to data steps
The macro for direct download as ZIP
How to post code
Please vote for Provide Sequential Search Capability for Hash Objects
How to deal with locked files on UNIX

mariopellegrini · Posted 04-26-2023 03:27 AM

no because there is the final expression and compress(SO.cod_source)='1', so you only take the cod_source = '1'

Kurt_Bremser · Posted 04-26-2023 04:50 AM

So within cod_source = '1', cod_anagrafico is unique?

Maxims of Maximally Efficient SAS Programmers
How to convert datasets to data steps
The macro for direct download as ZIP
How to post code
Please vote for Provide Sequential Search Capability for Hash Objects
How to deal with locked files on UNIX

mariopellegrini · Posted 04-26-2023 05:29 AM

yes

Kurt_Bremser · Posted 04-26-2023 05:51 AM

Then it should be

data ruolo_danni_NEW;
set ruolo_danni;
if _n_ = 1
then do;
  length ts_inizio_validita ts_fine_validita 8;
  declare hash so (dataset:"ta_soggetti_dn (where=(compress(cod_source)='1'))");
  so.definekey("cod_anagrafico");
  so.definedata("ts_inizio_validita","ts_fine_validita");
  so.definedone();
  call missing(ts_inizio_validita,ts_fine_validita);
end;
rc = so.find();
if rc = 0 and ts_inizio_validita le  ts_inizio_validita le ts_fine_validita;
drop rc ts_inizio_validita ts_fine_validita;
run;

Spoiler

Maxims of Maximally Efficient SAS Programmers
How to convert datasets to data steps
The macro for direct download as ZIP
How to post code
Please vote for Provide Sequential Search Capability for Hash Objects
How to deal with locked files on UNIX

mariopellegrini · Posted 04-26-2023 10:31 AM

thanks Kurt_Bremser. for completeness I report the final code to obtain the same result of the proc sql;

data ruolo_danni_HASH;
set ruolo_danni;
if _n_ = 1
then do;
  length ts_inizio_validita ts_fine_validita 8;
  length cod_anagrafico_edwh $20.;
  declare hash so (dataset:"t_a_soggetti_dn (where=(compress(cod_source)='1'))");
  so.definekey("cod_anagrafico");
  so.definedata("cod_anagrafico_edwh","ts_inizio_validita","ts_fine_validita");
  so.definedone();
  call missing(cod_anagrafico_edwh, ts_inizio_validita,ts_fine_validita);
end;
rc = so.find();
if rc >= 0 and ts_inizio_validita le  ts_inizio_validita le ts_fine_validita;
drop rc ;
run;

mariopellegrini · Posted 04-27-2023 05:02 AM

Kurt I found some data inconsistencies between sql and hash, because the "ON" condition is different: ts_start_validation is taken from the left dataset, while ts_start_validation ts_end_validation of the "beetween" are taken from the right dataset. This example explains the problem with a particular case ((the results of ruolo_danni_sql and ruolo_danni_hash are different):

data RUOLO_DANNI;
input cod_source:$2. cod_anagrafico:$9. ts_inizio_validita:datetime20.;
datalines;
1 21278 02JUL11:03:25:44
1 21278 02JUL11:03:25:45
1 21278 02JUL11:03:25:46
;
data T_A_SOGGETTI_DN;
input cod_source:$2. cod_anagrafico:$9. cod_anagrafico_edwh:$20. ts_inizio_validita:datetime20.  ts_fine_validita:datetime20.;
datalines;
1 21278 F_SCDDNL64D45Z133D 01JUL2011:11:15:02 01JUL2011:11:15:01
;
proc sql noprint;
create table ruolo_danni_sql as 
select RU.cod_anagrafico,
       RU.ts_inizio_validita,
	   RU.cod_source,
	   SO.cod_anagrafico_edwh
from RUOLO_DANNI RU
left join   T_A_SOGGETTI_DN SO
on      RU.cod_anagrafico = SO.cod_anagrafico
    and ru.ts_inizio_validita between so.ts_inizio_validita and so.ts_fine_validita
    and compress(SO.cod_source)='1';
quit;
data ruolo_danni_HASH;
set ruolo_danni;
if _n_ = 1
then do;
  length ts_inizio_validita ts_fine_validita 8;
  length cod_anagrafico_edwh $20.;
  declare hash so (dataset:"t_a_soggetti_dn (where=(compress(cod_source)='1'))");
  so.definekey("cod_anagrafico");
  so.definedata("cod_anagrafico_edwh","ts_inizio_validita","ts_fine_validita");
  so.definedone();
  call missing(cod_anagrafico_edwh, ts_inizio_validita,ts_fine_validita);
end;
rc = so.find();
if rc >= 0 and ts_inizio_validita le ts_inizio_validita le ts_fine_validita;
drop rc ;
run;

defineKey in hash object join

Re: defineKey in hash object join

Re: defineKey in hash object join

Re: defineKey in hash object join

Re: defineKey in hash object join

Re: defineKey in hash object join

Re: defineKey in hash object join

Re: defineKey in hash object join

Re: defineKey in hash object join

Re: defineKey in hash object join

Re: defineKey in hash object join

Catch up on SAS Innovate 2026

Catch up on SAS Innovate 2026

SAS Training: Just a Click Away