Good morning everyone.
I'm trying to convert a proc sql join to a hashed data step, but I don't get the same result because of:
- duplicates that the hash does not consider;
- of an "ON" condition that I can't put into the data hash step.
Here is the example:
data ds1;
input k1 ts_start ;
datalines;
1 4
;
data ds2;
input k1 ts_start ts_end;
datalines;
1 1 9
1 2 8
1 6 8
;
proc sql noprint;
create table ds_tot
as select A.k1,
B.ts_start,
B.ts_end
from ds1 A
left join ds2 B
on A.K1=B.K1 and
A.ts_start between B.ts_start and B.ts_end;
quit;
data ds_tot_hash;
if 0 then set ds1 ds2;
declare hash h_merge(dataset:"ds2", multidata:'y' );
rc = h_merge.DefineKey("k1");
rc = h_merge.DefineData("ts_start", "ts_end");
rc = h_merge.DefineDone();
drop rc;
do while (not eof);
set ds1 (keep=k1) end=eof;
if h_merge.find() >= 0;
output;
end;
run;
Try this
data ds_tot_hash(drop = ts);
if _N_ = 1 then do;
dcl hash h(dataset : 'ds2', multidata : 'Y');
h.definekey('k1');
h.definedata(all : 'Y');
h.definedone();
end;
set ds1(rename = ts_start = ts);
if 0 then set ds2;
do while (h.do_over() = 0);
if ts_start <= ts <= ts_end then output;
end;
run;
Result:
k1 ts_start ts_end 1 1 9 1 2 8
Try this
data ds_tot_hash(drop = ts);
if _N_ = 1 then do;
dcl hash h(dataset : 'ds2', multidata : 'Y');
h.definekey('k1');
h.definedata(all : 'Y');
h.definedone();
end;
set ds1(rename = ts_start = ts);
if 0 then set ds2;
do while (h.do_over() = 0);
if ts_start <= ts <= ts_end then output;
end;
run;
Result:
k1 ts_start ts_end 1 1 9 1 2 8
I think this should work:
data ds_tot_hash;
if 0 then set ds1 ds2;
if _N_=1 then do;
declare hash h_merge(dataset:"ds2", multidata:'y' );
rc = h_merge.DefineKey("k1");
rc = h_merge.DefineData("ts_start", "ts_end");
rc = h_merge.DefineDone();
end;
found=0;
set ds1 (keep=k1 ts_start rename=(ts_start=a_start)) end=eof;
rc=h_merge.find();
do while(rc=0);
if ts_start<=a_start<=ts_end then do;
found=1;
output;
end;
rc=h_merge.find_next();
end;
if found=0 then do;
call missing(ts_start,ts_end);
output;
end;
keep k1 ts_start ts_end;
run;
We have to include the FOUND variable, to check if something has been output, in order to simulate the left join.
Only question: is this really worth the trouble?
April 27 – 30 | Gaylord Texan | Grapevine, Texas
Walk in ready to learn. Walk out ready to deliver. This is the data and AI conference you can't afford to miss.
Register now and lock in 2025 pricing—just $495!
Learn how use the CAT functions in SAS to join values from multiple variables into a single value.
Find more tutorials on the SAS Users YouTube channel.
Ready to level-up your skills? Choose your own adventure.