@PegaZeus wrote:
proc sql magic=103; create table final as select * from have a inner join ids b on (a.id = b.id) left join nad g on (a.fpc = g.fpc) left join pricing e on (a.gpi = e.gpi) and (d.list_id = e.list_id) left join network f on (a.id = f.id) left join client_segment d on (a.client = d.client) and (a.bob = d.bob) and (a.seg = d.seg) and (a.ex_id = d.ex_id) and (a.aids = d.aids) left join instate c on (a.ncp_id = c.ncp_id); quit;
Have a look at:
https://github.com/scottbass/SAS/blob/master/Macro/hash_define.sas
https://github.com/scottbass/SAS/blob/master/Macro/hash_lookup.sas
You'll also need the utility macros:
https://github.com/scottbass/SAS/blob/master/Macro/parmv.sas
https://github.com/scottbass/SAS/blob/master/Macro/parmv.sas
Read the hash_define macro header thoroughly. Run the self-contained examples.
You can simulate an inner join by filtering (subsetting if statement) on the return code from the hash object lookup.
Given your join criteria the order of your lookups is important.
Hopefully your dimension tables will fit into the memory available to your SAS session, otherwise you'll need to split the code into multiple datasteps. Alternatively, you can create indexes on your dimension tables and use an index key lookup ("double set statement" - see the documentation on the SET statement, esp. the examples showing transaction datasets).
Give it a go and post followup questions.
Assuming there is a column "name" in dataset "ids", this does the first part of your SQL:
data final;
set have;
if _n_ = 1
then do;
length name $20; /* set attributes of data variables in the hash */
declare hash ids(dataset:"ids");
ids.definekey('id');
ids.definedata('name'); /* insert more columns as needed */
ids.definedone();
call missing(name); /* prevents "uninitialized" NOTE */
end;
if ids.find() = 0; /* does two things:
- fetches data coming from dataset ids
- proceeds with the data step iteration only if match is found
(does the "inner join")
*/
run;
Expand this as needed.
For more detailed assistance, show the structures of all the datasets in your SQL.
Edited according to answer by @ScottBass
There is an implied retain statement for data set variables. Therefore the "if 0 then set ids" construct will cause a retain on all variables in ids. Of particular interest are your definedata() variables, eg. "name".
Since ids is never read by a set statement, any failed find() method will cause the value from the previous successful find to be carried forward in the PDV. Probably not what you want.
I recommend a call missing(of _all_) or at least call missing(of list,of,hash,object,satellite,variables) at the appropriate point in your data step.
You are absolutely right. In this particular case, though, it would not be of consequence, as the subsetting if lets only observations through where a value is found.
But I will edit my post, as the other ("left join") hashes would suffer from this effect.
Are you ready for the spotlight? We're accepting content ideas for SAS Innovate 2025 to be held May 6-9 in Orlando, FL. The call is open until September 25. Read more here about why you should contribute and what is in it for you!
Learn how use the CAT functions in SAS to join values from multiple variables into a single value.
Find more tutorials on the SAS Users YouTube channel.