I have three data sets:
(1) S1 (n=441000 Unique records KEY (= ID) with 22 columns : ID and A1:A21) (2) S2 (n=1,467,661,903 records, with duplicate Key (= ID) records with 5 columns : ID, B, C, D and Action) (3) S3 (n=905 Unique records with 3 columns : X, Action and Action_Desc)
SQL query: ==========
Proc SQL; Select S1.*,S3.Action_Desc from S1 Left join S2 on s1.id=s2.id left join s2.action=s3.action; quit;
HASH CODE: ==========
DATA HASH_ACTION; IF 0 THEN SET s1; if _N_ = 1 then do; declare hash HASH_NAME(dataset: "s1", multidata: 'y'); HASH_NAME.defineKEY("ID"); HASH_NAME.defineData (ALL:'YES'); HASH_NAME.defineDone(); END; set s2(keep=ID B C D ACTIOn); IF HASH_NAME.FIND(KEY:ID) = 0 THEN OUTPUT; RUN;
Question: (1) I got duplicate records HASH_ACTION dataset, how can I get only unique records. (2) How can I embed : left join s2.action=s3.action into HASH Code.
... View more