Hi. I have the following macro that iterates multiple times identifying various records. Records are unique by IMB_CODE. With each iteration the identified data is appended into my FinalData dataset.
My problem is that the query can identify the same record (with the same IMB_CODE) in later iterations. When this happens I must have a way to only keep the first instance of the record in my FinalData dataset..
Any suggestions?
%macro BuildQueryData(analysis_desc= , rule= , rule_order= );
proc sql;
connect to oracle as db (user="me" password="mepw" path="medb");
create table MyRuleQuery as
select * from connection to db
( select '&analysis_desc' as RULE_NM, b.actual_dlvry_date as AD_DT, b.imb_code, &rule_order as rule_order
from ivprl.bi_spm_piece_iv_recon a, ivprl.bi_spm_piece_bids_recon b
where trunc(a.upload_create_dt) = trunc(sysdate)
and trunc(b.upload_create_dt) = trunc(sysdate)
and a.imb_code = b.imb_code
and a.imb_dlvry_zip_3 in(select imb_dlvry_zip_3 from ivprl.bi_spm_piece_bids_recon)
and a.ml_cl_code in(select ml_cl_code from ivprl.bi_spm_piece_bids_recon)
and a.ml_cat_code in(select ml_cat_code from ivprl.bi_spm_piece_bids_recon)
and a.imb_code not in(select imb_code from MyRuleQueryData where rule_order < 999.1)
and &rule);
disconnect from db;
quit;
proc append base=FinalData data = MyRuleQuery;
run;
%mend BuildQueryData;
data _null_;
set QueryRules;
call execute('%BuildQueryData(analysis_desc='||analysis_desc||' , rule='||rule||' , rule_order='||rule_order||');');
run;
It is quite customary to preprocess the data you want to append by filtering out the rows with a key that's already in the base dataset. A simple SQL query should do (untested):
proc sql;
create table newrows as
select * form MyRuleQuery where ID
not in (select distinct IMB_CODE from FinalData);
quit;
proc append base=FinalData data = newrows;
run;
Alternatively you van ceate a unique index on the target table.
PROC DATASETS LIBRARY=yourlib;
MODIFY Finaldata;
INDEX CREATE IMB_CODE / UNIQUE NOMISS ;
quit;
This will reject rows that have a key that is already present. Personally I dislike the (ab)use of a unique index (or actually the associated integrity constraint) as an ETL tool, but is very effective. Apply the unique index just the same as a safety net.
Best of luck,
- Jan.
Hope this helps,
- Jan.
proc sort nodupkey
But I need to guarentee the first instance inserted into the FinalData dataset is the one that stays. I'm not sure NoDupKey guarentees the first record in is not eliminated.
It is quite customary to preprocess the data you want to append by filtering out the rows with a key that's already in the base dataset. A simple SQL query should do (untested):
proc sql;
create table newrows as
select * form MyRuleQuery where ID
not in (select distinct IMB_CODE from FinalData);
quit;
proc append base=FinalData data = newrows;
run;
Alternatively you van ceate a unique index on the target table.
PROC DATASETS LIBRARY=yourlib;
MODIFY Finaldata;
INDEX CREATE IMB_CODE / UNIQUE NOMISS ;
quit;
This will reject rows that have a key that is already present. Personally I dislike the (ab)use of a unique index (or actually the associated integrity constraint) as an ETL tool, but is very effective. Apply the unique index just the same as a safety net.
Best of luck,
- Jan.
Hope this helps,
- Jan.
SAS Innovate 2025 is scheduled for May 6-9 in Orlando, FL. Sign up to be first to learn about the agenda and registration!
Learn how use the CAT functions in SAS to join values from multiple variables into a single value.
Find more tutorials on the SAS Users YouTube channel.
Ready to level-up your skills? Choose your own adventure.