Hi There,
I have a monthly table containing 4000000 observations and Historical table containing around 16700000 observations (Oracle table) On monthly loading day, I use proc append to append data to historical table. However, after 1-2 hours loading, it usually fails giving unique key violation error. Hence, I need to write validation code before proc append. There is unique key on target table for 4 variables.
Dummy Source data looks like as below
Key | Account Num | Premium | Balance | Outstanding | Source_ID | Reporing Date | Year |
123 | 6665 | 200 | 677 | 444 | Customer Care | 12-Oct-19 | 2 |
34 | 445 | 899 | 556 | 888 | Bank | 17-Oct-19 | 2 |
789 | 554 | 900 | 777 | 998 | Partner Site | 04-Oct-19 | 2 |
7479 | 4782 | 9087 | 655 | 978 | Bank | 16-Oct-19 | 1 |
7657 | 5468 | 886 | 5467 | 665 | Bank | 16-Oct-19 | 1 |
And below is my Target Historical Table before loading
Key | Account Num | Premium | Balance | Outstanding | Source_ID | Reporing Date | Year |
123 | 45489 | 777 | 887 | 899 | Customer Care | 12-Oct-18 | 1 |
34 | 445 | 899 | 556 | 888 | Bank | 17-Oct-18 | 1 |
789 | 554 | 900 | 777 | 998 | Partner Site | 04-Oct-18 | 1 |
789 | 554 | 900 | 777 | 998 | Partner Site | 04-Oct-19 | 2 |
Unique Key constraints is on Key, Account Num, Reporting Date and Year.
Below is the code
Proc append base=Hist_mart data=month_tab_oct2019 force; run;
So this fails for account no 554 (Record no 3 in source) which is correct. My problem is how do I findout before applying proc append. Any help or ideas is really appreciated.
proc sort data=month_tab_oct2019;
by key account_num reporting_date year;
run;
proc sort
data=hist_mart (keep=key account_num reporting_date year)
out=lookup
;
by key account_num reporting_date year;
run;
data month_append;
merge
month_tab_oct2019 (in=in_app)
lookup (in=in_look)
;
by key account_num reporting_date year;
if in_app and not in_look;
run;
Assuming that monthly data are all OCT2019 and the fact that historic data is huge
I suggest to add a WHERE statement to the 2nd sort step, in order to eliminate time execution:
proc sort
data=hist_mart (keep=key account_num reporting_date year
where=(reporting_date between '01oct2019'd and '31oct2019'd))
out=lookup
;
by key account_num reporting_date year;
run;
Good idea. Thinking about it, I'd draw the min/max date first from the update dataset, just to be sure:
proc sql no print;
select min(reporting_date), max(reporting_date) into :mindate, maxdate
from month_tab_oct2019;
quit;
proc sort
data=hist_mart (
keep=key account_num reporting_date year
where=(reporting_date between &mindate. and &maxdate.))
)
out=lookup
;
by key account_num reporting_date year;
run;
You never know what people put into datasets 😉
What should happen if there is a matching key? Update, just ignore or raise an error? Do you want to insert whatever is possible or not insert anything if there is a matching key.
Using Oracle:
I would first upload your SAS data into an Oracle staging table and have all the remaining processing done in Oracle using explicit pass-through SQL. Depending on the details of the load logic you need using an Oracle SQL Merge statement could eventually make it very simple to deal with your data.
2+ hours for Inserting 400K records sounds like bad performance. Changing some of the SAS parameters like insertbuff and dbcommit might help to improve performance. Also bulk-loading could eventually speed-up time especially if loading into a staging table.
Instead of using append, you could use a data step to add the values:
data oralib.target;
modify oralib.target monthly_source;
by Key Account_Num Reporting_Date Year;
if _iorc_=%sysrc(_DSENMR) then do; /* key not found in master data */
_error_=0;
output;
end;
run;
As the Oracle database is checking the key values anyway (because of the unique key restraint), this may be just as fast as PROC APPEND.
Good news: We've extended SAS Hackathon registration until Sept. 12, so you still have time to be part of our biggest event yet – our five-year anniversary!
Learn how use the CAT functions in SAS to join values from multiple variables into a single value.
Find more tutorials on the SAS Users YouTube channel.
Ready to level-up your skills? Choose your own adventure.