Hi There,
I have a monthly table containing 4000000 observations and Historical table containing around 16700000 observations (Oracle table) On monthly loading day, I use proc append to append data to historical table. However, after 1-2 hours loading, it usually fails giving unique key violation error. Hence, I need to write validation code before proc append. There is unique key on target table for 4 variables.
Dummy Source data looks like as below
Key | Account Num | Premium | Balance | Outstanding | Source_ID | Reporing Date | Year |
123 | 6665 | 200 | 677 | 444 | Customer Care | 12-Oct-19 | 2 |
34 | 445 | 899 | 556 | 888 | Bank | 17-Oct-19 | 2 |
789 | 554 | 900 | 777 | 998 | Partner Site | 04-Oct-19 | 2 |
7479 | 4782 | 9087 | 655 | 978 | Bank | 16-Oct-19 | 1 |
7657 | 5468 | 886 | 5467 | 665 | Bank | 16-Oct-19 | 1 |
And below is my Target Historical Table before loading
Key | Account Num | Premium | Balance | Outstanding | Source_ID | Reporing Date | Year |
123 | 45489 | 777 | 887 | 899 | Customer Care | 12-Oct-18 | 1 |
34 | 445 | 899 | 556 | 888 | Bank | 17-Oct-18 | 1 |
789 | 554 | 900 | 777 | 998 | Partner Site | 04-Oct-18 | 1 |
789 | 554 | 900 | 777 | 998 | Partner Site | 04-Oct-19 | 2 |
Unique Key constraints is on Key, Account Num, Reporting Date and Year.
Below is the code
Proc append base=Hist_mart data=month_tab_oct2019 force; run;
So this fails for account no 554 (Record no 3 in source) which is correct. My problem is how do I findout before applying proc append. Any help or ideas is really appreciated.
proc sort data=month_tab_oct2019;
by key account_num reporting_date year;
run;
proc sort
data=hist_mart (keep=key account_num reporting_date year)
out=lookup
;
by key account_num reporting_date year;
run;
data month_append;
merge
month_tab_oct2019 (in=in_app)
lookup (in=in_look)
;
by key account_num reporting_date year;
if in_app and not in_look;
run;
Assuming that monthly data are all OCT2019 and the fact that historic data is huge
I suggest to add a WHERE statement to the 2nd sort step, in order to eliminate time execution:
proc sort
data=hist_mart (keep=key account_num reporting_date year
where=(reporting_date between '01oct2019'd and '31oct2019'd))
out=lookup
;
by key account_num reporting_date year;
run;
Good idea. Thinking about it, I'd draw the min/max date first from the update dataset, just to be sure:
proc sql no print;
select min(reporting_date), max(reporting_date) into :mindate, maxdate
from month_tab_oct2019;
quit;
proc sort
data=hist_mart (
keep=key account_num reporting_date year
where=(reporting_date between &mindate. and &maxdate.))
)
out=lookup
;
by key account_num reporting_date year;
run;
You never know what people put into datasets 😉
What should happen if there is a matching key? Update, just ignore or raise an error? Do you want to insert whatever is possible or not insert anything if there is a matching key.
Using Oracle:
I would first upload your SAS data into an Oracle staging table and have all the remaining processing done in Oracle using explicit pass-through SQL. Depending on the details of the load logic you need using an Oracle SQL Merge statement could eventually make it very simple to deal with your data.
2+ hours for Inserting 400K records sounds like bad performance. Changing some of the SAS parameters like insertbuff and dbcommit might help to improve performance. Also bulk-loading could eventually speed-up time especially if loading into a staging table.
Instead of using append, you could use a data step to add the values:
data oralib.target;
modify oralib.target monthly_source;
by Key Account_Num Reporting_Date Year;
if _iorc_=%sysrc(_DSENMR) then do; /* key not found in master data */
_error_=0;
output;
end;
run;
As the Oracle database is checking the key values anyway (because of the unique key restraint), this may be just as fast as PROC APPEND.
Are you ready for the spotlight? We're accepting content ideas for SAS Innovate 2025 to be held May 6-9 in Orlando, FL. The call is open until September 25. Read more here about why you should contribute and what is in it for you!
Learn how use the CAT functions in SAS to join values from multiple variables into a single value.
Find more tutorials on the SAS Users YouTube channel.