BookmarkSubscribeRSS Feed
Swapnil_21
Obsidian | Level 7

Hi There,

 

I have a monthly table containing 4000000 observations and Historical table containing around 16700000 observations (Oracle table) On monthly loading day, I use proc append to append data to historical table. However, after 1-2 hours loading, it usually fails giving unique key violation error. Hence, I need to write validation code before proc append. There is unique key on target table for 4 variables. 

 

Dummy Source data looks like as below 

 

KeyAccount NumPremiumBalanceOutstandingSource_IDReporing DateYear
1236665200677444Customer Care12-Oct-192
34445899556888Bank17-Oct-192
789554900777998Partner Site04-Oct-192
747947829087655978Bank16-Oct-191
765754688865467665Bank16-Oct-191

 

And below is my Target Historical Table before  loading 

 

KeyAccount NumPremiumBalanceOutstandingSource_IDReporing DateYear
12345489777887899Customer Care12-Oct-181
34445899556888Bank17-Oct-181
789554900777998Partner Site04-Oct-181
789554900777998Partner Site04-Oct-192

 

Unique Key constraints is on Key, Account Num, Reporting Date and Year. 

 

 

Below is the code 

 

Proc append base=Hist_mart data=month_tab_oct2019 force; run;

 

So this fails for account no 554 (Record no 3 in source) which is correct.  My problem is how do I findout before applying proc append.  Any help or ideas is really appreciated. 

5 REPLIES 5
Kurt_Bremser
Super User
proc sort data=month_tab_oct2019;
by key account_num  reporting_date year;
run;

proc sort
  data=hist_mart (keep=key account_num  reporting_date year)
  out=lookup
;
by key account_num  reporting_date year;
run;

data month_append;
merge
  month_tab_oct2019 (in=in_app)
  lookup (in=in_look)
;
by key account_num  reporting_date year;
if in_app and not in_look;
run;
Shmuel
Garnet | Level 18

@Kurt_Bremser ,

Assuming that monthly data are all OCT2019 and the fact that historic data is huge

I suggest to add a WHERE statement to the 2nd sort step, in order to eliminate time execution:

proc sort
  data=hist_mart (keep=key account_num  reporting_date year
        where=(reporting_date between '01oct2019'd and '31oct2019'd))
  out=lookup
;
by key account_num  reporting_date year;
run;
Kurt_Bremser
Super User

Good idea. Thinking about it, I'd draw the min/max date first from the update dataset, just to be sure:

proc sql no print;
select min(reporting_date), max(reporting_date) into :mindate, maxdate
from month_tab_oct2019;
quit;

proc sort
  data=hist_mart (
    keep=key account_num  reporting_date year
    where=(reporting_date between &mindate. and &maxdate.))
  )
  out=lookup
;
by key account_num reporting_date year;
run;

You never know what people put into datasets 😉

Patrick
Opal | Level 21

@Swapnil_21 

What should happen if there is a matching key? Update, just ignore or raise an error? Do you want to insert whatever is possible or not insert anything if there is a matching key.

 

Using Oracle:

I would first upload your SAS data into an Oracle staging table and have all the remaining processing done in Oracle using explicit pass-through SQL. Depending on the details of the load logic you need using an Oracle SQL Merge statement could eventually make it very simple to deal with your data. 

 

2+ hours for Inserting 400K records sounds like bad performance. Changing some of the SAS parameters like insertbuff and dbcommit might help to improve performance. Also bulk-loading could eventually speed-up time especially if loading into a staging table.

s_lassen
Meteorite | Level 14

Instead of using append, you could use a data step to add the values:

data oralib.target;
  modify oralib.target monthly_source;
  by Key Account_Num Reporting_Date Year;
  if _iorc_=%sysrc(_DSENMR) then do; /* key not found in master data */
    _error_=0;
    output;
    end;
run;

As the Oracle database is checking the key values anyway (because of the unique key restraint), this may be just as fast as PROC APPEND.

sas-innovate-2024.png

Available on demand!

Missed SAS Innovate Las Vegas? Watch all the action for free! View the keynotes, general sessions and 22 breakouts on demand.

 

Register now!

How to Concatenate Values

Learn how use the CAT functions in SAS to join values from multiple variables into a single value.

Find more tutorials on the SAS Users YouTube channel.

Click image to register for webinarClick image to register for webinar

Classroom Training Available!

Select SAS Training centers are offering in-person courses. View upcoming courses for:

View all other training opportunities.

Discussion stats
  • 5 replies
  • 659 views
  • 0 likes
  • 5 in conversation