09-13-2017 06:39 AM
I am trying to build a daily incremental load for Impala tables in SAS with Proc Append. But if I run proc append twice or more, the data gets appended multiple times.
How can I avoid this duplication of data? Or else, is there any way to overwrite or update the data in an Impala table using SAS?
09-13-2017 06:58 AM
Assuming you have SAS/ACCESS to Impala, there are lots of ways: you can use a data step merge to update existing records and add any new ones at the same time, a Proc SQL union to append, the data step Update statement, etc.
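As a rough illustration of the data step Update option mentioned above, here is a minimal sketch. The dataset names (work.master, work.daily, work.master_new) and the key variable id are hypothetical, and it uses WORK datasets only; writing the result back to Impala is not shown and depends on your SAS/ACCESS setup.

```sas
/* Hypothetical sample data: work.master is the existing table, */
/* work.daily is today's increment.                              */
data work.master;
  input id amount;
  datalines;
1 100
2 200
;
run;

data work.daily;
  input id amount;
  datalines;
2 250
3 300
;
run;

/* UPDATE needs both inputs sorted by the key, */
/* and the key must be unique in the master.   */
proc sort data=work.master; by id; run;
proc sort data=work.daily;  by id; run;

/* Matching keys are updated, new keys are added. */
data work.master_new;
  update work.master work.daily;
  by id;
run;
```

Because rows are matched on the key, applying the same increment a second time should leave the result unchanged rather than duplicate it. Note that by default, missing values in the transaction dataset do not overwrite values in the master.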
09-13-2017 06:59 AM
I use two different ways to avoid duplicate data:
- set a variable that identifies a group of new records. This can be an infile name, a date, or similar. When concatenating (I do not use proc append), I exclude from the master dataset any observations that carry the same identifier value as the records about to be appended.
- identify a unique key (this may be one or more variables). After appending/concatenating, do a proc sort with nodupkey.
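The second approach (unique key plus nodupkey) could be sketched roughly as follows. The dataset names and the key variable id are hypothetical placeholders, and the example uses WORK datasets only.

```sas
/* Hypothetical sample data. */
data work.master;
  input id amount;
  datalines;
1 100
2 200
;
run;

data work.daily;
  input id amount;
  datalines;
2 200
3 300
;
run;

/* Concatenate master and increment; the master rows come first. */
data work.combined;
  set work.master work.daily;
run;

/* Keep one observation per key value. With the default EQUALS    */
/* behavior the original order within a key is preserved, so the  */
/* master row is the one kept when a key appears in both inputs.  */
proc sort data=work.combined out=work.master_new nodupkey;
  by id;
run;
```

If the newly delivered row should win instead, listing work.daily first in the SET statement should have that effect.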
09-13-2017 08:19 AM
A piece of blueprint code might look like this:
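%let infile1=/shared/data/data_20170913.dat;
%let outlib=out;
%let masterfile=my_dataset;

/* read today's file and tag every record with the file name */
data infile;
  infile "&infile1";
  input indata $ ;
  todays_file = "&infile1";
run;

/* rebuild the master: drop records from an earlier load of the same file, then add the new ones */
data &outlib..&masterfile._new;
  set
    &outlib..&masterfile (where=(todays_file ne "&infile1"))
    infile
  ;
run;

/* replace the old master with the new version */
proc datasets library=&outlib nolist;
  delete &masterfile;
  change &masterfile._new=&masterfile;
run;
quit;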
Note that I do the "append" in a separate step. That way I can wrap the final proc datasets into a macro that checks for &syscc=0, to prevent replacing the master dataset if anything went wrong.
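A minimal sketch of such a wrapper macro, under the assumption that the new version has already been built as <master>_new in the same library. The macro name replace_master and its parameters are hypothetical; the sample datasets are only there to make the sketch runnable on its own.

```sas
/* Hypothetical stand-ins for the old master and its rebuilt version. */
data work.my_dataset;
  indata = 'old';
run;

data work.my_dataset_new;
  indata = 'new';
run;

%macro replace_master(lib=, master=);
  /* Swap only if no warning or error has been recorded so far. */
  %if &syscc = 0 %then %do;
    proc datasets library=&lib nolist;
      delete &master;
      change &master._new=&master;
    run;
    quit;
  %end;
  %else %do;
    %put WARNING: SYSCC=&syscc, so &lib..&master was not replaced.;
  %end;
%mend replace_master;

%replace_master(lib=work, master=my_dataset)
```

Keep in mind that warnings also set &syscc to a nonzero value, so a check for exactly 0 is strict; whether that is desirable depends on how clean your log normally is.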
Also note that this is a SAS-only solution; you may have to check your options with the administrators of the Impala DBMS.