I am trying to create a daily incremental code for Impala tables in SAS with Proc Append. But if I execute proc append twice or multiple time so the data gets appended multiple time.
How can I avoide this replication of data. Or else, is there any way to overwrite or update the data in impala table using sas.
I use two different ways to avoid duplicate data:
- set a variable that identifies a group of new records. This can be an infile name, a date, or similar. While concatenating (I do not use proc append), observations with the same values as those that are to be appended are excluded from the master dataset.
- identify a unique key (this may be one or more variables). After appending/concatenating, do a proc sort with nodupkey.
A piece of blueprint code might look like this:
%let infile1=/shared/data/data_20170913.dat; %let outlib=out; %let masterfile=my_dataset; data infile; infile "&infile1"; input indata $ ; todays_file = "&infile1"; run; data &outlib..&masterfile._new; set &outlib..&masterfile (where=(todays_file ne "&infile1")) infile ; run; proc datasets library=&outlib nolist; delete &masterfile; change &masterfile._new=&masterfile; run;
Note that I do the "append" in a separate step. That way I can wrap the final proc datasets into a macro that checks for &syscc=0, to prevent replacing the master dataset if anything went wrong.
Also note that this is a SAS-only solution; you may have to check your options with the administrators of the Impala DBMS.
Registration is open! SAS is returning to Vegas for an AI and analytics experience like no other! Whether you're an executive, manager, end user or SAS partner, SAS Innovate is designed for everyone on your team. Register for just $495 by 12/31/2023.
If you are interested in speaking, there is still time to submit a session idea. More details are posted on the website.
Learn the difference between classical and Bayesian statistical approaches and see a few PROC examples to perform Bayesian analysis in this video.
Find more tutorials on the SAS Users YouTube channel.