Rahul_SAS
Quartz | Level 8

Hi Experts,

 

I am trying to build a daily incremental load for Impala tables in SAS using Proc Append. But if I run the Proc Append step more than once, the same data gets appended each time.

How can I avoid this duplication of data? Alternatively, is there any way to overwrite or update the data in an Impala table using SAS?

Please help.

 

-Rahul

ChrisBrooks
Ammonite | Level 13

Assuming you have SAS/ACCESS to Impala, there are lots of ways: you can use a data step merge to update existing records and add any new ones at the same time, a Proc SQL union to append, a data step Update statement, and so on.
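
As a rough illustration of the data step Update option, here is a minimal sketch using plain SAS datasets in WORK. The dataset names (master, daily, master_new) and the variables (id, amount) are hypothetical. Whether the same step can read from and write to an Impala table through a SAS/ACCESS libref is an assumption to verify for your site.

```sas
/* Hypothetical master table, sorted by the key variable id */
data master;
  input id amount;
  datalines;
1 100
2 200
;
run;

/* Hypothetical daily batch: id 2 is a changed record, id 3 is new */
data daily;
  input id amount;
  datalines;
2 250
3 300
;
run;

/* UPDATE applies the transactions to the master by key.         */
/* Both inputs must be sorted by id, and id must be unique in    */
/* the master. Matching keys are updated, new keys are added.    */
data master_new;
  update master daily;
  by id;
run;
```

Because matching keys are updated rather than added again, applying the same daily batch a second time leaves the result unchanged, which is the property Proc Append lacks.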

Kurt_Bremser
Super User

I use two different ways to avoid duplicate data:

 

- Set a variable that identifies each group of new records. This can be the input file name, a date, or similar. While concatenating (I do not use proc append), I exclude from the master dataset any observations that carry the same value as the records about to be appended.

- Identify a unique key (this may be one or more variables). After appending/concatenating, run a proc sort with nodupkey.
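
The second approach could be sketched as follows. This is only an illustration with hypothetical WORK datasets (master, daily) and a hypothetical single-variable key (id); a real key may span several variables.

```sas
/* Hypothetical master table */
data master;
  input id amount;
  datalines;
1 100
2 200
;
run;

/* Hypothetical daily batch */
data daily;
  input id amount;
  datalines;
3 300
4 400
;
run;

/* Running the append twice reproduces the duplication problem */
proc append base=master data=daily;
run;
proc append base=master data=daily;
run;

/* Keep only one observation per value of the key variable */
proc sort data=master nodupkey;
  by id;
run;
```

Note that nodupkey compares only the BY variables. If two observations share a key but differ in other columns, the one that comes first is kept, so this approach removes repeats but does not apply updates.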

Rahul_SAS
Quartz | Level 8

Hi Kurt_Bremser, could you please share sample code for the first scenario?

Kurt_Bremser
Super User

A piece of blueprint code might look like this:

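/* Daily input file, target library, and master dataset name */
%let infile1=/shared/data/data_20170913.dat;
%let outlib=out;
%let masterfile=my_dataset;

/* Read the new file and tag every record with the file it came from */
data infile;
infile "&infile1";
input
  indata $
;
/* Fixed length (200 is an assumed value) so the tag has the same length on every run */
length todays_file $ 200;
todays_file = "&infile1";
run;

/* Rebuild the master: drop records already loaded from this file, then add the new ones */
data &outlib..&masterfile._new;
set
  &outlib..&masterfile (where=(todays_file ne "&infile1"))
  infile
;
run;

/* Swap the rebuilt dataset in for the old master */
proc datasets library=&outlib nolist;
delete &masterfile;
change &masterfile._new=&masterfile;
run;
quit;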

Note that I do the "append" in a separate step. That way I can wrap the final proc datasets into a macro that checks for &syscc=0, to prevent replacing the master dataset if anything went wrong.
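
A minimal sketch of such a wrapper is shown here. The macro name replace_master and its parameters lib and master are hypothetical; the call reuses the &outlib and &masterfile macro variables defined at the top of the blueprint.

```sas
%macro replace_master(lib=, master=);
  /* Only swap the datasets if nothing has raised the condition code so far */
  %if &syscc = 0 %then %do;
    proc datasets library=&lib nolist;
      delete &master;
      change &master._new=&master;
    run;
    quit;
  %end;
  %else %do;
    %put ERROR: Condition code is &syscc, so &lib..&master was not replaced.;
  %end;
%mend replace_master;

%replace_master(lib=&outlib, master=&masterfile)
```

Keep in mind that warnings also set &syscc to a non-zero value, so this check is strict: it leaves the old master in place after a warning as well as after an error.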

Also note that this is a SAS-only solution; you may have to check your options with the administrators of the Impala DBMS.
