BookmarkSubscribeRSS Feed
Rahul_SAS
Quartz | Level 8

Hi Experts,

 

I am trying to create a daily incremental code for Impala tables in SAS with Proc Append. But if I execute proc append twice or multiple time so the data gets appended multiple time.

 

How can I avoide this replication of data. Or else, is there any way to overwrite or update the data in impala table using sas.

Please help.

 

-Rahul

4 REPLIES 4
ChrisBrooks
Ammonite | Level 13

Assuming you have SAS/ACCESS to Impala there are lots of ways - you can use a data step merge to update existing records and add any new ones at the same time, Proc SQL Union to append, data step Update statement etc.

Kurt_Bremser
Super User

I use two different ways to avoid duplicate data:

 

- set a variable that identifies a group of new records. This can be an infile name, a date, or similar. While concatenating (I do not use proc append), observations with the same values as those that are to be appended are excluded from the master dataset.

- identify a unique key (this may be one or more variables). After appending/concatenating, do a proc sort with nodupkey.

Rahul_SAS
Quartz | Level 8

hi KurtBremser....could you please share a sample code for the first scenario.

Kurt_Bremser
Super User

A piece of blueprint code might look like this:

%let infile1=/shared/data/data_20170913.dat;
%let outlib=out;
%let masterfile=my_dataset;

data infile;
infile "&infile1";
input
  indata $
;
todays_file = "&infile1";
run;

data &outlib..&masterfile._new;
set
  &outlib..&masterfile (where=(todays_file ne "&infile1"))
  infile
;
run;

proc datasets library=&outlib nolist;
delete &masterfile;
change &masterfile._new=&masterfile;
run;

Note that I do the "append" in a separate step. That way I can wrap the final proc datasets into a macro that checks for &syscc=0, to prevent replacing the master dataset if anything went wrong.

Also note that this is a SAS-only solution; you may have to check your options with the administrators of the Impala DBMS.

SAS INNOVATE 2024

Innovate_SAS_Blue.png

Registration is open! SAS is returning to Vegas for an AI and analytics experience like no other! Whether you're an executive, manager, end user or SAS partner, SAS Innovate is designed for everyone on your team. Register for just $495 by 12/31/2023.

If you are interested in speaking, there is still time to submit a session idea. More details are posted on the website. 

Register now!

What is Bayesian Analysis?

Learn the difference between classical and Bayesian statistical approaches and see a few PROC examples to perform Bayesian analysis in this video.

Find more tutorials on the SAS Users YouTube channel.

Get the $99 certification deal.jpg

 

 

Back in the Classroom!

Select SAS Training centers are offering in-person courses. View upcoming courses for:

View all other training opportunities.

Discussion stats
  • 4 replies
  • 1054 views
  • 0 likes
  • 3 in conversation