Overwrite or update Impala table using sas

Frequent Contributor
Posts: 79

Hi Experts,

 

I am trying to build a daily incremental load for Impala tables in SAS using PROC APPEND. However, if the PROC APPEND step runs more than once, the same data is appended each time.

How can I avoid this duplication of data? Alternatively, is there a way to overwrite or update the data in an Impala table using SAS?

Please help.

 

-Rahul

Valued Guide
Posts: 593

Re: Overwrite or update Impala table using sas

Posted in reply to Rahul_SAS

Assuming you have SAS/ACCESS to Impala, there are several ways: a data step merge to update existing records and add any new ones at the same time, a Proc SQL union to append, a data step Update statement, and so on.
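
As a rough illustration of the data step merge idea, here is a minimal sketch using small WORK datasets. The dataset names (master, daily, master_new) and the key variable id are hypothetical, and the sketch is SAS-only; whether the result can be written back to an Impala library in the same way depends on your setup.

```sas
/* Hypothetical master table and daily increment, both keyed by id */
data master;
  input id value $;
  datalines;
1 a
2 b
;
run;

data daily;
  input id value $;
  datalines;
2 B
3 c
;
run;

/* MERGE with BY requires both inputs to be sorted by the key */
proc sort data=master; by id; run;
proc sort data=daily;  by id; run;

/* For matching ids the daily values replace the master values; */
/* ids that exist only in daily are added as new records.       */
data master_new;
  merge master daily;
  by id;
run;
```

Because the result depends only on the key values, running the merge a second time with the same daily data gives the same master_new rather than duplicated rows.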

Super User
Posts: 10,239

Re: Overwrite or update Impala table using sas

Posted in reply to Rahul_SAS

I use two different ways to avoid duplicate data:

 

- set a variable that identifies a group of new records. This can be an infile name, a date, or similar. While concatenating (I do not use proc append), observations in the master dataset that have the same value of this variable as the records to be appended are excluded.

- identify a unique key (this may be one or more variables). After appending/concatenating, do a proc sort with nodupkey.
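
A minimal sketch of the second approach, assuming a single hypothetical key variable id and WORK datasets named master and daily (none of these names come from the thread):

```sas
/* Hypothetical master table and daily increment; id 2 appears in both */
data master;
  input id value $;
  datalines;
1 a
2 b
;
run;

data daily;
  input id value $;
  datalines;
2 b
3 c
;
run;

/* Concatenate: master records first, then the new records */
data combined;
  set master daily;
run;

/* NODUPKEY keeps one observation per id. EQUALS asks the sort to keep */
/* the input order within a key, so the master record is the one kept. */
proc sort data=combined out=master_new nodupkey equals;
  by id;
run;
```

Note that this keeps the existing master record when a key occurs twice. If the new record should win instead, the order in the set statement would have to be reversed.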

---------------------------------------------------------------------------------------------
Maxims of Maximally Efficient SAS Programmers
How to convert datasets to data steps
How to post code
Frequent Contributor
Posts: 79

Re: Overwrite or update Impala table using sas

Posted in reply to KurtBremser

Hi KurtBremser, could you please share sample code for the first scenario?

Super User
Posts: 10,239

Re: Overwrite or update Impala table using sas

Posted in reply to Rahul_SAS

A piece of blueprint code might look like this:

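/* Input file for this run, target library, and master dataset name */
%let infile1=/shared/data/data_20170913.dat;
%let outlib=out;
%let masterfile=my_dataset;

/* Read today's file and tag every record with the file it came from */
data infile;
infile "&infile1";
input
  indata $
;
todays_file = "&infile1";
run;

/* Keep master records that came from other files, then add today's records. */
/* Records from an earlier run on the same file are dropped, so a rerun does not duplicate them. */
data &outlib..&masterfile._new;
set
  &outlib..&masterfile (where=(todays_file ne "&infile1"))
  infile
;
run;

/* Replace the old master dataset with the new one */
proc datasets library=&outlib nolist;
delete &masterfile;
change &masterfile._new=&masterfile;
quit;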

Note that I do the "append" in a separate step. That way I can wrap the final proc datasets into a macro that checks for &syscc=0, to prevent replacing the master dataset if anything went wrong.
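
Such a wrapper might look roughly like the sketch below. The macro name replace_master and its parameters are hypothetical; it assumes the macro variables outlib and masterfile from the blueprint code and that the _new dataset has already been built.

```sas
/* Hypothetical wrapper: swap in the new master only if no step has failed so far */
%macro replace_master(lib=, master=);
  %if &syscc = 0 %then %do;
    proc datasets library=&lib nolist;
      delete &master;
      change &master._new=&master;
    quit;
  %end;
  %else %do;
    /* A nonzero SYSCC means an earlier step raised a warning or error */
    %put ERROR: SYSCC=&syscc, so &lib..&master was not replaced.;
  %end;
%mend replace_master;

%replace_master(lib=&outlib, master=&masterfile)
```

Since SYSCC is also set by warnings, this version is strict: any warning earlier in the session leaves the old master dataset in place.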

Also note that this is a SAS-only solution; you may have to check your options with the administrators of the Impala DBMS.
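
If the replacement has to happen inside Impala itself, one option worth discussing with the administrators is explicit SQL pass-through, which sends a statement to Impala unchanged. The following is only a sketch under several assumptions: SAS/ACCESS to Impala is licensed, the server name, schema, and table names are hypothetical, the required authentication options are omitted, and a staging table with the full replacement data already exists.

```sas
/* Assumption: hypothetical server and schema; authentication options omitted */
libname imp impala server="impala-host" schema=mydb;

proc sql;
  /* Reuse the libref's connection for explicit pass-through */
  connect using imp;

  /* The statement inside execute() is passed to Impala as-is.        */
  /* INSERT OVERWRITE replaces the contents of the target table, so a */
  /* rerun leaves the same data instead of appending it a second time. */
  execute (
    insert overwrite table my_dataset
    select * from my_dataset_staging
  ) by imp;

  disconnect from imp;
quit;
```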
