Re: proc append- stop appending duplicate observation in Base table

nbonda · Posted 11-11-2015 04:02 PM

I have a question in proc append.

I have two data sets.

inv_base

part_no	Part_name	description
1002	windshield	wind shield
1003	steering	steering wheel
1004	Tire	Good Year tire

inv_tr

part_no	Part_name	description
1001	mirror	front mirror
1003	steering	steering wheel

I am using proc append to concatenate these two datasets.

proc append base=inv_base data=inv_tr force;

run;

After append I have duplicate records in base dataset.

How can I stop appending duplicate records in other words how can I stop adding a record that is already exist in base dataset.?

is there any other way I can get the required result.

Thank you in advance

Astounding · Posted 11-11-2015 04:23 PM

There are other choices: SET statement, MERGE statement, UPDATE statement, even PROC SQL. All of these choices will take longer to run. PROC APPEND does not have to process records in the BASE= data set, only add to it. But other choices require processing the records in both data sets.

To pick the best method, you'll have to think about how to handle conflicts. For example, suppose one data set contains:

10003 steering steering wheel

But the other contains

10003 steering leather steering wheel

What should the outcome be in that case? Once you have made your choices, the programming can begin.

JoshB · Posted 11-11-2015 04:59 PM

Detailed discussion with a few solutions already on this forum here.

Peter_C · Posted 11-11-2015 05:00 PM

To identify that a new record is a duplicate either needs some kind of merge or for the "master" table to be indexed on the keys that would define duplicates.
Proc Append is very effective without duplicates check because of that reason. With an indexed "master" similar efficiency - avoiding the rewriting of all rows in the "master" - is available by using SQL INSERT or by using the data step MODIFY statement. See the examples in the online doc.

kannand · Posted 11-11-2015 10:19 PM

Please try the SYNCADD option in your PROC APPEND step as shown below. The rejects will be saved with the UNIQUESAVE option.

(syncadd=no uniquesave=yes)

Please refer to the following URL for detailed explanation with example & log.....

http://support.sas.com/documentation/cdl/en/engspde/67961/HTML/default/viewer.htm#p105mkhis1sjfln1qv...

Good Luck...!!!

Kannan Deivasigamani

kannand · Posted 11-12-2015 12:50 AM

It might be worthwhile to note that the OPTIONS syntax SYNCADD and UNIQUESAVE does not work on normal libraries. I tried and the syntax wasn't recognized (without SPDE). You need to define your library where you have the master dataset as as SPD engine. I found some literature on SPD (Scalable Performance Data) listed below that may help provide some background from SAS 9. The document reveals many things of which the primary focus of SPD is for rapid data access with heavy CPU operation through parallel processing and multithreading operations.

https://support.sas.com/documentation/cdl/en/engspde/62981/PDF/default/engspde.pdf

Here is the example, I tried out using SPDE in the LIBNAME statement that provided the desired PROC APPEND result eliminating duplicates.

libname lib1 spde '/folders/myfolders/.sasstudio/';
data lib1.d1(index=(name/unique));
     set sashelp.class(obs=5);
run;
data d2(index=(name/unique));
     set sashelp.class(obs=7 firstobs=4);
run;
proc append base=lib1.d1(syncadd=no uniquesave=yes) data=work.d2 ;
run;
proc print data=lib1.d1; title '==d1 after unique append==';
run;
proc print data=&spdsusds; title '==Uniques append rejects==';
run;

The result it produced is here for your reference:

==d1 after unique append==
Obs	Name	Sex	Age	Height	Weight
1	Alfred	M	14	69.0	112.5
2	Alice	F	13	56.5	84.0
3	Barbara	F	13	65.3	98.0
4	Carol	F	14	62.8	102.5
5	Henry	M	14	63.5	102.5
6	James	M	12	57.3	83.0
7	Jane	F	12	59.8	84.5
==Uniques append rejects==
Obs	Name	Sex	Age	Height	Weight	XXX00000
1	Carol	F	14	62.8	102.5	Name
2	Henry	M	14	63.5	102.5	Name

Hope this helps...!!!

Kannan Deivasigamani

proc append- stop appending duplicate observation in Base table