Re: Append Unique records

fredbell · Posted 10-06-2010 01:17 PM

Hello

I need to append records from one table to another but have the append fuction not append if record already exists in Base dataset.

Example of Base Dataset, Name of Base dataset "Employees"

Name Age Sex Province
--------------------------------------------------------------
FRED 48 M ONT.
SAM 55 M ONT.
KIM 19 F ONT.
KAREN 33 F ONT.

Example of dataset that i want to append to Base, Name "NewEmployees"

Name Age Sex Province
-------------------------------------------------------------
FRED 48 M ONT.
Tony 51 M ONT.

I don't want it to append record below because it already exists in Base.

Name Age Sex Province
-------------------------------------------------------------
FRED 48 M ONT.

I do want it to append this record because it does not appear in Base.

Name Age Sex Province
-------------------------------------------------------------
Tony 51 M ONT.

proc append Base = Employees Data = NewEmployees FORCE ;
run;

Thanks and i hope this is clear and possible.

Fred

sbb · Posted 10-06-2010 01:25 PM

Consider using a DATA step approach (on sorted input files using your defined BY statement variable criteria for matching "like" conditions), and use the SET statement with a BY statement, and combine that with FIRST. and LAST. to detect conditions where you do or do not want to OUTPUT an observation. Suggest you add for diagnostics while programming the DATA step, this statement which will help in debugging the DATA step processing (where you might have one or more of these statements in a DATA step and the "nn" will help reveal in your SAS log just which one is being executed at a given time):

PUT '>DIAG-nn>' / _ALL_;

Scott Barry
SBBWorks, Inc.

Suggested Google advanced search arguments, this topic / post:

data step by group procesing site:sas.com

data step append site:sas.com

fredbell · Posted 10-06-2010 01:45 PM

I am very new to SAS but i will try to work through what you have posted,
any chance you could post example?

Thanks sbb

Fred

Cynthia_sas · Posted 10-06-2010 01:46 PM

Hi:
In addition to Scott's suggestion, you may also want to move away from the simple PROC APPEND approach and move toward using an INDEX with the UNIQUE option (assuming you can find a combination of variables that define "uniqueness -- other than an ID):

http://support.sas.com/documentation/cdl/en/lrcon/62955/HTML/default/viewer.htm#a000440261.htm
http://support.sas.com/kb/24/603.html
http://www2.sas.com/proceedings/sugi30/008-30.pdf

cynthia

sbb · Posted 10-06-2010 01:57 PM

The SAS-hosted DOC is your friend...especially now, with much information and technical reference materials available at the SAS Support website.

In addition to the Google searches already provided, here's where I would encourage your attention, to get started:

Methods of Combining SAS Data Sets (and related sections that follow):
http://support.sas.com/documentation/cdl/en/basess/58133/HTML/default/a001303248.htm

Scott Barry
SBBWorks, Inc.

SASPhile · Posted 10-06-2010 01:49 PM

if the purpose is to eliminate the dup records then you can use proc sort the following way:

data base1;
input name$1-5 age 6-7 sex $9-10 province $11-15;
datalines;
FRED 48 M ONT.
SAM 55 M ONT.
KIM 19 F ONT.
KAREN 33 F ONT.
;
run;

data base2;
input name$1-5 age 6-7 sex $9-10 province $11-14;
datalines;
FRED 48 M ONT.
TONY 51 M ONT.
;
run;

proc append base=base1 data=base2 force;
run;

proc sort data=base1 nodup;
by _all_;
run;

data_null__ · Posted 10-06-2010 07:42 PM

MODIFY with KEYed access.

Create a key on BASE1. Then code a MODIFY step to trap the transaction records that are not in BASE1 and OUTPUT. As written this cannot be used to update obs in BASE1 with changes to age sex etc., you have to turn the method over for that.

[pre]
data base1(index=(name));
input name$ age sex $ province $;
datalines;
FRED 48 M ONT.
SAM 55 M ONT.
KIM 19 F ONT.
KAREN 33 F ONT.
;
run;

data base2;
input name $ age sex $ province $;
datalines;
FRED 48 M ONT.
TONY 51 M ONT.
MARY 42 F ONT.
;
run;

data base1;
set base2;
modify base1 key=name/unique;
if _iorc_ ne 0 then do;
output;
_error_=0;
end;
run;
proc print data=base1;
run;
[/pre]

Ksharp · Posted 10-11-2010 08:52 PM

@data _null_; gives a method, and there is another way (proc sql) to get it.

[pre]
data base1;
input name$1-5 age 6-7 sex $9-10 province $11-15;
datalines;
FRED 48 M ONT.
SAM 55 M ONT.
KIM 19 F ONT.
KAREN 33 F ONT.
;
run;

data base2;
input name$1-5 age 6-7 sex $9-10 province $11-14;
datalines;
FRED 48 M ONT.
TONY 51 M ONT.
;
run;

proc sql feedback;
create table append as
select * from base1
union corresponding
select * from base2
;
quit;
proc print;
run;

[/pre]

Ksharp

atul_desh · Posted 06-08-2017 05:45 AM

Hey KSharp,

when I'm running this query its checking for all variables for duplicity.

I need to check based on only 3 Variables.

Is there any other way around.

Thanks

Sunny · Posted 10-12-2010 05:42 AM

here is another approach to get the final output. its quite simple approach.

data base1(index=(name));
input name$ age sex $ province $;
datalines;
FRED 48 M ONT.
SAM 55 M ONT.
KIM 19 F ONT.
KAREN 33 F ONT.
;
run;

data base2;
input name $ age sex $ province $;
datalines;
FRED 48 M ONT.
TONY 51 M ONT.
MARY 42 F ONT.
;
run;

proc sort data = base1;
by _all_;
run;

proc sort data = base2;
by _all_;
run;

data final;
merge base1(in = ina) base2(in = inb);
by _all_;
if ina = 1 or inb = 1 then output;
run;

data_null__ · Posted 10-12-2010 09:59 AM

While the result may be the same you did not append records to an existing file.

Sunny · Posted 10-12-2010 01:44 PM

yes. I thought, we are more concerned about output as compare to appending the datasets.

So i wrote the script keeping output in my mind. But there may be other way to achieve it.

Ksharp · Posted 10-17-2010 09:08 PM

Hi.
Definitely using merge-by and sort can do it.
But there is a disadvantage that the order of obs is changed.
And Somebody do not like it.
The @data_null_;'s advantage is to keep the original order.

Ksharp

nbonda · Posted 11-11-2015 11:28 PM

Thank you Everybody for giving suggestions. At last I decided to use modify statement. this update (overwrite) if observation exits the master dataset. I believe this is slower process than append.

shaffner · Posted 01-15-2016 05:38 PM

Can you post your code please?

Registration is open

SAS Training: Just a Click Away