Solved: Appending Non-Duplicate based on three variables

atul_desh · Posted 06-08-2017 06:46 AM

Hello,

I've two datasets, Lets say A and B, there are 215 Variables in each of the dataset and datasize is 90 G.

I want to append B dataset into A but not to append the duplicate values which are common based on "Three Variables, not based on all 215 variable".

I can't use (syncadd=no uniquesave=yes) and also can't use below code, as it will check all variables for duplicity.

proc sql feedback;
create table lib_w.want
select * from lib_A.dsA
union corresponding
select * from lib_B.dsB
;
quit;

I'm thininking to creat new varialbe as catx(all three variable) in both A and B and then "A.catx_var ne B.Catx_Var"

is there is other way around ?

Please help, Thanks in Advance.

atul_desh · Posted 06-09-2017 07:10 AM

This is how I'm doing it now.. now it is less time consuming and giving what I wanted :

proc sort data=A;
by Var1 Var2 Var3;
run;

proc sort data=B;
by Var1 Var2 Var3;
run;

data pre_want;
merge A (in=a) B (in=b);
byVar1 Var2 Var3;
if not a and b;
run;

proc append base=A data=pre_want;
run;

View solution in original post

RW9 · Posted 06-08-2017 06:56 AM

So, you have 2 datasets which are 90gb each? That is a really large amount of data. Do you want to drop data from A or B based on being in both? As a suggestion, you could get a distinct list of the 3 variables from the table you want to keep data, and then drop from the other before doing a proc append:

proc sort data=a out=distlist nodupkey;
  by var1 var2 var3;
run;

data _null_;
  set distlist end=last;
  if _n_=1 then call execute('data b_proc; set b;');
  call execute('if var1="'||strip(var1)||'" and var2="'||strip(var2)||'" and var3="'||strip(var3)||'" then delete;');
  if last then call execute('run;');
run;

proc append base=a data=b_proc force;
run;

This will create a big datastep with lots of if statements to drop the data from B based on a distinct list of var1-var3 from A.

atul_desh · Posted 06-08-2017 07:06 AM

what if I append all the observation and in last I do nodupkey ?

RW9 · Posted 06-08-2017 07:16 AM

Theres only one way to know, try it. The proc append literally should just drop the header block and tag the data onto the existing dataset, so that shouldn't matter either way, but the proc sort/datastep is where the real work is being done, so anything to minimize that.

Ksharp · Posted 06-08-2017 08:23 AM

Did you try Hash Table ? Or try SQL.

proc sql;
create table only_in_B as
select var1,var2,var3  from B
except
select var1,var2,var3 from A ;
quit;

atul_desh · Posted 06-08-2017 10:01 AM

how can use only_in_b for append b to a ??

atul_desh · Posted 06-09-2017 07:10 AM

This is how I'm doing it now.. now it is less time consuming and giving what I wanted :

proc sort data=A;
by Var1 Var2 Var3;
run;

proc sort data=B;
by Var1 Var2 Var3;
run;

data pre_want;
merge A (in=a) B (in=b);
byVar1 Var2 Var3;
if not a and b;
run;

proc append base=A data=pre_want;
run;

s_lassen · Posted 06-09-2017 07:27 AM

@atul_desh: You could try with a hash table for checking if a row is in table A (assuming your key variables are called VAR1, VAR2 and VAR3):

Data A; /* we are modifying A in place */
  if 0 then modify A; /* we are not reading any obs. here */
  if _N_=1 then do;
    declare hash h(dataset: 'A(keep=var1 var2 var3)');
    rc=h.definekey('var1','var2','var3');
    h.definedone();
    end;
  set B;
  if h.find() then
    output A; /* if key not found, append to A */
run;

Appending Non-Duplicate based on three variables

Re: Appending Non-Duplicate based on three variables

Re: Appending Non-Duplicate based on three variables

Re: Appending Non-Duplicate based on three variables

Re: Appending Non-Duplicate based on three variables

Re: Appending Non-Duplicate based on three variables

Re: Appending Non-Duplicate based on three variables

Re: Appending Non-Duplicate based on three variables

Re: Appending Non-Duplicate based on three variables

Catch up on SAS Innovate 2026

Catch up on SAS Innovate 2026

SAS Training: Just a Click Away