Hello,
I've two datasets, Lets say A and B, there are 215 Variables in each of the dataset and datasize is 90 G.
I want to append B dataset into A but not to append the duplicate values which are common based on "Three Variables, not based on all 215 variable".
I can't use (syncadd=no uniquesave=yes) and also can't use below code, as it will check all variables for duplicity.
proc sql feedback;
create table lib_w.want
select * from lib_A.dsA
union corresponding
select * from lib_B.dsB
;
quit;
I'm thininking to creat new varialbe as catx(all three variable) in both A and B and then "A.catx_var ne B.Catx_Var"
is there is other way around ?
Please help, Thanks in Advance.
This is how I'm doing it now.. now it is less time consuming and giving what I wanted :
proc sort data=A;
by Var1 Var2 Var3;
run;
proc sort data=B;
by Var1 Var2 Var3;
run;
data pre_want;
merge A (in=a) B (in=b);
byVar1 Var2 Var3;
if not a and b;
run;
proc append base=A data=pre_want;
run;
So, you have 2 datasets which are 90gb each? That is a really large amount of data. Do you want to drop data from A or B based on being in both? As a suggestion, you could get a distinct list of the 3 variables from the table you want to keep data, and then drop from the other before doing a proc append:
proc sort data=a out=distlist nodupkey; by var1 var2 var3; run; data _null_; set distlist end=last; if _n_=1 then call execute('data b_proc; set b;'); call execute('if var1="'||strip(var1)||'" and var2="'||strip(var2)||'" and var3="'||strip(var3)||'" then delete;'); if last then call execute('run;'); run; proc append base=a data=b_proc force; run;
This will create a big datastep with lots of if statements to drop the data from B based on a distinct list of var1-var3 from A.
what if I append all the observation and in last I do nodupkey ?
Theres only one way to know, try it. The proc append literally should just drop the header block and tag the data onto the existing dataset, so that shouldn't matter either way, but the proc sort/datastep is where the real work is being done, so anything to minimize that.
Did you try Hash Table ? Or try SQL.
proc sql;
create table only_in_B as
select var1,var2,var3 from B
except
select var1,var2,var3 from A ;
quit;
This is how I'm doing it now.. now it is less time consuming and giving what I wanted :
proc sort data=A;
by Var1 Var2 Var3;
run;
proc sort data=B;
by Var1 Var2 Var3;
run;
data pre_want;
merge A (in=a) B (in=b);
byVar1 Var2 Var3;
if not a and b;
run;
proc append base=A data=pre_want;
run;
@atul_desh: You could try with a hash table for checking if a row is in table A (assuming your key variables are called VAR1, VAR2 and VAR3):
Data A; /* we are modifying A in place */ if 0 then modify A; /* we are not reading any obs. here */ if _N_=1 then do; declare hash h(dataset: 'A(keep=var1 var2 var3)'); rc=h.definekey('var1','var2','var3'); h.definedone(); end; set B; if h.find() then output A; /* if key not found, append to A */ run;
Are you ready for the spotlight? We're accepting content ideas for SAS Innovate 2025 to be held May 6-9 in Orlando, FL. The call is open until September 25. Read more here about why you should contribute and what is in it for you!
Learn how use the CAT functions in SAS to join values from multiple variables into a single value.
Find more tutorials on the SAS Users YouTube channel.