Dear SAS community,
update libname.dataset1 as u set columnc=(select columnc from libname.dataset2 as n where u.columna=n.columna and u.columnb=n.columnb);
Was wondering if there is any other more efficient way to update a dataset from another dataset. The SAS code as above could cater for my need, but the runtime is way too long (probably due to large dataset, my dataset1 has around 3-5million records while dataset2 has 1million records).
Thank you very much in advance.
If the data are not sorted, you may want to use a hash object with all the observations from DATASET2 that have non-missing values of COLUMNC.
In addition, if DATASET2 is small and DATASET1 is large, you might save a lot of disk output activity by updating in place, via the MODIFY statement in a DATA step. This avoids re-writing all the observations in DATASET1.
data libname.dataset1;
modify libname.dataset1;
if _n_=1 then do;
declare hash h (dataset:'dataset2 (where=(not missing(columnc)))');
h.definekey('columna','columnb');
h.definedata('columnc');
h.definedone();
end;
if h.find()=0 then replace;
run;
I used the "not missing(columnc)" filter to replicate the UPDATE statement, i.e. to prevent COLUMNC values in the master dataset from being replaced by missing values .
Editted note: This assumes that DATASET2 is a subset of DATASET1. And it also assumes one observation per COLUMNA/COLLUMNB in DATASET2.
As long as both datasets are sorted by columns and column, try this:
data libname.dataset1;
update libname.dataset1 libname.dataset2 (keep=columna columnb columnc;
by columna columnb;
run;
I have more than 100 columns in dataset1, does that mean that I have to list out all the 100 columns in the keep argument?
You should only KEEP the variables from the second dataset that are (1) needed to do the merge (2) you want to update.
You can include as many variables from the second dataset as you want to update. But if the list of variables in the second dataset that you don't want to update include into the result is smaller than the list you do want then use the DROP= dataset option instead of the KEEP= dataset option.
You might also want to use the IN= dataset option so you can avoid adding new observations for combinations of the BY variables that only appear in the second dataset.
data want;
merge dataset1 (in=in1) dateset2(keep=by1 by2 var1);
by by1 by2;
if in1;
run;
No, you don't have to specify to keep the columns if they all are part of the update. However, for the observations in the dataset containing the updates (dataset2), only columns you want to update should have values in them. If a column contains a missing value in the update dataset, the corresponding column in the master (dataset1) will not be modified.
If the data are not sorted, you may want to use a hash object with all the observations from DATASET2 that have non-missing values of COLUMNC.
In addition, if DATASET2 is small and DATASET1 is large, you might save a lot of disk output activity by updating in place, via the MODIFY statement in a DATA step. This avoids re-writing all the observations in DATASET1.
data libname.dataset1;
modify libname.dataset1;
if _n_=1 then do;
declare hash h (dataset:'dataset2 (where=(not missing(columnc)))');
h.definekey('columna','columnb');
h.definedata('columnc');
h.definedone();
end;
if h.find()=0 then replace;
run;
I used the "not missing(columnc)" filter to replicate the UPDATE statement, i.e. to prevent COLUMNC values in the master dataset from being replaced by missing values .
Editted note: This assumes that DATASET2 is a subset of DATASET1. And it also assumes one observation per COLUMNA/COLLUMNB in DATASET2.
Don’t miss the livestream kicking off May 7. It’s free. It’s easy. And it’s the best seat in the house.
Join us virtually with our complimentary SAS Innovate Digital Pass. Watch live or on-demand in multiple languages, with translations available to help you get the most out of every session.
SAS' Charu Shankar shares her PROC SQL expertise by showing you how to master the WHERE clause using real winter weather data.
Find more tutorials on the SAS Users YouTube channel.