I have a bigger dataset (Dataset A) that I used to generate a smaller dataset (Dataset B).
It turns out that 4 numeric variables from Dataset A were updated (Dataset C). Now I need to update these variables in dataset B too. Dataset A and B have in common ID numbers.
Can you please help me find a code to run this update?
You can name:
Dataset A: datasetA
Dataset C: datasetC
ID numbers: IDnum
Variables that need to be updated: Var1 Var 2 Var 3 Var4
Thank you,
Jose Diego.
One method: a hash object:
data ds_d;
set ds_b;
if _n_ = 1
then do;
declare hash c (dataset:"ds_c (keep=id_num var1 var2 var3 var4)");
c.definekey("id_num");
c.definedata("var1","var2","var3","var4");
c.definedone();
end;
rc = c.find();
drop rc;
run;
If both datasets are already sorted by id_num, a data step MERGE will probably be most efficient.
Dear Kurt,
Thank you for your answer.
Sorry but could you please clarify what do you mean by
if _n_=1 ?
Also, what is this declare hash (dataset:
Also, what is this rc=c.find.
Finally, the datasets are indeed sorted by ID numbers. However, they have different population sizes.
I am sorry for my ignorance...
The documentation of SAS Component Objects is found here.
The MERGE solution looks like this:
data ds_d;
merge
ds_b (in=b)
ds_c (keep=id_num var1 var2 var3 var4)
;
by id_num;
if b;
run;
Only observations already present in ds_b will be kept, and the variables read from ds_c will overwrite those read from ds_b.
I don't understand what you are asking. It might not even be possible depending on what is actually going on.
It sounds like you ran some program to produce B from A. Then you got a new dataset C that you used to update A. It is not clear how C updates A. Also it is not sure if you updated the dataset A in place or if you used C and A to produce a new dataset, say A_prime. And now you want to know if there is some way to use C to update B. Do you want to update B in place or use B and C to B_prime?
It would seem to me that whether you can use C to update B depends on how B was made from A.
Is B just a subset of the observations in A? If so then it might be possible.
If B was produced from A via a more complex algorithm than you probably cannot just use C to update B. You will need to recreate B from the updated A (or create B_prime from A_prime).
Hi,
Yes, I produced B from A using a matching code, matching cases and controls. However, Dataset A got the values (no/yes) of 4 numeric variables changed. That means that 4 columns received new values in dataset A.
I need to implement these changes that were made in Dataset A into Dataset B. I cannot simply run another matching procedure with Dataset A because I am already using the results of all other variables from Dataset B. And Matching would create another random dataset.
B is just a subset of observations in A.
Thank you,
So if C has just the five variables, ID and VAR1-VAR4 then a simple MERGE will do the trick.
You can use the IN= dataset option to make a variable you can use to make sure that merging in C does not add observations. In case C has other variables you don't want you can use the KEEP= dataset option to only keep the variables you want. Note this assumes that ID is unique identifier in both datasets. That is for any value of ID there is at most one observation in B and at most one observation in C that has that value.
data new_b ;
merge b(in=inb) c (keep=id var1-var4);
by id;
if inb;
run;
Learn how use the CAT functions in SAS to join values from multiple variables into a single value.
Find more tutorials on the SAS Users YouTube channel.
Ready to level-up your skills? Choose your own adventure.