BookmarkSubscribeRSS Feed
firesantos
Calcite | Level 5

I have a bigger dataset (Dataset A) that I used to generate a smaller dataset (Dataset B). 

 

It turns out that 4 numeric variables from Dataset A were updated (Dataset C). Now I need to update these variables in dataset B too. Dataset A and B have in common ID numbers. 

 

Can you please help me find a code to run this update?

 

You can name:

 

Dataset A: datasetA

Dataset C: datasetC

ID numbers: IDnum

Variables that need to be updated: Var1 Var 2 Var 3 Var4

 

Thank you, 

 

Jose Diego. 

 

 

 

 

6 REPLIES 6
Kurt_Bremser
Super User

One method: a hash object:

data ds_d;
set ds_b;
if _n_ = 1
then do;
  declare hash c (dataset:"ds_c (keep=id_num var1 var2 var3 var4)");
  c.definekey("id_num");
  c.definedata("var1","var2","var3","var4");
  c.definedone();
end;
rc = c.find();
drop rc;
run;

If both datasets are already sorted by id_num, a data step MERGE will probably be most efficient.

firesantos
Calcite | Level 5

Dear Kurt, 

 

Thank you for your answer. 

 

Sorry but could you please clarify what do you mean by 

if _n_=1 ?

Also, what is this declare hash (dataset:

Also, what is this rc=c.find. 

 

Finally, the datasets are indeed sorted by ID numbers. However, they have  different population sizes. 

 

I am sorry for my ignorance...

 

 

Kurt_Bremser
Super User

The documentation of SAS Component Objects is found here.

The MERGE solution looks like this:

data ds_d;
merge
  ds_b (in=b)
  ds_c (keep=id_num var1 var2 var3 var4)
;
by id_num;
if b;
run;

Only observations already present in ds_b will be kept, and the variables read from ds_c will overwrite those read from ds_b.

Tom
Super User Tom
Super User

I don't understand what you are asking.  It might not even be possible depending on what is actually going on. 

 

It sounds like you ran some program to produce B from A. Then you got a new dataset C that you used to update A.  It is not clear how C updates A.  Also it is not sure if you updated the dataset A in place or if you used C and A to produce a new dataset, say A_prime.  And now you want to know if there is some way to use C to update B.  Do you want to update B in place or use B and C to B_prime?

 

It would seem to me that whether you can use C to update B depends on how B was made from A. 

 

Is B just a subset of the observations in A?  If so then it might be possible. 

If B was produced from A via a more complex algorithm than you probably cannot just use C to update B.  You will need to recreate B from the updated A (or create B_prime from A_prime).

firesantos
Calcite | Level 5

Hi, 

 

Yes, I produced B from A using a matching code, matching cases and controls. However, Dataset A got the values (no/yes) of 4 numeric variables changed. That means that 4 columns received new values in dataset A. 

 

I need to implement these changes that were made in Dataset A into Dataset B. I cannot simply run another matching procedure with Dataset A because I am already using the results of all other variables from Dataset B. And Matching would create another random dataset.  

 

B is just a subset of observations in A. 

 

Thank you, 

 

 

Tom
Super User Tom
Super User

So if C has just the five variables, ID and VAR1-VAR4 then a simple MERGE will do the trick.

You can use the IN= dataset option to make a variable you can use to make sure that merging in C does not add observations.  In case C has other variables you don't want you can use the KEEP= dataset option to only keep the variables you want.  Note this assumes that ID is unique identifier in both datasets. That is for any value of ID there is at most one observation in B and at most one observation in C that has that value.

data new_b ;
  merge b(in=inb) c (keep=id var1-var4);
  by id;
  if inb;
run;

 

hackathon24-white-horiz.png

The 2025 SAS Hackathon Kicks Off on June 11!

Watch the live Hackathon Kickoff to get all the essential information about the SAS Hackathon—including how to join, how to participate, and expert tips for success.

YouTube LinkedIn

How to Concatenate Values

Learn how use the CAT functions in SAS to join values from multiple variables into a single value.

Find more tutorials on the SAS Users YouTube channel.

SAS Training: Just a Click Away

 Ready to level-up your skills? Choose your own adventure.

Browse our catalog!

Discussion stats
  • 6 replies
  • 1251 views
  • 0 likes
  • 3 in conversation