Insert data from a bigger dataset into a smaller dataset

firesantos · Posted 11-30-2020 02:44 PM

I have a bigger dataset (Dataset A) that I used to generate a smaller dataset (Dataset B).

It turns out that 4 numeric variables from Dataset A were updated (Dataset C). Now I need to update these variables in dataset B too. Dataset A and B have in common ID numbers.

Can you please help me find a code to run this update?

You can name:

Dataset A: datasetA

Dataset C: datasetC

ID numbers: IDnum

Variables that need to be updated: Var1 Var 2 Var 3 Var4

Thank you,

Jose Diego.

Kurt_Bremser · Posted 11-30-2020 03:15 PM

One method: a hash object:

data ds_d;
set ds_b;
if _n_ = 1
then do;
  declare hash c (dataset:"ds_c (keep=id_num var1 var2 var3 var4)");
  c.definekey("id_num");
  c.definedata("var1","var2","var3","var4");
  c.definedone();
end;
rc = c.find();
drop rc;
run;

If both datasets are already sorted by id_num, a data step MERGE will probably be most efficient.

Maxims of Maximally Efficient SAS Programmers
How to convert datasets to data steps
The macro for direct download as ZIP
How to post code
Please vote for Provide Sequential Search Capability for Hash Objects
How to deal with locked files on UNIX

firesantos · Posted 11-30-2020 03:40 PM

Dear Kurt,

Thank you for your answer.

Sorry but could you please clarify what do you mean by

if _n_=1 ?

Also, what is this declare hash (dataset:

Also, what is this rc=c.find.

Finally, the datasets are indeed sorted by ID numbers. However, they have different population sizes.

I am sorry for my ignorance...

Kurt_Bremser · Posted 11-30-2020 04:35 PM

The documentation of SAS Component Objects is found here.

The MERGE solution looks like this:

data ds_d;
merge
  ds_b (in=b)
  ds_c (keep=id_num var1 var2 var3 var4)
;
by id_num;
if b;
run;

Only observations already present in ds_b will be kept, and the variables read from ds_c will overwrite those read from ds_b.

Maxims of Maximally Efficient SAS Programmers
How to convert datasets to data steps
The macro for direct download as ZIP
How to post code
Please vote for Provide Sequential Search Capability for Hash Objects
How to deal with locked files on UNIX

Tom · Posted 11-30-2020 03:38 PM

I don't understand what you are asking. It might not even be possible depending on what is actually going on.

It sounds like you ran some program to produce B from A. Then you got a new dataset C that you used to update A. It is not clear how C updates A. Also it is not sure if you updated the dataset A in place or if you used C and A to produce a new dataset, say A_prime. And now you want to know if there is some way to use C to update B. Do you want to update B in place or use B and C to B_prime?

It would seem to me that whether you can use C to update B depends on how B was made from A.

Is B just a subset of the observations in A? If so then it might be possible.

If B was produced from A via a more complex algorithm than you probably cannot just use C to update B. You will need to recreate B from the updated A (or create B_prime from A_prime).

firesantos · Posted 11-30-2020 04:13 PM

Hi,

Yes, I produced B from A using a matching code, matching cases and controls. However, Dataset A got the values (no/yes) of 4 numeric variables changed. That means that 4 columns received new values in dataset A.

I need to implement these changes that were made in Dataset A into Dataset B. I cannot simply run another matching procedure with Dataset A because I am already using the results of all other variables from Dataset B. And Matching would create another random dataset.

B is just a subset of observations in A.

Thank you,

Tom · Posted 11-30-2020 05:09 PM

So if C has just the five variables, ID and VAR1-VAR4 then a simple MERGE will do the trick.

You can use the IN= dataset option to make a variable you can use to make sure that merging in C does not add observations. In case C has other variables you don't want you can use the KEEP= dataset option to only keep the variables you want. Note this assumes that ID is unique identifier in both datasets. That is for any value of ID there is at most one observation in B and at most one observation in C that has that value.

data new_b ;
  merge b(in=inb) c (keep=id var1-var4);
  by id;
  if inb;
run;

Insert data from a bigger dataset into a smaller dataset

Re: Insert data from a bigger dataset into a smaller dataset

Re: Insert data from a bigger dataset into a smaller dataset

Re: Insert data from a bigger dataset into a smaller dataset

Re: Insert data from a bigger dataset into a smaller dataset

Re: Insert data from a bigger dataset into a smaller dataset

Re: Insert data from a bigger dataset into a smaller dataset

Registration is open

Insert data from a bigger dataset into a smaller dataset

Re: Insert data from a bigger dataset into a smaller dataset

Re: Insert data from a bigger dataset into a smaller dataset

Re: Insert data from a bigger dataset into a smaller dataset

Re: Insert data from a bigger dataset into a smaller dataset

Re: Insert data from a bigger dataset into a smaller dataset

Re: Insert data from a bigger dataset into a smaller dataset

Registration is open

SAS Training: Just a Click Away