Hi! I want to merge datasets by ID. Each dataset has different ages. I would like to get a merged dataset that looks like this:
ID | age1 | age2 | age3 |
1 | 12 | 13 | |
1 | 23 | ||
2 | 25 | ||
3 | 11 | 22 | |
3 | 32 | ||
3 | 33 | ||
4 | 10 | 12 | |
4 | 15 |
I used this code
data ages_all;
merge
age1 age2 age3;
by ID;
run;
The problem is that I get this dataset:
ID | age1 | age2 | age3 |
1 | 12 | 13 | |
1 | 12 | 23 | |
2 | 25 | ||
3 | 11 | 22 | |
3 | 11 | 32 | |
3 | 11 | 33 | |
4 | 10 | 12 | |
4 | 15 | 12 |
As you can see, for example for ID=1, it has two different observations for age2 and only one observation for age1. However, after merging, the value for age1 repeats. I would like to avoid that, since I want to calculate frequencies per variable and having two observations with the value 12 for ID is not correct.
Hi @ANKH1,
You can reset the age variables to missing by inserting
call missing(ID, of age1-age3);
before the MERGE statement (variable ID is included optionally just to maintain variable order).
However, I wouldn't merge the datasets this way with the intention "to calculate frequencies per variable."
@ANKH1 wrote:
(...) having two observations with the value 12 for ID is not correct.
Note that having an observation with missing age1 for ID 1 is actually not correct either.
Frequencies could be calculated from the original datasets or after interleaving them, e.g., like this:
data ages_all;
set age1(rename=(age1=age) in=in1)
age2(rename=(age2=age) in=in2)
age3(rename=(age3=age) in=in3);
by id;
ds=in1+2*in2+3*in3;
run;
(Instead of RENAME= and IN= dataset options you could use the COALESCE function and the INDSNAME= option of the SET statement, respectively.)
Sorry, I'm not understanding. You have variables AGE1 AGE2 and AGE3, then you talk about merging data sets AGE1 AGE2 and AGE3. That doesn't seem right.
Please show us (a portion of) all the data sets involved here. Please provide data as WORKING data step code (examples and instructions).
Please read my request in my previous message, where I said:
Please show us (a portion of) all the data sets involved here. Please provide data as WORKING data step code (examples and instructions).
Hi @ANKH1,
You can reset the age variables to missing by inserting
call missing(ID, of age1-age3);
before the MERGE statement (variable ID is included optionally just to maintain variable order).
However, I wouldn't merge the datasets this way with the intention "to calculate frequencies per variable."
@ANKH1 wrote:
(...) having two observations with the value 12 for ID is not correct.
Note that having an observation with missing age1 for ID 1 is actually not correct either.
Frequencies could be calculated from the original datasets or after interleaving them, e.g., like this:
data ages_all;
set age1(rename=(age1=age) in=in1)
age2(rename=(age2=age) in=in2)
age3(rename=(age3=age) in=in3);
by id;
ds=in1+2*in2+3*in3;
run;
(Instead of RENAME= and IN= dataset options you could use the COALESCE function and the INDSNAME= option of the SET statement, respectively.)
Thank you so much! The output was what I needed!
Join us for SAS Innovate 2025, our biggest and most exciting global event of the year, in Orlando, FL, from May 6-9.
Lock in the best rate now before the price increases on April 1.
Learn how use the CAT functions in SAS to join values from multiple variables into a single value.
Find more tutorials on the SAS Users YouTube channel.
Ready to level-up your skills? Choose your own adventure.