Hi! I want to merge datasets by ID. Each dataset has different ages. I would like to get a merged dataset that looks like this:
ID | age1 | age2 | age3 |
1 | 12 | 13 | |
1 | 23 | ||
2 | 25 | ||
3 | 11 | 22 | |
3 | 32 | ||
3 | 33 | ||
4 | 10 | 12 | |
4 | 15 |
I used this code
data ages_all;
merge
age1 age2 age3;
by ID;
run;
The problem is that I get this dataset:
ID | age1 | age2 | age3 |
1 | 12 | 13 | |
1 | 12 | 23 | |
2 | 25 | ||
3 | 11 | 22 | |
3 | 11 | 32 | |
3 | 11 | 33 | |
4 | 10 | 12 | |
4 | 15 | 12 |
As you can see, for example for ID=1, it has two different observations for age2 and only one observation for age1. However, after merging, the value for age1 repeats. I would like to avoid that, since I want to calculate frequencies per variable and having two observations with the value 12 for ID is not correct.
Hi @ANKH1,
You can reset the age variables to missing by inserting
call missing(ID, of age1-age3);
before the MERGE statement (variable ID is included optionally just to maintain variable order).
However, I wouldn't merge the datasets this way with the intention "to calculate frequencies per variable."
@ANKH1 wrote:
(...) having two observations with the value 12 for ID is not correct.
Note that having an observation with missing age1 for ID 1 is actually not correct either.
Frequencies could be calculated from the original datasets or after interleaving them, e.g., like this:
data ages_all;
set age1(rename=(age1=age) in=in1)
age2(rename=(age2=age) in=in2)
age3(rename=(age3=age) in=in3);
by id;
ds=in1+2*in2+3*in3;
run;
(Instead of RENAME= and IN= dataset options you could use the COALESCE function and the INDSNAME= option of the SET statement, respectively.)
Sorry, I'm not understanding. You have variables AGE1 AGE2 and AGE3, then you talk about merging data sets AGE1 AGE2 and AGE3. That doesn't seem right.
Please show us (a portion of) all the data sets involved here. Please provide data as WORKING data step code (examples and instructions).
Please read my request in my previous message, where I said:
Please show us (a portion of) all the data sets involved here. Please provide data as WORKING data step code (examples and instructions).
Hi @ANKH1,
You can reset the age variables to missing by inserting
call missing(ID, of age1-age3);
before the MERGE statement (variable ID is included optionally just to maintain variable order).
However, I wouldn't merge the datasets this way with the intention "to calculate frequencies per variable."
@ANKH1 wrote:
(...) having two observations with the value 12 for ID is not correct.
Note that having an observation with missing age1 for ID 1 is actually not correct either.
Frequencies could be calculated from the original datasets or after interleaving them, e.g., like this:
data ages_all;
set age1(rename=(age1=age) in=in1)
age2(rename=(age2=age) in=in2)
age3(rename=(age3=age) in=in3);
by id;
ds=in1+2*in2+3*in3;
run;
(Instead of RENAME= and IN= dataset options you could use the COALESCE function and the INDSNAME= option of the SET statement, respectively.)
Thank you so much! The output was what I needed!
Available on demand!
Missed SAS Innovate Las Vegas? Watch all the action for free! View the keynotes, general sessions and 22 breakouts on demand.
Learn how use the CAT functions in SAS to join values from multiple variables into a single value.
Find more tutorials on the SAS Users YouTube channel.
Select SAS Training centers are offering in-person courses. View upcoming courses for: