Solved: Merging datasets with different number of observations

ANKH1 · Posted 07-28-2023 12:44 PM

Hi! I want to merge datasets by ID. Each dataset has different ages. I would like to get a merged dataset that looks like this:

ID	age1	age2	age3
1	12	13
1		23
2		25
3	11		22
3			32
3			33
4	10		12
4	15

I used this code

data ages_all;
merge
age1 age2 age3;
by ID;
run;

The problem is that I get this dataset:

ID	age1	age2	age3
1	12	13
1	12	23
2		25
3	11		22
3	11		32
3	11		33
4	10		12
4	15		12

As you can see, for example for ID=1, it has two different observations for age2 and only one observation for age1. However, after merging, the value for age1 repeats. I would like to avoid that, since I want to calculate frequencies per variable and having two observations with the value 12 for ID is not correct.

FreelanceReinh · Posted 07-28-2023 01:49 PM

Hi @ANKH1,

You can reset the age variables to missing by inserting

call missing(ID, of age1-age3);

before the MERGE statement (variable ID is included optionally just to maintain variable order).

However, I wouldn't merge the datasets this way with the intention "to calculate frequencies per variable."

@ANKH1 wrote:

(...) having two observations with the value 12 for ID is not correct.

Note that having an observation with missing age1 for ID 1 is actually not correct either.

Frequencies could be calculated from the original datasets or after interleaving them, e.g., like this:

data ages_all;
set age1(rename=(age1=age) in=in1)
    age2(rename=(age2=age) in=in2)
    age3(rename=(age3=age) in=in3);
by id;
ds=in1+2*in2+3*in3;
run;

(Instead of RENAME= and IN= dataset options you could use the COALESCE function and the INDSNAME= option of the SET statement, respectively.)

View solution in original post

PaigeMiller · Posted 07-28-2023 12:52 PM

Sorry, I'm not understanding. You have variables AGE1 AGE2 and AGE3, then you talk about merging data sets AGE1 AGE2 and AGE3. That doesn't seem right.

Please show us (a portion of) all the data sets involved here. Please provide data as WORKING data step code (examples and instructions).

--
Paige Miller

ANKH1 · Posted 07-28-2023 01:01 PM

Hi! Sorry, the datasets names are different. I am aware of that. Just changed the names to publish here.

PaigeMiller · Posted 07-28-2023 01:06 PM

Please read my request in my previous message, where I said:

Please show us (a portion of) all the data sets involved here. Please provide data as WORKING data step code (examples and instructions).

--
Paige Miller

FreelanceReinh · Posted 07-28-2023 01:49 PM

Hi @ANKH1,

You can reset the age variables to missing by inserting

call missing(ID, of age1-age3);

before the MERGE statement (variable ID is included optionally just to maintain variable order).

However, I wouldn't merge the datasets this way with the intention "to calculate frequencies per variable."

@ANKH1 wrote:

(...) having two observations with the value 12 for ID is not correct.

Note that having an observation with missing age1 for ID 1 is actually not correct either.

Frequencies could be calculated from the original datasets or after interleaving them, e.g., like this:

data ages_all;
set age1(rename=(age1=age) in=in1)
    age2(rename=(age2=age) in=in2)
    age3(rename=(age3=age) in=in3);
by id;
ds=in1+2*in2+3*in3;
run;

(Instead of RENAME= and IN= dataset options you could use the COALESCE function and the INDSNAME= option of the SET statement, respectively.)

ANKH1 · Posted 07-28-2023 06:22 PM

Thank you so much! The output was what I needed!

Merging datasets with different number of observations

Re: Merging datasets with different number of observations

Re: Merging datasets with different number of observations

Re: Merging datasets with different number of observations

Re: Merging datasets with different number of observations

Re: Merging datasets with different number of observations

Re: Merging datasets with different number of observations

Ready to join fellow brilliant minds for the SAS Hackathon?

Classroom Training Available!