Help Merging SAS datasets

mitch0011 · Posted 05-11-2021 12:24 PM

Newbie here.

I have 2 datasets I'm trying to combine, keeping only those observations from the 2nd dataset:

Dataset1 - a person month file, this dataset holds all my variables and each observation for ID/PERSON is repeated for the MONTH=1 to 12

ID PERSON MONTH VAR1

1 1 1 2

1 1 2 3

1 1 3 7

2 1 1 2

Dataset2 - a person file

ID PERSON VAR3

I want to merge the 2 datasets, so the final dataset will only keep those observations from Dataset2, but pull in all variables from Dataset1, as well as those observations repeated in the MONTH varaible.

Is there an easy way to do this?

I've tried the following, but it's not working as expected:

proc sort data=Dataset1;

by ID PERSON; run;

proc sort data=Dataset2;

by ID PERSON; run;

data final;

merge Dataset1 (in=a) Dataset2;

by ID PERSON;

if a;

run;

Any help would be appreciated!

PaigeMiller · Posted 05-11-2021 12:29 PM

@mitch0011 wrote:

Newbie here.

I have 2 datasets I'm trying to combine, keeping only those observations from the 2nd dataset:

Dataset1 - a person month file, this dataset holds all my variables and each observation for ID/PERSON is repeated for the MONTH=1 to 12

ID PERSON MONTH VAR1

1 1               1            2

1 1               2            3

1 1               3            7

2 1               1             2

Dataset2 - a person file

ID PERSON VAR3

I want to merge the 2 datasets, so the final dataset will only keep those observations from Dataset2, but pull in all variables from Dataset1, as well as those observations repeated in the MONTH varaible.

If you want to keep observations from Dataset2, wouldn't that be:

merge Dataset1 Dataset2 (in=a);

If that's not it, please SHOW US portions of both data sets and SHOW US the desired output.

--
Paige Miller

mitch0011 · Posted 05-11-2021 01:35 PM

I would only like to keep the observations from Dataset2 that match to those in Dataset1 by ID and PERSON, so in the example of the datasets below,

ID=2, PERSON=1 would not be included in the finaldataset since it does not exist in Dataset2

Dataset1:

ID PERSON MONTH VAR1

1 1 1 2

1 1 2 3

1 1 3 7

2 1 1 2

2 1 2 2

2 1 3 2

3 1 1 5

3 1 2 5

3 1 3 5

3 2 1 6

3 2 2 6

3 2 3 6

4 1 1 7

4 1 2 7

4 1 3 7

Dataset2:

ID PERSON VAR2

1 1 2

3 1 3

3 2 7

4 1 2

FinalDataset:

ID PERSON MONTH VAR1 VAR2

1 1 1 2 2

1 1 2 3 2

1 1 3 7 2

3 1 1 5 3

3 1 2 5 3

3 1 3 5 3

3 2 1 6 7

3 2 2 6 7

3 2 3 6 7

4 1 1 7 2

4 1 2 7 2

4 1 3 7 2

PaigeMiller · Posted 05-11-2021 01:52 PM

Thanks. Did you try the suggested change to your code that I made? Does the suggested change to your code that I made? If not, what happens, what is wrong? Show us the actual code that you used.

--
Paige Miller

mitch0011 · Posted 05-11-2021 02:15 PM

Thanks, I tried your code suggestion:

proc sort data=Dataset1;

by ID PERSON; run;

proc sort data=Dataset2;

by ID PERSON; run;

data final;

merge Dataset1 Dataset2 (in=a);

by ID PERSON;

if a; run;

But I get a finaldataset of all observations from dataset1. it should only be a subset of those in dataset2 matching on ID/PERSON.

For instance, dataset1 has 1200 person month observations, which is for 100 persons since each one has 12 observations for each MONTH 1-12.

Dataset2 has 50 person observations that identify those records that I want to keep (Just ID/PERSON/VAR2, no MONTH variable).

Final dataset is expected to have 700 observations (50 persons from dataset2 and each of their 12 monthly obs from dataset1)

ballardw · Posted 05-11-2021 03:38 PM

@mitch0011 wrote:

Thanks, I tried your code suggestion:

proc sort data=Dataset1;

by ID PERSON; run;

proc sort data=Dataset2;

by ID PERSON; run;

data final;

merge Dataset1 Dataset2 (in=a);

by ID PERSON;

if a; run;

But I get a finaldataset of all observations from dataset1. it should only be a subset of those in dataset2 matching on ID/PERSON.

For instance, dataset1 has 1200 person month observations, which is for 100 persons since each one has 12 observations for each MONTH 1-12.

Dataset2 has 50 person observations that identify those records that I want to keep (Just ID/PERSON/VAR2, no MONTH variable).

Final dataset is expected to have 700 observations (50 persons from dataset2 and each of their 12 monthly obs from dataset1)

Which means that your "example" data posted behaves differently than your actual data. So you need to examine your example data and the real data.

Some things to consider: Are the values of the BY variables in the "dataset2" duplicated? That can cause an issue as that means that both data sets have multiple records for the by values and a merge typically does not perform well under that case. Show use the LOG from running the merge step. Copy the text from the log with all messages and note. On the forum open a text box with the </> icon and paste the text from the log.

Sajid01 · Posted 05-11-2021 03:02 PM

Hello
What I understand is that you want only those observations from dataset1 and dataset2 where the id and person are common.
For this do the following.

proc sort data=dataset1;
by id person;
run;
proc sort data=dataset2;
by id person;
run;

data final;
merge Dataset1(in=b) Dataset2 (in=a);
by ID PERSON;
if a and b; 
run;

The output will be something like this

Screenshot 2021-05-11 2.59.28 PM.png

You wanted this

"ID=2, PERSON=1 would not be included in the finaldataset since it does not exist in Dataset2"
The output does not have id=2 and person=1.
Hope this answers the question.

Help Merging SAS datasets

Re: Help Merging SAS datasets

Re: Help Merging SAS datasets

Re: Help Merging SAS datasets

Re: Help Merging SAS datasets

Re: Help Merging SAS datasets

Re: Help Merging SAS datasets

Help Merging SAS datasets

Re: Help Merging SAS datasets

Re: Help Merging SAS datasets

Re: Help Merging SAS datasets

Re: Help Merging SAS datasets

Re: Help Merging SAS datasets

Re: Help Merging SAS datasets

Registration is open

SAS Training: Just a Click Away