BookmarkSubscribeRSS Feed
mitch0011
Calcite | Level 5

Newbie here.

 

I have 2 datasets I'm trying to combine, keeping only those observations from the 2nd dataset:

 

Dataset1 - a person month file, this dataset holds all my variables and each observation for ID/PERSON is repeated for the MONTH=1 to 12

ID PERSON MONTH VAR1

1  1               1            2

1  1               2            3

1  1               3            7

2  1               1             2

 

Dataset2 - a person file

ID PERSON VAR3

 

I want to merge the 2 datasets, so the final dataset will only keep those observations from Dataset2, but pull in all variables from Dataset1, as well as those observations repeated in the MONTH varaible.

 

Is there an easy way to do this?

I've tried the following, but it's not working as expected:

 

proc sort data=Dataset1;

by ID PERSON; run;

 

proc sort data=Dataset2;

by ID PERSON; run;

 

data final;

merge Dataset1 (in=a) Dataset2;

by ID PERSON;

if a;

run;

 

 

Any help would be appreciated!

6 REPLIES 6
PaigeMiller
Diamond | Level 26

@mitch0011 wrote:

Newbie here.

 

I have 2 datasets I'm trying to combine, keeping only those observations from the 2nd dataset:

 

Dataset1 - a person month file, this dataset holds all my variables and each observation for ID/PERSON is repeated for the MONTH=1 to 12

ID PERSON MONTH VAR1

1  1               1            2

1  1               2            3

1  1               3            7

2  1               1             2

 

Dataset2 - a person file

ID PERSON VAR3

 

I want to merge the 2 datasets, so the final dataset will only keep those observations from Dataset2, but pull in all variables from Dataset1, as well as those observations repeated in the MONTH varaible.


If you want to keep observations from Dataset2, wouldn't that be:

 

merge Dataset1 Dataset2 (in=a);

If that's not it, please SHOW US portions of both data sets and SHOW US the desired output.

 

--
Paige Miller
mitch0011
Calcite | Level 5

I would only like to keep the observations from Dataset2 that match to those in Dataset1 by ID and PERSON, so in the example of the datasets below,

ID=2, PERSON=1 would not be included in the finaldataset since it does not exist in Dataset2

 

Dataset1:

ID PERSON MONTH VAR1

1   1              1            2

1   1              2            3

1   1              3            7

2   1              1            2

2   1              2            2

2   1              3            2

3   1              1            5

3   1              2            5

3   1              3            5

3   2              1            6

3   2              2            6

3   2              3            6

4   1              1            7

4   1              2            7

4   1              3            7

 

Dataset2:

ID PERSON VAR2

1   1              2

3   1              3

3   2              7

4   1              2

 

FinalDataset:

ID PERSON MONTH VAR1 VAR2

1   1              1            2         2

1   1              2            3         2

1   1              3            7         2

3   1              1            5         3

3   1              2            5         3

3   1              3            5         3

3   2              1            6         7

3   2              2            6         7

3   2              3            6         7

4   1              1            7         2

4   1              2            7         2

4   1              3            7         2

 

 

PaigeMiller
Diamond | Level 26

Thanks. Did you try the suggested change to your code that I made? Does the suggested change to your code that I made? If not, what happens, what is wrong? Show us the actual code that you used.

--
Paige Miller
mitch0011
Calcite | Level 5

Thanks, I tried your code suggestion:

proc sort data=Dataset1;

by ID PERSON; run;

proc sort data=Dataset2;

by ID PERSON; run;

 

data final;

merge Dataset1 Dataset2 (in=a);

by ID PERSON;

if a; run;

 

But I get a finaldataset of all observations from dataset1. it should only be a subset of those in dataset2 matching on ID/PERSON.

For instance, dataset1 has 1200 person month observations, which is for 100 persons since each one has 12 observations for each MONTH 1-12.

Dataset2 has 50 person observations that identify those records that I want to keep (Just ID/PERSON/VAR2, no MONTH variable).

Final dataset is expected to have 700 observations (50 persons from dataset2 and each of their 12 monthly obs from dataset1)

 

ballardw
Super User

@mitch0011 wrote:

Thanks, I tried your code suggestion:

proc sort data=Dataset1;

by ID PERSON; run;

proc sort data=Dataset2;

by ID PERSON; run;

 

data final;

merge Dataset1 Dataset2 (in=a);

by ID PERSON;

if a; run;

 

But I get a finaldataset of all observations from dataset1. it should only be a subset of those in dataset2 matching on ID/PERSON.

For instance, dataset1 has 1200 person month observations, which is for 100 persons since each one has 12 observations for each MONTH 1-12.

Dataset2 has 50 person observations that identify those records that I want to keep (Just ID/PERSON/VAR2, no MONTH variable).

Final dataset is expected to have 700 observations (50 persons from dataset2 and each of their 12 monthly obs from dataset1)

 


Which means that your "example" data posted behaves differently than your actual data. So you need to examine your example data and the real data.

Some things to consider: Are the values of the BY variables in the "dataset2" duplicated? That can cause an issue as that means that both data sets have multiple records for the by values and a merge typically does not perform well under that case. Show use the LOG from running the merge step. Copy the text from the log with all messages and note. On the forum open a text box with the </> icon and paste the text from the log.

Sajid01
Meteorite | Level 14

Hello
What I understand is that you want only those  observations from dataset1 and dataset2 where the id and person are common.
For this do the following.

proc sort data=dataset1;
by id person;
run;
proc sort data=dataset2;
by id person;
run;

data final;
merge Dataset1(in=b) Dataset2 (in=a);
by ID PERSON;
if a and b; 
run;

The output will be something like this

Screenshot 2021-05-11 2.59.28 PM.png

You wanted this

"ID=2, PERSON=1 would not be included in the finaldataset since it does not exist in Dataset2"
The output does not have id=2 and person=1.
Hope this answers the question.

hackathon24-white-horiz.png

The 2025 SAS Hackathon has begun!

It's finally time to hack! Remember to visit the SAS Hacker's Hub regularly for news and updates.

Latest Updates

How to Concatenate Values

Learn how use the CAT functions in SAS to join values from multiple variables into a single value.

Find more tutorials on the SAS Users YouTube channel.

SAS Training: Just a Click Away

 Ready to level-up your skills? Choose your own adventure.

Browse our catalog!

Discussion stats
  • 6 replies
  • 1798 views
  • 0 likes
  • 4 in conversation