Newbie here.
I have 2 datasets I'm trying to combine, keeping only those observations from the 2nd dataset:
Dataset1 - a person month file, this dataset holds all my variables and each observation for ID/PERSON is repeated for the MONTH=1 to 12
ID PERSON MONTH VAR1
1 1 1 2
1 1 2 3
1 1 3 7
2 1 1 2
Dataset2 - a person file
ID PERSON VAR3
I want to merge the 2 datasets, so the final dataset will only keep those observations from Dataset2, but pull in all variables from Dataset1, as well as those observations repeated in the MONTH varaible.
Is there an easy way to do this?
I've tried the following, but it's not working as expected:
proc sort data=Dataset1;
by ID PERSON; run;
proc sort data=Dataset2;
by ID PERSON; run;
data final;
merge Dataset1 (in=a) Dataset2;
by ID PERSON;
if a;
run;
Any help would be appreciated!
@mitch0011 wrote:
Newbie here.
I have 2 datasets I'm trying to combine, keeping only those observations from the 2nd dataset:
Dataset1 - a person month file, this dataset holds all my variables and each observation for ID/PERSON is repeated for the MONTH=1 to 12
ID PERSON MONTH VAR1
1 1 1 2
1 1 2 3
1 1 3 7
2 1 1 2
Dataset2 - a person file
ID PERSON VAR3
I want to merge the 2 datasets, so the final dataset will only keep those observations from Dataset2, but pull in all variables from Dataset1, as well as those observations repeated in the MONTH varaible.
If you want to keep observations from Dataset2, wouldn't that be:
merge Dataset1 Dataset2 (in=a);
If that's not it, please SHOW US portions of both data sets and SHOW US the desired output.
I would only like to keep the observations from Dataset2 that match to those in Dataset1 by ID and PERSON, so in the example of the datasets below,
ID=2, PERSON=1 would not be included in the finaldataset since it does not exist in Dataset2
Dataset1:
ID PERSON MONTH VAR1
1 1 1 2
1 1 2 3
1 1 3 7
2 1 1 2
2 1 2 2
2 1 3 2
3 1 1 5
3 1 2 5
3 1 3 5
3 2 1 6
3 2 2 6
3 2 3 6
4 1 1 7
4 1 2 7
4 1 3 7
Dataset2:
ID PERSON VAR2
1 1 2
3 1 3
3 2 7
4 1 2
FinalDataset:
ID PERSON MONTH VAR1 VAR2
1 1 1 2 2
1 1 2 3 2
1 1 3 7 2
3 1 1 5 3
3 1 2 5 3
3 1 3 5 3
3 2 1 6 7
3 2 2 6 7
3 2 3 6 7
4 1 1 7 2
4 1 2 7 2
4 1 3 7 2
Thanks. Did you try the suggested change to your code that I made? Does the suggested change to your code that I made? If not, what happens, what is wrong? Show us the actual code that you used.
Thanks, I tried your code suggestion:
proc sort data=Dataset1;
by ID PERSON; run;
proc sort data=Dataset2;
by ID PERSON; run;
data final;
merge Dataset1 Dataset2 (in=a);
by ID PERSON;
if a; run;
But I get a finaldataset of all observations from dataset1. it should only be a subset of those in dataset2 matching on ID/PERSON.
For instance, dataset1 has 1200 person month observations, which is for 100 persons since each one has 12 observations for each MONTH 1-12.
Dataset2 has 50 person observations that identify those records that I want to keep (Just ID/PERSON/VAR2, no MONTH variable).
Final dataset is expected to have 700 observations (50 persons from dataset2 and each of their 12 monthly obs from dataset1)
@mitch0011 wrote:
Thanks, I tried your code suggestion:
proc sort data=Dataset1;
by ID PERSON; run;
proc sort data=Dataset2;
by ID PERSON; run;
data final;
merge Dataset1 Dataset2 (in=a);
by ID PERSON;
if a; run;
But I get a finaldataset of all observations from dataset1. it should only be a subset of those in dataset2 matching on ID/PERSON.
For instance, dataset1 has 1200 person month observations, which is for 100 persons since each one has 12 observations for each MONTH 1-12.
Dataset2 has 50 person observations that identify those records that I want to keep (Just ID/PERSON/VAR2, no MONTH variable).
Final dataset is expected to have 700 observations (50 persons from dataset2 and each of their 12 monthly obs from dataset1)
Which means that your "example" data posted behaves differently than your actual data. So you need to examine your example data and the real data.
Some things to consider: Are the values of the BY variables in the "dataset2" duplicated? That can cause an issue as that means that both data sets have multiple records for the by values and a merge typically does not perform well under that case. Show use the LOG from running the merge step. Copy the text from the log with all messages and note. On the forum open a text box with the </> icon and paste the text from the log.
Hello
What I understand is that you want only those observations from dataset1 and dataset2 where the id and person are common.
For this do the following.
proc sort data=dataset1;
by id person;
run;
proc sort data=dataset2;
by id person;
run;
data final;
merge Dataset1(in=b) Dataset2 (in=a);
by ID PERSON;
if a and b;
run;
The output will be something like this
You wanted this
"ID=2, PERSON=1 would not be included in the finaldataset since it does not exist in Dataset2"
The output does not have id=2 and person=1.
Hope this answers the question.
Save $250 on SAS Innovate and get a free advance copy of the new SAS For Dummies book! Use the code "SASforDummies" to register. Don't miss out, May 6-9, in Orlando, Florida.
Learn how use the CAT functions in SAS to join values from multiple variables into a single value.
Find more tutorials on the SAS Users YouTube channel.
Ready to level-up your skills? Choose your own adventure.