Solved: Re: Keeping only observations that matches another dataset

mmh · Posted 01-12-2022 08:38 AM

Dear altruists,
I have the following SAS datasets:

data data1;
input Company_ID$ Fiscal_Year Fiscal_Quarter Announcement : yymmdd10.;
format Announcement yymmdd10.;
cards;
1001 2001 2 2001-08-31
1001 2001 3 2001-11-30
2002 2001 3 2001-07-31
2002 2001 4 2001-10-31
2002 2002 1 2002-01-31
3003 2001 4 2001-10-31
3003 2002 1 2002-01-31
4004 2004 4 2004-12-31
4004 2005 1 2005-04-02
5005 2001 1 2001-01-31
;
run;

data data2;
input Company_ID$ Company_Name$ Year_End_Month$;
cards;
2002 Google December
3003 Amazon June
5005 Apple March
;
run;

I want to keep only those observations in data1 that matches data2.
However, when matching the observations, I also want to include the additional columns that are in data2.

Essentially, I am looking for the following output:

Company_ID	Fiscal_Year	Fiscal_Quarter	Announcement	Company_Name	Year_End_Month
2002	2001	3	2001-07-31	Google	December
2002	2001	4	2001-10-31	Google	December
2002	2002	1	2002-01-31	Google	December
3003	2001	4	2001-10-31	Amazon	June
3003	2002	1	2002-01-31	Amazon	June
5005	2001	1	2001-01-31	Apple	March

I have tried the following code, but it does not seem to work:

proc sort data=work.data1;
by Company_ID;
run;

proc sort data=work.data2;
by Company_ID;
run;

data work.filtered_data;
merge work.data1 (in=a) work.data2 (in=b);
by Company_ID;
if b and not a;
run;

(Edit: changed the code a bit, but still does not show any result)

Requesting your kind support 🙂

PaigeMiller · Posted 01-12-2022 09:07 AM

You want: rows in DATA1 that are in DATA2

You got: rows in DATA1 that are not in DATA2

So how does

if b and not a;

match what you are looking for?

When I write out what you want in words (above, first line) there is no NOT in what I have written, there is no NOT in what you want. So does that give you a clue?

--
Paige Miller

View solution in original post

PaigeMiller · Posted 01-12-2022 08:54 AM

When I run your code, that shows the filtered data set as those rows in DATA1 that do NOT match rows in DATA2. That should be your clue as to what is happening, and the clue to how to fix this.

--
Paige Miller

mmh · Posted 01-12-2022 09:03 AM

Hello Paige,
Thank you for your response.
When I try the other way as you mentioned, this does not provide any result at all.

data work.filtered_data;
merge work.data1 (in=a) work.data2 (in=b);
by Company_ID;
if b and not a;
run;

PaigeMiller · Posted 01-12-2022 09:07 AM

You want: rows in DATA1 that are in DATA2

You got: rows in DATA1 that are not in DATA2

So how does

if b and not a;

match what you are looking for?

When I write out what you want in words (above, first line) there is no NOT in what I have written, there is no NOT in what you want. So does that give you a clue?

--
Paige Miller

mmh · Posted 01-12-2022 09:10 AM

Got it 😄
Thanks a lot!

Keeping only observations that matches another dataset

Re: Keeping only observations that matches another dataset

Re: Keeping only observations that matches another dataset

Re: Keeping only observations that matches another dataset

Re: Keeping only observations that matches another dataset

Re: Keeping only observations that matches another dataset

SAS Innovate 2025: Save the Date