Quartz | Level 8

## Code to compare how many account_id in one dataset are in another dataset

Hi,

I have one dataset (dataset 1) with 10,000 observations and another dataset (dataset 2) with 150,000 observations. What code could be used so that two output datasets are created when comparing account_id between two datasets?

1. In one output dataset (account_match) to be created, there is a list of all of the account_id (and also select all of the corresponding variables associated with that account_id in the output dataset) from dataset 1 that are also in dataset 2 (i.e. where there the account_id in dataset 1 is also in dataset 2)?

2. The second output dataset to be created, account_no_match, would output all of the account_id in dataset 1 that are not seen in dataset 2 (and also select all of the corresponding variables associated with that account_id in the output dataset) .

As a result, number of accounts in account_match + number of accounts in account_no_match = 10,000 observations (i.e. dataset 1)

Code:

data account_match account_no_match;

[insert code here]

run;

1 ACCEPTED SOLUTION

Accepted Solutions
SAS Super FREQ

## Re: Code to compare how many account_id in one dataset are in another dataset

``````data account_1;
input account_id \$;
datalines;
1
2
3
4
5
6
7
8
;

data account_2;
input account_id \$;
datalines;
14
5
1
2
99
98
93
;

proc sort data=account_1;
by account_id;
run;

proc sort data=account_2;
by account_id;
run;

data account_match account_no_match;
merge account_1 (in=a)
account_2 (in=b);
by account_id;
if a and b then output account_match;
else if a and not b then output account_no_match;
run;``````

Not sure if this is what you're trying to get at.

6 REPLIES 6
SAS Super FREQ

## Re: Code to compare how many account_id in one dataset are in another dataset

``````data account_1;
input account_id \$;
datalines;
1
2
3
4
5
6
7
8
;

data account_2;
input account_id \$;
datalines;
14
5
1
2
99
98
93
;

proc sort data=account_1;
by account_id;
run;

proc sort data=account_2;
by account_id;
run;

data account_match account_no_match;
merge account_1 (in=a)
account_2 (in=b);
by account_id;
if a and b then output account_match;
else if a and not b then output account_no_match;
run;``````

Not sure if this is what you're trying to get at.

Quartz | Level 8

Thanks a lot!
Quartz | Level 8

## Re: Code to compare how many account_id in one dataset are in another dataset

If I only wanted variables from dataset "account_1" to be included in both output datasets (account_match and account_no_match), how could I change your code?

``````data account_match account_no_match;
merge account_1 (in=a)
account_2 (in=b);
by account_id;
if a and b then output account_match;
else if a and not b then output account_no_match;
run;``````

SAS Super FREQ

## Re: Code to compare how many account_id in one dataset are in another dataset

You want the same results for both data sets? That's how I'm reading your question. If so:

``````data account_match account_no_match;
merge account_1 (in=a)
account_2 (in=b);
by account_id;
if a and b then output account_match account_no_match;
run;``````

I don't think this is what you're trying to convey though. You should post a snapshot of your data.

Opal | Level 21

## Re: Code to compare how many account_id in one dataset are in another dataset

@jeremy4 wrote:

If I only wanted variables from dataset "account_1" to be included in both output datasets (account_match and account_no_match), how could I change your code?

``````data account_match account_no_match;
merge account_1 (in=a)
account_2 (in=b);
by account_id;
if a and b then output account_match;
else if a and not b then output account_no_match;
run;``````

Only read account_id from table account_2.

``....account_2 (in=b keep=account_id);` `

Opal | Level 21

## Re: Code to compare how many account_id in one dataset are in another dataset

Here two other options how to do this.

``````data account_1;
input account_id \$;
some_other_var=_n_;
datalines;
1
2
3
4
5
6
7
8
;

data account_2;
input account_id \$;
datalines;
14
5
1
2
99
98
93
;

/* options 1 */
data match no_match;
if _n_=1 then
do;
dcl hash h1(dataset:'account_2(keep=account_id)');
h1.defineKey('account_id');
h1.defineDone();
end;
set account_1;
if h1.check()=0 then output match;
else output no_match;
run;

/* option 2 */
proc sql;
create table match_noMatch as
select
t1.*,
case when missing(t2.account_id) then '0' else '1' end as match_flg
from
account_1 as t1
left join
(select distinct account_id from account_2) as t2
on t1.account_id=t2.account_id
;
quit;

``````
Discussion stats
• 6 replies
• 817 views
• 0 likes
• 3 in conversation