Re: Code to output both matching and non-matching account_id for two d...

jeremy4 · Posted 09-25-2019 09:13 AM

I have one dataset (dataset 1) with 10,000 observations and another dataset (dataset 2) with 150,000 observations. What code could be used so that two output datasets are created when comparing account_id between two datasets?

1. In one output dataset (account_match) to be created, there is a list of all of the account_id (and also select all of the corresponding variables associated with that account_id in the output dataset) from dataset 1 that are also in dataset 2 (i.e. where there the account_id in dataset 1 is also in dataset 2)?

2. The second output dataset to be created, account_no_match, would output all of the account_id in dataset 1 that are notseen in dataset 2 (and also select all of the corresponding variables associated with that account_id in the output dataset) .

As a result, number of accounts in account_match + number of accounts in account_no_match = 10,000 observations (i.e. dataset 1)

Code:

data account_match account_no_match;

[insert code here]

run;

Reeza · Posted 09-25-2019 11:30 AM

data in1 in2 matched;

merge t1(in=in_t1) t2(in=in_t2);
by account_id;

if in_t1 and not in_t2 then output in1;
else if in_t2 and not in_t1 then output in2;
else output matched;

run;

Use a MERGE and IN data set option to control the output.

IN1 - all records only in T1 dataset

IN2 - all records only in the T2 dataset

MATCHED - records in both T1 and T2 datasets

EDIT: fixed per @PGStats comment

https://documentation.sas.com/?docsetId=ledsoptsref&docsetTarget=n1p1o2dsuc465nn198ovwdrj9mvy.htm&do...

@jeremy4 wrote:

I have one dataset (dataset 1) with 10,000 observations and another dataset (dataset 2) with 150,000 observations. What code could be used so that two output datasets are created when comparing account_id between two datasets?

1. In one output dataset (account_match) to be created, there is a list of all of the account_id (and also select all of the corresponding variables associated with that account_id in the output dataset) from dataset 1 that are also in dataset 2 (i.e. where there the account_id in dataset 1 is also in dataset 2)?

2. The second output dataset to be created, account_no_match, would output all of the account_id in dataset 1 that are notseen in dataset 2 (and also select all of the corresponding variables associated with that account_id in the output dataset) .

As a result, number of accounts in account_match + number of accounts in account_no_match = 10,000 observations (i.e. dataset 1)

Code:

data account_match account_no_match;

[insert code here]

run;